PHP trim source code analysis

Published: 2023-09-06 01:30 | Editor: 傅花花

This article is also published at https://github.com/zhangyachen/zhangyachen.github.io/issues/9

The core code is as follows:

    /* {{{ php_trim()
     * mode 1 : trim left
     * mode 2 : trim right
     * mode 3 : trim left and right
     * what indicates which chars are to be trimmed. NULL->default (' \t\n\r\v\0')
     */
    PHPAPI char *php_trim(char *c, int len, char *what, int what_len, zval *return_value, int mode TSRMLS_DC)
    {
        register int i;
        int trimmed = 0;
        char mask[256];

        if (what) {
            php_charmask((unsigned char*)what, what_len, mask TSRMLS_CC);
        } else {
            php_charmask((unsigned char*)" \n\r\t\v\0", 6, mask TSRMLS_CC);
        }

        // trim from the left
        if (mode & 1) {
            for (i = 0; i < len; i++) {
                if (mask[(unsigned char)c[i]]) {
                    // this byte is marked in the mask built from the second argument
                    trimmed++;
                } else {
                    break;
                }
            }
            len -= trimmed;
            c += trimmed;
        }
        // trim from the right
        if (mode & 2) {
            for (i = len - 1; i >= 0; i--) {
                if (mask[(unsigned char)c[i]]) {
                    len--;
                } else {
                    break;
                }
            }
        }

        if (return_value) {
            // return the len bytes starting at the position c now points to
            RETVAL_STRINGL(c, len, 1);
        } else {
            return estrndup(c, len);
        }
        return "";
    }

As shown above, php_trim calls php_charmask internally:

    /* {{{ php_charmask
     * Fills a 256-byte bytemask with input. You can specify a range like 'a..z',
     * it needs to be incrementing.
     * Returns: FAILURE/SUCCESS whether the input was correct (i.e. no range errors)
     */
    static inline int php_charmask(unsigned char *input, int len, char *mask TSRMLS_DC)
    {
        unsigned char *end;
        unsigned char c;
        int result = SUCCESS;

        memset(mask, 0, 256);    // zero the 256-entry lookup table
        for (end = input+len; input < end; input++) {
            c=*input;
            if ((input+3 < end) && input[1] == '.' && input[2] == '.'
                    && input[3] >= c) {
                memset(mask+c, 1, input[3] - c + 1);
                input+=3;
            } else if ((input+1 < end) && input[0] == '.' && input[1] == '.') {
                /* Error, try to be as helpful as possible:
                   (a range ending/starting with '.' won't be captured here) */
                if (end-len >= input) { /* there was no 'left' char */
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid '..'-range, no character to the left of '..'");
                    result = FAILURE;
                    continue;
                }
                if (input+2 >= end) { /* there is no 'right' char */
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid '..'-range, no character to the right of '..'");
                    result = FAILURE;
                    continue;
                }
                if (input[-1] > input[2]) { /* wrong order */
                    php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid '..'-range, '..'-range needs to be incrementing");
                    result = FAILURE;
                    continue;
                }
                /* FIXME: better error (a..b..c is the only left possibility?) */
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid '..'-range");
                result = FAILURE;
                continue;
            } else {
                // mark this byte value in the table
                mask[c]=1;
            }
        }
        return result;
    }

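From this, the logic of trim is:
1 Declare a 256-entry lookup table (the mask), one entry per possible byte value.
2 For each byte in character_mask, set the table entry indexed by that byte's value to 1. A range such as a..z sets every entry in the range.
3 Scan str byte by byte from the start. While the current byte is marked in the table, drop it (advance the start pointer and decrease the length by 1); stop at the first byte that is not marked.
4 Scan str byte by byte from the end, using the same logic as step 3.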
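
The four steps above can be sketched in standalone C. This is a minimal illustration, not the PHP source: `build_mask` and `trim_bytes` are hypothetical names, the sketch does not expand `a..z` ranges, and it reports an offset and a length instead of allocating a new string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mark every byte of `what` in a 256-entry table.
 * Unlike php_charmask, this sketch does not expand "a..z" ranges. */
static void build_mask(const unsigned char *what, size_t what_len,
                       unsigned char mask[256])
{
    memset(mask, 0, 256);
    for (size_t i = 0; i < what_len; i++) {
        mask[what[i]] = 1;
    }
}

/* Hypothetical helper: trim bytes marked in `mask` from the ends of s[0..len).
 * Mode bit 1 trims the left side, mode bit 2 trims the right side.
 * Stores the offset of the first kept byte in *start and returns the
 * number of kept bytes. */
static size_t trim_bytes(const unsigned char *s, size_t len,
                         const unsigned char mask[256], int mode,
                         size_t *start)
{
    size_t begin = 0;
    size_t end = len;

    if (mode & 1) {
        while (begin < end && mask[s[begin]]) {
            begin++;            /* drop one byte from the left */
        }
    }
    if (mode & 2) {
        while (end > begin && mask[s[end - 1]]) {
            end--;              /* drop one byte from the right */
        }
    }
    *start = begin;
    return end - begin;
}
```

With a mask built from a space and a tab, calling `trim_bytes` on the six bytes of `"  hi \t"` with mode 3 yields offset 2 and length 2, which is the span holding `hi`.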

Case study:
trim("广东省", "省") produces garbled output

First, get the hexadecimal representation of "广东省":

    <?php
    // This file is saved as UTF-8.

    // Return the bytes of $str as a \xNN hex string.
    function get_hex_str($str) {
        $hex_str = '';
        for ($i = 0; $i < strlen($str); $i++) {
            $ord = ord($str[$i]); // byte value (0-255)
            $hex_str .= sprintf("\\x%02X", $ord); // hexadecimal, width 2, zero-padded on the left
        }
        return $hex_str;
    }

    $str = "广东省";
    printf("str[%s] hex_str[%s]\n", $str, get_hex_str($str));

    $str = trim($str, "省");
    printf("str[%s] hex_str[%s]\n", $str, get_hex_str($str));

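Output:
str[广东省] hex_str[\xE5\xB9\xBF\xE4\xB8\x9C\xE7\x9C\x81]
str[广��] hex_str[\xE5\xB9\xBF\xE4\xB8]
(The two leftover bytes E4 B8 are not a complete UTF-8 character, so how they are displayed depends on the terminal.)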

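In UTF-8 each of these Chinese characters takes three bytes: "东" is encoded as e4 b8 9c and "省" as e7 9c 81.
trim("广东省", "省") does not treat the Chinese characters we see as units; it works byte by byte.
The call is equivalent to stripping, from both ends of e5 b9 bf e4 b8 9c e7 9c 81, every byte that appears in e7 9c 81. Once the three bytes of "省" are gone, the third byte of "东" (9c) is also in the mask, so it is cut off as well, which gives the output above.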
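
The byte arithmetic can be checked with a small C sketch that works on the raw UTF-8 bytes from the hex dump. `rtrim_len`, `GUANGDONG`, and `SHENG` are hypothetical names introduced here for illustration; the sketch imitates only the right-hand scan of the mask-based trim, not the PHP implementation itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of s after right-trimming every byte that
 * also occurs in `what`, byte by byte, the way a mask lookup behaves. */
static size_t rtrim_len(const unsigned char *s, size_t len,
                        const unsigned char *what, size_t what_len)
{
    unsigned char mask[256];

    memset(mask, 0, sizeof mask);
    for (size_t i = 0; i < what_len; i++) {
        mask[what[i]] = 1;
    }
    while (len > 0 && mask[s[len - 1]]) {
        len--;
    }
    return len;
}

/* UTF-8 bytes of "广东省" and "省", as listed in the hex dump. */
static const unsigned char GUANGDONG[] = {
    0xE5, 0xB9, 0xBF,   /* 广 */
    0xE4, 0xB8, 0x9C,   /* 东 */
    0xE7, 0x9C, 0x81    /* 省 */
};
static const unsigned char SHENG[] = { 0xE7, 0x9C, 0x81 };
```

Calling `rtrim_len(GUANGDONG, sizeof GUANGDONG, SHENG, sizeof SHENG)` returns 5 rather than 6: the scan removes 81, 9c, and e7, then also removes the 9c that ends "东", and stops at b8.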

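To remove specific characters from a Chinese string, use str_replace instead. Note that str_replace removes every occurrence of the search string, not only those at the ends.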
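
The difference is that a substring replacement matches the whole byte sequence rather than individual bytes. The following C sketch shows that idea under stated assumptions: `remove_all` is a hypothetical helper written for this illustration, it is not how PHP implements str_replace, and the caller must supply a destination buffer of at least `strlen(src) + 1` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst`, skipping every occurrence of
 * the whole sequence `needle`. `dst` must hold at least strlen(src) + 1
 * bytes. Returns the length of the result. */
static size_t remove_all(const char *src, const char *needle, char *dst)
{
    size_t needle_len = strlen(needle);
    size_t out = 0;

    if (needle_len == 0) {              /* nothing to remove: plain copy */
        out = strlen(src);
        memcpy(dst, src, out + 1);
        return out;
    }
    while (*src != '\0') {
        if (strncmp(src, needle, needle_len) == 0) {
            src += needle_len;          /* drop the full sequence */
        } else {
            dst[out++] = *src++;        /* keep this byte */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Applied to the nine bytes of "广东省" with the three bytes of "省" as the needle, this keeps all six bytes of "广东", because the lone 9c at the end of "东" is not followed by the rest of the sequence and therefore never matches.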


Original article: http://www.cnblogs.com/zhangyachen/p/8032957.html
