In PHP, the curl functions (built on libcurl) can send an HTTP request to a target URL and fetch the response. A typical request looks like this:
public function callFunction($url, $postData, $method, $header = array())
{
    $maxRetryTimes = 3;
    $method = strtoupper($method);
    $curl = curl_init();
    /****** set up request parameters: start ******/
    if ($method !== 'GET' && $postData) {
        curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($postData));
    } elseif ($method === 'GET' && $postData) {
        // append to an existing query string if the URL already has one
        $url .= (strpos($url, '?') === false ? '?' : '&') . http_build_query($postData);
    }
    /****** set up request parameters: end ******/
    curl_setopt_array($curl, array(
        CURLOPT_URL => $url,
        CURLOPT_TIMEOUT => 10,      // limit for the whole transfer, in seconds
        CURLOPT_NOBODY => 0,
        CURLOPT_RETURNTRANSFER => 1 // make curl_exec() return the body instead of printing it
    ));
    if ($method === 'POST') {
        curl_setopt($curl, CURLOPT_POST, true);
    } elseif ($method !== 'GET') {
        curl_setopt($curl, CURLOPT_CUSTOMREQUEST, $method); // PUT, DELETE, ...
    }
    if (!empty($header)) {
        curl_setopt($curl, CURLOPT_HTTPHEADER, $header); // array of "Name: value" strings
    }
    $response = false;
    // compare with false BEFORE trimming: trim(false) returns '' and would stop the retries
    while ($response === false && $maxRetryTimes-- > 0) {
        $response = curl_exec($curl);
    }
    return ($response === false) ? false : trim($response);
}
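The retry loop is easy to get wrong: if the result is post-processed before it is compared with false (for example `trim(curl_exec($curl))`), a failed attempt becomes '' and the loop never retries. A minimal sketch of the same idea as a reusable helper; the name `retryUntilNotFalse` is hypothetical and not part of PHP or of the original code:

```php
<?php
// Hypothetical helper: call $attempt up to $maxAttempts times and return the
// first result that is not false (or false if every attempt failed).
// Post-processing such as trim() belongs AFTER this check, because trim(false)
// returns '' and would make a failed attempt look like a success.
function retryUntilNotFalse(callable $attempt, int $maxAttempts)
{
    $result = false;
    for ($i = 0; $i < $maxAttempts && $result === false; $i++) {
        $result = $attempt();
    }
    return $result;
}

// Usage with a curl handle (not executed here):
// $response = retryUntilNotFalse(function () use ($curl) { return curl_exec($curl); }, 3);
// $response = ($response === false) ? false : trim($response);
```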
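In the code above, $response holds the data this HTTP request fetched from $url. Unless a Range header in $header specifies how much to download, the complete content of the URI is requested and returned, however large the resource is. Curl is usually used to call APIs or invoke a remote function to fetch data, so in that scenario the CURLOPT_TIMEOUT option is very important.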
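Asking the server for only the first N bytes is the standard HTTP way to cap a download. A small sketch that builds such a header for the $header array; the helper name `buildRangeHeader` is hypothetical. Byte ranges are inclusive, so the last index is N - 1. libcurl's CURLOPT_RANGE option (for example '0-1023') is an equivalent way to send the same request.

```php
<?php
// Hypothetical helper: build an HTTP Range header for the first $maxBytes bytes.
// Ranges are inclusive, so bytes 0..($maxBytes - 1) is exactly $maxBytes bytes.
function buildRangeHeader(int $maxBytes): string
{
    if ($maxBytes < 1) {
        throw new InvalidArgumentException('maxBytes must be >= 1');
    }
    return 'Range: bytes=0-' . ($maxBytes - 1);
}

// Usage with callFunction() (not executed here); pass it inside the $header array:
// $data = $this->callFunction($url, array(), 'GET', array(buildRangeHeader(1024 * 1024)));
```

A server that honours the header replies 206 Partial Content; a server that ignores Range replies 200 with the whole body, so a client-side limit is still needed when the target server is arbitrary.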
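But curl is used for more than calling data APIs: it also has to check whether an arbitrary URL serves HTTP correctly. When the URL a user enters points to a resource file, such as a PDF or a PPT, requesting a large resource with curl over a poor network inevitably causes timeouts or wastes network resources. The previous strategy was to download everything (curl keeps the download in memory), check the content size after the request, and pause the monitoring task if it exceeded the target. Limiting after the fact treats the symptom, not the cause, and eventually a customer raised a new requirement: do not stop the task; download only a specified number of bytes and return their MD5 so the customer can verify correctness.
After a few attempts I solved the problem; the process is recorded below.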
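1. Trying CURLOPT_MAXFILESIZE.
This option has minimum PHP and libcurl version requirements, and it is purely an up-front check: libcurl compares the limit with the size the server announces (for example Content-Length), and when the target is larger it returns a "maximum file size exceeded" error without downloading anything. That does not meet the requirement.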
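For comparison, a sketch of the CURLOPT_MAXFILESIZE approach; `fetchWithMaxFileSize` is a hypothetical helper name. When the announced size exceeds the limit, curl_exec() fails with errno 63 (CURLE_FILESIZE_EXCEEDED) instead of handing back the first N bytes, which is why this option did not fit the requirement.

```php
<?php
// Hypothetical helper: refuse bodies larger than $maxBytes up front.
// On an oversized target curl_exec() returns false and curl_errno() is 63
// (CURLE_FILESIZE_EXCEEDED); no truncated body is returned.
function fetchWithMaxFileSize(string $url, int $maxBytes): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_MAXFILESIZE    => $maxBytes,
    ));
    $body = curl_exec($ch);
    return array(
        'body'  => ($body === false) ? null : $body,
        'errno' => curl_errno($ch),
    );
}
```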
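2. Using a callback during the curl download.
Following http://php.net/manual/en/function.curl-setopt-array.php, I set the CURLOPT_WRITEFUNCTION option to on_curl_write. libcurl calls this function each time it receives a chunk of data (at most 16 KB per call by default), not at a fixed time interval.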
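$ch = curl_init();
$options = array(
    CURLOPT_URL => 'http://www.php.net/',
    CURLOPT_HEADER => false,
    CURLOPT_HEADERFUNCTION => 'on_curl_header', // header callback, not shown here
    CURLOPT_WRITEFUNCTION => 'on_curl_write'    // body callback, called once per received chunk
);
curl_setopt_array($ch, $options);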
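A self-contained sketch of the same callback technique, assuming PHP 7+ and using a closure that captures its buffer by reference instead of a named global function; `downloadPrefix` is a hypothetical name. It keeps exactly the first $maxSize bytes and returns 0 so that libcurl aborts with errno 23 (CURLE_WRITE_ERROR) once the cap is reached.

```php
<?php
// Hypothetical helper: download at most $maxSize bytes from $url.
// Returns the kept bytes and curl_errno(); errno 23 means the callback
// stopped the transfer on purpose because the cap was reached.
function downloadPrefix(string $url, int $maxSize): array
{
    $buffer = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, string $chunk) use (&$buffer, $maxSize): int {
        $room = $maxSize - strlen($buffer);
        if (strlen($chunk) <= $room) {
            $buffer .= $chunk;     // still under the cap: keep the whole chunk
            return strlen($chunk); // full length tells libcurl to continue
        }
        $buffer .= substr($chunk, 0, $room); // keep only what fits
        return 0;                            // any other value aborts with errno 23
    });
    curl_exec($ch); // returns false when the callback aborted the transfer
    return array('data' => $buffer, 'errno' => curl_errno($ch));
}
```

Capturing the state in the closure avoids a per-process singleton, so two downloads in the same long-running process cannot share counters.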
My final implementation snippet:
function on_curl_write($ch, $data)
{
    $pid = getmypid();
    $downloadSizeRecorder = DownloadSizeRecorder::getInstance($pid);
    $bytes = strlen($data);
    $downloadSizeRecorder->downloadData .= $data;
    $downloadSizeRecorder->downloadedFileSize += $bytes;
    // error_log(' on_curl_write ' . $downloadSizeRecorder->downloadedFileSize . " > {$downloadSizeRecorder->maxSize} \n", 3, '/tmp/hyb.log');
    // keep going until what has been downloaded is slightly larger than the limit
    if (($downloadSizeRecorder->downloadedFileSize - $bytes) > $downloadSizeRecorder->maxSize) {
        // returning anything other than $bytes makes libcurl abort the download with
        // "errno":23,"errmsg":"Failed writing body (0 != 16384)"
        return 0;
    }
    return $bytes; // tell libcurl the whole chunk was handled
}
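DownloadSizeRecorder is a singleton-style class (one instance per process ID) that records the size during the curl download and provides helpers such as returning the MD5 of the downloaded content.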
class DownloadSizeRecorder
{
    const ERROR_FAILED_WRITING = 23; // CURLE_WRITE_ERROR: "Failed writing body"
    public $downloadedFileSize;
    public $maxSize;
    public $pid;
    public $hasOverMaxSize;
    public $fileFullName;
    public $downloadData;
    private static $selfInstanceList = array();

    public static function getInstance($pid)
    {
        if (!isset(self::$selfInstanceList[$pid])) {
            self::$selfInstanceList[$pid] = new self($pid);
        }
        return self::$selfInstanceList[$pid];
    }

    private function __construct($pid)
    {
        $this->pid = $pid;
        $this->downloadedFileSize = 0;
        $this->fileFullName = '';
        $this->hasOverMaxSize = false;
        $this->downloadData = '';
    }

    /**
     * Save at most maxSize bytes of the response body to a temp file.
     * Assumes the buffer starts with the response headers (CURLOPT_HEADER enabled).
     */
    public function saveMaxSizeData2File()
    {
        $resp_data = $this->downloadData;
        $fileFullName = '/tmp/http_' . $this->pid . '_' . time() . "_{$this->maxSize}.download";
        if ($resp_data !== '') {
            // split off the headers; without a blank line, treat everything as body
            $parts = explode("\r\n\r\n", $resp_data, 2);
            $bodyOnly = isset($parts[1]) ? $parts[1] : $parts[0];
            $saveDataLength = ($this->downloadedFileSize < $this->maxSize) ? $this->downloadedFileSize : $this->maxSize;
            $needSaveData = substr($bodyOnly, 0, $saveDataLength);
            if ($needSaveData === false || $needSaveData === '') {
                return;
            }
            file_put_contents($fileFullName, $needSaveData);
            if (file_exists($fileFullName)) {
                $this->fileFullName = $fileFullName;
            }
        }
    }

    /**
     * Return the md5 of the saved file ('' if nothing was saved).
     * @return string
     */
    public function returnFileMd5()
    {
        $md5 = '';
        if (file_exists($this->fileFullName)) {
            $md5 = md5_file($this->fileFullName);
        }
        return $md5;
    }

    /**
     * Return the downloaded size, capped at maxSize.
     * @return int
     */
    public function returnSize()
    {
        return ($this->downloadedFileSize < $this->maxSize) ? $this->downloadedFileSize : $this->maxSize;
    }

    /**
     * Delete the downloaded file.
     */
    public function deleteFile()
    {
        if (file_exists($this->fileFullName)) {
            unlink($this->fileFullName);
        }
    }
}
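The temp file in saveMaxSizeData2File() exists mainly to feed md5_file(). A sketch of an alternative that hashes the first $maxSize body bytes incrementally with PHP's hash_init()/hash_update()/hash_final(), so nothing is written to disk. `PrefixMd5` is a hypothetical class; it assumes it is fed body bytes only (no headers), and constructor property promotion requires PHP 8.0.

```php
<?php
// Hypothetical class: MD5 of the first $maxSize bytes, computed chunk by chunk.
final class PrefixMd5
{
    private \HashContext $ctx;
    private int $seen = 0;

    public function __construct(private int $maxSize)
    {
        $this->ctx = hash_init('md5');
    }

    // Feed one chunk; only the part that still fits under the cap is hashed.
    // Returns true while more data is still wanted.
    public function update(string $chunk): bool
    {
        $take = min(strlen($chunk), $this->maxSize - $this->seen);
        if ($take > 0) {
            hash_update($this->ctx, substr($chunk, 0, $take));
            $this->seen += $take;
        }
        return $this->seen < $this->maxSize;
    }

    public function size(): int
    {
        return $this->seen;
    }

    // Hash a copy so digest() can be called more than once.
    public function digest(): string
    {
        return hash_final(hash_copy($this->ctx));
    }
}
```

Inside a write callback this could be used as `return $prefix->update($data) ? strlen($data) : 0;`, which also stops the transfer once the cap is reached.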
Limiting the download size in the curl request code:
// ...
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'on_curl_write'); // register the write callback
// ...
$pid = getmypid();
$downloadSizeRecorder = DownloadSizeRecorder::getInstance($pid);
$downloadSizeRecorder->maxSize = $size_limit;
// ...
// send the curl request
$response = curl_exec($ch);
// ...
// save the file and return its md5
$downloadSizeRecorder->saveMaxSizeData2File(); // save
$downloadFileMd5 = $downloadSizeRecorder->returnFileMd5();
$downloadedfile_size = $downloadSizeRecorder->returnSize();
$downloadSizeRecorder->deleteFile();
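Here I hit a pitfall. Once on_curl_write is registered, curl_exec() no longer returns the body: it returns true on success, and false with errno 23 when the callback cuts the download short, so the code that later reads the response breaks. Fortunately the download size is already limited in real time and downloadData holds the content downloaded so far, so it can be used directly.
if ($response === true
    || ($response === false && curl_errno($ch) === DownloadSizeRecorder::ERROR_FAILED_WRITING)) {
    $response = $downloadSizeRecorder->downloadData;
}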
Downloading a file up to a specified size with curl in PHP
Original article: http://www.cnblogs.com/newbalanceteam/p/7662437.html