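Linux curl Command Options Explained
Command: curl
Syntax: # curl [option] [url]
Common options:
2. Saving a fetched page
2.2: Use curl's built-in option -o (lowercase) to save the page
2.3: Use curl's built-in option -O (uppercase) to save a file from the page
3. Testing a page's return value
4. Specifying a proxy server and its port
5. Cookies
Note: the cookie file produced by -c (lowercase) is not the same as the cookies in -D. -c writes a cookie jar file, while -D dumps the raw response headers, Set-Cookie lines included.
6. Imitating a browser
7. Forging the referer (hotlinking)
8. Downloading files (using the built-in option -O, uppercase)
8.2: Downloading in a loop
8.3: Renaming a download
8.4: Downloading in chunks
8.5: Downloading a file over FTP
8.6: Showing a download progress bar
8.7: Hiding download progress information
9. Resuming an interrupted transfer
10. Uploading files
11. Showing fetch errors
Other options (the translation here is reposted):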
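I. Linux curl usage examples:
1. Fetching a web page with curl:
Fetch Baidu:
curl http://www.baidu.com
If the output is garbled, transcode it with iconv:
curl http://iframe.ip138.com/ic.asp | iconv -f gb2312
For iconv usage, see: "Using the iconv command on Linux/Unix to fix garbled Chinese text in text files"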
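As a small offline illustration of the transcoding step, the sketch below feeds four GB2312 bytes (the word 中文) through iconv instead of a live page. The explicit -t utf-8 target and the variable names are assumptions added for this example; the command above passes only -f and lets iconv choose the output encoding from the locale.

```shell
# Four bytes that encode the two characters of "中文" in GB2312.
# Octal escapes are used so the snippet works with any POSIX printf.
gb2312_bytes='\326\320\316\304'

# Convert from GB2312 to UTF-8, the same way curl's output would be piped.
decoded=$(printf "$gb2312_bytes" | iconv -f gb2312 -t utf-8)

printf '%s\n' "$decoded"
```

In real use the printf would be replaced by the curl command, for example curl -s URL | iconv -f gb2312 -t utf-8.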
2. Using a proxy with curl:
Fetch a page through an HTTP proxy:
curl -x 111.95.243.36:80 http://iframe.ip138.com/ic.asp | iconv -f gb2312
curl -x 111.95.243.36:80 -U aiezu:password http://www.baidu.com
Fetch a page through a SOCKS proxy:
curl --socks4 202.113.65.229:443 http://iframe.ip138.com/ic.asp | iconv -f gb2312
curl --socks5 202.113.65.229:443 http://iframe.ip138.com/ic.asp | iconv -f gb2312
Proxy server addresses can be obtained from the 爬虫代理 (crawler proxy) site.
3. Handling cookies with curl
Receiving cookies:
curl -c /tmp/cookies http://www.baidu.com    # save cookies to the file /tmp/cookies
Sending cookies:
curl -b "key1=val1;key2=val2;" http://www.baidu.com    # send cookies given as text
curl -b /tmp/cookies http://www.baidu.com    # read cookies from a file
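To make the link between the two forms of -b concrete, here is a minimal sketch that reads a cookie jar of the kind -c writes (the tab-separated Netscape format) and prints it as the name=value text form. The jar contents, the example.com domain, and the variable names are made up for illustration; a real jar would come from a curl -c run.

```shell
# A hypothetical cookie jar in the Netscape format written by `curl -c`.
# Fields are tab-separated: domain, subdomain flag, path, secure flag,
# expiry (Unix time), name, value.
jar=$(mktemp)
printf '# Netscape HTTP Cookie File\n' > "$jar"
printf '.example.com\tTRUE\t/\tFALSE\t2147483647\tkey1\tval1\n' >> "$jar"
printf '#HttpOnly_.example.com\tTRUE\t/\tFALSE\t0\tkey2\tval2\n' >> "$jar"

# Turn the jar into "name=value; name=value", the text form `curl -b` accepts.
cookie_string=$(awk -F '\t' '
  /^#HttpOnly_/ { sub(/^#HttpOnly_/, "") }   # HttpOnly cookies carry this prefix
  /^#/ || NF < 7 { next }                    # skip comments and blank lines
  { printf "%s%s=%s", sep, $6, $7; sep = "; " }
' "$jar")
rm -f "$jar"

printf '%s\n' "$cookie_string"   # prints: key1=val1; key2=val2
```

Passing the jar file itself to -b is usually simpler; the conversion is shown only to clarify what the file contains.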
4. Sending data with curl:
Submitting data with GET:
curl -G -d "name=value&name2=value2" http://www.baidu.com
Submitting data with POST:
curl -d "name=value&name2=value2" http://www.baidu.com    # POST data
curl -d "a=b&c=d" --data-urlencode "txt@/tmp/txt" http://www.baidu.com    # POST the contents of /tmp/txt as the field txt
Uploading a file as a form:
curl -F file=@/tmp/me.txt http://www.aiezu.com
This is equivalent to setting the two attributes method="POST" and enctype='multipart/form-data' on an HTML form.
5. Handling HTTP headers with curl:
Setting HTTP request headers:
curl -A "Mozilla/5.0 Firefox/21.0" http://www.baidu.com    # set the User-Agent request header
curl -e "http://pachong.org/" http://www.baidu.com    # set the Referer request header
curl -H "Connection: keep-alive" -H "User-Agent: Mozilla/5.0" http://www.aiezu.com    # one -H per custom header
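Each -H option carries exactly one header line, so several custom headers need several -H options; a "\n" inside a single -H value does not split it into two headers. The dry-run sketch below shows one way to build that argument list. The function name curl_with_headers is hypothetical, and the function only echoes the command rather than running it.

```shell
# Hypothetical helper: takes a URL followed by any number of header lines
# and prints the curl command with a separate -H for each header.
curl_with_headers() {
  url=$1
  shift
  for header in "$@"; do
    shift                      # drop the raw header text from the front ...
    set -- "$@" -H "$header"   # ... and append it again behind its own -H
  done
  # echo is used so the sketch runs offline; note that echo drops the quoting.
  echo curl "$@" "$url"
}

curl_with_headers http://www.aiezu.com "Connection: keep-alive" "User-Agent: Mozilla/5.0"
# prints: curl -H Connection: keep-alive -H User-Agent: Mozilla/5.0 http://www.aiezu.com
```

Replacing the echo with a plain curl "$@" "$url" would send the request with each header quoted correctly.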
Handling HTTP response headers:
curl -I http://www.aiezu.com    # return only the headers
curl -D /tmp/header http://www.aiezu.com    # save the HTTP headers to the file /tmp/header
6. Authentication with curl:
curl -u aiezu:password http://www.aiezu.com    # username and password authentication
curl -E mycert.pem https://www.baidu.com    # client certificate authentication
7. Other:
curl -# http://www.baidu.com    # show progress as a bar of "#" characters
curl -o /tmp/aiezu http://www.baidu.com    # save the HTTP response to /tmp/aiezu
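I. Viewing page source
Put a URL directly after the curl command to see the page source. We use www.baidu.com as the example (a site chosen mainly because its page source is short).
curl www.baidu.com
To save the page, use the -o option; this is equivalent to using the wget command.
curl -o [filename] www.baidu.com
II. Automatic redirects
Some URLs redirect automatically. With the -L option, curl follows the redirect to the new URL.
curl -L "http://item.taobao.com/item.htm?id=25823396605"
Entering the command above redirects automatically to http://detail.tmall.com/item.htm?id=25823396605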
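When -L is combined with -I, curl prints one header block per hop, which makes the redirect chain easy to summarize. The sketch below is an offline illustration: the sample text stands in for captured curl -s -I -L output (two header blocks abridged from this article's taobao example), and the variable names are assumptions.

```shell
# Sample `curl -s -I -L` output for two hops. Real output uses CRLF line
# endings, reproduced here with \r\n.
headers=$(printf '%s\r\n' \
  'HTTP/1.1 301 Moved Permanently' \
  'location: http://detail.tmall.com/item.htm?id=25823396605&' \
  '' \
  'HTTP/1.1 302 Found' \
  'Location: http://detail.tmall.com/item.htm?id=25823396605&&tbpm=1' \
  '')

# Print "status -> target" for every hop that carries a Location header.
hops=$(printf '%s\n' "$headers" | tr -d '\r' | awk '
  /^HTTP\// { status = $2 }                               # a status line starts a new hop
  tolower($1) == "location:" { print status " -> " $2 }   # header names are case-insensitive
')

printf '%s\n' "$hops"
```

Against a live site, the headers variable would instead be filled with headers=$(curl -s -I -L "$url").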
III. Showing header information
The -i option shows the HTTP response headers together with the page source.
(The -I option shows only the HTTP response headers.)
curl -i www.baidu.com
HTTP/1.1 200 OK
Date: Fri, 28 Feb 2014 05:39:57 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: Keep-Alive
Vary: Accept-Encoding
Set-Cookie: BAIDUID=0F251A658E427EBB7CBEB0C3F4A70FAE:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: H_PS_PSSID=4104_5231_1445_5139_5225_5378_5368_4261_4760_5400; path=/; domain=.baidu.com
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Expires: Fri, 28 Feb 2014 05:39:45 GMT
Cache-Control: private
Server: BWS/1.1
BDPAGETYPE: 1
BDQID: 0xc3b306dca955703d
BDUSERID: 0

<!DOCTYPE html>.....
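Because -i output is simply the header block, a blank line, and then the body, a script can split the two apart at the first blank line. The following is a minimal sketch under that assumption; the shortened response text and the variable names are illustrative, not captured output.

```shell
# A shortened stand-in for `curl -s -i` output: CRLF-terminated headers,
# a blank line, then the body.
response=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: BWS/1.1\r\n\r\n<!DOCTYPE html>\n')

# Strip carriage returns so the blank separator line is truly empty.
clean=$(printf '%s\n' "$response" | tr -d '\r')

status_code=$(printf '%s\n' "$clean" | awk 'NR == 1 { print $2 }')
header_block=$(printf '%s\n' "$clean" | sed '/^$/q')    # up to the first blank line
body=$(printf '%s\n' "$clean" | sed '1,/^$/d')          # everything after it

printf 'status: %s\n' "$status_code"
printf 'body:   %s\n' "$body"
```

If only the status code is needed, curl's own -w '%{http_code}' write-out variable is a more direct route than parsing.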
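IV. Showing the communication process
The -v option shows the whole of an HTTP exchange, including the port connection and the HTTP request headers.
Command: curl -v www.baidu.com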
* About to connect() to www.baidu.com port 80
* Trying 115.239.210.26... connected
* Connected to www.baidu.com (115.239.210.26) port 80
> GET / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: www.baidu.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 28 Feb 2014 05:42:37 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: Keep-Alive
< Vary: Accept-Encoding
< Set-Cookie: BAIDUID=442AD49501EF253AE71F2BAF3E0181FB:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
< Set-Cookie: BDSVRTM=0; path=/
< Set-Cookie: H_PS_PSSID=5228_1461_5187_5138_5225_5379_5368_4261_4760_5401_5286; path=/; domain=.baidu.com
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Expires: Fri, 28 Feb 2014 05:41:43 GMT
< Cache-Control: private
< Server: BWS/1.1
< BDPAGETYPE: 1
< BDQID: 0x906950d16fb1e95d
< BDUSERID: 0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Spee<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv
="X-UA-Compatible" content="IE=Edge"><link rel="dns-
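If the information above is still not enough, the following commands show the exchange in more detail:
curl --trace output.txt www.baidu.com
or
curl --trace-ascii output.txt www.baidu.com
After the command runs, open output.txt to view the result.
There is also a heavier-duty option, viewing the page redirect process:
curl -L -I --trace log "http://item.taobao.com/item.htm?id=25823396605"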
HTTP/1.1 301 Moved Permanently
Server: Tengine
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html;charset=GBK
Content-Length: 0
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Content-Language: zh-CN
Accept-Ranges: bytes
X-Varnish: 1206961665
Via: 1.1 varnish
location: http://detail.tmall.com/item.htm?id=25823396605&
X-Cache: MISS
Cache-Control: max-age=3
HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d25823396605%26%26tbpm%3d1
Cache-Control:
HTTP/1.1 302 Found
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=ktbzEwzFR6qy;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=6c6bc65b9e9a5159cff5b3d0cae4dfd9;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=d768c73859b40e10ef81f7abd0824704;domain=.taobao.com;Expires=Thu, 29-May-2014 06:16:01 GMT;Path=/
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Location: http://pass.tmall.com/add?_tb_token_=ktbzEwzFR6qy&cookie2=6c6bc65b9e9a5159cff5b3d0cae4dfd9&t=d768c73859b40e10ef81f7abd0824704&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d25823396605%26%26tbpm%3d1&pacc=A_12nKe89qHIcAyauBtovg==&opi=110.75.118.230&tmsc=1393568161483632
HTTP/1.1 302 Found
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Set-Cookie: _tb_token_=ktbzEwzFR6qy;domain=.tmall.com;Path=/
Set-Cookie: cookie2=6c6bc65b9e9a5159cff5b3d0cae4dfd9;domain=.tmall.com;Path=/
Set-Cookie: t=d768c73859b40e10ef81f7abd0824704;domain=.tmall.com;Path=/
Location: http://detail.tmall.com/item.htm?id=25823396605&&tbpm=1
HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d25823396605%26%26tbpm%3d2
Cache-Control:
HTTP/1.1 302 Found
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=GgU93fEjKGT4;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=ef3c440d74ff391de6b560da4ef8a5c9;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=187a71d8df58caac2c4e08d40245c31f;domain=.taobao.com;Expires=Thu, 29-May-2014 06:16:01 GMT;Path=/
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Location: http://pass.tmall.com/add?_tb_token_=GgU93fEjKGT4&cookie2=ef3c440d74ff391de6b560da4ef8a5c9&t=187a71d8df58caac2c4e08d40245c31f&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d25823396605%26%26tbpm%3d2&pacc=Vo8f-VlYEYPJ6WE3iTX96Q==&opi=110.75.118.230&tmsc=1393568161501736
HTTP/1.1 302 Found
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
Set-Cookie: _tb_token_=GgU93fEjKGT4;domain=.tmall.com;Path=/
Set-Cookie: cookie2=ef3c440d74ff391de6b560da4ef8a5c9;domain=.tmall.com;Path=/
Set-Cookie: t=187a71d8df58caac2c4e08d40245c31f;domain=.tmall.com;Path=/
Location: http://detail.tmall.com/item.htm?id=25823396605&&tbpm=2
HTTP/1.1 302 Found
Server: Tengine
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: keep-alive
at_bucketid: sbucket_-1
X-Bucket-Id: -1
Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d25823396605%26%26tbpm%3d3
Cache-Control:
HTTP/1.1 302 Found
Date: Fri, 28 Feb 2014 06:16:01 GMT
Content-Type: text/html
Content-Length: 260
Connection: close
Set-Cookie: _tb_token_=Uta86PoG6cWC;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: cookie2=cd06dba05f2bf1124200861d0b8a151b;domain=.taobao.com;Path=/;HttpOnly
Set-Cookie: t=618388c77af