urllib.request--urllib2

Published: 2023-09-06 01:19; Editor: 郭大石; Keyword: url

1 Basic usage:

================ Using the urllib2 library (Python 2)
================ In Python 3 the module is urllib.request; usage is otherwise largely the same
import urllib.request

headers = {'User-Agent': '......'}
request = urllib.request.Request('http://www.baidu.com', headers=headers)  # Request object
response = urllib.request.urlopen(request)  # response object

html = response.read()   # response body as bytes; a second read() returns b''
response.getcode()       # HTTP status code
response.geturl()        # URL that was actually retrieved
response.info()          # response headers
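The read-once behaviour of the response object can be tried without a network connection, because urllib.request also opens data: URLs. The sketch below is only an illustration: the data: URL and the name demo_url are assumptions standing in for a real page and are not part of the original post.

```python
import urllib.request

# Hypothetical data: URL used in place of a real site, so no network is needed.
demo_url = 'data:text/plain;charset=utf-8,hello'

response = urllib.request.urlopen(demo_url)
first = response.read()     # the whole body, as bytes
second = response.read()    # the stream is already consumed at this point
text = first.decode('utf-8')  # bytes -> str

print(first, second, text)
print(response.geturl())    # the URL that was opened
```

For this reason it is usually safer to store the result of read() in a variable, as the code above does with html, rather than calling read() again.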

2 Adding request headers:

import urllib.request
import random

url = 'http://www.baidu.com'
# Each entry is the header value only; the header name is passed to add_header()
UA_list = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
]
User_agent = random.choice(UA_list)  # pick a random User-Agent for each request

request = urllib.request.Request(url)
request.add_header('User-Agent', User_agent)  # add the User-Agent request header
print(request.get_header('User-agent'))       # read it back; note the key is 'User-agent'
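The lookup key 'User-agent' in the code above is deliberate. In CPython, Request.add_header() stores the header name after passing it through str.capitalize(), while get_header() and has_header() look the name up as given. A minimal sketch of that behaviour follows; the names ua_pool and req and the two shortened User-Agent strings are illustrative assumptions, and no request is actually sent.

```python
import random
import urllib.request

# Shortened sample values, for illustration only.
ua_pool = [
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11',
]

req = urllib.request.Request('http://www.baidu.com')  # building a Request sends nothing
req.add_header('User-Agent', random.choice(ua_pool))

stored = req.get_header('User-agent')   # key as stored: only the first letter is upper-case
missed = req.get_header('User-Agent')   # this spelling is not found, so None is returned

print(stored)
print(missed)
print(req.header_items())               # all headers currently set on the request
```

When a header seems to be missing, header_items() shows the names exactly as they are stored.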

3 URL encoding:

import urllib.request
import urllib.parse

url = 'https://www.baidu.com/s?'
keyword = {'kd': '哈哈哈'}  # query parameters as a dict, i.e. 'kd=哈哈哈'
print(urllib.parse.urlencode(keyword))

# urllib.parse.quote(string)   percent-encodes a single string
# urllib.parse.urlencode(dic)  encodes a dict into a 'key=value&key=value' query string
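To make the difference between the two functions concrete, here is a small sketch that encodes the same keyword both ways and appends the result to the base URL. The variable names base_url, params, query, single and full_url are chosen for illustration only.

```python
from urllib.parse import quote, unquote, urlencode

base_url = 'https://www.baidu.com/s?'
params = {'kd': '哈哈哈'}

query = urlencode(params)      # dict -> 'key=value' pairs, values percent-encoded as UTF-8
single = quote('哈哈哈')        # one string -> percent-encoded UTF-8 bytes
full_url = base_url + query    # the URL that would be passed to Request / urlopen

print(query)            # kd=%E5%93%88%E5%93%88%E5%93%88
print(single)           # %E5%93%88%E5%93%88%E5%93%88
print(full_url)
print(unquote(single))  # decoding gives back the original text
```

urlencode() is the convenient choice for a whole query string, while quote() fits when only one path segment or value needs encoding.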

4 POST request -- static:

The page URL does not change.

import urllib.parse
import urllib.request

# post_url (the address the form is posted to) and keyword (the text to submit)
# must be defined before this runs.

# Full request headers
h = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
}
post_dic = {
    "i": keyword,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "FY_BY_REALTIME",
    "typoResult": "true",
}
post_data = urllib.parse.urlencode(post_dic).encode('utf-8')  # the POST body must be bytes
request = urllib.request.Request(post_url, data=post_data, headers=h)
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

5 POST request -- dynamic (AJAX loading):

# Crawlers: where the data comes from
# For pages loaded via AJAX, the data source is almost always JSON.

# Once you have the JSON, you have the page's data.

POST data
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
The data parameter of urlopen() defaults to None. When data is not None, urlopen() sends the request as a POST.
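The same rule can be checked on a Request object without sending anything, since Request.get_method() reports which HTTP method would be used. This is a minimal sketch; the example.com URL and the names demo_url, get_req, post_req and body are placeholders chosen for illustration.

```python
from urllib import parse, request

demo_url = 'http://www.example.com/'   # placeholder URL; nothing is sent in this sketch

# urlencode() returns str, so it must be encoded to bytes before use as data
body = parse.urlencode({'start': '0', 'limit': '20'}).encode('utf-8')

get_req = request.Request(demo_url)               # no data -> GET
post_req = request.Request(demo_url, data=body)   # data given -> POST

print(get_req.get_method())
print(post_req.get_method())
print(post_req.data)
```

Passing either object to urlopen() would then issue the corresponding GET or POST.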
from urllib import parse, request

url = r'https://movie.douban.com/j/chart/top_list?type=11&interval_id=100%3A90&action='
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}

# ------ values obtained with a packet-capture tool ------
post_data = {
    # "type": "11",
    # "interval_id": "100:90",
    # "action": "",
    "start": "0",
    "limit": "20",
}
post_data = parse.urlencode(post_data).encode('utf-8')  # POST data must be bytes
my_request = request.Request(url, data=post_data, headers=headers)
my_response = request.urlopen(my_request)
print(my_response.read().decode('utf-8'))  # decode the bytes into a str
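Once the response body has been decoded to text, the standard json module turns it into Python objects. The sketch below uses a made-up payload, because the real response format is not shown in the original post. The field names title and score, the sample values, and the variable names are all hypothetical.

```python
import json

# Hypothetical response body: the real fields returned by the site may differ.
sample_body = b'[{"title": "Example Movie", "score": "9.0"}, {"title": "Another Movie", "score": "8.5"}]'

items = json.loads(sample_body.decode('utf-8'))   # JSON array -> list of dicts

titles = [item['title'] for item in items]
for item in items:
    print(item['title'], item['score'])
```

With a real response, sample_body would be replaced by the bytes returned from my_response.read().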

Original article: http://www.cnblogs.com/big-handsome-guy/p/7710406.html
