
Study Notes: urllib

Published: 2023-09-06 01:54 | Editor: 彭小芳 | Keywords: url

Step 1: Basic requests

GET

```python
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
response = request.urlopen(url)             # returns an HTTP response object
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # HTTP status code
print(web_status, web_data)
```
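For a GET request, any parameters travel in the URL's query string. A minimal sketch, assuming you want to attach parameters to the address above: `parse.urlencode` turns a dict into a query string. The helper `build_get_url` and the `page`/`tag` parameters are hypothetical names for illustration only, not parameters the Sina page is known to accept.

```python
from urllib import parse


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    query = parse.urlencode(params)  # {'tag': 'tech news'} -> 'tag=tech+news'
    # Use '&' if the URL already has a query string, otherwise start one with '?'
    separator = '&' if parse.urlsplit(base_url).query else '?'
    return base_url + separator + query


# 'page' and 'tag' are made-up parameter names
url = build_get_url('http://news.sina.com.cn/guide/', {'page': 2, 'tag': 'tech news'})
print(url)  # http://news.sina.com.cn/guide/?page=2&tag=tech+news
```

The resulting string can be passed to `request.urlopen(url)` exactly as in the GET example.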

POST

```python
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request, parse

url = 'http://news.sina.com.cn/guide/'
# Form fields submitted in the POST body
data = [
    ('name', 'xiaoshubiao'),
    ('pwd', 'xiaoshubiao'),
]
login_data = parse.urlencode(data).encode('utf-8')  # the request body must be bytes
response = request.urlopen(url, data=login_data)    # passing data makes this a POST; returns an HTTP response object
web_data = response.read().decode('utf-8')          # response body
web_status = response.status                        # HTTP status code
print(web_status, web_data)
```
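urllib picks the HTTP method from whether a body is attached: with `data` it sends POST, without it GET. The sketch below checks this offline with `Request.get_method()`, reusing the form fields from the POST example; nothing is actually sent over the network.

```python
from urllib import parse, request

# Same placeholder form fields as the POST example
data = [('name', 'xiaoshubiao'), ('pwd', 'xiaoshubiao')]
login_data = parse.urlencode(data).encode('utf-8')  # str -> bytes for the body

req = request.Request('http://news.sina.com.cn/guide/', data=login_data)
print(req.get_method())  # POST, because a body is attached
print(login_data)        # b'name=xiaoshubiao&pwd=xiaoshubiao'

# The same URL without data would be sent as GET
print(request.Request('http://news.sina.com.cn/guide/').get_method())  # GET
```

When such a request is sent, urllib also adds a `Content-Type: application/x-www-form-urlencoded` header if you have not set one yourself.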

Step 2: Disguise the request as a browser

```python
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# Send browser-like headers instead of urllib's default "Python-urllib/3.x" User-Agent
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # HTTP status code
print(web_status, web_data)
```
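By default urllib identifies itself with a `Python-urllib/3.x` User-Agent, which some sites reject or treat differently; that is why the headers above are set. As a sketch, the same headers can also be passed as a dict when constructing the `Request`. Note that urllib stores header names via `str.capitalize()`, so they have to be looked up as `'User-agent'` rather than `'User-Agent'`.

```python
from urllib import request

# Header values copied from the Step 2 example
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}

# Passing a dict is equivalent to calling add_header() once per entry
req = request.Request('http://news.sina.com.cn/guide/', headers=headers)

# Names are normalized with str.capitalize(): 'User-Agent' is stored as 'User-agent'
print(req.get_header('User-agent'))
print(req.get_header('User-Agent'))  # None, because of the normalization
```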

Step 3: Use a proxy IP

```python
# -*- coding:utf-8 -*-
# Date: 2018/5/15 19:39
# Author: 小鼠标
from urllib import request

url = 'http://news.sina.com.cn/guide/'
req = request.Request(url)
# Use a proxy IP (this address comes from the original 2018 post and is probably no longer reachable)
proxy = request.ProxyHandler({'http': '221.207.29.185:80'})
opener = request.build_opener(proxy, request.HTTPHandler)
request.install_opener(opener)  # from now on, request.urlopen() goes through this opener
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
response = request.urlopen(req)
web_data = response.read().decode('utf-8')  # response body
web_status = response.status                # HTTP status code
print(web_status, web_data)
```
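`request.install_opener()` replaces the opener used by every later `request.urlopen()` call in the process. A sketch of an alternative, assuming you only want some requests to go through the proxy: keep the opener local and call `opener.open()` directly. The proxy address is the one from the original post and is very likely dead, so the actual network call is left commented out.

```python
from urllib import request

# Example proxy from the original 2018 post; very likely offline now.
# Only http:// URLs use it; add an 'https' key to proxy https:// URLs too.
proxies = {'http': '221.207.29.185:80'}

proxy_handler = request.ProxyHandler(proxies)
# build_opener() already includes HTTPHandler by default, so the proxy handler alone is enough
opener = request.build_opener(proxy_handler)
# Headers listed here are sent with every request made through this opener
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64)')]

# No install_opener(): only calls made via this opener use the proxy
# response = opener.open('http://news.sina.com.cn/guide/', timeout=10)

print(any(isinstance(h, request.ProxyHandler) for h in opener.handlers))  # True
```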

Step 4: Parse the content

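You can use the ready-made BeautifulSoup library, or match the content with regular expressions from the re module. Either way the idea is the same: extract the fields you need from the HTML string (web_data) fetched in the previous steps.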
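A minimal sketch of the regex approach, using a made-up HTML fragment in place of the real `web_data`. The links and the pattern are assumptions for illustration, not the actual structure of the Sina page; only the standard-library `re` module is used.

```python
import re

# Hypothetical HTML standing in for web_data from the earlier steps
web_data = '''
<ul>
  <li><a href="http://news.sina.com.cn/china/">China</a></li>
  <li><a href="http://news.sina.com.cn/world/">World</a></li>
</ul>
'''

# Non-greedy groups capture each link's href and its text
link_pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')

for href, text in link_pattern.findall(web_data):
    print(text, href)
```

A pattern like this breaks easily when attribute order, quoting, or whitespace changes. With BeautifulSoup the equivalent loop would iterate over `BeautifulSoup(web_data, 'html.parser').find_all('a')` and read `a.get('href')` and `a.get_text()`, which tolerates such variations.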


Original post: https://www.cnblogs.com/7749ha/p/9042861.html
