Using Splash with the Scrapy framework to render JS: environment setup

Published: 2023-09-06 01:14 | Editor: 蔡小小 | Keywords: js, configuration

Environment setup:

Splash installation guide: http://splash.readthedocs.io/en/stable/install.html

pip install scrapy-splash
docker pull scrapinghub/splash
docker run -p 8050:8050 scrapinghub/splash
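
Once the container is running, a quick way to confirm that Splash answers on port 8050 is to call its render.html HTTP endpoint directly. The sketch below is only an illustration that uses the Python standard library; the helper names build_render_url and fetch_rendered_html are hypothetical, and it assumes Splash is reachable at http://localhost:8050 as in the docker run command above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash was started with "docker run -p 8050:8050 scrapinghub/splash"
SPLASH_URL = "http://localhost:8050"


def build_render_url(splash_url, target_url, wait=0.5):
    """Build a render.html URL asking Splash to load target_url (hypothetical helper)."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


def fetch_rendered_html(target_url, wait=0.5, timeout=10):
    """Return the HTML of target_url after Splash has rendered it (hypothetical helper)."""
    with urlopen(build_render_url(SPLASH_URL, target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        print(fetch_rendered_html("http://example.com")[:200])
    except OSError as exc:
        # URLError and connection errors are OSError subclasses
        print("Splash is not reachable:", exc)
```

If the request fails, the container is probably not running or the port mapping differs from the one assumed here.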
----

settings.py

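# Address of the Splash instance started with docker run
SPLASH_URL = 'http://localhost:8050'

# Enable the Splash downloader middlewares and move HttpCompressionMiddleware to priority 810
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that deduplicates repeated Splash arguments
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# Splash-aware HTTP cache storage (relevant when the Scrapy HTTP cache is enabled)
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

Spider example (a separate file from settings.py):

import scrapy
from scrapy_splash import SplashRequest


class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical name; Scrapy requires every spider to have one
    start_urls = ["http://example.com", "http://example.com/foo"]

    def start_requests(self):
        for url in self.start_urls:
            # Render each page through Splash, waiting 0.5 s so the JS can run
            yield SplashRequest(url, self.parse, args={'wait': 0.5})

    def parse(self, response):
        # response.body is a result of render.html call; it
        # contains HTML processed by a browser.
        # ...
        pass
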
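
The priorities in DOWNLOADER_MIDDLEWARES are easy to get wrong when copying them by hand. In Scrapy's default settings HttpProxyMiddleware sits at priority 750, so the two Splash middlewares (723 and 725) need to stay below that value, and HttpCompressionMiddleware is moved to 810. The following is a minimal sketch of a sanity check for those numbers; check_splash_order is a hypothetical helper and not part of scrapy-splash.

```python
# Priority of HttpProxyMiddleware in Scrapy's default DOWNLOADER_MIDDLEWARES_BASE
HTTP_PROXY_PRIORITY = 750

COMPRESSION = "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware"
SPLASH_MIDDLEWARES = (
    "scrapy_splash.SplashCookiesMiddleware",
    "scrapy_splash.SplashMiddleware",
)

# Same values as in settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    COMPRESSION: 810,
}


def check_splash_order(middlewares):
    """Return a list of problems found in the priorities (hypothetical helper)."""
    problems = []
    for name in SPLASH_MIDDLEWARES:
        priority = middlewares.get(name)
        if priority is None:
            problems.append(f"{name} is not enabled")
        elif priority >= HTTP_PROXY_PRIORITY:
            problems.append(f"{name} must be below {HTTP_PROXY_PRIORITY}, got {priority}")
    if middlewares.get(COMPRESSION) != 810:
        problems.append(f"{COMPRESSION} should be set to 810")
    return problems


if __name__ == "__main__":
    # An empty list means the priorities match the values used in this article
    print(check_splash_order(DOWNLOADER_MIDDLEWARES))
```

A check like this could run in a unit test so that a later edit to settings.py does not silently reorder the middlewares.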
References:
https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
http://blog.csdn.net/qq_23849183/article/details/51287935
http://ae.yyuap.com/pages/viewpage.action?pageId=919763

Original article: http://www.cnblogs.com/fh-fendou/p/7612119.html
