Web Crawling (Part 7): Scrapy Commands


Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
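
These commands are global: they work both inside and outside a project. For example, settings can print a single value; run outside a project it returns the built-in default (scrapybot is Scrapy's default BOT_NAME):

E:\m>scrapy settings --get BOT_NAME
scrapybot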

scrapy fetch [options] <url>

Fetch a URL using the Scrapy downloader and print its content to stdout. You
may want to use --nolog to disable logging.

Options
=======
--help, -h              show this help message and exit
--spider=SPIDER         use this spider
--headers               print response HTTP headers instead of body
--no-redirect           do not handle HTTP 3xx status codes and print response
                        as-is

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure
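
Combining the options above, a typical invocation prints only the response headers with logging suppressed (example.com is a placeholder URL):

E:\m>scrapy fetch --nolog --headers http://example.com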

runspider

scrapy runspider <spider_file.py>

Run a spider contained in a single Python file, without creating a project.
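
A minimal self-contained file that runspider can execute directly; the file name, spider name, and the quotes.toscrape.com URL are all just illustrative (.get() assumes Scrapy >= 1.8; older versions use .extract_first()):

# myspider.py -- a stand-alone spider, no project required
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # yield the page title as a one-field item
        yield {"title": response.css("title::text").get()}

Run it with scrapy runspider myspider.py -o out.json; the -o flag writes the yielded items to a feed file.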

shell

scrapy shell <url> --nolog

Opens an interactive console (an IPython In [1]: prompt if IPython is installed) with the response to <url> already downloaded.
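
Inside the shell the fetched page is bound to response, so you can test selectors interactively; a quick illustrative session (the XPath is just an example, and .get() assumes Scrapy >= 1.8):

In [1]: response.status
Out[1]: 200

In [2]: response.xpath('//title/text()').get()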

scrapy startproject project_name
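
For a project named f1 (as used in the session below), startproject generates roughly this skeleton; the exact file list varies slightly by Scrapy version:

f1/
    scrapy.cfg            # deploy configuration file
    f1/                   # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/
            __init__.py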

scrapy version

scrapy view (download a page and open it in the browser, as Scrapy sees it; useful for spotting content injected by JavaScript that the downloader never executes)

e.g.: scrapy view <url>

Project commands (additionally available when run from inside a project directory):


Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
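
Of these, crawl is the day-to-day command; a typical run exports the scraped items to a feed file (the output file name is illustrative):

E:\m\f1>scrapy crawl spider -o items.json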


E:\m\f1>scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed


E:\m\f1>scrapy genspider -t basic spider baidu.com
Created spider 'spider' using template 'basic' in module:
  f1.spiders.spider
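
The basic template produces roughly the following in f1\spiders\spider.py; the details differ slightly across Scrapy versions:

import scrapy

class SpiderSpider(scrapy.Spider):
    name = 'spider'
    allowed_domains = ['baidu.com']
    start_urls = ['http://baidu.com/']

    def parse(self, response):
        pass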

E:\m\f1>scrapy check spider

----------------------------------------------------------------------
Ran 0 contracts in 0.000s

OK
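
check ran 0 contracts because the generated parse method contains none. Contracts are annotations in a callback's docstring that check fetches and verifies; a sketch using the standard @url and @returns contracts:

def parse(self, response):
    """This callback should yield at least one item.

    @url http://baidu.com/
    @returns items 1
    """
    yield {"title": response.css("title::text").get()}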

E:\m\f1>scrapy crawl spider

E:\m\f1>scrapy list
spider

E:\m\f1>scrapy edit spider   (opens the spider in the editor named by the EDITOR setting/environment variable; this works out of the box on Linux but usually needs configuration on Windows)

E:\m\f1>scrapy parse http://www.baidu.com
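
parse downloads the URL with the matching spider and prints the items and requests produced by the callback; the standard --spider and -c options pin down which spider and callback to use:

E:\m\f1>scrapy parse --spider=spider -c parse http://www.baidu.com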


Original article: http://www.cnblogs.com/rabbittail/p/7633241.html
