
web crawling (plus10) scrapy 4

Published: 2023-09-06 01:16 | Editor: 傅花花 | Keywords: none


E:\m\f1>cd ..\

E:\m>scrapy startproject qsauto
New Scrapy project 'qsauto', using template directory 'd:\\users\\administrator\\appdata\\local\\programs\\python\\python36-32\\lib\\site-packages\\scrapy\\templates\\project', created in:
    E:\m\qsauto

You can start your first spider with:
    cd qsauto
    scrapy genspider example example.com

E:\m>cd qsauto/

E:\m\qsauto>scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

E:\m\qsauto>scrapy genspider -t crawl weisuen qiushibaike.com
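The -t crawl template generates a CrawlSpider subclass, which discovers further pages through its rules. In weisuen.py the only rule is Rule(LinkExtractor(allow='article'), ...). As a rough, simplified model (not Scrapy's own code), the allow filter behaves like the plain-Python sketch below: every href on a page is resolved to an absolute URL and kept if the regex matches anywhere in it. The function name filter_links and the sample URLs are hypothetical; the real LinkExtractor also applies deny patterns, restrict_xpaths and a default filter on file extensions, and allowed_domains is enforced separately by Scrapy's offsite filtering.

```python
import re
from urllib.parse import urljoin


def filter_links(page_url, hrefs, allow):
    """Hypothetical, simplified model of LinkExtractor(allow=...):
    resolve each href against the page URL and keep it if any allow
    regex matches anywhere in the absolute URL (re.search semantics).
    Duplicates are dropped and first-seen order is kept."""
    if isinstance(allow, str):
        allow = [allow]
    patterns = [re.compile(p) for p in allow]
    seen = set()
    kept = []
    for href in hrefs:
        url = urljoin(page_url, href)
        if url not in seen and any(p.search(url) for p in patterns):
            seen.add(url)
            kept.append(url)
    return kept


# Hypothetical hrefs found on the start page.
hrefs = ["/article/119", "/users/42/", "/article/119", "/hot/"]
print(filter_links("http://www.qiushibaike.com/", hrefs, "article"))
# ['http://www.qiushibaike.com/article/119']
```

With follow=True, each kept page is passed to parse_item and is also scanned with the same rule, which is how the spider moves from one article page to the next.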

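qsauto/spiders/weisuen.py, generated from the crawl template and then edited (it imports QsautoItem from qsauto/items.py, so that item class must declare content and link fields with scrapy.Field()):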

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request
from qsauto.items import QsautoItem


class WeisuenSpider(CrawlSpider):
    name = 'weisuen'
    allowed_domains = ['qiushibaike.com']
    # start_urls is not used: start_requests() below sends the first
    # request itself so that it can carry a browser User-Agent header.
    # start_urls = ['http://www.qiushibaike.com/']

    # Follow every link whose URL matches the regex 'article' and pass each
    # fetched page to parse_item(); follow=True keeps extracting links from
    # those pages as well.
    rules = (
        Rule(LinkExtractor(allow='article'), callback='parse_item', follow=True),
    )

    def start_requests(self):
        # Send a browser-like User-Agent instead of Scrapy's default one.
        ua = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0'}
        # No callback: the response goes to CrawlSpider's default parse(),
        # which applies the rules above.
        yield Request('http://www.qiushibaike.com/', headers=ua)

    def parse_item(self, response):
        i = QsautoItem()
        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = response.xpath('//div[@id="name"]').extract()
        #i['description'] = response.xpath('//div[@id="description"]').extract()
        # Text nodes directly inside every <div class="content"> on the page.
        i["content"] = response.xpath("//div[@class='content']/text()").extract()
        # href of every <a class="contentHerf"> (class name copied from the page markup).
        i["link"] = response.xpath('//a[@class="contentHerf"]/@href').extract()
        print(i["content"])
        print(i["link"])
        print("")
        return i
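parse_item relies on two XPath expressions. //div[@class='content']/text() returns only the text nodes that are direct children of each matching div (text nested in child tags is skipped), and [@class='...'] compares the whole attribute string, so class="content main" would not match. The sketch below reproduces that behaviour with the standard library's ElementTree on a small, well-formed, hypothetical snippet; extract_fields, direct_text_nodes and the sample markup are made up for illustration, and real pages usually need an HTML-tolerant parser such as lxml, which Scrapy's selectors are built on.

```python
import xml.etree.ElementTree as ET


def direct_text_nodes(elem):
    """Text nodes that are direct children of elem, like XPath text():
    the text before the first child plus the tail after each child."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        if child.tail:
            nodes.append(child.tail)
    return nodes


def extract_fields(markup):
    """Hypothetical helper mirroring the two XPaths used in parse_item()."""
    root = ET.fromstring(markup)
    content = []
    # Equivalent of //div[@class='content']/text(): exact class match.
    for div in root.iter("div"):
        if div.get("class") == "content":
            content.extend(direct_text_nodes(div))
    # Equivalent of //a[@class="contentHerf"]/@href (anchors without href are skipped).
    links = [a.get("href") for a in root.iter("a")
             if a.get("class") == "contentHerf" and a.get("href") is not None]
    return content, links


# Hypothetical markup, not copied from the real site.
sample = (
    '<html><body>'
    '<a class="contentHerf" href="/article/1">'
    '<div class="content">first line<br/>second line</div></a>'
    '<div class="content main">not matched</div>'
    '</body></html>'
)
print(extract_fields(sample))
# (['first line', 'second line'], ['/article/1'])
```

If a site adds extra classes, XPath's contains(@class, 'content') is the usual way to loosen the match, at the cost of also matching names such as contents.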



Original article: http://www.cnblogs.com/rabbittail/p/7637343.html

