
JS analysis: 汽_车_之_家 (Autohome) JS-generated CSS pseudo-element hs_kw44_configUS::before

Published: 2023-09-06 01:53 | Editor: 熊小新 | Keywords: js

0. References

https://developer.mozilla.org/en-US/docs/Web/API/Window/getComputedStyle

0.1 Use with pseudo-elements

getComputedStyle can pull style info from pseudo-elements (for example, ::after, ::before, ::marker, ::line-marker; see the spec).

<style>
  h3::after { content: ' rocks!'; }
</style>
<h3>generated content</h3>
<script>
  var h3 = document.querySelector('h3');
  var result = getComputedStyle(h3, ':after').content;
  console.log('the generated content is: ', result); // returns ' rocks!'
</script>

0.2 Manually setting pseudo-element info

  

0.3 In theory, the content of a pseudo-element should be readable directly from JS... yet here the content of ::before cannot be read this way.
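When getComputedStyle does return something, the content value comes back serialized as CSS: a string keeps its quotes ("商" rather than 商), and an element without generated content reports a keyword such as none. The helper below is a minimal sketch for cleaning such values in Python, assuming they were fetched through Selenium, e.g. with dr.execute_script("return getComputedStyle(arguments[0], '::before').content;", element), where element is a WebElement for one of the spans. The name css_content_to_text is hypothetical, and it only handles a single quoted string, not counters, attr() or multi-part values.

```python
import re


def css_content_to_text(value):
    """Convert a computed CSS `content` value into plain text (hypothetical helper)."""
    if value is None:
        return ''
    value = value.strip()
    # Elements without generated content report a keyword, not a string.
    if value in ('', 'none', 'normal'):
        return ''
    if len(value) >= 2 and value[0] == value[-1] and value[0] in '"\'':
        # Drop the surrounding quotes and undo simple backslash escapes
        # such as \" or \\ (hex escapes like \5546 are NOT handled).
        return re.sub(r'\\(.)', r'\1', value[1:-1])
    # Anything else (counters, attr(), several parts) is returned unchanged.
    return value


print(css_content_to_text('"商"'))
print(css_content_to_text('none') == '')
```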

1. Initial scraping

1.1 Page without JS loaded vs. with JS loaded

1.2 With requests, only one table is visible

In [1]: import requests
   ...: from scrapy import Selector
   ...:
   ...: url = 'https://car.autohome.com.cn/config/series/3170.html'
   ...: s = requests.Session()
   ...: s.verify = False
   ...: s.headers = {
   ...:     'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
   ...: }
   ...:
   ...: r = s.get(url)
   ...: sel = Selector(text=r.text)
   ...:
   ...: sel.css('table')
   ...:
Out[1]: [<Selector xpath='descendant-or-self::table' data='<table class="t1 txtCen" width="100%">\n '>]

1.3 With selenium, all tables are visible

In [3]: from scrapy import Selector
   ...: from selenium import webdriver
   ...:
   ...: url = 'https://car.autohome.com.cn/config/series/3170.html'
   ...: dr = webdriver.Chrome()
   ...: dr.get(url)
   ...: sel = Selector(text=dr.page_source)
   ...: sel.css('table')
   ...:

DevTools listening on ws://127.0.0.1:12968/devtools/browser/2e9adb31-7510-421f-ac13-835350af144e
Out[3]:
[<Selector xpath='descendant-or-self::table' data='<table class="t1 txtCen" width="100%">\n '>,
 <Selector xpath='descendant-or-self::table' data='<table cellspacing="0" cellpadding="0" c'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_side" cellspacing="0" cel'>,
 <Selector xpath='descendant-or-self::table' data='<table cellspacing="0" cellpadding="0" c'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_0" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_1" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_2" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_3" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_4" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_5" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_100" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_101" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_102" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_103" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_104" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_105" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_106" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_107" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_108" cellspacing="0" cell'>]
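The difference between 1.2 and 1.3 boils down to how many <table> elements each HTML string contains. A stdlib-only sketch of that check with html.parser, so it runs without scrapy; the two HTML strings are made-up miniatures standing in for r.text and dr.page_source, and TableCounter and table_ids are hypothetical names.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Collect the id attribute (or None) of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value).
        if tag == 'table':
            self.table_ids.append(dict(attrs).get('id'))


def table_ids(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.table_ids


# Made-up miniatures of the static and the JS-rendered page.
static_html = '<table class="t1 txtCen"></table><div></div>'
rendered_html = (
    '<table class="t1 txtCen"></table>'
    '<table id="tab_0"><tr><th>基本参数</th></tr></table>'
    '<table id="tab_1"></table>'
)
print(len(table_ids(static_html)), len(table_ids(rendered_html)))
print(table_ids(rendered_html))
```

On the real page, comparing table_ids(r.text) with table_ids(dr.page_source) would be expected to show the same gap as above, provided the site still renders its tables the way it did when this was written.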

1.4 Extracting the table information

In [4]: for t in sel.css('table#tab_0'):
   ...:     print('@'*10)
   ...:     for row in t.xpath('.//tr'):
   ...:         print('-'*100)
   ...:         for col in row.xpath('.//th|.//td'):
   ...:             print(''.join(col.xpath('.//node()/text()').extract()), end='\t')
   ...:         print()
   ...:
@@@@@@@@@@
----------------------------------------------------------------------------------------------------
基本参数
----------------------------------------------------------------------------------------------------
厂	-	-	-	-	-	-	-	-	-	-	-	-
----------------------------------------------------------------------------------------------------
级别
----------------------------------------------------------------------------------------------------
能源类型	汽油	汽油	汽油	汽油	汽油	汽油	汽油	汽油	汽油	汽油	汽油	汽油
----------------------------------------------------------------------------------------------------
上市	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10	2017.10
----------------------------------------------------------------------------------------------------
(kW)	110	110	110	110	140	140	110	110	110	110	140	140
----------------------------------------------------------------------------------------------------
(N·m)	250	250	250	250	320	320	250	250	250	250	320	320
----------------------------------------------------------------------------------------------------
发动机	1.4T 150马力 L4	1.4T 150马力 L4	1.4T 150马力 L4	1.4T 150马力 L4	2.0T 190马力 L4	2.0T 190马力 L4	1.4T 150马力 L4	1.4T 150马力 L4	1.4T 150马力 L4	1.4T 150马力 L4	2.0T 190马力 L4	2.0T 190马力 L4
----------------------------------------------------------------------------------------------------
变速箱	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合	7挡双离合
----------------------------------------------------------------------------------------------------
长*宽*高(mm)	4312*1785*1426	4321*1785*1426	4312*1785*1426	4321*1785*1426	4312*1785*1426	4321*1785*1426	4457*1796*1417	4462*1796*1417	4457*1796*1417	4462*1796*1417	4457*1796*1417	4462*1796*1417
----------------------------------------------------------------------------------------------------
车身结构	5门5座两厢车	5门5座两厢车	5门5座两厢车	5门5座两厢车	5门5座两厢车	5门5座两厢车	4门5座三厢车	4门5座三厢车	4门5座三厢车	4门5座三厢车	4门5座三厢车	4门5座三厢车
----------------------------------------------------------------------------------------------------
最高车速(km/h)	215	215	215	215	230	230	215	215	215	215	230	230
----------------------------------------------------------------------------------------------------
官方0-100km/h加速(s)	8.4	8.4	8.4	8.4	7.4	7.4	8.4	8.4	8.4	8.4	7.4	7.4
----------------------------------------------------------------------------------------------------
0-100km/h加速(s)	-	-	-	-	-	-	-	-	-	-	-	-
----------------------------------------------------------------------------------------------------
100-0km/h制动(m)	-	-	-	-	-	-	-	-	-	-	-	-
----------------------------------------------------------------------------------------------------
工信部(L/100km)	5.7	5.7	5.7	5.7	6.2	6.2	5.6	5.6	5.6	5.6	6.1	6.1
----------------------------------------------------------------------------------------------------
(L/100km)	-	-	-	-	-	-	-	-	-	-	-	-
----------------------------------------------------------------------------------------------------
整车	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里	三10公里
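For reference, the same row/cell walk can be written with the standard library alone. This is a sketch, not the author's code: TableGrid and table_rows are hypothetical names, nested tables are not handled, and the sample HTML is made up to mimic the symptom above (the span that should supply 商 is empty). One difference worth knowing: the XPath .//node()/text() only picks text nodes whose parent is a descendant of the cell, so text sitting directly inside a <th>/<td> is skipped, whereas this parser collects every text node inside the cell.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Turn the <tr>/<th>/<td> structure of one table into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows: list of lists of cell strings
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('th', 'td'):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            if not self.rows:          # cell without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up fragment: the empty span is where the pseudo-element text would go.
sample = (
    '<table id="tab_0">'
    '<tr><th><div>厂<span class="hs_kw22_configyk"></span></div></th>'
    '<td><div>-</div></td></tr>'
    '<tr><th><div>能源类型</div></th><td><div>汽油</div></td></tr>'
    '</table>'
)
for row in table_rows(sample):
    print('\t'.join(row))
```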

1.5 The comparison shows that the extracted table information is incomplete

2. Problem analysis

2.1 DevTools shows '商' as the content property of a CSS pseudo-element

2.2 After a hard refresh in a normal browser, no CSS file contains the matching rules

2.3 Inspection reveals a pattern in the class names: hs_kw<number>_configyk

.hs_kw22_configyk::before {
    content: "商";
}

.hs_kw44_configyk::before {
    content: "一汽";
}
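Given the rule text, the pattern in 2.3 maps straight to a lookup table. A minimal sketch, assuming the rules have been copied out as text (for instance from DevTools as above) and always have the shape .hs_kw<number>_<suffix>::before { content: "..."; }; RULE_RE and parse_rules are hypothetical names.

```python
import re

# Matches rules shaped like the ones above:
#   .hs_kw22_configyk::before { content: "商"; }
RULE_RE = re.compile(
    r'\.hs_kw(\d+)_(\w+)::before\s*\{\s*content:\s*"([^"]*)";?\s*\}'
)


def parse_rules(css_text):
    """Map (index, suffix) -> generated text for every matching rule."""
    return {
        (int(index), suffix): text
        for index, suffix, text in RULE_RE.findall(css_text)
    }


css = '''
.hs_kw22_configyk::before {
    content: "商";
}

.hs_kw44_configyk::before {
    content: "一汽";
}
'''
print(parse_rules(css))
```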

2.4 The Selector finds several different class-name suffixes, and none of the spans contains any text to extract

In [19]: sel = Selector(text=dr.page_source)

In [20]: sel.css('span[class*="hs_kw"]').extract()
Out[20]:
['<span class="hs_kw34_configyk"></span>',
 '<span class="hs_kw66_configyk"></span>',
 ...
 '<span class="hs_kw42_optionLA"></span>',]
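Each placeholder span carries only an index and a suffix. Grouping the extracted class names by suffix shows how many separate rule sets are in play; judging by the full code in section 3, each suffix belongs to its own generating script. A sketch, assuming one class per span as in Out[20]; the three sample strings are taken from that output, and group_placeholders is a hypothetical name.

```python
import re
from collections import defaultdict

# class="hs_kw<index>_<suffix>" with a single class on the span
CLASS_RE = re.compile(r'class="hs_kw(\d+)_(\w+)"')


def group_placeholders(span_html_list):
    """Group placeholder span indexes by class-name suffix."""
    groups = defaultdict(list)
    for html in span_html_list:
        m = CLASS_RE.search(html)
        if m:
            groups[m.group(2)].append(int(m.group(1)))
    return dict(groups)


spans = [
    '<span class="hs_kw34_configyk"></span>',
    '<span class="hs_kw66_configyk"></span>',
    '<span class="hs_kw42_optionLA"></span>',
]
print(group_placeholders(spans))
# {'configyk': [34, 66], 'optionLA': [42]}
```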

2.5 A global search in a normal browser for hs_kw, hs_kw22_configyk and configyk turns up only one related file: 3170.html

2.6 In the Sources panel, pretty-print 3170.html, copy it into Notepad++ and search for the keywords to locate the JS code

Keep searching.
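The keyword search in 2.5 and 2.6 can also be done on the saved page source. The sketch below lists inline <script> blocks that mention '.hs_kw' and reads the suffix out of an expression shaped like '.hs_kw' + $index$ + '_baikeYK', the same shape the full code in section 3 relies on. The sample page is a made-up one-liner modeled on the comment in that code, and find_hs_kw_scripts is a hypothetical name.

```python
import re

# Inline <script> blocks (case-insensitive, may span lines).
SCRIPT_RE = re.compile(r'<script[^>]*>(.*?)</script>', re.S | re.I)
# '.hs_kw' + <index expression> + '<suffix>'
SUFFIX_RE = re.compile(r"'\.hs_kw'\s*\+\s*[^+]+?\s*\+\s*'(\w+)'")


def find_hs_kw_scripts(html):
    """Return (suffix or None, script body) for each script mentioning '.hs_kw'."""
    found = []
    for body in SCRIPT_RE.findall(html):
        if "'.hs_kw'" not in body:
            continue
        m = SUFFIX_RE.search(body)
        found.append((m.group(1) if m else None, body))
    return found


# Made-up page modeled on the script shape quoted in section 3.
page = ("<script>var x = 1;</script>"
        "<script>(function(qZ_){var Dp_=function(Dp__){"
        "return '.hs_kw' + $index$ + '_baikeYK';};})(document);</script>")
for suffix, body in find_hs_kw_scripts(page):
    print(suffix, body[:20])
```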

2.7 Copy one of the code fragments into the Sources panel as a snippet, strip the surrounding <script> </script> tags, and set breakpoints

2.8 Run the snippet; some suspicious variables show up

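2.9 Use the keywords to locate the relevant function (with some luck), add log statements, and rerun the snippet to print the intermediate variables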
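Step 2.9 is what the full code in section 3 automates: rather than logging, it inserts a push right before the rule-inserting call and turns the self-invoking function into one that returns the collected (index, text) pairs. Below is a simplified, standalone sketch of that rewrite on a made-up one-line script. On the real page the call reads $InsertRule$($index$, $temp$);, which is what pattern_code matches; here the hypothetical names insertRule, i and t keep the toy readable, and the regex assumes exactly one such call.

```python
import re

# Made-up, one-line stand-in for the obfuscated generator script.
TOY_JS = ("(function(doc){var insertRule=function(i,t){};"
          "for(var i=0;i<2;i++){var t=['商','一汽'][i];insertRule(i,t);}})(document);")

CAPTURE_RE = re.compile(
    r"^\(function\((.*?)\)\{"   # group 1: parameter list of the IIFE
    r"(.*?)"                    # group 2: code before the intercepted call
    r"(insertRule\(i,\s*t\);)"  # group 3: the call to intercept
    r"(.*)"                     # group 4: code after the call
    r"\}\)\(document\);$"       # closing brace plus the (document) invocation
)

# Turn the IIFE into a named function that records each (index, text) pair
# right before the original call runs, and return the collected pairs.
CAPTURE_REPL = (
    r"return collect(document);"
    r"function collect(\1){var captured=[];"
    r"\2captured.push([i,t]);\3\4"
    r"return captured;}"
)


def add_capture(js):
    """Rewrite the IIFE so that it returns every (index, text) pair."""
    new_js, count = CAPTURE_RE.subn(CAPTURE_REPL, js)
    if count != 1:
        raise ValueError('call site not found')
    return new_js


print(add_capture(TOY_JS))
```

In a Selenium session, with the pattern adapted to the real call, dict(dr.execute_script(add_capture(js))) would then yield the index-to-text mapping, mirroring what the full code does with pattern_code and repl_code.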

3 Full code

import re
from html import unescape

from scrapy import Selector
from selenium import webdriver

# Shape of the target script (names are obfuscated on the real page):
# <script>
#     (function(qZ_) {
#         var Dp_ = function(Dp__) {
#         ...
#         return '.hs_kw' + $index$ + '_baikeYK';
#         ...
#     }
#     )(document);
# </script>

# Extract the target scripts
pattern_script = re.compile(r"""
    <script>
    (                           # (): all js code
        \(function.*?
        '\.hs_kw'.*?'(.*?)'     # (.*?): postfix _baikeYK in '.hs_kw' + $index$ + '_baikeYK'
        .*?
    )
    </script>""", re.X)  # add re.S if a script spans several lines

# Rewrite the target script: insert the push statement
#     myarr.push([$index$, $temp$]);
# right before the call
#     $InsertRule$($index$, $temp$);
pattern_code = re.compile(r"""
    ^\(function\((.*?)\)    # (.*?): function argument
    \{
        (.*?)               # (.*?): top half of the js code inside {}
        (\$InsertRule\$\(\$index\$,\s*\$temp\$\);)   # (...): the InsertRule call
        (.*?)               # (.*?): bottom half of the js code inside {}
    \}
    \)\(document\);$        # discarded
""", re.X)
repl_code = r"""return myfunc(document);
function myfunc(\1){
    myarr = Array();
    \2
    myarr.push([$index$, $temp$]);
    \3
    \4
    return myarr;
}"""

# JS template: write the returned text into the spans with the matching class name
myjs = """var arr = document.querySelectorAll('.hs_kw%(key)s%(postfix)s');
if(arr.length !== 0){
    for (var i=0; i<arr.length; i++){
        console.log('.hs_kw%(key)s%(postfix)s %(value)s');
        arr[i].innerHTML = '%(value)s' + arr[i].innerHTML;
    }
}
return null;"""

url = 'https://car.autohome.com.cn/config/series/3170.html'
dr = webdriver.Chrome()
dr.get(url)
# dr.refresh()

# Extract every matching <script>...</script> block and its class-name suffix
script_list = pattern_script.findall(dr.page_source)
for (js, postfix) in script_list:
    print(postfix)
    js = unescape(js)  # &amp; -> &
    js = pattern_code.sub(repl_code, js)
    ret = dr.execute_script(js)
    if not ret:  # pattern_code did not match, so nothing was collected
        continue
    mydict = dict(ret)
    print(mydict)
    # Use the returned myarr pairs to fill the text into the spans with the matching class name
    for (key, value) in mydict.items():
        js = myjs % {'postfix': postfix, 'key': key, 'value': value}
        # print(js)
        dr.execute_script(js)

sel = Selector(text=dr.page_source)
# sel.css('span[class*="hs_kw"]').extract()
# for t in sel.css('table#tab_0'):  # first table only
for t in sel.css('table'):
    print('@'*10)
    for row in t.xpath('.//tr'):
        print('-'*100)
        for col in row.xpath('.//th|.//td'):
            print(''.join(col.xpath('.//node()/text()').extract()), end='\t')
        print()
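The full code writes the text back through the live browser via innerHTML. If only a saved page_source and the returned index-to-text mapping are at hand, the same result can be approximated offline by filling the empty spans with a regex. A sketch, assuming the spans are empty and carry a single class exactly as in Out[20]; fill_placeholders is a hypothetical name, and indexes missing from the mapping are left untouched.

```python
import re
from html import escape

# Groups: 1 = opening tag, 2 = index, 3 = suffix (with leading underscore),
# 4 = closing tag. Only matches empty spans with a single class.
SPAN_RE = re.compile(r'(<span class="hs_kw(\d+)(_\w+)">)(</span>)')


def fill_placeholders(html, suffix, mapping):
    """Put mapping[index] inside every empty hs_kw<index><suffix> span."""
    def repl(m):
        index, span_suffix = int(m.group(2)), m.group(3)
        if span_suffix != suffix or index not in mapping:
            return m.group(0)  # other suffix or unknown index: keep as-is
        return m.group(1) + escape(str(mapping[index])) + m.group(4)

    return SPAN_RE.sub(repl, html)


# Made-up fragment in the style of the spans from Out[20].
page = ('<th><div>厂<span class="hs_kw22_configyk"></span></div></th>'
        '<td><span class="hs_kw42_optionLA"></span></td>')
print(fill_placeholders(page, '_configyk', {22: '商'}))
```

Calling it once per (js, postfix) pair with dict(ret) as the mapping mirrors the inner loop of the full code; the resulting HTML can then go into Selector(text=...) as before.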

4 Results

5. Easter egg (is the front end really this blunt?!)


Original article: https://www.cnblogs.com/my8100/p/js_qichezhijia.html
