[Crawler] Scraping and analyzing historical draw data from a lottery website

Published: 2023-09-06 02:15 | Editor: Mr. Gu | Keywords: crawler

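As the title says.

With some time on my hands, I picked a lottery site at random and decided, on a whim, to pull down its historical draw results and analyze them, to see whether any pattern might help people who buy tickets.

First, I used the packet-capture tool Charles to inspect how the request for historical draw results is made.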

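At first glance the request takes only two parameters, but it also depends on a cookie, and that is the key part. Reading the site's JavaScript shows that a login endpoint is called to obtain the cookie, that is, the session ID. Once you have it, put it into the cookies of the historical-data request and the data comes back without trouble.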
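The cookie step can be sketched with the JDK's own `java.net.http` API (Java 11 or later). This is only a sketch under stated assumptions: the base URL `https://example.invalid/` is a placeholder because the real one is masked in this post, the class and method names are made up for illustration, and the idea that the login response's `message` field looks like `_OLID_=<token>` is inferred from the crawler source, not from any documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the original project.
class CookieSketch {
    // Cookie name taken from the crawler source in this post.
    static final String COOKIE_NAME = "87c1bf6e271f";

    // Assumption: the login "message" has the form "<key>=<token>".
    // Everything from '=' onward is reused as the cookie value.
    static String cookieFromLoginMessage(String message) {
        int eq = message.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("no '=' in login message: " + message);
        }
        return COOKIE_NAME + message.substring(eq);
    }

    // Builds (but does not send) the history request for one date.
    // baseUrl is assumed to end with '/'.
    static HttpRequest buildHistoryRequest(String baseUrl, String loginMessage, String date) {
        URI uri = URI.create(baseUrl + "member/dresult?lottery=PK10JSC&date=" + date);
        return HttpRequest.newBuilder(uri)
                .header("Cookie", cookieFromLoginMessage(loginMessage))
                .GET()
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which returns the HTML body as a string.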

The response is HTML rather than JSON, so I used the well-known jsoup library to parse it directly; its usage is easy to look up online.

Here is the source code:

package com.wsm.lottery.JSSC10;

import com.alibaba.fastjson.JSON;
import com.wsm.lottery.dao.LotteryJsscDAO;
import com.wsm.lottery.dao.LotteryJsscDAOImpl;
import com.wsm.lottery.dao.LotteryJsscDO;
import com.wsm.lottery.utils.DateUtils;
import com.wsm.lottery.utils.HttpUtils;
import com.wsm.lottery.model.JSSC10;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.*;

public class JSSC10Crawler {

    private static final String JSSC10Url = "**************";
    private static final LotteryJsscDAO lotteryDao = new LotteryJsscDAOImpl();

    public static void main(String[] args) throws Exception {
        String today = DateUtils.getCurrentDate();
        System.out.println(today);
        Date date = new Date();
        // Crawl one day at a time, from 20 days ago up to 6 days ago.
        int i = 20;
        while (i > 5) {
            Date newDate = DateUtils.addDay(date, -i);
            i--;
            String todayNew = DateUtils.dateToString(newDate);
            spiderDataIntoDB(todayNew);
        }
    }

    public static void spiderDataIntoDB(String today) throws Exception {
        // 1. Obtain the session ID
        String cashUrl = JSSC10Url + "/cashlogin";
        Map param = new HashMap();
        param.put("account", "!guest!");
        param.put("password", "!guest!");
        String sessionId = JSON.parseObject(HttpUtils.postForm(cashUrl, param)).getString("message");

        // 2. Obtain guest permissions by injecting the session. Testing showed this is not needed.
        // GET member/agreement?_OLID_=2ba5aebd150775549fa83a5cfba750297d887bd1 HTTP/1.1
        System.out.println(sessionId);
        // HttpUtils.getSessionId(JSSC10Crawler+"member/agreement?"+sessionId);

        // 3. Request the historical results
        // Cookie format: 87c1bf6e271f=<sessionId>
        Map headMap = new HashMap();
        headMap.put("Cookie", "87c1bf6e271f" + sessionId.substring(sessionId.indexOf("=")));
        String jsUrl = JSSC10Url + "member/dresult?lottery=PK10JSC&date=";
        String resHtml = HttpUtils.get(jsUrl + today, headMap);
        Document resultDocument = Jsoup.parse(resHtml);
        // #drawTable > table > tbody > tr:nth-child(1) > td.period
        // #drawTable > table > tbody > tr:nth-child(1) > td.drawTime
        // #drawTable > table > tbody > tr:nth-child(1) > td:nth-child(3) > span
        // #drawTable > table > tbody > tr:nth-child(1104) > td:nth-child(3) > span

        // 4. Parse the HTML, one table row per draw, until no more rows are found
        List<JSSC10> jssc10List = new ArrayList<>();
        int i = 1;
        while (true) {
            String trChild = "tr:nth-child(" + i + ")";
            Elements period = resultDocument.select("#drawTable")
                    .select("table").select("tbody").select(trChild).select("td.period");
            Elements drawTime = resultDocument.select("#drawTable")
                    .select("table").select("tbody").select(trChild).select("td.drawTime");
            if (period.isEmpty()) {
                break;
            }
            JSSC10 jssc10 = new JSSC10();
            jssc10.setPeriod(period.text());
            String time = drawTime.text();
            // The cell has no year, so it is hard-coded here: first 5 chars are MM-dd, last 8 are HH:mm:ss.
            time = "2018-" + time.substring(0, 5) + " " + time.substring(time.length() - 8);
            jssc10.setDrawTime(DateUtils.stringToDate(time, DateUtils.DATE_TIME_FORMAT));
            // Columns 3 to 12 hold the ten drawn numbers.
            List<Integer> ballNames = new ArrayList<>();
            for (int j = 3; j <= 12; j++) {
                String tdChild = "td:nth-child(" + j + ")";
                Elements ballName = resultDocument.select("#drawTable")
                        .select("table").select("tbody").select(trChild).select(tdChild).select("span");
                ballNames.add(Integer.valueOf(ballName.text()));
            }
            jssc10.setBallNames(ballNames);
            System.out.println(jssc10);
            jssc10List.add(jssc10);
            i++;
        }

        // Print the data for analysis
        System.out.println(jssc10List);

        // Insert into the DB
        for (JSSC10 jssc10 : jssc10List) {
            LotteryJsscDO lotteryJsscDO = new LotteryJsscDO();
            lotteryJsscDO.setCreatePin("siming.wang");
            lotteryJsscDO.setCreateTime(new Date());
            lotteryJsscDO.setPeriod(jssc10.getPeriod());
            lotteryJsscDO.setDrawTime(jssc10.getDrawTime());
            List<Integer> ballNames = jssc10.getBallNames();
            lotteryJsscDO.setBallOne(ballNames.get(0));
            lotteryJsscDO.setBallTwo(ballNames.get(1));
            lotteryJsscDO.setBallThree(ballNames.get(2));
            lotteryJsscDO.setBallFour(ballNames.get(3));
            lotteryJsscDO.setBallFive(ballNames.get(4));
            lotteryJsscDO.setBallSix(ballNames.get(5));
            lotteryJsscDO.setBallSeven(ballNames.get(6));
            lotteryJsscDO.setBallEight(ballNames.get(7));
            lotteryJsscDO.setBallNine(ballNames.get(8));
            lotteryJsscDO.setBallTen(ballNames.get(9));
            lotteryJsscDO.setYn("Y");
            // Insert only if this period is not stored yet (query once and reuse the result).
            Map paramDb = new HashMap();
            paramDb.put("period", lotteryJsscDO.getPeriod());
            List<LotteryJsscDO> lotteryJsscDOS = lotteryDao.selectListByMap(paramDb);
            if (lotteryJsscDOS.isEmpty()) {
                lotteryDao.insert(lotteryJsscDO);
            }
        }
    }
}
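The crawler prefixes every draw time with a hard-coded `"2018-"`, which silently produces wrong dates for any other year. A small sketch of one alternative is to pass the year in (for example, taken from the date being requested) and build the timestamp with `java.time`. The class and method names here are hypothetical, and the cell layout (first five characters `MM-dd`, last eight characters `HH:mm:ss`) is an assumption read off the `substring` calls in the source.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper, not part of the original project.
class DrawTimeSketch {
    // Assumes cellText starts with "MM-dd" and ends with "HH:mm:ss",
    // mirroring the substring(0, 5) / substring(length - 8) calls in the crawler.
    static LocalDateTime parseDrawTime(String cellText, int year) {
        String text = cellText.trim();
        String monthDay = text.substring(0, 5);
        String time = text.substring(text.length() - 8);
        // Both parse calls use the ISO formats yyyy-MM-dd and HH:mm:ss.
        LocalDate date = LocalDate.parse(year + "-" + monthDay);
        return LocalDateTime.of(date, LocalTime.parse(time));
    }

    public static void main(String[] args) {
        System.out.println(parseDrawTime("09-14 07:00:30", 2018)); // prints 2018-09-14T07:00:30
    }
}
```

Malformed input makes `LocalDate.parse` or `LocalTime.parse` throw a `DateTimeParseException`, so a bad row fails loudly instead of being stored with a wrong timestamp.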

The corresponding database table structure is posted here as well:

CREATE TABLE `lottery_jssc` (
  `sys_no` bigint(20) NOT NULL AUTO_INCREMENT,
  `period` varchar(125) DEFAULT NULL,
  `draw_time` datetime DEFAULT NULL,
  `ball_one` int(2) DEFAULT NULL,
  `ball_two` int(2) DEFAULT NULL,
  `ball_three` int(2) DEFAULT NULL,
  `ball_four` int(2) DEFAULT NULL,
  `ball_five` int(2) DEFAULT NULL,
  `ball_six` int(2) DEFAULT NULL,
  `ball_seven` int(2) DEFAULT NULL,
  `ball_eight` int(2) DEFAULT NULL,
  `ball_nine` int(2) DEFAULT NULL,
  `ball_ten` int(2) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `create_pin` varchar(20) DEFAULT NULL,
  `Yn` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`sys_no`),
  KEY `idx_period` (`period`),
  KEY `idx_draw_time` (`draw_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=103311 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

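Persistence uses MyBatis with Druid. The related configuration is committed to GitHub; the address is at the end of this post.

Finally, the analysis.

Here are the results directly, using one day as an example. The analysis code can be downloaded from GitHub.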

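---------- Analyzing data from 2018-09-14 07:00:30 to 2018-09-15 05:59:15 ----------
Analysis 1: when a pair appears, buy 13568 10:
  Period 1 purchase, wins: 331
  Period 2 purchase, wins: 113
  Period 3 purchase, wins: 52
  Period 4 purchase, wins: 9
  Period 5 purchase, wins: 4
  Period 6 purchase, wins: 1
Analysis 2: when pair 12359 appears, buy 68 10:
  Period 1 purchase, wins: 102
  Period 2 purchase, wins: 88
  Period 3 purchase, wins: 67
  Period 4 purchase, wins: 36
  Period 5 purchase, wins: 27
  Period 6 purchase, wins: 14
  Period 7 purchase, wins: 5
  Period 8 purchase, wins: 2
  Period 9 purchase, wins: 1
Analysis 3: when pair 46780 appears, buy 135:
  Period 1 purchase, wins: 87
  Period 2 purchase, wins: 79
  Period 3 purchase, wins: 52
  Period 4 purchase, wins: 24
  Period 5 purchase, wins: 20
  Period 6 purchase, wins: 14
  Period 7 purchase, wins: 4
  Period 8 purchase, wins: 4
  Period 9 purchase, wins: 3
  Period 10 purchase, wins: 1
--------------- 2018-09-14 data analysis finished! *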
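One plausible reading of the "period N" counts above is that each counts how many times the first win arrived on the Nth consecutive bet after a trigger. The actual analysis code lives in the GitHub repository and may work differently; the sketch below only illustrates that tallying idea. The class name, the method name, and the representation of bets as a list of hit/miss booleans are all assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the author's analysis code.
class StreakSketch {
    // hits.get(k) is true if the k-th bet in sequence won.
    // Returns: attempt number -> how many times the win came on that attempt.
    // The attempt counter resets after every win; a trailing run of losses is ignored.
    static Map<Integer, Integer> firstWinHistogram(List<Boolean> hits) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        int attempt = 0;
        for (boolean hit : hits) {
            attempt++;
            if (hit) {
                histogram.merge(attempt, 1, Integer::sum);
                attempt = 0;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        List<Boolean> hits = List.of(true, false, true, false, false, true, true);
        System.out.println(firstWinHistogram(hits)); // prints {1=2, 2=1, 3=1}
    }
}
```

A `TreeMap` keeps the attempts in ascending order, which matches the way the results above are listed from period 1 upward.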

GitHub: https://github.com/wangchaun/lottery-crawlers

Feel free to get in touch and discuss.

Original article: https://www.cnblogs.com/wangsiming/p/9657839.html
