
What? PHP can write web crawlers too?

Published: 2023-09-06 01:42 | Editor: 郭大石 | Keywords: crawler

PHP crawler code (crawling my own OJ problem set as an example)

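<?php
// Pass 1: probe problem IDs upward from 1000 and record each problem's heading.
$title = [];
for ($i = 1000; ; $i++) {
    $url = "http://localhost/JudgeOnline/problem.php?pid=$i"; // put your OJ address here
    $info = file_get_contents($url);
    // Get the page title; no title means the problem does not exist, so stop here
    if ($info === false || !preg_match('|<title>(.*?)</title>|i', $info, $m) || !$m[1]) {
        break;
    }
    // Get the problem heading, e.g. "1000 : A+B问题"
    preg_match('|<h1 class="text-center">(.*?)</h1>|i', $info, $m);
    $title[$i][1] = $m[1] ?? '';
}
echo "A total of ";
echo $pnum = $i - 1000; // total number of problems
echo " problems<br>";
?>
<?php
// Pass 2: download every problem page again and save it as <problem id>.html
for ($i = 1000; $i <= (999 + $pnum); $i++) {
    $fh = file_get_contents("http://localhost/JudgeOnline/problem.php?pid=$i");
    // echo $fh;
    echo "Get P$i ";
    echo '"';
    echo $title[$i][1];
    echo '"<br>';
    $myfile = fopen("$i.html", "w"); // mode "w" truncates an existing file, so no unlink() is needed
    fwrite($myfile, $fh);
    fclose($myfile);
}
?>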
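The two preg_match calls above both capture whatever sits between a fixed opening and closing tag. As a minimal sketch of that step on its own, the helper below returns the heading text, or null when the tag is missing, so the caller never reads an undefined $m[1]. The name extract_problem_heading is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: pull the problem heading out of a page's HTML.
// Returns null when no <h1 class="text-center"> element is present.
function extract_problem_heading(string $html): ?string
{
    // '|' is the regex delimiter, so '/' needs no escaping.
    // (.*?) is lazy and stops at the first closing tag;
    // 'i' ignores case and 's' lets '.' match newlines.
    if (preg_match('|<h1 class="text-center">(.*?)</h1>|is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Sample markup shaped like the OJ page shown in this post.
$sample = '<div class="container"><h1 class="text-center">1000 : A+B问题</h1></div>';
var_dump(extract_problem_heading($sample));              // the heading string
var_dump(extract_problem_heading('<p>no heading</p>'));  // NULL
```

Returning null rather than an empty string lets the caller tell "no heading found" apart from "heading is empty".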

Output when the script is run in the browser:

A total of 21 problems
Get P1000 "1000 : A+B问题"
Get P1001 "1001 : 求累加和"
Get P1002 "1002 : n的阶乘"
Get P1003 "1003 : 阶乘和"
Get P1004 "1004 : 第k小整数"
Get P1005 "1005 : 求a/b的高精度值"
Get P1006 "1006 : 麦森数mason"
Get P1007 "1007 : 旅行"
Get P1008 "1008 : 团伙(team)"
Get P1009 "1009 : 打击犯罪"
Get P1010 "1010 : 家谱(gen)"
Get P1011 "1011 : 搭配购买"
Get P1012 "1012 : 合并果子"
Get P1013 "1013 : 编辑距离"
Get P1014 "1014 : 奖学金"
Get P1015 "1015 : 过河卒"
Get P1016 "1016 : Hello World"
Get P1017 "1017 : 计算器大法好……"
Get P1018 "1018 : 测试题目"
Get P1019 "1019 : 度熊的全1串"
Get P1020 "1020 : 快速排序"
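The script requests every problem page twice: once to count the problems and once to save them. A single-pass variant might look like the sketch below. It makes several assumptions: a nonexistent problem page has no <h1 class="text-center"> heading, the function name crawl_problems and its parameters are hypothetical, and the 5-second timeout is an arbitrary choice. The fetcher is passed in as a callable so the loop can be tried without a running OJ.

```php
<?php
// Sketch: crawl consecutive problem IDs in one pass.
// $fetch maps a URL to its HTML, or returns false on failure.
function crawl_problems(callable $fetch, string $baseUrl, int $firstId, string $outDir, int $maxPages = 10000): array
{
    $titles = [];
    for ($id = $firstId; $id < $firstId + $maxPages; $id++) {
        $html = $fetch($baseUrl . '?pid=' . $id);
        // Stop at the first page that is missing or has no problem heading.
        if ($html === false
            || preg_match('|<h1 class="text-center">(.*?)</h1>|is', $html, $m) !== 1) {
            break;
        }
        $titles[$id] = trim($m[1]);
        // Save the page we already hold instead of requesting it again.
        file_put_contents("$outDir/$id.html", $html);
    }
    return $titles;
}

// A fetcher built on file_get_contents with a request timeout.
$context = stream_context_create(['http' => ['timeout' => 5]]);
$fetch = function (string $url) use ($context) {
    return @file_get_contents($url, false, $context);
};

// Uncomment to run against the local OJ address used in this post:
// $titles = crawl_problems($fetch, 'http://localhost/JudgeOnline/problem.php', 1000, __DIR__);
// echo 'A total of ', count($titles), ' problems<br>';
```

The $maxPages cap keeps the loop bounded if the server happens to return a heading for every ID.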

Contents of the resulting 1000.html:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<!-- SEO -->
<meta name="description" content="MasterOJ is an online judge system for ACM/ICPC">
<meta name="keywords" content="OJ,Online Judge,MasterOJ,ACM,ICPC">
<!-- Icons -->
<link rel="icon" href="./sitefiles/favicon.ico">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<meta name="msapplication-TileColor" content="#FEF2E6">
<meta name="msapplication-TileImage" content="./sitefiles/favicon.png">
<!-- Bootstrap CSS -->
<link rel="stylesheet" href="./sitefiles/css/bootstrap.min.css">
<link rel="stylesheet" href="./sitefiles/css/prettify.css" type="text/css">
<link rel="stylesheet" href="./sitefiles/css/font-awesome.min.css" type="text/css">
<link rel="stylesheet" href="./sitefiles/css/nprogress.css">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="./sitefiles/js/html5shiv.js"></script>
<script src="./sitefiles/js/respond.min.js"></script>
<![endif]-->
<!--[if lt IE 7]>
<link rel="stylesheet" href="./sitefiles/css/font-awesome-ie7.css" type="text/css">
<![endif]-->
<link rel="stylesheet" href="./sitefiles/css/bearkidframe.css" type="text/css">
<!-- javascripts -->
<script src="./sitefiles/js/jquery.min.js"></script>
<script src="./sitefiles/js/bootstrap.min.js"></script>
<script src="./sitefiles/js/prettify.js"></script>
<script src="./sitefiles/js/nprogress.js"></script>
<title>题目描述 - MasterOJ</title>
</head>
<body>
<nav class="navbar navbar-default">
<div class="container">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"><span class="sr-only">Toggle navigation</span><span class="icon-bar"></span><span class="icon-bar"></span><span class="icon-bar"></span></button>
<a class="navbar-brand" href="index.html">MasterOJ</a>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="hindex.php">首页</a></li>
<li><a href="problemset.php">题库</a></li>
<li><a href="status.php">状态</a></li>
<li><a href="ranklist.php">排名</a></li>
<li><a href="contestlist.php">比赛</a></li>
<!--<li><a href="document.php">Document</a></li>-->
<li><a href="discuss.php">论坛</a></li>
<!--<li><a href="#" onclick="javascript:alert('该功能未开发!')">资源下载</a></li>-->
<li><a href="./download/">资源下载</a></li>
<li><a href="game/">Games</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="./registerpage.php"><i class="fa fa-user-plus"></i> 注册</a></li>
<li><a href="./loginpage.php"><i class="fa fa-sign-in"></i> 登陆</a></li>
</ul>
</div><!-- /.navbar-collapse -->
</div><!-- /.container-fluid -->
</nav>
<div class="container">
<h1 class="text-center">1000 : A+B问题</h1>
<p class="text-center">时间限制:<span class="label label-primary">1 Sec</span>内存限制:<span class="label label-primary">256 MiB</span><br/>提交:<span class="label label-info">8</span>答案正确:<span class="label label-success">5</span></p>
<p class="text-center"><a id="oj-p-submit" class="btn btn-primary" href="./problemsubmit.php?pid=1000" role="button">提交</a><a class="btn btn-primary" href="./problemstatistics.php?pid=1000" role="button">状态</a><a class="btn btn-primary" href="./discuss.php?pid=1000" role="button">论坛</a><!--<a class="btn btn-primary" href="" role="button">题解</a>--></p>
<h3><a data-toggle="collapse" data-target="#problemDesc">题目描述</a></h3>
<div class="collapse in" id="problemDesc" aria-expanded="true"><pre><p>计算A+B的值(A,B,A+B<=2147483647)</p></pre></div>
<h3><a data-toggle="collapse" data-target="#problemInput">输入</a></h3>
<div class="collapse in" id="problemInput" aria-expanded="true"><pre><p>两个整数A和B<br></p></pre></div>
<h3><a data-toggle="collapse" data-target="#problemOut">输出</a></h3>
<div class="collapse in" id="problemOut" aria-expanded="true"><pre><p>输出A+B</p></pre></div>
<h3 id="bl-p-datain"><a data-toggle="collapse" data-target="#dataIn">样例输入</a></h3>
<div class="collapse in" id="dataIn" aria-expanded="true"><div class="zero-clipboard"><span id="bl-p-copy" class="btn-clipboard" onclick="copyToClipboard(document.getElementById('dataInContent').innerHTML);">复制</span></div><pre id="dataInContent">1 2</pre></div>
<h3><a data-toggle="collapse" data-target="#dataOut">样例输出</a></h3>
<div class="collapse in" id="dataOut" aria-expanded="true"><div class="zero-clipboard"><span class="btn-clipboard" onclick="copyToClipboard(document.getElementById('dataOutContent').innerHTML);">复制</span></div><pre id="dataOutContent">3</pre></div>
<h3><a data-toggle="collapse" data-target="#problemHint">提示</a></h3>
<div class="collapse" id="problemHint" aria-expanded="true"><pre></pre></div>
<!--<h3><a data-toggle="collapse" data-target="#problemTag">标签</a></h3><div class="collapse" id="problemTag" aria-expanded="true"><div class="well"><span class="label label-default"><span></div></div>-->
<h3><a data-toggle="collapse" data-target="#problemSrc">标签</a></h3>
<div class="collapse" id="problemSrc" aria-expanded="true"><pre>初级</pre></div>
</div><!--main wrapper end-->
<footer class="footer">
<div class="container">
<p style="float: left;" align="left"><span id="clock">服务器时间: Loading...</span><br/><a class="bl-footer-link" href="document.php">FAQ</a> | <a class="bl-footer-link" href="document.php?f=rule">EULA</a><!--|<i class="fa fa-code"></i><i class="fa fa-download"></i><i class="fa fa-github"></i><i class="fa fa-money"></i><i class="fa fa-book"></i><i class="fa fa-lock"></i><i class="fa fa-qq"></i><i class="fa fa-weixin"></i><i class="fa fa-facebook"></i><i class="fa fa-check"></i><i class="fa fa-circle"></i><i class="fa fa-circle-o"></i><i class="fa fa-clock-o"></i><i class="fa fa-user"></i><i class="fa fa-inbox"></i><i class="fa fa-tags"></i><i class="fa fa-cogs"></i><i class="fa fa-sign-out"></i><i class="fa fa-history"></i><i class="fa fa-edit"></i><i class="fa fa-search"></i><i class="fa fa-laptop"></i><i class="fa fa-paper-plane"></i><i class="fa fa-paper-plane-o"></i><i class="fa fa-flag"></i><i class="fa fa-heart"></i></a>-->
<p style="float: right; margin-right: 15px;" class="hidden-xs" align="right">Copyright © 1999~2017 <a class='bl-footer-link' href='hindex.php'>MasterOJ</a>.<br/>All Rights reserved</p>
</div>
</footer>
<script>
var delta=new Date("2018/02/12 22:56:52").getTime()-new Date().getTime();
function clock() {
    var h,m,s,finalText,week,year,mon,day;
    var realTime = new Date(new Date().getTime() + delta);
    year = realTime.getYear() + 1900;
    if (year > 3000) year-=1900;
    mon = realTime.getMonth()+1;
    day = realTime.getDate();
    week = realTime.getDay();
    h=realTime.getHours();
    m=realTime.getMinutes();
    s=realTime.getSeconds();
    finalText="服务器时间: "+year+"/"+mon+"/"+day+" "+(h>=10?h:"0"+h)+":"+(m>=10?m:"0"+m)+":"+(s>=10?s:"0"+s);
    document.getElementById('clock').innerHTML=finalText;
    setTimeout("clock()", 1000);
}
clock();
</script>
<script type="text/javascript">
function copyToClipboard(s){
    //alert(s);
    if(window.clipboardData){
        window.clipboardData.setData("Text",s);
        alert("已经复制到剪切板!");
    }else if(navigator.userAgent.indexOf("Opera") != -1) {
        window.location = s;
    }else if(window.netscape) {
        try {
            netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
        } catch (e) {
            alert("被浏览器拒绝!\n请在浏览器地址栏输入'about:config'并回车\n然后将'signed.applets.codebase_principal_support'设置为'true'");
        }
        var clip = Components.classes['@mozilla.org/widget/clipboard;1'].createInstance(Components.interfaces.nsIClipboard);
        if (!clip)
            return;
        var trans = Components.classes['@mozilla.org/widget/transferable;1'].createInstance(Components.interfaces.nsITransferable);
        if (!trans)
            return;
        trans.addDataFlavor('text/unicode');
        var str = new Object();
        var len = new Object();
        var str = Components.classes["@mozilla.org/supports-string;1"].createInstance(Components.interfaces.nsISupportsString);
        var copytext = s;
        str.data = copytext;
        trans.setTransferData("text/unicode",str,copytext.length*2);
        var clipid = Components.interfaces.nsIClipboard;
        if (!clip)
            return false;
        clip.setData(trans,null,clipid.kGlobalClipboard);
        alert("已经复制到剪切板!");
    }
}
$(window).load(function(){
    prettyPrint();
})
</script>
</body>
</html>
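Once a page is saved, its individual fields can be read with PHP's DOM extension instead of regular expressions. The sketch below is one possible approach, not something the original script does: parse_problem_page is a hypothetical name, and the XPath queries assume the h1 class and the dataInContent / dataOutContent ids seen in the saved page.

```php
<?php
// Sketch: read fields from a saved problem page with DOM and XPath.
function parse_problem_page(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML encoding hint asks libxml to decode the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Return the trimmed text of the first node matching $query, or null.
    $text = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
    };

    return [
        'heading'       => $text('//h1[@class="text-center"]'),
        'sample_input'  => $text('//pre[@id="dataInContent"]'),
        'sample_output' => $text('//pre[@id="dataOutContent"]'),
    ];
}

// Uncomment once 1000.html has been saved by the crawler:
// print_r(parse_problem_page(file_get_contents('1000.html')));
```

Unlike the regex, the XPath queries do not depend on attribute order or on the tag sitting on a single line.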

So PHP can be used to write crawlers too.


Original article: https://www.cnblogs.com/yemaster/p/8445800.html
