1. Fetch the title and URL of every post on the cnblogs homepage, and fetch the friend links
2. Implementation:
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is arbitrary
public class CnblogsCrawler {

    public static void main(String[] args) throws Exception {
        // Create the HttpClient instance; try-with-resources closes it automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Create the HttpGet instance
            HttpGet httpGet = new HttpGet("http://www.cnblogs.com");
            // Send a browser User-Agent so the site serves its normal page
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
            // execute() never returns null; the response is closed by try-with-resources
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
                HttpEntity entity = response.getEntity();
                String content = EntityUtils.toString(entity, "UTF-8"); // Get the page content
                Document document = Jsoup.parse(content); // Parse the page into a Document

                // 1. Use a selector to find every post title and its link
                Elements ele = document.select("#post_list .post_item .post_item_body h3 a");
                for (Element e : ele) {
                    System.out.println("Post title: " + e.text() + " --- Post URL: " + e.attr("href"));
                }

                // 2. Get the friend links; first() returns null if #friend_link is not on the page
                Element linkEle = document.select("#friend_link").first();
                if (linkEle != null) {
                    System.out.println("Friend links (plain text): " + linkEle.text());
                    System.out.println("Friend links (HTML): " + linkEle.html());
                }
            }
        }
    }
}
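To try the same selectors without requesting the live site (whose markup may have changed since this post was written), you can parse a static HTML string instead. This is a minimal sketch that assumes jsoup is on the classpath. The HTML snippet is hypothetical and only mimics the structure the selector above expects, and the class and method names are arbitrary. It also shows that `attr("href")` returns the raw attribute value, while `absUrl("href")` resolves it against the base URI passed to `Jsoup.parse`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class OfflineSelectorDemo {

    // Hypothetical markup shaped like the structure the selector expects;
    // it is NOT a copy of the real cnblogs homepage.
    static final String SAMPLE_HTML =
          "<div id='post_list'>"
        + "<div class='post_item'><div class='post_item_body'>"
        + "<h3><a href='/p/1.html'>First post</a></h3></div></div>"
        + "<div class='post_item'><div class='post_item_body'>"
        + "<h3><a href='/p/2.html'>Second post</a></h3></div></div>"
        + "</div>"
        + "<div id='friend_link'><a href='https://example.com'>Example</a></div>";

    // Returns "title|absolute URL" for every post link matched by the selector
    static List<String> postLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element a : doc.select("#post_list .post_item .post_item_body h3 a")) {
            // attr("href") would give "/p/1.html"; absUrl resolves it against baseUri
            result.add(a.text() + "|" + a.absUrl("href"));
        }
        return result;
    }

    // Returns the plain text of #friend_link, or null when the element is missing
    static String friendLinkText(String html) {
        Element el = Jsoup.parse(html).select("#friend_link").first();
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        for (String s : postLinks(SAMPLE_HTML, "https://www.cnblogs.com/")) {
            System.out.println(s);
        }
        System.out.println(friendLinkText(SAMPLE_HTML));
    }
}
```

Parsing a fixed string keeps the selector logic testable offline; once it behaves as expected, the same `select` calls can be pointed at the downloaded page content.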
3. Jsoup learning resources
Open-source blog system - Jsoup
Jsoup (4) -- Getting DOM element attribute values with Jsoup
Original post: http://www.cnblogs.com/xbq8080/p/7534877.html