jsoup is a Java library that makes it easy to work with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS, ...
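The parse, select, and manipulate workflow described above can be sketched as follows. This is a minimal illustration, assuming jsoup is on the classpath. The class name, method names, the "div.content" selector, and the example.com base URI are hypothetical. Fetching a live page would typically go through Jsoup.connect(url).get(), but the sketch parses a string so it runs offline.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class illustrating jsoup's CSS-selector API.
final class JsoupSelectSketch {

    // Extraction: parse an HTML string and return the absolute href of every
    // link inside elements matching the (hypothetical) "div.content" selector.
    static List<String> contentLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> hrefs = new ArrayList<>();
        for (Element a : doc.select("div.content a[href]")) {
            // "abs:" resolves a relative URL against the document's base URI.
            hrefs.add(a.attr("abs:href"));
        }
        return hrefs;
    }

    // Manipulation: add a CSS class to every paragraph and return the body HTML.
    static String markParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("p").addClass("checked");
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div class=content><a href=/about>About</a></div>"
                + "<a href=/skip>Skip</a>";
        // Only the link inside div.content is returned, resolved to an absolute URL.
        System.out.println(contentLinks(html, "https://example.com/"));
        System.out.println(markParagraphs("<p>x</p>"));
    }
}
```

The same select() call accepts any CSS selector jsoup supports, so swapping the selector string is usually all that is needed to target different parts of a page.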
Author, DevRel, Blogger, Open Source Hacker, Java Rockstar, Conference Speaker, Instructor and Entrepreneur ...
Cleaning the URL http://blog.sina.com.cn/s/blog_501a5b1f0102dx6z.html: the page has too many <wbr> tags. When I searched the page source, I found 24,205 of them. I looked at org.jsoup.safety ...
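Two possible ways to get rid of the <wbr> tags are sketched below. This is not the original poster's solution, only an illustration assuming jsoup is on the classpath. The class and method names are hypothetical. The second method uses org.jsoup.safety.Safelist (available in recent jsoup versions; older releases call it Whitelist). Safelist.basic() does not list wbr, so Jsoup.clean drops those tags while keeping the surrounding text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

// Hypothetical helper class; the method names are illustrative only.
final class WbrCleanupSketch {

    // Count the <wbr> elements in a page (e.g. to confirm how many there are).
    static int countWbr(String html) {
        return Jsoup.parse(html).select("wbr").size();
    }

    // Option 1: parse the page and delete every <wbr> element from the DOM,
    // leaving all other markup untouched.
    static String removeWbr(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("wbr").remove();
        return doc.body().html();
    }

    // Option 2: run the body HTML through the org.jsoup.safety cleaner.
    // Safelist.basic() does not allow wbr, so those tags are dropped,
    // but so is every other tag outside the basic safelist.
    static String cleanWithSafelist(String bodyHtml) {
        return Jsoup.clean(bodyHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<p>long<wbr>word<wbr>here</p>";
        System.out.println(countWbr(html));
        System.out.println(removeWbr(html));
        System.out.println(cleanWithSafelist(html));
    }
}
```

Option 1 is the narrower change if the rest of the page's markup must survive. Option 2 is closer to what org.jsoup.safety is designed for, namely sanitizing untrusted HTML against an allow-list.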
In enterprise applications, it's not unusual to aggregate content published on live sites. As such, it's a good idea to develop a level of familiarity with one of the popular Java screen scraper ...