📅  Last modified: 2023-12-03 15:17:05.066000             🧑  Author: Mango
Jsoup is a Java library for working with HTML documents. It provides APIs for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods. It can be used alongside frameworks such as Spring, Hibernate, and Struts, and it runs on both the JVM and Android.
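Its main features, each illustrated in the examples below, are fetching and parsing HTML, extracting data with CSS selectors, manipulating elements and attributes, and sanitizing untrusted HTML.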
To use Jsoup in your Java project, you can add the following Maven dependency:
<dependencies>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.14.3</version>
    </dependency>
</dependencies>
Alternatively, you can download the JAR file from the official website and add it to your project's classpath.
Document doc = Jsoup.connect("https://www.example.com").get();
System.out.println(doc.title());
This example downloads the HTML document from https://www.example.com and prints its title to the console.
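The same title() call also works on a document parsed from an in-memory string, which avoids network access and is convenient for tests. The following is a minimal sketch; the class name ParseStringExample and the helper titleOf are hypothetical names chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseStringExample {
    // Parses an HTML string (no network access) and returns the <title> text.
    // Document.title() returns an empty string when there is no <title> element.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Sample Page</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println(titleOf(html)); // prints: Sample Page
    }
}
```

Parsing from a string keeps the example deterministic, whereas Jsoup.connect(...).get() can throw an IOException when the site is unreachable.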
Document doc = Jsoup.connect("https://www.example.com").get();
Elements links = doc.select("a[href]");
for (Element link : links) {
    System.out.println(link.attr("href"));
}
This example extracts all the links from the HTML document and prints their URLs to the console.
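Pages often contain relative links such as /about, and attr("href") returns them exactly as written. One way to get absolute URLs is to give the parser a base URI and call absUrl("href"). This is a sketch under that assumption; the class name AbsoluteLinksExample, the helper absoluteLinks, and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsoluteLinksExample {
    // Collects every link target as an absolute URL, resolved against baseUri.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> "
                + "<a href=\"https://www.example.org/\">Other</a>";
        for (String url : absoluteLinks(html, "https://www.example.com/")) {
            System.out.println(url);
        }
    }
}
```

When a document is fetched with Jsoup.connect(...).get(), the base URI is set from the request URL, so absUrl("href") should work there without passing it explicitly.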
Document doc = Jsoup.connect("https://www.example.com").get();
Element link = doc.selectFirst("a"); // first <a> element, or null if the page has none
if (link != null) {
    link.attr("href", "https://www.google.com");
    System.out.println(link);
}
This example changes the URL of the first link in the HTML document to https://www.google.com and prints the modified link to the console.
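A null check matters when modifying the first match, because selecting the first element yields null when nothing matches the selector. The sketch below works on an in-memory string so it runs offline; the class name RewriteLinkExample and the helper rewriteFirstLink are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RewriteLinkExample {
    // Points the first <a> element at newUrl and returns the resulting body HTML.
    // The document is left unchanged when it contains no <a> element.
    static String rewriteFirstLink(String html, String newUrl) {
        Document doc = Jsoup.parse(html);
        Element link = doc.selectFirst("a"); // null when nothing matches
        if (link != null) {
            link.attr("href", newUrl);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"https://www.example.com\">Home</a></p>";
        System.out.println(rewriteFirstLink(html, "https://www.google.com"));
    }
}
```

The change only affects the in-memory Document; nothing is written back to the original page.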
String dirtyHtml = "<p><script>alert('XSS')</script>Example</p>";
String cleanHtml = Jsoup.clean(dirtyHtml, Safelist.basic()); // Safelist replaces the deprecated Whitelist (jsoup 1.14.2+)
System.out.println(cleanHtml);
This example cleans the input HTML string by removing the script tag and any other markup that the basic allow-list does not permit, and prints the sanitized HTML to the console.
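Sanitizing can be paired with a validity check that reports whether the input would pass unchanged. The sketch below assumes jsoup 1.14.2 or later, where the Safelist class is available in place of the older Whitelist; the class name SanitizeExample and its two helper methods are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class SanitizeExample {
    // Returns a copy of the HTML containing only what Safelist.basic() permits.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    // Returns true when the HTML already contains nothing outside Safelist.basic().
    static boolean isAlreadySafe(String untrustedHtml) {
        return Jsoup.isValid(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String dirtyHtml = "<p><script>alert('XSS')</script>Example</p>";
        System.out.println(isAlreadySafe(dirtyHtml)); // the script element is not on the basic list
        System.out.println(sanitize(dirtyHtml));
    }
}
```

Rejecting input that fails the check is stricter than silently cleaning it; which behavior fits depends on the application.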
Jsoup is a powerful tool for working with HTML documents in Java, providing a comprehensive set of APIs for parsing, manipulating, and sanitizing HTML input. Whether you're building web scrapers, data analysis tools, or full-blown web applications, Jsoup can help you get the job done quickly and easily.