jsoup is a Java library that makes it easy to work with real-world HTML and XML. It offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM API methods, CSS selectors, and XPath selectors.
jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers.
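As an illustration of the parsing and extraction described above, here is a minimal sketch that parses an HTML string and collects its links with a CSS selector. The class name `LinkExtractor`, the helper `extractLinks`, and the `example.com` base URI are hypothetical and chosen only for this example; `Jsoup.parse`, `Document.select`, and `Element.absUrl` are jsoup API calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class LinkExtractor {
    // Hypothetical helper: parse an HTML string and return the absolute URL of every link.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets jsoup resolve relative hrefs such as "/docs".
        Document doc = Jsoup.parse(html, baseUri);
        // CSS selector: every <a> element that has an href attribute.
        Elements links = doc.select("a[href]");
        List<String> urls = new ArrayList<>();
        for (Element link : links) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<p>See the <a href=\"/docs\">docs</a> and "
                + "<a href=\"https://jsoup.org\">site</a>.</p>";
        System.out.println(extractLinks(html, "https://example.com/"));
    }
}
```

To work from a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the selection code would stay unchanged.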
jsoup can also clean user-submitted content against a safe-list to prevent XSS attacks, and output tidy HTML.
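Cleaning against a safe-list might look like the following sketch. It assumes a jsoup version that provides `org.jsoup.safety.Safelist` (older releases named this class `Whitelist`); the class name `CommentSanitizer`, the helper `sanitize`, and the sample input are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CommentSanitizer {
    // Hypothetical helper: keep only the tags and attributes allowed by the basic safe-list.
    static String sanitize(String untrustedHtml) {
        // Safelist.basic() permits simple text formatting and links,
        // and drops scripts and event-handler attributes.
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>"
                + "<script>alert('xss')</script>";
        System.out.println(sanitize(unsafe));
    }
}
```

Which safe-list to use is a design choice: `Safelist.none()` strips all markup, while `Safelist.relaxed()` allows a wider set of structural tags, so the right one depends on how much formatting users should be able to submit.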
jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; in every case it will create a sensible parse tree.
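To make the tag-soup claim concrete, this sketch feeds jsoup a fragment with unclosed tags and inspects the resulting tree. The class name `TagSoupDemo` and the helper `countParagraphs` are hypothetical, and the input is an invented example rather than one from the jsoup documentation.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TagSoupDemo {
    // Hypothetical helper: parse possibly invalid HTML and count the <p> elements in the tree.
    static int countParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p").size();
    }

    public static void main(String[] args) {
        // Neither <p> nor <b> is closed; the parser closes them as a browser would.
        String soup = "<p>One<p>Two <b>bold";
        System.out.println(countParagraphs(soup));
        // Print the normalised markup that jsoup built from the fragment.
        System.out.println(Jsoup.parse(soup).body().html());
    }
}
```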
If you have any questions on how to use jsoup, or have ideas for future development, please get in touch via jsoup Discussions.
If you find any issues, please file a bug after checking for duplicates.
The colophon covers the history of jsoup and the tools used to build it.
Status
jsoup is in stable, general release.
Author
jsoup was created and is maintained by Jonathan Hedley, its primary author.
jsoup is an open-source project, and many contributors have helped improve it over the years. You can see their contributions and join the development on GitHub.
Citing jsoup
If you use jsoup in research or technical documentation, you can cite it as:
Jonathan Hedley & jsoup contributors. jsoup: Java HTML Parser (2009–present). Available at: https://jsoup.org
@misc{jsoup,
author = {Jonathan Hedley and jsoup contributors},
title = {jsoup: Java HTML Parser},
year = {2025},
url = {https://jsoup.org}
}