I was using Cobra until now because of how easy it was, but unfortunately it had problems with a few test cases. Can anyone suggest a tried-and-tested library?
I've tried Cobra's built-in one and HTMLCleaner without any luck.
Mozilla HTML Parser looks rather interesting. By definition, it's supposed to be as good as the Gecko engine itself, which is likely to cover your needs.
Take a look at Saxon (no, I'm not involved in any way with the product, just a satisfied user).
[Answering the title - the overall question and comments are not consistent]
JTidy (https://jtidy.sourceforge.net/) is a port of Dave Raggett's HTMLTidy. It's very useful though I think development may have slowed/ceased.
I suggest Validator.nu's parser, based on the HTML5 parsing algorithm. (Mozilla is currently in the process of replacing its own HTML parser with this one.)
XPathFactory.newInstance(), which creates the stock Java evaluator that works on any XML document loaded in a DOM model (as an instance of Document). Cobra itself isn't an XPath evaluator - it's an HTML parser which produces a Document, and it did that wrong in your case. So what you actually want is a "good Java HTML parser", not a "good Java XPath evaluator".
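To illustrate the split between parsing and evaluating, here is a minimal sketch of the stock evaluator working on a DOM Document. It assumes well-formed input and uses the JDK's own DocumentBuilder as a stand-in for the parsing step; for real-world HTML you would swap in whichever HTML parser you settle on, as long as it produces a Document. The class and method names (XPathDemo, parse, evaluate) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathDemo {

    // Stand-in for the HTML parser: builds a DOM Document from well-formed markup.
    // This step is an assumption; a tag-soup-tolerant parser would replace it.
    public static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // The stock JDK evaluator: it only needs a Document, however that was produced.
    public static String evaluate(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // With no return type given, the result is converted to a string.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p>hello</p></body></html>");
        System.out.println(evaluate(doc, "/html/body/p"));
    }
}
```

If the XPath results look wrong, the parse step is the first thing to check, since the evaluator simply works on whatever tree it is handed.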