0

I have been using Cobra until now because it was so easy to use, but unfortunately it failed on a few of my test cases. Can anyone suggest a tried-and-tested library?

I've tried Cobra's built-in one and HTMLCleaner without any luck.

3
  • Judging by your last question, the problem isn't with the "XPath evaluator". You were using XPathFactory.newInstance(), which creates the stock Java evaluator that works on any XML document loaded into a DOM model (as an instance of Document). Cobra itself isn't an XPath evaluator - it's an HTML parser which produces a Document, and it did that wrong in your case. So what you actually want is a "good Java HTML parser", not a "good Java XPath evaluator". Commented Nov 26, 2009 at 23:55
  • Oops... sorry. I've revised my question... I'm just going nuts with all the HTML in front of my eyes...
    – Legend
    Commented Nov 27, 2009 at 0:05
  • I'm sure this same question was on SO earlier this week... Commented Nov 27, 2009 at 0:36
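
To make the first comment's point concrete, here is a minimal sketch of the two separate steps: something produces a DOM Document, and the stock evaluator from XPathFactory.newInstance() queries it. The class and method names (XPathOverDom, parse, select) are made up for illustration, and the parse step assumes well-formed XHTML and uses the JDK's own XML parser rather than Cobra or any other HTML parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library mentioned in this thread.
public class XPathOverDom {

    // Step 1: build a DOM Document. This uses the JDK XML parser, so it only
    // accepts well-formed markup; for real-world HTML this is the step that a
    // lenient HTML parser would have to take over.
    static Document parse(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Step 2: evaluate an XPath expression with the stock Java evaluator.
    // It does not care which parser produced the Document.
    static List<String> select(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"/a\">first</a><a href=\"/b\">second</a>"
                + "</body></html>";
        // Print the href value of every link in the document.
        System.out.println(select(parse(page), "//a/@href"));
    }
}
```

If the XPath results look wrong on messy pages, the likely culprit is the Document produced in step 1, so only parse would need to change when swapping in a different HTML parser.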

5 Answers

4

TagSoup is really great when dealing with crappy HTML/XHTML.

Jericho (and NekoHTML) are also good for parsing invalid HTML.

TagSoup and Jericho: tried-and-tested. NekoHTML: feedback from a trusted source.

0
1

Mozilla HTML Parser looks rather interesting. By definition, it's supposed to be as good as the Gecko engine itself, which is likely to cover your needs.

1

Take a look at Saxon (no, I'm not involved in any way with the product, just a satisfied user).

2
  • Saxon is an awesome XSLT 2.0 & XQuery implementation, but it doesn't parse HTML. Commented Nov 27, 2009 at 0:10
  • @Pavel - The original question didn't mention HTML Commented Nov 27, 2009 at 2:31
1

[Answering the title - the overall question and comments are not consistent]

JTidy (https://jtidy.sourceforge.net/) is a port of Dave Raggett's HTML Tidy. It's very useful, though I think development may have slowed or ceased.

1

I suggest Validator.nu's parser, based on the HTML5 parsing algorithm. (Mozilla is currently in the process of replacing its own HTML parser with this one.)
