Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM Element representing it?
7 Answers
If you have a string that contains HTML, you can use the Jsoup library like this to get HTML elements:
String htmlTable= "<table><tr><td>Hello World!</td></tr></table>";
Document doc = Jsoup.parse(htmlTable);
// then use something like this to get your element:
Elements tds = doc.getElementsByTag("td");
// tds will contain this one element: <td>Hello World!</td>
Good luck!
Here's a way:
import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
public class HtmlParseDemo {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
        HTMLEditorKit.Parser parser = new ParserDelegator();
        parser.parse(reader, new HTMLTableParser(), true);
        reader.close();
    }
}

class HTMLTableParser extends HTMLEditorKit.ParserCallback {
    private boolean encounteredATableRow = false;

    public void handleText(char[] data, int pos) {
        if (encounteredATableRow) System.out.println(new String(data));
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = true;
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = false;
    }
}
- What if I want to put all the data pieces into an array in the outer class, rather than print them out? Commented Jun 17, 2013 at 18:21
- @Imray, go right ahead, you have my permission to put them in some sort of collection instead of printing them :) Commented Jun 17, 2013 at 18:34
- I put them in a collection inside the HTMLTableParser class, and then created a getter method to get them. Is that the best way to do it? Commented Jun 17, 2013 at 19:58
- @BartKiers how is it related to topic question?? The question is "to get a DOM Element representing it", not to catch SAX events! – rauch Commented Jan 22, 2014 at 10:56
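The collection-plus-getter idea from the comments above could look roughly like the sketch below. CollectingTableParser, getCells and parseCells are hypothetical names chosen for illustration; they are not part of the original answer. The callback logic is the same as in HTMLTableParser, except that the text is stored in a list instead of being printed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of HTMLTableParser: collects the text instead of printing it.
class CollectingTableParser extends HTMLEditorKit.ParserCallback {
    private final List<String> cells = new ArrayList<>();
    private boolean insideTableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        // Only keep text that appears between <tr> and </tr>.
        if (insideTableRow) cells.add(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) insideTableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) insideTableRow = false;
    }

    // Getter for the collected text pieces.
    public List<String> getCells() {
        return cells;
    }

    // Convenience helper (an assumption, not from the answer): parse and return the cells.
    static List<String> parseCells(String html) throws IOException {
        CollectingTableParser callback = new CollectingTableParser();
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return callback.getCells();
    }

    public static void main(String[] args) throws IOException {
        String html = "<table><tr><td>Hello</td><td>World!</td></tr></table>";
        System.out.println(parseCells(html));
    }
}
```

Keeping the list inside the callback and exposing it through a getter keeps the parsing state in one place. Note that this still yields plain strings, not a DOM Element.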
You could use HTML Parser, which is a Java library used to parse HTML in either a linear or nested fashion. It is an open-source tool and can be found on SourceForge.
How do you make use of the HTML-processing capabilities that are built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.
I've used Jericho HTML Parser. It's open source, lightweight, and tolerates badly formatted tags.
I found this somewhere (don't remember where):
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//...
public static DocumentFragment parseXml(Document doc, String fragment)
{
    // Wrap the fragment in an arbitrary element.
    fragment = "<fragment>" + fragment + "</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
            new InputSource(new StringReader(fragment)));
        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);
        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();
        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }
        // Return the fragment.
        return docfrag;
    }
    catch (SAXException e)
    {
        // A parsing error occurred; the XML input is not valid.
    }
    catch (ParserConfigurationException e)
    {
        // The XML parser could not be configured.
    }
    catch (IOException e)
    {
        // Reading the input failed.
    }
    // Parsing failed; callers must check for null.
    return null;
}
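For the exact string in the question, a shorter route to a single org.w3c.dom.Element is to parse the string as XML and take the document element. This is a minimal sketch that only works because that string happens to be well-formed XML; typical real-world HTML (unclosed tags, entities such as &nbsp;) would make the XML parser throw. DomElementFromString and parseElement are hypothetical names used for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomElementFromString {
    // Parses well-formed markup and returns its root as a DOM Element.
    // Assumption: the input is valid XML; malformed HTML raises a SAXException.
    static Element parseElement(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element table = parseElement("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName()); // prints "table"
        // prints "Hello World!"
        System.out.println(table.getElementsByTagName("td").item(0).getTextContent());
    }
}
```

If the element has to live inside an existing document, it still needs to be imported with importNode, as parseXml does above.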
One can use some of the javax.swing.text.html utility classes for parsing HTML.
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
//...
try {
    String htmlString = "<html><head><title>Example Title</title></head><body>Some text...</body></html>";
    HTMLEditorKit htmlEditKit = new HTMLEditorKit();
    HTMLDocument htmlDocument = (HTMLDocument) htmlEditKit.createDefaultDocument();
    HTMLEditorKit.Parser parser = new ParserDelegator();
    parser.parse(new StringReader(htmlString),
        htmlDocument.getReader(0), true);
    // Use HTMLDocument here
    System.out.println(htmlDocument.getProperty("title")); // Example Title
} catch (IOException e) {
    // Handle
    e.printStackTrace();
}