Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM Element representing it?
If you have a string that contains HTML, you can use the Jsoup library like this to get HTML elements:
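import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
//...
String htmlTable = "<table><tr><td>Hello World!</td></tr></table>";
// Jsoup.parse wraps the fragment in <html>/<body>; this Document is jsoup's own type, not org.w3c.dom.Document.
Document doc = Jsoup.parse(htmlTable);
// then use something like this to get your element:
Elements tds = doc.getElementsByTag("td");
// tds will contain this one element: <td>Hello World!</td>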
Good luck!
Here's a way:
import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
        HTMLEditorKit.Parser parser = new ParserDelegator();
        parser.parse(reader, new HTMLTableParser(), true);
        reader.close();
    }
}

class HTMLTableParser extends HTMLEditorKit.ParserCallback {
    // True while the parser is between <tr> and </tr>.
    private boolean encounteredATableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        // Print every piece of text found inside a table row.
        if (encounteredATableRow) System.out.println(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = false;
    }
}
- What if I want to put all the data pieces into an array in the outer class, rather than print them out? – Commented Jun 17, 2013 at 18:21
- @Imray, go right ahead, you have my permission to put them in some sort of collection instead of printing them :) – Commented Jun 17, 2013 at 18:34
- I put them in a collection inside the HTMLTableParser class, and then created a getter method to get them. Is that the best way to do it? – Commented Jun 17, 2013 at 19:58
- @BartKiers how is this related to the question? The question asks "to get a DOM Element representing it", not to catch SAX events! – rauch Commented Jan 22, 2014 at 10:56
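The comments above suggest collecting the text in a collection inside the callback and exposing it through a getter. A minimal sketch of that idea follows; the names CellCollector, getCells and collectCells are hypothetical, and unlike the answer it tracks <td> rather than <tr>, so only cell text is collected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CellCollector extends HTMLEditorKit.ParserCallback {
    // Text of every <td>, in document order.
    private final List<String> cells = new ArrayList<>();
    // True while the parser is between <td> and </td>.
    private boolean insideCell = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TD) insideCell = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD) insideCell = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (insideCell) cells.add(new String(data));
    }

    // Getter exposing the collected cells as a read-only view.
    public List<String> getCells() {
        return Collections.unmodifiableList(cells);
    }

    // Hypothetical helper: parse the markup and return the collected cell texts.
    static List<String> collectCells(String html) {
        CellCollector collector = new CellCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            // An in-memory StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return collector.getCells();
    }

    public static void main(String[] args) {
        System.out.println(collectCells("<table><tr><td>Hello</td><td>World!</td></tr></table>"));
    }
}
```

Keeping the list private and returning an unmodifiable view means the caller can read the results but cannot change the parser's internal state.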
You could use HTML Parser, a Java library for parsing HTML in either a linear or nested fashion. It is an open source tool and can be found on SourceForge.
How do you make use of the HTML-processing capabilities that are built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.
I've used Jericho HTML Parser: it's open source, detects (and forgives) badly formatted tags, and is lightweight.
I found this somewhere (I don't remember where):
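import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//...
public static DocumentFragment parseXml(Document doc, String fragment)
{
    // Wrap the fragment in an arbitrary element so that input with several
    // top-level elements is still a well-formed XML document.
    fragment = "<fragment>" + fragment + "</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
            new InputSource(new StringReader(fragment)));

        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);

        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();

        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }

        // Return the fragment.
        return docfrag;
    }
    catch (SAXException e)
    {
        // A parsing error occurred; the input is not well-formed XML.
    }
    catch (ParserConfigurationException e)
    {
        // The XML parser could not be configured.
    }
    catch (IOException e)
    {
        // Reading from the StringReader failed (should not happen).
    }
    // Parsing failed; callers must check for null.
    return null;
}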
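Since the string in the question has a single root element, the JDK's own XML parser can turn it into an org.w3c.dom.Element directly, without the <fragment> wrapper or an existing Document. This is a sketch assuming the HTML is well-formed XHTML (every tag closed, attributes quoted); typical real-world HTML such as an unclosed <br> will be rejected. The names XhtmlToDom and toElement are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XhtmlToDom {
    // Parses well-formed markup and returns its root as a W3C DOM Element.
    // Markup that is not well-formed XML (e.g. an unclosed <br>) is rejected.
    static Element toElement(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
    }

    public static void main(String[] args) {
        Element table = toElement("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName());     // table
        System.out.println(table.getTextContent()); // Hello World!
    }
}
```

For HTML that is not well-formed, one of the lenient parsers mentioned in the other answers is the safer choice.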
One can use some of the javax.swing.text.html utility classes for parsing HTML.
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
//...
try {
    String htmlString = "<html><head><title>Example Title</title></head><body>Some text...</body></html>";
    HTMLEditorKit htmlEditKit = new HTMLEditorKit();
    HTMLDocument htmlDocument = (HTMLDocument) htmlEditKit.createDefaultDocument();
    HTMLEditorKit.Parser parser = new ParserDelegator();
    HTMLEditorKit.ParserCallback reader = htmlDocument.getReader(0);
    parser.parse(new StringReader(htmlString), reader, true);
    // The reader buffers the parsed content; flush() pushes it into the document.
    reader.flush();
    // Use HTMLDocument here
    System.out.println(htmlDocument.getProperty("title")); // Example Title
} catch (IOException | BadLocationException e) {
    // Handle
    e.printStackTrace();
}
See:
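Once an HTMLDocument has been filled, its element tree can be walked to pull out individual table cells. The sketch below assumes that HTMLDocument records each element's tag under StyleConstants.NameAttribute, and it uses EditorKit.read, which parses the markup and flushes it into the document in one call. SwingTableCells and cellTexts are hypothetical names. Note that these are javax.swing.text.Element objects, not W3C DOM elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class SwingTableCells {
    // Loads the markup into an HTMLDocument and returns the text of each <td>.
    static List<String> cellTexts(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        List<String> cells = new ArrayList<>();
        try {
            // EditorKit.read parses the markup and flushes it into the document.
            kit.read(new StringReader(html), doc, 0);
            // Depth-first walk over every element, starting at the root.
            ElementIterator it = new ElementIterator(doc);
            Element e;
            while ((e = it.next()) != null) {
                Object tag = e.getAttributes().getAttribute(StyleConstants.NameAttribute);
                if (tag == HTML.Tag.TD) {
                    int start = e.getStartOffset();
                    String text = doc.getText(start, e.getEndOffset() - start);
                    // Swing ends each block with a newline; trim it off.
                    cells.add(text.trim());
                }
            }
        } catch (IOException | BadLocationException ex) {
            throw new IllegalStateException("Could not load HTML", ex);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(cellTexts("<table><tr><td>Hello</td><td>World!</td></tr></table>"));
    }
}
```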