Krextor
(sidenote: Krextor is a shining example of using XSLTdoc to document XSLT code. I was able to get up and running with XSLTdoc on my own code in about 20 minutes, and now all I want to do is go document my code! But I'll stick to documenting how to use Christoph's Krextor instead.)
I've been writing XSL for almost a decade and have written plenty of it to transform XML to RDF. But every time I do, it feels like a from-scratch, repetitive endeavor that is virtually devoid of excitement and reward. Compared to the way that I quickly and concisely transform tabular data to RDF (with conversion:Enhancement), there is clearly something missing in the XML case. In the tabular case, I find and express the patterns I see and let the converter handle the drudgery. In (my) XML case, I have to act upon any patterns I see by writing the code myself. I'm hoping that Christoph has found the XML magic that I found for tabular (but couldn't for XML!).
EEEEEEEEEEAK!
Look at this bad idea that escaped into the world:
<EMSDataSet xsi:schemaLocation="https://www.nemsis.org https://www.nemsis.org/media/XSD/EMSDataSet.xsd"
xmlns="https://www.nemsis.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header>
...
<E05>
<E05_02 xsi:nil="true"/>
<E05_03>2008-10-30T00:05:51.0Z</E05_03>
<E05_04>2008-10-30T00:06:04.0Z</E05_04>
<E05_05>2008-10-30T00:06:57.0Z</E05_05>
<E05_06>2008-10-30T00:15:43.0Z</E05_06>
<E05_07>2008-10-30T00:20:05.0Z</E05_07>
<E05_09>2008-10-30T00:40:56.0Z</E05_09>
<E05_10>2008-10-30T00:49:11.0Z</E05_10>
<E05_11>2008-10-30T01:05:24.0Z</E05_11>
<E05_13 xsi:nil="true"/>
</E05>
<E06>
<E06_01_0>
<E06_01>WASKI</E06_01>
<E06_02>LAURA</E06_02>
</E06_01_0>
<E06_04_0>
<E06_04>9473 ROSA L PARKS AVENUE</E06_04>
<E06_05>51000</E06_05>
<E06_07>01</E06_07>
<E06_08>36105</E06_08>
</E06_04_0>
<E06_06>01101</E06_06>
<E06_10>424333300</E06_10>
<E06_11>655</E06_11>
<E06_12>670</E06_12>
<E06_13>695</E06_13>
<E06_14_0>
<E06_14>69</E06_14>
<E06_15>715</E06_15>
</E06_14_0>
<E06_16>1936-10-27</E06_16>
<E06_17>3342539663</E06_17>
</E06>
...
</Record>
</Header>
</EMSDataSet>
RDF can help this situation, but so would another flavor of XML! (if you know what I MEEEEE06_17AN) Regardless, we need the linkable URI instance naming and OWL ontology that RDF gives us. So let's get started!
https://trac.kwarc.info/krextor/ seems to provide the best overview for what Krextor provides.
- Grab the svn
mkdir -p ~/utilities/krextor/svn
cd ~/utilities/krextor/svn
svn co https://svn.kwarc.info/repos/swim/projects/krextor/trunk krextor
- How would I run it?
ShellScript, JavaWrapper, RunViaJAXP
bash-3.2$ krextor
Syntax: krextor IN..OUT FILE
Extracts RDF from the XML document FILE. IN specifies the format of FILE; OUT
specifies the desired RDF serialization.
-h, --help Show this help
Looks like OUT can be rxr, ntriples, turtle, rdf-xml, rdfa, or YOUR OWN. Turtle is cool with me...
Looks like IN can be omdoc, ocd, xhtml-rdfa, or hcalendar, none of which I care about transforming to RDF. Looks like I can establish my own IN identifier...
An example that uses Milhouse and Bart Simpson! Looks like we interface with Krextor by providing a bucket of our own templates (called an extraction module) in the krextor:main mode; in return, Krextor gives us a bucket of named XSL templates that we can call (krextor:create-resource, krextor:add-uri-property, krextor:add-literal-property) to assert triples.
https://trac.kwarc.info/krextor/browser/trunk/src/xslt/extract shows all of the Extraction Modules that come with Krextor.
Where do I associate the IN id with my extraction module? Based on https://trac.kwarc.info/krextor/browser/trunk/src/xslt/extract, I'm starting to think IN corresponds directly to the file name. Do I have to put the extraction module into krextor's utilities/krextor/svn/krextor/src/xslt/extract? (yes). Not how I'd like to organize my extraction modules, because mine will be highly contextualized, but I'll go with it for now. Perhaps I can bypass krextor.sh and invoke saxon.jar myself.
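A minimal sketch of that naming convention, assuming (as the directory listing suggests) that krextor resolves IN to src/xslt/extract/IN.xsl. The function name split_spec and the two variables it sets are made up for illustration; they are not part of krextor.

```shell
#!/bin/sh
# Split a krextor "IN..OUT" argument into its two halves.
# split_spec, in_format and out_format are hypothetical names.
split_spec() {
  spec="$1"
  in_format="${spec%%..*}"    # everything before the first ".."
  out_format="${spec##*..}"   # everything after the last ".."
}

split_spec "nemsis-v2.2.1..turtle"
echo "extraction module: src/xslt/extract/${in_format}.xsl"
echo "serialization:     ${out_format}"
```

Splitting on the double dot rather than a single one matters here, since a module name such as nemsis-v2.2.1 contains single dots of its own.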
mkdir -p ~/utilities/krextor/simpsons-eg
cd ~/utilities/krextor/simpsons-eg
Look at the input:
bash-3.2$ cat milhouse.xml
<person friends="https://van-houten.name/milhouse">
<name>Bart Simpson</name>
</person>
Look at the extraction module:
bash-3.2$ cat krextor-extraction-module-for-social-network-xml.xsl
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY foaf "http://xmlns.com/foaf/0.1/">
]>
<xsl:transform version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:krextor="http://kwarc.info/projects/krextor"
exclude-result-prefixes="">
<xsl:template match="person" mode="krextor:main">
<xsl:call-template name="krextor:create-resource">
<xsl:with-param name="type" select="'&foaf;Person'"/>
</xsl:call-template>
</xsl:template>
<xsl:template match="person/@friends" mode="krextor:main">
<xsl:call-template name="krextor:add-uri-property">
<xsl:with-param name="property" select="'&foaf;knows'"/>
</xsl:call-template>
</xsl:template>
<xsl:template match="person/name" mode="krextor:main">
<xsl:call-template name="krextor:add-literal-property">
<xsl:with-param name="property" select="'&foaf;name'"/>
</xsl:call-template>
</xsl:template>
</xsl:transform>
Get "my" extraction module to where krextor.sh can see it:
bash-3.2$ cp krextor-extraction-module-for-social-network-xml.xsl ~/utilities/krextor/svn/krextor/src/xslt/extract
bash-3.2$ l ~/utilities/krextor/svn/krextor/src/xslt/extract
total 216
-rw-r--r-- 1 lebot staff 1247 Mar 3 09:29 krextor-extraction-module-for-social-network-xml.xsl
drwxr-xr-x 10 lebot staff 340 Feb 2 14:07 util
-rw-r--r-- 1 lebot staff 3113 Feb 2 14:07 hcalendar.xsl
-rw-r--r-- 1 lebot staff 15770 Feb 2 14:07 ocd.xsl
-rw-r--r-- 1 lebot staff 24833 Feb 2 14:07 omdoc-owl.xsl
-rw-r--r-- 1 lebot staff 30614 Feb 2 14:07 omdoc.xsl
-rw-r--r-- 1 lebot staff 2070 Feb 2 14:07 test.xsl
-rw-r--r-- 1 lebot staff 7664 Feb 2 14:07 xhtml-rdfa.xsl
-rw-r--r-- 1 lebot staff 4065 Feb 2 14:07 xmath.xsl
-rw-r--r-- 1 lebot staff 4214 Feb 2 14:07 xml.xsl
Good to note krextor's namespace:
xmlns:krextor="http://kwarc.info/projects/krextor"
Triples!
bash-3.2$ krextor krextor-extraction-module-for-social-network-xml..turtle milhouse.xml
<file:/Users/me/utilities/krextor/simpsons-eg/milhouse.xml>
a <http://xmlns.com/foaf/0.1/Person> ;
<http://xmlns.com/foaf/0.1/knows> <https://van-houten.name/milhouse> ;
<http://xmlns.com/foaf/0.1/name> "Bart Simpson" .
Given all of what just happened, I'm still not ready for https://trac.kwarc.info/krextor/wiki/YourOwnExtraction#GettingStarted...
- Storing extraction templates outside of svn/krextor/src/xslt/extract
A temporary fix until I can poke into krextor.sh; I need to keep my extraction modules organized elsewhere.
rm ~/utilities/krextor/svn/krextor/src/xslt/extract/krextor-extraction-module-for-social-network-xml.xsl
cd ~/utilities/krextor/simpsons-eg
mv krextor-extraction-module-for-social-network-xml.xsl krextor-extraction-module-for-social-network-xml.krx
ln -s `pwd`/krextor-extraction-module-for-social-network-xml.krx \
~/utilities/krextor/svn/krextor/src/xslt/extract/krextor-extraction-module-for-social-network-xml.xsl
NOTE: I'm going to insert .krx before the .xsl extension (or replace .xsl with it) so I know the file is a Krextor Extraction Module. Once the stylesheet lives outside of krextor/src/xslt/extract/, it can no longer be easily recognized as a krextor extraction module.
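The move-and-symlink steps above could be wrapped in a small helper. This is only a sketch of the workaround, not something krextor provides; link_krx is a hypothetical name, and it assumes the module already carries the .krx extension.

```shell
#!/bin/sh
# link_krx MODULE.krx EXTRACT_DIR
# Symlink a .krx module into EXTRACT_DIR under the .xsl name that krextor
# looks for. link_krx is a hypothetical helper, not part of krextor.
link_krx() {
  module="$1"
  extract_dir="$2"
  # Resolve the module's directory to an absolute path so the link
  # keeps working no matter where it is followed from.
  module_dir="$(cd "$(dirname "$module")" && pwd)"
  base="$(basename "$module" .krx)"
  ln -sf "$module_dir/$base.krx" "$extract_dir/$base.xsl"
}

# Example call (paths taken from the steps above):
# link_krx krextor-extraction-module-for-social-network-xml.krx \
#   ~/utilities/krextor/svn/krextor/src/xslt/extract
```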
Back to the (csv2rdf4lod-automation) conversion cockpit with source/Sample_Output_NEMSIS_XML.xml waiting to become RDF.
vi manual/nemsis-v2.2.1.krx
and pasted in contents from https://trac.kwarc.info/krextor/browser/trunk/src/xslt/extract/xml.xsl
ln -s `pwd`/manual/nemsis-v2.2.1.krx ~/utilities/krextor/svn/krextor/src/xslt/extract/nemsis-v2.2.1.xsl
Had to tweak krextor to get my classpath included:
#java -jar ${SAXON_JAR:-$KREXTOR_HOME/lib/saxon/saxon9.jar} -s:$infile -xsl:$transformer
saxon.sh $transformer foo bar $infile
krextor nemsis-v2.2.1..turtle source/Sample_Output_NEMSIS_XML.xml
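Why the tweak is needed: java -jar ignores any classpath setting, so the Java classes behind the stylesheet's java: namespaces cannot be found when Saxon is launched that way. The contents of saxon.sh are not shown here; the sketch below is one way a classpath-aware call could be built, using Saxon's net.sf.saxon.Transform entry point. EXTRA_CP and saxon_command are hypothetical names, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Print (not run) a Saxon invocation that honors an extra classpath.
# EXTRA_CP is a hypothetical variable naming the jars or directories that
# hold the Java classes referenced by the stylesheet's java: namespaces.
saxon_command() {
  infile="$1"
  transformer="$2"
  # Same default jar location as the commented-out line in krextor.
  saxon_jar="${SAXON_JAR:-$KREXTOR_HOME/lib/saxon/saxon9.jar}"
  echo "java -cp ${saxon_jar}:${EXTRA_CP} net.sf.saxon.Transform -s:${infile} -xsl:${transformer}"
}
```

Putting the Saxon jar on -cp alongside the extra entries and naming the main class explicitly is what lets both be visible to the same JVM.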
<xsl:transform version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xd="http://www.pnp-software.com/XSLTdoc"
xmlns:krextor="http://kwarc.info/projects/krextor"
xmlns:krextor-genuri="http://kwarc.info/projects/krextor/genuri"
xmlns:ems="https://www.nemsis.org"
xmlns:rat="java:edu.rpi.tw.data.rdf.utils.pipes.starts.Cat"
xmlns:eparams="java:edu.rpi.tw.data.csv.impl.DefaultEnrichmentParameters"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
exclude-result-prefixes="#all">
<xsl:include href="model_integration/rutil/foaf-ns.xsl"/>
<xsl:include href="model_integration/rutil/dc-ns.xsl"/>
<xd:doc type="stylesheet">
<xd:short>Extraction module for NEMSIS v2.2.1</xd:short>
<xd:author>Timothy Lebo</xd:author>
<xd:copyright></xd:copyright>
<xd:svnId></xd:svnId>
</xd:doc>
<xd:doc>Path to RDF encoding of enhancement parameters.</xd:doc>
<xsl:param name="eparams-ttl" select="'ems-nemsis/version/2011-Mar-01/manual/NEMSIS_Data_Elements_Definitions_v2.2.1.xls.csv.e1.params.ttl'"/>
<xd:doc>Java object representing the RDF encoding of enhancement parameters.</xd:doc>
<xsl:variable name="eParamsRep" select="rat:load($eparams-ttl)"/>
<xd:doc>Java object that calculates namespaces.</xd:doc>
<xsl:variable name="eParams" select="eparams:new($eParamsRep)"/>
<!-- Note that this is not the global default; actually the
concrete way of URI generation is decided on element level -->
<!--
<param name="autogenerate-fragment-uris" select="'pseudo-xpath', 'generate-id'"/>
-->
<xsl:param name="autogenerate-fragment-uris" select="'generate-id'"/>
<xsl:strip-space elements="*"/>
<xsl:template match="ems:E04" mode="krextor:main">
<xsl:call-template name="krextor:create-resource">
<xsl:with-param name="subject" select="concat(eparams:getURIOfVersionedDataset($eParams),
'/typed/crew-member/',ems:E04_01)"/>
<xsl:with-param name="type" select="($foaf:Person, $foaf:Agent)"/>
<xsl:with-param name="properties">
<krextor:property uri="{$dcterms:isReferencedBy}" object="{eparams:getURIOfVersionedDataset($eParams)}"/>
</xsl:with-param>
</xsl:call-template>
</xsl:template>
<xsl:template match="ems:E04_02" mode="krextor:main">
<xsl:call-template name="krextor:add-literal-property">
<xsl:with-param name="property" select="$foaf:firstName"/>
</xsl:call-template>
</xsl:template>
<xsl:template match="ems:E06" mode="krextor:main">
<xsl:variable name="count">
<xsl:number level="any" count="ems:E06"/>
</xsl:variable>
<xsl:call-template name="krextor:create-resource">
<xsl:with-param name="subject" select="concat(eparams:getURIOfVersionedDataset($eParams),
'/typed/person/',$count)"/>
<xsl:with-param name="type" select="($foaf:Person, $foaf:Agent)"/>
<xsl:with-param name="properties">
<krextor:property uri="{$dcterms:isReferencedBy}" object="{eparams:getURIOfVersionedDataset($eParams)}"/>
</xsl:with-param>
</xsl:call-template>
</xsl:template>
<xsl:template match="ems:E06_02" mode="krextor:main">
<xsl:call-template name="krextor:add-literal-property">
<xsl:with-param name="property" select="$foaf:firstName"/>
</xsl:call-template>
</xsl:template>
<xsl:template match="ems:E06_01" mode="krextor:main">
<xsl:call-template name="krextor:add-literal-property">
<xsl:with-param name="property" select="$foaf:family_name"/>
</xsl:call-template>
</xsl:template>
</xsl:transform>