

W3C Semantic Web Tutorial
Presentation given at Conference on Semantics in Healthcare & Life Sciences (C-SHALS) 2008, in Boston, USA, on the 5th of March, 2008.
Follow along at https://www.w3.org/2008/Talks/0305-C-SHALS/.
Eric Prud'hommeaux (W3C), Sanitation Engineer.
Lee Feigenbaum (Cambridge Semantics), RDF Data Access Working Group Chair.
Last modified: 2008/12/11 13:43:15

Program
Drug discovery
Using the Semantic Web: Precise Answers to Complex Questions:
- Find me genes involved in signal transduction that are related to pyramidal neurons.



Integrate databases ...
- Mesh
- Pubmed
- Entrez Gene
- Gene Ontology
- ...
... so that one query ...
... (trivially) spans several DBs ...

... to yield cross-specialty information

What was the trick?
- Good-enough modeling.
- RDFS - simple subclass/subproperty relationships
- OWL - inference and more expressive modeling
- Query interface tailored to data model.
- SPARQL - RDF query language
- Agreement on common terms and relations.
- RDF - flexible underlying data model
- URI - unambiguous naming
Unification needed in lots of places
- Protein/Gene identifiers
- Biological processes
- Publication abstracts
- Publication metadata
- Chemical identifiers
- Patient data
- ...
Patient data
<?xml version="1.0"?>
<ClinicalDocument transformation="hl7-rim-to-pomr.xslt">
  <recordTarget>
    <patientRole>
      <patientPatient>
        <name>
          <given>Henry</given>
          <family>Levin</family>
        </name>
        <administrativeGenderCode code="M"/>
        <birthTime value="19320924"/>
      </patientPatient>
    </patientRole>
  </recordTarget>
  <component>
    <StructuredBody>
      <Observation>
        <code displayName="Cuff blood pressure"/>
        <effectiveTime value="200004071430"/>
        <targetSiteCode displayName="Left arm"/>
        <entryRelationship typeCode="COMP">
          <Observation>
            <effectiveTime value="200004071530"/>
            <value value="132" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
      </Observation>
      <Observation>
        <code displayName="Cuff blood pressure"/>
        <effectiveTime value="200004071530"/>
        <targetSiteCode displayName="Left arm"/>
        <entryRelationship typeCode="COMP">
          <Observation>
            <code displayName="Systolic BP"/>
            <effectiveTime value="200004071530"/>
            <value value="135" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
        <entryRelationship typeCode="COMP">
          <Observation>
            <code displayName="Diastolic BP"/>
            <effectiveTime value="200004071530"/>
            <value value="88" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
      </Observation>
    </StructuredBody>
  </component>
</ClinicalDocument>
- Patient identifier
- Medical history
- Family medical history
- Health-related behavior
RDF is good for modeling all this data...
...regardless of its source.
What does RDF provide?
- Common (simple) model for all this data.
- Incentive and infrastructure to re-use terms when possible and invent terms when necessary.
- Simple and complex ontology languages (RDFS and OWL).
- Intuitive re-use of now-familiar web topology.
- Scalable — partial (monotonic) reasoning allowed.
The Resource Description Framework (RDF)
A First Look at Turtle
<https://thefigtrees.net/lee/id#lee> <https://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://xmlns.com/foaf/0.1/Person> .
<https://thefigtrees.net/lee/id#lee> <https://xmlns.com/foaf/0.1/name> "Lee Feigenbaum" .
<https://thefigtrees.net/lee/id#lee> <https://xmlns.com/foaf/0.1/homepage> <https://thefigtrees.net/lee/> .
... is more succinctly represented as:
@prefix rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <https://xmlns.com/foaf/0.1/> .

<https://thefigtrees.net/lee/id#lee>
    rdf:type foaf:Person ;
    foaf:name "Lee Feigenbaum" ;
    foaf:homepage <https://thefigtrees.net/lee/> .
Patient Data in RDF
_:p1 a galen:Patient ;
    foaf:family_name "Levin" ;
    foaf:firstName "Henry" .
_:c1a edns:patient _:p1 ;
    edns:screeningBP [
        a cpr:clinical-examination ;
        dc:date "2000-04-07T15:30:00" ;
        edns:systolic [
            a galen:AbsoluteMeasurement ;
            ex:unit "mm[Hg]" ;
            r:value "132" ;
            skos:prefLabel "Systolic BP" ] ;
        edns:diastolic [
            a galen:AbsoluteMeasurement ;
            ex:unit "mm[Hg]" ;
            r:value "86" ;
            skos:prefLabel "Diastolic BP" ] ;
        edns:location snomed:_66480008 ;   # SNOMED:left arm
        edns:posture snomed:_163035008     # SNOMED:sitting
    ] .

There is a blood-pressure examination of a patient named Henry Levin. The examination was on 7-April-2000 at 3:30pm and was conducted on the patient's left arm while he was sitting. The examination resulted in a systolic blood pressure measurement of 132 and a diastolic measurement of 86.
RDF Resources
- RDF at the W3C - primer and specifications
- Semantic Web tools - community maintained list; includes triple store, programming environments, tool sets, and more
- 302 Semantic Web Videos and Podcasts - includes a section specifically on RDF videos
- RDF/XML sample patient data - complex model used in this tutorial
- Turtle sample patient data - complex model used in this tutorial
- Turtle simplified sample patient data - simple model used in this tutorial
Introduction to SPARQL
Why SPARQL?
SPARQL is the query language of the Semantic Web. It lets us:
- Pull values from structured and semi-structured data
- Explore data by querying unknown relationships
- Perform complex joins of disparate databases in a single, simple query
- Transform RDF data from one vocabulary to another
SELECTing variables
- SPARQL variables bind to RDF terms
- Ex. ?journal, ?disease, ?price
- Like SQL, we pick the variables we want from a query with a SELECT clause
- Ex. SELECT ?article ?author ?published
- A SELECT query results in a table of values:
?artist | ?album | ?times_platinum |
---|---|---|
Michael Jackson | Thriller | 27 |
Led Zeppelin | Led Zeppelin IV | 22 |
Pink Floyd | The Wall | 22 |
Triple patterns
A triple pattern is an RDF triple that can have variables in any of the subject, predicate, or object positions.
Examples:
- Find countries and their capital cities:
- ?country geo:capital ?capital .
- Given a FOAF URI, find the person's name:
- <https://thefigtrees.net/id#lee> foaf:name ?name .
- What direct relationships exist between two employees?
- emp:8A0120 ?relationship emp:D29J10 .
Simple query pattern
We can combine more than one triple pattern to retrieve multiple values and easily traverse an RDF graph:
- Find countries, their capital cities, and their populations:
- ?country geo:capital ?capital .
?country geo:population ?population .
- Given a FOAF URI, find the person's name and friends' names:
- <https://thefigtrees.net/id#lee> foaf:name ?name .
<https://thefigtrees.net/id#lee> foaf:knows ?friend .
?friend foaf:name ?friend_name .
- Retrieve all third-line managers in the company:
- ?emp hr:managedBy ?first_line .
?first_line hr:managedBy ?second_line .
?second_line hr:managedBy ?third_line .
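To make the idea of combining triple patterns concrete, here is a minimal Python sketch of how a query engine might evaluate them: each pattern is matched against every triple, and variable bindings are carried from one pattern to the next so that a shared variable acts as a join. The employee data and the helper names (`match_pattern`, `query`) are hypothetical, and real SPARQL engines are considerably more sophisticated than this nested-loop approach.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """A term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(patterns: List[Triple], graph: Iterable[Triple]) -> List[Binding]:
    """Evaluate the patterns in order, joining on shared variables."""
    triples = list(graph)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical management chain, written as (subject, predicate, object).
GRAPH = [
    ("emp:alice", "hr:managedBy", "emp:bob"),
    ("emp:bob", "hr:managedBy", "emp:carol"),
    ("emp:carol", "hr:managedBy", "emp:dave"),
]

third_line = query(
    [
        ("?emp", "hr:managedBy", "?first_line"),
        ("?first_line", "hr:managedBy", "?second_line"),
        ("?second_line", "hr:managedBy", "?third_line"),
    ],
    GRAPH,
)
```

Only `emp:alice` has three levels of management above her in this data, so `third_line` holds a single solution.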
GRAPH constraints
SPARQL lets us query different RDF graphs in a single query. Consider movie reviews:
- Target one authoritative data source (What does Roger Ebert say?):
- GRAPH <https://example.org/reviews/rogerebert> {
  ex:atonement rev:hasReview ?review .
  ?review rev:rating ?rating .
}
- Relate multiple sources (How do my reviews compare to Ebert's?):
- GRAPH <https://example.org/reviews/rogerebert> {
  ?movie rev:hasReview ?rev1 .
  ?rev1 rev:rating ?ebert .
}
GRAPH <https://example.org/reviews/me> {
  ?movie rev:hasReview ?rev2 .
  ?rev2 rev:rating ?me .
}
- Retrieve provenance data (Which reviewers have given out perfect ratings?):
- GRAPH ?reviewer_graph {
  ?review rev:rating 10 .
}
Result forms
Besides selecting tables of values, SPARQL allows three other types of queries:
- ASK - returns a boolean answering the question: does the query have any results?
- CONSTRUCT - uses variable bindings to return new RDF triples
- DESCRIBE - returns server-determined RDF about the queried resources
SELECT and ASK results can be returned as XML or JSON. CONSTRUCT and DESCRIBE results can be returned via any RDF serialization (e.g. RDF/XML or Turtle).
SPARQL Protocol Mechanics
The SPARQL Protocol is a simple method for asking and answering SPARQL queries over HTTP. A SPARQL URL is built from three parts:
- The URL of a SPARQL endpoint (e.g. https://example.org/sparql)
- (Optional, as part of the query string) The graphs to be queried against (e.g. named-graph-uri=https://example.org/reviews/ebert)
- (As part of the query string) The query itself (e.g. query=SELECT...)
https://example.org/sparql?named-graph-uri=http%3A%2F%2Fexample.org%2Freviews%2Febert&query=SELECT+%3Freview_graph+WHERE+%7B%0D%0A++GRAPH+%3Freview_graph+%7B%0D%0A+++++%3Freview+rev%3Arating+10+.%0D%0A++%7D%0D%0A%7D
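As a rough illustration of how such a URL is assembled, the sketch below uses Python's standard `urllib.parse` module to percent-encode the query and graph parameters. The helper name `build_sparql_url` is made up for this example; the endpoint and graph URLs are the placeholder values used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_url(endpoint, query, named_graphs=()):
    """Append percent-encoded SPARQL protocol parameters to an endpoint URL."""
    params = [("named-graph-uri", graph) for graph in named_graphs]
    params.append(("query", query))
    return endpoint + "?" + urlencode(params)


QUERY = (
    "SELECT ?review_graph WHERE {\n"
    "  GRAPH ?review_graph {\n"
    "    ?review rev:rating 10 .\n"
    "  }\n"
    "}"
)

url = build_sparql_url(
    "https://example.org/sparql",
    QUERY,
    named_graphs=["https://example.org/reviews/ebert"],
)

# Decoding the query string again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)
```

A client would then issue an HTTP GET on `url` and read the result in whichever format the endpoint returns.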
Example Query: Henry Levin's Blood Pressure
PREFIX dc: <https://purl.org/dc/elements/1.1/>
PREFIX edns: <https://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX foaf: <https://xmlns.com/foaf/0.1/>
PREFIX galen: <https://www.co-ode.org/ontologies/galen#>
PREFIX r: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX snomed: <https://termhost.example/SNOMED/>

SELECT ?date ?sys ?dias ?position
{
  ?p r:type galen:Patient ;
     foaf:family_name "Levin" ;
     foaf:firstName "Henry" .
  ?c edns:patient ?p ;
     edns:screeningBP ?scr .
  ?scr dc:date ?date ;
       edns:systolic [ r:value ?sys ] ;
       edns:diastolic [ r:value ?dias ] ;
       edns:posture ?position .
} ORDER BY ?date
The sample query can be run against this sample data.
Example Query 2: Henry Levin's Blood Pressure While Sitting
PREFIX dc: <https://purl.org/dc/elements/1.1/>
PREFIX edns: <https://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX foaf: <https://xmlns.com/foaf/0.1/>
PREFIX galen: <https://www.co-ode.org/ontologies/galen#>
PREFIX r: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX snomed: <https://termhost.example/SNOMED/>

SELECT ?date ?sys ?dias
{
  ?p r:type galen:Patient ;
     foaf:family_name "Levin" ;
     foaf:firstName "Henry" .
  ?c edns:patient ?p ;
     edns:screeningBP ?scr .
  ?scr dc:date ?date ;
       edns:systolic [ r:value ?sys ] ;
       edns:diastolic [ r:value ?dias ] ;
       edns:posture snomed:_163035008 . # SNOMED:sitting
} ORDER BY ?date
The sample query can be run against this sample data.
SPARQL Resources
- SPARQL Frequently Asked Questions
- SPARQL implementations - community maintained list of open-source and commercial SPARQL engines
- Public SPARQL endpoints - community maintained list
- SPARQL extensions - collection of SPARQL extensions implemented in various SPARQL engines
Introduction to GRDDL
- Using GRDDL to get RDF from XML, XHTML
- XML data
- Mapped to RDF
- XSLT
- GRDDLing HTML
- GRDDL Mechanics
Using GRDDL to get RDF from XML, XHTML
GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a way to bootstrap RDF out of XML, and in particular XHTML, data by explicitly indicating transformations from XML to RDF. GRDDL relies on:
- Source Document: an XHTML or XML document which references at least one GRDDL transformation and hence licenses a GRDDL-aware agent to extract RDF.
- GRDDL-aware agent: a software agent able to identify GRDDL transformations and run them to extract RDF.
- GRDDL Transformation: an algorithm, usually expressed in XSLT, for getting RDF from a source document.
XML data
<?xml version="1.0"?>
<ClinicalDocument transformation="hl7-rim-to-pomr.xslt">
  <recordTarget>
    <patientRole>
      <patientPatient>
        <name>
          <given>Henry</given>
          <family>Levin</family>
        </name>
        <administrativeGenderCode code="M"/>
        <birthTime value="19320924"/>
      </patientPatient>
    </patientRole>
  </recordTarget>
  <component>
    <StructuredBody>
      <Observation>
        <code displayName="Cuff blood pressure"/>
        <effectiveTime value="200004071430"/>
        <targetSiteCode displayName="Left arm"/>
        <entryRelationship typeCode="COMP">
          <Observation>
            <effectiveTime value="200004071530"/>
            <value value="132" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
      </Observation>
      <Observation>
        <code displayName="Cuff blood pressure"/>
        <effectiveTime value="200004071530"/>
        <targetSiteCode displayName="Left arm"/>
        <entryRelationship typeCode="COMP">
          <Observation>
            <code displayName="Systolic BP"/>
            <effectiveTime value="200004071530"/>
            <value value="135" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
        <entryRelationship typeCode="COMP">
          <Observation>
            <code displayName="Diastolic BP"/>
            <effectiveTime value="200004071530"/>
            <value value="88" unit="mm[Hg]"/>
          </Observation>
        </entryRelationship>
      </Observation>
    </StructuredBody>
  </component>
</ClinicalDocument>
- Patient history
- Name
- Age
- Illnesses
- Encounter details
- Reason for visit
- Vitals (e.g. BP)
- Tests
- Prescriptions
- Facility details...
Mapped to RDF
XSLT
<xsl:template match="rim:ClinicalDocument[rim:recordTarget/rim:patientRole/rim:patientPatient]">
  <cpr:patient-record>
    <xsl:apply-templates select="rim:effectiveTime"/>
    <xsl:apply-templates select="rim:recordTarget/rim:patientRole/rim:patientPatient"/>
    <xsl:for-each select="rim:author/rim:assignedAuthor/rim:assignedPerson">
      <foaf:maker>
        <foaf:Person>
          <xsl:apply-templates select="rim:name"/>
        </foaf:Person>
      </foaf:maker>
    </xsl:for-each>
    <xsl:apply-templates select="rim:component"/>
  </cpr:patient-record>
</xsl:template>

<xsl:template match="rim:name/rim:family">
  <foaf:family_name><xsl:value-of select="."/></foaf:family_name>
</xsl:template>

<xsl:template match="rim:name/rim:given">
  <foaf:firstName><xsl:value-of select="."/></foaf:firstName>
</xsl:template>

<xsl:template match="rim:patientPatient">
  <edns:about>
    <galen:Patient>
      <xsl:apply-templates select="rim:name"/>
    </galen:Patient>
  </edns:about>
</xsl:template>
- Patterns for extracting information from XML.
- XML or plain text output
- Templates matched by XPath
- Most specific rule applies.
- Explicit enumerators
- xsl:for-each, xsl:when...
XSLT courtesy of Chimezie Ogbuji
GRDDLing HTML
GRDDL can extract RDF from both XML and (X)HTML.
Patient | Systolic BP | Diastolic BP |
---|---|---|
Henry Levin | 132 | 86 |
... | ... | ... |
<html>
  <head profile="https://www.w3.org/2003/g/data-view">
    <title>Clinical Study 8B1a: Patient BP</title>
    <link rel="transformation" href="bp-html-to-pomr.xslt" />
  </head>
  ...
GRDDL Mechanics, Part 1
Content publisher provides XML or XHTML document that does one of:
- For individual documents:
- (XHTML) Uses the GRDDL HTML metadata profile and has a link element pointing to a transformation
- (XML) Has a grddl:transformation attribute on the root node
- For families of documents:
- (XHTML) Uses an HTML metadata profile that itself resolves via GRDDL to RDF that points to a profile transformation
- (XML) Has a namespace document that itself resolves via GRDDL to RDF that points to a namespace transformation
GRDDL Mechanics, Part 2
Content consumers make use of GRDDL by doing the following:
- User points a GRDDL-aware agent at the document.
- User's tools:
- GET the resource
- GET any applicable namespace or profile documents
- GET the transform
- Execute the transformation
- Parse the results as RDF
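The first thing a GRDDL-aware agent does with an XHTML document is check for the profile and collect the transformation links. The Python sketch below covers only that discovery step for the single-document XHTML case; fetching the transform, running the XSLT, and parsing the RDF output are left out because the standard library has no XSLT processor. The function name, the base URL, and the sample document (written without an XHTML namespace so that it parses as plain XML) are assumptions for illustration, and the profile URI is passed in as a parameter rather than hard-coded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Illustrative source document modelled on the blood-pressure page above.
SAMPLE = """<html>
  <head profile="https://www.w3.org/2003/g/data-view">
    <title>Clinical Study 8B1a: Patient BP</title>
    <link rel="transformation" href="bp-html-to-pomr.xslt" />
  </head>
  <body/>
</html>"""


def find_transformations(document, base_url, profile):
    """Return absolute URLs of the transformations a document links to.

    An empty list means the document does not declare the given profile
    (or has no transformation links), so the agent should not extract RDF.
    """
    root = ET.fromstring(document)
    head = root.find("head")
    if head is None or profile not in head.get("profile", "").split():
        return []
    return [
        urljoin(base_url, link.get("href"))
        for link in head.findall("link")
        if "transformation" in link.get("rel", "").split() and link.get("href")
    ]


transforms = find_transformations(
    SAMPLE,
    base_url="https://example.org/study/bp.html",  # hypothetical location
    profile="https://www.w3.org/2003/g/data-view",
)
```

Relative `href` values are resolved against the document's own URL, which is why the base URL has to be supplied alongside the markup.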
GRDDL-aware Querying
Some SPARQL engines can directly query GRDDL source documents.
PREFIX dc: <https://purl.org/dc/elements/1.1/>
PREFIX edns: <https://www.loa-cnr.it/ontologies/ExtendedDnS.owl#>
PREFIX foaf: <https://xmlns.com/foaf/0.1/>
PREFIX galen: <https://www.co-ode.org/ontologies/galen#>
PREFIX r: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?date ?sys ?dias ?location ?position
{
  ?p r:type galen:Patient ;
     foaf:family_name "Levin" ;
     foaf:firstName "Henry" .
  ?c edns:patient ?p ;
     edns:screeningBP ?scr .
  ?scr dc:date ?date ;
       edns:systolic [ r:value ?sys ] ;
       edns:diastolic [ r:value ?dias ] ;
       edns:location ?location ;
       edns:posture ?position .
}
The actual query against the actual XML source is more complex.
GRDDL Resources
- GRDDL Primer
- GRDDL Specification
- GRDDL Tutorial - from WWW2007
DBpedia
- Linked open data emphasizes the use of Semantic Web technologies to build a Web of data
- Benefit: Make the Web's content discoverable, repurposeable and queryable.
- DBpedia represents Wikipedia infoboxes as over 90,000,000 connected RDF triples.
- The Web of data can be browsed independent of any single particular pre-arranged structure.
What other musicians are based in Seattle?
Find me all movies that run longer than 5 hours.
UltraLink
- Novartis solution for cross-linking over 1,500,000 biologic and chemical terms, including synonyms, taxonomies, and pointers into data repositories
- RDF (in particular the SKOS vocabulary) models the network of domain-specific terms
- Text-mining annotators enable one-click access to enterprise-wide information on a particular term in context
UltraLink in Action
Why RDFa?
- UltraLink is only as good as its text-mining software. What if an acquisition brings with it a new Web-based corpus of pathway data that uses terms not recognized by the annotators?
- What if content publishers could indicate the semantics of their content inline in a standard fashion? Then we'd only need to write a single recognizer...
- RDFa (RDF in attributes) allows exactly this.
RDFa: Embedding RDF in HTML attributes
- HTML attributes give the semantics of a Web page's content
- Any RDFa processing software can extract the semantics of a Web page, no matter the domain
- Other software can consume the extracted RDF incrementally, without worrying about the details of the source Web pages
- RDFa goals: modularity, extensibility, loose coupling, don't repeat yourself (DRY), in-context metadata (copy & paste)
RDFa: The Gory Details
attribute | specifies | attribute | specifies |
---|---|---|---|
@about | subjects | @property | predicate relating subject to literal content |
@href | objects, clickable | @rel | predicate relating subject to resources (@href, @src) |
@src | objects, embedded | @rev | predicate relating resources to subject in reverse |
@resource | objects, not clickable | @content | object of triple (instead of element content) |
@instanceof | RDF types | @datatype | literal values' data types |
For more, see the RDFa primer or the RDFa specification.
RDFa Example: Chemicals
InChI is a textual identifier for chemical substances. Consider inchi.html:
<table>
  <tr>
    <th>Familiar name</th><th>InChI</th>
  </tr><tr>
    <td>Methane</td>
    <td about="https://example.org/methane"
        property="chem:inchi"
        xmlns:chem="https://www.blueobelisk.org/chemistryblogs/">
      InChI=1/CH4/h1H4
    </td>
    ...
This RDFa encodes the single RDF triple:
<https://example.org/methane> chem:inchi "InChI=1/CH4/h1H4" .
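To show roughly how that triple can be recovered, here is a deliberately tiny Python sketch built on the standard `html.parser` module. It handles only the one pattern used in this example (an element carrying both @about and @property, with its text content as the literal) and leaves the `chem:` prefix unexpanded; a real RDFa processor implements far more of the specification. The class name and the completed sample table are assumptions made for this example.

```python
from html.parser import HTMLParser

# The inchi.html fragment above, closed off so that it is a complete table.
SAMPLE = """<table>
 <tr><th>Familiar name</th><th>InChI</th></tr>
 <tr>
  <td>Methane</td>
  <td about="https://example.org/methane" property="chem:inchi"
      xmlns:chem="https://www.blueobelisk.org/chemistryblogs/">
   InChI=1/CH4/h1H4
  </td>
 </tr>
</table>"""


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, literal) from elements with @about and @property."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._current = None  # (subject, predicate, tag) while inside a match
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self._current is None and "about" in attributes and "property" in attributes:
            self._current = (attributes["about"], attributes["property"], tag)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes the matching element does not contain
        # another element with the same tag name.
        if self._current is not None and tag == self._current[2]:
            subject, predicate, _ = self._current
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._current = None


extractor = TinyRDFaExtractor()
extractor.feed(SAMPLE)
extractor.close()
```

After parsing, `extractor.triples` contains the methane triple, with the predicate still in its prefixed `chem:inchi` form.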
Using RDFa: In context
- The Firefox Operator extension finds structured data in Web pages and enables domain-specific actions.
- On the current Web: content publishers decide what can be done with the data (via links, script)
- On the Semantic Web: content publishers publish actionable data; content consumers decide how to act on it

See inchi.html.
Using RDFa: Query
There are various ways to query Web pages marked up with RDFa:
- Use an RDFa-aware SPARQL engine (as we did with GRDDL).
- Use a GRDDL-aware SPARQL engine, and publish RDFa data with the profile: https://ns.inria.fr/grddl/rdfa/
- Wrap the Web page(s) in an online RDFa extraction service:
# Find propane's InChI string
PREFIX chem: <https://www.blueobelisk.org/chemistryblogs/>
PREFIX ex: <https://example.org/>
SELECT ?inchi
FROM <https://www.w3.org/2007/08/pyRdfa/extract?uri=https://www.w3.org/2008/Talks/0305-C-SHALS/inchi.html>
WHERE {
ex:propane chem:inchi ?inchi .
}
UltraLink and RDFa References
- Operator Firefox extension, by Mike Kaply
- PubChem Operator action, by Egon Willighagen
- Operator user script to add the "Search in PubChem" action to Operator, updated from Egon Willighagen's original script
- Sample XHTML+RDFa marked up with RDF representing InChI data
- RDFa Talks - caution: some involve outdated RDFa syntax
Progressive modeling
Stages of modeling (frequently in this order):
- Unambiguous identifiers
- Shared identifiers
- Informative relationships (arc predicates)
- Class/type modeling
- OWL modeling
Unambiguous identifiers
- Distinguish p53 protein from skateboard bearings.
- Distinguish a name from a first name, given name, family name, lcommafName.
Shared identifiers
- Use the same identifiers (for e.g. proteins).
- Tabular mappings (my name X = your name Y).
- Algorithmic mappings (my name X = your name f(X)).
Informative relationships (arc predicates)
- Simple annotation systems just "relate" things.
- Next they specialize the relationships (e.g. subsumed by, works for, ...).
- Eventually user-suppliable relationships (by pick or edit).
Class/type modeling
- type labels classify things in your data.
- :Fido a :dog .
- subtype creates hierarchy of these types.
- :dog rdfs:subClassOf :mammal .
- ⇒ :Fido a :mammal .
- subProperty does the same for predicates.
- :Bob my:friendsWith :Sue .
- my:friendsWith rdfs:subPropertyOf foaf:knows .
- ⇒ :Bob foaf:knows :Sue .
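The two inferences above can be reproduced with a few lines of Python. This is only a sketch of the two rules involved (a type propagates up rdfs:subClassOf, and a statement propagates up rdfs:subPropertyOf), applied repeatedly until nothing new appears; it is not a complete RDFS reasoner, and the function name `rdfs_closure` is invented for this example.

```python
def rdfs_closure(triples):
    """Apply the subClassOf and subPropertyOf rules until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # x a C, C subClassOf D  =>  x a D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
                # x p y, p subPropertyOf q  =>  x q y
                if p2 == "rdfs:subPropertyOf" and s2 == p:
                    new.add((s, o2, o))
        if new <= inferred:
            return inferred
        inferred |= new


DATA = {
    (":Fido", "rdf:type", ":dog"),
    (":dog", "rdfs:subClassOf", ":mammal"),
    (":Bob", "my:friendsWith", ":Sue"),
    ("my:friendsWith", "rdfs:subPropertyOf", "foaf:knows"),
}

closure = rdfs_closure(DATA)
```

The result contains the four asserted triples plus the two derived ones, `:Fido a :mammal` and `:Bob foaf:knows :Sue`.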
Inferred type modeling
- domain/range identify the type of the subject/object of a predicate.
- :Bob foaf:givenName "Bob" .
- foaf:givenName rdfs:domain foaf:Person .
- ⇒ :Bob a foaf:Person .
- This may not behave as you expect...
- foaf:givenName rdfs:domain foaf:Person .
- :Logan airport:code "KBOS" .
- :Logan foaf:givenName "Logan International Airport" .
- ...does not "validate":
- ⇒ :Logan a foaf:Person .
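A short Python sketch makes the surprise easier to see: the domain rule never rejects data, it only adds type statements. The function name is hypothetical and the code covers rdfs:domain only (rdfs:range would be handled symmetrically on the object).

```python
def infer_types_from_domain(triples):
    """For every use of a property with a declared rdfs:domain, type the subject."""
    asserted = set(triples)
    domains = [(s, o) for s, p, o in asserted if p == "rdfs:domain"]
    inferred = {
        (s, "rdf:type", cls)
        for s, p, _ in asserted
        for prop, cls in domains
        if p == prop
    }
    return asserted | inferred


DATA = {
    ("foaf:givenName", "rdfs:domain", "foaf:Person"),
    (":Logan", "airport:code", "KBOS"),
    (":Logan", "foaf:givenName", "Logan International Airport"),
}

result = infer_types_from_domain(DATA)
```

Nothing in this process flags the airport as an error; the reasoner simply concludes that `:Logan` is a `foaf:Person`.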
OWL modeling
- transitiveProperty: if A p1 B and B p1 C, then A p1 C.
- :Bob :worksFor :Sue .
- :Sue :worksFor :Tom .
- ⇒ :Bob :worksFor :Tom .
- Derivations of types (upcoming).
- Derivations of identity (upcoming).
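The transitive-property inference amounts to computing a transitive closure over the pairs related by that property. A minimal Python sketch, assuming :worksFor has been declared transitive and using an invented function name:

```python
def transitive_closure(pairs):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are both present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


# (subject, object) pairs for the :worksFor property.
WORKS_FOR = {(":Bob", ":Sue"), (":Sue", ":Tom")}

all_works_for = transitive_closure(WORKS_FOR)
```

Here the closure adds the single pair `(":Bob", ":Tom")` to the two asserted ones.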
Introduction to OWL
- Type inference.
- :Bob a :MGHcharlesStOncologist .
- Identity inference.
- :Bob owl:sameAs :BobSmith .
- Consistency rules.
- An :MGHcharlstStPatient
- is an :MGHpatient.
- has a :physician of type (:MGHcharlesStOncologist or :MGHcharlesStOptician or ...).
- has an MGH Charles Street consent form.
OWL type inference
- onProperty
- :fluffy :hasChild :fluffyPrime .
- :fluffyPrime a :kitten.
- ⇒ :fluffy a :cat .
OWL identity inference
- inverseFunctionalProperty
- :Bob foaf:mbox <mailto:bobS@foo.example> .
- :BobSmith foaf:mbox <mailto:bobS@foo.example> .
- foaf:mbox a owl:InverseFunctionalProperty .
- ⇒ :Bob is the same as :BobSmith
- cardinality constraints
- :patientEncounter5 :patient :Bob .
- :patientEncounter5 :patient :BobSmith .
- :patient owl:maxCardinality 1 .
- ⇒ :Bob is the same as :BobSmith
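The inverse-functional case can be sketched in Python by grouping subjects that share a value for such a property: any group with more than one member must denote a single individual. The function name is hypothetical, and the sketch does not merge overlapping groups or handle the cardinality case, which a real OWL reasoner would.

```python
from collections import defaultdict


def same_as_from_ifp(triples, inverse_functional):
    """Return sets of subjects that share a value for an inverse functional property."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)
    return {frozenset(group) for group in subjects_by_value.values() if len(group) > 1}


DATA = [
    (":Bob", "foaf:mbox", "mailto:bobS@foo.example"),
    (":BobSmith", "foaf:mbox", "mailto:bobS@foo.example"),
    (":Sue", "foaf:mbox", "mailto:sue@foo.example"),  # extra, illustrative triple
]

identities = same_as_from_ifp(DATA, inverse_functional={"foaf:mbox"})
```

Only `:Bob` and `:BobSmith` share a mailbox, so they form the one identity group; `:Sue` stays distinct.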
OWL consistency rules
- An :MGHcharlstStPatient
- is an :MGHpatient.
- has one :physician of type (:MGHcharlesStOncologist or :MGHcharlesStOptician or ...).
- has an MGH Charles Street consent form.
- Limited "validation" capability.
- _:patientX :physician _:physY .
_:physY a :MGHcharlesStOncologist .
- _:patientX :physician _:physZ .
_:physZ a :MGHcharlesStOptician .
- ⇒ :MGHcharlesStOncologist is the same as :MGHcharlesStOptician
- use disjointWith to assert that the class of :MGHcharlesStOncologists has different members than the class :MGHcharlesStOpticians
- (use differentFrom to assert that :MGHcharlesStOncologist7 is not the same as :MGHcharlesStOptician3)
OWL expression
- An :MGHcharlstStPatient is an :MGHpatient who has one :physician of type (:MGHcharlesStOncologist or :MGHcharlesStOptician or ...)
- Abstract syntax
Class(a:MGHcharlstStPatient partial a:MGHpatient)
Class(a:MGHcharlstStPatient complete
  restriction(a:physician allValuesFrom(unionOf(a:MGHcharlesStOncologist a:MGHcharlesStOptician))))
- RDF syntax:
_:MCSO rdfs:subClassOf :MGHpatient .
_:MCSO rdfs:subClassOf _:physType .
_:physType owl:onProperty :physician .
_:physType owl:allValuesFrom _:list1 .
_:list1 rdf:first :MGHcharlesStOncologist .
_:list1 rdf:rest _:list2 .
_:list2 rdf:first :MGHcharlesStOptician .
_:list2 rdf:rest rdf:nil .
_:MCSO owl:equivalentClass :MGHcharlstStPatient .
OWL restrictions
- _:physType a owl:Restriction .
- _:physType owl:onProperty :physician .
- _:physType owl:allValuesFrom _:anotherRestriction .
HCLS KB flagship query
The application of a commercial text mining tool to neuroscience-related PubMed abstracts results in a set of annotations that link MeSH terms to genes (for more details on MeSH, see the table in Data Sources). For example, an article with PubMed id 10698743 mentions ncbi_gene:1812, and the corresponding PubMed record has the MeSH term mesh:D017966. The following three triples express this:
pubmedRec:10698743 | sc:has-as-minor-mesh | mesh:D017966 |
article:10698743 | sc:identified_by_pmid | pubmedRec:10698743 |
ncbi_gene:1812 | sc:describes_gene_or_gene_product_mentioned_by | article:10698743 |
HCLS KB flagship query
A set of genes or gene products in human bodies are described by ncbi_gene:1812. Here, we call this set _:equiv1812.
_:equiv1812 | owl:onProperty | dnaGeneProduct:described_by |
_:equiv1812 | owl:hasValue | ncbi_gene:1812 |
HCLS KB flagship query
bySequence:ncbi_gene.1812 is identical to the class _:equiv1812, meaning it has the same extension (members) but not the same intension (meaning). We assert this equivalence because it allows the gene class to be completely defined by the above two statements (see OWL Web Ontology Language Semantics and Abstract Syntax, Section 4, Mapping to RDF Graphs).
bySequence:ncbi_gene.1812 | owl:equivalentClass | _:equiv1812 |
HCLS KB flagship query
Using our other supplied constant, we note that adenylate cyclase activation, go:GO_0007190, is part of signal transduction, go:GO_0007166. Note: this simplified query matches only processes that are a sub-process of go:GO_0007166; the actual query, described in §9 Named Graphs, looks also for subclasses. The part_of relationships were inferred from the OWL class restrictions expressed within the shaky line. These are described in §6.1 Modeling Details. The class of functions that are realized_as adenylate cyclase activation is here labeled _:activateAdenylCyclase.
go:GO_0007190 | obo:part_of | go:GO_0007166 | . |
_:activateAdenylCyclase | owl:onProperty | ro:realized_as | . |
_:activateAdenylCyclase | owl:someValuesFrom | go:GO_0007190 | . |
HCLS KB flagship query
There are many possible classes of substance participating in molecular signaling, one of which (called here _:signalingParticipants_1) is defined by the ability to activate adenyl cyclase.
_:signalingParticipants_1 | owl:onProperty | ro:has_function | . |
_:signalingParticipants_1 | owl:someValuesFrom | _:activateAdenylCyclase | . |
HCLS KB flagship query
The class of proteins in the intersection of _:signalingParticipants_1 and bySequence:ncbi_gene.1812 is here abbreviated protein:p1812_7190_1, though the actual identifier is protein:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a. Note: the Venn diagram reveals that this set is potentially empty (like the intersection of cars and ice cream stands), theoretically permitting the query to range over pairs of gene/process that aren't related through any known protein. However, OWL-DL reasoners will not infer new classes, so the proteins in the intersection of ncbi_gene:1812 and the substances participating in molecular signalling are restricted to those which have already been entered into the knowledgebase, e.g. p1812_7190_1.
protein:p1812_7190_1 | rdfs:subClassOf | _:signalingParticipants_1 | . |
protein:p1812_7190_1 | rdfs:subClassOf | bySequence:ncbi_gene.1812 | . |
HCLS KB flagship query
ncbi_gene:1812 and go:GO_0007190 have human-readable labels.
ncbi_gene:1812 | rdfs:label | "Entrez Gene record for human DRD1, 1812" |
go:GO_0007190 | rdfs:label | "adenylate cyclase activation" |
HCLS KB flagship query
try it (or try the tiny URL for the query)
prefix go: <https://purl.org/obo/owl/GO#>
prefix rdfs: <https://www.w3.org/2000/01/rdf-schema#>
prefix owl: <https://www.w3.org/2002/07/owl#>
prefix mesh: <https://purl.org/commons/record/mesh/>
prefix sc: <https://purl.org/science/owl/sciencecommons/>
prefix ro: <https://www.obofoundry.org/ro/ro.owl#>
prefix senselab: <https://purl.org/ycmi/senselab/neuron_ontology.owl#>
prefix obo: <https://purl.org/obo/owl/obo#>

SELECT ?genename ?processname ?receptor_protein_name
WHERE
{
  # PubMeSH includes ?gene_records mentioned in ?articles which are identified by pmid in ?pubmed_records .
  GRAPH <https://purl.org/commons/hcls/pubmesh>
  {
    ?pubmed_record sc:has-as-minor-mesh mesh:D017966 .
    ?article sc:identified_by_pmid ?pubmed_record .
    ?gene_record sc:describes_gene_or_gene_product_mentioned_by ?article
  }

  # The Gene Ontology asserts that foreach ?protein, ?protein ro:has_function [ ro:realized_as ?process ].
  GRAPH <https://purl.org/commons/hcls/goa>
  {
    ?protein rdfs:subClassOf ?restriction1 .
    ?restriction1 owl:onProperty ro:has_function .
    ?restriction1 owl:someValuesFrom ?restriction2 .
    ?restriction2 owl:onProperty ro:realized_as .
    ?restriction2 owl:someValuesFrom ?process .

    # Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to ?gene_record.
    ?protein rdfs:subClassOf ?protein_superclass .
    ?protein_superclass owl:equivalentClass ?restriction3 .
    ?restriction3 owl:onProperty sc:is_protein_gene_product_of_dna_described_by .
    ?restriction3 owl:hasValue ?gene_record .

    # Each ?process (that we are interested in) is a subclass of the signal transduction process.
    # @@ nested graph constraint
    GRAPH <https://purl.org/commons/hcls/20070416/classrelations>
    {
      { ?process obo:part_of go:GO_0007166 }
      UNION
      { ?process rdfs:subClassOf go:GO_0007166 }
    }
  }

  GRAPH <https://purl.org/commons/hcls/gene> { ?gene_record rdfs:label ?genename }
  GRAPH <https://purl.org/commons/hcls/20070416> { ?process rdfs:label ?processname }
}
OWL restriction expressivity
- PARTIAL — This restriction is a necessary condition for this class.
- COMPLETE — This restriction completely defines the extent of this class.
- AND — This restriction requires both things to be true.
- OR — This restriction requires either thing to be true.
- ALL — The values of some property must all be in some set.
- SOME — Some of the values of a property must be in some set.
OWL Resources
- W3C OWL Guide and Specifications
- Manchester OWL Syntax (PDF)
- OWL 1.1 Working Group - currently working on a new version of OWL
- OWL Reasoner List - community maintained list of open-source and commercial OWL tools