Implementation of HDT
W3C Member Submission 30 March 2011
- This version:
- https://www.w3.org/submissions/2011/SUBM-HDT-Implementation-20110330/
- Latest version:
- https://www.w3.org/submissions/HDT-Implementation/
- Editor:
- Javier D. Fernández
- Authors:
- Javier D. Fernández
- Miguel A. Martínez-Prieto
- Claudio Gutierrez
- Axel Polleres
- Mario Arias
- Alejandro Andrés
- Guillermo Rodríguez-Cano
Copyright © 2011 DERI Galway at the National University of Ireland, Galway, Ireland, Free University of Bozen-Bolzano, The Open University, Universidad Politécnica de Madrid, Alcatel-Lucent, Cisco, OpenLink Software and Profium Ltd. All rights reserved.
This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
Abstract
This document contains a brief description of the implementation of a tool to create and interact with RDF HDT (Header-Dictionary-Triples).
Status of this Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is a part of the HDT Submission which comprises five documents:
- Binary RDF Representation for Publication and Exchange (HDT)
- Extending VoID for publishing HDT
- RDF Schema for HDT Header Descriptions
- Relationship of HDT to relevant other technologies
- Implementation of HDT
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
Introduction
Figure 1 shows a conceptual description of the process of obtaining an HDT representation from an RDF graph. The first step extracts the basic RDF features needed to build the Dictionary and the underlying graph, as well as the information to be included in the Header. The second and third steps build the Dictionary and encode the Triples, respectively. Finally (fourth step), the abstract notion of HDT is materialized as a practical, usable HDT, ready for modular and clean publication (and management) and compact exchange.
HDT-It! 0.7 is a C++ tool that performs this process. It is free, open-source software that uses the Raptor library to provide a set of parsers and serializers between HDT and the main RDF syntaxes. It also provides a basic querying interface. The project is hosted at https://code.google.com/p/hdt-it.
HDT Creation
HDT creation is the process of converting an existing RDF document (in a given syntax) into HDT. HDT-It! first uses the Raptor library to parse the input document (RDF/XML, N3, Turtle or JSON).
HDT creation is driven by a configuration file, supplied at execution time, that sets the main parameters (documented on the project site). Converting the original RDF document is a multi-phase process.
Dictionary Building
The Dictionary component is an abstract class that is instantiated with a concrete dictionary implementation. HDT-It! 0.7 provides the concrete class DictionaryPlain, which is the default dictionary implementation.
HDT-It! 0.7 uses hash and vector structures to maintain the mapping between strings and IDs. A final sorting and re-mapping operation makes the IDs follow the alphabetical order of the strings.
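The hash-plus-vector approach with a final sort and re-map can be illustrated with a minimal sketch. The class `PlainDictionarySketch` and its methods are hypothetical names, not HDT-It!'s `DictionaryPlain` API, and the sketch assumes a single ID space for all terms (starting at 1), which does not model how an HDT dictionary may organize subject, predicate and object terms.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical dictionary sketch (not HDT-It!'s DictionaryPlain): a hash map for
// string -> ID lookups while parsing, and a vector for ID -> string lookups.
// IDs start at 1; one ID space is shared by all terms, for simplicity.
class PlainDictionarySketch {
public:
    // Returns the provisional ID of a term, assigning the next free one if unseen.
    uint32_t insert(const std::string& term) {
        auto it = ids_.find(term);
        if (it != ids_.end()) return it->second;
        terms_.push_back(term);
        uint32_t id = static_cast<uint32_t>(terms_.size());
        ids_.emplace(term, id);
        return id;
    }

    // Final step: sort the terms alphabetically and re-map the IDs so that ID order
    // matches string order. Returns a table oldId -> newId (index 0 is unused).
    std::vector<uint32_t> sortAndRemap() {
        std::vector<uint32_t> order(terms_.size());
        for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<uint32_t>(i);
        std::sort(order.begin(), order.end(),
                  [this](uint32_t a, uint32_t b) { return terms_[a] < terms_[b]; });
        std::vector<uint32_t> oldToNew(terms_.size() + 1, 0);
        std::vector<std::string> sorted;
        sorted.reserve(terms_.size());
        for (size_t pos = 0; pos < order.size(); ++pos) {
            oldToNew[order[pos] + 1] = static_cast<uint32_t>(pos + 1);
            sorted.push_back(terms_[order[pos]]);
        }
        terms_ = std::move(sorted);
        // Refresh the hash map so lookups return the final (sorted) IDs.
        for (size_t i = 0; i < terms_.size(); ++i) ids_[terms_[i]] = static_cast<uint32_t>(i + 1);
        return oldToNew;
    }

    // Returns 0 if the term is unknown.
    uint32_t idOf(const std::string& term) const {
        auto it = ids_.find(term);
        return it == ids_.end() ? 0 : it->second;
    }
    const std::string& termOf(uint32_t id) const { return terms_.at(id - 1); }

private:
    std::unordered_map<std::string, uint32_t> ids_;
    std::vector<std::string> terms_;
};
```

The `oldToNew` table lets any data built with provisional IDs be rewritten after sorting; since the hash map is refreshed, terms can also simply be looked up again with `idOf`.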
Triples Encoding
The Triples component is an abstract class that is instantiated with a concrete triples implementation. HDT-It! 0.7 provides the Plain Triples, Compact Triples and Bitmap Triples implementations; the configuration file specifies which one to use.
Once the dictionary is built, HDT-It! 0.7 reads the original RDF document a second time, replacing each term with its ID. It stores the resulting ID triples in an auxiliary vector structure and sorts it in Adjacency List order (the default order, or the one specified in the configuration file). This structure is used by all three implementations:
- The Plain Triples implementation uses the auxiliary structure directly.
- The Compact Triples implementation builds the streams dynamically while traversing the auxiliary structure.
- The Bitmap Triples implementation builds the streams in the same way, traversing the auxiliary structure. At the same time, it fills a bit array that is finally passed to the bitmap construction class.
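The sorting step and the construction of the Compact and Bitmap streams can be sketched as follows. This is an illustration under stated assumptions, not HDT-It!'s code: the names `Triple`, `sortSPO`, `buildCompact`, `buildBitmap`, `sp`, `so`, `bp` and `bo` are hypothetical; we assume S-P-O order, subject IDs numbered 1..n without gaps (so subjects are implicit), that each Compact adjacency list is terminated by a 0 (IDs start at 1), and that in Bitmap Triples the separators are dropped and a 1 bit marks the last element of each list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// An ID triple, as produced after replacing RDF terms by dictionary IDs.
struct Triple {
    uint32_t s, p, o;
    bool operator<(const Triple& t) const { return std::tie(s, p, o) < std::tie(t.s, t.p, t.o); }
    bool operator==(const Triple& t) const { return s == t.s && p == t.p && o == t.o; }
};

// Assumed layouts: Compact uses 0 as list terminator; Bitmap uses 1 = last in list.
struct CompactTriples { std::vector<uint32_t> sp, so; };
struct BitmapTriples { std::vector<uint32_t> sp, so; std::vector<bool> bp, bo; };

// Sorts the auxiliary vector in S-P-O Adjacency List order and drops duplicates.
void sortSPO(std::vector<Triple>& t) {
    std::sort(t.begin(), t.end());
    t.erase(std::unique(t.begin(), t.end()), t.end());
}

// Requires t to be SPO-sorted, with subjects numbered 1..n without gaps.
CompactTriples buildCompact(const std::vector<Triple>& t) {
    CompactTriples c;
    for (size_t i = 0; i < t.size(); ++i) {
        bool newPredicate = (i == 0 || t[i].s != t[i - 1].s || t[i].p != t[i - 1].p);
        bool lastOfPair = (i + 1 == t.size() || t[i + 1].s != t[i].s || t[i + 1].p != t[i].p);
        bool lastOfSubject = (i + 1 == t.size() || t[i + 1].s != t[i].s);
        if (newPredicate) c.sp.push_back(t[i].p);
        c.so.push_back(t[i].o);
        if (lastOfPair) c.so.push_back(0);     // closes the object list of (s, p)
        if (lastOfSubject) c.sp.push_back(0);  // closes the predicate list of s
    }
    return c;
}

// Same traversal, but list ends are recorded in bit arrays instead of 0 separators.
BitmapTriples buildBitmap(const std::vector<Triple>& t) {
    BitmapTriples b;
    for (size_t i = 0; i < t.size(); ++i) {
        bool newPredicate = (i == 0 || t[i].s != t[i - 1].s || t[i].p != t[i - 1].p);
        bool lastOfPair = (i + 1 == t.size() || t[i + 1].s != t[i].s || t[i + 1].p != t[i].p);
        bool lastOfSubject = (i + 1 == t.size() || t[i + 1].s != t[i].s);
        if (newPredicate) { b.sp.push_back(t[i].p); b.bp.push_back(false); }
        b.so.push_back(t[i].o);
        b.bo.push_back(lastOfPair);
        if (lastOfSubject) b.bp.back() = true;  // current predicate ends the list of s
    }
    return b;
}
```

For the triples (1,1,2), (1,1,3), (1,2,4) and (2,1,3), this sketch yields the Compact streams `sp = 1 2 0 1 0` and `so = 2 3 0 4 0 3 0`, and the Bitmap arrays `sp = 1 2 1`, `bp = 0 1 1`, `so = 2 3 4 3`, `bo = 0 1 1 1`.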
HDT output
If output files are given in the configuration file, HDT-It! 0.7 writes the Header, Dictionary and Triples files.
- The Header is filled with the format metadata given in the configuration file. It also includes basic statistical metadata (or advanced statistical metadata, if enabled in the configuration file).
- The Dictionary is filled with the dictionary structure built by the chosen dictionary implementation. The separator between dictionary entries can be set in the configuration file.
- The Triples file is written according to the chosen triples implementation. The number of bits per element is taken from the configuration file (the default value is 32).
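Storing IDs with a configurable number of bits per element amounts to fixed-width bit packing. The following is a minimal sketch under our own assumptions: `packIds` and `unpackIds` are hypothetical helpers, and the least-significant-bit-first layout is chosen for illustration only; it is not necessarily the byte layout HDT-It! 0.7 writes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Packs each ID into exactly `bits` bits (1..32), least significant bit first.
std::vector<uint8_t> packIds(const std::vector<uint32_t>& ids, unsigned bits = 32) {
    if (bits == 0 || bits > 32) throw std::invalid_argument("bits must be in 1..32");
    std::vector<uint8_t> out((ids.size() * bits + 7) / 8, 0);
    uint64_t bitPos = 0;
    for (uint32_t id : ids) {
        if (bits < 32 && (id >> bits) != 0) throw std::out_of_range("ID does not fit in `bits`");
        for (unsigned b = 0; b < bits; ++b, ++bitPos)
            if ((id >> b) & 1u) out[bitPos / 8] |= static_cast<uint8_t>(1u << (bitPos % 8));
    }
    return out;
}

// Inverse of packIds: reads `count` IDs of `bits` bits each.
std::vector<uint32_t> unpackIds(const std::vector<uint8_t>& buf, size_t count, unsigned bits = 32) {
    std::vector<uint32_t> ids(count, 0);
    uint64_t bitPos = 0;
    for (size_t i = 0; i < count; ++i)
        for (unsigned b = 0; b < bits; ++b, ++bitPos)
            if ((buf[bitPos / 8] >> (bitPos % 8)) & 1u) ids[i] |= (1u << b);
    return ids;
}
```

With 3 bits per element, four IDs occupy 12 bits (2 bytes) instead of the 16 bytes needed at the default 32 bits, at the cost of rejecting IDs larger than 7.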
HDT-It! 0.7 does not perform the compression phase for either the Dictionary or the Header. Users who need compression should run an appropriate tool (e.g. gzip or a Huffman coder) over the generated output and update the dc:format property in the Header accordingly.
HDT in use
HDT-It! 0.7 can load an HDT from a given HDT Header. Once loaded, it supports several features:
- Print user-friendly Header metadata.
- Generate another RDF syntax from HDT (RDF/XML, N3, Turtle, JSON).
- Querying HDT.
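Generating another RDF syntax from HDT amounts to walking the Triples component and mapping IDs back to strings. A minimal sketch, assuming the same hypothetical Compact layout as in the creation sketches (0-terminated lists, implicit subjects 1..n, a single ID space) and assuming each dictionary entry is already a valid N-Triples term such as `<http://ex.org/a>`. The output is N-Triples, which is a subset of Turtle; `decodeCompact` and `toNTriples` are illustrative names, not HDT-It! functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct IdTriple { uint32_t s, p, o; };

// Walks 0-terminated Compact streams; subject IDs are implicit (1, 2, ...).
std::vector<IdTriple> decodeCompact(const std::vector<uint32_t>& sp, const std::vector<uint32_t>& so) {
    std::vector<IdTriple> out;
    uint32_t subject = 1;
    size_t j = 0;  // cursor in the object stream
    for (uint32_t p : sp) {
        if (p == 0) { ++subject; continue; }  // end of this subject's predicate list
        for (; j < so.size() && so[j] != 0; ++j) out.push_back({subject, p, so[j]});
        ++j;  // skip the 0 that closes this (s, p) object list
    }
    return out;
}

// Serializes ID triples as N-Triples lines; terms[id - 1] holds the term for `id`.
std::string toNTriples(const std::vector<IdTriple>& triples, const std::vector<std::string>& terms) {
    std::string out;
    for (const IdTriple& t : triples)
        out += terms.at(t.s - 1) + " " + terms.at(t.p - 1) + " " + terms.at(t.o - 1) + " .\n";
    return out;
}
```

Other syntaxes (RDF/XML, N3, JSON) would only change the serialization step; the traversal of the Triples component stays the same.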
Likewise, HDT-It! 0.7 does not perform the decompression phase for either the Dictionary or the Header. Users must first run the appropriate tool (e.g. gzip or a Huffman decoder) over the compressed input.
Querying HDT
Querying is only available for Bitmap Triples, because it relies on the rank and select operations supported by the bitmap index, which are used by the Check&Find operation.
HDT-It! 0.7 accepts queries either from the console or from a given file (documented on the project site). The supported operations are:
- SPARQL ASK queries for the patterns (s,p,o), (s,?p,?o) and (s,p,?o).
- SPARQL CONSTRUCT queries with simple WHERE patterns (s,p,o), (s,?p,?o) and (s,p,?o). The result is an RDF HDT graph.
These patterns assume the S-P-O Adjacency List order. The supported patterns differ for the alternative orders: the S-O-P, P-S-O, P-O-S, O-P-S and O-S-P Adjacency Lists.
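How select on the bitmaps locates adjacency lists for the supported S-P-O patterns can be sketched as follows. This is not HDT-It!'s Check&Find code: it assumes the hypothetical Bitmap layout used above (a 1 bit marks the last element of each list, subjects numbered 1..n without gaps), and `select1` is a linear scan for clarity, whereas succinct bitmaps answer it much faster. These top-down patterns only need select; rank would be needed to navigate upward (e.g. from an object position back to its predicate), which this sketch does not cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed Bitmap Triples layout in S-P-O order (bit i = 1 iff element i ends its list).
struct BitmapTriples { std::vector<uint32_t> sp, so; std::vector<bool> bp, bo; };

// Position of the r-th 1 bit (r >= 1), or b.size() if there is none.
size_t select1(const std::vector<bool>& b, size_t r) {
    for (size_t i = 0; i < b.size(); ++i)
        if (b[i] && --r == 0) return i;
    return b.size();
}

// Half-open range [begin, end) of list k (k >= 0); empty if the list does not exist.
std::pair<size_t, size_t> listRange(const std::vector<bool>& b, size_t k) {
    size_t last = select1(b, k + 1);
    if (last == b.size()) return {b.size(), b.size()};
    size_t begin = (k == 0) ? 0 : select1(b, k) + 1;
    return {begin, last + 1};
}

// (s, ?p, ?o): the object list at position j of Bo belongs to predicate position j of Sp.
std::vector<std::pair<uint32_t, uint32_t>> matchS(const BitmapTriples& t, uint32_t s) {
    std::vector<std::pair<uint32_t, uint32_t>> out;
    if (s == 0) return out;
    std::pair<size_t, size_t> pr = listRange(t.bp, s - 1);
    for (size_t j = pr.first; j < pr.second; ++j) {
        std::pair<size_t, size_t> objRange = listRange(t.bo, j);
        for (size_t k = objRange.first; k < objRange.second; ++k) out.emplace_back(t.sp[j], t.so[k]);
    }
    return out;
}

// (s, p, ?o): predicate lists are sorted, so p is found by binary search.
std::vector<uint32_t> matchSP(const BitmapTriples& t, uint32_t s, uint32_t p) {
    std::vector<uint32_t> out;
    if (s == 0) return out;
    std::pair<size_t, size_t> pr = listRange(t.bp, s - 1);
    auto first = t.sp.begin() + pr.first, last = t.sp.begin() + pr.second;
    auto it = std::lower_bound(first, last, p);
    if (it == last || *it != p) return out;
    std::pair<size_t, size_t> objRange = listRange(t.bo, static_cast<size_t>(it - t.sp.begin()));
    out.assign(t.so.begin() + objRange.first, t.so.begin() + objRange.second);
    return out;
}

// ASK (s, p, o): binary search for o in the sorted object list of (s, p).
bool ask(const BitmapTriples& t, uint32_t s, uint32_t p, uint32_t o) {
    std::vector<uint32_t> objs = matchSP(t, s, p);
    return std::binary_search(objs.begin(), objs.end(), o);
}
```

For example, with `sp = 1 2 1`, `bp = 0 1 1`, `so = 2 3 4 3` and `bo = 0 1 1 1`, `ask(t, 1, 1, 3)` holds while `ask(t, 1, 2, 3)` does not.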
References
- [Notation3 (N3)] T. Berners-Lee. Notation 3. Available at https://www.w3.org/DesignIssues/Notation3.
- [RDF/JSON] K. Alexander. RDF/JSON: A Specification for serialising RDF in JSON. In SFSW 2008.
- [RDF/XML] D. Beckett. RDF/XML Syntax Specification (Revised). W3C Recommendation 10 February 2004. Available at https://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
- [Turtle] D. Beckett, T. Berners-Lee. Turtle - Terse RDF Triple Language. W3C Team Submission 14 January 2008. Available at https://www.w3.org/TeamSubmission/2008/SUBM-turtle-20080114/
Acknowledgements (Informative)
HDT work is partially funded by MICINN (TIN2009-14009-C02-02), Millennium Institute for Cell Dynamics and Biotechnology (ICDB) (Grant ICM P05-001-F), and Fondecyt 1090565 and 1110287. Javier D. Fernández is funded by a grant from the Regional Government of Castilla y Leon (Spain) and the European Social Fund.