Welcome to the SecureChain project! This repository contains the tools and scripts to build a comprehensive knowledge graph for tracking dependencies, vulnerabilities, and other critical information across the software supply chain.
SecureChain is a project that builds a cross-ecosystem knowledge graph (KG) of software, hardware, and known vulnerabilities (CVE/CWE), linking versions, dependency edges, vendors, and advisories across sources such as ConanCenter, Debian, GitHub, deps.dev, NVD (CVE), CPE, Wikipedia/DBpedia lookups, and curated vendor info.
- Ontology: Secure Chain Ontology (sc: → https://w3id.org/secure-chain/, extends schema.org)
- 💾 I just want the data / to query it: see kg/README.md → Links to the Google Drive data dump, public SPARQL endpoint, and example queries.
- 🛠️ I want to build or extend the KG: see integration/README.md → End-to-end pipeline: structured data collectors, optional NER/LLM relation extraction, and KG construction scripts.
- 🎨 I want a visual query helper: see visualization/README.md → Blockly-based SPARQL blocks to explore the graph visually.
- 📜 I want schema details: see the ontology docs → Full class/property hierarchy, with links to schema.org and other reused vocabularies.
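For programmatic access, a SPARQL endpoint can usually be queried over plain HTTP using the SPARQL 1.1 Protocol. The sketch below uses only the Python standard library and is not an official client. ENDPOINT is a placeholder (the real URL is listed in kg/README.md), and the query's use of schema:SoftwareApplication is an assumption, so check the ontology docs for the classes the graph actually uses.

```python
import json
import urllib.parse
import urllib.request

# Placeholder, not the real endpoint: use the URL published in kg/README.md.
ENDPOINT = "https://example.org/sparql"

# sc: is the namespace stated in this README; the schema.org IRI form is assumed.
SPARQL_PREFIXES = (
    "PREFIX sc: <https://w3id.org/secure-chain/>\n"
    "PREFIX schema: <https://schema.org/>\n"
)

# Hypothetical query: assumes software nodes are typed as schema:SoftwareApplication.
QUERY = SPARQL_PREFIXES + "SELECT ?s WHERE { ?s a schema:SoftwareApplication } LIMIT 10"


def build_sparql_get(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Send a SELECT query and return its result bindings, one dict per row."""
    request = build_sparql_get(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


# Usage (needs network access and the real endpoint URL):
# for row in run_select(ENDPOINT, QUERY):
#     print(row["s"]["value"])
```

Passing the query as a GET parameter keeps requests simple and cacheable; very long queries may need to be sent with POST instead, which the protocol also allows.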
Software has become an integral part of critical infrastructure throughout the United States. Underlying modern software systems is a supply chain of open-source software components, such as Apache Spark, whose functionality is reused and integrated into the many systems that underpin modern society.
While software supply chains empower the rapid development of software systems, they also increase risk: bugs, vulnerabilities, and unauthorized changes in upstream components can propagate to downstream systems and cause severe consequences. This is evident from the many software crises of recent years, such as the Heartbleed bug, the Equifax data breach, and the npm left-pad incident that almost broke the Internet.
Develop a unified knowledge graph to continually collect and track software dependencies and vulnerabilities discussed in various online documents. 🔮
In this project, our team aims to develop a unified knowledge graph that captures rich, up-to-date information about software components in heterogeneous software ecosystems. The resulting knowledge graph will empower us to further develop a novel multi-modal query interface for knowledge dissemination, as well as new risk mitigation approaches that perform deep scans on software systems, detect potential risks, and automatically repair them.
The figure below demonstrates an example knowledge graph for software supply chain security, where each entity—such as a software library or a vulnerability—is represented as a node, and the relations between them are depicted as edges.
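The node/edge model can also be illustrated in a few lines of Python. This is a minimal, hypothetical sketch, assuming relations are stored as (subject, predicate, object) triples; the component names and the dependsOn/affects predicates are illustrative only and are not terms from the Secure Chain Ontology. It walks dependency edges in reverse, breadth-first, to find every component a vulnerability reaches, which is the downstream propagation risk described above.

```python
from collections import deque

# Hypothetical toy graph as (subject, predicate, object) triples.
# Node names and the "dependsOn"/"affects" predicates are illustrative only;
# they are not the Secure Chain Ontology's actual terms.
TRIPLES = [
    ("web-app", "dependsOn", "http-lib"),
    ("http-lib", "dependsOn", "tls-lib"),
    ("cli-tool", "dependsOn", "tls-lib"),
    ("report-gen", "dependsOn", "pdf-lib"),
    ("vuln-1", "affects", "tls-lib"),
]


def affected_components(vuln: str, triples: list[tuple[str, str, str]]) -> set[str]:
    """Components hit by `vuln`, directly or through any chain of dependsOn edges."""
    # Reverse index: upstream component -> components that depend on it.
    dependents: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == "dependsOn":
            dependents.setdefault(o, []).append(s)
    # Seed the search with the components the vulnerability affects directly.
    seeds = [o for s, p, o in triples if s == vuln and p == "affects"]
    seen = set(seeds)
    queue = deque(seeds)
    # Breadth-first walk downstream; `seen` also guards against dependency cycles.
    while queue:
        node = queue.popleft()
        for d in dependents.get(node, []):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen


print(sorted(affected_components("vuln-1", TRIPLES)))
# ['cli-tool', 'http-lib', 'tls-lib', 'web-app']
```

Note that report-gen is not reported: it sits on a separate branch of the graph, so the vulnerability in tls-lib never reaches it.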
The knowledge graph canonically uses the namespace https://w3id.org/secure-chain/ and extends schema.org with a small set of classes & properties for supply-chain security.
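Because terms live under https://w3id.org/secure-chain/, queries and tools normally abbreviate them with the sc: prefix. The following is a minimal sketch of expanding and compacting such prefixed names, assuming the prefix map shown; sc:Example is a hypothetical term, and whether the published data uses http:// or https:// for schema.org is an assumption to verify against the dump.

```python
# Prefix map: "sc" is the namespace given in this README; the schema.org IRI
# form (http vs https) is an assumption and should match the published data.
PREFIXES = {
    "sc": "https://w3id.org/secure-chain/",
    "schema": "https://schema.org/",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'sc:Example' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Shorten a full IRI to prefix:local form when a known namespace matches."""
    for prefix, ns in prefixes.items():
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri  # No known namespace: leave the IRI unchanged.


print(expand("sc:Example"))  # hypothetical term
print(compact("https://schema.org/SoftwareApplication"))
```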
This project is organized into several key directories. For detailed information on each component, please refer to the README.md file within the respective directory.
└── SecureChain/
├── integration/
├── kg/
└── visualization/
- integration/: Contains the complete data integration pipeline for extracting, processing, and constructing the knowledge graph.
- kg/: Provides access to the knowledge graph data dumps, a live SPARQL endpoint, query examples, and detailed ontology information.
- visualization/: Includes a web-based tool for visualizing SPARQL queries against the knowledge graph, making it easier to explore and understand the data.
Contributions are welcome! Typical areas:
- New data bridges (ecosystems, registries, SBOMs)
- Schema refinements (properties/classes)
- Data quality checks & deduplication
- Query examples & dashboards
Please open an issue or PR with a clear description and steps to reproduce your changes.
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.

