Node connectivity measurements for Hetionet v1.0 metapaths
Creators
- 1. University of Pennsylvania
- 2. North Carolina State University
- 3. Pfizer
Description
Hetionet v1.0 is a hetnet (heterogeneous network) with 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. This record contains computed connectivity measurements for Hetionet v1.0 for all metapaths (types of paths) up to length 3. These measurements are designed to assess the extent of connectivity between two nodes along a given metapath. Several types of data are included:
- Path counts: Path counts measure the number of paths from a source node to a target node along a specified metapath. The path count is a special case of the degree-weighted path count (DWPC) metric where the damping exponent parameter is set to 0.0. Path counts for all source–target node combinations of a given metapath are stored in a matrix with source nodes as rows and target nodes as columns.
- Degree-weighted path counts: DWPCs measure the abundance of paths from a source node to a target node along a given metapath (like path counts), but adjust for node degrees along the path: paths through higher-degree nodes are downweighted according to a damping exponent. The DWPCs here use a damping exponent of 0.5 and the same matrix serialization as the path count datasets. The values are not scaled or transformed. To compare them to the null DWPCs discussed below, divide each value by the mean DWPC for the entire matrix and apply an inverse hyperbolic sine transformation.
- Degree-grouped permutation summaries: Degree-grouped permutations (DGP) are used to compute the significance of DWPC values. Specifically, they are used to estimate a null distribution for DWPCs from the unpermuted hetnet. DGP summaries provide summary statistics of DWPCs computed on permuted hetnets. The permuted hetnets are derived from Hetionet v1.0 using the XSwap algorithm. This approach preserves node degree but randomizes edges to muddle their meaning. DWPCs were computed for 200 permuted networks and grouped by source–target node degree within each metapath. Permuted DWPCs were scaled by dividing by the unpermuted DWPC mean and then transformed with the inverse hyperbolic sine. Every degree pair for a given metapath has corresponding statistics that summarize its values across permuted hetnets. These statistics include the number of observed DWPCs, the number of nonzero DWPCs, the sum of the DWPCs, and the sum of squared DWPCs. These values are sufficient to calculate the parameters of a gamma-hurdle null DWPC distribution.
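The three measurement types above can be illustrated on a toy network. The sketch below is not the hetmech implementation: it assumes a length-2 metapath whose three node types are all distinct (so no path can revisit a node and plain matrix multiplication suffices), and it assumes a method-of-moments fit for the gamma part of the gamma-hurdle distribution. All function names are hypothetical.

```python
import numpy as np


def degree_weight(adj, damping):
    """Downweight each edge by its source and target degrees raised to -damping."""
    adj = np.asarray(adj, dtype=float)
    row_deg = adj.sum(axis=1, keepdims=True)
    col_deg = adj.sum(axis=0, keepdims=True)
    # A zero degree only occurs in an all-zero row or column,
    # so replacing it with 1 avoids division by zero without changing results.
    row_deg[row_deg == 0] = 1.0
    col_deg[col_deg == 0] = 1.0
    return adj * row_deg ** -damping * col_deg ** -damping


def dwpc_two_step(adj_ab, adj_bc, damping=0.5):
    """DWPC matrix for a metapath A-B-C with three distinct node types.

    Assumption: metapaths that repeat a node type need extra handling to
    exclude paths that revisit a node; that case is not covered here.
    With damping=0.0 the result is the plain path count.
    """
    return degree_weight(adj_ab, damping) @ degree_weight(adj_bc, damping)


def scale_and_transform(dwpc, unpermuted_mean=None):
    """Divide by the mean DWPC, then apply the inverse hyperbolic sine."""
    dwpc = np.asarray(dwpc, dtype=float)
    mean = dwpc.mean() if unpermuted_mean is None else unpermuted_mean
    return np.arcsinh(dwpc / mean)


def gamma_hurdle_params(n_observed, n_nonzero, total, total_squared):
    """Estimate (P(nonzero), gamma shape, gamma rate) from the four DGP statistics.

    Assumption: the gamma is fit to the nonzero values by the method of
    moments using the sample variance; other estimators are possible.
    """
    p_nonzero = n_nonzero / n_observed
    if n_nonzero < 2:
        return p_nonzero, float("nan"), float("nan")
    mean = total / n_nonzero
    variance = (total_squared - total ** 2 / n_nonzero) / (n_nonzero - 1)
    if variance <= 0:
        return p_nonzero, float("nan"), float("nan")
    return p_nonzero, mean ** 2 / variance, mean / variance


# Toy adjacency matrices: 2 A nodes, 2 B nodes, 2 C nodes.
adj_ab = [[1, 1], [0, 1]]
adj_bc = [[1, 0], [1, 1]]
path_counts = dwpc_two_step(adj_ab, adj_bc, damping=0.0)
dwpcs = dwpc_two_step(adj_ab, adj_bc, damping=0.5)
transformed = scale_and_transform(dwpcs)
```

Because the four summary statistics are sums, they can be accumulated across permuted hetnets one at a time, which is why they suffice for estimating the null distribution without storing every permuted DWPC.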
Data format: The .zip files are HetMat archive files, meaning that the directory structure and file formats of the archived files conform to the HetMat data structure for storing hetnets on disk. Matrices are stored as scipy.sparse .npz files. .npz is a NumPy array serialization format that SciPy uses to write sparse matrices to disk.
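As a minimal sketch of reading one matrix, the example below loads a scipy.sparse .npz member straight from a zip archive without extracting it. The member path inside a real HetMat archive is not specified here, so the example builds its own small demo archive; `load_matrix_from_zip`, `build_demo_archive`, and the member name `demo/matrix.sparse.npz` are hypothetical.

```python
import io
import zipfile

import numpy as np
import scipy.sparse


def load_matrix_from_zip(zip_path, member):
    """Load one scipy.sparse .npz matrix from a zip archive member."""
    with zipfile.ZipFile(zip_path) as archive:
        # Read the member into memory so that numpy can seek within it.
        buffer = io.BytesIO(archive.read(member))
    return scipy.sparse.load_npz(buffer)


def build_demo_archive(zip_path, member="demo/matrix.sparse.npz"):
    """Write a tiny archive for demonstration (hypothetical layout)."""
    matrix = scipy.sparse.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.0]]))
    buffer = io.BytesIO()
    scipy.sparse.save_npz(buffer, matrix)
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr(member, buffer.getvalue())
    return member
```

For the multi-gigabyte archives, reading a single member this way avoids unpacking the whole archive, although each matrix is still held fully in memory once loaded.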
TSV files in this upload report information on the contents of the archives. The .zip-info.tsv files list all files included in the zip archives. metapath-dwpc-stats.tsv contains summary information on the unpermuted path counts and DWPCs. Note that results are archived by path length, such that all metapaths of length 1 are in a different archive than metapaths of length 2. Users who only need results for shorter metapaths therefore do not need to download the large archives for longer metapaths. There are 24 metapaths of length 1, 242 metapaths of length 2, and 1,939 metapaths of length 3.
Connectivity Search Database: connectivity-search-pg_dump.sql.gz is a PostgreSQL database dump for use with the connectivity-search-backend repository.
Source code: These datasets were computed by the bulk.ipynb notebook from greenelab/hetmech@34e95b9.
Funding: This work was supported through a research collaboration with Pfizer Worldwide Research and Development. This work is funded in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grants GBMF4552 and GBMF4560.
More information: See the manuscript titled Hetnet connectivity search provides rapid insights into how two biomedical entities are related.
Files
degree-grouped-perms_length-1_damping-0.5.zip
Total size: 190.9 GB
| MD5 checksum | Size |
|---|---|
| 4ce3ad2d2db62e429b04f09398486b14 | 5.8 GB |
| 4a20f2507b60ebf30d518b1c5bf75654 | 329.3 kB |
| 81043d9c041c7a98364f398139a01edf | 16.1 MB |
| c657ac341cbf5cc1d76b060d505ab58d | 136.2 MB |
| 921bba3db7ca5ecd98150652da7e62dc | 733.1 MB |
| 208627dce14ddf54cb0a7cf5c15a0b28 | 459.4 kB |
| 03dd773cde90b560ed64dd1b7d4879f7 | 3.4 MB |
| 567657f54ff2b21e99f9aa58871a1330 | 11.7 MB |
| a2f95c9f6c8df9ded63d1e2946e40e6e | 3.2 GB |
| bb59070e45789f0937503932ea44046c | 11.5 GB |
| 194d5b65540b9e55cc7e8290ba68d159 | 37.7 GB |
| 81fbc3926d3d4457c15433c1059db9d0 | 131.7 GB |
| 98e5dd74d2efbec20f63966979e20a40 | 211.3 kB |
| f2d18c4dfd82bcb9af7af8b97690aae5 | 116.9 kB |
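Given the size of some archives, it can be worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest to a listed value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, "md5:…")` returns `True` only when the local copy is byte-for-byte identical to the file that produced the listed digest.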