Adam Novak edited this page Apr 13, 2023 · 2 revisions
Graph references often contain linear references within them. You may want copies of those linear references, for example to call variants with a linear-reference-based caller such as Google's DeepVariant.
If you don't already have a FASTA file for an assembly that is included in a graph, you can use vg to extract the assembly FASTA directly from the graph, like this:
vg paths -x <the graph> --extract-fasta --paths-by GRCh38
Here, the argument to -x is the graph file, in rGFA, GFA, .vg, .gbz, or any other graph file format that vg can read (see File Formats). The argument to --paths-by is the name prefix shared by the set of paths you want to extract; a sample or assembly name usually works. To list all paths available in the graph, run vg paths --list -x <the graph>.
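The selection that --paths-by performs can be pictured with a small Python sketch. This is not vg's implementation: it assumes selection is a plain string-prefix match on path names, and the path names in it are hypothetical examples of what vg paths --list might print.

```python
def select_paths_by_prefix(path_names, prefix):
    """Return the path names that start with prefix, in input order.

    Illustrative only: assumes --paths-by behaves like a plain
    string-prefix filter over the names that `vg paths --list` prints.
    """
    return [name for name in path_names if name.startswith(prefix)]


# Hypothetical path names, in PanSN style (sample#haplotype#contig).
paths = [
    "GRCh38#0#chr1",
    "GRCh38#0#chr2",
    "CHM13#0#chr1",
    "HG00438#1#JAHBCB010000001.1",
]

print(select_paths_by_prefix(paths, "GRCh38"))
# ['GRCh38#0#chr1', 'GRCh38#0#chr2']
```

Under this prefix-match assumption, a prefix such as HG002 would also select paths for a sample whose name merely starts with HG002; ending the prefix with the # delimiter (HG002#) avoids such accidental matches.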
This will produce a FASTA file on standard output:
>GRCh38#0#chr1
GGGGTACA
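If a downstream script needs the extracted records as name-to-sequence pairs, a minimal FASTA reader like the following Python sketch can load output such as the example above. It is a generic helper, not part of vg, and it assumes each record starts with a > header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence.

    The record name is the header text after '>' up to the first
    whitespace; wrapped sequence lines of one record are concatenated.
    """
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # close the last record
    return records


example = ">GRCh38#0#chr1\nGGGGTACA\n"
print(parse_fasta(example))
# {'GRCh38#0#chr1': 'GGGGTACA'}
```

For a saved file (the file name is hypothetical), the same function applies to the file contents, e.g. parse_fasta(open("GRCh38.fa").read()); for whole-genome FASTA files a streaming reader would use less memory.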
In most cases, the sequence names in the FASTA will be in PanSN format (see Path Metadata Model). These match the names used by vg surject, so a FASTA extracted this way is easy to use together with a BAM file produced by vg surject.
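A PanSN name joins a sample, a haplotype, and a contig name with the # delimiter, as in GRCh38#0#chr1. The Python sketch below splits such a name into its fields; it is a hypothetical helper that assumes the three-field sample#haplotype#contig form and returns None for anything else, so other naming forms vg may use are not handled.

```python
from typing import NamedTuple, Optional


class PanSNName(NamedTuple):
    sample: str
    haplotype: str
    contig: str


def parse_pansn(name: str, delim: str = "#") -> Optional[PanSNName]:
    """Split a sample#haplotype#contig name into its three fields.

    Returns None when the name lacks three non-empty fields.
    maxsplit=2 keeps any further delimiters inside the contig field.
    """
    parts = name.split(delim, 2)
    if len(parts) != 3 or not all(parts):
        return None
    return PanSNName(*parts)


print(parse_pansn("GRCh38#0#chr1"))
# PanSNName(sample='GRCh38', haplotype='0', contig='chr1')
print(parse_pansn("chr1"))
# None
```

Splitting with a bounded maxsplit keeps the sample and haplotype fields unambiguous while leaving the contig name intact, which is convenient when comparing FASTA record names with the reference names in a surjected BAM header.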
To save it to a file, you can redirect the output with >.
To extract haplotype paths from a .gbwt file, pass the .gbwt file to vg paths with the -g option, and pass the corresponding .gg file, or any matching graph, with -x.