WIP zig wrapper for htslib parsing of VCFs for genetic variants
I wrote this while learning zig, so it probably has many non-ziggy usages.
Most of the htslib VCF/Variant stuff is supported.
Usage
⚠️ hts-zig tries to allocate as little as possible; use
.dup() on a variant when it's necessary to keep it in memory.
const vcf = @import("src/vcf.zig");
const VCF = vcf.VCF;
const allocator = std.testing.allocator;

var vcf = VCF.open("tests/test.snpeff.bcf").?; // can return null
// close the file and clean up htslib memory at end of scope
defer vcf.deinit();

var variant = vcf.next().?; // .? syntax gets an optional result.
try stdout.print("\nvariant:{any}\n", .{variant}); // Variant(chr1:30859-30860 (G/C))

// # get the AD field
// # needs to allocate, this interface will likely change.
// # extract the FORMAT/sample AD field (allelic depth)
// # to get INFO, use get(vcf.Field.info, ...);
var fld = "AD";
var ads = std.ArrayList(i32).init(allocator);
defer ads.deinit();
try variant.get(vcf.Field.format, ads, fld);
// 4 samples * 2
try stdout.print("\nAD:{any}\n", .{ads.items}); // { 7, 0, 2, 0, 6, 0, 4, 0 }

// # genotypes:
var gts_mem = std.ArrayList(i32).init(allocator); // can re-use this for each variant
defer gts_mem.deinit();
var gts = try variant.genotypes(&gts_mem);
try stdout.print("\ngts:{any}\n", .{gts});
// # gts:[0/0/, 0/0/, 0/1/, 0/0/] (note trailing / is accurate as it's how it's stored in htslib)

// # region queries
var iter = try ivcf.query(chrom, 69269, 69270);
while (iter.next()) |v| {
    try std.testing.expect(v.start() == 69269);
}
testing and dev
This will require zig and htslib installed.
zig build test
Zig requires very little additional work to wrap a C library, but it is
currently not able to access struct fields when the struct contains bitfields.
For this reason, we need to write functions to access fields in many htslib
structs. Those functions are here
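As a rough illustration of the accessor-function approach described above, here is a minimal C sketch. The struct `record_t` and the `record_*` functions are hypothetical stand-ins made up for this example (they are not htslib's types or this repo's actual helpers); the point is only that a plain function can return a bitfield value so the caller never has to know the struct layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for a C struct that mixes normal fields and
 * bitfields, loosely modelled on how htslib packs counts into a record.
 * This is NOT the real htslib definition. */
typedef struct {
    int64_t pos;
    uint32_t n_info : 16, n_allele : 16; /* two 16-bit bitfields */
    uint32_t n_fmt : 8, n_sample : 24;   /* an 8-bit and a 24-bit bitfield */
} record_t;

/* Each accessor reads one field on the C side and returns it by value,
 * so a caller in another language only needs the function signature
 * and a pointer to the struct. */
uint32_t record_n_allele(const record_t *r) { return r->n_allele; }

uint32_t record_n_sample(const record_t *r) { return r->n_sample; }

/* Non-bitfield members need the same treatment once the whole struct
 * is treated as opaque by the caller. */
int64_t record_pos(const record_t *r) { return r->pos; }
```

With helpers like these compiled alongside the library, the Zig side can hold only a pointer to the struct and call the functions instead of touching fields directly.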
TODO
Add nice genotypes access methods/structs
Add vcf.query() (currently only iteration from start of file is supported, not querying by genomic location).
updating header.
setting INFO fields.
writing. currently everything is read-only.
fewer allocations (pass ArrayList to functions).
support querying vcf as well as bcf
set_samples().
fix ergonomics and think about error and null return types.
Why?
It's quite useful to learn a new language by writing something that
I'll use every day. zig looks interesting and sane.