BFC is a standalone high-performance tool for correcting sequencing errors from
Illumina sequencing data. It is specifically designed for high-coverage
whole-genome human data, though it also performs well for small genomes.
The BFC algorithm is a variant of the classical spectrum alignment algorithm
introduced by Pevzner et al. (2001). It uses an exhaustive search to
find a k-mer path through a read that minimizes a heuristic objective function
jointly considering penalties on correction, quality and k-mer support. This
algorithm was first implemented in my fermi assembler and then refined a few
times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a
blocked Bloom filter to filter out most singleton k-mers and keeps the rest in a
hash table (Melsted and Pritchard, 2011). BFC is named after this use of a Bloom
filter, though other correctors such as Lighter and Bless actually rely more
heavily on Bloom filters than BFC does.
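To make the counting idea concrete, here is a minimal Python sketch of
Bloom-filter-gated k-mer counting. It is an illustration only, not BFC's code:
it uses a plain (non-blocked) Bloom filter, does not canonicalize k-mers against
their reverse complements, and the names `BloomFilter` and
`count_non_singletons` as well as the default sizes are hypothetical. A k-mer
enters the hash table only once the filter reports it as already seen, so most
singletons never reach the table; a false positive can let an occasional
singleton through with an inflated count.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Plain (non-blocked) Bloom filter over byte strings; illustrative only."""

    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions from one digest via double hashing.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False
                self.bits[byte] |= mask
        return seen


def count_non_singletons(reads, k):
    """Count k-mers seen at least twice; singletons stay in the filter only."""
    bloom = BloomFilter()
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k].encode()
            if bloom.add(kmer):
                # Second or later sighting: the first one lives only in the
                # filter, so start the table entry at 2.
                counts[kmer] = counts.get(kmer, 1) + 1
    return counts
```

For example, `count_non_singletons(["ACGTACGT", "ACGTAAAA"], 4)` keeps only the
k-mers `ACGT` and `CGTA` in the table, barring Bloom filter false positives.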
This command line keeps k-mers occurring twice or more in a Bloom filter (with
some false positives) and identifies the longest stretch in a read that has
hits in the Bloom filter. K-mer trimming is about four times as fast as error
correction.
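The trimming step described above can be sketched in a few lines of Python.
This is a rough illustration under stated assumptions rather than BFC's
implementation: a plain `set` stands in for the Bloom filter of k-mers seen
twice or more, ties are broken in favor of the leftmost stretch, and the
function name `longest_solid_stretch` is hypothetical.

```python
def longest_solid_stretch(read, k, solid):
    """Return (start, end) of the longest substring of `read` whose k-mers
    are all members of `solid`; (0, 0) if no k-mer is a hit."""
    best = (0, 0)
    run_start = None
    n_kmers = len(read) - k + 1
    # Iterate one position past the last k-mer so a trailing run is closed.
    for i in range(n_kmers + 1):
        hit = i < n_kmers and read[i:i + k] in solid
        if hit:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # K-mers run_start .. i-1 cover bases run_start .. i-1+k (exclusive).
            end = i - 1 + k
            if end - run_start > best[1] - best[0]:
                best = (run_start, end)
            run_start = None
    return best
```

A read would then be trimmed with `start, end = longest_solid_stretch(read, k, solid)`
followed by `read[start:end]`.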
BFC-KMC
An alternative implementation of the algorithm is available on the kmc
branch of this repository. It uses KMC2 for k-mer counting
and keeps high-occurrence k-mers in a Bloom filter. BFC-KMC should be invoked as: