Carview!

Introduction

BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though also performs well for small genomes.

The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. This algorithm was first implemented in my fermi assembler and then refined a few times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a blocked bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). The use of bloom filter is how BFC is named, though other correctors such as Lighter and Bless actually rely more on bloom filter than BFC.

Usage

BFC can be invoked as:

bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz

where option -s specifies the approximate size of the genome. It is possible to use one set of reads to correct another set:

bfc -s 3g -t16 readset1.fq.gz readset2.fq.gz | gzip -1 > corrected_readset2.fq.gz

and to process data from Unix pipes ("<(command)" is bash specific):

bash -c "bfc -s 3g -t16 <(bzip2 -dc reads.fq.bz2) <(bzip2 -dc reads.fq.bz2) | gzip -1 > out.fq.gz"

BFC also offers an option to trim reads containing singleton k-mers (don't switch -s and -k as some options are ordered):

bfc -1 -s 3g -k51 -t16 corrected.fq.gz | gzip -1 > trimmed.fq.gz

This command line keeps k-mer occurring twice or more in a bloom filter (with some false positives) and identifies the longest stretch in a read that has hits in the bloom filter. K-mer trimming is about four times as fast as error correction.

BFC-KMC

An alternative implementation of the algorithm is available at the kmc branch of this repository. It uses KMC2 for k-mer counting and keeps high-occurrence k-mers in a bloom filter. BFC-KMC should be invoked as:

kmc -k55 reads.fq.gz prefix tmpdir
bfc-kmc -t16 prefix reads.fq.gz | gzip -1 > corrected.fq.gz

KMC2 source code and precompiled binaries are available at the KMC website.

Name		Name	Last commit message	Last commit date
Latest commit History 166 Commits
tex		tex
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
Makefile		Makefile
README.md		README.md
bbf.c		bbf.c
bbf.h		bbf.h
bfc.c		bfc.c
bfc.h		bfc.h
bseq.c		bseq.c
bseq.h		bseq.h
correct.c		correct.c
count.c		count.c
errstat.js		errstat.js
hash2cnt.c		hash2cnt.c
htab.c		htab.c
htab.h		htab.h
khash.h		khash.h
kmer.h		kmer.h
kseq.h		kseq.h
ksort.h		ksort.h
kthread.c		kthread.c
kvec.h		kvec.h
utils.c		utils.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Introduction

Usage

BFC-KMC

About

Uh oh!

Releases

Packages

Languages

License

lh3/bfc

Folders and files

Latest commit

History

Repository files navigation

Introduction

Usage

BFC-KMC

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages