GitHub - realbigws/DeepBound: Predict the boundary of transcript start and end from RNA-seq reads alignment
#=================
DeepBound (v0.1)
date: 2017.02.10
#=================
Author: Sheng Wang, Jianzhu Ma, Mingfu Shao
Contact email: realbigws@gmail.com, majianzhu@gmail.com, shaomingfu@gmail.com
#=================== Abstract =========================
Predict the boundary of transcript (i.e., start and end) from RNA-seq reads alignment,
via AUC-Maximized Deep Convolutional Neural Fields (DeepCNF) model.
[Reference]:
DeepBound: Accurate Identification of Transcript Boundaries via Deep Convolutional Neural Fields.
Mingfu Shao, Jianzhu Ma, and Sheng Wang
(submitted to ISMB 2017)
#=================== Install ==========================
1. download the package
git clone --recursive https://github.com/realbigws/DeepBound
cd DeepBound/
--------------
2. compile
cd source_code/
./oneline_make.sh
cd ../
--------------
3. update the package
git pull
git submodule foreach --recursive git pull origin master
#================
Overall Package
#================
#================= Usage and Example =====================#
USAGE: ./oneline_command.sh <-i input_BAM_file> [-o output_boundary]
[-f chromosomes_dir] [-g annotation_file] [-m min_expression]
[-O premature_bound] [-p prob_threshold] [-w window_size]
Options:
***** required arguments *****
-i input_BAM_file : specifies the input reads alignment file in BAM format.
It has to be sorted (for example, by samtools).
-o output_boundary : specifies the predicted boundaries in FASTA format.
0 for end-boundary, 1 for non-boundary, and 2 for start-boundary.
***** optional arguments *****
--| Step 1. from BAM to generate ReadCount by bin/gsamples
-f chromosomes_dir : is optional but strongly recommended,
which specifies directory of chromosome sequences in FASTA format.
-g annotation_file : is optional, which specifies the ground-truth expression abundance in a GTF file.
-m min_expression : is used together with the '-g' option; it makes gsamples
ignore transcripts in the GTF file whose expression abundance is below this value.
--| Step 2. predict premature boundary by DeepBound.sh
-O premature_bound : is the directory containing the predicted premature boundaries,
produced by DeepBound.sh, which is based on the AUC-maximized DeepCNF model.
--| Step 3. process premature boundary by bin/pinpoint
-p prob_threshold : is optional (default 0.6), which specifies the minimum average probability
over a window that will be considered a true boundary.
-w window_size : is optional (default 10), which gives the window size for average calculation.
#--------------------------
oneline command example
./oneline_command.sh -i example/test.bam
The predicted boundaries are written to 'test.boundary' in FASTA format.
#==============
Detail Steps
#==============
#======== Section I: from BAM, generate ReadCount by bin/gsamples ==============#
1.1 Usage of bin/gsamples
The usage of gsamples is:
bin/gsamples <in-bam-file> <out-sample-file> [-f fasta-dir] [-g gtf-file] [-e min-expression]
Options:
The parameter <in-bam-file> specifies the input reads alignment file. It has to be sorted (for example, by samtools).
The parameter <out-sample-file> specifies the output sample ReadCount file. See below for more details.
The parameter [fasta-dir] is optional but strongly recommended; it specifies the directory holding the sequence files of all chromosomes. These sequence files are used as features.
The parameter [gtf-file] is optional; it specifies the ground-truth expressed transcripts. If this file is given, the true boundaries are included in the output sample file as true labels; otherwise, the true labels are -1. In evaluation mode (i.e., to evaluate the performance of this method), this file should be given.
The parameter [min-expression] is used together with [-g gtf-file]; it makes gsamples ignore transcripts in the gtf-file whose expression abundance is below this value.
#+++++++++++++++++++++++++++++++++++
1.2 oneline command example
bin/gsamples example/test.bam test.ReadCount
#======== Section II: predict premature boundary by DeepBound.sh ==================#
2.1 Usage of DeepBound.sh
Usage: ./DeepBound.sh <ReadCount_input> <output_folder>
[Note]: <ReadCount_input> should contain concatenated ReadCount files.
<output_folder> will contain the predicted premature boundaries.
#+++++++++++++++++++++++++++++++++++
2.2 oneline command example
./DeepBound.sh example/test.ReadCount test_out/
#======== Section III: process premature boundary prediction by bin/pinpoint ===========#
3.1 Usage of bin/pinpoint
The usage of pinpoint is:
bin/pinpoint: <sample-file> <prediction-file> <output-file> [-p probability-threshold] [-w window-size]
Options:
The parameter <sample-file> specifies the sample ReadCount file used for prediction (see gsamples program).
The parameter <prediction-file> specifies the probability file generated by DeepBound.
The parameter <output-file> specifies the predicted boundaries: 0 for end-boundary, 1 for non-boundary, and 2 for start-boundary.
The parameter [probability-threshold] is optional (default value is 0.25), which specifies the minimum average probability over a window that will be considered a true boundary.
The parameter [window-size] is optional (default value is 10), which gives the window size.
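The exact procedure inside bin/pinpoint is not spelled out here. As a rough illustration of how the two options interact, the Python sketch below slides a window of window_size positions over a list of per-position probabilities and reports the windows whose mean reaches the threshold. The function name windows_above_threshold and the toy input are hypothetical, and pinpoint itself may select or merge positions differently.

```python
def windows_above_threshold(probs, threshold=0.25, window_size=10):
    """Return start indices of windows whose mean probability >= threshold.

    Illustrative only: this is an assumption about what "minimum average
    probability over a window" means, not the pinpoint source code.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    hits = []
    # Every full window of window_size consecutive positions is examined.
    for start in range(len(probs) - window_size + 1):
        window = probs[start:start + window_size]
        if sum(window) / window_size >= threshold:
            hits.append(start)
    return hits


# Toy input: a block of high probabilities flanked by zeros.
toy = [0.0] * 5 + [0.9] * 10 + [0.0] * 5
print(windows_above_threshold(toy, threshold=0.6, window_size=10))
```

A larger window smooths out isolated spikes, while a higher threshold demands a longer run of confident positions before a window qualifies.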
#+++++++++++++++++++++++++++++++++++
3.2 oneline command example
bin/pinpoint example/test.ReadCount example/test.premature test.boundary
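Sections I to III can also be chained by hand instead of through oneline_command.sh. The Python sketch below only assembles the three command lines from the usage strings above; it does not run them. The function name build_pipeline and its parameter names are hypothetical, and the path of the premature prediction file is passed in explicitly rather than guessed.

```python
import shlex


def build_pipeline(bam, readcount, out_dir, premature, boundary,
                   fasta_dir=None, gtf=None, min_expression=None,
                   prob_threshold=None, window_size=None):
    """Return the commands of Sections I-III as argument lists (not executed)."""
    # Section I: bin/gsamples <in-bam-file> <out-sample-file> [-f] [-g] [-e]
    gsamples = ["bin/gsamples", bam, readcount]
    if fasta_dir is not None:
        gsamples += ["-f", fasta_dir]
    if gtf is not None:
        gsamples += ["-g", gtf]
        if min_expression is not None:  # only meaningful together with -g
            gsamples += ["-e", str(min_expression)]
    # Section II: ./DeepBound.sh <ReadCount_input> <output_folder>
    deepbound = ["./DeepBound.sh", readcount, out_dir]
    # Section III: bin/pinpoint <sample-file> <prediction-file> <output-file> [-p] [-w]
    pinpoint = ["bin/pinpoint", readcount, premature, boundary]
    if prob_threshold is not None:
        pinpoint += ["-p", str(prob_threshold)]
    if window_size is not None:
        pinpoint += ["-w", str(window_size)]
    return [gsamples, deepbound, pinpoint]


for cmd in build_pipeline("example/test.bam", "test.ReadCount", "test_out/",
                          "example/test.premature", "test.boundary"):
    print(shlex.join(cmd))
```

Each returned list can be handed to subprocess.run(cmd, check=True) from the package root once the binaries have been compiled.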
#======================
ReadCount file format
#======================
#=================== format of ReadCount file ======================#
Each sequence consists of 23 rows.
1. The initial row, starting with '#', is the header giving a comment or description of this sequence.
2. The first row below the '#' header records the 'Start Mid End' label.
For real-world data, any number can be put here (e.g., -1);
for training, the label MUST be set to 0, 1, 2 for END, MID, and START, respectively.
3. The second row below the '#' header records the positive abundance value.
For real-world data, any number can be put here (e.g., 0);
for training, a real positive value MUST be put here to indicate the abundance.
4. Each data entry has 23 lines in total, including the '#' header row, the boundary label row, and the abundance value row.
The following 14 rows are the feature rows, the first of which gives the read coverage.
The next 4 rows are the A,T,C,G sequence in 0/1 one-hot matrix format.
The final 2 rows give the binary features TRANSCRIPT_START and TRANSCRIPT_END.
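The 23-row layout can be checked with a few lines of Python. The sketch below groups the lines of a ReadCount file into entries and names the row groups listed above. It keeps every row as raw text, because the separator between values within a row is not stated here. The function name split_readcount and the dictionary keys are hypothetical, and skipping blank lines is an assumption.

```python
ROWS_PER_ENTRY = 23  # '#' header + label + abundance + 14 + 4 + 2


def split_readcount(lines):
    """Group ReadCount lines into 23-row entries, keeping rows as raw text."""
    # Assumption: blank lines carry no data and can be ignored.
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % ROWS_PER_ENTRY != 0:
        raise ValueError("row count is not a multiple of 23")
    entries = []
    for i in range(0, len(rows), ROWS_PER_ENTRY):
        block = rows[i:i + ROWS_PER_ENTRY]
        if not block[0].startswith("#"):
            raise ValueError(f"entry at row {i} does not start with '#'")
        entries.append({
            "header": block[0],
            "label": block[1],                 # 'Start Mid End' label row
            "abundance": block[2],             # abundance value row
            "features": block[3:17],           # 14 rows, first is read coverage
            "sequence": block[17:21],          # 4 one-hot rows (A, T, C, G)
            "transcript_flags": block[21:23],  # final 2 binary feature rows
        })
    return entries


# Toy entry with placeholder row contents.
toy = ["# sample_0", "-1", "0"] + [f"row{k}" for k in range(20)]
entry = split_readcount(toy)[0]
print(len(entry["features"]), len(entry["sequence"]), len(entry["transcript_flags"]))
```

On a real file, the same function can be called as split_readcount(open(path)), since it accepts any iterable of lines.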
#=======================
Premature output files
#=======================
#=================== output files and folders ====================#
#-> input data list
1. sample_test_sample_list
the file list shows the number of sequences in <ReadCount_input>, with the same order as in the original file.
Each of the following folders stores every sequence as a separate file named 'sample_x', where x is the index of the sequence, starting from 0.
#-> output prediction
2. bou_pred_out/
the predicted boundary
#-> concatenated prediction
3. <input_name>.premature
concatenated premature prediction in one file, where <input_name> should be the same as the <ReadCount_input> file.