NanoTS is a deep-learning variant caller for SNP detection from long-read transcriptome sequencing. It supports both Oxford Nanopore (ONT) and PacBio platforms, with pretrained models for each technology and library type.
- ONT: cDNA and direct RNA (R10.4.1)
- PacBio: HiFi MAS-seq cDNA
Note: NanoTS is currently supported only on Linux systems.
Ensure you have Conda installed, then create and activate the nanoTS environment:
git clone git@github.com:Xinglab/NanoTS.git # or git clone https://github.com/Xinglab/NanoTS.git
cd NanoTS
conda env create -f environment.yml # create the nanoTS environment (takes ~1 minute due to dependency resolution)
conda activate nanoTS
Install nanoTS:
pip install . # installs in ~5 seconds
Alternatively, run the NanoTS runner directly:
python ./source/nanoTS-runner.py [OPTIONS]
Ensure you have Singularity installed, then pull the nanoTS container to use it instantly without installation:
singularity pull library://zelinliu/nanots/nanots:latest
| Platform | Library | Chemistry | Model (unphased) | Model (phased) |
|---|---|---|---|---|
| ONT | cDNA | R10.4.1 | unphased_cDNA_nanopore_R104_HG002.pth | phased_cDNA_nanopore_R104_HG002.pth |
| ONT | direct RNA | R10.4.1 | unphased_dRNA_nanopore_R104_HG002.pth | phased_dRNA_nanopore_R104_HG002.pth |
| PacBio Revio | cDNA (MAS-seq) | HiFi (CCS), MAS-seq | unphased_MASseq_PacBio_Revio_HG002.pth | phased_MASseq_PacBio_Revio_HG002.pth |
Tip: Choose the model that matches your platform (ONT vs PacBio), library type (cDNA vs dRNA), and pipeline stage (unphased vs phased).
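
As a rough illustration of the tip above, the sketch below maps a platform, library type, and pipeline stage to one of the model filenames in the table. The helper `model_filename` and the lowercase keys are hypothetical conveniences for this example and are not part of the NanoTS package.

```python
# Hypothetical lookup built from the model table above; not a NanoTS API.
MODELS = {
    ("ont", "cdna"): "cDNA_nanopore_R104_HG002.pth",
    ("ont", "drna"): "dRNA_nanopore_R104_HG002.pth",
    ("pacbio", "masseq"): "MASseq_PacBio_Revio_HG002.pth",
}


def model_filename(platform: str, library: str, stage: str) -> str:
    """Return the pretrained model filename for a platform/library/stage."""
    if stage not in ("unphased", "phased"):
        raise ValueError(f"stage must be 'unphased' or 'phased', got {stage!r}")
    key = (platform.lower(), library.lower())
    if key not in MODELS:
        raise ValueError(f"no pretrained model listed for {key}")
    # Filenames in the table differ only by the stage prefix.
    return f"{stage}_{MODELS[key]}"


print(model_filename("ONT", "cDNA", "unphased"))
# → unphased_cDNA_nanopore_R104_HG002.pth
```

Raising on an unknown combination, rather than falling back to a default, makes a platform or library mismatch visible before any calling starts.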
NanoTS is a command-line tool with the following subcommands:
| Subcommand | Description |
|---|---|
| bam | Appends a numeric suffix to each BAM alignment QNAME to ensure uniqueness. |
| unphased_call | Extracts SNP candidates + features from unphased BAM and performs unphased DL calling. |
| haplotype | Performs haplotype phasing using a VCF, reference, and BAM file. |
| phased_call | Extracts features from H1 & H2 BAM and performs phased DL calling. |
| clean | Removes temporary files from the output directory. |
| full_pipeline | All-in-one command that includes all of the above steps. |
nanoTS <subcommand> [OPTIONS]
Append a numeric suffix to each BAM alignment QNAME based on its count per read, ensuring each alignment has a unique QNAME (e.g., read_1, read_2).
| Argument | Description |
|---|---|
| -i/--input | Input sorted BAM file. |
| -o/--output | Output BAM file with suffixed QNAMEs. |

| Argument | Default | Description |
|---|---|---|
| --region | None | Target region (chr1 or chr1:1000-2000). |
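
The suffixing idea can be sketched in plain Python over a list of read names. This is only an illustration of the per-read counting described above: the function `suffix_qnames` is hypothetical, it operates on strings rather than on a BAM file, and it assumes that every alignment (including the first) receives a suffix.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def suffix_qnames(qnames: Iterable[str]) -> Iterator[str]:
    """Yield each QNAME with a per-read running count appended."""
    seen = defaultdict(int)  # QNAME -> number of alignments seen so far
    for name in qnames:
        seen[name] += 1
        yield f"{name}_{seen[name]}"


# Two alignments of readA (e.g. primary and supplementary) and one of readB.
print(list(suffix_qnames(["readA", "readB", "readA"])))
# → ['readA_1', 'readB_1', 'readA_2']
```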
Extracts SNP candidates + features from unphased BAM and performs unphased deep learning-based SNP calling.
| Argument | Description |
|---|---|
| --bam | Sorted and indexed BAM file. |
| --ref | Reference genome FASTA file. |
| --outdir | Output directory. |
| --model | Pre-trained unphased model (.pth file). |

| Argument | Default | Description |
|---|---|---|
| --threads | 24 | Number of CPU threads. |
| --region | None | Target region for analysis (chr1 or chr1:1000-2000). |
| --ALT | 2 | Minimum number of reads supporting the ALT allele. |
| --total | 2 | Minimum total read coverage required at a variant site. |
| --ratio | 0.05 | Minimum proportion of ALT reads relative to total coverage. |
| --depth | 1000 | Maximum number of reads sampled per variant (0 to disable limit). |
🔹 Note: Increasing --ALT and --total thresholds may affect haplotype inference efficiency.
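
To make the interplay of `--ALT`, `--total`, and `--ratio` concrete, here is a minimal sketch of a candidate-site filter using the documented defaults. It assumes the three thresholds are inclusive and combined with a logical AND. The source does not spell this out, so treat `passes_candidate_filter` as a hypothetical illustration rather than the NanoTS implementation.

```python
def passes_candidate_filter(alt_reads: int, total_reads: int,
                            min_alt: int = 2, min_total: int = 2,
                            min_ratio: float = 0.05) -> bool:
    """Return True if a site meets all three thresholds (assumed inclusive)."""
    if total_reads <= 0:
        return False  # no coverage, nothing to call
    return (alt_reads >= min_alt
            and total_reads >= min_total
            and alt_reads / total_reads >= min_ratio)


print(passes_candidate_filter(2, 40))   # ALT fraction 0.05 → True
print(passes_candidate_filter(2, 100))  # ALT fraction 0.02 → False
print(passes_candidate_filter(1, 5))    # only one ALT read → False
```

Under these assumptions, raising `min_alt` or `min_total` removes low-coverage sites first, which would leave fewer candidate SNPs available to the later phasing step.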
Performs haplotype phasing using an input VCF, reference genome, and BAM file.
| Argument | Description |
|---|---|
| --vcf | Input VCF file. |
| --ref | Reference genome FASTA file. |
| --bam | Input BAM file. |
| --outdir | Output directory. |

| Argument | Default | Description |
|---|---|---|
| --hap_qual | 10 | QUAL threshold to filter SNPs during haplotype phasing. |
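
A QUAL threshold of this kind can be pictured as a simple filter over VCF records. The sketch below keeps header lines and any record whose QUAL column is at least the threshold. Whether NanoTS treats the threshold as inclusive, and how it handles a missing QUAL (`.`), are assumptions made here for illustration. `filter_by_qual` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float = 10.0) -> Iterator[str]:
    """Yield header lines plus records whose QUAL is at least min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        qual = line.rstrip("\n").split("\t")[5]  # VCF column 6 is QUAL
        if qual != "." and float(qual) >= min_qual:
            yield line


# Made-up records for demonstration only.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t35.2\tPASS\t.",
    "chr1\t200\t.\tC\tT\t4.1\tPASS\t.",
]
kept = list(filter_by_qual(records))
print(len(kept))  # two header lines plus the QUAL 35.2 record → 3
```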
Extracts features from H1 & H2 BAM and performs phased deep learning-based SNP calling.
| Argument | Description |
|---|---|
| --bam | Sorted and indexed BAM file. |
| --ref | Reference genome FASTA file. |
| --outdir | Output directory. |
| --model | Pre-trained phased model (.pth file). |

| Argument | Default | Description |
|---|---|---|
| --threads | 24 | Number of CPU threads. |
| --depth | 1000 | Maximum reads per variant (0 to disable limit). |
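
The `--depth` cap can be thought of as downsampling the reads that cover a site. The sketch below assumes uniform random sampling without replacement and treats `0` as "no limit", matching the table. The actual sampling strategy in NanoTS is not described in this document, so `cap_depth` is a hypothetical stand-in.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def cap_depth(reads: Sequence[T], depth: int = 1000,
              seed: Optional[int] = None) -> List[T]:
    """Return at most `depth` reads; depth=0 disables the cap."""
    if depth == 0 or len(reads) <= depth:
        return list(reads)
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    return rng.sample(list(reads), depth)


site_reads = list(range(5000))  # stand-in for reads overlapping one variant
print(len(cap_depth(site_reads, depth=1000, seed=0)))  # → 1000
print(len(cap_depth(site_reads, depth=0)))             # → 5000
```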
Removes temporary files in the output folder.
| Argument | Description |
|---|---|
| --outdir | Output directory to clean. |
Runs the entire NanoTS pipeline from a BAM file through final phased VCF output, then cleans up intermediate files.
| Argument | Description |
|---|---|
| --bam | Input sorted & indexed BAM file. |
| --ref | Reference genome FASTA file. |
| --model_unphased | Path to unphased model (.pth file). |
| --model_phased | Path to phased model (.pth file). |
| --outdir | Output directory for all steps. |

| Argument | Default | Description |
|---|---|---|
| --threads | 24 | Number of CPU threads. |
| --region | None | Target region for analysis (chr1 or chr1:1000-2000). |
| --ALT | 2 | Minimum number of reads supporting the ALT allele. |
| --total | 2 | Minimum total read coverage required at a variant site. |
| --ratio | 0.05 | Minimum proportion of ALT reads relative to total coverage. |
| --depth | 1000 | Maximum number of reads sampled per variant (0 to disable limit). |
| --hap_qual | 10 | QUAL threshold to filter SNPs during haplotype phasing. |
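
When `full_pipeline` is launched from a script, it can help to assemble the command as an argument list. The sketch below uses only the flags listed in the tables above. The function `full_pipeline_cmd` is a hypothetical wrapper, and the file paths are the ones from the tutorial.

```python
import shlex
from typing import List

# Optional flags listed in the full_pipeline argument table above.
OPTIONAL_FLAGS = ("region", "ALT", "total", "ratio", "depth", "hap_qual")


def full_pipeline_cmd(bam: str, ref: str, model_unphased: str,
                      model_phased: str, outdir: str,
                      threads: int = 24, **optional) -> List[str]:
    """Build the argv list for `nanoTS full_pipeline`."""
    unknown = set(optional) - set(OPTIONAL_FLAGS)
    if unknown:
        raise ValueError(f"unsupported option(s): {sorted(unknown)}")
    cmd = ["nanoTS", "full_pipeline",
           "--bam", bam, "--ref", ref,
           "--model_unphased", model_unphased,
           "--model_phased", model_phased,
           "--outdir", outdir, "--threads", str(threads)]
    for name in OPTIONAL_FLAGS:  # fixed order keeps the command reproducible
        if name in optional:
            cmd += [f"--{name}", str(optional[name])]
    return cmd


cmd = full_pipeline_cmd(
    "tutorial.bam", "hg38.fa",
    "../model/unphased_cDNA_nanopore_R104_HG002.pth",
    "../model/phased_cDNA_nanopore_R104_HG002.pth",
    "nanoTS_result/", ALT=2, total=2, ratio=0.05, depth=1000, hap_qual=10,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with paths that contain spaces.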
Here’s an end-to-end example using NanoTS. The run takes about 5 minutes and requires at least 23 GB of memory.
cd example/
# Download and prepare the reference genome
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gzip -d hg38.fa.gz
samtools faidx hg38.fa
# Align Nanopore reads to the reference genome
minimap2 -ax splice -ub -t 24 -k 14 -w 4 --secondary=no hg38.fa tutorial.fastq.gz | \
samtools sort -o tutorial.bam # Note: requires at least 23 GB of memory
samtools index tutorial.bam
# Define output directory
outdir=nanoTS_result/
##################################
#### All-in-one step ####
nanoTS full_pipeline \
--bam tutorial.bam \
--ref hg38.fa \
--model_unphased ../model/unphased_cDNA_nanopore_R104_HG002.pth \
--model_phased ../model/phased_cDNA_nanopore_R104_HG002.pth \
--outdir $outdir \
--threads 24 \
--ALT 2 \
--total 2 \
--ratio 0.05 \
--depth 1000 \
--hap_qual 10
##################################
#### Separate steps ####
# Step 1: Suffix each BAM alignment QNAME by count per read
nanoTS bam \
-i tutorial.bam \
-o tutorial.qname.bam
# Step 2: Unphased SNP calling
nanoTS unphased_call \
--bam tutorial.qname.bam \
--ref hg38.fa \
--threads 24 \
--ALT 2 \
--total 2 \
--ratio 0.05 \
--depth 1000 \
--outdir $outdir \
--model ../model/unphased_cDNA_nanopore_R104_HG002.pth
# Step 3: Haplotype phasing
nanoTS haplotype \
--vcf ${outdir}unphased_predict.pass.vcf \
--ref hg38.fa \
--bam tutorial.qname.bam \
--outdir $outdir \
--QUAL 10
# Step 4: Phased SNP calling
nanoTS phased_call \
--bam tutorial.qname.bam \
--ref hg38.fa \
--threads 24 \
--depth 1000 \
--outdir $outdir \
--model ../model/phased_cDNA_nanopore_R104_HG002.pth
# Step 5: Clean temporary files
nanoTS clean \
--outdir $outdir
Run NanoTS with Singularity (always bind host folders you read/write)
cd example/
SIF=../../../nanots_latest.sif # use your SIF path
EXAMPLE_DIR="$(pwd)"
MODEL_DIR="$(cd ../model && pwd)" # the model folder in this repository. Adjust if your model files live elsewhere
OUTDIR="$EXAMPLE_DIR/nanoTS_result"
mkdir -p "$OUTDIR"
# Bind BOTH the data dir and the model dir so the container can see them.
singularity exec -B "$EXAMPLE_DIR","$MODEL_DIR" "$SIF" \
/opt/conda/envs/nanoTS/bin/nanoTS full_pipeline \
--bam "$EXAMPLE_DIR/tutorial.bam" \
--ref "$EXAMPLE_DIR/hg38.fa" \
--model_unphased "$MODEL_DIR/unphased_cDNA_nanopore_R104_HG002.pth" \
--model_phased "$MODEL_DIR/phased_cDNA_nanopore_R104_HG002.pth" \
--outdir "$OUTDIR" \
--threads 24 \
--ALT 2 --total 2 --ratio 0.05 --depth 1000 --hap_qual 10
- unphased_predict.pass.vcf → SNP calls generated from the unphased_call step.
- phased_predict.pass.vcf → SNP calls generated from the phased_call step after haplotype phasing.
All results are stored in the specified --outdir directory.
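
As a quick sanity check on either output file, one might count the called SNPs per chromosome. The sketch below relies only on the standard VCF layout (tab-separated records, `#`-prefixed headers). `snps_per_chrom` is a hypothetical helper and the demo records are made up.

```python
from collections import Counter
from typing import Iterable


def snps_per_chrom(vcf_lines: Iterable[str]) -> Counter:
    """Count VCF records per chromosome, ignoring header lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        counts[line.split("\t", 1)[0]] += 1  # column 1 is CHROM
    return counts


# Made-up records for demonstration only.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t35.2\tPASS\t.",
    "chr1\t250\t.\tC\tT\t28.0\tPASS\t.",
    "chr2\t900\t.\tG\tA\t41.7\tPASS\t.",
]
print(snps_per_chrom(demo))

# With a real result file, pass an open file handle instead, for example:
# with open("nanoTS_result/phased_predict.pass.vcf") as fh:
#     print(snps_per_chrom(fh))
```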
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.
For questions, bug reports, or feature requests, contact:
📧 Zelin Liu – liuz6@chop.edu