# create folders for compressed raw data, and decompressed raw data 
mkdir -p ~/nanopore_analysis/{1_raw_data,2_fastq_decompressed}
# create folders for QC
mkdir -p ~/nanopore_analysis/{3_fastqc,4_fastp,5_minionqc}
mkdir -p ~/nanopore_analysis/{6_nanofilt,7_nanoplot}
# create folders for Assembly
mkdir -p ~/nanopore_analysis/{9_metaflye,10_raven,11_canu}
# create folders for assembly verification tools
mkdir -p ~/nanopore_analysis/{12_metaquast,13_bandage}
# show the final folder structure
tree ~/nanopore_analysis
IMPORTANT: The Dorado tool is needed to convert pod5 to fastq, but it is unavailable on the server, so the provided fastq files were used instead. After creating the project folder, we created symbolic links from the source “fastq_pass” folder, which contains the barcode folders, to the destination folder ~/nanopore_analysis/1_raw_data:
# create links to raw data (all the barcode directories with fastq.gz files stored in fastq_pass)
ln -s /group/lectures/DTAN25data/BIOINF26_Gruppe2/no_sample_id/20250219_2052_MN35031_FBA50370_f12dc3bb/fastq_pass/* ~/nanopore_analysis/1_raw_data
# remove non-relevant links (barcodes 01-08, barcodes 17-24 and also unclassified);
# rm on a symlink deletes only the link, not the original data
cd ~/nanopore_analysis/1_raw_data
ls | grep -vE '^barcode(09|1[0-6])$' | xargs rm
# now we can have access to the barcode folders 
ls -l
# and also access the fastq.gz files inside, like in this summary:
for folder in barcode09 barcode1{0..6}; do
  count=$(ls "$folder"/*.gz 2>/dev/null | wc -l) # Count the .gz files in the folder
  echo "Folder: $folder ($count gz files)"
done
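Because the filter above feeds directly into rm, it can help to preview its effect first. The following is a minimal sketch on a throwaway directory; the names demo_dir, keep_pattern and to_remove are hypothetical and do not appear in the original workflow.

```shell
# Hypothetical demo directory; nothing here touches the real 1_raw_data folder.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/source" "$demo_dir/links"
for name in barcode08 barcode09 barcode10 barcode16 barcode17 unclassified; do
  mkdir "$demo_dir/source/$name"
done
ln -s "$demo_dir"/source/* "$demo_dir/links"

# keep_pattern matches only barcode09 through barcode16.
keep_pattern='^barcode(09|1[0-6])$'

# Dry run: list what would be removed, without removing anything.
to_remove=$(ls "$demo_dir/links" | grep -vE "$keep_pattern")
printf 'would remove: %s\n' $to_remove

# Remove the links only after the preview looks right.
( cd "$demo_dir/links" && printf '%s\n' $to_remove | xargs rm )
kept=$(ls "$demo_dir/links" | tr '\n' ' ')
echo "kept: $kept"
```

Only the symbolic links are deleted; the directories under source stay in place, which mirrors how the original fastq_pass data is left untouched.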
Decompress all the fastq.gz files (this takes less than 10 minutes):
# decompress all fastq.gz files (takes < 10 min)
cd ~/nanopore_analysis/1_raw_data
for barcode in barcode*; do
  mkdir -p "../2_fastq_decompressed/$barcode"
  # concatenate all compressed chunks of this barcode into one fastq file
  gunzip -c "$barcode"/*.fastq.gz > "../2_fastq_decompressed/$barcode/combined.fastq"
done
ls -hs ~/nanopore_analysis/2_fastq_decompressed/*/*
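One way to sanity-check the decompressed files is to count the reads in each combined.fastq. This is a minimal sketch that assumes standard 4-line FASTQ records (no wrapped sequence lines); the function name count_fastq_reads and the demo file are hypothetical.

```shell
# count_fastq_reads FILE: print the number of 4-line FASTQ records in FILE.
# Assumes unwrapped FASTQ (one sequence line and one quality line per read).
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Tiny hypothetical example file with two reads.
demo_fastq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$demo_fastq"
count_fastq_reads "$demo_fastq"
```

Run against a real file, for example `count_fastq_reads ~/nanopore_analysis/2_fastq_decompressed/barcode09/combined.fastq`, the count should agree with the "Total Sequences" value that FastQC reports for the same file.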
# check that decompression worked:
# head -n 10 ~/nanopore_analysis/2_fastq_decompressed/barcode09/combined.fastq
Input: .fastq file or sequencing_summary.txt. Output: different plots and/or a summary.
Based on the results of the initial QC, the filter parameters will be defined:
- trimming of read ends
- adapter/barcode removal
- filtering by mean read quality
- filtering by read length
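To make the read-length criterion concrete, here is a minimal awk sketch of a length filter. It is an illustration only, not the tool used in this workflow; the function name filter_fastq_by_length and the demo reads are hypothetical, and 4-line FASTQ records are assumed.

```shell
# filter_fastq_by_length MIN MAX < in.fastq > out.fastq
# Keeps reads whose sequence length lies in [MIN, MAX].
# Assumes 4-line FASTQ records (no wrapped sequence lines).
filter_fastq_by_length() {
  awk -v min="$1" -v max="$2" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      len = length(seq)
      if (len >= min && len <= max) {
        print header; print seq; print plus; print $0
      }
    }
  '
}

# Hypothetical demo: three reads of length 2, 4 and 8; keep lengths 3..6.
printf '@r1\nAC\n+\nII\n@r2\nACGT\n+\nIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n' \
  | filter_fastq_by_length 3 6
```

Only the read @r2 survives in this demo, because it is the only one whose length falls inside the chosen window.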
 
# fastqc: requirement: the output directory must exist
cd ~/nanopore_analysis/2_fastq_decompressed
for barcode in barcode09; do
  # mkdir -p ~/nanopore_analysis/3_fastqc/"$barcode"
  # fastqc "$barcode"/combined.fastq \
  #        --outdir ~/nanopore_analysis/3_fastqc/"$barcode" \
  #        --extract \
  #        --threads 4
  tree ~/nanopore_analysis/3_fastqc/"$barcode"
done
- Result: all the fastqc results have the same structure

/home/rmedina/nanopore_analysis/3_fastqc/barcode09
├── combined_fastqc
│   ├── fastqc_data.txt
│   ├── fastqc.fo
│   ├── fastqc_report.html
│   ├── Icons
│   │   ├── error.png
│   │   ├── fastqc_icon.png
│   │   ├── tick.png
│   │   └── warning.png
│   ├── Images
│   │   ├── adapter_content.png
│   │   ├── duplication_levels.png
│   │   ├── per_base_n_content.png
│   │   ├── per_base_quality.png
│   │   ├── per_base_sequence_content.png
│   │   ├── per_sequence_gc_content.png
│   │   ├── per_sequence_quality.png
│   │   └── sequence_length_distribution.png
│   └── summary.txt
├── combined_fastqc.html
└── combined_fastqc.zip

3 directories, 18 files
- Basic Statistics (FastQC), summary.txt:
The Basic Statistics module provides key data about the analyzed file:
- Filename: name of the analyzed file.
- File type: base calls or colorspace.
- Encoding: ASCII quality format.
- Total Sequences: number of sequences processed (actual or estimated).
- Filtered Sequences: sequences removed (Casava mode).
- Sequence Length: range from shortest to longest read.
- %GC: percentage of guanine and cytosine bases.
 
Here, in the barcode09 folder, we can see the summary.txt:
cat ~/nanopore_analysis/3_fastqc/barcode09/combined_fastqc/summary.txt
RESULTS:
| PASS | Basic Statistics | combined.fastq |
| FAIL | Per base sequence quality | combined.fastq |
| PASS | Per sequence quality scores | combined.fastq |
| FAIL | Per base sequence content | combined.fastq |
| PASS | Per sequence GC content | combined.fastq |
| PASS | Per base N content | combined.fastq |
| WARN | Sequence Length Distribution | combined.fastq |
| PASS | Sequence Duplication Levels | combined.fastq |
| PASS | Overrepresented sequences | combined.fastq |
| PASS | Adapter Content | combined.fastq |
We will focus on the modules flagged FAIL or WARN:
| FAIL | Per base sequence quality | combined.fastq | 
| FAIL | Per base sequence content | combined.fastq | 
| WARN | Sequence Length Distribution | combined.fastq | 
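The same selection can be made directly from summary.txt, which is tab-separated (status, module name, input file). The following is a minimal sketch; the function name list_flagged_modules and the three-line demo file are hypothetical.

```shell
# list_flagged_modules FILE: print every module whose status is not PASS.
# summary.txt is tab-separated: status, module name, input file name.
list_flagged_modules() {
  awk -F'\t' '$1 != "PASS" { printf "%s\t%s\n", $1, $2 }' "$1"
}

# Hypothetical three-line excerpt in the same format as summary.txt.
demo_summary=$(mktemp)
printf 'PASS\tBasic Statistics\tcombined.fastq\n' > "$demo_summary"
printf 'FAIL\tPer base sequence quality\tcombined.fastq\n' >> "$demo_summary"
printf 'WARN\tSequence Length Distribution\tcombined.fastq\n' >> "$demo_summary"
list_flagged_modules "$demo_summary"
```

Called in a loop over the barcode folders, for example on `~/nanopore_analysis/3_fastqc/"$barcode"/combined_fastqc/summary.txt`, this gives a quick overview of which modules need attention per sample.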
cd ~/nanopore_analysis/2_fastq_decompressed
for barcode in barcode10; do
  mkdir -p ~/nanopore_analysis/3_fastqc/"$barcode"
  fastqc "$barcode"/combined.fastq \
         --outdir ~/nanopore_analysis/3_fastqc/"$barcode" \
         --extract \
         --threads 4
done
printf '\n==> summary.txt:\n'
cat ~/nanopore_analysis/3_fastqc/barcode10/combined_fastqc/summary.txt
printf '\n--> fastqc_data.txt:\n'
head ~/nanopore_analysis/3_fastqc/barcode10/combined_fastqc/fastqc_data.txt
RESULTS:
==> summary.txt:
| PASS | Basic Statistics | combined.fastq |
| FAIL | Per base sequence quality | combined.fastq |
| PASS | Per sequence quality scores | combined.fastq |
| FAIL | Per base sequence content | combined.fastq |
| FAIL | Per sequence GC content | combined.fastq |
| PASS | Per base N content | combined.fastq |
| WARN | Sequence Length Distribution | combined.fastq |
| PASS | Sequence Duplication Levels | combined.fastq |
| PASS | Overrepresented sequences | combined.fastq |
| PASS | Adapter Content | combined.fastq |
--> fastqc_data.txt:
| ##FastQC | 0.11.9 |
| >>Basic Statistics | pass |
| #Measure | Value |
| Filename | combined.fastq |
| File type | Conventional base calls |
| Encoding | Sanger / Illumina 1.9 |
| Total Sequences | 729889 |
| Sequences flagged as poor quality | 0 |
| Sequence length | 61-225281 |
| %GC | 36 |
- Per base sequence quality: FAIL

Cause: sequencing chemistry degrades with increasing read length and over long runs.
Solution: trim low-quality read ends and filter reads by mean read quality, as listed in the filter parameters.
ls ~/nanopore_analysis/3_fastqc/barcode09/combined_fastqc/fastqc_report.html
# copy the report to the local machine
scp -r -P 1722 bioinf02:/home/rmedina/nanopore_analysis/3_fastqc/barcode09/combined_fastqc/fastqc_report.html /home/riccardo
cd ~/nanopore_analysis/2_fastq_decompressed
for barcode in barcode09 barcode1{0..6}; do
  mkdir -p ~/nanopore_analysis/3_fastqc/"$barcode"
  fastqc "$barcode"/combined.fastq \
         --outdir ~/nanopore_analysis/3_fastqc/"$barcode" \
         --extract \
         --threads 4
done
Path to the fastqc_data.txt file:
/home/rmedina/nanopore_analysis/3_fastqc/barcode09
├── combined_fastqc
│   ├── fastqc_data.txt
cd /home/rmedina/nanopore_analysis/3_fastqc
for barcode in barcode*
do
  # pass the folder name as an argument so it is never interpreted as a format string
  printf '%s:   ' "$barcode"
  grep '^Total' /home/rmedina/nanopore_analysis/3_fastqc/"$barcode"/combined_fastqc/fastqc_data.txt
done
RESULTS:
| barcode09: | Total Sequences 1065437 | 
| barcode10: | Total Sequences 729889 | 
| barcode11: | Total Sequences 1217908 | 
| barcode12: | Total Sequences 667557 | 
| barcode13: | Total Sequences 407956 | 
| barcode14: | Total Sequences 83556 | 
| barcode15: | Total Sequences 735701 | 
| barcode16: | Total Sequences 1156564 | 
A table with more detailed information from fastqc_data.txt:
- Total Sequences
- Sequences flagged as poor quality
- Sequence length
 
cd ~/nanopore_analysis/2_fastq_decompressed
# Function to print the horizontal separator
print_separator() {
    printf "|-------------------------------------+-----------------------|\n"
}
for barcode in barcode09 barcode1{0..6}; do
  # mkdir -p ~/nanopore_analysis/3_fastqc/"$barcode"
  # fastqc "$barcode"/combined.fastq \
  #        --outdir ~/nanopore_analysis/3_fastqc/"$barcode" \
  #        --extract \
  #        --threads 4
  fastqc_file="../3_fastqc/${barcode}/combined_fastqc/fastqc_data.txt"
  # Print table header
  print_separator
  printf "| dir: %-30s |  file: %-14s |\n" "${barcode}" "combined.fastq"
  print_separator

  # Check if the file exists and format the output
  if [[ -f "$fastqc_file" ]]; then
    # Lines 6-10 hold Encoding, Total Sequences, poor-quality count, Sequence length and %GC
    sed -n '6,10p' "$fastqc_file" | while IFS=$'\t' read -r measure value; do
      printf "| %-35s | %-21s |\n" "$measure" "$value"
    done
  else
    printf "| %-35s | %-21s |\n" "File Missing" "N/A"
  fi
done
print_separator
RESULTS:
| dir: barcode09 | file: combined.fastq | 
|---|---|
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 1065437 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 29-391635 | 
| %GC | 42 | 
| dir: barcode10 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 729889 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 61-225281 | 
| %GC | 36 | 
| dir: barcode11 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 1217908 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 64-305294 | 
| %GC | 44 | 
| dir: barcode12 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 667557 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 44-259254 | 
| %GC | 44 | 
| dir: barcode13 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 407956 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 42-380798 | 
| %GC | 40 | 
| dir: barcode14 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 83556 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 59-242572 | 
| %GC | 43 | 
| dir: barcode15 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 735701 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 29-397562 | 
| %GC | 69 | 
| dir: barcode16 | file: combined.fastq | 
| Encoding | Sanger / Illumina 1.9 | 
| Total Sequences | 1156564 | 
| Sequences flagged as poor quality | 0 | 
| Sequence length | 59-241224 | 
| %GC | 45 | 
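The table above is built from fixed line numbers (6 to 10) of fastqc_data.txt. As a possible alternative, values can be looked up by their measure name, which does not depend on line positions. The following is a minimal sketch; the function name get_fastqc_measure is hypothetical, and the demo file is an excerpt that reuses the barcode14 values shown above.

```shell
# get_fastqc_measure FILE MEASURE: print the value of one Basic Statistics row.
# fastqc_data.txt rows are tab-separated: measure name, then value.
get_fastqc_measure() {
  awk -F'\t' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical excerpt mirroring the first lines of a fastqc_data.txt file.
demo_data=$(mktemp)
printf '##FastQC\t0.11.9\n>>Basic Statistics\tpass\n#Measure\tValue\n' > "$demo_data"
printf 'Total Sequences\t83556\nSequence length\t59-242572\n%%GC\t43\n' >> "$demo_data"

get_fastqc_measure "$demo_data" 'Total Sequences'
get_fastqc_measure "$demo_data" '%GC'
```

The lookup stops at the first matching row, so it returns the Basic Statistics value even if a later module happened to repeat the same label.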
# cd ~/nanopore_analysis/0_scripts
mkdir -p ~/nanopore_analysis/4_fastp
cd ~/nanopore_analysis/4_fastp
wget http://opengene.org/fastp/fastp
chmod a+x ./fastp
ls -hs fastp
R script (MinIONQC):
- input: sequencing_summary.txt
- output:
    - summary.yaml
    - different plots
- combining different data sets for possible comparison
 
Download the MinIONQC R script:
mkdir -p ~/nanopore_analysis/5_minionqc
cd ~/nanopore_analysis/5_minionqc
curl https://raw.githubusercontent.com/roblanf/minion_qc/master/MinIONQC.R > MinIONQC.R
In this project, the 433 .fastq files contained in each barcode folder were joined into a single file called combined.fastq. As a result, each barcode folder contains only one sequencing_summary.txt, which will be used as input for MinIONQC.
Run the script:
cd ~/nanopore_analysis/5_minionqc
Rscript MinIONQC.R \
	--input="$HOME/nanopore_analysis/3_fastqc" \
	--output="$HOME/nanopore_analysis/5_minionqc" \
	--processors=4 \
	--qscore_cutoff=7 \
	--format=tiff \
	--smallfigures=TRUE
- Data Exploration (Initial QC)
    - Input: .fastq files or a folder containing them, or sequencing_summary.txt files.
    - Tools:
        - MinIONQC: Generate diagnostic plots to explore read quality and sequencing performance.
        - NanoPlot: Visualize distributions (e.g., read length, quality scores) to identify data trends and potential issues.
    - Output:
        - Quality control plots (e.g., read quality histograms, length distributions).
        - Summary statistics about read counts, mean quality, and sequencing performance.
- Defining Filter Parameters
    Based on your QC results, establish the filtering criteria:
    - Trimming Read Ends:
        - Use NanoFilt to trim low-quality bases at read ends.
    - Adapter/Barcode Removal:
        - If adapters or barcodes are present, use tools like Porechop to remove them.
    - Filtering Mean Read Quality:
        - Set a threshold (e.g., minimum Q-score) and filter using NanoFilt.
    - Filtering by Read Length:
        - Use NanoFilt or a custom script to remove reads below or above a specific length threshold.

NanoPlot --fastq <path_to_fastq> --plots hex dot
# NanoFilt reads the fastq from standard input
NanoFilt -q 7 --length 1000 < input.fastq > filtered.fastq
This example sets a minimum Q-score of 7 and a minimum read length of 1000 bp.
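To interpret a Q-score cutoff such as 7, it helps to convert it into a per-base error probability with the standard Phred definition P = 10^(-Q/10). The following is a minimal sketch; the function name phred_to_error_prob is hypothetical, and the values Q10 and Q20 are included only for comparison.

```shell
# phred_to_error_prob Q: print the per-base error probability for Phred score Q,
# using the standard definition P = 10^(-Q/10).
phred_to_error_prob() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

for q in 7 10 20; do
  printf 'Q%s -> error probability %s\n' "$q" "$(phred_to_error_prob "$q")"
done
```

A cutoff of Q7 therefore corresponds to an error probability of roughly 0.2, that is, about 80% per-base accuracy, which is a fairly permissive threshold.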
ls /group/bin/kaiju*
ls /group/bin/kraken*
tree -d /group/db
# clean-up: deletes the whole project folder (irreversible)
rm -rf ~/nanopore_analysis/