MUSIAL (MUlti Sample varIant AnaLysis) is a Java command-line tool to analyze large sets of VCF files with prokaryotic single nucleotide variants (SNVs) and insertions/deletions (indels). It provides an interface for generating comprehensive statistics and alignments, as well as assessing variability at genome, gene and protein levels.
- Integrates SnpEff and other Sequence Ontology compliant annotations to help interpret variants.
- Projects variants onto genomic features (genes) to derive allele- and proteoform-specific information that supports the characterization of individual samples.
- Reconstructs sequences from VCF data at nucleotide and protein level, and produces tabular reports on sample, feature and variant statistics.
An executable jar file (Java 21) is available from the Releases section.
MUSIAL follows a modular, task-based architecture. The build task is the entry point: it creates a JSON file (the storage) as its primary output, which all other tasks take as input.
Details on the use of the software and tutorials can be found in the repository Wiki. The general CLI usage is `java -jar MUSIAL-v2.4.2.jar <task>`, where the following tasks are available:
build - Build a local database file (storage) in JSON format from variant calls; the mandatory input for other tasks.
Command line arguments of task build
-C,--configuration <arg> Path to a JSON file specifying the build task parameter configuration for MUSIAL. Visit the documentation for details.
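As a minimal sketch of the build step, the snippet below assembles and prints the invocation, and runs it only if the required files exist. The file name `build-config.json` is a hypothetical placeholder; the structure of the configuration file is described in the documentation and is not shown here.

```shell
#!/bin/sh
# Hypothetical paths; adjust them to your setup.
MUSIAL_JAR="MUSIAL-v2.4.2.jar"
CONFIG="build-config.json"   # hypothetical name of the build configuration

# Assemble the command as positional parameters so it can be printed and run.
set -- java -jar "$MUSIAL_JAR" build -C "$CONFIG"
BUILD_CMD="$*"
echo "$BUILD_CMD"

# Execute only when the jar and the configuration file are actually present.
if [ -f "$MUSIAL_JAR" ] && [ -f "$CONFIG" ]; then
  "$@"
fi
```

The resulting storage file is what the `-I,--storage` option of the other tasks expects.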
expand - Expand an existing storage file with variant call files and/or metadata.
Command line arguments of task expand
-d,--dry-run Only report on novel entries without writing the updated storage.
-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
-m,--vcfMeta <arg> Path to a .tsv or .csv file specifying sample annotations.
-o,--output <arg> Path to write the output file (default: overwrite input file).
-V,--vcfFiles <arg> List of file or directory paths. All files must be in VCF format.
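One possible way to use these options is to run a dry run first and then write the updated storage to a separate file, so the input is not overwritten. This is only a sketch: `storage.json`, `new_calls` and `storage.expanded.json` are hypothetical names, and a single directory is passed to `-V` because the exact syntax for passing several paths is not specified here.

```shell
#!/bin/sh
# Hypothetical paths; adjust them to your setup.
MUSIAL_JAR="MUSIAL-v2.4.2.jar"
STORAGE="storage.json"               # storage created by the build task
NEW_VCFS="new_calls"                 # hypothetical directory with additional VCF files
UPDATED="storage.expanded.json"      # hypothetical output path

# Step 1: dry run (-d) reports novel entries without writing anything.
set -- java -jar "$MUSIAL_JAR" expand -I "$STORAGE" -V "$NEW_VCFS" -d
DRY_RUN_CMD="$*"
echo "$DRY_RUN_CMD"
if [ -f "$MUSIAL_JAR" ] && [ -f "$STORAGE" ]; then
  "$@"
fi

# Step 2: write the expanded storage to a new file instead of overwriting the input.
set -- java -jar "$MUSIAL_JAR" expand -I "$STORAGE" -V "$NEW_VCFS" -o "$UPDATED"
EXPAND_CMD="$*"
echo "$EXPAND_CMD"
if [ -f "$MUSIAL_JAR" ] && [ -f "$STORAGE" ]; then
  "$@"
fi
```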
view - View the content (features, samples or variants; and their attributes) of a MUSIAL storage file.
Command line arguments of task view
-C,--content <arg> The content to view. One of FEATURES, SAMPLES, VARIANTS (case-insensitive).
-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
-o,--output <arg> Path to write the output file. If not provided, a default file will be created based on the input file (default). If `print` or `stdout` is specified, the output will be printed to the console.
-q,--query <arg> One or multiple identifiers or genomic ranges (contig:start-end) to query.
profile - Profile samples with respect to variants, alleles, or proteoforms.
Command line arguments of task profile
-C,--content <arg> The content to view. One of VARIANTS, ALLELES, PROTEOFORMS (case-insensitive).
-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
-o,--output <arg> Path to write the output file. If not provided, a default file will be created based on the input file (default). If `print` or `stdout` is specified, the output will be printed to the console.
-q,--query <arg> One or multiple identifiers or genomic ranges (contig:start-end) to consider.
-x,--reduced Represent entries in a reduced format, i.e., sequence types as numbers with 0 as the reference or synonymous sequence and variants without detailed call information.
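The view and profile tasks share the `-I`, `-C` and `-o` options, so a quick inspection of a storage file might look like the sketch below. The storage file name is a hypothetical placeholder; `-o stdout` prints to the console as described in the option lists, and `-x` requests the reduced profile format.

```shell
#!/bin/sh
# Hypothetical paths; adjust them to your setup.
MUSIAL_JAR="MUSIAL-v2.4.2.jar"
STORAGE="storage.json"   # storage created by the build task

# List the samples of the storage on the console.
set -- java -jar "$MUSIAL_JAR" view -I "$STORAGE" -C SAMPLES -o stdout
VIEW_CMD="$*"
echo "$VIEW_CMD"
if [ -f "$MUSIAL_JAR" ] && [ -f "$STORAGE" ]; then
  "$@"
fi

# Profile samples by proteoform in the reduced format (-x).
set -- java -jar "$MUSIAL_JAR" profile -I "$STORAGE" -C PROTEOFORMS -x -o stdout
PROFILE_CMD="$*"
echo "$PROFILE_CMD"
if [ -f "$MUSIAL_JAR" ] && [ -f "$STORAGE" ]; then
  "$@"
fi
```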
sequence - Generate and write sequence data.
Command line arguments of task sequence
-a,--align Whether to align sequences (optional, default: false).
-c,--content <arg> Whether to generate NUCLEOTIDE or AMINOACID sequences (optional, case-insensitive, default: NUCLEOTIDE).
-f,--split <arg> Whether to split output files by FEATURE, SAMPLE, BOTH, or NONE (optional, case-insensitive, default: FEATURE).
-I,--storage <arg> Path to a .json(.gz) file generated with the build task of MUSIAL.
-l,--locations <arg> One or multiple feature identifiers or genomic ranges (contig:start-end) to generate sequence data of. If none are provided, all features or full contig ranges will be considered.
-m,--merge Whether to merge identical sequences (optional, default: false).
-o,--output <arg> Path to write the output. If not provided, the directory of the input storage is used. If a directory is provided, files are created there. If a file is provided, its parent directory is used.
-s,--samples <arg> One or multiple sample identifiers to retrieve sequences for (optional).
-v,--variable Whether to only consider variable positions (optional, default: false).
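To illustrate how the sequence options combine, the following sketch requests aligned amino acid sequences, merges identical ones, and writes one file per feature. The storage name and the `sequences` output directory are hypothetical, and creating the directory beforehand is a precaution, since it is not stated here whether MUSIAL creates it.

```shell
#!/bin/sh
# Hypothetical paths; adjust them to your setup.
MUSIAL_JAR="MUSIAL-v2.4.2.jar"
STORAGE="storage.json"   # storage created by the build task
OUT_DIR="sequences"      # hypothetical output directory

# -c AMINOACID: protein sequences, -a: align, -m: merge identical sequences,
# -f FEATURE: one output file per feature (the documented default, shown explicitly).
set -- java -jar "$MUSIAL_JAR" sequence -I "$STORAGE" -c AMINOACID -a -m -f FEATURE -o "$OUT_DIR"
SEQUENCE_CMD="$*"
echo "$SEQUENCE_CMD"

# Execute only when the jar and the storage file are actually present.
if [ -f "$MUSIAL_JAR" ] && [ -f "$STORAGE" ]; then
  mkdir -p "$OUT_DIR"   # precaution; MUSIAL may or may not create the directory itself
  "$@"
fi
```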
MUSIAL is also available via a web interface at https://musial-tuevis.cs.uni-tuebingen.de/, currently running version v2.3.10.
MUSIAL v2.4 is built with JDK 21.0.6 and Gradle 9.1.0. To compile the source code, run `gradle clean build` in the root directory of the project. The JavaDoc of the software is available at https://integrative-transcriptomics.github.io/MUSIAL/javadoc/.
- Detailed information about the software can be found in the repository's Wiki.
- Found an issue or have a feature request? Feel free to open a GitHub issue.
