Run
===================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Creating a genome STR index (optional)
--------------------------------------

Creates a bed file of large STR regions in the reference genome. This step is performed automatically as part of `strling extract`. However, when running multiple samples, it is more efficient to do it once, then pass the file to `strling extract` using the `-g` option.

.. code-block:: bash

  strling index $reference_fasta

Single sample
-------------

**Extract informative pairs to a binary format**

.. code-block:: bash

  strling extract -f $reference_fasta /path/to/$sample.cram $sample.bin

Output file:

$sample.bin - a binary file describing STR-containing reads

**Call STRs on the extracted binary data**

.. code-block:: bash

  mkdir -p str-results/
  strling call --output-prefix str-results/$sample -f $reference_fasta /path/to/$sample.cram $sample.bin

Output files:

$sample-bounds.txt - STR loci interrogated in that sample

$sample-genotype.txt - Locus size estimates and other per-locus information (see :ref:`outputs`)

$sample-unplaced.txt - Counts of unplaced STR reads that could not be assigned to a specific locus

Joint calling
-------------

**Extract informative read pairs to a binary format for each sample**

This step is the same as for a single sample, so you can reuse the same bin files.

.. code-block:: bash

  strling extract -f $reference_fasta /path/to/$sample1.cram $sample1.bin
  strling extract -f $reference_fasta /path/to/$sample2.cram $sample2.bin

Output file(s):

$sample1.bin, $sample2.bin, ... - binary files describing STR-containing reads

**Joint call STR loci across all samples**

Requires minimum read evidence from at least one sample.

.. code-block:: bash

  mkdir -p str-results/
  strling merge --output-prefix str-results/joint -f $reference_fasta $sample1.bin $sample2.bin

Output file:

joint-bounds.txt - positions of STR loci found by combining across all individuals, used for the call stage when joint calling

Merging can be performed by chromosome using `--chromosome`, which reduces memory requirements and allows the merge to be parallelized. See the workflows for examples.

**Call genotypes/estimate allele sizes for all loci in each sample**

.. code-block:: bash

  strling call --output-prefix str-results/$sample1 -b str-results/joint-bounds.txt -f $reference_fasta /path/to/$sample1.cram $sample1.bin
  strling call --output-prefix str-results/$sample2 -b str-results/joint-bounds.txt -f $reference_fasta /path/to/$sample2.cram $sample2.bin

Output files as above.

**Find outliers**

Finds loci that are expanded in one individual relative to other individuals in the jointly called cohort.

.. code-block:: bash

  strling-outliers.py --genotypes *-genotype.txt --unplaced *-unplaced.txt

Output files:

STRs.tsv - a single file with all loci in all samples and their outlier p-values (see :ref:`outputs`)

$sample.STRs.tsv - the same data, filtered to a single individual
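
**Scripting the joint-calling steps (sketch)**

With more than a couple of samples, the extract, merge and call steps of joint calling are easier to run from a loop. The following is a minimal sketch, not part of STRling itself: the helper names ``run`` and ``joint_call``, the ``DRY_RUN`` variable and the assumption that every CRAM is named ``<sample>.cram`` in a single directory are all hypothetical. Only the ``strling`` commands and options already shown on this page are used.

.. code-block:: bash

  # Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
  run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "$*"
    else
      "$@"
    fi
  }

  # Hypothetical wrapper around the joint-calling steps.
  # Usage: joint_call <reference_fasta> <cram_dir> <sample> [<sample> ...]
  # Assumes each CRAM is at <cram_dir>/<sample>.cram.
  joint_call() {
    local reference_fasta=$1 cram_dir=$2
    shift 2
    local sample bins=()

    run mkdir -p str-results/ || return

    # Extract once per sample, pairing each sample with its own CRAM.
    for sample in "$@"; do
      run strling extract -f "$reference_fasta" \
        "$cram_dir/$sample.cram" "$sample.bin" || return
      bins+=("$sample.bin")
    done

    # Merge all bin files into one set of joint bounds.
    run strling merge --output-prefix str-results/joint \
      -f "$reference_fasta" "${bins[@]}" || return

    # Call every sample against the joint bounds.
    for sample in "$@"; do
      run strling call --output-prefix "str-results/$sample" \
        -b str-results/joint-bounds.txt -f "$reference_fasta" \
        "$cram_dir/$sample.cram" "$sample.bin" || return
    done
  }

Calling ``DRY_RUN=1 joint_call $reference_fasta /path/to sample1 sample2`` prints the commands without running them, which is a cheap way to check that every sample is paired with its own CRAM and bin file before starting a long run.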