Outputs

The main output file is prefix-genotype.txt. It reports all STR expansion loci that pass thresholds as well as any provided as input. The columns are:

chrom: chromosome/contig name
left: predicted left boundary of STR locus
right: predicted right boundary of STR locus
repeatunit: predicted STR repeat unit
allele1_est: estimated size of the shorter allele in repeat units relative to the reference, from spanning reads (if any). “na” indicates that no reads support an allele shorter than the read length, so both alleles may be large.
allele2_est: estimated size of the larger allele in repeat units relative to the reference, from anchored reads
anchored_reads: number of reads with evidence of expansion, which are anchored by a well aligned mate
spanning_reads: number of reads that span the locus
spanning_pairs: number of read pairs that span the locus
expected_spanning_pairs: number of read pairs expected to span the locus in the absence of an expansion, given local read depth
spanning_pairs_pctl: the quantity (1 + obs - exp) / (exp + 1), computed from the observed (obs) and expected (exp) spanning read pairs, expressed as a percentile. Values range between 0 and 1, with smaller values indicating greater confidence of a large expansion (especially a homozygous one) relative to other loci in that sample.
left_clips: number of soft-clipped reads supporting the left side of the locus position
right_clips: number of soft-clipped reads supporting the right side of the locus position
unplaced_pairs: number of unplaced STR reads assigned to this locus (will only be >0 for a uniquely expanded repeat unit)
depth: local median depth around the locus
sum_str_counts: the sum of STR repeat units in all reads assigned to that locus
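The columns above can be loaded with a few lines of Python. This is a minimal sketch, not part of STRling: it assumes the file is tab-delimited with a header row that uses the column names listed above, and it strips a leading "#" from the header as a defensive guess. The helper names (parse_allele, read_genotypes) and the example rows are made up for illustration.

```python
import csv
import io


def parse_allele(value):
    """Return the estimate as a float, or None when the field is 'na'."""
    if value.strip().lower() in ("na", "nan", ""):
        return None
    return float(value)


def read_genotypes(handle):
    """Yield one dict per locus from an open genotype file.

    Assumes tab-delimited text with a header row (an assumption,
    not something the column list above guarantees).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Defensive: tolerate a header line that starts with '#'.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    for row in reader:
        row["allele1_est"] = parse_allele(row["allele1_est"])
        row["allele2_est"] = parse_allele(row["allele2_est"])
        yield row


# Made-up rows with only a subset of the columns, for illustration.
example = io.StringIO(
    "chrom\tleft\tright\trepeatunit\tallele1_est\tallele2_est\n"
    "chr4\t3074876\t3074940\tAGC\tna\t35.2\n"
    "chr9\t27573484\t27573546\tCCGGGG\t0.0\t2.5\n"
)
loci = list(read_genotypes(example))

# Loci where allele1_est is missing: both alleles may be large.
no_short_allele = [row for row in loci if row["allele1_est"] is None]
```

For a real run, replace the in-memory example with open("prefix-genotype.txt", newline=""), where prefix is whatever output prefix was used.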

Some additional outputs are provided with detailed supporting evidence used to make the genotype calls:

Putative STR bounds: prefix-bounds.txt
Counts of STR-like reads that are unplaced (could not be assigned to a locus): prefix-unplaced.txt

Only output when compiled with -d:debug:

All STR-like reads: prefix-reads.txt
Spanning reads and spanning pairs: prefix-spanning.txt

The main output for the strling-outliers script is STRs.tsv. The columns are the same as above, except where specified:

chrom
left
right
locus: locus unique identifier in the form chrom-left-right-repeatunit
sample
repeatunit
allele1_est
allele2_est
spanning_reads
spanning_pairs
left_clips
right_clips
unplaced_pairs
sum_str_counts
sum_str_log: log2 of depth normalized sum_str_counts
depth
outlier: z score testing for outliers
p: p value testing whether this locus is significantly expanded relative to other samples
p_adj: p value adjusted for multiple testing per sample using the Benjamini-Hochberg method
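Two of the columns above follow simple rules that can be illustrated in Python. The sketch below builds a locus identifier in the chrom-left-right-repeatunit form and applies a textbook Benjamini-Hochberg adjustment to a list of p values. It is not taken from the strling-outliers source, so the function names are hypothetical and details such as tie handling may differ from what the script does.

```python
def locus_id(chrom, left, right, repeatunit):
    """Build an identifier in the form chrom-left-right-repeatunit."""
    return f"{chrom}-{left}-{right}-{repeatunit}"


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the same order as the input.

    Textbook procedure: sort ascending, scale the p value at rank k
    by m / k, then enforce monotonicity from the largest rank down.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value is the
    # minimum of its own scaled p and those of all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for the loci of a single sample.
p_values = [0.01, 0.04, 0.03, 0.005]
p_adj = benjamini_hochberg(p_values)
```

Because the adjustment is described as per sample, the list passed in would hold the p values of one sample only.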

Note: p = 0.0 should be interpreted as p < 10e-310
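When p values are log-transformed or printed downstream, a reported 0.0 needs special handling. The following sketch shows one way to do that, assuming the bound from the note above; the names P_FLOOR, neg_log10_p and format_p are hypothetical and not part of STRling.

```python
import math

# 10e-310 from the note above, written as an equivalent float literal.
P_FLOOR = 1e-309


def neg_log10_p(p):
    """Return -log10(p), substituting the floor when p is reported as 0.0.

    For p == 0.0 the result is only a lower bound on the true value.
    """
    return -math.log10(max(p, P_FLOOR))


def format_p(p):
    """Format a p value for display, showing 0.0 as an upper bound."""
    if p == 0.0:
        return "<10e-310"
    return f"{p:.3g}"
```

Clamping avoids the math domain error that math.log10(0.0) would raise, at the cost of flattening all such loci to the same plotted value.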
