Spacing is, however, not proportional, and allele candidates of the same length are not stacked on top of each other but placed side by side. Sequences that are present in the database are given a green bar; sequences that are not are given a red bar. The vertically adjustable, transparent gray zone sets the threshold below which allele candidate bars are not retained in the final profile. By default, it is set to 10%. Note that sequences with an abundance lower than 0.5% (configurable) are already filtered out during the analysis. When hovering over a bar, a detailed block of information is displayed for that allele candidate. An example is shown in Fig. 4. This information can be used to examine whether the underlying sequence of the bar is a true allele or an erroneous sequence (stutter, sequencing error or PCR error). The title bar of the information block shows the locus name and the database name of the allele candidate. When the allele is not present in the database, ‘NA’ is shown together with, between brackets, the number of repeats relative to known alleles. Locus statistics are summarized in the left column:
• ‘Total reads’ stands for all reads that are classified under the locus.
Statistics for the current allele candidate are in the right column:
• ‘Index’ is a unique reference label assigned to each filtered unique sequence, starting at ‘1’ with the shortest sequence for this locus in the analysis. When two sequences have the same length, the smaller index number is assigned randomly.
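The two abundance cut-offs and the index assignment described above can be sketched as follows. This is a minimal illustration and not the implementation used in the analysis: the function name `build_profile`, the input format (a mapping from unique sequence to read count) and the choice to express abundance as a percentage of all reads of the locus are assumptions made for the example.

```python
import random


def build_profile(read_counts, analysis_filter=0.5, profile_threshold=10.0, seed=None):
    """Return (candidates, profile) for a single locus.

    read_counts       -- dict: unique sequence -> read count (hypothetical input format)
    analysis_filter   -- minimum abundance in % to keep a sequence at all (0.5% in the text)
    profile_threshold -- minimum abundance in % to keep a candidate in the profile (10% default)
    """
    total = sum(read_counts.values())
    if total == 0:
        return [], []
    rng = random.Random(seed)

    # Step 1: discard sequences whose abundance is below the analysis filter.
    # Assumption: abundance is computed relative to all reads of the locus.
    kept = []
    for seq, count in read_counts.items():
        abundance = 100.0 * count / total
        if abundance >= analysis_filter:
            kept.append((seq, abundance))

    # Step 2: index from 1, shortest sequence first. Shuffling before a stable
    # sort makes the order among sequences of equal length random.
    rng.shuffle(kept)
    kept.sort(key=lambda item: len(item[0]))
    candidates = [
        {"index": i, "sequence": seq, "abundance": abundance}
        for i, (seq, abundance) in enumerate(kept, start=1)
    ]

    # Step 3: only candidates at or above the profile threshold are retained.
    profile = [c for c in candidates if c["abundance"] >= profile_threshold]
    return candidates, profile
```

With toy counts such as `{"AAAA": 480, "AAAT": 470, "AAA": 47, "AAAAA": 3}`, the sequence with 0.3% abundance never becomes a candidate, the shortest remaining sequence receives index 1, and only the two sequences near 50% abundance end up in the profile.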
The bottom part of the information block shows the region of interest of the allele candidate sequence together with related sequences from the same locus. Related sequences with up to two differences are shown, a difference being either a difference of one repeat unit or a difference of one base pair. One difference is indicated by a relation degree of “Ist” and two differences by “IInd”. Fig. 4 shows the information blocks of the two true alleles from locus D8S1179, an interesting example of the advantage of MPS over CE. For 9947A, CE results show only one peak at locus D8S1179, resulting in a profile with a homozygous allele 13 for D8S1179. Our analysis clearly shows two alleles that have the same length (corresponding to allele 13) but differ in their intra-STR sequence. The information blocks support this heterozygous call: only a small portion of the reads is filtered for this locus, the number of unique reads is low, and the abundance of each of the two allele candidates is approximately 50%. The percentage of clean flanks [9] in the candidate allele sequences is also very high. All these parameters indicate that the sequencing and PCR error rates are low. In the part of the information blocks that shows the related sequences, the G ↔ A difference between the two alleles is shown. The two alleles are related to each other by a “Ist” degree relation.
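One way to picture the relation degree is the sketch below. The text does not specify how differences are counted, so the following are assumptions made for illustration only: a repeat unit of 4 bp, a repeat-number difference modeled as one contiguous insertion or deletion of whole repeat units, and the hypothetical function names `count_differences` and `relation_degree`. The sequences in the usage note are toy strings, not data from the analysis.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_differences(seq_a, seq_b, repeat_len=4):
    """Count differences, where one repeat unit or one base pair counts as one.

    Returns None when the length gap is not a whole number of repeat units.
    Assumption: missing repeat units form a single contiguous block.
    """
    short, long_ = sorted((seq_a, seq_b), key=len)
    gap = len(long_) - len(short)
    if gap % repeat_len != 0:
        return None
    if gap == 0:
        return hamming(short, long_)
    repeats = gap // repeat_len
    # Try every position for the deleted block and keep the best alignment.
    best = min(
        hamming(short, long_[:i] + long_[i + gap:])
        for i in range(len(short) + 1)
    )
    return repeats + best


def relation_degree(seq_a, seq_b, repeat_len=4):
    """Return "Ist" for one difference, "IInd" for two, otherwise None."""
    return {1: "Ist", 2: "IInd"}.get(count_differences(seq_a, seq_b, repeat_len))
```

For two toy sequences of equal length that differ by a single G ↔ A substitution, such as `"TCTATCTATCTGTCTA"` and `"TCTATCTATCTATCTA"`, `relation_degree` returns `"Ist"`, mirroring the relation between the two D8S1179 alleles described above.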