Figure 1.
Length-dependent data loss under quality-based filtering. Reads are removed if ≥3% of their bases are low quality (<Q27) over a given length. The green line shows the results for the small dataset 1, and the red line the results for the larger dataset 2. Note that data loss is not uniform with length: in some instances the fraction of low-quality bases falls as sequence length increases, so more reads pass the 3% low-quality threshold. For the larger dataset, length trimming at 225 bases (brown line) balances the desire to use the longest comparable region against the need to keep data loss low.
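The filtering rule in this caption can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the figure: the function names `passes_filter` and `fraction_lost` are hypothetical, reads are assumed to be lists of integer Phred scores, and the choice to discard reads shorter than the trim length is an assumption that the caption does not state.

```python
def passes_filter(quals, length, min_q=27, max_low_frac=0.03):
    """Return True if the first `length` bases of a read pass the filter.

    quals: list of integer Phred quality scores, one per base.
    A read is removed when >= max_low_frac of the bases in the
    trimmed region have quality below min_q.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if len(quals) < length:
        # Assumption: reads shorter than the trim length are discarded.
        return False
    low = sum(1 for q in quals[:length] if q < min_q)
    return low / length < max_low_frac


def fraction_lost(reads, length, min_q=27, max_low_frac=0.03):
    """Fraction of reads removed when trimming to `length` bases."""
    if not reads:
        raise ValueError("reads must not be empty")
    kept = sum(passes_filter(r, length, min_q, max_low_frac) for r in reads)
    return 1 - kept / len(reads)


# Toy read: 4 low-quality bases at the start, then 196 high-quality bases.
read_a = [10] * 4 + [40] * 196
# Toy read: 200 high-quality bases.
read_b = [40] * 200
```

With these toy reads, `read_a` fails at a trim length of 100 (4% low-quality bases) but passes at 200 (2%), which is the kind of non-uniform, length-dependent behaviour the caption describes.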