Abstract
Recent developments in high-throughput sequencing (HTS) technologies and
bioinformatics have drastically changed research in virology, especially for virus
discovery. Indeed, proper monitoring of the viral population requires information on
the different isolates circulating in the studied area. For this purpose, HTS has greatly
facilitated the sequencing of new genomes of detected viruses and their comparison.
However, the bioinformatics analyses used to reconstruct genome sequences and
detect single nucleotide polymorphisms (SNPs) can introduce bias, an issue that
has not been widely addressed so far. Therefore, more knowledge is required on the
limitations of predicting SNPs based on HTS-generated sequence samples.
To address this issue, we compared the ability of 14 plant virology laboratories, each
employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic
virus (PepMV) in three samples through large-scale performance testing (PT) using
three artificially designed datasets. To evaluate the impact of the bioinformatics
analyses, we divided them into three key steps: read pre-processing, virus-isolate
identification, and variant calling. Each step was evaluated independently through an
original PT design that included discussion and validation between participants at each
step. Overall, this work underlines key parameters influencing SNP detection and
proposes recommendations for reliable variant calling for plant viruses.
The identification of the closest reference, the mapping parameters, and manual
validation of the detections were recognized as the analysis steps with the greatest
impact on successful SNP detection. Strategies to improve the prediction of SNPs are
also discussed.