Most virus detection methods target a single specific virus or a small number of known viruses, and cannot uncover the novel viruses responsible for emerging viral infections. To address this limitation, we developed a computational method that uses two types of probes: a panel of conserved sequence probes that directly classify emerging or unsequenced viruses at the genus level, and specific probes that identify known viruses. Using the two probe types in a complementary way to determine the identity of a virus further improves detection accuracy.
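The complementary use of the two probe types can be illustrated with a small decision rule. The Python sketch below is hypothetical: the function `classify_sample`, its status labels, and the assumption that each probe hit has already been mapped to a genus or species name are illustrative choices, not details taken from the method described here.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical result shape: (status, genus, species)
Result = Tuple[str, Optional[str], Optional[str]]


def classify_sample(genus_hits: Set[str],
                    species_hits: Set[str],
                    species_to_genus: Dict[str, str]) -> Result:
    """Combine conserved genus-probe hits with species-specific probe hits."""
    # Species calls whose genus is independently supported by a genus probe.
    confirmed = sorted(s for s in species_hits
                       if species_to_genus.get(s) in genus_hits)
    if confirmed:
        s = confirmed[0]
        return ("confirmed", species_to_genus[s], s)
    # A specific probe fired but no genus probe supports it: flag for review.
    if species_hits:
        s = sorted(species_hits)[0]
        return ("specific_only", species_to_genus.get(s), s)
    # Only one genus probe set fired: candidate novel or unsequenced virus.
    if len(genus_hits) == 1:
        return ("novel_candidate", next(iter(genus_hits)), None)
    # Several genera fired with no specific hit: genus call is ambiguous.
    if genus_hits:
        return ("ambiguous_genus", None, None)
    return ("no_match", None, None)


if __name__ == "__main__":
    lookup = {"speciesA": "genusX"}  # hypothetical names
    print(classify_sample({"genusY"}, set(), lookup))
    # → ('novel_candidate', 'genusY', None)
```

In this sketch, agreement between the two probe types is what upgrades a call to "confirmed", while a genus-only signal is reported as a candidate novel member of that genus rather than being discarded.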
More than 19,600 full-length viral genomes, covering 53 viral families and 214 genera, were obtained from ~340,000 viral sequences archived in the GenBank viral database (Release 152). For viruses with segmented or multipartite genomes, each genome segment was treated as an individual viral sequence. To compute the conserved genus probes for genera with ≤ 5 fully sequenced members and for viruses with segmented or multipartite genomes, additional partially sequenced viral genomes were included, bringing the total number of viral genome sequences used in the computation to 27,610. Viral genome sequences sharing more than 99.5% similarity were treated as redundant, and only the longest sequence in each redundant group was retained. Finally, the genus- and species-specific probes for more than 5,700 viruses are archived in the viral probe database.
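The redundancy filter and the rule for pulling in partial genomes can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `difflib.SequenceMatcher.ratio()` stands in for whatever similarity measure was actually used (the text does not specify one), and sequences are visited longest first so that the longest member of each redundant group is the one retained.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set


def similarity(a: str, b: str) -> float:
    # Stand-in measure (assumption): 2 * matched characters / total length.
    # A real pipeline would more likely use an alignment-based identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundant(seqs: Dict[str, str], threshold: float = 0.995) -> List[str]:
    """Return IDs of retained sequences; drop any > threshold similar to a kept one."""
    kept: List[str] = []
    # Longest first (ties broken by ID), so each redundant group keeps its longest member.
    for sid in sorted(seqs, key=lambda k: (-len(seqs[k]), k)):
        if all(similarity(seqs[sid], seqs[k]) <= threshold for k in kept):
            kept.append(sid)
    return kept


def genera_needing_partials(full_counts: Dict[str, int],
                            segmented: Set[str],
                            max_full: int = 5) -> Set[str]:
    """Genera for which partially sequenced genomes are also included."""
    # Rule from the text: <= 5 fully sequenced members, or segmented/multipartite
    # genomes (simplified here to a hypothetical set of genus names).
    sparse = {g for g, n in full_counts.items() if n <= max_full}
    return sparse | segmented
```

The greedy, longest-first pass makes every pairwise comparison against the kept set, so it scales quadratically; for tens of thousands of genomes a dedicated clustering tool would be the practical choice, but the selection logic is the same.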