All of these vFams were removed during the filtering step because their low strict recall and lack of specificity for viral sequences make them uninformative and potentially misleading when searching for viral sequences in metagenomic datasets. The vFams built from viral sequences with non-viral homologs are present in vFam-B but are not included in vFam-A.

In addition to vFam size and non-viral sequence homology, we examined vFam length as a potential predictor of vFam performance. When we compared strict recall to vFam length, the longest vFams did tend to have high strict recall, but there appeared to be almost no correlation between vFam length and strict recall for vFams of length less than 600. For vFams of length 600 and greater, 96% had strict recall of at least 80%, the filtering threshold employed after cross-validation; for vFams with length less than 600, this number dropped to 83%. Overall, the major contributing factors to higher vFam strict recall were the number of sequences used to build the vFam and the lack of non-viral homologs of the viral sequences used to build the vFam.

Many recent viral discovery projects supported by deep sequencing data relied solely on BLAST to identify and classify viral reads. To compare the performance of the vFams to BLAST on real data, we tested both approaches on three previously published datasets containing viruses in metagenomic backgrounds. These three datasets contain viruses that were novel discoveries spanning a range of divergence from previously known viruses, allowing us to explore the sensitivity and precision of vFams and BLAST in different contexts.

Several known diarrhea-causing viruses were identified in the pool and were removed for the sake of this analysis; the pool also contained 483 reads deriving from a novel 8.0 kilobase picornavirus called Human klassevirus 1.
The closest known relative of klassevirus by amino acid identity at the time of its discovery was Aichi virus, with roughly 40% identity across the length of the polyprotein, which spans almost the entire genome. Approximately 7 million read translations were aligned to the viral BLAST database and the vFam database, and ranked by BLAST E-value and HMMER3's domain E-value, respectively.
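
The ranking step can be illustrated with a minimal Python sketch. It assumes that the BLAST or HMMER3 output has already been parsed into (read, target, E-value) tuples; the function names, the tuple layout, and the toy hits are hypothetical and are not taken from the pipeline used in this study. Each read is represented by its most significant hit, reads are sorted by ascending E-value, and recall and precision are computed for a chosen E-value cutoff against a set of reads known to be viral.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (read_id, target_id, e_value).
Hit = Tuple[str, str, float]


def best_hit_per_read(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each read, the hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for read_id, target_id, e_value in hits:
        if read_id not in best or e_value < best[read_id][2]:
            best[read_id] = (read_id, target_id, e_value)
    return best


def rank_reads(hits: Iterable[Hit]) -> List[Hit]:
    """Rank reads from most to least significant (ascending E-value).

    Ties are broken by read identifier so the ordering is deterministic.
    """
    return sorted(best_hit_per_read(hits).values(), key=lambda h: (h[2], h[0]))


def recall_and_precision(
    ranked: List[Hit], true_reads: Set[str], max_e_value: float
) -> Tuple[float, float]:
    """Recall and precision of the reads called at or below an E-value cutoff."""
    called = {read_id for read_id, _, e_value in ranked if e_value <= max_e_value}
    true_positives = len(called & true_reads)
    recall = true_positives / len(true_reads) if true_reads else 0.0
    precision = true_positives / len(called) if called else 0.0
    return recall, precision


# Toy input for illustration only; these are not results from the datasets.
toy_hits: List[Hit] = [
    ("read1", "targetA", 1e-5),
    ("read2", "targetB", 0.5),
    ("read1", "targetC", 1e-3),
    ("read3", "targetA", 1e-8),
]
toy_ranked = rank_reads(toy_hits)
toy_recall, toy_precision = recall_and_precision(
    toy_ranked, {"read1", "read3", "read4"}, max_e_value=1e-3
)
```

The same functions apply to either search method, provided the E-value field is filled from the BLAST E-value for BLAST hits and from the domain E-value for HMMER3 hits, so that the two rankings can be compared at matching cutoffs.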