Table S1 summarizes the number of SNPs after quality control and the numbers of cases and controls for each of the datasets. More information on these datasets can be found in the original papers. All the datasets are available from the Eastern Cooperative Oncology Group through requests to the operations office. In addition to the three real datasets, we also use a synthetic dataset with 70 cases and 70 controls, 2172 SNPs without differentiation between the cases and controls, and four synthetic high-order SNP combinations of size 3, 4, 5 and 6, respectively, that are associated with the case-control grouping. Note that these four datasets have a much larger number of SNPs than the datasets used in previous studies on high-order SNP interactions. With these four datasets, we will show that the proposed framework is substantially more efficient and scalable than existing approaches. Although the proposed approach cannot directly handle datasets with more than 10,000 SNPs due to the intrinsic computational complexity of high-order SNP combination search, tag SNP selection techniques can be used to first obtain a less redundant set of SNPs before applying the proposed approach. In this way, genome-wide studies with hundreds of thousands of SNPs could also be analyzed.
With this binary encoding of a SNP combination, a χ² test of the association between any combination and a binary phenotype has a fixed degree of freedom of 1, independent of the size of the combination. Here, the goal is to test the association between the presence or absence of the SNP combination, under the binary encoding, and a binary phenotype. Note that other statistical measures can also be used for a similar purpose. This also implies that the proposed framework can handle datasets with imbalanced numbers of cases and controls.
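The test described above can be illustrated with a minimal sketch. It assumes that each sample is a vector of 0/1 items and that a combination counts as present in a sample only when all of its items equal 1; this presence rule, the function names, and the toy data are illustrative assumptions, not taken from the text. The statistic is the standard Pearson χ² for a 2×2 table without continuity correction, so it always has 1 degree of freedom regardless of how many SNPs the combination contains.

```python
from typing import Sequence


def combination_present(sample: Sequence[int], combo: Sequence[int]) -> bool:
    """Assumed presence rule: every item of the combination is 1 in the sample."""
    return all(sample[i] == 1 for i in combo)


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; it has 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so there is no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def combination_chi2(samples: Sequence[Sequence[int]],
                     labels: Sequence[int],
                     combo: Sequence[int]) -> float:
    """Chi-square of presence/absence of `combo` versus case (1) / control (0)."""
    a = b = c = d = 0
    for sample, label in zip(samples, labels):
        present = combination_present(sample, combo)
        if present and label == 1:
            a += 1  # present in a case
        elif present:
            b += 1  # present in a control
        elif label == 1:
            c += 1  # absent in a case
        else:
            d += 1  # absent in a control
    return chi2_2x2(a, b, c, d)


# Toy data (hypothetical): 4 cases followed by 4 controls, 3 binary items each.
samples = [
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 0, 0],  # cases
    [0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 0],  # controls
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Combinations of size 2 and size 3 yield statistics on the same 1-df scale.
print(combination_chi2(samples, labels, (0, 1)))     # 2.0
print(combination_chi2(samples, labels, (0, 1, 2)))
```

Because the contingency table is always 2×2, the two printed statistics can be compared directly even though the combinations differ in size, and the case and control counts need not be equal. If a p-value is wanted, the statistic could be passed to a χ² survival function with 1 degree of freedom, for example `scipy.stats.chi2.sf(stat, df=1)`.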
The degree of freedom being 1 is an important advantage for high-order SNP combination analysis, because most real datasets have a limited number of samples, which is insufficient for estimating the association between a larger combination and a disease phenotype if the degree of freedom of the statistical measure increases with the size of the combination. The fixed degree of freedom also allows direct comparison of the statistics of SNP combinations of different sizes, which is important for quantifying the gain in discriminative power of a SNP combination with respect to its subsets. With the two discriminative SNP combinations shown in Figure 1 and the additional examples in Figure S2, we now describe how to leverage the discriminative pattern mining framework to efficiently search for high-order SNP combinations that are strongly associated with a disease phenotype.