A gene cluster was scored by considering the sum of log-likelihood-based edges between all genes within the cluster. Such a scoring scheme is conceptually equivalent to calculating the expected likelihood that all genes within the cluster will participate in the same genetic
phenotype. To account for the fact that functional interactions between genes in a cluster are not independent, we employed a previously developed de-weighting heuristic (Lee et al., 2004) described in the Experimental Procedures; similar results were obtained with or without the de-weighting procedure (see Table S1). To calculate the p value for the resulting clusters, random events were generated with the same gene count or, alternatively, with the same genomic length as in the observed de novo CNV dataset. The greedy algorithm was then applied to search for high-scoring clusters formed by genes from these random events. p values were assigned to clusters based on the distribution of scores in the randomized data clusters (see Experimental Procedures). We and others have previously used various network-based methods to analyze genetic data from rare and common diseases (Feldman et al., 2008, Franke et al., 2006, Iossifov et al., 2008,
Iossifov et al., 2009, Lango Allen et al., 2010 and Raychaudhuri et al., 2009). NETBAG differs from the previous approaches in several important ways. Specifically, the underlying weighted network does not represent a molecular interaction network or a set of predefined functional pathways, but instead the prior likelihood that any pair of human genes is involved in the same genetic phenotype. NETBAG then defines a formal procedure for identifying strongly connected clusters among a large set of genetically perturbed genes and evaluating the genome-wide cluster significance. The relative importance of specific genes
forming a cluster is then evaluated based on the contribution of genes to the overall cluster score. We are currently working on making the NETBAG method available as a web server; in the meantime, we will be happy to share the developed methodology with any interested parties. The NETBAG approach was directly applied to the experimental CNV dataset described in the companion paper by Levy et al. (2011; this issue of Neuron). This set contained 75 rare de novo CNVs encompassing 746 unique human genes. For our analysis, we combined all overlapping events into a single region and removed all events that did not intersect any genes; we also removed six very large CNV events (length >5 Mb). As a result, the final set used for our analysis contained 47 CNV regions from affected individuals intersecting 433 genes. In addition, Levy et al.