Feb 8, 2007: Analyzing Protein-Protein Interaction Networks: A Case for Ensemble Clustering
Filed in: Colloquium
Prof. Srinivasan Parthasarathy, Ohio State University
Abstract:
In this talk I will describe our recent work on analyzing protein-protein interaction networks. The objective is to find clusters of proteins that may share common functionality. Such clusters can then be used to help identify novel functions of unannotated proteins. A key challenge is that the graphs representing these interactions have distinctive topological and structural properties that make them poorly suited to standard clustering or graph partitioning approaches. A further complication is that current knowledge about these graphs is believed to be both incomplete and noisy: many of the interactions currently reported in the literature are believed to be false positives. Clustering methods that are sensitive to noise and outliers therefore do not work well in this context.
I will begin by describing solutions we have recently developed for this problem, as applied to the protein-protein interaction network of yeast. These include: i) simple preprocessing steps to detect and eliminate potential false-positive interactions, ii) hub duplication, which reduces the impact of hub nodes on graph clustering or partitioning and enables soft clustering of proteins and their interactions based on dense subunits of the graph, and iii) ensemble, or consensus, clustering using different topological measures to make the algorithm more robust to noise. The third solution will be the focal point of this talk: I will demonstrate why this approach shows great promise and how it can significantly improve the quality of the resulting clustering, as evaluated by a range of statistical, information-theoretic, and domain-specific quality measures.
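The abstract does not say how the consensus step is computed. As a rough illustration only, here is one generic way to combine several base clusterings of the same proteins (for instance, clusterings obtained from different topological similarity measures): evidence accumulation over a co-association matrix. Each pair of proteins is scored by the fraction of base clusterings that place the two in the same cluster. Pairs scoring above a threshold are then merged with union-find. This is a standard textbook technique and is not presented as the authors' method. The function names, the 0.5 default threshold, and the toy protein labels are all hypothetical.

```python
from itertools import combinations


def co_association(clusterings, nodes):
    """Return {(u, v): fraction of base clusterings that put u and v together}.

    Each clustering is a dict mapping node -> cluster label (hypothetical format).
    """
    counts = {}
    for labels in clusterings:
        for u, v in combinations(nodes, 2):
            if labels[u] == labels[v]:
                counts[(u, v)] = counts.get((u, v), 0) + 1
    m = len(clusterings)
    return {pair: c / m for pair, c in counts.items()}


def consensus_clusters(clusterings, threshold=0.5):
    """Merge node pairs whose co-association exceeds `threshold` (assumed cutoff)."""
    nodes = sorted(clusterings[0])
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), weight in co_association(clusterings, nodes).items():
        if weight > threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    # Collect the connected components as the consensus clusters.
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Three hypothetical base clusterings of five toy proteins A-E.
base = [
    {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},
    {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1},
    {"A": 0, "B": 0, "C": 0, "D": 1, "E": 2},
]
print(consensus_clusters(base))  # [['A', 'B', 'C'], ['D', 'E']]
```

With the 0.5 threshold, a pair is kept only if a majority of the base clusterings agree on it. Disagreements that appear in just one clustering, such as C grouped with D and E in the second clustering, are voted out. Raising the threshold gives smaller, more conservative clusters.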
This is joint work with my graduate students Sitaram Asur and Duygu Ucar.
Bio:
Srinivasan Parthasarathy is an Associate Professor in the Computer Science and Engineering Department at The Ohio State University (OSU). He heads the data mining research laboratory and has a joint appointment in the Department of Biomedical Informatics at OSU. He is a recipient of an NSF CAREER Award, a DOE Early Career Award, and an Ameritech Faculty Fellowship. His papers have received several awards, including an IEEE Data Mining 2002 best paper award, a SIAM Data Mining 2003 best paper award, the VLDB 2005 best paper award, and a "Best of SIAM Data Mining 2005" selection. He is on the editorial board of IEEE Intelligent Systems and is currently serving as Program Chair for the 2007 SIAM International Conference on Data Mining.