







Using our enhanced methods, we are studying the effect of sequence on folding, particularly for WW domains, and trying to extend simulation capabilities to larger molecules on the millisecond timescale. Of particular interest are the effect of flexibility on catalytic activity, the effect of phosphorylation and other post-translational modifications on structure and dynamics, and the connection between correlated motions and phylogenetic coevolution. We are collaborating with several experimental labs and theory groups in carrying out these studies.






Nearly 40% of the world's population is threatened by malaria [1]. Every year, malaria infects 350-500 million people and kills nearly 1 million, mostly children [2]. Vector control through the use of insecticides has proven to be an effective means of controlling and eliminating malaria in areas such as the southern United States during the early 20th century [3]. Unfortunately, resistance to currently used insecticides is increasing among vectors such as mosquitoes, and very few safe, inexpensive alternative insecticides are available [4]. Needless to say, the problem is dire. To combat malaria, our lab is working with Dr. Frank Collins, Dr. Cate Hill (Purdue), and Dr. Mary Ann McDowell in an interdisciplinary collaboration to design the next generation of safe, inexpensive insecticides using a novel combination of in situ and in silico methods. The project is just beginning: we are currently using bioinformatics techniques to identify viable insecticide targets in the proteomes of mosquitoes and other vectors, which our collaborators can then verify experimentally. In addition, we will draw on our experience in molecular dynamics to perform virtual screening against the identified targets and to provide a mechanism for measuring the toxicity of potential insecticide compounds to humans. We hope to develop general methods that can also be used to develop safe and inexpensive pesticides for other organisms.






The goal of this work, done in collaboration with The Center for Rare and Neglected Diseases at Notre Dame, is to investigate interactions between the parasite and its host. In particular, we are looking at the relationship between P. falciparum and the β2-adrenergic G-protein-coupled receptor. Some type of interaction was shown in 2003 by Harrison et al. [1], but an understanding of this process has so far proved elusive. By building a simulation-based model of the GPCR, we pursue two goals: 1) to better understand the host-parasite interaction and 2) to gain deeper insight into the activation mechanism of GPCRs.






Just as proteins are molecular machines, they are also parts of interacting networks that accomplish tasks in the cell such as responding to internal and external signals, metabolism, translation, and many others. We combine different approaches to the statistical inference of protein-protein interaction (PPI) networks, such as error-correction algorithms from information theory to clean experimental data and weighted set cover algorithms to extract the most from the experimental data available. We combine these networks with biophysically inspired models to study in greater detail particular families of interactors of great biophysical interest, such as kinases and GPCR-G-proteins.
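
As a rough illustration of the set cover idea mentioned above, the sketch below applies the textbook greedy heuristic for weighted set cover to toy data. It is not the inference method used in this work: the function name, the candidate domain pairs, the observed protein pairs, and the weights are all hypothetical, chosen only to show how a low-cost set of candidates can be picked so that every observed interaction is explained.

```python
def greedy_weighted_set_cover(universe, subsets, weights):
    """Pick subsets that cover `universe` using the classic greedy heuristic.

    At each step, choose the subset with the lowest weight per newly
    covered element. This is an approximation, not an exact optimum.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, float("inf")
        for name, members in subsets.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue  # this subset explains nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy data: observed protein pairs (the universe) and the
# pairs each candidate domain-domain interaction could explain.
observed = {"P1-P2", "P1-P3", "P2-P4", "P3-P4"}
candidates = {
    "dA-dB": {"P1-P2", "P1-P3"},
    "dB-dC": {"P2-P4", "P3-P4"},
    "dA-dC": {"P1-P2", "P2-P4", "P3-P4"},
    "dC-dD": {"P1-P3"},
}
costs = {"dA-dB": 1.0, "dB-dC": 1.0, "dA-dC": 2.5, "dC-dD": 1.0}

if __name__ == "__main__":
    print(greedy_weighted_set_cover(observed, candidates, costs))
```

In this toy case the greedy rule selects the two cheap candidates that each explain two observed pairs, rather than the expensive candidate that explains three.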





Methodological Development 
 



Long-timestep integrators for stiff oscillatory ODEs are found in classical mechanics. Over the past decade, my group has developed theory that explains the characteristics that good multiscale integrators for long-time molecular dynamics (MD) simulations should have, such as preserving the geometric structure of the underlying Hamiltonian equations. The same theory explains why molecular dynamics works even in the presence of chaotic solutions that are overwhelmed by numerical error. We have also explained the linear and nonlinear resonances that severely limit the time step possible for multiscale numerical integrators.
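
The value of preserving geometric structure can be seen on a very small example. The sketch below, which is only an illustration and not one of the multiscale integrators discussed here, compares velocity Verlet (a standard symplectic method) with explicit Euler on a one-dimensional harmonic oscillator. The function names, the unit mass and spring constant, and the time step are assumptions chosen for the demonstration.

```python
def velocity_verlet(x, v, dt, steps, k=1.0, m=1.0):
    """Integrate a 1D harmonic oscillator with velocity Verlet (symplectic)."""
    a = -k * x / m
    for _ in range(steps):
        v_half = v + 0.5 * dt * a   # half kick
        x = x + dt * v_half         # full drift
        a = -k * x / m              # force at the new position
        v = v_half + 0.5 * dt * a   # second half kick
    return x, v


def explicit_euler(x, v, dt, steps, k=1.0, m=1.0):
    """Integrate the same oscillator with explicit Euler (not symplectic)."""
    for _ in range(steps):
        a = -k * x / m
        x, v = x + dt * v, v + dt * a
    return x, v


def energy(x, v, k=1.0, m=1.0):
    """Total energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    x0, v0, dt, steps = 1.0, 0.0, 0.1, 1000
    print("initial energy:", energy(x0, v0))
    print("velocity Verlet:", energy(*velocity_verlet(x0, v0, dt, steps)))
    print("explicit Euler: ", energy(*explicit_euler(x0, v0, dt, steps)))
```

With these settings the velocity Verlet energy stays within a small bounded band around its initial value, while the explicit Euler energy grows without bound. For this linear problem velocity Verlet is stable only while the product of time step and oscillator frequency stays below 2, a simple instance of a stability limit on the time step.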






Our understanding of the solution of the MD equations of motion has allowed us to pursue more aggressive techniques for lengthening the time step, enabling the study of time scales in the millisecond range. In particular, we have postulated the existence of an invariant density of normal modes for proteins, which provides a space onto which the fine-grained equations of motion can be projected. We have used the Mori-Zwanzig projection formalism to transform the ODEs of Newton's equations of motion into a stochastic differential equation (SDE) that operates in the space of low-frequency motions. The resulting methodology yields a speedup of 2 to 3 orders of magnitude over conventional MD, while preserving long-timescale probability density functions (PDFs) and time correlations (the computational objectives as explained above). Short fine-grained simulations that solve a time-dependent formulation of the Fokker-Planck PDE allow us to derive the kinetic parameters of our SDE. We have applied our dimensionality reduction technique to the construction of most probable paths between molecular states, as well as to dramatically improve the convergence of Markov chain Monte Carlo (MCMC) methods for sampling the PDFs of molecular simulations.






As previously mentioned, our main goal is to compute the network of states that interconvert in the dynamics of a protein, their populations, and the timescales of the transitions between them. A convenient way to construct such a network is to build Markov State Models (MSMs) from many simulations. MSMs provide an intuitive framework for understanding the results of simulations and offer many opportunities for analysis. We are studying methods to adaptively construct MSMs that preserve detailed balance, applying graph theory to extract most probable paths and other kinetic information, and comparing to Nuclear Magnetic Resonance (NMR) and other experiments for validation.
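
To make the construction concrete, the following minimal sketch builds a transition matrix from discretized trajectories and enforces detailed balance by symmetrizing the transition counts. This is the simplest possible estimator, shown only for illustration; it is not the adaptive construction under study, and the function names, the lag time, and the toy trajectories are assumptions.

```python
def count_transitions(trajectories, n_states, lag=1):
    """Count transitions i -> j observed at the given lag time (in frames)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1.0
    return counts


def reversible_msm(counts):
    """Return (T, pi) from symmetrized counts, so that pi[i]*T[i][j] == pi[j]*T[j][i].

    Symmetrizing counts is a crude way to impose detailed balance; it
    assumes the input trajectories sample equilibrium reasonably well.
    """
    n = len(counts)
    sym = [[0.5 * (counts[i][j] + counts[j][i]) for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sym]
    if any(r == 0.0 for r in row_sums):
        raise ValueError("every state needs at least one observed transition")
    total = sum(row_sums)
    T = [[sym[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
    pi = [r / total for r in row_sums]  # stationary populations
    return T, pi


# Hypothetical state assignments for two short trajectories (3 states).
toy_trajectories = [
    [0, 0, 1, 1, 2, 2, 1, 0, 0, 1],
    [2, 2, 1, 1, 0, 0, 0, 1, 2, 2],
]

if __name__ == "__main__":
    T, pi = reversible_msm(count_transitions(toy_trajectories, n_states=3))
    print("populations:", pi)
    for row in T:
        print(row)
```

The rows of `T` are the transition probabilities out of each state and `pi` gives the state populations; timescales and most probable paths would be derived from `T` in a fuller analysis.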






We have shown that numerical methods for Hamiltonian systems behave much better at long times when they can be interpreted, through backward error analysis, as solving a perturbed, modified Hamiltonian. In the same spirit, we are studying the properties that methods for solving SDEs should have. For example, statistical mechanics requires that our coarse-grained equations of motion satisfy the fluctuation-dissipation relation in order to have an equilibrium distribution. We are studying the effect of the time step and other parameters on kinetics and sampling, with preliminary results suggesting that the requirements on the method differ depending on the application. This has led us to design superior integrators for SDEs and to make progress in the construction of computational error estimators.
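
The role of fluctuation-dissipation can be illustrated with a small Langevin simulation. The sketch below uses the BAOAB splitting, a well-known published scheme that stands in here for the integrators under development, on a one-dimensional harmonic oscillator. All parameter values, the function name, and the fixed random seed are assumptions made for the demonstration.

```python
import math
import random


def baoab_harmonic(steps, dt=0.1, gamma=1.0, k=1.0, m=1.0, kT=1.0, seed=0):
    """Sample a 1D harmonic oscillator with the BAOAB Langevin splitting.

    Returns the time averages of x**2 and v**2 over the run.
    """
    rng = random.Random(seed)
    # Fluctuation-dissipation: the friction factor c1 and the noise
    # amplitude c2 are tied together, so the O step leaves the
    # Maxwell-Boltzmann velocity distribution at temperature kT invariant.
    c1 = math.exp(-gamma * dt)
    c2 = math.sqrt((1.0 - c1 * c1) * kT / m)
    x, v = 0.0, 0.0
    force = -k * x
    sum_x2 = sum_v2 = 0.0
    for _ in range(steps):
        v += 0.5 * dt * force / m               # B: half kick
        x += 0.5 * dt * v                       # A: half drift
        v = c1 * v + c2 * rng.gauss(0.0, 1.0)   # O: friction plus noise
        x += 0.5 * dt * v                       # A: half drift
        force = -k * x
        v += 0.5 * dt * force / m               # B: half kick
        sum_x2 += x * x
        sum_v2 += v * v
    return sum_x2 / steps, sum_v2 / steps


if __name__ == "__main__":
    mean_x2, mean_v2 = baoab_harmonic(200000)
    # Equipartition predicts <x^2> = kT/k and <v^2> = kT/m (both 1.0 here).
    print("<x^2> =", mean_x2, " <v^2> =", mean_v2)
```

If the noise amplitude `c2` were chosen independently of the friction factor `c1`, the sampled averages would drift away from the equipartition values, which is one way to see why the relation matters for recovering an equilibrium distribution.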







Folding@Home is a distributed computing resource for molecular dynamics simulations, run by Vijay Pande at Stanford University. Long simulations are broken up into small workunits that run on users' computers. The results are then collected and analyzed. The amount of data presents several challenges for maintenance and analysis: an iterative approach is infeasible, so we are developing a set of management and analysis tools that take advantage of as many distributed resources as possible. We are running several experiments on Folding@Home for the WW domain. This domain is composed of 35 to 40 residues that form three beta-strands and is associated with protein signaling processes. The goal of one set of experiments is to better understand how the WW domain folds. To do so, we are comparing experimental results for a set of engineered WW mutants with the simulation results.






ProtoMol is an object-oriented, component-based framework for molecular dynamics (MD) simulations. Originally designed for prototyping new MD methods, ProtoMol is modular and easily extended thanks to its object-oriented architecture. We use ProtoMol to test and validate the methods we have developed, such as NML, and to run the MD simulations used in our collaborations with the experimental community, such as the WW project.






Studying processes such as protein folding and protein dynamics generates large amounts of data: millions of trajectories over multiple projects result in several terabytes of data. To obtain results in a reasonable amount of time, the analyses must be run in parallel. The Protolyze and Prototools projects are our approach to this problem. The goal of Protolyze is to provide a framework for managing the analyses (submitting jobs to the Sun Grid Engine, coordinating database access, etc.). Prototools is a repository of tools and workflows used to run various analyses (calculating RMSDs, secondary structure, etc.).
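
The pattern of mapping one analysis over many independent trajectories can be sketched in a few lines. The example below is not Protolyze or Prototools code: the function names are hypothetical, the RMSD removes translation only (no rotational superposition, which a production analysis would normally include), and a local thread pool stands in for a batch scheduler such as the Sun Grid Engine.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def centered(coords):
    """Shift a list of (x, y, z) points so that its centroid is at the origin."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [tuple(p[i] - c[i] for i in range(3)) for p in coords]


def rmsd(frame, reference):
    """RMSD between two conformations after removing translation only."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same atom count")
    a, b = centered(frame), centered(reference)
    sq = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(sq / len(a))


def trajectory_rmsd(trajectory, reference):
    """RMSD of every frame in one trajectory against the reference."""
    return [rmsd(frame, reference) for frame in trajectory]


def analyze_all(trajectories, reference, max_workers=4):
    """Run the per-trajectory analysis concurrently, preserving input order.

    Threads are used only to show the map pattern; CPU-bound work would
    normally go to separate processes or cluster jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: trajectory_rmsd(t, reference),
                             trajectories))


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
    bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 3.0)]
    print(analyze_all([[ref, shifted], [bent]], ref))
```

Because each trajectory is analyzed independently, the same function can be handed to whatever execution backend is available without changing the analysis itself.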






OpenMM is a library of molecular dynamics (MD) method implementations designed to allow MD simulation packages to take advantage of hardware (GPU) acceleration with minimal effort. We are working with the OpenMM group to make a GPU-accelerated implementation of our NML method available to OpenMM users. We believe that doing so will encourage the availability of NML in widely used MD simulation packages such as GROMACS and its adoption by the MD community.






Cytoprophet is a project developed by the Laboratory for Computational Life Sciences at the Computer Science Department of the University of Notre Dame. It is a tool that helps researchers infer potential new protein-protein (PPI) and domain-domain (DDI) interactions. It is implemented as a Cytoscape plugin: users input a set of proteins and retrieve a scored network of plausible protein and domain interactions. Three algorithms are used to estimate PPIs and DDIs: the Maximum Specificity Set Cover (MSSC) approach, Maximum Likelihood Estimation (MLE), and the Sum-Product Algorithm (SPA) for protein networks. For more details, refer to the documentation.





