Dr. Bonneau is a faculty member at New York University's Center for Comparative Functional Genomics, where he is a joint member of both the Biology and Computer Science departments. His affiliate position with the Institute for Systems Biology offers many exciting possibilities for exchange of ideas between these two centers, both focused on applying functional genomics to a wide range of systems. His research efforts are focused on making computational tools, algorithms and methods for systems-wide elucidation of biological systems. His research aims to develop computational methods at the intersection of two interrelated fields: protein structure and functional genomics. Rich is also currently the technical lead on two grid computing collaborations with IBM -- the first and second phases of the Human Proteome Folding Project.
Dr. Bonneau did his doctoral work at the University of Washington in Seattle working with David Baker on the state of the art protein structure modeling and prediction platform, Rosetta. Rich is currently a member of the Rosetta commons and continues to develop and work with Rosetta as part of several projects in the lab. Before joining NYU he worked as a Senior Scientist -- with Leroy Hood -- at the Institute for Systems Biology. Rich also oversees TACITUS's (www.tacitus.com) approach to data gaming for all applications that focus on genomics, computational biology and cell biology.
Areas of Research:
Rosetta de novo structure prediction: extracting function from de novo structure predictions.
Recent progress in de novo structure prediction methods has resulted in methods with increased accuracy that are applicable to greater numbers of proteins. When combined intelligently with other structure prediction methods, de novo structure prediction can contribute to systems biology in several ways. While still highly experimental, such applications include 1) structural annotation on a genome-wide scale and 2) synergy with experimental approaches to structural genomics, such as the derivation of distance constraints from mass spectrometry. I will describe the underlying methodologies common to current de novo prediction methods, focusing on core concepts rather than specific implementations, groups or methods. Possible applications of de novo structure prediction will also be reviewed. Our latest results cover 80 complete genomes, including many model organisms being actively studied at NYU. This work is being carried out in collaboration with David Baker and IBM WCGrid. See also: WCGrid message board posts by Dr. Bonneau; description of the HPF project. Results from the Human Proteome Folding Project can be found at the Public Data Repository associated with the Yeast Resource Center.
Reference: Malmstrom L, Riffle M, Strauss CEM, Chivian D, Davis TN, et al. (2007) Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology. PLoS Biol 5(4): e76. doi:10.1371/journal.pbio.0050076.
Regulatory network inference: learning the structure of biological circuits from data:
We have developed a methodology for deriving transcriptional regulatory interactions on a genome-wide scale, and have applied the method to predict a large portion of the gene regulatory network of the archaeon Halobacterium NRC-1. The learned network is predictive, learned entirely from data de novo, and was used to successfully predict the global expression of Halobacterium under novel perturbations (not part of the original training set) with predictive power similar to that seen over the training set. Methodological advancements over earlier work include an explicit treatment of time such that the network model can be fit using both steady-state measurements and heterogeneous time series simultaneously. The method contains a novel means for learning binary logic interactions between regulators that requires no discretization of data. This work was done in tight collaboration with Nitin Baliga, Vesteinn Thorsson, David Reiss and Lee Hood at the ISB. There are many interesting future directions on this project, including: adding tighter control via user-defined constraints and the addition of known interactions to the current framework, use of proteomic and metabolomic data, using the networks for engineering, and applying the method to progressively more complex organisms. This program, the Inferelator, is a companion program to cMonkey.
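The idea of fitting one model to steady-state measurements and time series together can be illustrated with a small sketch. It assumes a simple first-order kinetic form, tau * dy/dt = -y + (b0 + b . X), which is an assumption made here for illustration and not a statement of the Inferelator's implementation. Under that assumption a steady-state sample (dy/dt = 0) uses y itself as the response, while each consecutive pair of time points uses tau * (y[m+1] - y[m]) / dt[m] + y[m], so both kinds of data can be stacked into one regression. The function names, the value of tau, and the use of ordinary least squares (where a sparsity-inducing fit would be needed to obtain parsimonious networks) are all hypothetical choices.

```python
import numpy as np


def stacked_response(y_steady, y_series, dt, tau):
    """Build one response vector from steady-state and time-series data.

    Steady-state samples contribute y directly (dy/dt = 0).
    Consecutive time points contribute tau * (y[m+1] - y[m]) / dt[m] + y[m].
    """
    y_steady = np.asarray(y_steady, dtype=float)
    y_series = np.asarray(y_series, dtype=float)
    dt = np.asarray(dt, dtype=float)  # may be uneven (heterogeneous sampling)
    ts = tau * np.diff(y_series) / dt + y_series[:-1]
    return np.concatenate([y_steady, ts])


def fit_target(X_steady, X_series, y_steady, y_series, dt, tau):
    """Least-squares fit of intercept + regulator weights for one target gene."""
    X_steady = np.asarray(X_steady, dtype=float)
    X_series = np.asarray(X_series, dtype=float)
    # Regulator levels at time m are paired with the change from m to m+1.
    X = np.vstack([X_steady, X_series[:-1]])
    X = np.column_stack([np.ones(X.shape[0]), X])  # intercept column
    y = stacked_response(y_steady, y_series, dt, tau)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic example: two regulators, noise-free data generated from the model.
rng = np.random.default_rng(0)
true = np.array([0.5, 2.0, -1.0])  # intercept, regulator 1, regulator 2
tau = 10.0

X_ss = rng.normal(size=(6, 2))
y_ss = true[0] + X_ss @ true[1:]

X_ts = rng.normal(size=(8, 2))
dt = np.array([1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 10.0])  # uneven time steps
y_ts = np.empty(8)
y_ts[0] = 0.0
for m in range(7):
    drive = true[0] + X_ts[m] @ true[1:]
    y_ts[m + 1] = y_ts[m] + dt[m] / tau * (drive - y_ts[m])

coef = fit_target(X_ss, X_ts, y_ss, y_ts, dt, tau)
print(coef)
```

Because the synthetic data are generated without noise from the same assumed model, the fit recovers the generating coefficients; real expression data would need regularization and model selection on top of this.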
cMonkey: systems biology (bi)clustering using multiple relevant data-types.
Grouping genes into functionally related and putatively co-regulated clusters is an essential first step for the inference of regulatory networks (one could think of many reasons for doing this but network inference is a good problem to start with). It is widely known that regulatory relationships among genes can vary under diverse environmental settings, and that co-expressed genes are not necessarily under control of the same regulator(s). This leads to patterns of co-expression that are valid under some, not all, observed conditions.
With such considerations in mind, we have developed cMonkey, an unsupervised learning procedure for detecting putatively co-regulated gene clusters by integrating diverse systems biology data including: (1) mRNA and/or protein expression levels, (2) cis-regulatory sequences, and (3) functional association and physical interaction networks. The method determines, for each cluster detected, the significant (a) subset of experimental conditions under which its genes are co-expressed, (b) cis-regulatory sequence motifs that putatively mediate their regulation, and (c) highly-connected subnetworks in association networks that provide supporting evidence that the genes are functionally related.
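As a toy illustration of two of the evidence types listed above, the sketch below picks the conditions under which a candidate gene cluster is tightly co-expressed, and measures how densely those genes are connected in an association network. It is not cMonkey's scoring scheme: the variance-ratio criterion, the `max_ratio` threshold, the function names, and the small data matrices are all hypothetical, and motif detection (item b) is left out entirely.

```python
import numpy as np


def coherent_conditions(expr, genes, max_ratio=0.25):
    """Return indices of conditions where the cluster genes are co-expressed.

    A condition is kept when the variance among the cluster genes is small
    relative to the variance among all genes in that condition.
    """
    expr = np.asarray(expr, dtype=float)
    within = expr[genes].var(axis=0)   # spread inside the cluster
    overall = expr.var(axis=0)         # spread over all genes
    ratio = np.divide(within, overall,
                      out=np.full_like(within, np.inf),
                      where=overall > 0)
    return np.flatnonzero(ratio < max_ratio)


def edge_density(adj, genes):
    """Fraction of possible gene pairs in the cluster that share an edge."""
    adj = np.asarray(adj)
    n = len(genes)
    if n < 2:
        return 0.0
    sub = adj[np.ix_(genes, genes)]
    edges = np.triu(sub, k=1).sum()    # count each undirected edge once
    return float(edges) / (n * (n - 1) / 2)


# Rows are genes, columns are conditions. Genes 0-2 track each other in the
# first three conditions and diverge in the last two.
expr = np.array([
    [2.0, -1.5, 1.0, 3.0, -2.0],
    [2.1, -1.4, 1.1, -3.0, 2.0],
    [1.9, -1.6, 0.9, 0.0, 0.0],
    [-2.0, 1.0, -1.0, 1.0, 1.0],
    [0.0, 2.0, -2.0, -1.0, -1.0],
    [-1.0, 0.5, 2.0, 0.5, 0.5],
])

# Symmetric association network with edges 0-1, 0-2, 1-2 and 3-4.
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    adj[i, j] = adj[j, i] = 1

cluster = [0, 1, 2]
conditions = coherent_conditions(expr, cluster)
density = edge_density(adj, cluster)
print(conditions, density)
```

A bicluster in this sense is the pair (genes, conditions); a fuller procedure would also add and drop genes iteratively and weigh the different evidence types against each other.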
We have performed this analysis on publicly-available data sets of widely varying size and quality, for four distantly related organisms covering all three domains of life: Helicobacter pylori, Escherichia coli, Saccharomyces cerevisiae, and Halobacterium NRC-1. In each of these organisms, cMonkey has enabled the discovery of both known regulons and novel putative regulons. When regulons correspond to previously characterized TF binding sites we see good agreement with the motifs detected by cMonkey. Gene clusters detected using cMonkey provide a firm foundation for inferring genetic regulatory networks and for assigning putative functions to genes of unknown function.
This work is being carried out in collaboration with Nitin Baliga and David Reiss here at the ISB. We have performed this analysis on several organisms, but a clear challenge, and the heart of future development, lies in applying and adapting this method to larger organisms, such as some of the multicellular systems being studied at NYU or human, as well as to consortia of prokaryotic organisms. In practice this means we've got a lot of exciting work cut out for the near future.
Here are links to software that we are developing. Code developed at NYU or ISB is freely available; for codes developed as part of larger multi-institution efforts we provide links to those efforts. In general we are an open source shop.
cMonkey: Learns significant clusters, control elements and subnetworks from diverse systems-biology data. Written in R, free after publication.
Inferelator: Learns parsimonious regulatory networks from systems biology datasets. Companion to cMonkey. Written in R, free after publication.
Rosetta: state-of-the-art structure prediction and way more (licensed via the Rosetta commons).
Robetta: web server for rosetta structure prediction and Ginzu protein-domain parsing (submit a few of your favorite unknown proteins here).
BionetBuilder: build any network for any organism. The tool helps you build your network via a graphical user interface, then displays the network in cytoscape.
Cytoscape: A systems biology data-integration and viz platform. Initially developed to map expression data onto biological networks.
The Gaggle: Data-integration platform (developed by Paul Shannon). A way to manage many different data-types and views as a multi-threaded gaggle controlled by the gaggle-boss.
Yeast-Resource center fold/fnx database: Rosetta and Ginzu predictions for Yeast genomes, and a whole lot more. This was done in collaboration with Trisha Davis, David Baker, Lars Malmstroem, Mike Riffle (all at UW). [available soon]
Human Proteome Folding Results: Rosetta and Ginzu predictions for over 90 complete genomes, and a whole lot more. Fold and function predictions generated on the WCGrid. This was done in collaboration with Trisha Davis, David Baker, Lars Malmstroem, Mike Riffle (all at UW). [available soon]
Teaching and Outreach Experience:
New Course Development, NYU dept. of Computer Science, dept. of Biology:
Designed and taught Bioinformatics and Genomics, a graduate introduction to bioinformatics in a systems biology context. Team-taught BioCore (a new course introducing our graduate students to functional genomics and modern biology).
Professional Development for high school teachers 2007 w/ Steinhardt school of education, NYU. This course aims to provide high school teachers with the very latest in the genomics revolution, and provides teachers with an inquiry-based unit on deciphering genomes designed for the classroom. (NYU Steinhardt)
Halobacterium Curriculum Development, 2002-2004. As part of ISB's ongoing commitment to revolutionizing high-school biology curriculum I have participated in our summer program aimed at creating an inquiry based high-school systems-biology curriculum by working with high school interns and local teachers.
High School Course Design, 1997-1998. NOVA High, Seattle. Designed and taught a high school biochemistry curriculum for students with little or no science education ages 14-22. See www.novaproj.org
Key collaborations within ISB:
Baliga Lab: Functional genomics of prokaryotes, computational biology, data visualization.
Hood Lab: Structure based annotation of Disease Biomarkers, grid computing.
Aderem Lab: Structure Based annotation of immune proteins.
Key collaborations outside ISB:
Dave Goodlett and Lars Malmstrom: University of Washington, annotation and functional genomics of Gram Negative pathogens.
IBM, World Community Grid: wcgrid.org
Iliana Avila-Campillo, Kevin Drew, John Lin, David J. Reiss, Richard Bonneau. BioNetBuilder, an automatic network interface. (2007) Bioinformatics. Feb 1;23(3):392-3. Epub 2006 Nov 30.
Malmström L, Riffle M, Strauss CEM, Chivian D, Davis TN, Bonneau R and Baker D. Genome-wide superfamily assignments for Saccharomyces cerevisiae protein domains through integration of de novo structure prediction with the gene ontology. (2007) PLoS Biol 5(4): e76. doi:10.1371/journal.pbio.0050076
Andersen-Nissen E, Smith KD, Bonneau R, Strong RK, Aderem A. A conserved surface on Toll-like receptor 5 recognizes bacterial flagellin. (2007) J Exp Med. 2007 Feb 19;204(2):393-403. Epub 2007 Feb 5.
Madar, A., Bonneau, R. Learning global models of transcriptional regulatory networks from data. Chapter in Computational Systems Biology (2007). Humana Press, [In press]
Bonneau, R. (2007) De Novo Structure Prediction: methods and applications. Chapter 12 in Bioinformatics: From Genomes to Therapies. Wiley-VCH. ISBN-10: 3-527-31278-1
Bonneau R, Reiss DJ, Shannon P, Hood L, Baliga NS, Thorsson V (2006) The Inferelator: a procedure for learning parsimonious regulatory networks from systems-biology data-sets de novo. Genome Biol. 7(5):R36.
David J Reiss, Nitin S Baliga, Bonneau R. (2006) Integrated biclustering of heterogeneous genome-wide datasets. BMC Bioinformatics. 7(1):280.
Shannon P, Reiss DJ, Bonneau R, Baliga NS (2006) The Gaggle: A system for integrating bioinformatics and computational biology software and data sources. BMC Bioinformatics. 7:176.
Flory MR, Lee H, Bonneau R, Mallick P, Serikawa K, Goodlett D, Morris D, Aebersold R. (2006) Quantitative proteomic analysis of the budding yeast cell cycle using acid-cleavable isotope-coded affinity tag reagents. Proteomics 2006 Dec;6(23):6146-57
Zhang, H., Loriaux, P., Eng, J., Bonneau, R., Smith, R & Aebersold, R. UniPep, a database for human N-linked glycosites: A resource for biomarker discovery. Genome Biology, [In Press]