Systems biology research is still in its infancy.
Maturation of the field will proceed as the many challenges that it
faces are addressed and successfully solved. The most
pressing challenges fall roughly into the following four
categories:
Experimental -- pertains to strategies for designing
experiments and collecting reliable data.
Technological -- pertains to the development of new
instrumentation for making rapid, highly parallel, inexpensive and
accurate measurements of informational molecules and their
sequence, structure, modifications or processing, localization and
interactions with other components large and small.
Computational -- pertains to the development and
refinement of network theory and effective engineering of
simulation tools, so that descriptive networks can be replaced by
more accurate dynamic models of the system’s molecular
interactions.
Sociological -- pertains to effective communication
across disciplines, the dynamics of research teams, difficulties
obtaining funding, and the like.
Experimental challenges: Paradoxically, systems biology
research suffers from having both too much data and not
enough. The volumes of data generated, for example, by large-scale
microarrays and yeast two-hybrid experiments present two types of
challenges:
1) Data space is infinite, and to gather
relevant information for a particular system, hypotheses must be
carefully formulated to search relevant data space. 2) Most
large data sets are replete with technical and biological noise, or
various types of systematic error. Technical noise arises from
methodological irreproducibility or sample variation. Biological
noise (i.e., the “false positive” problem) stems from the
stochasticity of cell populations and the extreme sensitivity of
some of the techniques employed. The
integration of global data sets from different laboratories may represent a serious
challenge because of different approaches to the same data
measurement and different levels of data error. Thus,
data retrieved from outside sources, though potentially useful, may
not be integrable with one’s own data, nor may it be directly
pertinent to the system being explored. Metrics
are needed to evaluate, validate and integrate these large
datasets; otherwise, conclusions drawn from them may be misleading.
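As one illustration of what such a metric could look like, the sketch below pools measurements of the same quantity from different laboratories by inverse-variance weighting, so that sources with larger error contribute less. This is only a minimal example under strong assumptions that the text does not make: the measurements are on a common scale, they are independent, and each comes with a known standard error. The function name and the numbers are hypothetical.

```python
from math import sqrt


def combine_inverse_variance(measurements):
    """Pool (value, standard_error) pairs into one estimate.

    Each measurement is weighted by 1 / standard_error**2, so noisier
    sources contribute less to the pooled value.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    weights = []
    for _value, std_err in measurements:
        if std_err <= 0:
            raise ValueError("standard errors must be positive")
        weights.append(1.0 / std_err ** 2)
    total_weight = sum(weights)
    pooled = sum(
        w * value for w, (value, _) in zip(weights, measurements)
    ) / total_weight
    pooled_std_err = sqrt(1.0 / total_weight)
    return pooled, pooled_std_err


# Hypothetical log2 expression ratios for one gene from two laboratories,
# given as (value, standard error). Lab A is assumed to be the less noisy.
lab_a = (2.0, 0.5)
lab_b = (1.0, 1.0)
pooled, pooled_se = combine_inverse_variance([lab_a, lab_b])
print(round(pooled, 3), round(pooled_se, 3))
```

The pooled value sits closer to the less noisy source. When the assumptions above fail, for example when laboratories differ by systematic rather than random error, a weighting of this kind would not be appropriate on its own.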
It is anticipated that universally applicable standards and
attention to data quality will increase the information content of
high-throughput datasets. However, extracting relevant and
meaningful conclusions from the data will require application of
the systems biology cycle--hypothesis formation, experimental
design, modeling data into interaction networks, successive
refinements of models through perturbation studies, etc.
On the other hand, biology is still “data-poor.”
This is obviously true if data space is infinite. The
available experimental data for each species addresses only a very
sparse patchwork of cellular processes under a limited number of
experimental conditions. The current platforms for high-throughput
analyses are expensive and inaccessible to many researchers. In
some cases, experimentation is limited by sample availability, as
is true for many medical applications. Some experimental
methods will not detect a system’s rare elements or transient
interactions, with the result that important data will be missed
(i.e., the “false negative” problem). Moreover, only
one type of measurement is typically performed within an
experimental platform, whereas what one really wants is the
simultaneous measurement of multiple parameters in the same sample
at the same time. To this end, miniaturization technologies such as
nanosensors and microfluidics devices are being invented for the
purpose of reducing sample volume, increasing sample throughput,
increasing measurement accuracy and multiplexing different types of
measurements on the same sample, perhaps even on a single cell!
Systems biology also suffers from the criticism that has
historically beset biochemistry, namely, that for most data
collection strategies, the cell must be destroyed in order for
interactions between its molecular components to be assessed, and
this calls into question whether the inferences drawn are indeed
accurate reflections of the dynamics of the living cell. This
difficulty is being partially addressed by advances in less
invasive technologies such as molecular imaging. As a general
point, to effectively investigate the “in between”
domain of parts and wholes, that is, the molecular interactions and
networks, effective reporter assays must be developed so that
system perturbations can be selectively targeted to specific cells
in a population or specific networks in a cell, and so that their
effects can be measured accurately and quantitatively as dynamic
changes over both space and time.
Technological challenges: A requirement of systems biology is
that to specify systems, millions of measurements must be
made. These include measurements to characterize mRNAs,
proteins, small molecules and other cellular components.
These measurements must identify and quantify the components, characterize component variations (e.g., mRNA splicing or protein processing or
modification), localize the components and measure their turnover rates. The need
to miniaturize, parallelize, automate and integrate the separate
components of procedures, as well as to increase the throughput and
reduce the cost of measurements, pushes us toward the development
of measurement tools employing microfluidics and
nanotechnology. These will lead to the digitalization of
biology, that is, the ability to obtain information from single
molecules or the information content of single cells. We must
also develop more powerful and sensitive molecular imaging
techniques to characterize molecular behavior in vivo.
These challenges are beginning to be met by the
NanoSystems Biology Alliance.
Computational challenges: A goal of systems biology is to
formulate initial working models for the biological networks that
are predictive of both the kinetic and equilibrium behavior of the
system in question. This is especially true for regulatory
networks, which involve macromolecular complexes of
transcription factors and batteries of genes which are expressed or silenced
based on the DNA-protein binding interactions between the
gene’s cis-regulatory elements and components of the
transcription factor complex.
Because of the complexity of the feedback loops involved, it may not be clear which biomolecular
species to measure and to what accuracy. This problem might
be overcome if we could obtain a clearer understanding of which DNA
sequence motifs function as regulatory elements, and if we could
dissect large regulatory networks into smaller subnetworks or
modules by using various models of connectivity such as GO
annotations.
All-by-all genome comparisons of species of
varying evolutionary relatedness might assist with the
identification of non-coding sequences that are important to gene
regulation. Advances in network topology theory and
visualization tools might enable biologists to assemble data into
network models that better portray the kinetics of molecular
interactions across diverse types of elements. Simulations
based on these models should enhance and refine experimental
design, thereby speeding up application of the systems biology
cycle. Unfortunately, we are just at the beginning in
terms of the data that needs to be collected and the
algorithm development that must occur. The ISB has formulated 11
computational or mathematical challenges in contemporary biology:
1) How to fully decipher the (digital) information content of the genome
2) How to do all-vs-all comparisons of 1000s of genomes
3) How to extract protein and gene regulatory networks from 1 & 2
4) How to integrate multiple high-throughput data types dependably
5) How to visualize & explore large-scale, multi-dimensional data
6) How to convert static network maps into dynamic mathematical models
7) How to predict protein function ab initio
8) How to identify signatures for cellular states (e.g. healthy vs. diseased)
9) How to build hierarchical models across multiple scales of time & space
10) How to reduce complex multi-dimensional models to underlying principles
11) Text searching to bring the literature and experimental data together
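To give a flavor of the challenge of converting static network maps into dynamic mathematical models, the sketch below takes a small signed edge list (the static map) and turns it into a set of rate equations that are integrated over time. Everything in it is an illustrative assumption rather than a method described here: the three genes and their edges are hypothetical, the Hill-type response and the production and decay constants are arbitrary choices, and forward Euler integration is used only for simplicity.

```python
# Static map: (regulator, target, sign); +1 activates, -1 represses.
# The genes and edges are hypothetical.
EDGES = [("A", "B", +1), ("B", "C", +1), ("C", "A", -1)]


def hill(x, sign, k=1.0, n=2):
    """Saturating response in [0, 1]: rising for activators, falling for repressors."""
    activation = x ** n / (k ** n + x ** n)
    return activation if sign > 0 else 1.0 - activation


def derivatives(state, edges, production=1.0, decay=0.5):
    """Rate of change per gene: production * (product of regulator responses) - decay * level.

    A gene with no incoming edge is produced at the full rate.
    """
    rates = {}
    for gene, level in state.items():
        drive = 1.0
        for regulator, target, sign in edges:
            if target == gene:
                drive *= hill(state[regulator], sign)
        rates[gene] = production * drive - decay * level
    return rates


def simulate(state, edges, dt=0.01, steps=5000):
    """Integrate with the forward Euler method and return the whole trajectory."""
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = derivatives(state, edges)
        state = {g: max(0.0, level + dt * rates[g]) for g, level in state.items()}
        trajectory.append(state)
    return trajectory


trajectory = simulate({"A": 1.0, "B": 0.0, "C": 0.0}, EDGES)
final = trajectory[-1]
print({gene: round(level, 3) for gene, level in final.items()})
```

The edge list alone says only who regulates whom; the dynamic model additionally needs functional forms and rate constants, which is where the measurements discussed above come in. In practice these parameters would have to be estimated from data, and a more robust integrator would normally replace the Euler loop.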
Sociological challenges: How do you get biologists and
engineers with expertise in experimental design and execution to
join forces with computer scientists and mathematicians with
expertise in algorithm development, and have everyone agree on the
choice of model system, biological process, and strategies for
investigating it? For systems biology research to be effective,
that’s what has to happen.
First, it is helpful to have all of the team members in one location, rather than spread across
several academic departments.
Second, interactions among team members will be more or less productive, depending on the amount of
cross-disciplinary training and experience possessed by the team
members; their willingness to learn new languages of science
(e.g., non-biologists learning the language of biology) and sets of
concepts; and the respect each has for the knowledge and experience
of the others.
Experimentalists want to be seen as more than
“technicians” and computational biologists as more than just
“programmers.” Thus, the organizational culture must be
developed so that communication works well within the
teams.
Team-oriented science poses predictable difficulties in terms of
ownership of data, proper attribution/credit in terms of journal
authorship, and career advancement for the individual
members. For people to be motivated to invest significant
energy into systems biology research, there must be effective
metrics for success that reward individual team members
accordingly.
Because systems biology is new and considered risky, funding the
research through federal grants is more difficult, given the
conservative temperament of many study sections. The problem is
exacerbated by the high costs involved with global data collection
and analysis. A viable project might well exceed the limits
of a standard R01 NIH research grant. Thus, sustainable
sources of funding must be identified and procured.
Scientists pursuing systems biology research are meeting these
challenges by establishing their own institutes or organizations.
There is a “self selection” for interested
team-oriented investigators who want to practice biology in a
cross-disciplinary or inter-disciplinary context. Such
organizations position themselves midway between academia and
industry. They keep the freedom of relatively unconstrained
intellectual pursuit and long-term goals, yet establish
implementation strategies closer to what one might find in
industry (i.e., interdisciplinary teams and centralized
high-throughput facilities).
Even after the field of systems biology is established,
fulfilling its promise will require that changes occur across a
broader context of inquiry and training. For example, primary
care physicians will need to become more sophisticated about
genetics and its effects on preventive, predictive and personalized
medicine.