Frequently Asked Questions
Q: What is a stochastic simulation?
Stochastic simulation, also known as kinetic Monte Carlo, is a numerical procedure for computing the dynamics of a continuous-time Markov process.
A continuous-time Markov process is a memoryless stochastic process that is used to describe many kinds of physical and chemical systems, including
biological systems. The solution of a Markov process can be expressed in two equivalent forms: a time-dependent probability distribution of states in state
space, or trajectories of the state moving through state space over time. By computing an ensemble of trajectories, one can generate the distribution. Or,
by computing the distribution, one can sample it to obtain trajectories. If the state space of the Markov process is discrete, it is called a jump
Markov process and its probability distribution is governed by the Master equation. If the state space is continuous, it is called a continuous
Markov process and its distribution is governed by the Fokker-Planck equation. For all but the most trivial systems, the Master equation is
analytically unsolvable. For high-dimensional systems (i.e., many chemical species), the Fokker-Planck equation is impractical to solve. Stochastic
simulation is a way to generate trajectories of a Markov process and then to estimate the distribution from an ensemble of those trajectories, effectively
sidestepping the problems with solving either the Master or Fokker-Planck equations. Of course, for some systems, stochastic simulation can be just as
impractical, motivating the use of hybrid stochastic simulation methods.
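To make the trajectory-then-distribution idea concrete, here is a minimal sketch of a direct-method stochastic simulation for a toy birth-death process (0 -> X at a constant rate, X -> 0 at a rate proportional to the count of X). The reaction system, rate constants, and function names are illustrative assumptions and are not taken from Hy3S.

```python
import random


def simulate_birth_death(k_birth, k_death, x0, t_end, rng):
    """Return (times, states) for one trajectory of 0 -> X, X -> 0."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_birth = k_birth          # propensity of 0 -> X
        a_death = k_death * x      # propensity of X -> 0
        a_total = a_birth + a_death
        if a_total == 0.0:
            break                  # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


def final_states(n_traj, seed=0, **params):
    """Run n_traj independent trajectories and collect the state at t_end."""
    rng = random.Random(seed)
    return [simulate_birth_death(rng=rng, **params)[1][-1] for _ in range(n_traj)]


samples = final_states(2000, k_birth=10.0, k_death=1.0, x0=0, t_end=10.0)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"ensemble mean = {mean:.2f}, variance = {variance:.2f}")
```

For this toy system the long-time distribution is Poisson with mean k_birth / k_death, so the ensemble mean and variance should both come out close to 10. A histogram of `samples` approximates the probability distribution at t_end.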
Q: How is stochastic simulation useful for describing biological systems?
A biological system can be described as a system of biochemical reactions, including processes such as metabolism, signal transduction, and
gene expression. The reactions may involve bond-breaking (as in the classic definition of a chemical reaction) or may only involve strong
non-covalent interactions between molecules, such as hydrogen bonding, salt bridges, or hydrophobic 'forces'. Either way, one may measure the
thermodynamics and kinetics of these reactions and quantitatively describe their rates. Traditionally, the dynamics of a system of reactions are
described using ordinary differential equations (ODEs). With ODEs, the trajectory of the state moves deterministically through time to some
attractor (i.e., starting from the same initial condition, the trajectory will always be the same and no two trajectories may ever cross). However,
both theoretical analysis and experimental observation have shown that ODEs do not adequately describe the dynamics of biological
systems. Instead, we describe the system as the above-mentioned Markov process. Now, given the same initial condition, different
trajectories of the state will occur and trajectories may cross paths in time. This seemingly simple change results in a wide range of new
phenomena, including populations of cells with differing phenotypes, cells that spontaneously change their phenotype, and oscillations that
occur only because of the 'noise' or stochasticity.
Q: What is a hybrid stochastic simulation method?
It is no surprise that there is a connection between jump Markov, continuous Markov, and deterministic processes. In fact, using theory, one can
state the necessary conditions in order to approximate a jump Markov process as a continuous one and, likewise, a continuous Markov process as a
deterministic one. By checking these conditions while the simulation is running, one can apply each approximation only to the
part of the system that meets them. Our recent research shows that, after partitioning the system into multiple different mathematical
representations, one can self-consistently merge these disparate descriptions back together again. Transitions in the jump Markov process can be
converted to differential Jump equations (a type of stochastic differential equation, or SDE). The effects of reactions described as a continuous Markov
process may also be described using SDEs. Likewise, the deterministic subset may be described using ODEs. All of these
equations are coupled and may be simultaneously numerically integrated forward in time using a variety of well-characterized stochastic numerical
integrators. It is this last part that separates our hybrid methods from others: we can characterize the error of the solution using the theory of
Wiener-driven SDEs and their numerical solution, whereas other hybrid methods sometimes rely on heuristic approximations.
Perhaps most importantly, these hybrid stochastic simulation methods can be orders of magnitude faster than the original
stochastic simulation algorithm. The exact speedup depends on how many reactions are classified as jump Markov, continuous Markov, or
deterministic processes.
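The partitioning step can be sketched as follows. This FAQ does not state the actual conditions Hy3S checks, so the criteria and thresholds below (`lam`, `eps`, `det_noise`) are assumptions chosen only for illustration. A reaction is kept as a jump process unless it is expected to fire many times per step and every species it changes is abundant. It is treated as deterministic only when its relative Poisson fluctuation is negligible.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    name: str
    propensity: float      # expected firings per unit time
    stoichiometry: dict    # species name -> change in count per firing


def classify(reaction, counts, dt, lam=10.0, eps=100.0, det_noise=0.01):
    """Label a reaction 'jump', 'continuous', or 'deterministic'.

    lam, eps, and det_noise are hypothetical thresholds, not Hy3S settings.
    """
    expected_firings = reaction.propensity * dt
    abundant = all(
        counts[species] >= eps * abs(change)
        for species, change in reaction.stoichiometry.items()
        if change != 0
    )
    if expected_firings < lam or not abundant:
        return "jump"
    # A Poisson count with mean m has relative fluctuation 1 / sqrt(m).
    if expected_firings ** -0.5 <= det_noise:
        return "deterministic"
    return "continuous"


counts = {"mRNA": 5, "protein": 50_000, "substrate": 2_000_000}
reactions = [
    Reaction("transcription", 0.5, {"mRNA": +1}),
    Reaction("translation", 500.0, {"protein": +1}),
    Reaction("metabolism", 1.0e6, {"substrate": -1}),
]
labels = {r.name: classify(r, counts, dt=1.0) for r in reactions}
print(labels)
```

Because propensities and molecule counts change as the simulation proceeds, a classification like this would be re-evaluated during the run rather than fixed once at the start.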
Q: Why is the focus on using supercomputers?
Even when using hybrid stochastic
algorithms, simulations of realistic (i.e., very large) biochemical
networks require serious computing power. In order to compute an
accurate probability distribution of the system dynamics, numerous
independent trajectories must be computed. A single trajectory of the
system dynamics may be computed in a short period of time (milliseconds
to minutes), but an accurate distribution requires at least 5,000
trajectories. Furthermore, it's highly desirable to simulate multiple
networks, each time changing one or more parameters, to determine
either the effect of that parameter on the network (sensitivity analysis)
or to find a desired behavior (global optimization). Supercomputers are
quickly becoming common tools for the research community. We believe
their user-friendliness and widespread usage will only increase as
in silico design yields more and more fruitful results. Hy3S is designed to be used on cluster-based
supercomputers (not necessarily vector processors) with the
accompanying network and storage infrastructure, but the code will
function well enough on a fast personal computer.
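The need for thousands of trajectories follows from ordinary Monte Carlo sampling error. As a rough illustration (a standard binomial estimate, not the derivation of the 5,000 figure quoted above), suppose a state has true probability p at some time, and p is estimated as the fraction of N independent trajectories found in that state. The values of p and N below are chosen only as an example.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p\,(1-p)}{N}},
\qquad
N = 5000,\; p = 0.01
\;\Rightarrow\;
\mathrm{SE}(\hat{p}) = \sqrt{\frac{0.01 \times 0.99}{5000}} \approx 1.4 \times 10^{-3}.
```

So even with 5,000 trajectories, a state occupied 1% of the time is resolved only to about 14% relative error, and halving that error takes four times as many trajectories.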
Q: Why does Hy3S use the NetCDF data format?
The NetCDF library and format are specifically designed to store huge
amounts of data in a platform-independent, self-describing, optimized
binary file. Simulations of large networks produce an enormous amount of
solution data, requiring a data format that not only stores the data
efficiently, but also retrieves it quickly for analysis. NetCDF files may
be read and written in direct access mode, enabling access to hyperslabs
(sub-blocks of the data that need not be contiguous) and reducing the
memory requirements for analysis. For example, even though a solution
data file may contain multiple trajectories of a system of multiple
chemical species over many time points, it is easy to read in only the
data sampled at a specific time point or for a certain chemical species.
Instead of reading in gigabytes of data, you can pick and choose which
data to read, requiring much less time and using much less memory. The data may
be read from any operating system or architecture that has a C or
Fortran77/90/95 compiler. The API is available in additional
programming languages, including C++, Perl, Python, Java, MATLAB, and
Ruby.
SBML, or Systems Biology Markup Language, is
a standard XML format for describing biochemical networks. Hy3S will eventually
support the import and export of SBML files. We are waiting for the SBO (Systems Biology Ontology) classification terms in SBML v3.
Q: How is Hy3S different from other stochastic simulators?
To the best of our knowledge, Hy3S is the only stochastic simulation software that offers:
- a fast hybrid algorithm,
- an easy-to-use GUI (usable on Windows, Linux, or Mac),
- parallelized code (MPI / Fortran95),
- useful features, such as cell replication events, gamma-distributed events, system perturbations, parameter variation, and initial condition variation,
- the GPL license (LGPL is also available on request), and
- developers who design biological systems themselves and who will continue to add features useful to their own research.
Q: Where is development of Hy3S going?
We're interested in creating simulation and analysis tools that enable computational biologists and engineers to
both study natural biological systems and design new synthetic ones. One goal is the ability to quickly simulate the stochastic dynamics of
any system of chemical or biochemical reactions, including non-linear ones with 'fast/continuous', 'fast/discrete', and
'slow/discrete' reactions. Another goal is to use those simulation results in a variety of design tools, such as global optimization and
sensitivity analysis. Of course, advances toward the first goal will lead to faster results for the second. Because the mathematics of random dynamical
systems is relatively immature, a third goal is to advance the theory behind the numerical methods. The field is moving fast, so stay tuned!