Hy3S -- Frequently Asked Questions

Stochastic simulation, also known as kinetic Monte Carlo, is a numerical procedure for determining the dynamics of a continuous time Markov process. A continuous time Markov process is a memoryless stochastic process that is used to describe all sorts of physical and chemical systems, including biological systems. The solution of a Markov process includes two equivalent parts: a time-dependent probability distribution of states in state space or trajectories of the state moving in state space over time. By computing an ensemble of trajectories, one can generate the distribution. Or, by computing the distribution, one can sample it to obtain trajectories. If the state space of the Markov process is discrete, it is called a jump Markov process and its probability distribution is governed by the Master equation. If the state space is continuous, it is called a continuous Markov process and its distribution is governed by the Fokker-Planck equation. For all but the most trivial systems, the Master equation is analytically unsolvable. For large dimensional systems (ie. many chemical species), the Fokker-Planck equation is impractical to solve. Stochastic simulation is a way to generate trajectories of a Markov process and then to compute the distribution of all possible trajectories, effectively sidestepping the problems with solving either the Master or Fokker-Planck equations. Of course, for some systems, stochastic simulation can be as impractical, motivating the usage of hybrid stochastic simulation methods.

Q: How is stochastic simulation useful for describing biological systems?

Biological systems can be described as a system of biochemical reactions, including processes such as metabolism, signal transduction, and gene expression. The reactions may involve bond-breaking (as in the classic definition of a chemical reaction) or may only involve strong non-covalent interactions between molecules, such as hydrogen bonding, salt bridges, or hydrophobic 'forces'. Either way, one may measure the thermodynamics and kinetics of these reactions and quantitatively describe their rates. Traditionally, the dynamics of a system of reactions is described using ordinary differential equations (ODEs). With ODEs, the trajectory of the state moves deterministically through time to some attractor (ie. starting from the same initial condition, the trajectory will always be the same and no two trajectories may ever cross). However, through both analysis of theory and experimental observation, it has been noted that ODEs do not adequately describe the dynamics of biological systems. Instead, we describe the system as the above-mentioned Markov process. Now, given the same initial condition, different trajectories of the state will occur and trajectories may cross paths in time. This seemingly simple change results in a large amount of new phenomena, including populations of cells with differing phenotypes, cells which spontaneously change their phenotype, and oscillations which only occur because of the 'noise' or stochasticity.

Q: What is a hybrid stochastic simulation method?

It is no surprise that there is a connection between jump Markov, continuous Markov, and deterministic processes. In fact, using theory, one can state the necessary conditions in order to approximate a jump Markov process as a continuous one and, likewise, a continuous Markov process as a deterministic one. By measuring the validity of these conditions, while the simulation is running, one can approximate only the part of the system that meets these conditions. Our recent research shows that, after partitioning the system into multiple different mathematical representations, one can self-consistently merge these disparate descriptions back together again. Transitions in the jump Markov process can be converted to differential Jump equations (a type of stochastic differential equation). The effects of reactions described as a continuous Markov process may also be described using stochastic differential equations. Likewise, the deterministic subset may be described using ODEs. All of these equations are coupled and may be simultaneously numerically integrated forward in time using a variety of well-characterized stochastic numerical integrators. It is this last part that separates our hybrid methods from others. We can characterize the error of the solution using the theory of Wiener-driven SDEs and their numerical solution. Other hybrid methods sometimes rely on quite heuristic approximations which we can adeptly avoid.

Perhaps most importantly, these hybrid stochastic simulation methods can be orders of magnitude faster than the original stochastic simulation algorithm. The exact speed up depends on the number of reactions classified as either jump Markov, continuous Markov, or deterministic processes.

Q: Why is the focus on using supercomputers?

Even when using hybrid stochastic algorithms, simulations of realistic (ie. very large) biochemical networks require serious computing power. In order to compute an accurate probability distribution of the system dynamics, numerous independent trajectories must be computed. A single trajectory of the system dynamics may be computed in a short period of time (milliseconds to minutes), but an accurate distribution requires at least 5,000 trajectories. Furthermore, it's highly desirable to simulate multiple networks, each time changing one or more parameters, to determine either the effect of that parameter on the network (sensitivity analysis) or find a desired behavior (global optimization). Supercomputers are quickly becoming common tools to the research community. We believe their user-friendliness and widespread usage will only increase as in silico design yields more and more fruitful results. Hy3S is designed to be used on cluster-based supercomputers (not necessarily vector processors) with the accompanying network and storage infrastructure, but the code will function well enough on a fast personal computer.

Q: Why does Hy3S use the NetCDF data format?

The NetCDF library and format is specifically designed to store huge amounts of data in a platform-independent, self-describing optimized binary file. Simulations of large networks produce an incredible amount of solution data, requiring a data format that not only efficiently stores the data, but also quickly retrieves it for analysis. NetCDF files may be read and write in direct access mode, enabling access to hyperslabs of non-contiguous blocks of data and reducing the memory requirements for analysis. For example, even though a solution data file may contain multiple trajectories of a system of multiple chemical species over many time points, it is easy to read in only the data sampled from a specific timepoint or about a certain chemical species. Instead of reading in gigabytes of data, you can pick and choose which data to read requiring much less time and using much less memory. The data may be read from any operating system or architecture that has a C or Fortran77/90/95 compiler. The API is available in additional programming languages, including C++, Perl, Python, Java, MATLAB, and Ruby.

SBML, or Systems Biology Markup Language, is a standard XML format for describing biochemical networks. Hy3S will eventually support the import and export of SBML files. We are waiting for the SBO classification terms in SBML v3.

Q: How is Hy3S different from other stochastic simulators?

To the best of our knowledge, this is the only stochastic simulation software that offers

a fast hybrid algorithm,
an easy to use GUI (usable on Windows, Linux, or Mac),
parallelized code (MPI / Fortran95),
useful features, such as cell replication events, gamma-distributed events, system perturbations, parameter variation, and initial condition variation
the GPL license (LGPL is also available if you ask)
developers who actually design biological systems themselves and who will continue to add features useful to their own research

Q: Where is development of Hy3S going?

We're interested in creating simulation and analysis tools that enable computational biologists and engineers to both study natural biological systems and design new synthetic ones. One goal is the ability to quickly simulate the stochastic dynamics of any system of chemical or biochemical reactions, including non-linear ones with 'fast/continuous', 'fast/discrete', and 'slow/discrete' reactions. Another goal is to use those simulation results in a variety of design tools, such as global optimization and sensitivity analysis. Of course, advances in the first goal will lead to faster results in the latter one. The mathematics of random dynamical systems is relatively immature. Another goal is an advancement of the theory behind the numerical methods. The field is moving fast so stay tuned!