How to Design Biochemical Networks and Run Hy3S Simulations

A brief guide to the GUI

First, a word on the status of the GUI and its author. I am not a Comp Sci major. The GUI is meant to be a quick and flexible way to create biochemical networks in order to simulate them. As such, it may look ugly. I've tried to conform the function of the GUI to a user's most likely expectation, but (like I said before) I am not a Comp Sci major. Some buttons don't work (yet). But, what does currently work is well tested and has few bugs (all of the features listed in the recent Bioinformatics paper do work). There are two goals that I have: 1) that the user may quickly enter in reactions, selecting from many rate laws, enter kinetic parameters and initial conditions, and set the simulation program parameters. 2) tasks which are tedious, boring, and otherwise droll should be automated by the GUI. This includes varying the kinetic parameters/initial conditions from one model to the next. The first goal has been achieved (at least I think so). The second is an ongoing process of first finding tasks which need to be repeated over and over in the course of modeling a natural network or designing synthetic ones and then coding the automation of that task in a fairly easy-to-use manner. And, lastly, should you have any harsh comments about the look, the feel, or the function of the GUI, please remember that (oo, I'll say it again) I am not a Comp Sci major. I'm only really concerned about the function of the GUI and I'll listen to other suggestions, but I'll make no promises on when they'll be addressed. On to the brief guide:

At the Matlab prompt, type:
>> designgui
This will open up the main window of the Hy3S GUI, which looks like this. (Please excuse the image capture in Windows. The graphics look slightly prettier than in Linux. :) )
At the top, there is a menu containing some important functions. Under 'File', you may create, open, save, import, or export a file. You may also exit the program from this menu. The next menu, 'Design Tools', has a button to create a random network. The code for this is still in development and this button doesn't work. The next menu contains a button to set and view some auxilliary model information.

In the main window, you can enter the start time, the end time, and the 'storage' time for the simulation. The storage time is the length of time in between saving the state of the system to disk. The number of time points saved to disk is floor((end time - start time) / storage time) + 1. Next, select what the initial volume of the system should be and whether there will be cell division. Cell division is a discrete event where, every time the event occurs, all of the user-specified molecules get reduced by half. The times at which cell division occur are Gaussian distributed with a mean and a standard deviation. The mean and standard deviation are entered in on the next two fields. If you enter a standard deviation of zero, you will cause cell division to be a deterministic event.

The next field is important: the Experiment Type. I'll explain what each of the experiment types are used for, but beginning users should try the program out with the 'Single Model' setting first before using the more advanced (and more interesting) features. The first experiment type causes the simulation program to simulate a single model with the specified reaction network and parameters. The second experiment type, 'Muliple Models - Varying the Kinetic parameters and Initial Conditions', causes the simulation program to simulate multiple models where each model may have its kinetic parameters and/or initial conditions varied from one model to the next. The simulation program will then simulate each model and store all of the trajectory solution data. The third experiment type, 'Multiple Models - Varying the Initial Conditions and System of Reactions', causes the simulation program to simulate multiple models where each model may have a different reaction network, initial conditions, and kinetic parameters (basically allowing everything to change from one model to the next). So far, only the first two Experiment Types are 'supported' (ie. the data format and simulation program can handle Experiment Type #3, but we have yet to release an automatic way to define each model or view multiple models in the data file.)

The function of the button underneath the Experiment Type will change when you select a new Experiment Type. For a single model, the button will allow you to set initial conditions for all of the chemical species. For varying the kinetic parameters/initial conditions, the button will allow you to set which kinetic parameters or initial conditions will be varied. For the 3rd experiment type, the button does not do anything (yet). The numbers and meaning of each experiment type may change in the future depending on the development of new algorithms.

The number of trials, or independent realizations of the biochemical network, should be entered in the field adjacent to the experiment type button. Below that, there is the system perturbation button. Clicking on it will take you to another window where you can view, add, or remove system perturbations. Below that, there is a handy text field telling you how big the state and time solution data will be once the simulation program finishes. To save time and initial disk space, the variables holding the solution data are not created by the GUI. Instead, when the simulation program runs, it will detect whether these variables exist and, if they do not, the program will add them to the data file. The text field here shows the final size of the data file once the solution variables are added and filled in with data.

At the bottom, the name of the currently opened file is shown. To save a file, enter in the name of the file and press 'create data file'. There is a button labeled '...' which allows you to select a file from a list of files in the current Matlab working directory (the list filters out all but .nc files).

On the extreme right, you'll find a button marked '(Un)Save Selected Species'. By selecting one or more species in the species box (using the control or shift button to select multiple entries) and pressing this button, you can specify that you do or do not want to save the selected species to disk. If, for example, you have 1203 unique chemical species in your biochemical network, but you're only interested in five of them then you may only save those five to disk. This reduces the size of the solution variable proportionally.

Now for the most important part: Adding reactions to the biochemical network.
Click on 'Add Reactions' and a new window will pop up. It will look like this.

On the left, you can (from the top) enter in the name of the reaction (default is the next integer number), enter in the reaction equation (Reactants --> Products), enter in all of the necessary kinetic parameters in terms of the specified units, and then enter in all of the names of the chemical species which are used in the selected rate law. On the right, you can select the rate law from the list. Not all rate laws will use all of the available boxes for kinetic parameters and 'dependent species'. After providing all of the needed information, clicking 'Add Reaction' will add that reaction to the list. The reaction equation must be in the form n String + ... + n String --> n String + ... + n String, where n is any positive integer and String is any alphanumeric name (although the first character must be a letter). The '+' separates one species from another. The '-->' separates the reactants and products. Each of these symbols must have spaces on both sides. So 'A+B-->C' will not work. If this sounds confusing, let me give an example:

For a simple A + B --> C reaction, enter in 'A + B --> C' as the reaction equation. If you click outside of that box, the GUI will 'guess' that the reaction kinetics follows a bi-molecular 2nd order rate law and it will select that one. If this is ok, then you just need to fill in the single necessary kinetic parameter and hit 'Add Reaction'. If, for some reason, you want the reaction A + B --> C to have a different rate law, then you need to select a rate law from the list on the right and enter in all of the necessary information.

The stoichiometry of the reaction will always be A + B --> C, but the rate at which the reaction occurs can be some other function, dependent on other species, and using multiple kinetic parameters. The GUI provides this flexibility because some reactions do not follow elementary kinetics and this is the most general way of allowing for that.

A better example of this is the following:

For the following enzymatic reaction, S --> P, the user may select 'Michaelis Menten standard' as the rate law. Two kinetic parameters are needed: the kcat and the Km. The rate law is dependent upon three chemical species, the substrate, S, the product, P, and the enzyme, E. Fill in the names of those chemical species below. Every time this reaction occurs one substrate molecule will be converted to a product molecule and the rate of this reaction will be computed according to the Michaelis Menten standard kinetics.

So the user has a great deal of flexibility in adding reactions. The GUI will not check for consistency or meaning. That is the user's responsbility as a modeler.

Clicking on a reaction in the list of reactions will display its information on the screen. Note: You can add reactions twice, so if you click on a reaction and then click 'add reaction', it will add that same reaction to the list (again). To remove a reaction, you need to select a reaction from the list in the main window click on 'delete reaction' from the main window.

If you make a mistake in adding a reaction you can either delete it (from the main window) or modify its information. To modify a reaction, select it from the list of reactions in the main window and hit 'Modify reaction'. It will bring you to the 'Add Reaction GUI window'. Modifying the information and clicking 'Add Reaction' will change the selected reaction accordingly. If you hit the 'Add reaction' button twice, the first time will save the modified information and the second time add another, identical reaction to the list. (Just something to watch out for.)

A note on units: All of the kinetic parameters are saved in macroscopic units (Molar, etc). The simulation program will convert to mesoscopic units (Molecules, etc). The only exception is the 'Generalized Power Law Kinetics'. The units of the kinetic parameter, k, are mesoscopic. The function of that rate law is simply r = k*X1^n1*X2^n2 ... with up to 5 different species. This rate law is useful for the following situation:

Consider the reaction: RNAP + P + O --> RNAP:P:O, which is often used to model the RNA Polymerase binding to a promoter site and also overlapping with an operator site. If the operator site is occupied, say by a repressor, the RNA Polymerase can not bind. The kinetics for this reaction are bi-molecular 2nd order, but the stoichiometry has three 'reactants'. Really, the promoter and operator site are on a single molecule of DNA, but to represent the differences between these two DNA sites, we consider them as two unique chemical species. We can adequately represent the kinetics of this reaction using a Power Law representation. The rate (or propensity) of the reaction is r = k*#RNAP*#P*#O, where k is the mesoscopic kinetic parameter corresponding to the converted 2nd order kinetics, or k = k_mac / Na / Volume (Na = Avogadro's #). If either the promoter or operator is bound by something else, this reaction may not occur. If both are present, the reaction occurs with the correct kinetics. I find it to be a good representation of what is really happening while being easy to extend to multiple DNA sites (ie. if there are 2+ operators that overlap with the promoter site).

Now for a quick overview of the other windows in the GUI:

All models must have their initial conditions set. To set the initial conditions, set the Experiment Type to 'Single Model' and press 'Initial Conditions' button. A window will pop up with a list of the chemical species in the network. Enter in the initial conditions in units of molecules for each chemical species. You may also select whether the species is Split On Division or 'SOD'. If the species is checked for SOD then each cell division will cause the number of molecules of this species to be halved. You may also select whether the species should be saved to disk or not. Clicking on 'Next 20 Species' will save the values entered and show the next 20 species in the biochemical network. Repeat this process for all of the species in the network.

All biochemical networks, regardless of their final experiment type, must have their initial conditions set using this window.
Suggestions on ways to better set initial conditions are welcome.

The next important window is for setting kinetic parameter and initial condition variations. Set the experiment type to #2 and press the 'Multiple Conditions' button. This will bring up a window where you view, add, and remove variations of kinetic parameters and initial conditions. To add a variation, select either a reaction or species from the lists, enter in the needed information, and press 'Add Variation'. For reactions with multiple kinetic parameters, you may select the kinetic parameter to vary by clicking on one of the radio buttons. You may vary the kinetic parameters or initial conditions using either a linear or log scale. A log scale is useful if varying a kinetic parameter with a very small start value and a very large end value.
To delete a variation, select it from the list on the left, and hit 'Delete Variation'.

The last important window is the System Perturbation one. From the main window, select 'System Perturbations'. A window will pop up and you may perturb any of the kinetic parameters or number of molecules of species by selecting the corresponding reaction or species and entering in the time and value of the perturbation. Adding and deleting perturbation acts similarly to adding and deleting variations. Multiple perturbations allow for a lot of interesting behavior.

It is straightforward to create step functions by perturbing the number of molecules of a species from one value to the next at a desired time. The easiest way to create a ramp function of the number of molecules of a chemical species is to add two additional reactions, such as

--> R with a forward kinetic constant of k1 and
R --> with a backward kinetic constant of k-1

where R is the species which will ramp up to some steady state value from a previous value. By perturbing k1 from 0 to some positive value, you cause the number of molecules of R to increase from 0 to a steady state equal to k1 / k-1. The larger k1, the larger the slope of the ramp function. You can control the final steady state by altering the k-1 parameter. Through a combination of multiple step and ramp functions, many interesting perturbation behaviors may be created.

Hopefully, the prior brief guide to the GUI will allow you to quickly get aquainted with how to create biochemical networks. The 'how' is probably a lot easier to answer than the 'what'. There's quite a few scientific publications that describe the 'what' so I won't expand on that here.

The command line simulation programs

As of Hy3S v1.1 there are 5 different simulation programs (with both serial and MPI modes for each program). One program is the Next Reaction variant of the stochastic simulation algorithm, developed by D.T. Gillespie and improved by Gibson and Bruck. The latter four are different implementations of the solution of a hybrid jump/continuous Markov process, which can simulate biochemical networks much faster than the original SSA while only creating a very small amount of error. The exact speed up depends on the number of molecules and rates of the reactions in the biochemical network. Each implementation has its own advantages and disadvantages. The details of each method are described in a publication in BMC Bioinformatics. Here, I will summarize the advantages/disadvantages of each program and describe their usage. The abbreviation 'SDE' refers to a system of stochastic differential equations. The MPI (parallelized) versions of each program have 'MPI' appended after 'SSA' or 'HyJCMSS' (eg. the MPI version of HyJCMSS-EM is HyJCMSS-MPI-EM). Running the MPI version of the programs does not require any additional parameters, excluding the typical mpirun ones. For example, to run the HyJCMSS-MPI-EM program, one would type 'mpirun -np 100 HyJCMSS-MPI-EM filename.nc 100 10 0.01 0.1'.

Program Name	SDE Numerical Integrator	Pros	Cons
SSA	None -- Next Reaction variant of the Gillespie algorithm	Essentially Exact	Extremely slow for "large" systems with many molecules and fast reactions
HyJCMSS-EM	Fixed time step Euler method	Much faster for "large" systems. Fastest SDE numerical integrator for non-stiff systems.	For stiff systems, species populations may go negative. Finding an accurate time step that produces a valid solution can be tedious. Error is proportional to the square root of the time step.
HyJCMSS-Mil	Fixed time step Milstein method	Increased accuracy. Error is proportional to the time step, allowing one to use a larger time step.	Evaluation of 2D Ito integrals decreases the speed of the simulation.
HyJCMSS-EM-Adaptive	Adaptive time step Euler method	Automatically chooses a time step based on the amount of error generated.	Does not always converge to the correct solution! Usage is not advisable, unless for educational purposes.
HyJCMSS-Mil-Adaptive	Adaptive time step Milstein method	Automatically chooses a time step based on the amount of error generated. Increased simulation efficiency when transient stiffness exists. With a reasonable tolerance, convergence to the correct solution is gauranteed.	Slower than fixed methods for systems with constant timescales, due to the computational overhead in the adaptive code.

The command line parameters for each simulation program are slightly different. If you type any of the program names by themselves, the syntax of the parameters and a short description will be printed to the screen. Also, the simulation parameters are summarized below.

Command Line Arguments of Each Simulation Program

Command Line Parameter	Description	In Which Methods	Default Value
Filename	NetCDF file name	All	None
Random Seed	The random number generator's seed value/TD>	All	Randomized (system dependent, however!)
Epsilon	Minimum number of molecules both reactant and product species required for approximation as a continuous Markov process	All HyJCMSS methods	100 molecules
Lambda	Minimum rate of reaction required for approximation to a continuous Markov process	All HyJCMSS methods	10 molecules/second
Multiple Slow Reaction (MSR) Tolerance	Maximum allowed effect of executing multiple slow reactions per numerical integration of the SDEs	All HyJCMSS methods	1 / Epsilon
SDE Tolerance	Maximum allowed value of the drift and diffusion errors	Adaptive Methods	1e-4
SDE Time Step	The maximum time step of the SDE numerical integrator	Fixed Methods (but may also be set for Adaptive methods to decrease memory requirements	0.10 seconds or the Save Time

Footer

Maintained by Howard Salis
Kaznessis Research Group
Dept. of Chemical Engineering & Materials Science
University of Minnesota
lastname@cems.umn.edu (spam is evil)

Site last modified at (none)