Documentation for the perl program mpa_5_17.pl

by Daniel Segre'
Department of Genetics, Harvard Medical School
200 Longwood Avenue, Boston, MA 02115, USA
http://genetics.med.harvard.edu/~dsegre/
E-mail: dsegre@genetics.med.harvard.edu
Phone: 1-617-4320062
Fax: 1-617-4320065


CONTENTS:

1 PURPOSE
2 OPTIMIZATION LIBRARIES
3 INPUT FILES
4 OUTPUT FILES
5 DETAILED DESCRIPTION OF OUTPUT FILES
6 PARAMETERS
7 DATA VISUALIZATION


1) PURPOSE
===========

The program reads the stoichiometric parameters of a metabolic network,
receives additional constraints on metabolic fluxes from the user, and
predicts the growth rate associated with those constraints, together with
a complete set of fluxes.

For a complete description of the theoretical and biological background,
see Segre', Vitkup and Church, "Analysis of optimality in natural and
perturbed metabolic networks".

Stoichiometric data used here are from Edwards and Palsson, PNAS (2000)
97, 5528-5533.


2) OPTIMIZATION LIBRARIES
==========================

The program uses two main libraries for optimization: one for Linear
Programming (LP) and one for Quadratic Programming (QP). In principle,
libraries other than the ones proposed here could be used. Changing the
libraries would, however, require modifying the subroutines in the program
that read the optimization output and feed it back into the script.

LP library
----------

Web site: http://www.gnu.org/software/glpk/glpk.html

GLPK (GNU Linear Programming Kit) is a set of routines written in ANSI C
and organized in the form of a callable library, intended for solving
linear programming (LP), mixed integer programming (MIP), and other
related problems. GLPK is freely downloadable and open source. It can read
standard LP files in MPS format. Detailed installation instructions are
available with the package.

The file glpsol_dan.c serves as an interface between the Perl script and
the GNU LP library.
Its executable version (now glpsol_dan.exe; called a.out in earlier
versions of the Perl program) was obtained by compiling glpsol_dan.c on
Linux, as described in the comments at the beginning of glpsol_dan.c.

QP library
----------

Web site: http://www6.software.ibm.com/sos/features/qp.htm

The IBM QP Solutions product includes two standalone solver programs
incorporating library modules to solve quadratic programming problems
with linear constraints and a convex quadratic objective. OSLQSLV uses a
two-stage, simplex-based algorithm to minimize the quadratic objective,
while OSLQBSLV uses the interior point solver to minimize a regularized
form of the quadratic objective.

NOTE: once the QP package is installed, the following system variables
must be set for the library to work as part of the program:

export OSL_HOME="/QP_directory/solutions"
export PATH="$PATH:$OSL_HOME/bin"

where /QP_directory is the directory in which the IBM QP package is
installed. More information on this topic is available with the library
documentation.


3) INPUT FILES
===============

S.par           = S matrix of stoichiometric coefficients
objvector.par   = vector containing coefficients for objective function
constraints.par = set of boundaries for fluxes


4) OUTPUT FILES
================

LP_QP_Vgrowth.dat = Growth rates with LP (FBA) and QP (MPA) method
LP_fluxes.dat     = list of fluxes from LP
QP_fluxes.dat     = list of fluxes from QP
wt_fluxes.dat     = list of fluxes from wild type
enzyme.m          = list of names of fluxes (Matlab executable file)
reaction.m        = list of reactions (Matlab executable file)


5) DETAILED DESCRIPTION OF OUTPUT FILES
========================================

OPTIMIZATION ALGORITHM FILES:

Ecoli_complete_wt.mps
---------------------
ecoli_LP_mut.mps
----------------

These two files contain the instructions for performing Linear
Optimization in standard MPS format. They correspond to the wild type and
to the mutant, respectively.
They contain the list of metabolites and enzymes, as well as all
constraints and the definition of the objective function. These two files
are generated by the program after reading the parameter files, and are
fed into the GLPK optimization routine.

OUTPUT DATA FILES

LP_QP_Vgrowth.dat
-----------------

Contains the optimal growth rate for the mutant metabolic network.

wt_fluxes.dat
-------------

The list of all optimal fluxes for the wild type, ordered as in the lists
of enzymes and reactions (see below).

LP_fluxes.dat
-------------

The list of all optimal fluxes for the mutant (to be compared with
wt_fluxes.dat).

REFERENCE FILES

enzyme.m
--------
reaction.m
----------

These files contain the lists of enzymes and reactions, respectively, in
the order used throughout the program (and compatible with the parameter
files). They are generated by the program and are Matlab executable files
that define cell arrays containing the ordered lists of names.

OTHER FILES

LP_output.txt
-------------

This is the original output generated by the GNU Linear Programming
algorithm. It can be used to extract additional information about the
mutant, such as marginal values (shadow prices). The program processes
this file after GLPK generates it, in order to extract the relevant flux
values.


6) PARAMETERS
==============

Variable that turns QP part ON/OFF (1=ON ; 0=OFF)
$QP_enable=0;

Position of growth flux for complete E. coli
$Vgr_position=602;


7) DATA VISUALIZATION
======================

The following short Matlab script (Version 6, release 12) loads the flux
results of the program and plots the mutant fluxes vs. the wild type
fluxes in a scatter plot. The script also uses the Matlab-readable list of
enzymes to label points with their flux names when they are clicked with
the mouse.
%-----------------------------------------------------------------------
% Load the mutant (LP) and wild type fluxes written by the program
load LP_fluxes.dat
load wt_fluxes.dat
% Run enzyme.m, which defines the cell array Enzyme (flux names)
enzyme
figure;
plot(abs(wt_fluxes),abs(LP_fluxes),'.')
xlabel('wild type fluxes')
ylabel('knockout fluxes')
disp('Click to display flux name ; ESC to exit')
title('Click to display flux name ; ESC to exit')
% Label clicked points with the corresponding flux name
gname(char(Enzyme))
%-----------------------------------------------------------------------
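
For users without Matlab, the same comparison between wild type and
mutant fluxes can be done non-interactively. The following Python sketch
is not part of the original distribution. It assumes that wt_fluxes.dat
and LP_fluxes.dat contain plain whitespace-separated numbers in the same
order (which is what the Matlab "load" command above requires). The
function names load_fluxes, largest_changes and report are hypothetical,
and positions are printed counting from 1 purely as a display choice.

```python
from pathlib import Path


def load_fluxes(path):
    """Read all whitespace-separated numbers from a text file.

    Assumption: the file holds only numeric tokens, one per flux.
    """
    return [float(token) for token in Path(path).read_text().split()]


def largest_changes(wt, mutant, top=10):
    """Return (position, wt_flux, mutant_flux) tuples for the fluxes
    with the largest absolute difference, largest first.

    Positions count from 1 (a display choice, not taken from the program).
    """
    if len(wt) != len(mutant):
        raise ValueError("flux lists differ in length")
    rows = [(i + 1, w, m) for i, (w, m) in enumerate(zip(wt, mutant))]
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows[:top]


def report(wt_path, mutant_path, top=10):
    """Print the most strongly changed fluxes, one per line."""
    wt = load_fluxes(wt_path)
    mutant = load_fluxes(mutant_path)
    for pos, w, m in largest_changes(wt, mutant, top):
        print(f"{pos:5d}  wt={w:12.6f}  mutant={m:12.6f}")
```

Calling report("wt_fluxes.dat", "LP_fluxes.dat") from the directory that
holds the output files prints the ten fluxes that differ most between the
two solutions. The printed positions can then be looked up by hand in
enzyme.m or reaction.m, which list the names in the same order.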