Pierre Collet

 

Genetic promoter computational model

by Pierre Collet - 2013-10-25

Introduction

In this article I will explain a model I created while working at the Center for Research and Interdisciplinarity. My job was to build a genetic expression simulator that makes systems biology easy to perform. To introduce the model correctly, I should say that this simulator used discretized time and was not a stochastic simulator (that is, it was not based on a Monte Carlo method). One benefit of this simulator was that synthetic biology experiments could be run anywhere at no cost, which makes the field more accessible to everyone.

While working on this project, I studied and chose models for several kinds of reactions, such as promoter reactions and enzymatic reactions. This post discusses only biological promoters, not other biochemical reactions. I would also like to thank Helena Shomar, a biological engineer, for her help with this work.


The model

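According to Uri Alon's book, "An Introduction to Systems Biology: Design Principles of Biological Circuits", a good mathematical model for an activated promoter is the Hill function

$$ f_{\text{act}}(X) = \frac{\beta X^n}{K^n + X^n} $$

where X is the concentration of the transcription factor, β is the maximum production rate, K is the activation coefficient and n is the Hill coefficient.

And for a repressed promoter:

$$ f_{\text{rep}}(X) = \frac{\beta K^n}{K^n + X^n} $$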



These two sigmoid functions describe the behavior of a promoter as a function of the concentration of a transcription factor X. Each one returns a value between 0 and β (the maximum production rate), following an S-shaped curve.
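As a minimal sketch, the two functions can be written in Python as follows. The function and parameter names (`hill_activation`, `hill_repression`, `beta`, `k`, `n`) are illustrative choices for this post, not the names used in the simulator.

```python
def hill_activation(x, beta, k, n):
    """Production rate of a promoter activated by a factor at concentration x.

    beta: maximum production rate, k: activation coefficient, n: Hill coefficient.
    """
    return beta * x**n / (k**n + x**n)


def hill_repression(x, beta, k, n):
    """Production rate of a promoter repressed by a factor at concentration x."""
    return beta * k**n / (k**n + x**n)


if __name__ == "__main__":
    # Illustrative values only: beta = 2, K = 1, n = 2.
    for x in (0.0, 0.5, 1.0, 2.0, 10.0):
        print(x, hill_activation(x, 2.0, 1.0, 2), hill_repression(x, 2.0, 1.0, 2))
```

When the concentration equals K, both functions return half of the maximum production rate, and for any concentration the two values add up to β.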

So far, it's quite straightforward to use these two functions in a simulator. But they soon become insufficient for simulating complex promoters. For instance, each function takes only one input transcription factor. My goal was to create a computational model that allows more complex interactions between transcription factors and promoters.

As described in Uri Alon's book, we can describe the behavior of multi-input promoters with Boolean functions. A basic example is a promoter that is activated only when two transcription factors, X and Y, are bound at the same time. This interaction forms an AND Boolean function.



But a Boolean function is not accurate enough to simulate promoter behavior: it only tells whether a promoter is activated or not. So let's see how to handle interactions more accurately.

First, every Boolean function can be built from the AND, OR and NOT functions, so let's focus on these three. Considering the model above, we can ignore the β parameter, which does not depend on the interaction between molecule and promoter. Moreover, each interaction must be expressed in terms of a common basis. With these considerations, the basic model looks like this:

$$ f(X) = \frac{X^n}{K^n + X^n} $$

where X is the concentration of the transcription factor, and K and n are its two constants (in the Hill function, the activation coefficient and the Hill coefficient).



A repressed promoter can be defined in terms of this activated promoter function:

$$ \bar{f}(X) = 1 - f(X) $$



The AND interaction between two transcription factors can be modeled as a min function:

$$ f(X \text{ AND } Y) = \min(f(X), f(Y)) $$





And finally, an OR interaction can be modeled as a max function:

$$ f(X \text{ OR } Y) = \max(f(X), f(Y)) $$





In this way, the behavior of a promoter can be defined simply by composing these functions.

For instance, a promoter that is induced by one transcription factor (X) and repressed by another (Y) gives the following formula:

$$ f(X \text{ AND NOT } Y) = \min(f(X), 1 - f(Y)) $$
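The composition can be sketched in a few lines of Python. The helper names (`activation`, `f_not`, `f_and`, `f_or`, `induced_and_repressed`) are hypothetical and chosen for this example only; each transcription factor is assumed to carry its own pair of constants `k` and `n`.

```python
def activation(x, k, n):
    """Normalized activation (between 0 and 1) for a factor at concentration x."""
    return x**n / (k**n + x**n)


def f_not(a):
    """Repression: complement of an activation level."""
    return 1.0 - a


def f_and(a, b):
    """AND interaction: limited by the weaker of the two inputs."""
    return min(a, b)


def f_or(a, b):
    """OR interaction: driven by the stronger of the two inputs."""
    return max(a, b)


def induced_and_repressed(x, y, kx, nx, ky, ny):
    """Promoter induced by X and repressed by Y: min(f(X), 1 - f(Y))."""
    return f_and(activation(x, kx, nx), f_not(activation(y, ky, ny)))


if __name__ == "__main__":
    # Illustrative concentrations and constants, not measured values.
    print(induced_and_repressed(x=2.0, y=0.1, kx=1.0, nx=2, ky=1.0, ny=2))
```

With inputs restricted to 0 and 1, these three helpers reproduce the usual Boolean truth tables, which is why they can stand in for AND, OR and NOT while still returning intermediate values.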





Implementation

The implementation is straightforward: define a grammar to express formulas, parse each formula into an abstract syntax tree (AST), and then evaluate the tree recursively.

    Grammar:
    OREXPR ::= ANDEXPR [OP_OR OREXPR]
    ANDEXPR ::= PAREXPR [OP_AND ANDEXPR]
    PAREXPR ::= (NOTEXPR | OP_LPAR OREXPR OP_RPAR)
    NOTEXPR ::= [OP_NOT] (OPERANDEXPR | BOOL_EXPR)
    BOOL_EXPR ::= (OP_TRUE | OP_FALSE)
    OPERANDEXPR ::= CONSTANTEXPR WORD
    CONSTANTEXPR ::= OP_LHOOK FLOATNUMBER OP_COMMA FLOATNUMBER OP_RHOOK
    WORD ::= CHAR [CHAR | NUMBER]
    NUMBER ::= (0|1|2|3|4|5|6|7|8|9) [NUMBER]
    CHAR ::= (a-z | A-Z)
    FLOATNUMBER ::= NUMBER [OP_DOT NUMBER]

    Default Operators:
    OP_OR ::= "|"
    OP_AND ::= "*"
    OP_LPAR ::= "("
    OP_RPAR ::= ")"
    OP_NOT ::= "!"
    OP_TRUE ::= "T"
    OP_FALSE ::= "F"
    OP_LHOOK ::= "["
    OP_RHOOK ::= "]"
    OP_COMMA ::= ","
    OP_DOT ::= "."
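To make the grammar concrete, here is a small recursive-descent parser sketch in Python with one method per grammar rule. It is not the simulator's code: the tuple-based tree shape, the `Parser` class and the `tokenize` and `parse` helpers are assumptions made for this example, and names are read as a letter followed by any letters or digits.

```python
import re

# One token: a number, a name, or a single-character operator.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9]*|[|*()!\[\],])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def or_expr(self):  # OREXPR ::= ANDEXPR [OP_OR OREXPR]
        left = self.and_expr()
        if self.peek() == "|":
            self.take("|")
            return ("or", left, self.or_expr())
        return left

    def and_expr(self):  # ANDEXPR ::= PAREXPR [OP_AND ANDEXPR]
        left = self.par_expr()
        if self.peek() == "*":
            self.take("*")
            return ("and", left, self.and_expr())
        return left

    def par_expr(self):  # PAREXPR ::= NOTEXPR | "(" OREXPR ")"
        if self.peek() == "(":
            self.take("(")
            node = self.or_expr()
            self.take(")")
            return node
        return self.not_expr()

    def not_expr(self):  # NOTEXPR ::= [OP_NOT] (OPERANDEXPR | BOOL_EXPR)
        if self.peek() == "!":
            self.take("!")
            return ("not", self.atom())
        return self.atom()

    def atom(self):
        if self.peek() in ("T", "F"):  # BOOL_EXPR
            return ("bool", self.take() == "T")
        self.take("[")  # OPERANDEXPR ::= "[" FLOAT "," FLOAT "]" WORD
        first = float(self.take())
        self.take(",")
        second = float(self.take())
        self.take("]")
        name = self.take()
        if not name[0].isalpha():
            raise ValueError(f"expected a name, got {name!r}")
        return ("operand", first, second, name)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("[1.3,2]X*![1.4,1]Y"))
```

Because an operand always starts with "[", a bare "T" or "F" can be read as a Boolean constant without ambiguity, even if a transcription factor happens to be named T or F.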

Let's see the corresponding tree for the formula "[1.3,2]X*![1.4,1]Y":

              AND (*)
             /       \
    [1.3,2] X       NOT (!)
                      |
                  [1.4,1] Y



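By executing this tree, we get the result of the previously defined formula:

$$ \min\left( \frac{X^{n_X}}{K_X^{n_X} + X^{n_X}},\; 1 - \frac{Y^{n_Y}}{K_Y^{n_Y} + Y^{n_Y}} \right) $$

where (K_X, n_X) and (K_Y, n_Y) are the two constants given in brackets before X and Y.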
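The recursive execution can be sketched as below, on a tree written by hand as nested tuples (a hypothetical representation chosen for this example). One important assumption: the post does not say which bracketed constant is which, so this sketch reads "[1.3,2]" as K = 1.3 and n = 2.

```python
def evaluate(node, concentrations):
    """Recursively evaluate a formula tree to a value between 0 and 1."""
    kind = node[0]
    if kind == "operand":
        # ASSUMPTION: the first constant is K and the second is n.
        _, k, n, name = node
        x = concentrations[name]
        return x**n / (k**n + x**n)
    if kind == "bool":
        return 1.0 if node[1] else 0.0
    if kind == "not":
        return 1.0 - evaluate(node[1], concentrations)
    if kind == "and":
        return min(evaluate(node[1], concentrations),
                   evaluate(node[2], concentrations))
    if kind == "or":
        return max(evaluate(node[1], concentrations),
                   evaluate(node[2], concentrations))
    raise ValueError(f"unknown node type: {kind!r}")


# Hand-built tree for "[1.3,2]X*![1.4,1]Y".
TREE = ("and",
        ("operand", 1.3, 2.0, "X"),
        ("not", ("operand", 1.4, 1.0, "Y")))

if __name__ == "__main__":
    # Illustrative concentrations only.
    print(evaluate(TREE, {"X": 1.3, "Y": 1.4}))
```

Each node type maps directly onto one of the functions defined in the model section, so adding a new operator would only require one more branch.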



Finally, multiply this result by the production rate, the terminator factor and the ribosome binding site factor of the gene to get the produced concentration of the protein.
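This last step is a plain product, sketched here in Python. The function name, the parameter names and the numbers are made up for illustration and do not come from the simulator.

```python
def produced_concentration(promoter_result, production_rate,
                           terminator_factor, rbs_factor):
    """Scale the promoter formula result (0 to 1) by the gene's factors."""
    return promoter_result * production_rate * terminator_factor * rbs_factor


if __name__ == "__main__":
    # Hypothetical values: formula result 0.5, production rate 10,
    # terminator factor 0.9, ribosome binding site factor 0.8.
    print(produced_concentration(0.5, 10.0, 0.9, 0.8))
```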