Logical Modeling of Biological Systems
Ebook, 713 pages, 8 hours

About this ebook

Systems Biology is the systematic study of the interactions between the components of a biological system and of how these interactions give rise to the function and behavior of the living system. In this view, a life process is understood as a whole system rather than as a collection of parts considered separately. Systems Biology is therefore more than just an emerging field: it represents a new way of thinking about biology, with a dramatic impact on the way research is performed. The logical approach offers an intuitive way to build explanations based on an expressive relational language.

This book covers various aspects of logical modeling of biological systems, bringing together 10 recent logic-based approaches to Systems Biology by leading scientists. The chapters cover the biological fields of gene regulatory networks, signaling networks, metabolic pathways, molecular interaction and network dynamics, and show logical methods for these domains based on propositional and first-order logic, logic programming, answer set programming, temporal logic, Boolean networks, Petri nets, process hitting, and abductive and inductive logic programming.

It provides an excellent guide for all scientists, biologists, bioinformaticians, and engineers who are interested in logic-based modeling of biological systems, and the authors hope that new scientists will be encouraged to join this exciting scientific endeavor.

Language: English
Publisher: Wiley
Release date: Aug 8, 2014
ISBN: 9781119015215

    Logical Modeling of Biological Systems - Luis Fariñas del Cerro

    1

    Symbolic Representation and Inference of Regulatory Network Structures

    Recent results have demonstrated the usefulness of symbolic approaches for addressing various problems in systems biology. One of the fundamental challenges in systems biology is the extraction of integrated signaling-transcriptional networks from experimental data. In this chapter, we present a general logic-based framework, called Abductive Regulatory Network Inference (ARNI), in which we formalize the network extraction problem as an abductive inference problem. A general logical model is provided that integrates prior knowledge on molecular interactions and other information for capturing signal-propagation principles and compatibility with experimental data. Solutions to our abductive inference problem define signed-directed networks that explain how genes are affected during the experiments. Using in-silico datasets provided by the Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium, we demonstrate that the network topologies we infer have improved predictive power and greater complexity than those generated by other, non-symbolic inference approaches, showing the suitability of our approach for computing complete, realistic networks. We also explore how the improved expressiveness, together with the modularity and flexibility of the logic-based nature of our approach, can support automated scientific discovery in which the validity of hypothesized biological ideas can be examined and tested outside the laboratory.

    1.1. Introduction: logical modeling and abductive inference in systems biology

    Systems biology is generally concerned with developing formal models that aim to describe the operation of various biological processes. Its study is based on the synthesis of a model or a theory from empirical experimental information. At the cellular level, systems biology aims to build models that describe, at some level of abstraction, the underlying operation of a cell at the genomic and/or protein level. The central challenge is then how to choose an appropriate framework that would (1) enable the construction of a model from experimental data and (2) empower such models with a predictive capability for new information beyond that used to construct the model.

    As in many cases of such scientific exploration, the choice of the framework under which we formulate the model depends on the type of experimental data available when the scientific model is developed. In general, at the initial stages of an investigation the available data are usually descriptive and qualitative rather than quantitative. We therefore set out to develop a first model, based on some principles that we believe underlie the phenomena, in which we are primarily interested in capturing the overall, general interrelations between the concepts of interest. It is then important to require a framework that is (1) high-level, close to the human description of the phenomena and thus close to the experimental language, and (2) modular and flexible, so that the models can easily be adapted to new information and other changes that might come about.

    Under these conditions and requirements for our language, a symbolic or logical framework is particularly suitable. A logical scientific theory normally offers a high-level declarative description that can be understood easily by the expert experimental scientists that provide the experimental data. Logical models are also highly modular where changes can often be isolated to parts of the model without the need for an overall complete reformulation of the model. Furthermore, within a logical approach we can employ abductive reasoning to help in the process of building a theory from experimental data. Abductive reasoning is a formalization of the explanatory scientific reasoning that is typically carried out by human scientists when they think about the phenomena they are studying, either when they are trying to understand their experimental findings, or when they are planning the next set of experiments to help them improve their understanding of the phenomena.

    Hence, in choosing a logical approach, we provide a framework that not only responds well to the object level requirement of describing the phenomena, but also to the meta level task of reasoning about the models developed thus far and deciding on their further investigation through new experiments, or indeed new desirable properties and principles that the model must adhere to. For molecular biology, logic is particularly suited as, at least currently, in many cases the theoretical models and experimentation of cell biology are developed following a rationale at the qualitative rather than quantitative level. The nature of much of the experimental data is descriptive with the aim to first understand the qualitative interrelations between the various constituents and processes in the cell.

    In this chapter, we develop a logical model of regulatory cell networks, covering both transcriptional networks and upstream signaling regulatory networks. We have implemented a qualitative model that is based on general biological principles and that exploits existing prior knowledge of molecular interactions. The approach, called ARNI, for abductive inference of regulatory networks, constructs causal signed-directed networks of interactions between genes from high-throughput experimental data. These networks rely on the simple and general underlying principles that signals from the environment propagate along paths of protein interactions to reach the regulatory components of cells (i.e. production of genes) and that genes are under the influence of multiple overlapping inputs, which might be compatible or competitive with each other. The networks also exhibit several important motifs, including feedback loops (positive and negative), which allow a gene to control its own expression, and feed-forward loops (coherent or incoherent), whereby a gene has both direct and indirect connections to its target¹. Each of these motifs governs fundamental properties of the overall dynamic behavior of the network, such as robustness, oscillations, memory and bistability [ALO 07, YEG 04].

    Our construction of regulatory networks relies on abductive reasoning as an automated form of the scientific reasoning used to rationalize high-throughput experimental data. Indeed, the problem of signaling network reconstruction maps naturally to an abductive task. Specifically, (1) gene expression data constitute the experimental data; (2) the given (partial) knowledge is a logic-based theory governing biological phenomena, such as the notions of gene regulation and interactive potential; (3) biological constraints, such as sign consistency between interacting gene expressions, are captured via integrity constraints; and (4) sentences about unknown compatible and competitive gene regulations are the abducible information that can be assumed to form a network. Thus, given the general possible structure of signaling networks, an abductive computation infers possible signed-directed networks, in terms of compatible and competitive gene regulations, that conform to the available experimental observations.

    As argued above, our logical approach offers a high-level declarative model whose increased expressiveness makes it applicable to a wide variety of signaling network problems and challenges. We demonstrate these properties through a series of evaluation experiments that test the effectiveness of the abductive networks and explore the expressiveness of the logical framework. We also examine the usefulness of our abductive approach for meta-level scientific reasoning, acting as a scientific assistant, and how this, together with the modularity of the approach, can support the further development and improvement of the initially constructed networks.

    Our approach follows a series of works that rely on logical abduction for addressing various problems in systems biology. Abduction has been used to learn/revise metabolic pathways [RAY 10, TAM 06] and to hypothesize about the function of genes [RAY 08, KIN 04]. Abductive reasoning is also used in [TRA 09, LAZ 13] for meta-level reasoning over hypotheses, but de novo topology inference is not considered in these existing contributions. More directly related to our work is the approach in [PAP 05], which uses abductive logic programming to infer gene dependencies that explain the changes in gene expression levels. Our work advances that in [PAP 05] in several ways, specifically by allowing the use of prior knowledge, by modeling and reasoning about competitive gene influences, and by presenting a framework that can act as a scientific assistant to biologists for testing the validity of new hypotheses.

    In comparison with non-symbolic approaches such as gene co-expression networks based on statistical principles [ROT 13, HE 09] and physical network models [YEA 04, OUR 07, HUA 09], logical approaches like ours offer improved expressiveness, as they enable the inference of networks with more complex regulatory structures, and added modularity that allows the logic model to be easily adapted to new available information (e.g. addition of new constraints).

    This chapter is structured as follows. Section 1.2 presents the ARNI approach and its key components. Section 1.3 reports the results of evaluating the predictive power of our approach and demonstrates the increased expressive power of ARNI. Section 1.4 explores ARNI as a scientific assistant for biological hypothesis testing, and section 1.5 concludes the chapter with a discussion of related work and future directions.

    1.2. Logical modeling of regulatory networks

    In this section, after briefly summarizing the basic notions and terminology from abduction, we study how the problem of inferring regulatory networks can be formalized as an abductive problem. We analyze the general biological features of the problem and develop the underlying logical model over which the task of constructing regulatory networks from experimental data can be understood and computationally realized in terms of abduction.

    1.2.1. Background

    An atomic formula (or atom for short) is a proposition or an n-ary predicate P followed by an n-tuple of terms. A positive literal is an atom ϕ, and a negative literal is a negated atom, written as not ϕ, where not is the negation-as-failure operator. Atoms and negated atoms are thus both referred to as literals. A clause is a rule of the form ϕ ← ϕ1, … , ϕn, where ϕ is the head atom and the ϕi are the body literals. Clauses can also be facts (when n = 0), or denials of the form ic ← ϕ1, ϕ2, … , ϕm, where the symbol ic means false and the ϕi are literals. A clause is said to be ground if it contains no variables, definite if all its body literals are positive, and normal if its body may also contain negative literals. A normal logic program is a set of normal clauses. In general, a model I of a set Π of normal clauses is a set of ground atoms such that, for each ground instance rg of a clause r in Π, I satisfies the head of rg whenever it satisfies the body. A model I is said to be minimal if it does not strictly include (in terms of set inclusion) any other model. Normal logic programs may have one, none or several stable models [GEL 88], a distinguished class of minimal models that are usually identified as the possible meanings of a program.
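To make the notion of a stable model concrete, the following minimal Python sketch tests candidate sets of atoms against the Gelfond-Lifschitz reduct: a set M is stable when it equals the least model of the program obtained by deleting every rule containing some not b with b ∈ M, and then deleting the remaining negative literals. It handles only small ground normal programs by brute force; the encoding of rules as (head, positive body, negative body) triples and all function names are illustrative assumptions, not part of the chapter.

```python
from itertools import chain, combinations

# A ground normal rule "head <- pos_1, ..., not neg_1, ..." is encoded (hypothetically)
# as a triple (head, frozenset of positive body atoms, frozenset of negated body atoms).
Rule = tuple[str, frozenset, frozenset]


def least_model(definite_rules: list[tuple[str, frozenset]]) -> set[str]:
    """Least model of a definite program, computed by naive fixpoint iteration."""
    model: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def reduct(program: list[Rule], candidate: set[str]) -> list[tuple[str, frozenset]]:
    """Gelfond-Lifschitz reduct: drop rules with some 'not b' where b is in the
    candidate, then drop all remaining negative literals."""
    return [(head, pos) for head, pos, neg in program if not (neg & candidate)]


def stable_models(program: list[Rule]) -> list[set[str]]:
    """All stable models, found by testing every subset of the program's atoms."""
    atoms = sorted({head for head, _, _ in program}
                   | set(chain.from_iterable(pos | neg for _, pos, neg in program)))
    models = []
    for k in range(len(atoms) + 1):
        for subset in combinations(atoms, k):
            candidate = set(subset)
            if least_model(reduct(program, candidate)) == candidate:
                models.append(candidate)
    return models


# p <- not q.  q <- not p.   (two stable models)
choice = [("p", frozenset(), frozenset({"q"})), ("q", frozenset(), frozenset({"p"}))]
# p <- not p.                (no stable model)
paradox = [("p", frozenset(), frozenset({"p"}))]
print(stable_models(choice))   # [{'p'}, {'q'}]
print(stable_models(paradox))  # []
```

The second program has the classical minimal model {p} but no stable model, which illustrates why stable models, rather than arbitrary minimal models, are taken as the meanings of a program. Denials are not handled in this sketch.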

    Abduction is a process of reasoning from observations to possible causes. In essence, it is concerned with the construction of explanations, Δ, that conform with given observations and prior knowledge, Π, and that, together with Π, are consistent with given integrity constraints, IC. Abductive explanations are usually restricted to ground atoms from a predefined set 𝒜, called abducibles. Intuitively, abducibles are undefined information in a given knowledge base, whose truth value can be assumed in order to (partially) complete the knowledge base. In logic terms, given a set Π of normal clauses expressing prior knowledge and observations, a set IC of denials, and a set 𝒜 of abducible ground atoms with terms from the Herbrand domain of Π, an abductive reasoning problem consists of finding a set of abducibles Δ ⊆ 𝒜 such that IC is satisfied in a canonical model of Π ∪ Δ. We take the stable models of Π ∪ Δ as canonical models; such stable models are also referred to as generalized stable models of the abductive task [KAK 90].

    DEFINITION 1.1.– Let the tuple AC = 〈Π, IC, 𝒜〉 be an abductive problem, where Π is a normal logic program, IC is a set of denial clauses, and 𝒜 is a set of ground abducible atoms. A generalized stable model of AC is a stable model of Π ∪ Δ, for some Δ ⊆ 𝒜, that satisfies IC, denoted Π ∪ Δ ⊨ IC. The set Δ is referred to as an abductive solution of AC.
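On toy problems, the definition can be executed literally. The sketch below is an illustrative brute-force search, not the ASP-based procedure the chapter relies on: it adds each candidate subset Δ of the abducibles to Π as facts, computes the stable models of Π ∪ Δ via the Gelfond-Lifschitz reduct, and keeps the models in which no denial body holds. The rain/sprinkler/wet program, the rule encoding and all names are hypothetical; the observation wet is imposed through the denial ic ← not wet.

```python
from itertools import chain, combinations


def least_model(rules):
    """Least model of a definite program given as (head, positive body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def stable_models(program):
    """Stable models of a ground normal program of (head, pos, neg) triples."""
    atoms = sorted({h for h, _, _ in program}
                   | set(chain.from_iterable(p | n for _, p, n in program)))
    for k in range(len(atoms) + 1):
        for subset in combinations(atoms, k):
            candidate = set(subset)
            reduct = [(h, p) for h, p, n in program if not (n & candidate)]
            if least_model(reduct) == candidate:
                yield candidate


def violated(model, denial):
    """A denial ic <- pos, not neg is violated when its whole body holds in the model."""
    pos, neg = denial
    return pos <= model and not (neg & model)


def abductive_solutions(pi, ic, abducibles):
    """Pairs (delta, M) where M is a generalized stable model for that delta."""
    solutions = []
    for k in range(len(abducibles) + 1):
        for delta in combinations(sorted(abducibles), k):
            program = pi + [(a, frozenset(), frozenset()) for a in delta]  # delta as facts
            for model in stable_models(program):
                if not any(violated(model, d) for d in ic):
                    solutions.append((set(delta), model))
    return solutions


# Hypothetical toy task: wet <- rain.  wet <- sprinkler.  ic <- not wet.
pi = [("wet", frozenset({"rain"}), frozenset()),
      ("wet", frozenset({"sprinkler"}), frozenset())]
ic = [(frozenset(), frozenset({"wet"}))]
for delta, model in abductive_solutions(pi, ic, {"rain", "sprinkler"}):
    print(sorted(delta), sorted(model))
# ['rain'] ['rain', 'wet']
# ['sprinkler'] ['sprinkler', 'wet']
# ['rain', 'sprinkler'] ['rain', 'sprinkler', 'wet']
```

Here {rain} and {sprinkler} are the subset-minimal abductive solutions, while {rain, sprinkler} is the unique subset-maximal one, which is the minimality/maximality distinction that matters for network extraction.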

    Different abductive proof procedures have been proposed (e.g. [KAK 90, KAK 00, KAK 01]). In these approaches, a minimality criterion, expressed in terms of subset-minimality, is often enforced on the construction of abductive solutions. However, whereas minimality of explanations is desirable in applications of abduction such as planning and diagnosis, extracting regulatory networks that conform with observed gene expression data means computing maximal networks that are biologically meaningful (i.e. satisfy biological integrity constraints), that are consistent with prior knowledge about the observed genes (e.g. existing knowledge of a gene being an activator or an inhibitor), and that, together with the prior knowledge, satisfy the observed data. The computation of any such network, in terms of a collection of regulations between genes (i.e. compatible or competitive gene regulations), requires an abductive task whose solutions (i.e. the regulations between genes) are not minimal but maximal.

    The answer set programming (ASP) paradigm provides an ideal environment for the efficient computation of maximal abductive solutions, as it combines a declarative modeling language with high-performance problem-solving capabilities [GEB 12]. To understand how an abductive problem with prior knowledge Π, abducibles 𝒜 and integrity constraints IC is modeled as an ASP problem, it is convenient to think of it as a special type of open program, 〈Π ∪ IC, 𝒜, ∅〉 [BON 01], where the set of open predicates (i.e. predicates that are not defined in the program) is the set of abducibles 𝒜, and ∅ denotes that no new terms, in addition to those in the Herbrand base of Π, are considered [BON 02]. Abducibles can indeed be seen as ground Boolean atoms whose truth value is not defined in the program Π, although it is constrained by IC. In biological terms, our (abductive) problem of extracting gene regulatory networks assumes that information about regulations between genes (i.e. compatible or competitive regulations), which are our abducibles, is unknown and therefore open to Boolean assignments. Open programs can be transformed into semantically equivalent normal logic programs (see [BON 02] for a precise definition of such semantic equivalence), which, in turn, can be expressed as ASP problems with a choice statement over subsets of 𝒜 (see [GEB 12] for the mapping between normal logic programs and choice statements). A choice statement is an expression of the form {a1, … , an}, where the ai are (possibly ground) atoms. This expression informally means that any subset of {a1, … , an} may be included in a stable model (i.e. answer set) of the given ASP problem. As the set of ground abducibles could be large, choice statements can be expressed more concisely using conditional literals [GEB 12]. Conditional literals are expressions of the form a : t1 : … : tn, where a and the ti are literals, informally denoting the list of elements in the set {a | t1, … , tn}. Clearly, the expansion of conditional literals is domain dependent, i.e. it depends on the definition of the literals ti. For example, given the facts p(1), p(2), p(3) and q(2), the choice statement {r(1), r(3)} could also be written as {r(X) : p(X) : not q(X)}.
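The grounding of this small example can be mimicked in a few lines of Python: the conditional literal keeps every X for which p(X) holds and q(X) does not (negation as failure over the given facts), and the resulting choice admits every subset of the ground atoms. This only illustrates the semantics; the helper names are hypothetical, and a real grounder such as gringo handles the general case.

```python
from itertools import combinations


def expand_conditional(facts):
    """Ground the text's conditional choice: keep r(X) when p(X) holds and q(X) does not."""
    p_args = {x for pred, x in facts if pred == "p"}
    q_args = {x for pred, x in facts if pred == "q"}
    # not q(X) succeeds exactly when q(X) is absent from the facts (negation as failure)
    return [f"r({x})" for x in sorted(p_args - q_args)]


def choice_candidates(atoms):
    """A choice statement {a1, ..., an} allows any subset of its atoms in an answer set."""
    return [set(c) for k in range(len(atoms) + 1) for c in combinations(atoms, k)]


facts = {("p", 1), ("p", 2), ("p", 3), ("q", 2)}
ground = expand_conditional(facts)
print(ground)                           # ['r(1)', 'r(3)']
print(len(choice_candidates(ground)))   # 4: {}, {r(1)}, {r(3)}, {r(1), r(3)}
```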

    The formalization of an abductive problem as an ASP problem allows better control over the size of the subset of abducibles that can be included in a final solution, also taking into account different weights that could be given to different abducibles (if required by the problem domain). For instance, we may want to specify that a solution (i.e. answer set) should include the maximal (respectively, minimal) number of abducibles that are consistent with the prior knowledge and the integrity constraints. In this case, the ASP problem would include, together with the prior knowledge and the integrity constraints, a maximize (respectively, minimize) optimization expression over the set of abducibles. An optimization expression is of the form minimize{l1 = w1@p1, … , ln = wn@pn}, and similarly for maximize, where wi and pi represent the weight and the priority of the literal li. Informally, optimization expressions instruct an ASP solver to compute optimal stable models by minimizing (or maximizing) a weighted sum of elements. Using a maximize expression over the set of abducibles, with each abducible given the same weight and the same priority (i.e. maximize{a1, a2, … , am}), we model the requirement that optimal solutions (i.e. stable models) include the maximal number of abducibles; such cardinality-maximal solutions are, in particular, maximal in terms of set inclusion. Satisfaction of the integrity constraints is implicitly guaranteed by the computation of the optimal stable models, as the ASP problem directly includes IC.
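The following sketch illustrates how an optimization expression with weight@priority terms ranks candidate answer sets: for each priority level it sums the weights of the literals true in a candidate, and it compares the resulting vectors lexicographically starting from the highest level, which is how clingo treats priorities. The candidate sets and atom names are hypothetical, and a real solver searches for optima rather than enumerating candidates.

```python
from collections import defaultdict


def cost_vector(answer_set, weights):
    """Weighted sums of true literals per priority level, highest level first.

    weights maps each literal to a (weight, priority) pair, i.e. l = w@p."""
    sums = defaultdict(int)
    for literal, (weight, priority) in weights.items():
        if literal in answer_set:
            sums[priority] += weight
    levels = sorted({priority for _, priority in weights.values()}, reverse=True)
    return tuple(sums[level] for level in levels)


def optimal(candidates, weights, maximize=True):
    """Candidates whose cost vectors are lexicographically best."""
    vectors = [cost_vector(c, weights) for c in candidates]
    best = max(vectors) if maximize else min(vectors)
    return [c for c, v in zip(candidates, vectors) if v == best]


# maximize{a1, a2, a3}: every abducible has weight 1 and priority 1
unit = {a: (1, 1) for a in ("a1", "a2", "a3")}
candidates = [set(), {"a1"}, {"a1", "a2"}, {"a3"}, {"a2", "a3"}]
print(optimal(candidates, unit))                  # the two 2-element sets
print(optimal(candidates, unit, maximize=False))  # [set()]
```

With unit weights, the maximize statement selects the candidates with the most abducibles; none of them can be strictly contained in another candidate, although a subset-maximal candidate need not have maximum cardinality.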

    To further analyze the difference between our emphasis on maximality and the more conventional notion of minimality of abductive solutions, and its biological relevance in the computation of regulatory networks, we consider a simple illustrative example. Suppose that our abductive task is to compute an acyclic directed graph with four nodes a, b, c and d that links two of these nodes, say a and b, called seed nodes, through the other two nodes c and d, while satisfying the following constraints: (1) seed nodes cannot be linked directly, (2) any two nodes can have at most one link between them, (3) a seed node can either be a source (i.e. its links are all directed out) or a sink (i.e. its links are all directed in) and (4) no other node is a source or a sink (i.e. if a link exists from node Y to node X, then there must exist a link directed out of node X and a link directed into node Y). Essentially, constraint (4) guarantees the formation of paths between seed nodes. We show how this abductive problem is formalized within the ASP paradigm and discuss the differences between minimal and maximal solutions.

    Figure 1.1 shows the ASP formalization of our abductive task 〈Π, IC, 𝒜〉. It can be shown that this representation corresponds to a normal logic program transformation of an open program 〈Π ∪ IC, 𝒜, ∅〉. The ASP problem in Figure 1.1 returns many answer set solutions corresponding to different possible subsets (including the empty set) Δ ⊆ 𝒜 that are consistent with the constraints. These are determined by means of the choice statement {r(X,Y) : node(X) : node(Y) : X != Y}. So, in this example, abductive solutions are finite sets of ground instances of r(X,Y), i.e. directed links between nodes, that satisfy constraints (1)-(4). To compute only solutions with minimal abductive assumptions, the above ASP problem can be augmented with the optimization expression #minimize{r(X,Y)}. Clearly, in this case, the smallest set of abducibles that satisfies constraints (1)-(4) is the empty set, and the solver will return Δ = ∅ as the optimal solution. We could consider adding constraints that force more links to be abduced. For instance, constraint (5), every node must be linked in the graph, could be added to the set of ICs by including the two denials :- node(X), not connected_out(X). and :- node(X), not connected_in(X). The empty solution would then not be computed, as it would violate constraint (5); but the minimize optimization statement would generate, as optimal, all solutions that satisfy all the constraints, and hence guarantee that all nodes are connected, with the minimum number of links. In this case, the abductive problem admits four minimal abductive solutions, which are shown graphically in Figure 1.2, where an arrow between two nodes (e.g. d and c) represents a ground abduced r atom (e.g. r(c, d)). Although logically correct, such solutions are not very meaningful biologically. In real biological networks, genes (nodes in the graph) are often involved in multiple interactions (i.e. multiple incoming links or multiple outgoing links). This redundant structure of parallel overlapping inputs ensures robustness under random failure and provides adaptability to the environment [BAR 04].

    Figure 1.1. An abductive task as an ASP problem

    Figure 1.2. Minimal abductive solutions that satisfy constraints (1)-(5)

    What we need in our problem is to compute maximal networks. This is achieved by requiring abductive solutions to be maximal. If we add to the ASP problem in Figure 1.1 the constraint (5) described above and the optimization expression #maximize{r(X,Y)} over the choice of subsets of abducibles, the abductive problem again has four solutions, but now they are maximal. These solutions are shown graphically in Figure 1.3.

    Figure 1.3. Maximal abductive solutions for abductive task in Figure 1.1
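The four-node example is small enough to check by brute force. The following Python sketch enumerates link sets directly instead of running the ASP encoding of Figure 1.1 (which is not reproduced here), and it rests on our own reading of constraints (1)-(5): a tuple (x, y) denotes a link from x to y, every node must have at least one link, seeds are pure sources or pure sinks, and the non-seed nodes c and d need both an incoming and an outgoing link. Under these assumptions it finds four solutions with the minimum of three links and four with the maximum of five, in line with the counts stated in the text.

```python
from itertools import product

NODES = ("a", "b", "c", "d")
SEEDS = {"a", "b"}
# Node pairs that may carry a link; the pair a-b is excluded by constraint (1).
PAIRS = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]


def candidate_graphs():
    """Each pair gets no link, x->y or y->x, so constraint (2) holds by construction."""
    for choice in product((None, "forward", "backward"), repeat=len(PAIRS)):
        links = set()
        for (x, y), direction in zip(PAIRS, choice):
            if direction == "forward":
                links.add((x, y))
            elif direction == "backward":
                links.add((y, x))
        yield frozenset(links)


def satisfies_constraints(links):
    for n in NODES:
        has_out = any(src == n for src, _ in links)
        has_in = any(dst == n for _, dst in links)
        if not (has_out or has_in):                       # (5) every node is linked
            return False
        if n in SEEDS and has_out and has_in:             # (3) pure source or pure sink
            return False
        if n not in SEEDS and not (has_out and has_in):   # (4) neither source nor sink
            return False
    # No directed cycle can survive (2)-(4) on these nodes, so acyclicity needs no check.
    return True


solutions = [g for g in candidate_graphs() if satisfies_constraints(g)]
fewest = min(len(g) for g in solutions)
most = max(len(g) for g in solutions)
minimal = [g for g in solutions if len(g) == fewest]
maximal = [g for g in solutions if len(g) == most]
print(fewest, len(minimal))  # 3 4
print(most, len(maximal))    # 5 4
```

Selecting the smallest or largest link sets mirrors switching between #minimize and #maximize; enumerating all 3^5 = 243 candidates is only viable because the example is tiny.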

    In summary, the task of computing regulatory networks from gene expression data can be formalized as an abductive task where maximal abductive solutions are computed to give maximal signed-directed gene regulations that are consistent with biological constraints and given gene expression data.

    1.2.2. Logical model of signed-directed networks

    In our ARNI abductive framework, the background knowledge Π is composed of a rule-based model, called the formal model, extensional knowledge, called prior knowledge, and information about experimental data. The formal model expresses biological knowledge on how interactions of genes are expected to affect the concentration of genes; the prior knowledge captures any known information about specific genes, including the interactive potential between two genes and the functions of genes, which is normally available from online biological databases. Abducibles are unknown signed-directed regulations between genes (the biological analog of the directed links in the graph example given above). Integrity constraints over the abducibles fall into four categories: (1) constraints that enforce signed-directed regulations to be compatible with existing/established knowledge (e.g. already known regulations or compatibility with the known type of regulation of the gene), (2) constraints about compatibility of the signed-directed regulations with experimental data, (3) constraints that express logical consistency of the extracted logical model, and (4) constraints about biological consistency. We describe below each of the components of our ARNI framework.

    1.2.2.1. Prior knowledge

    Gene interactions can be of two types, protein-DNA interactions (PDI) and protein-protein interactions (PPI). PDI are directed links from a transcription factor to a regulated gene, whereas PPI interactions are undirected links between proteins. Signed-directed regulations between genes can be of two types, compatible and competitive. These types of gene regulations are in general unknown and therefore constitute the incomplete part of prior biological knowledge. Computing a regulatory network that conforms with observed gene expression data means discovering those unknown signed-directed regulations between genes, or signed-directed links, that cause the observed data, in a way that is consistent with given biological constraints.

    The domain of genes considered in our abductive task is given by the set of genes that are present in a biological experiment. We denote this set by 𝒢. Known potential interactions between genes are expressed in the prior knowledge as logical facts of the form interactive_potential(gi, gj), which state that there is a form of interaction between genes gi and gj. PDI interactions are normally unidirectional whereas PPI interactions are bidirectional. Therefore, our prior knowledge includes only one ground fact of the form interactive_potential(gi, gj) for each known potential PDI interaction and, for each known PPI interaction between genes gi and gj, the two ground facts interactive_potential(gi, gj) and interactive_potential(gj, gi). We denote by IPprior the following set of ground facts:

    [1.1]
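As a small illustration of how IPprior could be assembled, the sketch below turns lists of known PDI and PPI pairs into interactive_potential facts in ASP syntax, emitting one directed fact per PDI pair and two facts per PPI pair. The gene names, the (transcription factor, target) argument order assumed for PDI pairs and the helper names are all hypothetical.

```python
def interactive_potential_facts(pdi_pairs, ppi_pairs):
    """One fact per PDI (assumed order: transcription factor, target); two per PPI."""
    facts = set(pdi_pairs)            # PDI interactions are directed
    for g1, g2 in ppi_pairs:          # PPI interactions are undirected
        facts.add((g1, g2))
        facts.add((g2, g1))
    return facts


def to_asp(facts):
    """Render the pairs as ground ASP facts, in a stable order."""
    return [f"interactive_potential({g1},{g2})." for g1, g2 in sorted(facts)]


pdi = [("tf1", "g1")]                 # hypothetical transcription factor tf1 regulating g1
ppi = [("g1", "g2")]                  # hypothetical protein-protein interaction
for line in to_asp(interactive_potential_facts(pdi, ppi)):
    print(line)
# interactive_potential(g1,g2).
# interactive_potential(g2,g1).
# interactive_potential(tf1,g1).
```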

    It is important to note that the interactive_potential information in the prior knowledge does not fully capture the regulatory effects between genes, as it does not express the type of signed-directed interaction between two genes. This information is expressed by our abducibles, and it has to be consistent with any known information about the regulatory potential of a gene. The known regulatory potential of a gene is extracted from online biological databases and expressed in our prior knowledge as ground facts of the form regulatory_potential(gi, s), where gi is a gene and s is the type of regulation, which can be 1 (for activation) or –1 (for inhibition). For instance, the statement regulatory_potential(gi, 1) (respectively, regulatory_potential(gi, –1)) in the prior knowledge captures the fact that the effect of the regulator gene gi on any other gene can only be of type activation (respectively, inhibition). When the prior knowledge includes no information about the regulatory potential of a gene (because none is available), that gene can be assumed to have either a positive or a negative effect on any other gene. Again, our abductive inference process takes both possibilities into account when reasoning about the effects of gene interactions and, as explained later in section 1.2.2.3, integrity constraints guarantee that such assumptions are made in a consistent manner. We denote by RPprior the following set of ground facts:

    [1.2]

    As mentioned above, signed-directed regulations between genes are the unknown abducibles. It is possible, however, that for some pairs of genes, say gi and gj in 𝒢, specific information exists about their signed-directed regulation. Any such knowledge is expressed as atoms of the form established_regulation(gi, gj, s), where gi and gj are different genes in 𝒢 and s is again the type of regulation. For instance, a ground atom of the form established_regulation(gi, gj, 1) states that gj is a known activator of gi, whereas a ground atom of the form established_regulation(gi, gj, –1) denotes that gj is a known inhibitor of gi. Again, our integrity constraints guarantee that abduced signed-directed regulations between genes are consistent with any already known type of regulation. We denote by ERprior the following set of ground facts:

    [1.3]
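The consistency requirements described for RPprior and ERprior can be pictured as narrowing the set of admissible signs of a regulation. The sketch below is a deliberate simplification rather than a transcription of the chapter's integrity constraints: it ignores the compatible/competitive distinction, represents the prior facts as Python dictionaries, and uses hypothetical gene names.

```python
def admissible_signs(regulator, target, regulatory_potential, established_regulation):
    """Signs (1 = activation, -1 = inhibition) a regulation of target by regulator may take.

    regulatory_potential: {gene: s}, mirroring regulatory_potential(gene, s).
    established_regulation: {(target, regulator): s}, mirroring
    established_regulation(gi, gj, s), where gj is the known regulator of gi."""
    signs = {1, -1}                                   # unknown genes may act either way
    if regulator in regulatory_potential:
        signs &= {regulatory_potential[regulator]}
    if (target, regulator) in established_regulation:
        signs &= {established_regulation[(target, regulator)]}
    return signs                                      # empty set: no consistent sign


rp = {"g1": 1}                    # g1 is a known activator
er = {("g3", "g2"): -1}           # g2 is a known inhibitor of g3
print(admissible_signs("g1", "g2", rp, er))  # {1}
print(admissible_signs("g2", "g3", rp, er))  # {-1}
print(admissible_signs("g4", "g2", rp, er))  # both signs remain possible
```

An empty result signals contradictory prior knowledge, in which case no abduced regulation between the two genes could satisfy the constraints.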

    Finally, information about experimental data is also part of the prior knowledge. This includes the expression values of the genes measured in an experiment², represented using ground facts of the form exp_data(gi, s), where gi is a gene and s is the state of the gene, which is equal to 1 (respectively, –1) to denote that the expression value of gi has increased (respectively, decreased). Specific information about genes that have potentially been overpowered during the biological regulation process is also computed from the experimental data and added to the prior knowledge as ground facts of the form overpowered(g, gi, gj), where g, gi and gj are different genes. This fact captures the biological notion that the effect of gene gi on g has overpowered the effect of gene gj on g. For this to occur, the degree of interdependency between the expression values of gi and g, multiplied by the degree by which the expression value of gene gi has increased, must be higher than the interdependency between the expression values of gj and g, multiplied by the degree by which the expression value of gene gj has decreased. This function is computed using statistical packages provided by the R/Bioconductor project [GEN 04]. Last, but not least, the experimental data also identify a subset of genes, within the large pool 𝒢, that are considered to be seed genes. This information is represented using ground facts of the form seed(gi), which state that gene gi is a seed gene.

    [1.4]
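The two data-derived inputs can be sketched as follows: exp_data signs follow the direction of each gene's measured change, and the overpowered test compares the two products described above. The chapter computes the interdependency degrees with R/Bioconductor packages; here they are simply passed in as a dictionary, and the numbers, gene names and helper names are hypothetical.

```python
def exp_data_facts(changes):
    """exp_data(g, 1) for increased and exp_data(g, -1) for decreased genes.

    changes maps each gene to its signed expression change; unchanged genes are skipped."""
    return {gene: (1 if delta > 0 else -1) for gene, delta in changes.items() if delta != 0}


def overpowered(g, gi, gj, dependency, changes):
    """True when the effect of gi on g overpowers that of gj on g.

    Follows the verbal criterion: gi increased, gj decreased, and
    dependency(gi, g) * increase(gi) > dependency(gj, g) * decrease(gj)."""
    if changes[gi] <= 0 or changes[gj] >= 0:
        return False
    return dependency[(gi, g)] * changes[gi] > dependency[(gj, g)] * -changes[gj]


changes = {"g": 0.7, "g1": 2.0, "g2": -1.5, "g3": 0.0}
dependency = {("g1", "g"): 0.8, ("g2", "g"): 0.5}
print(exp_data_facts(changes))                            # {'g': 1, 'g1': 1, 'g2': -1}
print(overpowered("g", "g1", "g2", dependency, changes))  # True: 1.6 > 0.75
```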

    In summary, the prior-knowledge part of ARNI's background knowledge, denoted by BPrior, is given by the union of specific subsets of the sets [1.1]–[1.4].

    1.2.2.2. Rule-based underlying model

    The core rules of our model seek to connect a set of genes (i.e. the seed genes) that have been affected in a biological experiment to each other, either directly or indirectly, by using the information about PDI and PPI interactions given in the prior knowledge, and to abduce signed directions between linked genes that are consistent with the (biological) integrity constraints explained in section 1.2.2.3. This consists of computing all possible paths that connect seed genes within a given maximum length, using the following rules:

    [1.5]

    [1.6]

    [1.7]

    [1.8]

    Rule [1.5] constructs a path, within the maximum length bound (MaxLength), that links two seed genes (i.e. G1 and G2). The path is computed recursively, checking that no gene is visited more than once (i.e. rule [1.7]) and that only relevant genes, according to the existing prior knowledge of interactive potentials between genes, are added to the path (i.e. rule [1.8]). The latter condition is captured by the abducible predicate relevant_ip(G1, G2) together with the following integrity constraint:

    [1.9]

    The abducibles relevant_ip(gi, gj) identify all the genes from a given pool that, according to prior biological knowledge, are biologically relevant to regulations that can directly or indirectly affect the given seed genes. These abducibles constrain the space of our regulatory network in a biologically meaningful way, making the computation more manageable. Assumptions relevant_ip(G1, G2) may also be abduced in order to satisfy other constraints, discussed later; to guarantee that the genes involved remain connected to other genes, the following constraint is enforced:

    [1.10]
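    As a rough procedural analogue of rules [1.5]–[1.8], the Python sketch below enumerates the simple paths (no gene visited twice) of at most max_length links between two seed genes, extending a path only along pairs marked as relevant. It is an illustration under assumptions, not the ARNI rules themselves: the adjacency map relevant is a hypothetical, fixed stand-in for the relevant_ip(G1, G2) facts, which ARNI abduces rather than takes as input, and we assume that MaxLength counts links.

```python
def seed_paths(start, goal, relevant, max_length):
    """Enumerate simple paths from seed gene `start` to seed gene `goal`.

    Hypothetical sketch. relevant: dict mapping a gene to the genes it has a
    relevant interactive potential with (stand-in for relevant_ip facts).
    max_length: maximum number of links in a path (assumed meaning).
    """
    paths = []

    def extend(path):
        current = path[-1]
        if current == goal:
            paths.append(list(path))
            return
        if len(path) - 1 == max_length:  # length bound reached
            return
        for nxt in relevant.get(current, ()):
            if nxt not in path:  # no gene is visited twice (cf. rule [1.7])
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return paths
```

For example, with relevant = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g2", "g4"]} and max_length = 2, the call seed_paths("g1", "g4", relevant, 2) finds the two paths g1, g2, g4 and g1, g3, g4, while the three-link path through g3 and g2 is cut off by the bound.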

    Paths generated by the above clauses are sequences of genes connected to each other by the abduced directed links relevant_ip(gi, gj)³. To generate a regulatory network, however, the directed links have to be signed. The sign of each abduced directed link is inferred by means of the following integrity constraints:

    [1.11]

    [1.12]

    [1.13]

    [1.14]

    [1.15]

    where the predicates compatible(G1, G2, S) and competitive(G1, G2, S) are also abducibles and together fully capture the notion of a signed-directed link between two genes. Note that the above constraints [1.11]–[1.15], together with the sign-consistency constraints given later, in effect define the notion of relevant interactive potential between two genes in terms of either compatible or competitive influence.

    In addition to constraints [1.9]–[1.15], abduced signed-directed links have to be consistent with existing knowledge, since for some pairs of genes the signed-directed link might already be known. In this case, the prior knowledge includes ground instances of the predicate established_regulation, and any abduced compatible fact has to be consistent with them. This is captured by constraint [1.16]. Similarly, the type of compatible or competitive influence abduced for a gene on another gene has to be consistent with the type of regulatory potential that the regulating gene is known to have (if any). This is expressed by constraints [1.17]–[1.18]. Constraint [1.19], instead, guarantees that competitive regulations are limited to links with an already known regulatory effect, which further limits the solution space for this abducible. Biologists could remove this constraint whenever they intend to pursue a more exploratory analysis:

    [1.16]

    [1.17]

    [1.18]

    [1.19]
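    The checks behind constraints [1.16]–[1.19] can be illustrated procedurally. The Python sketch below tests a single abduced link against prior knowledge under a hypothetical encoding of our own: links are (kind, target, regulator, sign) tuples, with the regulator in the second gene position as in compatible(G1, G2, S); established maps a (target, regulator) pair to the sign of an established_regulation fact; and potential maps a (target, regulator) pair to a known regulatory-potential sign. We read "an already known regulatory effect" in [1.19] as an entry in potential; the book's exact constraints may differ.

```python
def violates_prior(link, established, potential):
    """Return True if one abduced link contradicts prior knowledge.

    Hypothetical encoding, not the book's constraints:
    link = (kind, target, regulator, sign), kind in {"compatible", "competitive"}.
    established / potential: dicts (target, regulator) -> known sign (+1 / -1).
    """
    kind, target, regulator, sign = link
    key = (target, regulator)
    # cf. [1.16]: a compatible link must agree with an established regulation.
    if kind == "compatible" and key in established and established[key] != sign:
        return True
    # cf. [1.17]-[1.18]: the sign must match the known regulatory potential.
    if key in potential and potential[key] != sign:
        return True
    # cf. [1.19]: competitive links only where the regulatory effect is known.
    if kind == "competitive" and key not in potential:
        return True
    return False
```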

    In summary, the rule-based underlying model Π of our ARNI approach is given by clauses [1.5]–[1.19]. The constraints among clauses [1.5]–[1.19] are part of the IC component of our abductive problem; of these, constraints [1.9] and [1.16]–[1.19] guarantee the compatibility of the abduced signed-directed links with the existing knowledge.

    1.2.2.3. Integrity constraints

    As mentioned at the beginning of section 1.2.2.1, our abductive problem is to identify unknown compatible and competitive gene regulations (i.e. signed-directed links) that form a regulatory network consistent with the observed data. The main abducibles in our ARNI approach are therefore ground facts compatible(gi, gj, s) and competitive(gi, gj, s), whose first two arguments are genes in the pool and whose third argument s, a binary variable over the set {1, –1}, denotes the causal effect of the interaction between the two genes gi and gj. For example, an instance of the form compatible(g1, g2, 1) (respectively, compatible(g1, g2, –1)) means that gene g2 activates (respectively, inhibits) gene g1. Abduced signed-directed gene regulations have to be consistent with the four different classes of constraints described in section 1.2.2.2. Constraints of the first class (i.e. compatibility with existing knowledge) are the above constraints [1.9] and [1.16]–[1.19]. We now present the integrity constraints of classes (b)–(d) and explain their biological relevance.

    Activation and inhibition regulations between genes are formalized by instances of the abducibles compatible(G1, G2, 1) and compatible(G1, G2, –1). So why do we also need to infer competitive influences (i.e. competitive(Gi, Gj, S))? The biological motivation for modeling competitive gene influences is to reflect the underlying structure of real biological networks, in which crosstalk between signaling pathways, regulatory feedback mechanisms and redundancy are common. The incoherent feed forward loop (FFL) and negative feedback loop motifs, discussed in section 1.1, inherently consist of competitive gene influences. Any inference method aiming to detect such motifs needs either to rely on multiple experiments that expose each of the influences individually, or to model competitive gene influences explicitly, as done in our approach. The latter has the added advantage that network motifs can be detected using less experimental data, and that competitive gene influences can be placed within the same network as their compatible counterparts. Including these regulations in the final solutions is also important for applying the inferred networks to the planning of future experiments and to network-based drug discovery/repositioning. Following an experimental perturbation, competitive gene influences could compensate for the intended experimental response and thus render the experiment non-informative. Similarly, competitive gene influences that are enhanced in the presence of a drug might lead to unforeseen side effects. Overlooking competitive gene influences can therefore result in inconsistencies between observed and predicted drug effects or experimental outcomes, hindering the process of knowledge discovery.

    The inference of compatible(G1, G2, S) and competitive(G1, G2, S) has to comply not only with existing knowledge, but also with the experimental data, the biological principles of sign consistency and the internal logical consistency of the model. These principles are expressed in our ARNI approach as domain-specific integrity constraints, and this is where the approach benefits from its abductive, logic-based inference process: depending on the type of biological experiment and investigation at hand, classes of constraints can be added or deleted without affecting the formal framework, for example in order to compute specific types of regulatory networks (e.g. networks with specific regulatory motifs such as AND gates, OR gates, etc.).

    One of the key biological principles is sign consistency. Sign consistency states that inferred gene interactions must satisfy two main gene dependency rules: compatible gene influence and competitive gene influence. Compatible gene influence postulates that the state of a target gene G1 is directly related to the state of an activator G2 and inversely related to the state of an inhibitor G2. To specify these principles we use an additional predicate, called state, which takes two arguments: a gene and a state value. The state value of a gene is 1 if the gene's expression has increased and –1 if it has decreased; for instance, the ground literal state(g1, 1) means that the expression value of gene g1 has increased during the experiment. Since not all states of relevant genes are measurable in an experiment, information about the state of each gene in our pool is only partially present in our background knowledge. To guarantee full consistency of our regulatory network, the state predicate is therefore treated as an additional abducible. Integrity constraints for sign consistency include:

    [1.20]

    [1.21]

    [1.22]

    [1.23]
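    Under our reading of the principle above, with states and signs both in {1, –1}, compatible influence means that the target's state equals the link sign times the regulator's state. The Python sketch below encodes this check for a single link. Treating a competitive link as one for which the opposite relation holds is our assumption, based on the text's description of competitive regulators as bypassing sign consistency; the function name sign_consistent is hypothetical, and the sketch is not a transcription of constraints [1.20]–[1.23].

```python
def sign_consistent(kind, target_state, regulator_state, sign):
    """Sign check for one signed-directed link (hypothetical sketch).

    States and signs are +1 / -1. For compatible(G1, G2, S), the target G1
    should satisfy state(G1) == S * state(G2): an activator (S = 1) moves
    the target in the same direction, an inhibitor (S = -1) in the opposite
    one. Assumption: a competitive link is one whose effect is NOT reflected
    in the target's state, so the opposite relation holds.
    """
    expected = sign * regulator_state
    if kind == "compatible":
        return target_state == expected
    return target_state == -expected
```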

    Because competitive gene influences are incompatible with the experimental data, the abductive inference of competitive(G1, G2, S) cannot be driven by the data. Allowing competitive regulators to be abduced without any constraints causes a search-space explosion that is prohibitive in practice and hinders the usability of the inferred networks. Therefore, the following constraints and related definitions [1.24]–[1.28] are included in our model to capture two typical cases of competitive regulators that bypass the sign consistency principle:

    [1.24]

    [1.25]

    [1.26]

    [1.27]

    [1.28]

    Integrity constraint [1.24] guarantees that competitive regulators are only inferred if an exception holds. A gene, say g1, can have a state that is inconsistent with the state of its regulator, say g2, provided that there exists at least one other gene, say g3, that is a compatible regulator of g1 and regulates it consistently, hence overpowering the influence of g2. This principle is captured by rule [1.25]. Exceptions of this form are derived from the data by means of an overpowered influence function that determines the truth of the condition overpowered(g, gi, gj). Once pre-calculated from the data (see section 1.2.2.1), this information is added as facts to the prior knowledge to express the biological notion that the effect of gene gi on g has overpowered the effect of gene gj on g.
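    The overpowered exception of rule [1.25] can be sketched as a lookup over the pre-computed facts. In the Python sketch below, which uses a hypothetical encoding of our own, a competitive link from a regulator to a target is accepted only if some other gene compatibly regulates the same target and overpowers the regulator. Only this first exception case is covered; the motif-based exceptions [1.26]–[1.28] are not modeled, and the sign consistency of the overpowering link is assumed to be checked elsewhere.

```python
def competitive_allowed(target, regulator, compatible_links, overpowered):
    """Overpowered exception in the spirit of [1.24]-[1.25] (hypothetical).

    compatible_links: set of (target, regulator, sign) triples, mirroring
    compatible(G1, G2, S). overpowered: set of (g, gi, gj) triples meaning
    that the effect of gi on g overpowers the effect of gj on g.
    """
    # Accept the competitive link if another compatible regulator g3 of the
    # same target overpowers the competitive regulator.
    return any(
        t == target and g3 != regulator and (target, g3, regulator) in overpowered
        for (t, g3, _sign) in compatible_links
    )
```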

    Because of the way the overpowered facts are computed, there is the additional implicit constraint that genes participating in competitive regulations must have been observed as either up-regulated or down-regulated. Given the sparsity of microarray data, where the signal is fragmented by noise, and the abstraction of all biological regulation to gene regulation, such situations are not very common. In the absence of additional prior information (e.g. kinetic information, promoter affinities) that could indicate the relative impact of competitive influences, our model includes an additional exception case based on the biological principle of how competitive regulators can participate in specific network motifs. These are captured by rules [1.26]–[1.28], which correspond, respectively, to the three scenarios in Figure 1.4, where the dashed links represent the competitive influence link involved in the overpowered exception.

    Figure 1.4. Network motifs of competitive influence

    The sign values of the three signed-directed links involved in these motifs have to satisfy one of the predefined incoherent feed forward loop cases, expressed by the fact iff(S1, S2, S3) and illustrated graphically in Figure 1.5. Note that the three motif examples given in Figure 1.4 all have the same configuration iff(1, 1, –1). Similar groups of three motifs, one group for each of the four possible configurations of incoherent feed forward loops, can occur in regulatory networks.

    Figure 1.5. Configurations of incoherent feed forward loops (iff(S1, S2, S3))
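    The four incoherent configurations can be recovered with a short calculation. In a feed forward loop, one gene regulates a target both directly and through an intermediate gene; the loop is incoherent when the sign of the indirect path (the product of its two link signs) differs from the sign of the direct link, that is, when S1 · S2 · S3 = –1. Because this product test is symmetric, the resulting set does not depend on which of S1, S2, S3 denotes the direct link, an ordering we do not assume here. The Python sketch below (function name ours) enumerates the sign triples that pass the test.

```python
from itertools import product


def incoherent_ffl_configs():
    """Sign triples (S1, S2, S3) of incoherent feed forward loops.

    A loop is incoherent when the indirect path's sign (product of its two
    link signs) differs from the direct link's sign, i.e. when the product
    of all three signs is -1, whichever position holds the direct link.
    """
    return [s for s in product((1, -1), repeat=3) if s[0] * s[1] * s[2] == -1]
```

Calling incoherent_ffl_configs() returns [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)], which includes the configuration iff(1, 1, –1) shared by the motifs of Figure 1.4.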

    During the inference process, many compatible and competitive abducibles can be generated. It is important to guarantee that a gene is not assumed to be both a compatible and a competitive regulator of the same gene; this is captured by integrity constraint [1.29]. Similarly, a compatible (respectively, competitive) regulator cannot be both an activator and an inhibitor of another gene. Constraints [1.30]–[1.31] ensure that this principle is satisfied during the inference of signed-directed links between genes, whereas constraint [1.32] enforces that a gene has a unique state value (i.e. it either decreases or increases its expression value during a single experiment).

    [1.29]

    [1.30]

    [1.31]

    [1.32]
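    The internal-consistency principles behind [1.29]–[1.32] amount to uniqueness checks over the abduced facts. The Python sketch below reports violations for a candidate set of links and states; the tuple encoding and the function name internal_violations are our own, and the sketch illustrates the principles rather than rendering the actual constraints.

```python
from collections import defaultdict


def internal_violations(links, states):
    """List violated internal-consistency rules (hypothetical sketch).

    links: iterable of (kind, target, regulator, sign) tuples with kind in
    {"compatible", "competitive"}; states: iterable of (gene, value) pairs.
    """
    kinds = defaultdict(set)  # (target, regulator) -> kinds abduced for it
    signs = defaultdict(set)  # (kind, target, regulator) -> signs abduced
    for kind, target, regulator, sign in links:
        kinds[(target, regulator)].add(kind)
        signs[(kind, target, regulator)].add(sign)
    violations = []
    for pair, ks in kinds.items():
        if len(ks) > 1:  # cf. [1.29]: both compatible and competitive
            violations.append(("mixed_kind", pair))
    for key, ss in signs.items():
        if len(ss) > 1:  # cf. [1.30]-[1.31]: both activator and inhibitor
            violations.append(("both_signs", key))
    values = defaultdict(set)
    for gene, value in states:
        values[gene].add(value)
    for gene, vs in values.items():
        if len(vs) > 1:  # cf. [1.32]: a gene has a single state
            violations.append(("two_states", gene))
    return violations
```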

    The state of a gene is an abducible in our model because, for a given experiment, data about the expression value of every gene are not guaranteed to be available (i.e. the background knowledge may include only a subset of the set [1.4]). For genes that do have an expression value, the abduced state needs to be consistent with the available experimental data; this is captured by the following constraint [1.33]. For the remaining genes in our identified pool, called hidden genes, either of the two states can be abduced, provided that the overall set of IC is satisfied.

    [1.33]
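    The interplay between measured and hidden genes can be illustrated by enumerating candidate state assignments. In the Python sketch below, a hypothetical helper of our own, measured genes keep their observed state (the role played by [1.33]) and hidden genes range over both states, leaving the choice to the remaining constraints. ARNI makes this choice through abduction; the exhaustive enumeration here grows as 2^h for h hidden genes and is only meant for small examples.

```python
from itertools import product


def candidate_states(genes, exp_data):
    """Yield every state assignment allowed by the data (cf. [1.33]).

    Hypothetical sketch. exp_data: dict gene -> observed state (+1 / -1).
    Measured genes keep their observed state; hidden genes may take either
    state, so other constraints must decide between them.
    """
    hidden = [g for g in genes if g not in exp_data]
    for values in product((1, -1), repeat=len(hidden)):
        assignment = dict(exp_data)
        assignment.update(zip(hidden, values))
        yield assignment
```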

    In summary, the full set of integrity constraints included in our ARNI abductive approach, denoted IC, is given by constraints [1.9]–[1.33].

    1.2.2.4. Inferring signed-directed networks and explanatory reasoning

    As mentioned in section 1.1, in our ARNI approach we can employ abductive reasoning both for inferring a signed-directed regulatory network from experimental data and for explanatory scientific reasoning about signal propagation over the generated network, in order to help biologists plan the next sets of experiments or improve their understanding of the phenomena at hand. The first abductive reasoning task makes use of the full logical model described in this section. Specifically, it uses as background knowledge the model Π and the knowledge BPrior, which includes a set of experimental data. The set of abducibles is the collection of all ground instances of the abducible predicates compatible, competitive and state, together with all ground instances of the auxiliary abducible relevant_ip. All these abducible notions are necessary because of the limited available knowledge (i.e. biological information already existing in online databases and the given experimental data) and the desire to generate realistic signed-directed regulatory networks with complex structures (e.g. including feedback loops, competitive regulations, etc.). The set of integrity constraints IC includes all the constraints described in this section. Hence, the question that we are interested in answering in this first type of abductive reasoning task is: what is a realistic signed-directed regulatory network that has generated the given set of experimental data? An answer to this question is the abductive inference of a maximal set of signed-directed links between genes with relevant interactive potential that are consistent with the given integrity constraints and with the genes’ expression levels described by the experimental data. The collection of all compatible and competitive predicates abduced in this answer formally describes such a signed-directed regulatory network. This abductive reasoning task can be formally defined as follows:

    DEFINITION 1.2.– Abductive inference of regulatory networks. Let the background knowledge be B = BPrior ∪ Π, let IC be the set of integrity constraints [1.9]–[1.33], and let 𝒜 be the set of all possible ground instances of the abducible predicates compatible, competitive, state and relevant_ip. A signed-directed regulatory network inference is the abductive task 〈B, IC, 𝒜〉 with abductive solution a set Δ ⊆ 𝒜 such that:

    for any δ ∈ 𝒜 \ Δ
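    The maximality requirement can be illustrated by brute force on a tiny example. The Python sketch below assumes that the condition stated for each δ outside Δ is the usual maximality condition, namely that adding δ to an otherwise acceptable Δ would violate IC; the predicate consistent is a hypothetical stand-in for checking B together with Δ against IC. It returns every consistent subset of the candidate abducibles that cannot be extended, and it is exponential in the number of abducibles.

```python
from itertools import combinations


def maximal_solutions(abducibles, consistent):
    """Brute-force maximal consistent subsets of a small set of abducibles.

    Hypothetical sketch: consistent(delta) stands in for "B together with
    delta satisfies IC". A subset is kept when it is consistent and adding
    any remaining abducible would make it inconsistent.
    """
    items = list(abducibles)
    good = [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r) if consistent(frozenset(c))]
    return [d for d in good
            if all(not consistent(d | {x}) for x in items if x not in d)]


# Example: two opposite signs for the same link cannot coexist.
a = ("compatible", "g1", "g2", 1)
b = ("compatible", "g1", "g2", -1)
c = ("state", "g1", 1)
solutions = maximal_solutions([a, b, c], lambda d: not {a, b} <= d)
# solutions == [frozenset({a, c}), frozenset({b, c})]
```

Because the two signs of the same link exclude each other, the example yields two maximal solutions, in line with the observation that the task may compute more than one maximal network.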

    The ARNI abductive task may compute more than one possible maximal regulatory network. If the prior knowledge of regulatory_potential(gi, gj) is complete for all genes in the pool and the gene expression value of every gene is available in the experimental data, then there would be only a single maximal
