2010 InductiveLogicProgrammingAsAbdu

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Abductive Logic Programming Task; Top-Directed Abductive Learning Task

Notes

Cited By

Quotes

Author Keywords

Abstract

We present a novel approach to non-monotonic ILP and its implementation called TAL (Top-directed Abductive Learning). TAL overcomes some of the completeness problems of ILP systems based on Inverse Entailment and is the first top-down ILP system that allows background theories and hypotheses to be normal logic programs. The approach relies on mapping an ILP problem into an equivalent ALP one. This enables the use of established ALP proof procedures and the specification of richer language bias with integrity constraints. The mapping provides a principled search space for an ILP problem, over which an abductive search is used to compute inductive solutions.

Introduction

Inductive Logic Programming (ILP) [Lav94] is a machine learning technique concerned With the induction of logic theories from positive and negative examples and has been successfully applied to a Wide range of problems [D50]. Its main virtue, the highly expressive representation language, is also the cause of its high computational complexity. Some ILP systems attempt to efficiently find a less then perfect hypothesis by using heuristics to navigate the search space effectively [Qui96], [Ri095]. Others focus on completeness and aim for perfect accuracy With respect to the examples, searching the space thoroughly for an optimal solution. Among these XHAIL [Ray09a] has identified Abductive Logic Programming (ALP) [Kak92] as a means to deal With incomplete theories and provide semantics for negation as failure (NAF) [C1a77]. XHAIL, like other inverse entailmemfi (IE) based systems, abductively derives a lower bound for the search space that is then generalised. In contrast, Top-down ILP systems like [MugOES, Bra99, B0s94] construct the hypothesis by specialising an overly general theory Without a lower bound. However existing top-down systems limit the expressiveness of the language and the possible outcome of the learning (egg. concepts learned must be observed in the training data, recursion is not allowed and the use of negation is limited).

Abductive proof procedures have been extensively employed as part of ILP systems (egg. [Esp00]) or extended for inductive reasoning (egg. [Adé95]). In contrast to these existing approaches, we propose a novel mechanism that maps an [LP problem into an equivalent ALP instance. An ILP task is thus translated into an ALP problem Whose solution is translated back into a solution of the original problem. The resulting top-down ILP system, called TAL (Top—directed Abductive Learning), offers several advantages over existing techniques. TAL is able to handle negation Within the learning process and is able to learn non—monotonic hypotheses, relying on the semantics of the underlying abductive proof procedure employed; allows expressive language bias specifications that subsume mode declarations and can be combined With integrity constraints; performs non—observational [M0y03] and multiple predicate learning [Ma198]; and makes use of constraint solving techniques. Non—monotonic ILP has been successfully applied to bioinformatics [R.ay08] and requirement engineering [Alr09] and as showed in [C0r09] and [Ray09b] can also be employed to perform theory revision.

In the particular case of definite theories, the TAL search space includes hypotheses that are not found by established Inverse Entailment based systems like PROGOL [Mug95] 0r ALECTO [M0y03] and provides a more effective solution to learning interdependent concepts compared to the state of art ILP systems, e.g. [Kin109, Ray09a]. Though not explored in depth here, the principled search space characterised by ALP includes abductive solutions that represent partial inductive solutions, Which can be measured in terms of some scoring function (e.g. accuracy on the given set of examples) thereby enabling the use of heuristic based search strategies.

The paper is organised as follows. First, we introduce the notation and relevant background concepts. We then describe the representation underlying the learning system and discuss the learning mechanism. Then, we present through examples some of the main features of the system7 and discuss related work. We conclude With final remarks and directions for future work.

1. Abductive and Inductive Logic Programming

ALP and ILP are extensions of logic programming. They both search for a hypothesis that is able to account for some given evidence. ALP constructs hypotheses in the form of ground facts. ILP systems generate rules that are able to discriminate between positive and negative examples that represent the training data. In general, ILP is regarded as a machine learning technique and used When a certain knowledge base must be enriched With rules that are also able to Classify new examples. We assume in the following that the reader is familiar With first—order logic and logic programming [L1087]. Following Prolog [Sha94] conventions, predicates, terms and functions are represented With an initial lower case letter and variables are represented With an initial capital letter.

Definition 1.1. An ALP task is defined as [math]\displaystyle{ \langle g, T, A, I \rangle }[/math] where [math]\displaystyle{ T }[/math] is a normal logic program, [math]\displaystyle{ A }[/math] is a set of abducible facts, [math]\displaystyle{ I }[/math] is a set of integrity constraints and [math]\displaystyle{ g }[/math] is a ground goal. [math]\displaystyle{ \Delta \in ALP \langle g, T, A, I \rangle }[/math] is an abductive solution for the ALP task [math]\displaystyle{ \langle g, T, A, I\rangle }[/math], if [math]\displaystyle{ \Delta \subseteq A, T \cup \Delta }[/math] is consistent, [math]\displaystyle{ T \cup \Delta \models g }[/math] and [math]\displaystyle{ T \cup \Delta \models I }[/math]. [math]\displaystyle{ ALP \langle g, T, A, I \rangle }[/math] denotes the set of all abductive solutions for the given ALP task.

Note that the abductive task, as defined, deals With ground goals, thus being a specific case of the setting in [Kak92]. The notion of entailment is not fixed since, as discussed later, the approach proposed in this paper is not committed to a particular semantics.

Definition 1.2. An ILP task is defined as (E, B, S) Where E is a set of ground positive or negative literals, called examples, B is a background theory and S is a set of Clauses called language bias. The theory H E ILP(E,B,S) called hypothesis is an inductive solution for the task (E,B,S) if H g S H is consistent with B and B U H |= E. ILP(E,B,S) denotes the set of all inductive solutions for the given task.

We consider the case where B and H are normal logic programs and E is a set of ground literals (with positive and negative ground literals representing positive and negative examples, respectively).

The space of possible solutions is inherently large for all meaningful applications so different levels of constraints are imposed to restrict the search for hypotheses. When possible, besides the background knowledge about the modelled world, some a priori knowledge on the structure of the hypothesis can be employed to impose an instance—specific language bias S. Mode declarations are a common tool to specify a language bias.

Definition 1.3. A mode declaration is either a head or body declaration, respectively m0deh(s) and m0deb(s) where s is called a schema. A schema s is a ground literal containing placemarkers. A placemarker is either 7+type7 (input)7 77type7 (output)7 7#type7 (ground) where type is a constant.

Given a schema s, 5* is the literal obtained from s by replacing all placemarkers with different variables X1, ..., Xn; type(s*) denotes the conjunction of literals 151(X1), ..., tn(Xn) such that 15.- is the type of the placemarker replaced by the variable Xi; ground(s*) is the list of the variables that replace the ground placemarkers in 5, listed in order of appearance left to right. Similarly inputs(s*) and outputs(s*) are, respectively, the lists of the variables replacing input and output placemarkers in s. For example, for mode declaration m3 in Sec. 3, s = even(+mnf), 5* = even(X), type(s*) = muf(X), outputs(s*) = ground(s*) = H, inputs(s*) = [X].

A rule 7‘ is compatible with a set M of mode declarations iff (a) there is a mapping from each head/body literal l in 7‘ to a corresponding head/body declaration m E M with schema s and l is subsumed by 5*; (b) each output placemarker is bound to an output variable; (C) each input placemarker is bound to an output variable appearing in the body before or to a variable in the head; (d) each ground placemarker is bound to a ground term; (e) all variables and terms are of the corresponding type. The use of head variables in output mode declarations is not discussed here for space constraints.

In the next sections the language bias S is specified in terms of a set M of mode declarations, and denoted as S(M). Each mode declaration m E M is uniquely identified by a label idm, called mode declaration identifier.

2. TAL

An ILP problem can be seen as a search for a hypothesis where we Choose how many rules to use and for each rule which predicates to use in the head and in each of the body conditions and how many body conditions are used. Additionally, different Choices are possible on how arguments are unified or grounded. In this section, we present the mapping of a set M of mode declarations into a top theory T that constrains the search by imposing a generality upper bound on the inductive solution. An abductive proof procedure is instantiated on this top theory together with the background theory. The abductive derivation identifies the heads of the rules (of a hypothesis solution) and the conditions needed to cover positive examples and exclude negative examples, ensuring consistency. The abductive solution is guaranteed to have a corresponding inductive hypothesis H that is a solution with respect to the examples.

2.1. Hypothesis Representation

We use a list—based representation to encode an inductive solution, as a set of rules, into a set of facts by mapping each literal in the Clauses into instances of its corresponding mode declaration. This allows the representation of rules as abducible facts. An inductive hypothesis H g S(M) is composed of a set of rules {71, ..., 7‘n}. Each rule 7‘ is associated with an output list, i.e. an ordered list of the variables in 7‘ that replace output placemarkers in body literals or input placemarkers in the head. The output identifier associated with each of these variables is the position of the variable in the output list. Given a set M of mode declarations, each rule 7‘ of the form [1 e [27 ..., ln7 compatible with M , can be represented as an ordered list l = [11,12,...,1n] where each 11 is a tuple (Mm, [01,...,cp],[01,...,0q]); Mm is the identifier of the mode declaration in M that Z. maps to; each cj is a ground term replacing a ground placemarker in the mode declaration identified by Mm; each 0k is an output identifier and encodes the fact that the km input variable in Z. (in order of appearance left to right) unifies with the variable indicated by 0k. We refer to this transformation of a rule 7‘ into a list l as the function Z = tM(7‘).

Example 2.1. Given the following three mode declarations M = {m1 : m0deh(p(+any))7 m2 : m0deb(q(+any7 #any», m3 : m0deb(q(+any7 iany))} the rule 7‘ = p(X) e q(X,Y),q(Y7 a)7 compatible with M7 is associated to the output list [X,Y] where X replaces the input placemarker in m1 and Y replaces the output placemarker in m3. The output identifier of X is the integer 1 and the output identifier of Y is 2. 7‘ is represented as the list l = [(ml7 H, [])7 (m37 H, [1])7 (m27 [a]7 [2])] = tM(7‘). The first element of the list associates the head p(X) to mode declaration ml. The second element associates the condition g(X7 Y) to mode declaration m3 and links the first and only input variable to X. The third element of the list associates the condition q(Y7 a) to mode declaration m27 links the input variable to Y and instantiates the second argument to a.

It is easy to see that every possible rule within S(M) can be encoded according to this representation. Also7 it is always possible to derive a unique rule 7‘ from a well-formed list Z. We refer to this transformation as 7‘ = tM_1(l).

Rules in a final inductive hypothesis theory H , are associated with a unique rule identifier Tub an integer from 1 to the maximum number7 MNR7 of rules allowed in H. The abductive representation A = TM(H) of an hypothesis theory H , is the set of facts 7‘ule(7‘Z-d7 l) 6 A7 one for each rule 7‘ 6 H7 such that (a) 73d is the rule identifier of 7‘; and (b) l = tM(7‘). The inverse transformation H = TM_1(A) is similarly defined.

2.2. Mapping mode declarations into an abductive top theory

The first computational step in TAL is the generation of a top theory T from a set M of mode declarations, as defined below, where prule and rule are abducible predicates.

Definition 2.2. Given a set M of mode declarations7 T = f (M ) is constructed as follows:

  • For each head declaration m0deh(s)7 with unique identifier Mm, the following Clause

is in T 5* e type(s*),pruleU-Hd7 [7617,” ground(s*)7 HD, 7‘ule_7’d(RId)7 bodyU-Hd7 inputs(s*)7 [7617,” ground(s*)7 ill)

  • The following Clause is in T

body(RId7 _, Rule) P Tule(RId,Rule) (2.2) (2.1)

  • For each body declaration m0deb(5)7 with identifier idm the following Clause is in T

b0dy(RId. Inputs. Rule) P append(Rule. [(7d7n. ground(s*). L7nk5)] . NRule). prule(RId. NRule). linkJMLT’iCLblES (inputs (5* ). Inputs. Links). 5*. (2.3) type(s*). appendflnputs. outputs(s*). Outputs). b0dy(RId. Outputs. NRule)

As previously defined, 5* contains newly defined variables instead of placemarkers in the schema 5. Since the abductive derivation is instantiated on B U T7 for all predicates appearing in head mode declarations the procedure can both use the partial definition in B or learn a new Clause7 unifying the current goal with 5*. 5* can also be a negative condition corresponding to a body mode declaration whose schema is a negative literal. 7‘ule_7'd(RId) is true whenever I g RId g MNR.

b0dy(7‘,7',c) keeps track of the rules that are built and provides the Choice of extending a partial rule (rule (2.3)) or delivering a rule as final (rule (2.2)). The first argument is the rule identifier; the second is the list of the outputs collected from the literals already added to the rule; the third is the list representing a partial rule. link_va7‘7'able5([a1,...,am],[b1,...,bn],[01,...,0m]) succeeds if for each element in the first list ai, there exist an element in the second list bj such that ai uni— fies with bj and 02' = j. append(l1,l2,l3) has the standard Prolog definition [Sha94]. The abducible prule is used to control the search through integrity constraints. For example7 if we are not interested in searching for rules in which two mode declarations 7' and j appear together in the body7 then an integrity constraint of the type A p7‘ule(_,R1),membe7‘((7’,_,_),R1),membe7‘((j,_,_),R1) can be added to I to prune such solutions in the abductive search.

In order to maintain the well-formedness of our rule7s encoding and avoid trivial states of the search7 a set of fixed integrity constraints I f (omitted for brevity) is used in the abductive search.

2.3. Learning

Definition 2.3. Given an ILP task (E7 B7 M)7 H = TM_1(A) is an inductive solution derived by TAL ifi A E ALP(g7 BUT7 I7 A) where T = f(M)7 g is the negated conjunction of the literals in E7 I is a set of integrity constraints that includes If and A = {7‘ule/27 p7‘ule/2}

Procedurally, an initial translation produces the ALP task introducing the new the— ory T. An abductive proof procedure derives the abductive hypothesis A that is then transformed into the final inductive solution.

Theorem 2.4. Let us consider a theory B, a 5613M of mode declarations, a conjunction of (possibly negated) literals E, and a setH ofmles, such thatH g 5(M). Then BUH l-SLDNF E ZfiB U T U A lTSLDNF E, where T = f(M) and A = TM(H)

As corollaries of Theorem (2.4)7 it is possible to establish soundness and completeness of our TAL system based on the properties of the underlying abductive system and on the soundness and completeness of SLDNF w.r.t. the semantics.

3. Example

The following is a modified version of the well—known example proposed in [Yam97], where even and odd numbers are both target concepts and learnt from three examples of odd numbers. The even predicate is partially defined, i.e. base case even(0).

Figure 1: Top theory T for the even-odd example.

We assume the set I’ of integrity constraints to restrict the language bias, which establishes that rules whose head is even(X) or 0dd(X) cannot have in the body even(X) or odd(X) literals. The final I is the union of I ’ and I f. The set M of mode declarations is transformed into the top theory T given in Figure 1. The instantiated abductive task [math]\displaystyle{ \langle E, B \cup T, I, \{rule/2,prule/2\} \rangle }[/math] accepts then as a possible solution the set A translated into the inductive hypothesis H as follows[1]:

In the abductive search, the standard Prolog selection rule is adopted that selects clauses in order of appearance in the program. Since no head of clause in B unifies with the positive examples, the derivation uses one of the rules defined in T. The selection of the body literal from the rule results in four derivation branches in the search tree, one for each of the four “body” clauses whose head unifies with it. A partial abductive hypothesis is generated, equivalent to the rule 0dd(X) e X = 5(Y), even(Y). At this point, the condition even(5(5(5(5(0))))), part of the current goal, is not entailed by B so one of the rules in T is used. It can be seen as an “artificial” example conditional to the partial hypothesis. The derivation results in the creation of an additional rule in the final hypothesis that defines the predicate even. The computation continues, thus excluding inconsistent hypotheses and those that entail also negative examples resulting in the final A. Partial rules are derived and used throughout the search so they can be referenced to define concepts that depend on them. It is also interesting to observe that the search is guided by the examples and thus only significant solutions are explored. The inductive solution H for this inductive problem is either not found by other ILP systems like PROGOL, or derived after a “blind” search as discussed in Sec. 5. The learning is non—observational (i.e. the even predicate is not observed in the examples). TAL is also able to learn the base case of the recursion. If the fact even(0) is deleted from B and the mode declaration m0deh(even(#nat)) is added to M , TAL returns three solutions with the same set of examples: the first has the same definition of odd as in H and defines even(5(5(5(5(0))))) as base case, the second and the third are the same as in H with even(5(5(0))) and even(0) respectively as base cases.

4. A case study

We employ a case study to compare TAL with the only other system capable of solving the same class of ILP problems XHAIL[2]. The following case study, taken from [Ray09a], represents a simple model of metabolic regulation for the bacterium E. coli and includes a formulation of the Event Calculus [Sha99] formalism. The target predicate happen5 is used to characterise the bacterium feeding mechanism based on the availability of sugar. See [Ray09a] for a more extensive explanation of the example.

As discussed in [Ray09a], XHAIL generates Kernel Sets that serve as lower bound for the final hypothesis, through iterations of increasing in size abductive explanations until a satisfactory solution is found. Intuitively, a Kernel Set is computed in two phases. A first abductive phase finds the set A of the head of the rules in the Kernel Set and a second deductive phase constructs the body of the rules by computing all the ground instantiations of the body mode declarations that are implied by B U A. Kernel Sets are generalised in a final inductive phase. Instead, TAL explores candidate solutions in a top-down manner, backtracking whenever the current solution leads to failure in the abductive derivation. The partial hypotheses are already in their final form and are implicitly tested for correctness whenever a new example is selected in the abductive derivation.

5. Discussion and Related Work

We have implemented TAL in YAP Prolog [Cos08] using a customised implementation of the ASYSTEM [KakUI] that integrates the SLDNFA proof procedure with constraint solving techniques. TAL has been tested on various non—monotonic ILP problems like the examples proposed in [Kim09], [Alr09] and [Ray07]. It has also been used to perform Theory Revision [Wr096], i.e. to change, according to some notion of minimality, an existing theory [Cor09]. This work is motivated by the project [Ban08] that seeks to exploit the proposed approach in the context of learning privacy policies from usage logs. We performed preliminary experiments in this direction applying an extended version of TAL to the Reality Mining dataset [Eag06] where examples of refused calls were used to learn general rules. A score based on accuracy and complexity of the rules was employed to prune the search space.

The idea of a top theory as bias for the learning has been initially introduced in TOPLOG [Mug08], which performs deductive reasoning on the background knowledge extended with the top theory. Candidate hypotheses are derived from single positive examples and then the best ones are selected after a hill climbing search. SPECTRE [Bos94] also requires a user—provided overly general theory that is specialised by unfolding clauses until no negative examples are covered. HYPER [Bragg], specialises an overly general theory, deriving rules that are subsumed by those in the theory. Thus the number of rules in the final hypotheses cannot increase. FOIL and related approaches like [Coh94] perform an informed hill climbing search. These systems are not fully non—monotonic since they disallow negation in the background knowledge or in the hypotheses. In contrast to TOPLOG, TAL generates candidate hypotheses also considering negative examples, excluding a priori solutions that entail some of the negative examples. In general, we regard TAL a generalisation of other top-down ILP systems. Constraining it to consider only positive examples, the background theory and mode declarations being definite, would result in the same rule generation mechanism as TOPLOG. Different search strategies can be easily implemented by modifying the abductive procedure. Partial solutions can be associated with a score with respect to the examples (e.g. the sum of entailed examples over the total). This would enable the use of informed search techniques and strategies like, for instance, hill climbing or beam search that can be used to prune the space, exploring the most promising solutions. Similarly to TOPLOG [Mug08], our approach can also be applied directly to a grammar based language bias specification, instead of generating the top theory from mode declarations. Systems based on Inverse Entailment (IE), compute a bottom clause, or set of clauses (e.g. the Kernel Set) that constrains the search space from the bottom of the generality lattice. For problems dealing with definite theories our system manages to solve a wider class of problems than PROGOL, since one single example can generate more than one rule in H . IMPARO [Kim09] solves a class of problems whose solutions are not found by other IE based systems, namely connected theories where body conditions are abductively proved from the background theory. These problems, that are solved in Imparo by applying Induction; Failure (IoF), can also be solved by TAL, as shown in the example given in this paper. The IoF mechanism is in our system embedded in the abductive search that always includes in the search space the generation of a new rule whenever a condition is not entailed by the current theory. XHAIL can find hypotheses computed under IoF by exploring non—minimal abductive explanations but the search is not guided by the background theory and a partial hypothesis[3]. This highlights another advantage of TAL: the computation of clause heads in the hypothesis is naturally interleaved with the generation of the body and it does not take place in a separate phase as in IMPARO and XHAIL. Moreover, all rules are constructed concurrently and their partial definitions can be used. [Kak00] propose a system for inductive learning of logic programs that compared to TAL is limited to observational predicate learning. Finally, [Adé95] introduces induction in the abductive SLDNFA procedure, defining an extended proof procedure called SLDNFAI. In contrast, TAL defines a general methodology and does not commit to a particular proof procedure. Moreover, SLDNFAI does not allow a fine specification of the language bias, makes no use of constraints on the generated hypotheses and is limited to function—free definite clauses.

6. Conclusions and further work

We have presented a novel approach to non-monotonic ILP that relies on the transformation of an ILP task into an equivalent ALP task. We showed through an example how the approach is able to perform non-observational and multi—predicate learning of normal logic programs by means of a top-down search guided by the examples and abductive integrity constraints where a partial hypothesis is used in the derivation of new rules. In contrast, techniques based on IE perform a blind search or are not able to derive a solution. The mapping into ALP offers several advantages. Soundness and completeness can be established on the basis of the abductive proof procedure employed. Constraint solving techniques and optimised ALP implementations can be used and abductive integrity constraints on the structure of the rule can be employed. Furthermore, the search space makes use of partial hypotheses that allows the use of informed search techniques, thus providing a general framework that can scale to learning problems with large datasets and theories. We obtained promising result in this direction and we are currently evaluating the use of heuristics and informed search techniques. We plan to investigate the properties of the mapping and the relationships with the search space of other ILP techniques.

Acknowledgements

The authors are grateful to Krysia Broda, Robert Craven, Tim Kimber, Jiefei Ma, Oliver Ray and Daniel Sykes for their useful discussions. This work is funded by the UK EPSRC (EP/F023294/1) and supported by IBM Research as part of their Open Collaborative Research (OCR) initiative.

Footnotes

  1. prule abducibles are omitted for brevity.
  2. Relying; on the results reported in [Ray09a], the computation time for this study appears to differ by one order of magnitude. [Ray09a] reports that a prototype XHAIL implementation took a couple of seconds to compute H on a 1.66 GHZ Centrino Duo Laptop PC with 1 GB of RAM, while TAL took 30 ms to find H and 180 ms to explore the whole space of hypotheses limited to two clauses with at most two conditions on a 2.8 GHZ Intel Core 2 Duo iMac with 2 GB of RAM. Unfortunately, XHAIL is not publicly available so we are not able to perform an empirical comparison of the performance of the two systems.
  3. In the “odd/even” example the only way for XHAIL to find the correct solution is to extend the minimal abductive solution 0dd(5(5(5(5(5(0))))) with even(5(5(5(5(0)))) and even(5(5(5(5(5(5(0)))))) that have to be somehow chosen from a set of infinite candidates.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2010 InductiveLogicProgrammingAsAbduDomenico Corapi
Alessandra Russo
Emil C. Lupu
Inductive Logic Programming As Abductive Search10.4230/LIPIcs.ICLP.2010.542010