2010 ANumericalRefinementOperatorbas

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Refinement Operator, Numerical Refinement Operator.

Notes

Cited By

Quotes

Abstract

We present a numerical refinement operator based on multi-instance learning. In the approach, the task of handling numerical variables in a clause is delegated to statistical multi-instance learning schemes. To each clause, there is an associated multi-instance classification model with the numerical variables of the clause as input. Clauses are built in a greedy manner, where each refinement adds new numerical variables which are used additionally to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance learners available in the Weka workbench (like MISVMs). These clauses are used in a boosting approach that can take advantage of the margin information, going beyond standard covering procedures or the discrete boosting of rules, like in SLIPPER. The approach is evaluated on the problem of hexose binding site prediction, a pharmacological application and mutagenicity prediction. In two of the three applications, the task is to find configurations of points with certain properties in 3D space that characterize either a binding site or drug activity: the logical part of the [[clause constitute[[s the points with their properties, whereas the multi-instance model constrains the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically as a new synthesis of logical and statistical learning and practically as a new method for characterizing binding sites and pharmacophores in biochemical applications.

1 Introduction and Background

It has often been acknowledged that numerical learning in ILP is limited because of the Choice Of logic programming as representation language [1, 2]. Function symbols are not interpreted in logic programming, they simply are seen as functors 0f Herbrand terms. For instance, the + function symbol being not interpreted, both terms of the following equation cannot be unified and the equation X+Y : 0 cannot be solved. To solve this problem, the hypothesis representation language has been extended by a Constraint Programming Language (CLP) [3]. A large number of CLP languages have been proposed, some with complete and efficient solvers. In ILP, the interpreted predicate symbols are Often the same as the ones used in attribute—value learning, like: S, 2, E, but also linear, non-linear, arithmetic or trigonometric functions have been used [4].

The large family of systems able to learn constraints are all based on the technique introduced in the Classical INDUCE system [5] and later popularised and developed in the system REMO [6] and other systems [3, 1, 4, 2]. This technique separates learning the logical part of the hypothesis from learning its constraint part (usually nominal and numerical constraint variables). If we refer to the covering test definition, for the positive examples, at least one of the possible matching substitutions between the logical part of the hypothesis and the logical part of the positive example must satisfy the constraint part. Conversely, for the negative examples, for all possible substitutions, none must satisfy the constraint part. The key idea is to first compute the set of substitutions matching the hypothesis, logical part with the learning examples, and then from the induced tabular representation, where constraint variables are attributes, learn the constraint part of the hypothesis. Zucker and Ganascia note that such a tabular representation is a multi—instance representation in the general case (the constraints are satisfied by at least one matching substitution to a positive example, and none to a negative example), and that multi—instance learners have to be used to learn the hypothesis, constraint part. The different approaches can be compared with respect to the way they define the hypothesis, logical part and when they delegate learning to an attribute—value or a multi—instance learner. INDUCE completely seperates the two processes and first searches for a good logical part (following an log—based approach) which is then specialized by constraints. A subsequent approach [6] sets a single logical part beforehand, either user—specified or built from the examples. Other systems [1,4] limit the constraint part such that they only deal With a single matching substitution, limiting the interest of delegating numerical learning to attribute—value learners. For instance, Anthony and Frisch [1] only allow a constraint variable to appear in the Clause’s head and to limit the number of matchings to one.

In this paper, we present an approach that does not limit the logical part of a hypothesis: we search in the hypothesis space for a good logical part Which, when introducing constraint variables (presently limited to numerical ones), delegates contraint learning to a multi—instance learner. This is different from the Classical INDUCE system and more recent approaches, given that intertwining logical and constraint learning can better guide the search. This also introduces some interesting properties that can be leveraged by a boosting approach (to be explained below). In the following, we present the technical details of the approach.

2 Method

Before we can describe the method in detail, we have to introduce some notation. Let D : {($1,111),(xn,yn)} denote a training set of Classified examples. Each example is described by a set of tuples from several relations over nominal and continuous variables, denoted by cm, and assigned to a Class yi. We restrict ourselves to binary Classification problems in this paper (yi 6 {+17 71]»). The size of the training set is denoted by [D] : n. We follow standard multi—instance terminology and make a distinction between mamples and instances: an example is defined as a bag of instances (to be defined later). As we follow a boosting approach in the outer loop of the algorithm, we have a weight 10, associated with each example, which is initialized to i

In the following we will deal with negation—free program Clauses. Given a set of Clauses, we let t denote the index of the t—th Clause Ct. Clauses are learned one after the other, using the generalisation of boosting to real—valued weak hypotheses [7] (see below). Hence, t not only denotes the index of a Clause, but also the index of the boosting iteration.

Due to the size of the search space, clauses are built in a greedy manner, with one refinement after the other. A refinement consists of the addition of one or several literals to the body of a clause according to the modes of a language bias declaration. The refinement operator providing all specializations of a clause is denoted by [math]\displaystyle{ \rho(C) }[/math].

In the following, our starting point is a Clause C, which is to be refined in a subsequent step:

[math]\displaystyle{ ... }[/math]

Upper—Case Characters X and Y denote, similar to the definition of the training set D above, the identifier of an example (X) and its Class Y (either 1 or +1). Given such a Clause, its variables can be obtained by a function

[math]\displaystyle{ ... }[/math]

Additionally, we have functions Varn(C) picking the nominal variables of a Clause and VarC(C) picking the continuous variables (Var(C) : Varn(C) U VarC(C)).

For simplicity and without loss of generality, we assume that exactly one literal is added to Clause C in the course of a refinement C’ E p(C):

[math]\displaystyle{ ... }[/math]

We make the assumption that at least one additional continuous variable is available after a refinement. In other words for each C’ E p(C) we assume there exists an Xk+17l E VarC(C’).

It is Clear that due to multiplicities (l : n and m : n relationships between the head variable X and the body variables) multi—instance problems over the body variables arise. As our goal is to improve the capability of ILP learning systems to handle continuous variables we let the multi—instance problems range only over those variables of a Clause. The structure of a Clause and the remaining variables only serve to give us the definition of a multi—instance problem. To be more precise, we obtain a dataset for multi—instance learning from first materializing the relation from the body (ranging over all variables Var(C)) and subsequently projecting it onto the variables {X} U VarC(C).

Proceeding in this way, the question is (a) how to guide the search for suitable Clauses and (b) how to decide when to stop refining a Clause.

For the former question we decided to use the margin of the Classifier (in the sense of boosting). Consider the output of the Clause together with the multi— instance Classifier is given by a function h(.), which tells us not only the Class of the prediction (its sign), but also its confidence. Then the mean margin of M.) can be defined as

[math]\displaystyle{ ... }[/math]

As to decide when to stop refining a Clause, we need a criterion that acts as a regularisation parameter for the multi—instance learner. A natural Choice is to limit the number of attributes in the datasets that are passed to it. It translates to limiting the number of constraint variables that can be introduced in the logical part, also regularising its complexity.

For the outer loop generating one Clause plus multi—instance Classifier after the other, we employ the generalization of AdaBoost to real—valued weak hypotheses [7]. For each example covered by a Clause Ct, the function ht(.) (defined in terms of the Clause itself plus its multi—instance Classifier ft(.)) will provide a different prediction. For the examples not covered by the Clause, the weak hypothesis abstains on them and outputs a prediction ofO. In that sense, this is more general than SLIPPER’S rules [8], which either abstain or predict the positive Class. The boosting algorithm will focus on those examples in the later stages forcing the weak learner to search for good logical structures that can discriminate between them:

[math]\displaystyle{ ... }[/math]

Overall, the model that is learned is a sequence of Clauses Ct along with associated multi—instance models ft. Both Ct and ft give ht, the weak Classi— fiers that are boosted in the outer loop of the algorithm. Additionally, we have the weights originating from the boosting iterations: ((h17oz1)7 . . . 7 (hT7 ozT)) : (((C17 f1)7 ozl)7 . . . 7 ((CT7 fT)7 ozT)). In the following we call the described method NuRMI (Numerical Refinement operator based on Multi—Instance learning).

3 Experiments

3.1 Datasets

3.2 Experimental Setup

3.3 Experimental Results

4 Conclusion

Footnotes


References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2010 ANumericalRefinementOperatorbasStefan Kramer
Erick Alphonse
Tobias Girschick
Fabian Buchwald
A Numerical Refinement Operator based on Multi-instance Learning2010