# 2010 ANumericalRefinementOperatorbas

Subject Headings: Refinement Operator, Numerical Refinement Operator.

## Quotes

### Abstract

We present a numerical refinement operator based on multi-instance learning. In the approach, the task of handling numerical variables in a clause is delegated to statistical multi-instance learning schemes. To each clause, there is an associated multi-instance classification model with the numerical variables of the clause as input. Clauses are built in a greedy manner, where each refinement adds new numerical variables which are used additionally to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance learners available in the Weka workbench (like MISVMs). These clauses are used in a boosting approach that can take advantage of the margin information, going beyond standard covering procedures or the discrete boosting of rules, like in SLIPPER. The approach is evaluated on the problem of hexose binding site prediction, a pharmacological application, and mutagenicity prediction. In two of the three applications, the task is to find configurations of points with certain properties in 3D space that characterize either a binding site or drug activity: the logical part of the clause constitutes the points with their properties, whereas the multi-instance model constrains the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically as a new synthesis of logical and statistical learning and practically as a new method for characterizing binding sites and pharmacophores in biochemical applications.

### 1 Introduction and Background

It has often been acknowledged that numerical learning in ILP is limited because of the choice of logic programming as representation language [1, 2]. Function symbols are not interpreted in logic programming; they are simply seen as functors of Herbrand terms. For instance, since the + function symbol is not interpreted, the two terms of the equation X + Y = 0 cannot be unified and the equation cannot be solved. To solve this problem, the hypothesis representation language has been extended by a Constraint Programming Language (CLP) [3]. A large number of CLP languages have been proposed, some with complete and efficient solvers. In ILP, the interpreted predicate symbols are often the same as the ones used in attribute-value learning, like ≤, ≥, ∈, but linear, non-linear, arithmetic or trigonometric functions have also been used [4].

The large family of systems able to learn constraints are all based on the technique introduced in the classical INDUCE system [5] and later popularised and developed in the system REMO [6] and other systems [3, 1, 4, 2]. This technique separates learning the logical part of the hypothesis from learning its constraint part (usually nominal and numerical constraint variables). If we refer to the covering test definition, for the positive examples, at least one of the possible matching substitutions between the logical part of the hypothesis and the logical part of the positive example must satisfy the constraint part. Conversely, for the negative examples, for all possible substitutions, none must satisfy the constraint part. The key idea is to first compute the set of substitutions matching the hypothesis' logical part with the learning examples, and then, from the induced tabular representation, where constraint variables are attributes, learn the constraint part of the hypothesis. Zucker and Ganascia note that such a tabular representation is a multi-instance representation in the general case (the constraints are satisfied by at least one matching substitution to a positive example, and none to a negative example), and that multi-instance learners have to be used to learn the hypothesis' constraint part. The different approaches can be compared with respect to the way they define the hypothesis' logical part and when they delegate learning to an attribute-value or a multi-instance learner. INDUCE completely separates the two processes and first searches for a good logical part (following an lgg-based approach) which is then specialized by constraints. A subsequent approach [6] sets a single logical part beforehand, either user-specified or built from the examples. Other systems [1, 4] limit the constraint part such that they only deal with a single matching substitution, limiting the interest of delegating numerical learning to attribute-value learners. For instance, Anthony and Frisch [1] only allow a constraint variable to appear in the clause's head and limit the number of matchings to one.
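The covering test described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `covers` and `consistent`, the representation of a matching substitution as a dictionary, and the toy distance constraint `near` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# One matching substitution: constraint variable name -> numerical value.
Substitution = Dict[str, float]
Constraint = Callable[[Substitution], bool]


def covers(substitutions: List[Substitution], constraint: Constraint) -> bool:
    """An example is covered if at least one substitution satisfies the constraint."""
    return any(constraint(theta) for theta in substitutions)


def consistent(positives: List[List[Substitution]],
               negatives: List[List[Substitution]],
               constraint: Constraint) -> bool:
    """Every positive example is covered and no negative example is covered."""
    return (all(covers(bag, constraint) for bag in positives)
            and not any(covers(bag, constraint) for bag in negatives))


def near(theta: Substitution) -> bool:
    """Hypothetical constraint part: the distance variable D must be below 5.0."""
    return theta["D"] < 5.0


# Each example is the set (bag) of substitutions matching the logical part.
positives = [[{"D": 7.2}, {"D": 3.1}]]   # one of two substitutions satisfies near
negatives = [[{"D": 6.0}, {"D": 9.4}]]   # no substitution satisfies near

print(consistent(positives, negatives, near))
```

The existential test on positives and the universal test on negatives is what makes the induced table a multi-instance problem rather than a plain attribute-value one.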

In this paper, we present an approach that does not limit the logical part of a hypothesis: we search in the hypothesis space for a good logical part which, when introducing constraint variables (presently limited to numerical ones), delegates constraint learning to a multi-instance learner. This is different from the classical INDUCE system and more recent approaches, given that intertwining logical and constraint learning can better guide the search. This also introduces some interesting properties that can be leveraged by a boosting approach (to be explained below). In the following, we present the technical details of the approach.

### 2 Method

Before we can describe the method in detail, we have to introduce some notation. Let $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ denote a training set of classified examples. Each example is described by a set of tuples from several relations over nominal and continuous variables, denoted by $x_i$, and assigned to a class $y_i$. We restrict ourselves to binary classification problems in this paper ($y_i \in \{+1, -1\}$). The size of the training set is denoted by $|D| = n$. We follow standard multi-instance terminology and make a distinction between examples and instances: an example is defined as a bag of instances (to be defined later). As we follow a boosting approach in the outer loop of the algorithm, we have a weight $w_i$ associated with each example, which is initialized to $1/n$.

In the following we will deal with negation-free program clauses. Given a set of clauses, we let $t$ denote the index of the $t$-th clause $C_t$. Clauses are learned one after the other, using the generalisation of boosting to real-valued weak hypotheses [7] (see below). Hence, $t$ not only denotes the index of a clause, but also the index of the boosting iteration.

Due to the size of the search space, clauses are built in a greedy manner, with one refinement after the other. A refinement consists of the addition of one or several literals to the body of a clause according to the modes of a language bias declaration. The refinement operator providing all specializations of a clause is denoted by $\rho(C)$.

In the following, our starting point is a clause $C$, which is to be refined in a subsequent step:

$...$

Upper-case characters $X$ and $Y$ denote, similar to the definition of the training set $D$ above, the identifier of an example ($X$) and its class $Y$ (either $-1$ or $+1$). Given such a clause, its variables can be obtained by a function

$...$

Additionally, we have functions $Var_n(C)$ picking the nominal variables of a clause and $Var_c(C)$ picking the continuous variables ($Var(C) = Var_n(C) \cup Var_c(C)$).

For simplicity and without loss of generality, we assume that exactly one literal is added to clause $C$ in the course of a refinement $C' \in \rho(C)$:

$...$

We make the assumption that at least one additional continuous variable is available after a refinement. In other words, for each $C' \in \rho(C)$ we assume there exists an $X_{k+1,l} \in Var_c(C')$.

It is clear that due to multiplicities (1:n and m:n relationships between the head variable $X$ and the body variables) multi-instance problems over the body variables arise. As our goal is to improve the capability of ILP learning systems to handle continuous variables, we let the multi-instance problems range only over those variables of a clause. The structure of a clause and the remaining variables only serve to give us the definition of a multi-instance problem. To be more precise, we obtain a dataset for multi-instance learning by first materializing the relation from the body (ranging over all variables $Var(C)$) and subsequently projecting it onto the variables $\{X\} \cup Var_c(C)$.
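The materialize-then-project step can be sketched in a few lines of Python. This is an illustrative assumption rather than the paper's implementation: the function `to_bags`, the row layout, and the toy variables `A`, `Elem` and `Charge` are hypothetical, and duplicate instances are kept (bag semantics) for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def to_bags(rows: List[dict],
            continuous_vars: Sequence[str],
            id_var: str = "X") -> Dict[str, List[Tuple[float, ...]]]:
    """Project materialized body bindings onto {X} plus the continuous variables.

    Each row is one binding of all body variables. Rows sharing the same
    example identifier form one bag; nominal variables are dropped.
    """
    bags: Dict[str, List[Tuple[float, ...]]] = defaultdict(list)
    for row in rows:
        instance = tuple(row[v] for v in continuous_vars)
        bags[row[id_var]].append(instance)
    return dict(bags)


# Hypothetical materialized body relation over Var(C) = {X, A, Elem, Charge}.
rows = [
    {"X": "m1", "A": "a1", "Elem": "c", "Charge": -0.12},
    {"X": "m1", "A": "a2", "Elem": "o", "Charge": 0.35},
    {"X": "m2", "A": "a1", "Elem": "n", "Charge": 0.08},
]

bags = to_bags(rows, continuous_vars=["Charge"])
print(bags)
```

Example `m1` matches the body twice, so its bag holds two instances; this is exactly the multiplicity that turns the projected table into a multi-instance dataset.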

Proceeding in this way, the question is (a) how to guide the search for suitable clauses and (b) how to decide when to stop refining a clause.

For the former question we decided to use the margin of the classifier (in the sense of boosting). Suppose the output of the clause together with the multi-instance classifier is given by a function $h(\cdot)$, which tells us not only the class of the prediction (its sign), but also its confidence. Then the mean margin of $h(\cdot)$ can be defined as

$...$
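As a rough illustration of a margin-based score, the following Python sketch assumes the usual boosting margin $y_i h(x_i)$ for each example and averages it, optionally using the example weights. Whether the mean is weighted, and the function name `mean_margin`, are assumptions made here and are not confirmed by the formula above.

```python
from typing import Optional, Sequence


def mean_margin(labels: Sequence[int],
                outputs: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Average of y_i * h(x_i); uniform weights are used when none are given."""
    n = len(labels)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    weighted = sum(w * y * h for w, y, h in zip(weights, labels, outputs))
    return weighted / total


labels = [+1, -1, +1]
outputs = [0.8, -0.5, -0.2]   # sign = predicted class, magnitude = confidence
print(mean_margin(labels, outputs))
```

A confident correct prediction contributes a large positive term, a confident wrong one a large negative term, and an abstention (output 0) contributes nothing.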

To decide when to stop refining a clause, we need a criterion that acts as a regularisation parameter for the multi-instance learner. A natural choice is to limit the number of attributes in the datasets that are passed to it. This translates to limiting the number of constraint variables that can be introduced in the logical part, which also regularises its complexity.
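The interplay of greedy search and the attribute limit can be sketched as follows. This is a toy under stated assumptions, not the paper's algorithm: clauses are represented as tuples of literal strings, `rho`, `score` and `n_continuous` are hypothetical callables standing in for the refinement operator, the margin-based score and the count of continuous variables, and the literals and gains are invented.

```python
from typing import Callable, List, Tuple

Clause = Tuple[str, ...]


def greedy_refine(start: Clause,
                  rho: Callable[[Clause], List[Clause]],
                  score: Callable[[Clause], float],
                  n_continuous: Callable[[Clause], int],
                  max_continuous: int) -> Clause:
    """Repeatedly take the best-scoring refinement that respects the variable cap."""
    current = start
    while True:
        candidates = [c for c in rho(current) if n_continuous(c) <= max_continuous]
        if not candidates:          # cap reached or no refinement left
            return current
        current = max(candidates, key=score)


# Toy setting: every literal introduces exactly one continuous variable.
pool = ["dist(A,B,D1)", "charge(A,Q1)", "dist(B,C,D2)"]
gains = {"dist(A,B,D1)": 0.3, "charge(A,Q1)": 0.1, "dist(B,C,D2)": 0.2}


def rho(clause: Clause) -> List[Clause]:
    return [clause + (lit,) for lit in pool if lit not in clause]


def score(clause: Clause) -> float:
    return sum(gains[lit] for lit in clause)


result = greedy_refine((), rho, score, n_continuous=len, max_continuous=2)
print(result)
```

The loop terminates here because every refinement adds a continuous variable, which mirrors the assumption made in the text; without it, a separate stopping rule would be needed.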

For the outer loop generating one clause plus multi-instance classifier after the other, we employ the generalization of AdaBoost to real-valued weak hypotheses [7]. For each example covered by a clause $C_t$, the function $h_t(\cdot)$ (defined in terms of the clause itself plus its multi-instance classifier $f_t(\cdot)$) will provide a different prediction. For the examples not covered by the clause, the weak hypothesis abstains and outputs a prediction of 0. In that sense, this is more general than SLIPPER's rules [8], which either abstain or predict the positive class. The boosting algorithm will focus on those examples in the later stages, forcing the weak learner to search for good logical structures that can discriminate between them:

$...$
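One boosting round with real-valued, possibly abstaining weak hypotheses might look like the sketch below. It assumes the confidence-rated scheme of Schapire and Singer with outputs in $[-1, +1]$, where $\alpha = \frac{1}{2}\ln\frac{1+r}{1-r}$ and $r = \sum_i w_i y_i h(x_i)$; whether the paper uses exactly this choice of $\alpha$ is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def alpha_for(weights: Sequence[float],
              labels: Sequence[int],
              outputs: Sequence[float]) -> float:
    """Weight of the weak hypothesis; requires outputs in [-1, 1] and |r| < 1."""
    r = sum(w * y * h for w, y, h in zip(weights, labels, outputs))
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def reweight(weights: Sequence[float],
             labels: Sequence[int],
             outputs: Sequence[float],
             alpha: float) -> List[float]:
    """Exponential update followed by normalisation to a distribution."""
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, outputs)]
    z = sum(raw)
    return [w / z for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
labels = [+1, +1, -1, -1]
outputs = [0.5, 0.0, -0.5, 0.5]   # second example: abstention; fourth: wrong sign

alpha = alpha_for(weights, labels, outputs)
new_weights = reweight(weights, labels, outputs, alpha)
print(alpha, new_weights)
```

After the update the misclassified example carries the most weight, the example on which the hypothesis abstained sits in between, and the correctly classified examples carry the least.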

Overall, the model that is learned is a sequence of clauses $C_t$ along with associated multi-instance models $f_t$. Both $C_t$ and $f_t$ give $h_t$, the weak classifiers that are boosted in the outer loop of the algorithm. Additionally, we have the weights originating from the boosting iterations: $((h_1, \alpha_1), \ldots, (h_T, \alpha_T)) = (((C_1, f_1), \alpha_1), \ldots, ((C_T, f_T), \alpha_T))$. In the following we call the described method NuRMI (Numerical Refinement operator based on Multi-Instance learning).
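How such a model could be applied to a new example is sketched below, assuming the standard boosting decision rule (the sign of the $\alpha$-weighted sum of weak outputs). The helper names, the dictionary representation of an example, the toy clauses and the tie-breaking at zero are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Example = dict
Weak = Callable[[Example], float]


def make_weak(clause_covers: Callable[[Example], bool],
              f: Callable[[Example], float]) -> Weak:
    """Combine a clause C_t and its multi-instance model f_t into h_t."""
    def h(x: Example) -> float:
        # Abstain (output 0) on examples the clause does not cover.
        return f(x) if clause_covers(x) else 0.0
    return h


def predict(model: List[Tuple[Weak, float]], x: Example) -> int:
    """Sign of the weighted vote; ties at zero go to +1 by assumption."""
    total = sum(alpha * h(x) for h, alpha in model)
    return 1 if total >= 0 else -1


# Two toy weak hypotheses over bags of distances "d" and charges "q".
h1 = make_weak(lambda x: "d" in x, lambda x: 1.0 if min(x["d"]) < 5.0 else -1.0)
h2 = make_weak(lambda x: "q" in x, lambda x: -0.5)
model = [(h1, 0.8), (h2, 0.3)]

print(predict(model, {"d": [7.2, 3.1]}))
print(predict(model, {"d": [6.0], "q": [0.1]}))
```

In the first call only the first hypothesis fires, so the second contributes nothing to the vote; in the second call both fire and agree on the negative class.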

## References

Erick Alphonse, Tobias Girschick, Fabian Buchwald, and Stefan Kramer. (2010). A Numerical Refinement Operator based on Multi-instance Learning.