2010 ANumericalRefinementOperatorbas

(Alphonse et al., 2010) ⇒ Erick Alphonse, Tobias Girschick, Fabian Buchwald, and Stefan Kramer. (2010). “A Numerical Refinement Operator based on Multi-instance Learning.” In: Proceedings of the 20th International Conference on Inductive logic programming. ISBN:978-3-642-21294-9

Subject Headings: Refinement Operator, Numerical Refinement Operator.

Notes

Cited By

Google Scholar: ~ 8 Citations
DL-ACM: ~ 2 Citations

Quotes

Abstract

We present a numerical refinement operator based on multi-instance learning. In the approach, the task of handling numerical variables in a clause is delegated to statistical multi-instance learning schemes. To each clause, there is an associated multi-instance classification model with the numerical variables of the clause as input. Clauses are built in a greedy manner, where each refinement adds new numerical variables which are used additionally to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance learners available in the Weka workbench (like MISVMs). These clauses are used in a boosting approach that can take advantage of the margin information, going beyond standard covering procedures or the discrete boosting of rules, like in SLIPPER. The approach is evaluated on the problem of hexose binding site prediction, a pharmacological application and mutagenicity prediction. In two of the three applications, the task is to find configurations of points with certain properties in 3D space that characterize either a binding site or drug activity: the logical part of the clause constitutes the points with their properties, whereas the multi-instance model constrains the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically as a new synthesis of logical and statistical learning and practically as a new method for characterizing binding sites and pharmacophores in biochemical applications.

1 Introduction and Background

It has often been acknowledged that numerical learning in ILP is limited because of the choice of logic programming as representation language [1, 2]. Function symbols are not interpreted in logic programming; they are simply seen as functors of Herbrand terms. For instance, since the + function symbol is not interpreted, the two sides of the equation X+Y = 0 cannot be unified and the equation cannot be solved. To solve this problem, the hypothesis representation language has been extended by a Constraint Programming Language (CLP) [3]. A large number of CLP languages have been proposed, some with complete and efficient solvers. In ILP, the interpreted predicate symbols are often the same as the ones used in attribute-value learning, like ≤, ≥, ∈, but linear, non-linear, arithmetic or trigonometric functions have also been used [4].

The large family of systems able to learn constraints are all based on the technique introduced in the classical INDUCE system [5] and later popularised and developed in the system REMO [6] and other systems [3, 1, 4, 2]. This technique separates learning the logical part of the hypothesis from learning its constraint part (usually nominal and numerical constraint variables). If we refer to the covering test definition, for the positive examples, at least one of the possible matching substitutions between the logical part of the hypothesis and the logical part of the positive example must satisfy the constraint part. Conversely, for the negative examples, none of the possible substitutions may satisfy the constraint part. The key idea is to first compute the set of substitutions matching the hypothesis' logical part with the learning examples, and then, from the induced tabular representation, where constraint variables are attributes, learn the constraint part of the hypothesis. Zucker and Ganascia note that such a tabular representation is a multi-instance representation in the general case (the constraints are satisfied by at least one matching substitution to a positive example, and none to a negative example), and that multi-instance learners have to be used to learn the hypothesis' constraint part. The different approaches can be compared with respect to the way they define the hypothesis' logical part and when they delegate learning to an attribute-value or a multi-instance learner. INDUCE completely separates the two processes and first searches for a good logical part (following an lgg-based approach), which is then specialized by constraints. A subsequent approach [6] sets a single logical part beforehand, either user-specified or built from the examples. Other systems [1, 4] limit the constraint part such that they only deal with a single matching substitution, limiting the interest of delegating numerical learning to attribute-value learners. For instance, Anthony and Frisch [1] only allow a constraint variable to appear in the clause's head, in order to limit the number of matchings to one.
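The covering test described above can be sketched as follows. This is a minimal illustration, assuming each example has already been turned into a bag with one row per matching substitution; the variable name `D`, the threshold, and the helper names are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# One instance = one matching substitution, restricted to the constraint
# variables (the attributes of the induced tabular representation).
Instance = Dict[str, float]
Bag = List[Instance]

def covers(constraint: Callable[[Instance], bool], bag: Bag) -> bool:
    # An example is covered if AT LEAST ONE substitution satisfies
    # the constraint part.
    return any(constraint(inst) for inst in bag)

def consistent(constraint: Callable[[Instance], bool],
               examples: List[Tuple[Bag, int]]) -> bool:
    # Positive bags (+1) must be covered; negative bags (-1) must have
    # NO substitution satisfying the constraint.
    return all(covers(constraint, bag) == (label == +1)
               for bag, label in examples)

# Hypothetical constraint part: a distance variable D bounded from above.
def close_enough(inst: Instance) -> bool:
    return inst["D"] <= 5.0

positive_bag = [{"D": 7.2}, {"D": 4.1}]   # second substitution satisfies it
negative_bag = [{"D": 6.0}, {"D": 9.3}]   # no substitution satisfies it

examples = [(positive_bag, +1), (negative_bag, -1)]
is_consistent = consistent(close_enough, examples)
```

Because the label attaches to the bag rather than to any single row, an attribute-value learner applied row by row would see misleading labels on the non-satisfying rows of positive bags; this is the reason a multi-instance learner is needed in the general case.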

In this paper, we present an approach that does not limit the logical part of a hypothesis: we search in the hypothesis space for a good logical part which, when introducing constraint variables (presently limited to numerical ones), delegates constraint learning to a multi-instance learner. This is different from the classical INDUCE system and more recent approaches, given that intertwining logical and constraint learning can better guide the search. This also introduces some interesting properties that can be leveraged by a boosting approach (to be explained below). In the following, we present the technical details of the approach.

2 Method

Before we can describe the method in detail, we have to introduce some notation. Let [math]\displaystyle{ D = \{(x_1,y_1), \ldots, (x_n,y_n)\} }[/math] denote a training set of classified examples. Each example is described by a set of tuples from several relations over nominal and continuous variables, denoted by [math]\displaystyle{ x_i }[/math], and assigned to a class [math]\displaystyle{ y_i }[/math]. We restrict ourselves to binary classification problems in this paper ([math]\displaystyle{ y_i \in \{+1, -1\} }[/math]). The size of the training set is denoted by [math]\displaystyle{ |D| = n }[/math]. We follow standard multi-instance terminology and make a distinction between examples and instances: an example is defined as a bag of instances (to be defined later). As we follow a boosting approach in the outer loop of the algorithm, we have a weight [math]\displaystyle{ w_i }[/math] associated with each example, which is initialized to [math]\displaystyle{ 1/n }[/math].
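As a rough sketch of this notation, a training set can be held as a list of weighted, labelled bags. The class and field names are assumptions for illustration, and the uniform initial weight of 1/n is the usual boosting convention, assumed here rather than quoted from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Example:
    bag: List[Dict[str, float]]  # x_i: a bag of instances
    label: int                   # y_i in {+1, -1}
    weight: float = 0.0          # w_i, set by init_weights

def init_weights(examples: List[Example]) -> List[Example]:
    # Assumed convention: uniform weights that sum to one.
    n = len(examples)
    for ex in examples:
        ex.weight = 1.0 / n
    return examples

training_set = init_weights([
    Example(bag=[{"D": 4.1}, {"D": 7.2}], label=+1),
    Example(bag=[{"D": 6.0}], label=-1),
])
```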

In the following, we will deal with negation-free program clauses. Given a set of clauses, we let t denote the index of the t-th clause [math]\displaystyle{ C_t }[/math]. Clauses are learned one after the other, using the generalisation of boosting to real-valued weak hypotheses [7] (see below). Hence, t not only denotes the index of a clause, but also the index of the boosting iteration.

Due to the size of the search space, clauses are built in a greedy manner, with one refinement after the other. A refinement consists of the addition of one or several literals to the body of a clause according to the modes of a language bias declaration. The refinement operator providing all specializations of a clause is denoted by [math]\displaystyle{ \rho(C) }[/math].
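The greedy construction can be sketched as a simple hill-climbing loop over the refinements returned by the operator. Everything below is a toy illustration: the literal pool, the scoring function, and the stopping rule (stop when no refinement improves the score) are assumptions, whereas in the paper the candidates come from mode declarations and are evaluated through the associated multi-instance model.

```python
from typing import Callable, List, Tuple

Clause = Tuple[str, ...]  # body literals only, for illustration

# Hypothetical literal pool standing in for a language bias declaration.
LITERAL_POOL = ["atom(M,A)", "atom(M,B)", "dist(A,B,D)"]

def rho(clause: Clause) -> List[Clause]:
    # Toy refinement operator: add one unused literal to the body.
    return [clause + (lit,) for lit in LITERAL_POOL if lit not in clause]

# Toy score: reward literals from a made-up target, penalise the others.
TARGET = {"atom(M,A)", "dist(A,B,D)"}

def score(clause: Clause) -> float:
    body = set(clause)
    return len(body & TARGET) - 0.5 * len(body - TARGET)

def greedy_refine(start: Clause,
                  refine: Callable[[Clause], List[Clause]],
                  evaluate: Callable[[Clause], float],
                  max_steps: int = 10) -> Clause:
    current, current_score = start, evaluate(start)
    for _ in range(max_steps):
        candidates = refine(current)
        if not candidates:
            break
        best = max(candidates, key=evaluate)
        best_score = evaluate(best)
        if best_score <= current_score:
            break  # no refinement improves the clause
        current, current_score = best, best_score
    return current

learned = greedy_refine((), rho, score)
```

Greedy search keeps only the single best refinement at each step, which trades completeness for tractability in a search space of this size.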

In the following, our starting point is a clause C, which is to be refined in a subsequent step: