1988 QueriesandConceptLearning

From GM-RKB

Subject Headings: Concept Learning System; Query-Based Learning (QBL) System, Angluin's Exact Identification QBL System, Exact Identification Algorithm, Probably Approximately Correct (PAC) Identification Algorithm, Valiant's Criterion, Equivalence Query, Membership Query, Subset Query, Superset Query, Disjointness Query, Exhaustiveness Query.

Notes

Cited By

Quotes

Author Keywords

Abstract

We consider the problem of using queries to learn an unknown concept. Several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries. Examples are given of efficient learning methods using various subsets of these queries for formal domains, including the regular languages, restricted classes of context-free languages, the pattern languages, and restricted types of propositional formulas. Some general lower bound techniques are given. Equivalence queries are compared with Valiant's criterion of probably approximately correct identification under random sampling.

1. Introduction

A successful learning component in an expert system will probably rely heavily on queries to its instructors. For example, Sammut and Banerji's (1986) system uses queries about specific examples as part of its strategy for efficiently learning a target concept. Shapiro's (1981, 1982, 1983) Algorithmic Debugging System uses a variety of types of queries to the user to pinpoint errors in Prolog programs. In this paper, we use a formal framework to study the power of several types of queries for concept-learning tasks.

We consider the problem of identifying an unknown set [math]\displaystyle{ L_* }[/math] from some finite or countable hypothesis space [math]\displaystyle{ L_1, L_2, \cdots }[/math] of subsets of a universal set [math]\displaystyle{ U }[/math]. The usual assumption is that one is given an arbitrarily or stochastically generated sequence of elements of [math]\displaystyle{ U }[/math], each classified as to whether it is in [math]\displaystyle{ L_* }[/math][1]. Instead, we will assume that the learning algorithm has access to a fixed set of oracles that will answer specific kinds of queries about the unknown concept [math]\displaystyle{ L_* }[/math]. The types of queries we consider are the following:

  • Membership. The input is an element [math]\displaystyle{ x \in U }[/math] and the output is yes if [math]\displaystyle{ x \in L_* }[/math] and no if [math]\displaystyle{ x \notin L_* }[/math].
  • Equivalence. The input is a set [math]\displaystyle{ L }[/math] and the output is yes if [math]\displaystyle{ L = L_* }[/math] and no if [math]\displaystyle{ L \neq L_* }[/math]. If the answer is no, an element [math]\displaystyle{ x \in L \oplus L_* }[/math] is returned. ([math]\displaystyle{ L \oplus L_* }[/math] is the symmetric difference of [math]\displaystyle{ L }[/math] and [math]\displaystyle{ L_* }[/math], that is, the set of elements in one but not both of the sets.)
  • Subset. The input is a set [math]\displaystyle{ L }[/math] and the output is yes if [math]\displaystyle{ L \subseteq L_* }[/math] and no otherwise. If the answer is no, an element [math]\displaystyle{ x \in L - L_* }[/math] is returned.
  • Superset. The input is a set [math]\displaystyle{ L }[/math] and the output is yes if [math]\displaystyle{ L \supseteq L_* }[/math] and no otherwise. If the answer is no, an element [math]\displaystyle{ x \in L_* - L }[/math] is returned.
  • Disjointness. The input is a set [math]\displaystyle{ L }[/math] and the output is yes if [math]\displaystyle{ L \cap L_* }[/math] is empty and no otherwise. If the answer is no, an element [math]\displaystyle{ x \in L \cap L_* }[/math] is returned.
  • Exhaustiveness. The input is a set [math]\displaystyle{ L }[/math] and the output is yes if [math]\displaystyle{ L \cup L_* }[/math] is [math]\displaystyle{ U }[/math] and no otherwise. If the answer is no, an element [math]\displaystyle{ x \notin L \cup L_* }[/math] is returned.

For the queries other than membership, the returned element [math]\displaystyle{ x }[/math] is called a counterexample. The particular selection of a counterexample is assumed to be arbitrary, that is, a successful identification method must work no matter what counterexample is returned. We shall also consider restricted versions of each of these queries, for which the responses are just yes and no, with no counterexample provided.
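To make the query types concrete, here is a minimal sketch (not from the paper) of how a teacher with a finite universe and a finite target set could answer each of the six queries; representing concepts as Python sets and returning an arbitrary counterexample are illustrative assumptions.

```python
# Minimal sketch of the six query oracles for a finite universe U and a
# finite target concept L_star, both represented as Python sets.
# The choice of counterexample is arbitrary, as in the paper.

def membership(x, L_star):
    return x in L_star

def equivalence(L, L_star):
    diff = L ^ L_star                      # symmetric difference L (+) L_*
    return (True, None) if not diff else (False, next(iter(diff)))

def subset(L, L_star):
    diff = L - L_star                      # elements of L not in L_*
    return (True, None) if not diff else (False, next(iter(diff)))

def superset(L, L_star):
    diff = L_star - L                      # elements of L_* not in L
    return (True, None) if not diff else (False, next(iter(diff)))

def disjointness(L, L_star):
    common = L & L_star                    # elements in both sets
    return (True, None) if not common else (False, next(iter(common)))

def exhaustiveness(L, L_star, U):
    missing = U - (L | L_star)             # elements outside L union L_*
    return (True, None) if not missing else (False, next(iter(missing)))
```

The restricted versions of these queries simply drop the second component of each returned pair.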

Except in Section 2.2, we assume that if the input to a query is a set [math]\displaystyle{ L }[/math], then it must be one of the elements [math]\displaystyle{ L_i }[/math] of the original hypothesis space. In particular, our lower bounds are strongly dependent on this requirement. The issue of how hypotheses are represented is very important, too important to be given any detailed treatment in this paper. We assume the hypotheses are simply represented as indices: the index [math]\displaystyle{ i }[/math] represents the hypothesis [math]\displaystyle{ L_i }[/math].

1.1 An example: Poker hands

To illustrate the definitions above, we consider a concrete example from the game of poker. Let the target concept L* be the set of all ordered pairs of poker hands such that the first hand beats the second hand. We choose a representation of a card that gives its value and suit, for example, QH or 2D. A hand is an unordered set of five cards. A pair of hands is an ordered pair of hands with no cards in common. Then the universal set U is simply all pairs of hands, and the target concept is the subset consisting of those pairs in which the first hand beats the second. Thus,

[math]\displaystyle{ (\{2S, 10H, QH, 2D, 5C\}, \{JS, 3H, 4H, JH, 2C\}) }[/math]

is an element of U that is not an element of L*, since a pair of twos does not beat a pair of jacks.

We must also choose a description language to express subsets of U. This choice is crucial to the success of a learning method, but it is not the focus of this paper. We assume that the description language contains at least one expression for the target concept; the goal of learning is to find one such expression. We now give some examples of queries for this domain.

A membership query gives a specific pair of hands, for example,

[math]\displaystyle{ (\{2S, 10H, QH, 2D, 5C\}, \{JS, 3H, 4H, JH, 2C\}). }[/math]

The reply, in this case, is no, because the first hand does not beat the second.

An equivalence query specifies an expression in the description language, for example, “all pairs of hands in which the first hand contains two or three cards with equal value.” This is not a correct description of the target concept, so the reply is no combined with a counterexample, i.e., a specific pair of hands that is a positive example of the target concept and a negative example of the queried expression or vice versa. Thus the counterexample could be the pair of hands

[math]\displaystyle{ (\{2S, 10H, QH, 2D, 5C\}, \{JS, 3H, 4H, JH, 2C\}). }[/math]

This pair satisfies the description, but is not in the target concept L*. Or the counterexample might be

[math]\displaystyle{ (\{2S, 10H, QH, 3D, 5C\}, \{7S, 3H, 4H, JH, 2C\}), }[/math]

which does not satisfy the description, but is nonetheless in the target concept.

1.2 Exact and probabilistic identification

The abstract examples we consider include sets of strings generated by grammars or recognized by automata and sets of truth-value assignments that satisfy certain classes of propositional formulas. We consider two criteria of successful identification, exact and probabilistic.

An identification method exactly identifies a set [math]\displaystyle{ L_* }[/math] with access to certain types of queries if it always halts and outputs an index [math]\displaystyle{ i }[/math] such that [math]\displaystyle{ L_i = L_* }[/math]. Note that this is not a limiting criterion of identification: the identification method is allowed one guess, and that guess must be exactly right. In contrast, Valiant (1984) has introduced a criterion of probabilistic identification, and here we give a definition suitable for this paper. There is some probability distribution [math]\displaystyle{ D }[/math] on the universal set [math]\displaystyle{ U }[/math]. The probability of element [math]\displaystyle{ x }[/math] with respect to [math]\displaystyle{ D }[/math] is denoted [math]\displaystyle{ Pr(x) }[/math]. There is a sampling oracle EX( ), which has no input. Whenever EX( ) is called, it draws an element [math]\displaystyle{ x \in U }[/math] according to the distribution [math]\displaystyle{ D }[/math] and returns the element [math]\displaystyle{ x }[/math], together with an indication of whether or not x is in the target set [math]\displaystyle{ L_* }[/math].
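As an illustration only (the paper defines EX( ) abstractly), such a sampling oracle can be sketched in a few lines; the factory name make_EX, the weight representation of D, and the '+'/'-' labels are assumptions introduced here.

```python
import random

def make_EX(universe, weights, L_star, rng=random.Random(0)):
    """Sketch of the sampling oracle EX( ): each call draws an element x of
    `universe` according to the distribution D given by `weights` and returns
    (x, '+') if x is in the target set L_star, and (x, '-') otherwise."""
    def EX():
        x = rng.choices(universe, weights=weights, k=1)[0]
        return x, ('+' if x in L_star else '-')
    return EX
```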

Successful identification is parameterized with respect to two positive quantities, the accuracy parameter ε and the confidence parameter δ. We define a notion of the difference between two sets [math]\displaystyle{ L }[/math] and [math]\displaystyle{ L' }[/math] with respect to the probability distribution as

[math]\displaystyle{ d(L,L')=\sum_{x\in L\oplus L'} Pr(x) }[/math]

The probability of getting an element in one but not the other of the two sets L and L’ in one call to EX( ) is precisely d(L, L’).

An identification method is said to probably approximately correctly identify L* if it always halts and outputs an index i such that

[math]\displaystyle{ Pr[d(L_i, L_*) \leq \epsilon] \geq 1 - \delta. }[/math]

That is, with high probability (quantified by δ) there is not too much difference (quantified by ε) between the conjectured set and the target set.

Probably approximately correct identification (abbreviated pac-identification) is the criterion of success if the sampling oracle EX( ) is available; otherwise we use exact identification. For more on pac-identification, see Blumer, Ehrenfeucht, Haussler, and Warmuth (1986, 1987) or Angluin and Laird (1987).

2. Equivalence queries

We now consider the different types of queries in more detail. This section concerns equivalence queries.

2.1 Exhaustive search

Given only equivalence queries, one strategy is exhaustive search: enumerate the indices i = 1, 2, ..., querying each L_i until one gets an answer of yes, at which point one halts and outputs index i. This achieves exact identification.
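A minimal sketch of this strategy, assuming an equivalence oracle of the kind sketched in Section 1 (here closed over the target, so it takes a single argument):

```python
def exhaustive_search(hypotheses, equivalence):
    """Exact identification by exhaustive search: ask an equivalence query
    about each hypothesis in turn and output the index of the first one
    answered 'yes'.  `equivalence(L)` is assumed to return (True, None) or
    (False, counterexample)."""
    for i, L in enumerate(hypotheses, start=1):
        yes, _ = equivalence(L)
        if yes:
            return i
    raise ValueError("the target concept is not in the hypothesis space")
```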

If the queries are restricted to the hypothesis space L_1, L_2, ..., then exhaustive search is sometimes the best that can be done. Suppose the hypothesis space is the set of singleton subsets of the set of all 2^n binary strings of length n. Then it is clear that a strategy using equivalence queries can only eliminate one hypothesis with each query, and so may use as many as 2^n - 1 queries.

We can strengthen this result: the following adversary forces any method of exact identification using equivalence, membership, subset, and disjointness queries to make 2^n - 1 queries in the worst case. (However, the counterexample returned by a single superset query will disclose the unknown set.)

The adversary maintains a set S of all the uneliminated binary strings of length n. Initially S contains all 2^n binary strings of length n. As long as S contains at least two distinct strings, the adversary proceeds as follows. If the next query is a membership query with the string x, then the adversary answers with no. If the next query is an equivalence or subset query with the singleton set {x}, then the adversary answers no along with the counterexample x. If the next query is a disjointness query with the singleton set {x}, then the adversary answers with yes. In each case, if x is a member of S, the adversary removes it from S.

It is not difficult to see that the responses of the adversary are compatible with the unknown hypothesis being {x} for any element x still in S, and at most one element is removed from S with each query. Thus to be correct the algorithm must make at least 2^n - 1 queries.
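Phrased operationally, the adversary can be written as the following sketch (illustrative only; the class and method names are assumptions introduced here):

```python
class SingletonAdversary:
    """Adversary for the hypothesis space of singleton sets of n-bit strings.
    Answers membership, equivalence, subset, and disjointness queries so that
    at most one string is eliminated per query."""

    def __init__(self, n):
        # S holds the strings still compatible with all answers given so far.
        self.S = {format(i, '0%db' % n) for i in range(2 ** n)}

    def membership(self, x):
        self.S.discard(x)          # x is eliminated
        return False               # always answer 'no'

    def equivalence_or_subset(self, L):
        (x,) = L                   # L is a singleton {x}
        self.S.discard(x)
        return False, x            # 'no', with counterexample x

    def disjointness(self, L):
        (x,) = L
        self.S.discard(x)
        return True                # 'yes'
```

As long as two or more strings remain in S, every answer is consistent with the target being any singleton {y} for y still in S.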

2.2 A logarithmic strategy: Majority vote

If the input L to an equivalence query is not required to be a member of the hypothesis space, a strategy based on “majority vote” can achieve exponentially fewer queries than exhaustive search. In this case we assume that the inputs to equivalence queries are arbitrary subsets of the universal set U, and the hypothesis output can also be an arbitrary subset of U. Suppose the hypothesis space is finite: L_1, ..., L_N. Then instead of making N - 1 queries, we may make ⌊log N⌋ queries, as follows. (Note: we use log to denote the base-2 logarithm, and ln to denote the natural logarithm.)

A set S is maintained of all the indices of hypotheses compatible with all the counterexamples seen so far. S initially contains all N indices. Iterate the following process until it halts.

If S contains just one element, i, then halt and output L_i. Otherwise define M_S to be the set of all elements x such that x ∈ L_i for more than |S|/2 elements i of S. Make an equivalence query with M_S as input. If the answer is yes, halt and output M_S. Otherwise, there is a counterexample x. If x ∈ M_S, remove from S all i such that x ∈ L_i; otherwise, remove from S all i such that x ∉ L_i.

A query is made only if |S| > 1, and every query answered no must reduce the cardinality of S by at least one half. Thus, a total of ⌊log N⌋ queries suffices. This result is a special case of the results of Barzdin and Freivald (1972).
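A minimal sketch of this halving strategy for a finite universe and a finite list of hypotheses, under the illustrative assumptions that concepts are Python sets and that `equivalence` may be asked about an arbitrary subset of the universe:

```python
def majority_vote_learner(universe, hypotheses, equivalence):
    """Halving strategy: query the set of elements classified positive by
    more than half of the surviving hypotheses; every 'no' answer, with its
    counterexample, eliminates at least half of the survivors.
    `equivalence(L)` returns (True, None) or (False, counterexample)."""
    S = list(range(len(hypotheses)))                       # surviving indices
    while True:
        if len(S) == 1:
            return hypotheses[S[0]]
        M = {x for x in universe
             if sum(x in hypotheses[i] for i in S) > len(S) / 2}
        yes, x = equivalence(M)
        if yes:
            return M
        if x in M:              # majority classified x positive, but x is not in L_*
            S = [i for i in S if x not in hypotheses[i]]
        else:                   # majority classified x negative, but x is in L_*
            S = [i for i in S if x in hypotheses[i]]
```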

A direct implementation of the majority-vote strategy, keeping an explicit list of all uneliminated hypotheses, is infeasible for most domains. In Section 2.5 we show that a simple algorithm indirectly implements the majority-vote strategy for k-CNF formulas. Approximating the majority vote leads to questions of how to sample “fairly” from the space of uneliminated hypotheses. Littlestone (1987) shows that the majority-vote method may not be query-optimal in some domains.

2.3 Some general lower bound techniques

Since in this paper we restrict the inputs to the queries to the elements of the hypothesis space, we now develop some simple lower bound techniques that generalize the lower bound proof in Section 2.1.

Lemma 1 Suppose the hypothesis space contains a class of distinct sets L_1, ..., L_N, and a set L_∩ such that for any pair of distinct indices i and j,

[math]\displaystyle{ L_i \cap L_j = L_\cap. }[/math]

Then any algorithm that exactly identifies each of the hypotheses L_i using restricted equivalence, membership, and subset queries must make at least N - 1 queries in the worst case.

PROOF: We describe an adversary. Initially S contains 1, 2, ..., N. If |S| > 1, the adversary proceeds as follows. For a restricted equivalence query with the hypothesis L, the reply is no, and the (at most one) i such that L_i = L is removed from S. For a membership query with the element x, if x ∈ L_∩ then the reply is yes. Otherwise, the reply is no, and the (at most one) element i of S such that x ∈ L_i is removed from S. For a subset query with the hypothesis L, if L is a subset of L_∩ then the reply is yes. Otherwise, the reply is no and any element x ∈ L - L_∩ is selected as the counterexample. The (at most one) element i of S such that x ∈ L_i is removed from S.

At any point, for each i ∈ S, L_i is compatible with the replies to the queries so far. A correct exact identification algorithm must reduce the cardinality of S to at most one. Each query removes at most one element from the set S, so N - 1 queries are required in the worst case, which proves Lemma 1.

If the “intersection set” L_∩ is not itself an element of the hypothesis space, then Lemma 1 can be strengthened as follows.

Lemma 2 Suppose the hypothesis space contains a class of distinct sets L_1, ..., L_N, and there exists a set L_∩ which is not a hypothesis, such that for any pair of distinct indices i and j,

[math]\displaystyle{ L_i \cap L_j = L_\cap. }[/math]

Then any algorithm that exactly identifies each of the hypotheses L_i using equivalence, membership, and subset queries must make at least N - 1 queries in the worst case.

PROOF: The proof of Lemma 1 may be modified as follows. The replies to queries are the same, except that a counterexample must be provided when an equivalence query is answered no. Let L be the input to an equivalence query. Since L_∩ is not in the hypothesis space, L ≠ L_∩. The counterexample is any element x of L ⊕ L_∩. If x is from L_∩ - L, there is nothing further to do, but if x is from L - L_∩, the (at most one) element i in S such that x ∈ L_i is removed from S. The rest of the proof is unchanged, proving Lemma 2.

There are dual results for equivalence, membership, and superset queries, which we state without proof.

Lemma 3 Suppose the hypothesis space contains a class of distinct sets L_1, ..., L_N, and a set L_∪ such that for any pair of distinct indices i and j,

[math]\displaystyle{ L_i \cup L_j = L_\cup. }[/math]

Then any algorithm that exactly identifies each of the hypotheses L_i using restricted equivalence, membership, and superset queries must make at least N - 1 queries in the worst case.

Lemma 4 Suppose the hypothesis space contains a class of distinct sets L_1, ..., L_N, and there exists a set L_∪ which is not a hypothesis, such that for any pair of distinct indices i and j,

[math]\displaystyle{ L_i \cup L_j = L_\cup. }[/math]

Then any algorithm that exactly identifies each of the hypotheses L_i using equivalence, membership, and superset queries must make at least N - 1 queries in the worst case.

2.4 Equivalence queries and stochastic equivalence

If the source of information about the unknown concept L* is a domain expert, then it may be unreasonable to expect correct answers to equivalence queries, but the stochastic equivalence testing described below may be a practical substitute. Stochastic testing is a feature of Knobe and Knobe's (1976) method for identifying context-free grammars.

An identification method that uses equivalence queries and achieves exact identification may be modified to achieve pac-identification using calls to EX( ) instead of equivalence queries. The idea is that instead of asking an equivalence query about L, the identification method calls EX( ) a number of times and checks that L is compatible with each element returned and classified by EX( ). If some returned element x is not compatible with L, then the identification method proceeds as though the equivalence query had returned the answer no with x as a counterexample. If L is compatible with all the samples drawn, then the identification method proceeds as though the equivalence query had returned the answer yes.

Suppose that when an equivalence query returns yes, the identification method simply halts and outputs the index j such that L_j = L. If the identification method makes

[math]\displaystyle{ q_i = \left\lceil \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + i \ln 2\right) \right\rceil }[/math]

calls to the EX( ) oracle in place of the i-th equivalence query, then the probability that the identification method will output an index j such that d(L_j, L_*) ≥ ε is at most (1 - ε)^{q_i}. Thus, the probability that at any stage the identification method will output a hypothesis that is not an ε-approximation of L_* is at most

[math]\displaystyle{ \sum_{i} (1 - \epsilon)^{q_i} \leq \sum_{i} \frac{\delta}{2^i} \leq \delta. }[/math]

Thus the modified method achieves pac-identification of L*.
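The replacement of the i-th equivalence query by sampling can be sketched as follows; this is an illustration only, with the sample size taken from the bound for q_i above and the '+'/'-' labels matching the EX( ) sketch given earlier.

```python
import math

def simulated_equivalence(L, EX, i, epsilon, delta):
    """Simulate the i-th equivalence query about the set L using the sampling
    oracle EX( ): draw q_i labeled examples, and report the first one that L
    misclassifies as a counterexample; otherwise answer 'yes'."""
    q_i = math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + i * math.log(2)))
    for _ in range(q_i):
        x, label = EX()
        if (label == '+') != (x in L):     # L disagrees with this sample
            return False, x                # treat as 'no' with counterexample x
    return True, None                      # treat as 'yes'
```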

What about the converse? Can the EX( ) oracle and pac-identification be replaced by equivalence queries and exact identification? Not if efficiency must be preserved. Consider the hypothesis space of all singleton subsets of the set of binary strings of length n. An identification algorithm using only equivalence queries must make 2^n - 1 queries in the worst case.

However, for pac-identification in this domain, it suffices to make

[math]\displaystyle{ q = \left\lceil \frac{2}{\epsilon}\left(\ln\frac{1}{\delta} + n \ln 2\right) \right\rceil }[/math]

calls to EX( ). (This quantity grows linearly in n for fixed ε and δ.) If at least one of these calls returns a positive example x, output the set {x}. If all the calls return negative examples, output any set {y} such that y has not appeared among the negative examples.

Let D(z) denote the probability of string z with respect to the unknown distribution D. The probability that after q calls to EX( ) there exists a string z with D(z) > ε/2 that has not been drawn is easily shown to be at most δ.

If all the strings z such that D(z) > ε/2 have been drawn, then either we output the correct set {x}, or x was not drawn and we output some set {y} such that y was not drawn, in which case

[math]\displaystyle{ d(\{y\}, L_*) = D(x) + D(y) \leq \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. }[/math]

Hence this method achieves pac-identification.
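Written out as a sketch (illustrative only; the sample size q follows the bound above, and EX( ) is assumed to label examples '+'/'-' as in the earlier sketch):

```python
import math
from itertools import product

def pac_learn_singleton(n, EX, epsilon, delta):
    """Pac-identify a singleton target {x} over the n-bit strings: draw q
    labeled examples; if a positive example x appears, output {x}; otherwise
    output {y} for some string y never seen among the negative examples."""
    q = math.ceil((2.0 / epsilon) * (math.log(1.0 / delta) + n * math.log(2)))
    seen = set()
    for _ in range(q):
        z, label = EX()
        if label == '+':
            return {z}
        seen.add(z)
    for bits in product('01', repeat=n):   # find some string never drawn
        y = ''.join(bits)
        if y not in seen:
            return {y}
```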

Littlestone (1987) shows that in certain circumstances equivalence queries and errors of prediction are very closely related. His proof gives methods of converting identification algorithms that use equivalence queries into prediction algorithms and vice versa. Combined with the transformation above, Littlestone's result gives a method of converting prediction algorithms into algorithms for pac-identification.

2.5 Equivalence queries: k-CNF and k-DNF formulas

We now show that k-CNF and k-DNF formulas can be efficiently identified using only equivalence queries.

Let k-CNF denote the set of propositional formulas in conjunctive normal form over the n variables x_1, ..., x_n with at most k literals per clause. If X is a formula and a is an assignment of truth-values to the n variables, a(X) denotes the truth-value assigned to X.

We assume that there is an unknown k-CNF formula φ_*, and that the EX( ) oracle returns pairs (a, s), where a is a truth-value assignment and s = + if a(φ_*) is true and s = - otherwise.

Upper bounds

Valiant (1984) gives a method that runs in time polynomial in n^k, 1/ε, and ln(1/δ) that achieves pac-identification of k-CNF formulas. A simple modification of this algorithm uses equivalence queries, achieves exact identification of the k-CNF formulas, and runs in time polynomial in n^k.

Initially let φ be the conjunction of all clauses C over n variables with at most k literals per clause. (There are no more than (2n + 1)^k such clauses.) At every point the set of assignments satisfying φ will be a subset of the set of assignments satisfying φ_*. Iterate the following process until the equivalence query returns yes, at which point halt and output φ.

Test φ for equivalence with φ_*. If it is not equivalent, let a be the counterexample. Then a(φ_*) is true and a(φ) is false. Remove from φ all clauses C such that a(C) is false.

At least one clause is removed from φ for each negative answer to an equivalence query. By the time one removes all clauses from φ that are not implied by φ_*, φ is equivalent to φ_*, so the claim follows. The number of equivalence queries is bounded by (2n + 1)^k.
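A minimal sketch of this procedure, under the illustrative assumptions that a clause is a frozenset of signed literals (+i for x_i, -i for its negation), an assignment is a dict from variable index to True/False, and `equivalence` answers queries about the conjunction of a set of clauses:

```python
from itertools import combinations, product

def all_clauses(n, k):
    """All clauses over variables 1..n with at most k literals per clause."""
    clauses = set()
    for size in range(1, k + 1):
        for vars_ in combinations(range(1, n + 1), size):
            for signs in product((+1, -1), repeat=size):
                clauses.add(frozenset(s * v for s, v in zip(signs, vars_)))
    return clauses

def satisfies_clause(assignment, clause):
    # assignment maps each variable index to True or False
    return any((assignment[abs(l)] if l > 0 else not assignment[abs(l)])
               for l in clause)

def learn_k_cnf(n, k, equivalence):
    """Start with the conjunction of all clauses of size <= k; each 'no'
    answer supplies an assignment a that satisfies the target but falsifies
    the current conjecture, and every clause falsified by a is dropped."""
    phi = all_clauses(n, k)
    while True:
        yes, a = equivalence(phi)
        if yes:
            return phi
        phi = {C for C in phi if satisfies_clause(a, C)}
```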

The algorithm given above implements the majority vote method described in Section 2.2 for the class of k-CNF formulas. The set S of hypotheses consistent with the examples seen so far consists of every formula that is a conjunction of some subset of the uneliminated clauses. If an assignment assigns true to all the uneliminated clauses, then it assigns true to every hypothesis in S, so the majority vote is true. If an assignment assigns false to some uneliminated clause c, then for every hypothesis in S that is assigned true, there is another hypothesis in S (obtained by conjoining c) that is assigned false, so the majority vote is false. Hence the conjunction of all the uneliminated clauses gives the majority vote value for all assignments.

There is a logically dual method for k-DNF formulas, that is, formulas in disjunctive normal form over n variables with at most k literals per term. Haussler (1986, in press) and Littlestone (1987) have described other methods of identifying k-CNF and k-DNF formulas that may use many fewer queries.

Lower bounds

Lemma 1 may be applied to the class of 1-CNF formulas. Consider the class of all formulas of the form

[math]\displaystyle{ P_1 P_2 \cdots P_n }[/math],

where each P_i is either x_i or ¬x_i. There are 2^n such formulas, each one a 1-CNF formula satisfied by exactly one assignment, and no two formulas are satisfied by the same assignment. Thus, the hypothesis space of 1-CNF formulas satisfies the conditions of Lemma 1 (take L_∩ = ∅, since distinct formulas of this form have disjoint sets of satisfying assignments), which implies that any algorithm that exactly identifies every 1-CNF formula over n variables using restricted equivalence, membership, and subset queries must make 2^n - 1 queries in the worst case.

Dually, we may consider the class of 1-DNF formulas of the form

[math]\displaystyle{ P_1+P_2+\cdots+ P_n }[/math],

where each P_i is either x_i or ¬x_i, to see that the hypothesis space of 1-DNF formulas satisfies the conditions of Lemma 3 (take L_∪ = U, since the satisfying assignments of any two distinct formulas of this form together cover all assignments). Thus any algorithm that exactly identifies every 1-DNF formula using restricted equivalence, membership, and superset queries must make 2^n - 1 queries in the worst case.

References

Dana Angluin (1988). “Queries and Concept Learning.” In: Machine Learning, 2(4). doi:10.1007/BF00116828
  1. We direct the reader to Angluin and Smith (1983) for a survey of inductive inference.