2013 Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms

From GM-RKB

Subject Headings:

Notes

Cited By

Author Keywords

graphical models, Bayesian networks, constraint networks, Markov networks, induced-width, treewidth, cycle-cutset, loop-cutset, pseudo-tree, bucket-elimination, variable-elimination, AND/OR search, conditioning, reasoning, inference, knowledge representation

Quotes

Abstract

Graphical models (e.g., Bayesian and constraint networks, influence diagrams, and Markov decision processes) have become a central paradigm for knowledge representation and reasoning in both artificial intelligence and computer science in general. These models are used to perform many reasoning tasks, such as scheduling, planning and learning, diagnosis and prediction, design, hardware and software verification, and bioinformatics. These problems can be stated as the formal tasks of constraint satisfaction and satisfiability, combinatorial optimization, and probabilistic inference. It is well known that the tasks are computationally hard, but research during the past three decades has yielded a variety of principles and techniques that significantly advanced the state of the art.

In this book we provide comprehensive coverage of the primary exact algorithms for reasoning with such models. The main feature exploited by the algorithms is the model's graph. We present inference-based, message-passing schemes (e.g., variable-elimination) and search-based, conditioning schemes (e.g., cycle-cutset conditioning and AND/OR search). Each class possesses distinct characteristics and in particular has different time vs. space behavior. We emphasize the dependence of both schemes on a few graph parameters such as the treewidth, cycle-cutset, and (the pseudo-tree) height. We believe the principles outlined here would serve well in moving forward to approximation and anytime-based schemes. The target audience of this book is researchers and students in the artificial intelligence and machine learning area, and beyond.

Preface

1. Introduction

Over the last three decades, research in artificial intelligence has witnessed marked growth in the core disciplines of knowledge representation, learning and reasoning. This growth has been facilitated by a set of graph-based representations and reasoning algorithms known as graphical models.

The term “graphical models” describes a methodology for representing information, or knowledge, and for reasoning about that knowledge for the purpose of making decisions by an intelligent agent. What makes these models graphical is that the structure of the knowledge can be captured by a graph. The primary benefits of a graph-based representation of knowledge are that it allows a compact encoding of complex information and efficient processing of that information.

1.1 Probabilistic vs. Deterministic Models . . . 1

The concept of graphical models has been associated almost exclusively with probabilistic graphical models. Such models are used in situations where there is uncertainty about the state of the world. The knowledge represented by these models concerns the joint probability distribution of a set of variables. An unstructured representation of such a distribution would be a list of all possible value combinations and their respective probabilities. This representation would require a huge amount of space even for a moderate number of variables. Furthermore, reasoning about the information, for example, calculating the probability that a specific variable will have a particular value given some evidence, would be very inefficient. A Bayesian network is a graph-based and far more compact representation of a joint probability distribution (and, as such, a graphical model) where the information is encoded by a relatively small number of conditional probability distributions, as illustrated by the following example based on the early example by Lauritzen and Spiegelhalter [Lauritzen and Spiegelhalter, 1988].

This simple medical diagnosis problem focuses on two diseases: lung cancer and bronchitis. There is one symptom, dyspnea (shortness of breath), that may be associated with the presence of either disease (or both), and there are test results from X-rays that may be related to either cancer, or smoking, or both. Whether or not the patient is a smoker also affects the likelihood of a patient having the diseases and symptoms. When a patient presents a particular combination of symptoms and X-ray results, it is usually impossible to say with certainty whether he suffers from either disease, from both, or from neither; at best, we would like to be able to calculate the probability of each of these possibilities. Calculating these probabilities (as well as many others) requires knowledge of the joint probability distribution of the five variables (Lung Cancer (L), Bronchitis (B), Dyspnea (D), Test of X-ray (T), and Smoker (S)), that is, the probability of each of their 32 value combinations when we assume a bi-valued formulation for each variable (e.g., X-ray tests are either positive (value 1) or negative (value 0)). Alternatively, the joint probability distribution can be represented more compactly by factoring the distribution into a small number of conditional probabilities. One possible factorization, for example, is given by

P(S, L, B, D, T) = P(S) P(L|S) P(B|S) P(D|L,B) P(T|L).

This factorization corresponds to the directed graph in Figure 1.1, where each variable is represented by a node and there is an arrow connecting any two variables that have direct probabilistic (and possibly causal) interactions between them (that is, that participate in one of the conditional probabilities).

Figure 1.1: A simple medical diagnosis Bayesian network.

The graph articulates a more compact representation of the joint probability distribution, in that it represents a set of independencies that are true for the distribution. For example, it expresses that the variables lung cancer and bronchitis are conditionally independent given the variable smoking; that is, if smoking status is known, then knowing that the patient has (or doesn't have) lung cancer has no bearing on the probability that he has bronchitis. However, if it is also known that shortness of breath is present, lung cancer and bronchitis are no longer independent: knowing that the person has lung cancer may explain away the dyspnea and thus reduce the likelihood of bronchitis. Such dependencies and independencies are very helpful for reasoning about the knowledge.
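To make the factored representation and the explaining-away effect concrete, here is a minimal Python sketch. Only the factorization P(S,L,B,D,T) = P(S)P(L|S)P(B|S)P(D|L,B)P(T|L) comes from the text; the conditional probability numbers are made-up placeholders. Any conditional query is answered here by brute-force enumeration of the 32 joint states.

```python
import itertools

# Illustrative CPTs (made-up numbers, not from the book):
# each table stores P(variable = 1 | parents).
p_s = 0.3                                 # P(S=1): patient is a smoker
p_l = {1: 0.10, 0: 0.01}                  # P(L=1 | S)
p_b = {1: 0.30, 0: 0.05}                  # P(B=1 | S)
p_d = {(1, 1): 0.9, (1, 0): 0.8,
       (0, 1): 0.7, (0, 0): 0.1}          # P(D=1 | L, B)
p_t = {1: 0.90, 0: 0.05}                  # P(T=1 | L)

def bern(p1, v):
    """P(V = v) for a binary variable with P(V=1) = p1."""
    return p1 if v == 1 else 1.0 - p1

def joint(s, l, b, d, t):
    """P(S,L,B,D,T) via the factorization of the network in Figure 1.1."""
    return (bern(p_s, s) * bern(p_l[s], l) * bern(p_b[s], b) *
            bern(p_d[(l, b)], d) * bern(p_t[l], t))

def query(target, evidence):
    """P(target | evidence) by enumerating all 2^5 = 32 value combinations."""
    num = den = 0.0
    for s, l, b, d, t in itertools.product((0, 1), repeat=5):
        world = {"S": s, "L": l, "B": b, "D": d, "T": t}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(s, l, b, d, t)
            den += p
            if all(world[k] == v for k, v in target.items()):
                num += p
    return num / den

# Explaining away: once dyspnea is observed, learning that the patient
# has lung cancer lowers the probability of bronchitis.
print(query({"B": 1}, {"D": 1}))            # P(B=1 | D=1)       ~0.49
print(query({"B": 1}, {"D": 1, "L": 1}))    # P(B=1 | D=1, L=1)  ~0.28
```

The enumeration loop is exactly the unstructured "list of all value combinations" the text warns about; the algorithms covered in this book (bucket elimination, AND/OR search) answer the same queries while exploiting the graph to avoid enumerating all combinations.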

While the term “graphical models” has mostly been used for probabilistic graphical models, the idea of using a graph-based structure for representing knowledge has been applied with equal success in situations that seemingly have nothing to do with probability distributions or uncertainty. One example is that of constraint satisfaction problems. Rather than the probability of every possible combination of values assigned to a set of variables, the knowledge encoded in a constraint satisfaction problem concerns their feasibility, that is, whether these value combinations satisfy a set of constraints that are often defined on relatively small subsets of variables.

Figure 1.2: A map of eight neighboring countries.

The structure associated with this set of constraints is a constraint graph, where each variable is represented by a node and two nodes are connected by an edge if they are bound by at least one constraint. A constraint satisfaction problem along with its constraint graph is often referred to as a constraint network and is illustrated by the following example.

Consider the map in Figure 1.2 showing eight neighboring countries, and consider a set of three colors (red, blue, and yellow, for example). Each of the countries needs to be colored by one of the three colors so that no two countries that share a border have the same color. A basic question about this situation is to determine whether such a coloring scheme exists and, if so, to produce one. One way of answering these questions is to systematically generate all possible assignments of colors to countries and then test each one to determine whether it satisfies the constraints. Such an approach would be very inefficient because the number of different assignments can be huge. The structure of the problem, represented by its constraint graph in Figure 1.3, can help simplify the task. In this graph each country is represented by a node, and there is an edge connecting every pair of adjacent countries, representing the constraint that prohibits them from being colored with the same color.

Just as in the Bayesian network graph, the constraint graph reveals some independencies in the map coloring problem. For example, it shows that if a color is selected for France, the problem separates into three smaller problems (Portugal-Spain; Italy-Switzerland; and Belgium-Luxembourg-Holland) that can be solved independently of one another. This kind of information is extremely useful for expediting the solution of constraint satisfaction problems.
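To make the search concrete, here is a minimal Python backtracking sketch of the coloring problem. The adjacency list is an assumption reconstructed from the discussion above (France bordering Spain, Italy, Switzerland, Belgium, and Luxembourg; the book's Figure 1.3 may differ in details). Ordering France first mirrors the observation that conditioning on France splits the remaining problem into independent pieces.

```python
# Assumed constraint graph of Figures 1.2-1.3 (reconstructed, not verbatim).
NEIGHBORS = {
    "France":      ["Spain", "Italy", "Switzerland", "Belgium", "Luxembourg"],
    "Spain":       ["Portugal", "France"],
    "Portugal":    ["Spain"],
    "Italy":       ["France", "Switzerland"],
    "Switzerland": ["France", "Italy"],
    "Belgium":     ["France", "Luxembourg", "Holland"],
    "Luxembourg":  ["France", "Belgium"],
    "Holland":     ["Belgium"],
}
COLORS = ("red", "blue", "yellow")
ORDER = list(NEIGHBORS)   # France first: once colored, the rest decomposes

def solve(assignment=None):
    """Backtracking search: extend a partial coloring one country at a time,
    checking only the constraints that touch the newly colored country."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(ORDER):
        return assignment                       # all countries colored
    country = ORDER[len(assignment)]            # next variable in the ordering
    for color in COLORS:
        if all(assignment.get(n) != color for n in NEIGHBORS[country]):
            assignment[country] = color
            result = solve(assignment)
            if result is not None:
                return result
            del assignment[country]             # undo and try the next color
    return None                                 # dead end: backtrack

print(solve())   # e.g. {'France': 'red', 'Spain': 'blue', ...}
```

Unlike generating and testing all 3^8 = 6561 complete assignments, the backtracking search rejects a partial coloring as soon as a single constraint is violated.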

Whereas a Bayesian network is an example of a probabilistic graphical model, a constraint network is an example of a deterministic graphical model. The graphs associated with the two problems are also different: Bayesian networks use directed graphs, indicating that the relationship between two variables is not symmetrical, while constraint graphs are undirected. Despite these differences, the significance of the graph-based structure and the way it is used to facilitate reasoning about the knowledge are sufficiently similar to place both problems in a general class of graphical models. Many other problem domains have similar graph-based structures and are, in the view of this book, graphical models. Examples include propositional logic, integer linear programming, Markov networks, and influence diagrams.

1.2 Directed vs. Undirected Models . . . 4
1.3 General Graphical Models . . . 6
1.4 Inference and Search-based Schemes . . . 7
1.5 Overview of the Book . . . 8

2. What are Graphical Models . . . 9

2.1 General Graphical Models . . . 9
2.2 The Graphs of Graphical Models . . . 11
2.2.1 Basic Definitions . . . 11
2.2.2 Types of Graphs . . . 12
2.3 Constraint Networks . . . 14
2.4 Cost Networks . . . 17
2.5 Probability Networks . . . 19
2.5.1 Bayesian Networks . . . 20
2.5.2 Markov Networks . . . 22
2.6 Mixed Networks . . . 25
2.7 Summary and Bibliographical Notes . . . 27

3. Inference: Bucket Elimination for Deterministic Networks . . . 29

3.1 Bucket-Elimination for Constraint Networks . . . 31
3.2 Bucket Elimination for Propositional CNFs . . . 36
3.3 Bucket Elimination for Linear Inequalities . . . 40
3.4 The Induced-Graph and Induced-Width . . . 41
3.4.1 Trees . . . 42
3.4.2 Finding Good Orderings . . . 42
3.5 Chordal Graphs . . . 44
3.6 Summary and Bibliographical Notes . . . 46

4. Inference: Bucket Elimination for Probabilistic Networks . . . 47

4.1 Belief Updating and Probability of Evidence . . . 47
4.1.1 Deriving BE-bel . . . 48
4.1.2 Complexity of BE-bel . . . 54
4.1.3 The Impact of Observations . . . 56
4.2 Bucket Elimination for Optimization Tasks . . . 60
4.2.1 A Bucket Elimination Algorithm for mpe . . . 60
4.2.2 A Bucket Elimination Algorithm for map . . . 63
4.3 Bucket Elimination for Markov Networks . . . 63
4.4 Bucket Elimination for Cost Networks and Dynamic Programming . . . 65
4.5 Bucket Elimination for Mixed Networks . . . 66
4.6 The General Bucket Elimination . . . 71
4.7 Summary and Bibliographical Notes . . . 71
4.8 Appendix: Proofs . . . 72

5. Tree-Clustering Schemes . . . 75

5.1 Bucket-Tree Elimination . . . 75
5.1.1 Asynchronous Bucket-Tree Propagation . . . 81
5.2 From Bucket Trees to Cluster Trees . . . 83
5.2.1 From Buckets to Clusters: the Short Route . . . 83
5.2.2 Acyclic Graphical Models . . . 84
5.2.3 Tree Decomposition and Cluster Tree Elimination . . . 86
5.2.4 Generating Tree Decompositions . . . 89
5.3 Properties of CTE for General Models . . . 92
5.3.1 Correctness of CTE . . . 93
5.3.2 Complexity of CTE . . . 95
5.4 Illustration of CTE for Specific Models . . . 96
5.4.1 Belief Updating and Probability of Evidence . . . 96
5.4.2 Constraint Networks . . . 98
5.4.3 Optimization . . . 101
5.5 Summary and Bibliographical Notes . . . 102
5.6 Appendix: Proofs . . . 102

6. AND/OR Search Spaces and Algorithms for Graphical Models . . . 107

6.1 AND/OR Search Trees . . . 109
6.1.1 Weights of OR-AND Arcs . . . 112
6.1.2 Pseudo Trees . . . 114
6.1.3 Properties of AND/OR Search Trees . . . 115
6.2 AND/OR Search Graphs . . . 116
6.2.1 Generating Compact AND/OR Search Spaces . . . 118
6.2.2 Building Context-Minimal AND/OR Search Graphs . . . 118
6.3 Finding Good Pseudo Trees . . . 122
6.3.1 Pseudo Trees Created from Induced Graphs . . . 122
6.3.2 Hypergraph Decompositions . . . 124
6.4 Value Functions of Reasoning Problems . . . 124
6.4.1 Searching AND/OR Tree (AOT) and AND/OR Graph (AOG) . . . 128
6.5 General AND-OR Search - AO(i) . . . 130
6.5.1 Complexity . . . 131
6.6 AND/OR Search Algorithms For Mixed Networks . . . 133
6.6.1 AND-OR-cpe Algorithm . . . 135
6.6.2 Constraint Propagation in AND-OR-cpe . . . 137
6.6.3 Good and Nogood Learning . . . 139
6.7 Summary and Bibliographical Notes . . . 140
6.8 Appendix: Proofs . . . 141

7. Combining Search and Inference: Trading Space for Time . . . 143

7.1 The Cutset-Conditioning Scheme . . . 143
7.1.1 Cutset-Conditioning for Constraints . . . 143
7.1.2 General Cutset-Conditioning . . . 146
7.1.3 Alternating Conditioning and Elimination . . . 147
7.2 The Super-Cluster Schemes . . . 150
7.3 Trading Time and Space with AND/OR Search . . . 152
7.3.1 AND/OR Cutset-Conditioning . . . 152
7.3.2 Algorithm Adaptive Caching (AOC(q)) . . . 154
7.3.3 Relations between AOC(q), AO-ALT-VEC(q) and AO-VEC(q) . . . 157
7.3.4 AOC(q) compared with STCE(q) . . . 161
7.4 Summary and Bibliographical Notes . . . 162
7.5 Appendix: Proofs . . . 163

Conclusion

References


Rina Dechter. (2013). “Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms.” Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers. doi:10.2200/S00529ED1V01Y201308AIM023