2015 CapabilityModelsandTheirApplicationsinPlanning

From GM-RKB

Subject Headings: Agent Capability Model.

Notes

Cited By

Quotes

Author Keywords

Abstract

One important challenge for a set of agents to achieve more efficient collaboration is for these agents to maintain proper models of each other. An important aspect of these models of other agents is that they are often not provided, and hence must be learned from plan execution traces. As a result, these models of other agents are inherently partial and incomplete. Most existing agent models are based on action modeling and do not naturally allow for incompleteness. In this paper, we introduce a new and inherently incomplete modeling approach based on the representation of capabilities, which has several unique advantages. First, we show that the structures of capability models can be learned or easily specified, and both model structure and parameter learning are robust to high degrees of incompleteness in plan traces (e.g., with only start and end states partially observed). Furthermore, parameter learning can be performed efficiently online via Bayesian learning. While high degrees of incompleteness in plan traces present learning challenges for traditional (complete) models, capability models can still learn to extract useful information. As a result, capability models are useful in applications in which traditional models are difficult to obtain, or models must be learned from incomplete plan traces, e.g., robots learning human models from observations and interactions. Furthermore, we discuss using capability models for single agent planning, and then extend it to multi-agent planning (with each agent modeled separately by a capability model), in which the capability models of agents are used by a centralized planner. The limitation, however, is that the synthesized "plans" (called c-plans) are incomplete, i.e., there may or may not be a complete plan for a c-plan. This is, however, unavoidable for planning using partial and incomplete models (e.g., considering planning using action models learned from partial and noisy plan traces).

1. INTRODUCTION

One important challenge for a set of agents to achieve more efficient collaboration is for these agents to maintain proper models of others. These models can be used by a centralized planner (e.g., on a robot) or via a distributed planning process to perform task planning and allocation, or by the agents themselves to reduce communication and collaboration efforts. In many applications, these models of other agents are not provided and hence must be learned. As a result, these models are going to be inherently partial and incomplete. Thus far, most traditional agent models are based on action modeling (e.g., [18, 7]). These models are not designed with partial information in mind and hence are complete in nature. In this paper, we introduce a new and inherently incomplete modeling approach based on the representation of capabilities. We represent a capability as the ability to achieve a partial state given another partial state. A capability can be fulfilled (or realized) by any action sequence (or plan) that can implement a transition between the two partial states. Each such action sequence is called an operation in this paper. A capability model can encode all possible capabilities for an agent in a given domain, as well as capture the probabilities of the existence of an operation to fulfill these capabilities (to implement the associated transitions). These probabilities determine which capabilities are more likely to be used in planning.

Compared to traditional agent models which are complete in nature, capability models have their unique benefits and limitations. In this aspect, capability models should not be considered as a competitor to complete models. Instead, they are useful when complete models are difficult to obtain or only partial and incomplete information can be retrieved to learn the models. This is often true when humans must be modeled.

The representation of a capability model is a generalization of a two time slice dynamic Bayesian network (2-TBN). While a 2-TBN is often used to represent a single action (e.g., [19]), a capability model can encode all possible capabilities for an agent in a given domain. In a capability model, each node in the first time slice (also called a fact node) represents a variable that is used in the specification of the initial world state. For each fact node, there is also a corresponding node in the second time slice, which represents the fact node in the eventual state (i.e., after applying a capability). These corresponding nodes are called eventual nodes or e-nodes. The state specified by the fact nodes is referred to as the initial state, and the state specified by the e-nodes is referred to as the eventual state. The edges between the nodes within the initial and eventual states, respectively, encode the correlations between the variables at the same time instance (i.e., variables in synchrony). The edges from the fact nodes to e-nodes encode causal relationships. Both types of edges can be learned (e.g., using learning techniques in [24, 25] for causal relationships). In the case that no prior information is provided, the safest approach is to not assume any independence. Consequently, we can specify a total ordering between the nodes within the initial and final states, and connect a node at a given level to all nodes at lower levels, as well as every fact node to every e-node. After model construction, parameter learning of a capability model can be performed efficiently online via Bayesian learning.

Figure 1: Capability model (as a generalized 2-TBN) for a human delivery agent (denoted as AG) in our motivating example with 6 fact node and e-node pairs. This model can encode all possible capabilities of the agent in this domain. Each fact node corresponds to a variable in the specification of the initial state. e-nodes are labeled with a dot on top of the variable. Edges from the fact nodes to the corresponding e-nodes are simplified for a cleaner representation. Any partial initial state coupled with any partial eventual state may imply a capability.

One of the unique advantages of capability models is that learning for both model structure and parameters is robust to high degrees of incompleteness in plan execution traces. This incompleteness occurs commonly. For example, the execution observer may not be constantly monitoring the executing agent, and the executing agent may not always report the full traces [26]. Capability models can be learned even when only the initial and final states are partially observed in plan traces for training. While high degrees of incompleteness in plan traces present learning challenges for traditional models, capability models can still learn to extract useful information out of them, e.g., probabilities of the existence of an operation to achieve certain eventual states given certain initial states.

Furthermore, compared to traditional models for planning, capability models only suggest transitions between partial states (as capabilities), along with the probabilities that specify how likely such transitions can be implemented. Hence, a “plan” synthesized with capability models is incomplete, and we refer to such a plan as a c-plan (i.e., involving the application of capabilities). More specifically, we show that “planning” with capability models occurs in the belief space, and is performed via Bayesian inference. At any planning step, the current planning state is a belief state. Applying a capability to this belief state requires the planner to update this state using Bayesian inference on the capability model. After the discussion of single agent planning, we then extend to multi-agent planning [1]. In such cases, we can consider planning as assigning subtasks to agents without understanding precisely how these subtasks are to be handled. Moreover, multi-agent planning can be performed when some of the agents are modeled by capability models, and the other agents are modeled by complete models. This situation can naturally occur in human-robot teaming scenarios, in which human models need to be learned and the robot models are given. Similarly, the synthesized multi-agent c-plans are incomplete (unless all plan steps use robot actions). c-plans are useful when we have to plan using partial and incomplete models (e.g., considering planning using action models learned from partial and noisy plan traces), since they inform the user how likely complete plans can be realized by following their “guidelines”.

The rest of the paper is organized as follows. First, we provide a motivating example in Section 2. In Section 3, we discuss capability models in detail. We discuss how to use capability models in planning in Sections 4 and 5. The relationships between capability models and other existing approaches to modeling the dynamics of agents are discussed as related work in Section 6. We conclude afterwards.

2. MOTIVATING EXAMPLE

We start with a motivating example for capability models. In this example, we have a set of human delivery agents, and the tasks involve delivering packages to their destinations. However, there are a few complications here. An agent may be able to carry a package if this agent is strong enough. While an agent can use a trolley to load a package, this agent may not remember to bring a trolley initially. An agent can visit an equipment rental office to rent a trolley, if this agent remembers to bring money for the rental. A trolley increases the chance of an agent being able to deliver a package (i.e., how likely the package is deliverable by this agent). Figure 1 presents the capability model for a human delivery agent, which is a generalized 2-TBN that can encode all possible capabilities for the agent in this domain. Similar to a 2-TBN, fact nodes (e-nodes) in a capability model are in synchrony in the initial (eventual) state. Note that any partial initial state coupled with any partial eventual state may imply a capability.

In this example, the correlations between the variables are clear. For example, whether an agent can carry a package is dependent on whether the agent is strong; whether an agent can deliver a package may be influenced by whether the agent has a trolley. These correlations are represented as identical sets of edges between the variables within the initial and eventual states, respectively. Assuming that no prior information is provided for the causal relationships, we connect every fact node with every e-node (denoted as a single edge in Figure 1).

After obtaining the model structure, to perform parameter learning for the capability model of an agent, this agent can be given delivery tasks spanning a period of time for training purposes. However, the only observations that a manager has access to may be the number of packages that have been successfully delivered by this agent during this period. While this presents significant challenges for learning the parameters of traditional complete models, the useful information for the manager is already encoded: the probability that this agent can deliver a package. Capability models can learn this information from the incomplete traces.

Now, suppose that the manager also has a set of robots (with action models) to use that can fetch items for the delivery agents. Since it is observed that the presence of a trolley increases the probability of successful delivery based on the capability models, the manager can make a multi-agent c-plan, which ensures that the robots deliver a trolley at least to some of the delivery agents. This also illustrates that it is beneficial to combine capability models with other complete models (i.e., action models) when they are available (e.g., for robots in this example).

3. CAPABILITY MODEL

In our settings, the environment includes a set of agents [math]\displaystyle{ \Phi }[/math] working inside it. For each agent, for simplicity, we assume that the state of the world and this agent is specified by a set of boolean variables [math]\displaystyle{ X_\phi (\phi \in \Phi) }[/math]. This implies that an agent can only interact with other agents through variables that pertain to the world state.

More specifically, for all [math]\displaystyle{ X_i \in X_\phi }[/math], [math]\displaystyle{ X_i }[/math] has domain [math]\displaystyle{ D(X_i) = \{true, false\} }[/math]. To specify partial states, we augment the domains of variables to include an unknown or unspecified value as in [1]. We write [math]\displaystyle{ D^+ (X_i) = D(X_i) \cup \{u\} }[/math]. Hence, the (partial) state space is denoted as [math]\displaystyle{ S = D^+(X_1) \times D^+(X_2) \times \cdots \times D^+(X_N) }[/math], in which [math]\displaystyle{ N = |\cup_\phi X_\phi| }[/math].
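As a minimal illustration of this augmented state space, the sketch below (ours, not from the paper; the variable names and the u sentinel are illustrative) enumerates partial states over the augmented domains:

<syntaxhighlight lang="python">
from itertools import product

U = "u"  # the unknown/unspecified value that augments each boolean domain

def augmented_domain():
    """D+(X_i) = {true, false, u} for every boolean state variable."""
    return [True, False, U]

def partial_states(variables):
    """Enumerate the (partial) state space S = D+(X_1) x ... x D+(X_N)."""
    for values in product(augmented_domain(), repeat=len(variables)):
        yield dict(zip(variables, values))

# Hypothetical example with two boolean variables from the delivery domain.
variables = ["has_trolley(AG)", "delivered(AG)"]
print(sum(1 for _ in partial_states(variables)))  # 3^2 = 9 partial states
</syntaxhighlight>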

3.1 Capability

First, we formally define capability for an agent [math]\displaystyle{ \phi \in \Phi }[/math]. The state space of agent [math]\displaystyle{ \phi }[/math] is denoted as [math]\displaystyle{ S_\phi }[/math].

DEFINITION 1 (CAPABILITY). Given an agent [math]\displaystyle{ \phi }[/math], a capability, specified as a mapping [math]\displaystyle{ S_\phi \times S_\phi \rightarrow [0, 1] }[/math], is an assertion about the probability of the existence of complete plans (for [math]\displaystyle{ \phi }[/math]) that connect an initial state (i.e., the first component on the left hand side of the mapping) to an eventual state (i.e., the second component).

A capability is also denoted as [math]\displaystyle{ S_I \Rightarrow S_E }[/math] (i.e., the initial state [math]\displaystyle{ \Rightarrow }[/math] the eventual state) when we do not need to reference the associated probability value. The probability value is denoted as [math]\displaystyle{ P(S_I \Rightarrow S_E) }[/math], which we will show later is computed based on the capability model via Bayesian inference. There are a few notes about Definition 1. First, both [math]\displaystyle{ S_I }[/math] and [math]\displaystyle{ S_E }[/math] can be partial states: the variables that have the value [math]\displaystyle{ u }[/math] are assumed to be unknown (in the initial state) and unspecified (in the eventual state). In this paper, we refer to a complete plan that fulfills (or realizes) a capability as an operation, which is going to be decided and implemented by the executing agent during execution. Although capabilities seem to be similar to high-level actions in HTN, the semantics is different from angelic uncertainty in HTN [12]. While the existence of a concretization is ensured in HTN planning via reduction schemas, it is only known with some level of certainty that a concretization (i.e., operation) exists for a capability in a capability model, and the capability model does not provide specifics for such a concretization. More discussion on this is provided in Section 6 when we discuss the relationships between capability models and existing approaches to modeling agent dynamics.

Capability models also address the qualification and ramification problems, which are assumed away in STRIPS planning (and planning with many complete models). More specifically, an operation for a capability [math]\displaystyle{ S_I \Rightarrow S_E }[/math] may depend on variables with unknown values in [math]\displaystyle{ S_I }[/math], and may update variables with unspecified values in [math]\displaystyle{ S_E }[/math]. This is a unique characterization of capability and is critical for learning capability models with incomplete observations.

For example, a capability may specify that given coffee beans, an agent can make a cup of coffee. Thus, we have a capability {Has coffee beans = true} ⇒ {Has coffee = true}. An operation for this capability to make a coffee may be dependent on the availability of water initially, which is not specified in the capability. Similarly, this operation may negate the fact that the kitchen is clean, which is not specified in the capability either. Note that a capability may be fulfilled by operations with different specifications of the initial and eventual states, as long as these specifications all satisfy the specification of this capability.
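To make the notion concrete, here is a small sketch (ours, not the authors' implementation) of how a capability could be represented as a pair of partial states, using the coffee example; variables omitted from either partial state implicitly take the value u:

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """A capability S_I => S_E as a pair of partial states.

    Only the variables that are explicitly specified appear in each component;
    all other variables are unknown (initial state) or unspecified (eventual
    state). The probability P(S_I => S_E) is not stored here: it is computed
    by Bayesian inference on the capability model (Section 3.2).
    """
    initial: tuple   # sorted (variable, value) pairs of the partial initial state
    eventual: tuple  # sorted (variable, value) pairs of the partial eventual state

def make_capability(initial, eventual):
    return Capability(tuple(sorted(initial.items())), tuple(sorted(eventual.items())))

# Given coffee beans, the agent can make a cup of coffee. Water availability
# and kitchen cleanliness stay unspecified, even though an operation fulfilling
# this capability may depend on or change them.
coffee = make_capability({"has_coffee_beans": True}, {"has_coffee": True})
print(coffee)
</syntaxhighlight>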

A capability model of an agent is capable of encoding all capabilities of this agent given a domain; it is designed to encode the following probability distribution:

[math]\displaystyle{ T(X_\phi, \dot{X}_\phi)=\displaystyle\int_0^T P(X_\phi, \dot{X}_\phi, t)dt \quad\quad(1) }[/math]

in which [math]\displaystyle{ T }[/math] represents the maximum length of any operation (i.e., number of actions) for [math]\displaystyle{ \phi }[/math] [2], [math]\displaystyle{ X_\phi }[/math] represents the initial state and [math]\displaystyle{ \dot{X}_\phi }[/math] represents the eventual state. Furthermore, [math]\displaystyle{ P(X_\phi, \dot{X}_\phi, t) }[/math] is the probability of any operation resulting in [math]\displaystyle{ \dot{X}_\phi }[/math] in exactly time [math]\displaystyle{ t }[/math] given [math]\displaystyle{ X_\phi }[/math]. Hence, the probability that is associated with a capability [math]\displaystyle{ S_I \Rightarrow S_E }[/math] (i.e., [math]\displaystyle{ P(S_I \Rightarrow S_E) }[/math]) encodes:

[math]\displaystyle{ P(\dot{X}_\phi =S_E | X_\phi = S_I) = \displaystyle\int_t P(\dot{X}_\phi = S_E,t | X_\phi = S_I)dt \quad\quad(2) }[/math]

We construct the capability model of an agent as a Bayesian network. As an inherently incomplete model, it not only allows the initial and eventual states to be partially specified for any capability (and hence the fulfilling operations) that it encodes, but also allows the correlations between the variables within the initial and eventual states, as well as the causal relationships between them, to be partially specified (e.g., when learning from plan traces). However, there are certain implications in this (e.g., the modeling can lose some information), which we will discuss in Section 3.4. Along the line of the qualification and ramification problems, a capability model also allows certain variables to be excluded completely from the network (i.e., related variables that are not captured in [math]\displaystyle{ X_\phi }[/math]) due to incomplete knowledge. For example, whether an agent can drive a car (with a manual transmission) to a goal location is dependent on whether the agent can drive a manual car, even though the agent has a driver's license. In this case, the ability to drive a manual car may have been ignored when creating the model.

3.2 Model Construction

We construct the capability model of each agent as an augmented Bayesian network [14]. Any partial initial state coupled with any partial eventual state may imply a capability; the probability that a capability actually exists (i.e., it can be fulfilled by an operation) is computed via Bayesian inference in the network. We use an augmented Bayesian network since it allows prior beliefs of conditional relative frequencies to be specified before observations are made, as well as enables us to adjust how fast these beliefs should change.

DEFINITION 2. An augmented Bayesian network (ABN) [math]\displaystyle{ (G, F, \rho) }[/math] is a Bayesian network with the following specifications:

  • A [math]\displaystyle{ DAG \;G : (V, E) }[/math], where [math]\displaystyle{ V }[/math] is a set of random variables, [math]\displaystyle{ V = \{V_1, V_2, \cdots, V_n\} }[/math].
  • [math]\displaystyle{ F }[/math] is a set of auxiliary parent variables for [math]\displaystyle{ V }[/math].
  • [math]\displaystyle{ \forall V_i \in V }[/math], there is an auxiliary parent variable [math]\displaystyle{ F_i \in F }[/math] of [math]\displaystyle{ V_i }[/math] and a density function [math]\displaystyle{ \rho_i }[/math] associated with [math]\displaystyle{ F_i }[/math]. Each [math]\displaystyle{ F_i }[/math] is a root and it is only connected to [math]\displaystyle{ V_i }[/math].
  • [math]\displaystyle{ \forall V_i \in V }[/math], for all values [math]\displaystyle{ pa_i }[/math] of the parents [math]\displaystyle{ PA_i \subseteq V }[/math] of [math]\displaystyle{ V_i }[/math]; and for all values [math]\displaystyle{ f_i }[/math] of [math]\displaystyle{ F_i }[/math], a probability distribution [math]\displaystyle{ P(V_i|pa_i, f_i) }[/math].

A capability model of an agent [math]\displaystyle{ \phi }[/math] is then defined as follows:

DEFINITION 3 (CAPABILITY MODEL). A capability model of an agent [math]\displaystyle{ \phi }[/math], as a binomial [math]\displaystyle{ ABN (G_\phi, F, \rho) }[/math], has the following specifications:

  • [math]\displaystyle{ V_\phi= X_\phi \cup \dot{X}_\phi }[/math].
  • [math]\displaystyle{ \forall V_i \in V_\phi }[/math], the domain of [math]\displaystyle{ V_i }[/math] is [math]\displaystyle{ D(V_i) = \{true, false\} }[/math].
  • [math]\displaystyle{ \forall V_i \in V_\phi }[/math], [math]\displaystyle{ F_i = \{F_{i1}, F_{i2}, \cdots\} }[/math], and each [math]\displaystyle{ F_{ij} }[/math] is a root and has a density function [math]\displaystyle{ \rho_{ij}(f_{ij}) (0\leq f_{ij} \leq 1) }[/math]. (For each value [math]\displaystyle{ pa_{ij} }[/math] of the parents [math]\displaystyle{ PA_i }[/math], there is an associated variable [math]\displaystyle{ F_{ij} }[/math]).
  • [math]\displaystyle{ \forall V_i \in V_\phi, P(V_i =true|pa_{ij}, f_{i1},\cdots, f_{ij},\cdots) = f_{ij} }[/math].

in which [math]\displaystyle{ j }[/math] in [math]\displaystyle{ pa_{ij} }[/math] indexes into the values of [math]\displaystyle{ PA_i }[/math], and [math]\displaystyle{ j }[/math] in [math]\displaystyle{ f_{ij} }[/math] indexes into the variables in [math]\displaystyle{ F_i }[/math]. Note that defining partial states (i.e., allowing variables to assume the value [math]\displaystyle{ u }[/math] in a state) is used to more conveniently specify the distribution in Equation 1. Variables in the capability model do not need to expand their domains to include [math]\displaystyle{ u }[/math].

For edge construction, we can learn the correlations and causal relationships from plan traces. Note that the correlations between variables must not form loops; otherwise, they need to be broken randomly. When no training information is provided, we can specify a total ordering between the nodes within the initial and final states, and connect a node at a given level to all nodes at lower levels, as well as every fact node to every e-node. Denote the set of edges as [math]\displaystyle{ E_\phi }[/math]. We have then constructed the capability model [math]\displaystyle{ G_\phi = (V_\phi, E_\phi) }[/math] for [math]\displaystyle{ \phi }[/math]. Figure 1 provides a simple example of a capability model.
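The default edge construction described above can be sketched as follows (our illustration; the node names and the "_dot" suffix for e-nodes are ours): a total ordering within each time slice, edges from each node to all lower-ordered nodes in the same slice, and an edge from every fact node to every e-node.

<syntaxhighlight lang="python">
def build_default_structure(fact_vars):
    """Edges of a capability model when no prior structure information is given.

    fact_vars: list of variable names; the list order serves as the assumed
    total ordering within each time slice. e-nodes are named with a "_dot"
    suffix purely for illustration.
    """
    e_vars = [v + "_dot" for v in fact_vars]
    edges = []
    # Within each slice, connect every node to all nodes lower in the ordering.
    for slice_vars in (fact_vars, e_vars):
        for i, parent in enumerate(slice_vars):
            for child in slice_vars[i + 1:]:
                edges.append((parent, child))
    # Causal edges: every fact node to every e-node.
    for f in fact_vars:
        for e in e_vars:
            edges.append((f, e))
    return e_vars, edges

e_vars, edges = build_default_structure(["strong(AG)", "has_trolley(AG)", "delivered(AG)"])
print(len(edges))  # 3 + 3 intra-slice edges plus 9 causal edges = 15
</syntaxhighlight>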

3.3 Parameter Learning

In this section, we describe how the model parameters can be learned. The parameter learning of a capability model is performed online through Bayesian learning. The initial model parameters can be computed from existing plan traces by learning the density functions (i.e., [math]\displaystyle{ f_{ij} }[/math]) in Definition 3. These parameters can then be updated online as more traces are collected (i.e., as the agent interacts with the environment).

Plan execution traces can be collected each time a plan (or a sequence of actions) is executed, whether it succeeds or fails.

DEFINITION 4 (COMPLETE PLAN TRACE). A complete plan trace is a continuous sequence of state observations over time, denoted as [math]\displaystyle{ \mathcal{T} = \langle s^*_1, s^*_2, \cdots, s^*_L \rangle }[/math], in which [math]\displaystyle{ L }[/math] is the length of the plan and [math]\displaystyle{ s^*_i }[/math] denotes a complete state (i.e., [math]\displaystyle{ s^*_i \in D(X_1) \times D(X_2) \times \cdots \times D(X_N) }[/math]).

However, in real-world situations, plan traces may be incomplete. The incompleteness can come from two aspects. First, the observed state may be partial. Second, the observations may not be continuous. Hence, we are going to have partial plan traces.

DEFINITION 5 (PARTIAL PLAN TRACE). A partial plan trace is a discontinuous sequence of partial state observations over time, denoted as [math]\displaystyle{ \mathcal{T} =\langle s_i, s_{i+k_1}, s_{i+k_2}, \cdots \rangle }[/math], in which [math]\displaystyle{ i }[/math] denotes the time step in the complete plan and [math]\displaystyle{ s_i }[/math] denotes a partial state (i.e., [math]\displaystyle{ s_i \in D^+(X_1) \times D^+(X_2) \times \cdots \times D^+(X_N) }[/math]).

Note that the only assumption that is made in Definition 5 is that at least two different partial states must be observed during the plan execution. This means that even the start and end states of a plan execution do not necessarily have to be observed or partially observed, which is especially useful in real-world situations where a plan trace may only be a few samplings of (partial) observations during a plan execution. Note also that since the observations are in discontinuous time steps, the transition between contiguous state observations is not necessarily the result of a single action. Instead, it is the result of a plan (i.e., operation) execution. Henceforth, when we refer to plan traces, we always intend to mean partial plan traces, unless otherwise specified.

When more than two states are observed in a plan trace, it can be considered as a set of traces, with each pair of contiguous states as a separate trace. When the states are partially observed in a plan trace, it can be considered as a set of compatible traces with complete state observations [3]. For simplicity, we assume in the following that the plan execution traces used in learning have complete state observations. We denote this set of traces as [math]\displaystyle{ D }[/math].
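A small sketch of this preprocessing step (ours; the trace contents are made up) that turns a partial plan trace into pairs of contiguous observed states for training:

<syntaxhighlight lang="python">
def trace_to_training_pairs(partial_trace):
    """Split a partial plan trace <s_i, s_{i+k1}, ...> into training examples.

    Each pair of contiguous observed (partial) states is treated as one
    initial-state/eventual-state example; the transition between the two may
    correspond to a whole operation rather than a single action.
    """
    return [(partial_trace[i], partial_trace[i + 1])
            for i in range(len(partial_trace) - 1)]

# Hypothetical trace with three partial observations (unmentioned variables
# take the value u).
trace = [
    {"OnTable(A)": True, "On(A,B)": False},
    {"On(A,B)": True},
    {"On(B,A)": False, "OnTable(B)": True},
]
for initial, eventual in trace_to_training_pairs(trace):
    print(initial, "=>", eventual)
</syntaxhighlight>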

To learn the parameters of a capability model, a common way is to model [math]\displaystyle{ F_{ij} }[/math] using a beta distribution (i.e., as its density function [math]\displaystyle{ \rho }[/math]). Denote the parameters for the beta distribution of [math]\displaystyle{ F_{ij} }[/math] as [math]\displaystyle{ a_{ij} }[/math] and [math]\displaystyle{ b_{ij} }[/math]. Then, we have:

[math]\displaystyle{ P(X_i = true|pa_{ij}) = \dfrac{a_{ij}}{a_{ij} + b_{ij}} \quad\quad(3) }[/math]

Suppose that the initial values or the current values for [math]\displaystyle{ a_{ij} }[/math] and [math]\displaystyle{ b_{ij} }[/math] are given. The remaining task is to update [math]\displaystyle{ a_{ij} }[/math] and [math]\displaystyle{ b_{ij} }[/math] from the given traces. Given the training set [math]\displaystyle{ D }[/math], we can now follow Bayesian inference to update the parameters of [math]\displaystyle{ F_{ij} }[/math] as follows:

Initially or currently,

[math]\displaystyle{ \rho(f_{ij}) = beta (f_{ij}; a_{ij}, b_{ij}) \quad\quad (4) }[/math]

After observing new training examples [math]\displaystyle{ D }[/math], we have:

[math]\displaystyle{ \rho(f_{ij}|D) = beta (f_{ij}; a_{ij} + s_{ij}, b_{ij} + t_{ij}) \quad\quad (5) }[/math]

in which [math]\displaystyle{ s_{ij} }[/math] is the number of times for which [math]\displaystyle{ X_i }[/math] is true while [math]\displaystyle{ PA_i }[/math] assumes the value [math]\displaystyle{ pa_{ij} }[/math], and [math]\displaystyle{ t_{ij} }[/math] is the number of times for which [math]\displaystyle{ X_i }[/math] is false while [math]\displaystyle{ PA_i }[/math] assumes the value [math]\displaystyle{ pa_{ij} }[/math].
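The update in Equations (3)-(5) amounts to maintaining beta counts per node and parent configuration. A minimal sketch (ours, with a deliberately reduced parent set) follows:

<syntaxhighlight lang="python">
from collections import defaultdict

class BetaParameter:
    """Beta-distributed auxiliary variable F_ij for one parent configuration."""
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b              # beta(f; a, b), as in Equation (4)

    def probability_true(self):
        return self.a / (self.a + self.b)  # Equation (3)

    def update(self, s, t):
        """Equation (5): add counts of X_i = true (s) and X_i = false (t)."""
        self.a += s
        self.b += t

# One parameter per (node, parent-value configuration), created on demand.
params = defaultdict(BetaParameter)

def observe(node, parent_values, value):
    key = (node, tuple(sorted(parent_values.items())))
    if value:
        params[key].update(1, 0)
    else:
        params[key].update(0, 1)

# Observing the trace s1 -> s2 from Section 3.4 (placing A on B), with the
# parent set of the e-node for On(A,B) reduced to two fact nodes for brevity.
observe("On(A,B)_dot", {"OnTable(A)": True, "OnTable(B)": True}, True)
key = ("On(A,B)_dot", (("OnTable(A)", True), ("OnTable(B)", True)))
print(params[key].probability_true())   # (1 + 1) / (1 + 1 + 1 + 0) = 2/3
</syntaxhighlight>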

3.4 Implications

In this section, we discuss several implications of capability models with a simple example. We first investigate how information can be lost during learning when the correlations or causal relationships among the variables are only partially captured. This can occur, for example, when model structure is learned. In this example, we have two blocks [math]\displaystyle{ A }[/math], [math]\displaystyle{ B }[/math], and a table. Both blocks can be placed on the table or on each other. The capability model for an agent is specified in Figure 2. Initially, assuming that we do not have any knowledge of the domain, the density functions can be specified as beta distributions with [math]\displaystyle{ a = b }[/math]. In Figure 2, we use [math]\displaystyle{ a = b = 1 }[/math] for all distributions. Suppose that we observe the following plan trace, which can be the result of executing a single action that places [math]\displaystyle{ A }[/math] on [math]\displaystyle{ B }[/math]:

[math]\displaystyle{ s_1 : OnTable(A) \wedge OnTable(B) \wedge \neg On(A, B) \wedge \neg On(B,A) }[/math]

[math]\displaystyle{ s_2 : \neg OnTable(A) \wedge OnTable(B) \wedge On(A, B) \wedge \neg On(B,A) }[/math]

Based on the learning rules in Equation (5), we can update the beta distributions accordingly. For example, the beta distribution for [math]\displaystyle{ \dot{X}_3 }[/math] (i.e., [math]\displaystyle{ On(A, B) }[/math] in the eventual state) is updated to beta([math]\displaystyle{ X_1 = true,\; X_4 = true,\; \dot{X}_4 =false, \cdots;2, 1 }[/math]). This means that if both [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math] are on the table, it becomes more likely that a capability exists for making [math]\displaystyle{ On(A, B ) = true }[/math]. This is understandable since an action that places [math]\displaystyle{ A }[/math] on [math]\displaystyle{ B }[/math] would achieve [math]\displaystyle{ On (A, B ) = true }[/math] in such cases. For actions with uncertainties (i.e., when using a robotic arm to perform the placing action), this beta distribution would converge to the success rate as more experiences are obtained.

Meanwhile, we also have that the beta distribution for [math]\displaystyle{ \dot{X}_2 }[/math] is updated to beta([math]\displaystyle{ X_1 = true, X_3 = false, \dot{X}_4 = false, \dot{X}_3 = true, ...; 1, 2 }[/math]). This means that if [math]\displaystyle{ A }[/math] is on [math]\displaystyle{ B }[/math] in the eventual state, it becomes less likely that [math]\displaystyle{ B }[/math] can also be on [math]\displaystyle{ A }[/math] in the eventual state. This is intuitive since we know that achieving [math]\displaystyle{ On(A, B) = true }[/math] and [math]\displaystyle{ On(B, A) = true }[/math] at the same time is impossible. In this way, capability models can reinforce the correlations between the variables as experiences are observed. The implication is that the edges (capturing the correlations) between these variables must be present; otherwise, information may be lost as described above. If the correlations are not fully captured by the model, for example, when the light blue arrows in Figure 2 are not present, the beta distribution of [math]\displaystyle{ \dot{X}_2 }[/math] would not be updated as above, since [math]\displaystyle{ \dot{X}_3 }[/math] would no longer be a parent node of [math]\displaystyle{ \dot{X}_2 }[/math]. A similar implication also applies to causal relationships.

Figure 2: Capability model for an agent in a simple domain with four variables. [math]\displaystyle{ OnTable(A) }[/math] means that object [math]\displaystyle{ A }[/math] is on the table. [math]\displaystyle{ On (A, B ) }[/math] means that object [math]\displaystyle{ A }[/math] is on [math]\displaystyle{ B }[/math]. The correlations that correspond to the light blue arrows distinguish between two scenarios: one with proper model structure and one without (i.e., when certain correlations are missing). For clarity, we only specify the density functions of the variables in the initial state. We also show the two sets of density functions for the augmented variables for [math]\displaystyle{ X_2 }[/math] for the two different scenarios, respectively.

When the environment changes, previous knowledge that is learned by the model may no longer be applicable. However, as the parameters grow, the learned model can be reluctant to adapt to the new environment. This issue can be alleviated by performing normalizations occasionally (i.e., dividing all [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] by a constant in the beta distributions), or by weighting previous experiences with a discounting factor.
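Both remedies can be expressed as simple operations on the beta counts; the sketch below is our illustration of the idea, not the authors' procedure:

<syntaxhighlight lang="python">
def normalize(a, b, factor=2.0):
    """Occasionally shrink the beta counts so that new experiences dominate;
    the mean a / (a + b) is unchanged, but the distribution becomes less peaked."""
    return a / factor, b / factor

def discounted_update(a, b, s, t, gamma=0.9):
    """Alternative: weight all previous counts by a discount factor gamma
    before adding the new counts s (true observations) and t (false ones)."""
    return gamma * a + s, gamma * b + t

print(normalize(20.0, 10.0))                # (10.0, 5.0)
print(discounted_update(20.0, 10.0, 1, 0))  # (19.0, 9.0): recent data weighs more
</syntaxhighlight>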

4. USING CAPABILITY MODELS IN PLANNING

Capability models described in the previous section allow us to capture the capabilities of agents. In this section, we discuss the application of capability models in single agent planning (i.e., with an agent [math]\displaystyle{ \phi }[/math] modeled by a capability model). We extend the discussion to multi-agent planning in the next section. Generally, planning with capability models occurs in the belief space, as with POMDPs [10].

4.1 Single Agent Planning

First, note that applying a capability [math]\displaystyle{ s_I \Rightarrow s_E }[/math] of agent [math]\displaystyle{ \phi }[/math] to a complete state [math]\displaystyle{ s^* }[/math] results in a belief state [math]\displaystyle{ b(\mathcal{S}) }[/math] as long as [math]\displaystyle{ s_I \sqsubseteq s^* }[/math] (otherwise, this capability cannot be applied), in which [math]\displaystyle{ s_I \sqsubseteq s^* }[/math] denotes that all the variables that are not assigned to [math]\displaystyle{ u }[/math] in [math]\displaystyle{ s_I }[/math] have the same values as those in [math]\displaystyle{ s^* }[/math]. After applying the capability, assuming it succeeds, we can compute the probability weight of a state [math]\displaystyle{ s }[/math] that satisfies [math]\displaystyle{ s_E \sqsubseteq s }[/math] in the resulting belief state as follows:

[math]\displaystyle{ P(s) = \dfrac{P(s^* \Rightarrow s)}{P(s^*\Rightarrow s_E)} = \dfrac{P(\dot{X}_\phi = s| X_\phi = s^*)}{P(\dot{X}_{\phi}= s_E| X_{\phi}=s^*)}\quad\quad (6) }[/math]

For any state [math]\displaystyle{ s }[/math] that does not satisfy [math]\displaystyle{ s_E \sqsubseteq s }[/math], we have [math]\displaystyle{ P(s) = 0 }[/math]. Denote [math]\displaystyle{ \mathcal{S} }[/math] in [math]\displaystyle{ b(\mathcal{S}) }[/math] as [math]\displaystyle{ \mathcal{S} = \{ s|s_E \sqsubseteq s }[/math] and [math]\displaystyle{ s }[/math] is a complete state[math]\displaystyle{ \} }[/math]. Clearly, we have:

[math]\displaystyle{ \displaystyle \sum_{s \in S} P(s) = 1 \quad \quad (7) }[/math]

Since there can be an exponential number of complete states in a belief state, depending on how many variables are assigned to [math]\displaystyle{ u }[/math], we can use a sampling approach (e.g., Monte Carlo sampling) to keep a set of complete states to represent [math]\displaystyle{ b(\mathcal{S}) }[/math]. We denote the belief state after sampling as [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math].

When applying a capability [math]\displaystyle{ s_I \Rightarrow s_E }[/math] to a given belief state [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math], for each complete state in [math]\displaystyle{ \mathcal{S} }[/math], we can perform sampling based on Equation (6), which returns a set of complete states with weights after applying the capability. We can then perform resampling on the computed sets of states for all states in [math]\displaystyle{ \mathcal{S} }[/math] to compute the new belief state [math]\displaystyle{ \hat{b}(\mathcal{S}') }[/math]. In this way, we can connect different capabilities of an agent to create c-plans, which are plans in which capabilities are involved. Next, we formally define a planning problem for a single agent with a capability model.
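Before the formal definition, the following sketch (ours) shows the shape of this belief update; the actual probability in Equation (6) requires inference on the capability model, so drawing an eventual state is stubbed out as a hypothetical sample_outcome function:

<syntaxhighlight lang="python">
import random

U = "u"

def satisfies(partial, complete):
    """partial ⊑ complete: every non-u value in partial agrees with complete."""
    return all(v == U or complete.get(k) == v for k, v in partial.items())

def apply_capability(belief, s_I, s_E, sample_outcome, n_samples=100):
    """Apply the capability s_I => s_E to a sampled belief state.

    belief: list of complete states (dicts), all carrying equal weight here.
    sample_outcome(s_star): stand-in for drawing a complete eventual state s
    with s_E ⊑ s, with probability given by Equation (6); a full implementation
    would use Bayesian inference on the capability model. One outcome is drawn
    per particle for brevity.
    """
    outcomes = []
    for s_star in belief:
        if satisfies(s_I, s_star):
            outcomes.append(sample_outcome(s_star))
        else:
            outcomes.append(s_star)   # capability not applicable; state unchanged
    # Resample to keep a fixed-size set of complete states representing b(S).
    return random.choices(outcomes, k=n_samples)

# Hypothetical usage with a trivial stub that simply imposes s_E on the state.
s_I, s_E = {"OnTable(A)": True}, {"On(A,B)": True}

def stub(state):
    return {**state, **s_E}

belief = [{"OnTable(A)": True, "OnTable(B)": True, "On(A,B)": False, "On(B,A)": False}]
print(apply_capability(belief, s_I, s_E, stub, n_samples=2))
</syntaxhighlight>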

DEFINITION 6 (SINGLE AGENT PLANNING PROBLEM). A single agent planning problem with capability models is a tuple [math]\displaystyle{ \langle b(\mathcal{I}), G, \rho \rangle }[/math], in which [math]\displaystyle{ b(\mathcal{I}) }[/math] is the initial belief state, [math]\displaystyle{ G }[/math] is the set of goal variables, and [math]\displaystyle{ \rho }[/math] is a real value in [math]\displaystyle{ [0, 1] }[/math]. The capability model of the agent is [math]\displaystyle{ G_\phi = (V_\phi, E_\phi) }[/math]. When we write [math]\displaystyle{ \rho^* }[/math] instead of [math]\displaystyle{ \rho }[/math] in the problem, it indicates a variant of the problem that needs to maximize [math]\displaystyle{ \rho }[/math].

DEFINITION 7 (SINGLE AGENT C-PLAN). A single agent c-plan for a problem [math]\displaystyle{ \langle b(\mathcal{I}), G, \rho \rangle }[/math] is a sequence of applications of capabilities, such that the sum of the weights of the complete states in the belief state which include the goal variables (i.e., [math]\displaystyle{ G \sqsubseteq s }[/math]), is no less than [math]\displaystyle{ \rho }[/math] (or maximized for [math]\displaystyle{ \rho^* }[/math]) after the application.

The synthesized single agent “plan” (i.e., a c-plan) is incomplete: it does not specify which operation fulfills each capability. In fact, it only informs the user how likely there exists such an operation. A single agent c-plan can be considered to provide likely landmarks for the agent to follow. For example, in Figure 2, suppose that the initial state is

[math]\displaystyle{ s_I :On(A, B) \wedge \neg On(B, A) }[/math]

and the goal is

[math]\displaystyle{ s_E :\neg On(A, B) \wedge On(B,A) }[/math]

A c-plan created with a capability model may include an intermediate state in the form of [math]\displaystyle{ s_I \Rightarrow s_{in} \Rightarrow s_E }[/math],[4] in which [math]\displaystyle{ s_{in} }[/math] can include, e.g., [math]\displaystyle{ OnTable(A) = true }[/math] and [math]\displaystyle{ OnTable(B) = true }[/math], such that the probability of the existence of a sequence of operations to fulfill the c-plan may be increased. Another example is our motivating example (Figure 1) in which having [math]\displaystyle{ has\_trolley(AG) = true }[/math] as an intermediate state helps delivery when [math]\displaystyle{ has\_trolley(AG) }[/math] is false at the beginning.

Although a single agent c-plan may seem to be less useful than a complete plan synthesized with action models, it is unavoidable when we have to plan with incomplete knowledge (e.g., incomplete action models). This is arguably more common in multi-agent systems, in which the planner needs to reason about likely plans for agents even when it does not have complete information about these agents. A specific application in such cases is when we need to perform task allocation, in which it is useful for the system to specify subtasks or state requirements for agents that are likely to achieve them without understanding how the agents achieve them.

4.2 Planning Heuristic

Given a single agent planning problem [math]\displaystyle{ \langle \phi, b(\mathcal{I}), G, \rho \rangle }[/math], besides achieving the goal variables, planning should also aim to reduce the cost of the c-plan (i.e., to maximize the probability of success of the sequence of applications of capabilities in the c-plan).

ASSUMPTION: To create a heuristic, we make the following assumption: capabilities do not have variables with false values in [math]\displaystyle{ s_I }[/math]. With this assumption, the following monotonicity properties hold:

[math]\displaystyle{ P(s_I \Rightarrow s_E) \geq P(s'_I \Rightarrow s_E) \quad (T(s'_I) \subseteq T(s_I) \wedge F(s_I) \subseteq F(s'_I))\quad\quad (8) }[/math]

in which [math]\displaystyle{ T }[/math], [math]\displaystyle{ F }[/math] are used as operators on a set of variables to denote the set of variables with true and false values, respectively. Equation (8) implies that it is always easier to achieve the desired state with more true-value variables and fewer false-value variables in the initial state; and

[math]\displaystyle{ P(s_I \Rightarrow s_E) \geq P(s_I \Rightarrow s'_E) \quad (T(s_E) \subseteq T(s'_E) \wedge F(s_E) \subseteq F(s'_E))\quad\quad (9) }[/math]

which implies that it is always more difficult to achieve the specified values for more variables in the eventual state.

HEURISTIC: We use [math]\displaystyle{ A^* }[/math] to perform the planning. At any time, the current planning state is a belief state [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math] (i.e., [math]\displaystyle{ b(\mathcal{S}) }[/math] after sampling); there is also a sequence of applications of capabilities (i.e., a c-plan prefix), denoted as [math]\displaystyle{ \pi }[/math], to reach this belief state from the initial state. We compute the heuristic value for [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math] (i.e., [math]\displaystyle{ f(\hat{b}(\mathcal{S})) = g(\hat{b}(\mathcal{S})) + h(\hat{b}(\mathcal{S})) }[/math]) as follows. First, we compute [math]\displaystyle{ g(\hat{b}(\mathcal{S})) }[/math] as the sum of the negative logarithms of the associated probabilities of capabilities in [math]\displaystyle{ \pi }[/math].

To compute [math]\displaystyle{ h(\hat{b}(\mathcal{S})) }[/math], we need to first compute [math]\displaystyle{ h(s) }[/math] for each complete state [math]\displaystyle{ s \in \mathcal{S} }[/math]. To compute an admissible [math]\displaystyle{ h(s) }[/math] value, we denote the set of goal variables that are currently false in [math]\displaystyle{ s }[/math] as [math]\displaystyle{ G_s }[/math]. Then, we compute [math]\displaystyle{ h(s) }[/math] as follows:

[math]\displaystyle{ h(s) = \max_{v \in G_s} \; - \log P(s_{\neg v} \Rightarrow s \{v = true\}) \quad\quad(10) }[/math]

in which [math]\displaystyle{ s_{\neg v} }[/math] denotes a complete state with only [math]\displaystyle{ v }[/math] as false, and [math]\displaystyle{ s\{v = true\} }[/math] denotes the state of [math]\displaystyle{ s }[/math] after making [math]\displaystyle{ v }[/math] true. Finally, we compute [math]\displaystyle{ h(\hat{b}(\mathcal{S})) }[/math] as:

[math]\displaystyle{ h(\hat{b}(\mathcal{S}))= \displaystyle \sum_{s\in \mathcal{S}} P(s) h(s)\quad\quad (11) }[/math]
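A compact sketch (ours) of how g and h might be computed; the probability [math]\displaystyle{ P(s_{\neg v} \Rightarrow s \{v = true\}) }[/math] is supplied by a stand-in function, since it comes from inference on the capability model:

<syntaxhighlight lang="python">
import math

def g_value(capability_probs):
    """g: sum of negative log probabilities of the capabilities applied so far."""
    return sum(-math.log(p) for p in capability_probs)

def h_state(s, goal_vars, capability_prob):
    """Equation (10): cost lower bound for one complete state s.

    capability_prob(v): stand-in for P(s_{not v} => s{v = true}), the probability
    of achieving goal variable v from the most favourable initial state.
    """
    unmet = [v for v in goal_vars if not s.get(v, False)]
    if not unmet:
        return 0.0
    return max(-math.log(capability_prob(v)) for v in unmet)

def h_belief(belief, goal_vars, capability_prob):
    """Equation (11): probability-weighted sum of h over the belief state."""
    return sum(w * h_state(s, goal_vars, capability_prob) for s, w in belief)

# Hypothetical example with one goal variable and made-up probabilities.
probs = {"On(B2,B3)": 0.4}
belief = [({"On(B2,B3)": False}, 0.7), ({"On(B2,B3)": True}, 0.3)]
print(h_belief(belief, ["On(B2,B3)"], probs.get))  # 0.7 * (-ln 0.4) ≈ 0.64
</syntaxhighlight>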

LEMMA 1. The heuristic given in Eq. (11) is admissible for finding a c-plan that maximizes [math]\displaystyle{ \rho }[/math] (i.e., with [math]\displaystyle{ \rho^* }[/math]), given that [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math] accurately represents [math]\displaystyle{ b(\mathcal{S}) }[/math].

PROOF. We need to prove that [math]\displaystyle{ h(\hat{b}(\mathcal{S})) }[/math] is not an over—estimate of the cost for [math]\displaystyle{ \mathcal{S} }[/math] to reach the goal state [math]\displaystyle{ G }[/math] while maximizing [math]\displaystyle{ \rho }[/math]. First, note that we can always increase [math]\displaystyle{ \rho }[/math] by trying to move a non-goal state (in the current planning state) to a goal state (i.e., [math]\displaystyle{ G \sqsubseteq s }[/math]). Furthermore, for each [math]\displaystyle{ s \in \mathcal{S} }[/math], given the monotonicity properties, we know that at least [math]\displaystyle{ h(s) }[/math] cost must be incurred to satisfy [math]\displaystyle{ G }[/math]. Hence, the conclusion holds.

Figure 3: Illustration of solving a single agent planning problem with a capability model. The topology of the capability model used is shown in the top part of the figure.

Lemma 1 implies that the [math]\displaystyle{ A^* }[/math] search using the heuristic in Equation (11) would continue to improve [math]\displaystyle{ \rho }[/math] given sufficient time. Hence, the heuristic should be used as an anytime heuristic and stop when a desired value of [math]\displaystyle{ \rho }[/math] is obtained or the increment is below a threshold. Also, approximate solutions should be considered in future work to scale to large networks.

4.3 Evaluation

We provide a preliminary evaluation of single agent planning with a capability model. We build this evaluation based on the blocksworld domain. First, we use 20 problem instances with 3 blocks[5] and generate a complete plan for each instance. Then, we randomly remove 1-5 actions from each complete plan to simulate partial plan traces.

The capability model for this domain contains 12 variables and we ignore the holding and handempty predicates to simulate partial state observations. We manually construct the correlations between the variables in the initial and eventual states. For example, [math]\displaystyle{ On (A, B ) }[/math] is connected with [math]\displaystyle{ On(B, A) }[/math] since they are clearly conflicting. For causal relationships, we connect every node in the initial state to every node in the eventual state.

After learning the parameters of this capability model based on the partial plan traces, we apply the capability model to solve a problem as shown in Figure 3. The topology of the capability model constructed is shown in the top part of the figure, which is also the model used in Figure 4. Since we have connected every fact node with every e-node in this case, the model appears to be quite complex. Initially, we have [math]\displaystyle{ On(B3, B2) }[/math], [math]\displaystyle{ On(B2, B1) }[/math], and [math]\displaystyle{ OnTable(B1) }[/math] (a complete state), and the goal is to achieve [math]\displaystyle{ On(B2, B3) }[/math]. We can see in Figure 3 that [math]\displaystyle{ \mathcal{I} }[/math] in [math]\displaystyle{ b(\mathcal{I}) }[/math] contains a single complete state. The c-plan involves the application of two different capabilities. For illustration purposes, we only show two possible states (in the belief state) after applying the first capability.

For one of the two states, we then show two possible states after applying the second capability. Note that each arrow represents a sequence of actions in the original action model. Also, the possible states must be compatible with the specifications of the eventual states in the capabilities. We see some interesting capabilities being learned. For example, the first capability is to pull out a block that is between two other blocks, and place it somewhere else.

5. MULTI-AGENT PLANNING

In this section, we extend our discussion from single agent planning to multi-agent planning. In particular, we discuss how capability models can be used along with other complete models (i.e., action models) by a centralized planner to synthesize c-plans. The settings are similar to those in our motivating example. In particular, we refer to multi-agent planning with mixed models as MAP-MM. This formulation is useful in applications in which both human and robotic agents are involved. While robotic agents are programmed and hence have complete models, the models of human agents must be learned. Hence, we use capability models to model human agents and assume that the models for the human agents are already learned (e.g., by the robots) for planning.

For robotic agents, we assume a STRIPS action model [math]\displaystyle{ (\mathcal{R}, \mathcal{O}) }[/math], in which [math]\displaystyle{ \mathcal{R} }[/math] is a set of predicates with typed variables and [math]\displaystyle{ \mathcal{O} }[/math] is a set of STRIPS operators. Each operator [math]\displaystyle{ o \in \mathcal{O} }[/math] is associated with a set of preconditions [math]\displaystyle{ Pre(o) \subseteq \mathcal{R} }[/math], add effects [math]\displaystyle{ Add(o) \subseteq \mathcal{R} }[/math] and delete effects [math]\displaystyle{ Del(o) \subseteq \mathcal{R} }[/math].

DEFINITION 8. Given a set of robots [math]\displaystyle{ R = \{r\} }[/math], a set of human agents [math]\displaystyle{ \Phi = \{\phi\} }[/math], and a set of typed objects [math]\displaystyle{ O }[/math], a multi-agent planning problem with mixed models is given by a tuple [math]\displaystyle{ \Pi = \langle \Phi, R, b(\mathcal{I}), G, \rho \rangle }[/math], where:

  • Each [math]\displaystyle{ r \in R }[/math] is associated with a set of actions [math]\displaystyle{ A(r) }[/math] that are instantiated from [math]\displaystyle{ \mathcal{O} }[/math] and [math]\displaystyle{ O }[/math], which [math]\displaystyle{ r \in R }[/math] can perform; each action may not always succeed when executed and hence is associated with a cost.
  • Each [math]\displaystyle{ \phi \in \Phi }[/math] is associated with a capability model [math]\displaystyle{ G_\phi = (V_\phi, E_\phi) }[/math], in which [math]\displaystyle{ V_\phi = X_\phi \cup \dot{X}_\phi }[/math] and [math]\displaystyle{ X_\phi \subseteq X }[/math], in which [math]\displaystyle{ X_\phi }[/math] represents the state variables of the world and agent [math]\displaystyle{ \phi }[/math] and [math]\displaystyle{ X }[/math] represents the joint set of state variables of all agents.

Note that the grounded predicates (which are variables) for robots that are instantiated from [math]\displaystyle{ \mathcal{R} }[/math] and [math]\displaystyle{ O }[/math] also belong to [math]\displaystyle{ X }[/math] . Human and robotic agents interact through the shared variables.

LEMMA 2. The MAP-MM problem is at least PSPACE-hard.

PROOF. We only need to prove the result for one of the extreme cases: when there are only robotic agents. The problem then essentially becomes a multi-agent planning problem, which is more general than the classical planning problem, which is known to be PSPACE-complete.

5.1 State Expansion for MAP-MM

First, we make the same assumption as we made in single agent planning, such that the monotonicity properties still hold. Given the current planning state [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math] (i.e., sampled from [math]\displaystyle{ b(\mathcal{S}) }[/math]), for each [math]\displaystyle{ s \in \mathcal{S} }[/math], a centralized planner can expand it in the next step using the following two options in MAP-MM:

  • Choose a robot [math]\displaystyle{ r }[/math] and an action [math]\displaystyle{ a \in A(r) }[/math] such that at least one of the complete states [math]\displaystyle{ s }[/math] in [math]\displaystyle{ \mathcal{S} }[/math] satisfies [math]\displaystyle{ T(Pre(a)) \subseteq T(s) }[/math].
  • Choose a capability on human agent [math]\displaystyle{ \phi }[/math] with the specification [math]\displaystyle{ s_I \Rightarrow s_E }[/math], such that at least one of the states [math]\displaystyle{ s }[/math] in [math]\displaystyle{ \mathcal{S} }[/math] satisfies [math]\displaystyle{ T(s_I) \subseteq T(s) }[/math].

If we have chosen an action [math]\displaystyle{ a }[/math], for any [math]\displaystyle{ s \in \mathcal{S} }[/math] that satisfies [math]\displaystyle{ T(Pre(a)) \subseteq T(s) }[/math], the complete state [math]\displaystyle{ s }[/math] after applying [math]\displaystyle{ a }[/math] becomes [math]\displaystyle{ s' }[/math], such that [math]\displaystyle{ T(s') = (T(s) \cup Add(a)) \setminus Del(a) }[/math], and [math]\displaystyle{ F(s') = (F(s) \cup Del(a)) \setminus Add(a) }[/math]. The probability weight of [math]\displaystyle{ s' }[/math] in the new belief state after applying [math]\displaystyle{ a }[/math] does not change. For states in [math]\displaystyle{ \mathcal{S} }[/math] that do not satisfy [math]\displaystyle{ T(Pre(a)) \subseteq T(s) }[/math], we assume that the application of [math]\displaystyle{ a }[/math] does not change anything. In this way, we can construct the new belief state after applying this action.
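A minimal sketch (ours) of this action-application step, representing each complete state by its set of true variables:

<syntaxhighlight lang="python">
def apply_robot_action(belief, pre, add, delete):
    """Apply a STRIPS action to every compatible state in a sampled belief state.

    belief: list of complete states, each represented as the set of variables
    that are true (everything else is false). Weights are unchanged by the
    action, so they are omitted here. States that do not satisfy the
    preconditions are left as-is.
    """
    new_belief = []
    for true_vars in belief:
        if pre <= true_vars:                        # T(Pre(a)) ⊆ T(s)
            new_belief.append((true_vars | add) - delete)
        else:
            new_belief.append(true_vars)
    return new_belief

# Hypothetical action: the robot fetches a trolley for the delivery agent.
belief = [{"robot_at_rental"}, {"robot_at_office"}]
print(apply_robot_action(belief,
                         pre={"robot_at_rental"},
                         add={"has_trolley(AG)"},
                         delete=set()))
</syntaxhighlight>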

If we have chosen a capability in the form of [math]\displaystyle{ s_I \Rightarrow s_E }[/math] from a human agent [math]\displaystyle{ \phi }[/math], for any [math]\displaystyle{ s \in \mathcal{S} }[/math] that satisfies [math]\displaystyle{ T(s_I) \subseteq T(s) }[/math], we can follow the discussion in Section 4.1 to compute the new belief state. Similarly, for states in [math]\displaystyle{ \mathcal{S} }[/math] that do not satisfy [math]\displaystyle{ T(s_I) \subseteq T(s) }[/math], we assume that the application of this capability does not change anything. With the new belief state, we can continue the state expansion process to expand the c-plan further.

5.2 Planning Heuristic for MAP-MM

In this section, we discuss a planning heuristic that informs us which state should be chosen to expand at any time. We can adapt the heuristic in Equation (11) to address MAP-MM. Given the current planning state [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math], we need to compute [math]\displaystyle{ h(\hat{b}(\mathcal{S})) }[/math]. For each [math]\displaystyle{ s \in \mathcal{S} }[/math], there are three cases:

1) If only capabilities are used afterwards, [math]\displaystyle{ h(s) }[/math] can be computed as in Equation (11), except that all capability models must be considered.

2) If only actions are going to be used, [math]\displaystyle{ h(s) }[/math] can be computed based on the relaxed plan heuristic (i.e., ignoring all deleting effects), while considering all robot actions.

3) If both capabilities and actions can be used, [math]\displaystyle{ h(s) }[/math] can be computed as the minimum cost of an action that achieves any variable in [math]\displaystyle{ G_s }[/math].

The final [math]\displaystyle{ h(s) }[/math] is chosen as the smallest value among the three cases, as sketched below, and [math]\displaystyle{ h(\hat{b}(\mathcal{S})) }[/math] can subsequently be computed.
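A sketch (ours) of how the three per-state estimates could be combined; the capability-based and relaxed-plan estimates are left as stand-in functions:

<syntaxhighlight lang="python">
def h_mixed(s, goal_vars, h_capabilities, h_relaxed_plan, min_action_cost):
    """Combine the three MAP-MM cases and keep the smallest estimate.

    h_capabilities(s, unmet): stand-in for case 1 (capability models only).
    h_relaxed_plan(s, unmet): stand-in for case 2 (robot actions only, relaxed
    plan heuristic). Case 3 uses the minimum cost of a single action achieving
    any unmet goal variable.
    """
    unmet = [v for v in goal_vars if not s.get(v, False)]
    if not unmet:
        return 0.0
    case3 = min(min_action_cost(v) for v in unmet)
    return min(h_capabilities(s, unmet), h_relaxed_plan(s, unmet), case3)
</syntaxhighlight>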

COROLLARY 1. The heuristic above is admissible for finding a c-plan for MAP-MM that maximizes [math]\displaystyle{ \rho }[/math], given that [math]\displaystyle{ \hat{b}(\mathcal{S}) }[/math] accurately represents [math]\displaystyle{ b(\mathcal{S}) }[/math].

5.3 Evaluation

In this section, we describe a simple application of MAP-MM involving a human and a robot, in which the capability model of the human is assumed to be learned by the robot. The robot then makes a multi-agent c-plan for both the human and itself.

The setup of this evaluation is identical to that in the evaluation of single agent planning, as we discussed in the previous section.

In this example, we associate the robot actions with a constant cost. After learning the parameters of the human capability model based on the generated partial plan traces, we apply the capability model to solve a problem as shown in Figure 4. Initially, we have [math]\displaystyle{ On(B1, B3) }[/math], [math]\displaystyle{ OnTable(B3) }[/math], and [math]\displaystyle{ OnTable(B2) }[/math] (a complete state), and the goal is to achieve [math]\displaystyle{ On(B3, B2) }[/math]. The multi-agent c-plan involves the application of two robot actions and one capability of the human. We show three possible complete states (in the belief state) which satisfy the goal variable after applying the capability.

Figure 4: Illustration of solving a MAP-MM problem with a robotic and a human agent, in which the robotic agent has an action model and the human agent has a capability model that is assumed to be learned by the robotic agent.

6. RELATED WORK

Most existing approaches for representing the dynamics of agents assume that the models are completely specified. This holds whether the underlying models are based on STRIPS actions (e.g., PDDL [7]) or stochastic action models (such as RDDL [19]). This assumption of complete knowledge is also the default in the existing multi-agent planning systems [3].

Capability models, in contrast, start with the default of incomplete models. They are thus related to the work on planning with incompletely specified actions (cf. [15, 16, 11]). An important difference is that while this line of work models only incompleteness in the precondition/effect descriptions of the individual actions, capabilities are incomplete in that they completely abstract over actual plans that realize them. In this sense, a capability has some similarities to non-primitive tasks in HTN planning [6, 27]. For example, an abstract HTN plan with a single non-primitive task only posits that there exists some concrete realization of the non-primitive task which will achieve the goal supported by the non-primitive task. However, in practice, all HTN planners use “complete models” in that they provide all the reduction schemas to take the non-primitive task to its concretization. So, the “uncertainty” here is “angelic” [12]: the planner can resolve it by the choice of reduction schemas. In contrast, capability models do not have to (and cannot) provide any specifics about the exact plan with which a capability will be realized.

Capability models also have connections to macro operators [2], as well as options, their MDP counterparts [21, 20, 9], and the BDI models [17]. Capability models are useful when plans must be made with partial knowledge. In terms of these complete models, this means that not all of the actions, macro-actions, options, or BDI capabilities needed to achieve the goal are provided. None of the HTN, SMDP, or BDI models can handle the question of what it means to plan when faced with such model incompleteness. Capability models, in contrast, propose approximate plans (referred to as c-plans) as a useful solution concept in informing the user of how likely there is an actual complete plan.

On the other hand, due to the inherent incompleteness of capability models, they are lossy in the following sense. It is possible to compile a complete model to a capability model (e.g., converting actions in RDDL [19] to a capability model), but new capabilities may also be introduced along with the actions. As a result, the synthesized plans would still be incomplete unless the use of these new capabilities is forcibly restricted. The learning of capability models has connections to learning probabilistic relational models using Bayesian networks [8]. The notion of eventual state captured in capability models is similar to that captured by the F operator (i.e., eventually) in LTL and CTL [5, 23]. Although there are other works that discuss capability models, e.g., [4], they are still based on action modeling.

7. CONCLUSIONS AND FUTURE WORK

In this paper, we have introduced a new representation to model agents based on capabilities. The associated model, called a capability model, is an inherently incomplete model that has several unique advantages compared to traditional complete models (i.e., action models). The underlying structure of a capability model is a generalized 2-TBN, which can encode all the capabilities of an agent. The associated probabilities computed (i.e., via Bayesian inference on the capability model) based on the specifications of capabilities (i.e., a partial initial state coupled with a partial eventual state) determine how likely the capabilities can be fulfilled by an operation (i.e., a complete plan). This information can be used to synthesize incomplete “plans” (referred to as c-plans). One of the unique advantages of capability models is that learning for both model structure and parameters is robust to high degrees of incompleteness in plan execution traces (e.g., with only start and end states). Furthermore, we show that parameter learning for capability models can be performed efficiently online via Bayesian learning.

Additionally, we provide the details of using capability models in planning. Compared to traditional models for planning, in a planning state, capability models only suggest transitions between partial states (i.e., specified by the capabilities), along with the probabilities that specify how likely such transitions can be fulfilled by an operation. The limitation is that the synthesized c-plans are incomplete. However, we realize that this is unavoidable for planning with incomplete knowledge (i.e., incomplete models). In such cases, the synthesized c-plans can inform the user how likely complete plans exist when following the “guidelines” of the c-plans. In general, a c-plan with a higher probability of success should imply that a complete plan is more likely to exist. We discuss using capability models for single agent planning first, and then extend it to multi-agent planning (with each agent modeled separately by a capability model), in which the capability models of agents are used by a centralized planner. We also discuss how capability models can be mixed with complete models.

In future work, we plan to further investigate the relationships between capability models and traditional models. We also plan to explore applications of capability models in our ongoing work on human-robot teaming [22, 13]. For example, we plan to investigate how to enable robots to learn capability models of humans and plan to coordinate with the consideration of these models.

Acknowledgments

This research is supported in part by the ARO grant W911NF-13-1-0023, and the ONR grants N00014-13-1-0176, N00014-13-1-0519 and N00014-15-1-2027.

Footnotes

  1. In this paper, we consider sequential multi-agent plans; synchronous or joint actions of agents are not considered.
  2. We assume in this paper that time is discretized.
  3. There is no need to expand such traces into sets of traces with complete state observations for learning, since it can be equivalently considered using arithmetic operations.
  4. Note that [math]\displaystyle{ P(s_I \Rightarrow s_E) }[/math] is not equivalent to [math]\displaystyle{ P(s_I \Rightarrow s_{in}) \cdot P(s_{in} \Rightarrow s_E) }[/math].
  5. It does not have to be three blocks but we assume only three blocks to simplify the implementation.

References


Yu Zhang, Sarath Sreedharan, and Subbarao Kambhampati (2015). "Capability Models and Their Applications in Planning."