2003 ModelingAnnotatedData

From GM-RKB

Subject Headings: Corr-LDA.

Notes

Cited By

Quotes

Author Keywords

probabilistic graphical models, empirical Bayes, variational methods, automatic image annotation, image retrieval

Abstract

We consider the problem of modeling annotated data --- data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type. We conduct experiments on the Corel database of images and captions, assessing performance in terms of held-out likelihood, automatic annotation, and text-based image retrieval.

1. INTRODUCTION

Traditional methods of information retrieval are organized around the representation and processing of a document in a (high-dimensional) word-space. Modern multimedia documents, however, are not merely collections of words, but can be collections of related text, images, audio, and cross-references. When working with a corpus of such documents, there is much to be gained from representations which can explicitly model associations among the different types of data.

In this paper, we consider probabilistic models for documents that consist of pairs of data streams. Our focus is on problems in which one data type can be viewed as an annotation of the other data type. Examples of such data include images and their captions, papers and their bibliographies, and genes and their functions. In addition to the traditional goals of retrieval, clustering, and classification, annotated data lends itself to tasks such as automatic data annotation and retrieval of unannotated data from annotation-type queries.

A number of recent papers have considered generative probabilistic models for such multi-type or relational data [2, 6, 4, 13]. These papers have generally focused on models that jointly cluster the different data types, basing the clustering on latent variable representations that capture low-dimensional probabilistic relationships among interacting sets of variables.

In many annotation problems, however, the overall goal appears to be that of finding a conditional relationship between types, and in such cases improved performance may be found in methods with a more discriminative flavor. In particular, the task of annotating an unannotated image can be viewed formally as a classification problem: for each word in the vocabulary we must make a yes/no decision. Standard discriminative classification methods, however, generally make little attempt to uncover the probabilistic structure of either the input domain or the output domain. This seems ill-advised in the image/word setting: surely there are relationships among the words labeling an image, and these relationships reflect corresponding relationships among the regions in that image. Moreover, it seems likely that capturing these relationships would be helpful in annotating new images. With these issues in mind, we approach the annotation problem within a framework that exploits the best of both the generative and the discriminative traditions.

In this paper, we build a set of increasingly sophisticated models for a database of annotated images, culminating in correspondence latent Dirichlet allocation (Corr-LDA), a model that finds conditional relationships between latent variable representations of sets of image regions and sets of words. We show that, in this class of models, only Corr-LDA succeeds in providing both an excellent fit of the joint data and an effective conditional model of the caption given an image. We demonstrate its use in automatic image annotation, automatic region annotation, and text-based image retrieval.
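The key idea in Corr-LDA is that each caption word is generated conditionally on one of the image's regions: the image first draws per-region latent factors, and each word then picks a region and draws from that region's factor. The generative process can be sketched as follows; this is a minimal NumPy illustration, where the sizes (K, D, V), the Gaussian means, and the identity covariance are assumptions chosen for the sketch, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (assumptions, not the paper's values)
K, D, V = 5, 2, 8                     # latent factors, region-feature dim, vocab size
alpha = np.ones(K)                    # symmetric Dirichlet prior on mixing proportions
mu = rng.normal(size=(K, D))          # per-factor Gaussian means for region features
beta = rng.dirichlet(np.ones(V), K)   # per-factor multinomials over caption words

def sample_image(n_regions, n_words):
    """Sample one (regions, words) pair from the Corr-LDA generative process."""
    theta = rng.dirichlet(alpha)                       # image-level mixing proportions
    z = rng.choice(K, size=n_regions, p=theta)         # one latent factor per region
    regions = mu[z] + rng.normal(size=(n_regions, D))  # region features ~ N(mu_z, I)
    # Each caption word selects a region uniformly, then draws a word
    # from the factor associated with that region -- the "correspondence" step.
    y = rng.integers(0, n_regions, size=n_words)
    words = [int(rng.choice(V, p=beta[z[yi]])) for yi in y]
    return regions, words

regions, words = sample_image(n_regions=4, n_words=3)
```

Because every word is tied to a specific region's factor, the model yields a conditional distribution over captions given an image, which is what the annotation and retrieval tasks below exploit.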



References

* David M. Blei, and Michael I. Jordan. (2003). "Modeling Annotated Data." In: Proceedings of SIGIR 2003. doi:10.1145/860435.860460