2010 LargeScaleImageAnnotationLearni


Subject Headings: WARP Loss.

Notes

Cited By

Quotes

Keywords: Large scale; Image annotation; Learning to rank.

Abstract

Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method both outperforms several baseline methods and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the newly introduced "sibling" precision metric, where our method also obtains excellent results.
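
The abstract's core idea, scoring images and annotations in a shared low-dimensional space, can be sketched as follows. This is a minimal illustration, not the authors' code: the linear image map V, the per-annotation embedding matrix W, and all dimensions are hypothetical stand-ins for the parameters the paper learns from data.

```python
import numpy as np

# Hypothetical sizes; the paper learns V and W, here they are random.
rng = np.random.default_rng(0)
n_features = 1000   # raw image feature dimension
n_labels = 100      # number of possible annotations
embed_dim = 100     # shared low-dimensional embedding space

V = rng.normal(scale=0.01, size=(embed_dim, n_features))  # image map
W = rng.normal(scale=0.01, size=(n_labels, embed_dim))    # one row per annotation

def rank_annotations(x):
    """Embed image features x, then rank all annotations by dot-product score."""
    phi_x = V @ x               # image -> embedding space
    scores = W @ phi_x          # joint score for every annotation
    return np.argsort(-scores)  # highest-scoring annotations first

x = rng.normal(size=n_features)
print(rank_annotations(x)[:5])  # indices of the top-5 predicted annotations
```

Ranking reduces to one matrix-vector product in the low-dimensional space, which is why both annotation time and memory scale well.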

1 Introduction

The emergence of the web as a tool for sharing information has caused a massive increase in the size of potential datasets available for machines to learn from. Millions of images on web pages have tens of thousands of possible annotations in the form of HTML tags (which can be conveniently collected by querying search engines, Torralba et al. 2008a), tags such as in www.flickr.com, or human-curated labels such as in www.image-net.org (Deng et al. 2009). We therefore need machine learning algorithms for image annotation that can scale to learn from and annotate such data. This includes: (i) scalable training and testing times, and (ii) scalable memory usage. In the ideal case we would like a fast algorithm that fits on a laptop, at least at annotation time. For many recently proposed models tested on small datasets, e.g. Makadia et al. (2008), Guillaumin et al. (2009), it is unclear if they satisfy these constraints.

In this work we study feasible methods for just such a goal. We consider models that learn to represent images and annotations jointly in a low-dimensional embedding space. Such embeddings are fast at testing time because the low dimension implies fast computations for ranking annotations. Simultaneously, the low dimension also implies small memory usage. To obtain good performance for such a model, we propose to train its parameters by learning to rank, optimizing for the top annotations in the list, e.g. optimizing precision at k. Unfortunately, such measures can be costly to train. To make training time efficient we propose the WARP loss (Weighted Approximate-Rank Pairwise loss). The WARP loss is related to the recently proposed Ordered Weighted Pairwise Classification (OWPC) loss (Usunier et al. 2009), which has been shown to be state-of-the-art on (small) text retrieval tasks. WARP uses stochastic gradient descent and a novel sampling trick to approximate ranks, resulting in an efficient online optimization strategy which we show is superior to standard stochastic gradient descent applied to the same loss, enabling us to train on datasets that do not even fit in memory. Moreover, WARP can be applied to our embedding models (in fact, to arbitrary differentiable models), whereas the OWPC loss, which relies on SVM_struct, cannot.
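
The sampling trick can be sketched as follows: for a training image and its true annotation, draw random negative annotations until one violates a margin; the number of draws needed gives an estimate of the true annotation's rank, which sets the weight of the gradient update. This is a hedged sketch assuming the linear embedding model above and a hinge loss; function and variable names (warp_step, est_rank, etc.) are illustrative, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def warp_step(V, W, x, y_pos, lr=0.1, margin=1.0):
    """One WARP-style SGD step on a linear joint embedding (sketch)."""
    n_labels = W.shape[0]
    phi_x = V @ x                         # image -> embedding space
    s_pos = W[y_pos] @ phi_x              # score of the true annotation
    n_draws = 0
    while n_draws < n_labels - 1:
        y_neg = int(rng.integers(n_labels))
        if y_neg == y_pos:
            continue                      # only negatives count as draws
        n_draws += 1
        if margin + W[y_neg] @ phi_x > s_pos:
            # Needing n_draws draws to find a violator suggests roughly
            # (n_labels - 1) / n_draws labels outrank the true annotation.
            est_rank = (n_labels - 1) // n_draws
            # Rank weight L(k) = sum_{j=1}^{k} 1/j puts emphasis on errors
            # at the top of the ranked list (optimizing precision at k).
            weight = sum(1.0 / j for j in range(1, est_rank + 1))
            grad_dir = W[y_pos] - W[y_neg]          # taken before updating W
            W[y_pos] += lr * weight * phi_x         # pull the positive up
            W[y_neg] -= lr * weight * phi_x         # push the violator down
            V += lr * weight * np.outer(grad_dir, x)
            return True                   # an update was made
    return False                          # no violating negative found

# Illustrative usage with random data (hypothetical sizes).
V = rng.normal(scale=0.01, size=(50, 200))    # embedding dim 50, 200 features
W = rng.normal(scale=0.01, size=(1000, 50))   # 1000 possible annotations
x = rng.normal(size=200)
warp_step(V, W, x, y_pos=3)
```

Early in training a violator is found after few draws (large estimated rank, large update); as the model improves, more draws are needed but the resulting updates carry small weights, which is what keeps the approximation efficient in practice.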



References

(Weston et al., 2010) ⇒ Jason Weston, Samy Bengio, and Nicolas Usunier. (2010). "Large Scale Image Annotation: Learning to Rank with Joint Word-image Embeddings." In: Machine Learning Journal. doi:10.1007/s10994-010-5198-3