Average Precision Measure


An Average Precision Measure is a ranking performance measure that computes the average value of precision [math]\displaystyle{ p(r) }[/math], taken as a function of recall [math]\displaystyle{ r }[/math], over the interval from [math]\displaystyle{ r=0 }[/math] to [math]\displaystyle{ r=1 }[/math]: [math]\displaystyle{ \int_0^1 p(r)\,dr }[/math].

  • AKA: [math]\displaystyle{ \operatorname{AveP}(D) }[/math].
  • Context:
    • It can be defined as [math]\displaystyle{ \sum_{k=1}^n P(k) \Delta r(k) }[/math], where [math]\displaystyle{ k }[/math] is the rank in the retrieved list, [math]\displaystyle{ n }[/math] is the number of retrieved items, [math]\displaystyle{ P(k) }[/math] is the precision at cut-off [math]\displaystyle{ k }[/math] in the list, and [math]\displaystyle{ \Delta r(k) }[/math] is the change in recall from item [math]\displaystyle{ k-1 }[/math] to item [math]\displaystyle{ k }[/math] (see the code sketch after this list).
  • Counter-Example(s):
  • See: IR Task, Precision Measure, Ranking Task.
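
A minimal Python sketch of the finite-sum definition above, assuming a binary relevance list ordered by rank; the function name average_precision_delta and the optional num_relevant argument are illustrative, not from the source:

  def average_precision_delta(relevances, num_relevant=None):
      """AveP as the sum over k of P(k) * delta_r(k) for a ranked binary relevance list."""
      if num_relevant is None:
          num_relevant = sum(relevances)        # total relevant items in the collection
      if num_relevant == 0:
          return 0.0
      hits = 0
      prev_recall = 0.0
      avep = 0.0
      for k, rel in enumerate(relevances, start=1):
          hits += rel
          precision_at_k = hits / k                              # P(k)
          recall_at_k = hits / num_relevant                      # r(k)
          avep += precision_at_k * (recall_at_k - prev_recall)   # P(k) * delta_r(k)
          prev_recall = recall_at_k
      return avep

  # Example: relevant documents at ranks 1, 3, and 5 of 5 retrieved
  print(average_precision_delta([1, 0, 1, 0, 1]))   # (1/1 + 2/3 + 3/5) / 3 ≈ 0.756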


References

2013

  • http://en.wikipedia.org/wiki/Information_retrieval#Average_precision
    • Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision [math]\displaystyle{ p(r) }[/math] as a function of recall [math]\displaystyle{ r }[/math]. Average precision computes the average value of [math]\displaystyle{ p(r) }[/math] over the interval from [math]\displaystyle{ r=0 }[/math] to [math]\displaystyle{ r=1 }[/math]:[1] [math]\displaystyle{ \operatorname{AveP} = \int_0^1 p(r)\,dr }[/math]. This integral is in practice replaced with a finite sum over every position in the ranked sequence of documents: [math]\displaystyle{ \operatorname{AveP} = \sum_{k=1}^n P(k) \Delta r(k) }[/math] where [math]\displaystyle{ k }[/math] is the rank in the sequence of retrieved documents, [math]\displaystyle{ n }[/math] is the number of retrieved documents, [math]\displaystyle{ P(k) }[/math] is the precision at cut-off [math]\displaystyle{ k }[/math] in the list, and [math]\displaystyle{ \Delta r(k) }[/math] is the change in recall from items [math]\displaystyle{ k-1 }[/math] to [math]\displaystyle{ k }[/math].

      This finite sum is equivalent to: [math]\displaystyle{ \operatorname{AveP} = \frac{\sum_{k=1}^n (P(k) \times \operatorname{rel}(k))}{\mbox{number of relevant documents}} }[/math] where [math]\displaystyle{ \operatorname{rel}(k) }[/math] is an indicator function equaling 1 if the item at rank [math]\displaystyle{ k }[/math] is a relevant document, zero otherwise.[2] Note that the average is over all relevant documents and the relevant documents not retrieved get a precision score of zero.
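
      A companion sketch of this rel(k) form, assuming the same kind of binary relevance list; dividing by the total number of relevant documents means relevant documents that are never retrieved contribute zero (the function name is illustrative):

        def average_precision_rel(relevances, num_relevant=None):
            """AveP = sum over k of P(k) * rel(k), divided by the number of relevant documents."""
            if num_relevant is None:
                num_relevant = sum(relevances)
            if num_relevant == 0:
                return 0.0
            hits = 0
            score = 0.0
            for k, rel in enumerate(relevances, start=1):
                hits += rel
                if rel:                    # rel(k) = 1 only at ranks holding a relevant document
                    score += hits / k      # precision at cut-off k
            return score / num_relevant

        # Agrees with the delta-recall sketch above on the same ranked list
        print(average_precision_rel([1, 0, 1, 0, 1]))                  # ≈ 0.756
        print(average_precision_rel([1, 0, 1, 0, 1], num_relevant=5))  # ≈ 0.453; two relevant documents never retrieved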

      Some authors choose to interpolate the [math]\displaystyle{ p(r) }[/math] function to reduce the impact of "wiggles" in the curve.[3][4] For example, the PASCAL Visual Object Classes challenge (a benchmark for computer vision object detection) computes average precision by averaging the precision over a set of evenly spaced recall levels {0, 0.1, 0.2, …, 1.0}: [math]\displaystyle{ \operatorname{AveP} = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1.0\}} p_\operatorname{interp}(r) }[/math] where [math]\displaystyle{ p_\operatorname{interp}(r) }[/math] is an interpolated precision that takes the maximum precision over all recalls greater than or equal to [math]\displaystyle{ r }[/math]: [math]\displaystyle{ p_\operatorname{interp}(r) = \operatorname{max}_{\tilde{r}:\tilde{r} \geq r} p(\tilde{r}) }[/math].
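
      A rough sketch of this 11-point interpolated average, assuming precision and recall have already been computed at every cut-off of the ranked list; the function name and example values are illustrative:

        def interpolated_average_precision(precisions, recalls):
            """Mean of p_interp(r) at r = 0.0, 0.1, ..., 1.0, where p_interp(r) is the
            maximum precision observed at any recall greater than or equal to r."""
            levels = [i / 10 for i in range(11)]
            total = 0.0
            for r in levels:
                candidates = [p for p, rr in zip(precisions, recalls) if rr >= r]
                total += max(candidates, default=0.0)   # zero if no recall reaches this level
            return total / len(levels)

        # Precision/recall at each cut-off of the ranked list [1, 0, 1, 0, 1]
        precisions = [1.0, 0.5, 2/3, 0.5, 0.6]
        recalls = [1/3, 1/3, 2/3, 2/3, 1.0]
        print(interpolated_average_precision(precisions, recalls))   # ≈ 0.764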

      An alternative is to derive an analytical [math]\displaystyle{ p(r) }[/math] function by assuming a particular parametric distribution for the underlying decision values. For example, a binormal precision-recall curve can be obtained by assuming decision values in both classes to follow a Gaussian distribution.[5]
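
      A rough sketch of a binormal precision-recall curve under that Gaussian assumption, using SciPy's normal survival function; the class means, standard deviations, and prevalence below are made-up illustration values, not taken from the cited paper:

        from scipy.stats import norm

        def binormal_pr_curve(mu_pos, sd_pos, mu_neg, sd_neg, prevalence, thresholds):
            """Precision-recall pairs when decision values are Gaussian within each class
            and a document is retrieved if its decision value exceeds the threshold."""
            points = []
            for t in thresholds:
                recall = norm.sf(t, loc=mu_pos, scale=sd_pos)     # P(score >= t | relevant)
                fall_out = norm.sf(t, loc=mu_neg, scale=sd_neg)   # P(score >= t | non-relevant)
                tp = prevalence * recall
                fp = (1 - prevalence) * fall_out
                precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
                points.append((recall, precision))
            return points

        # Illustrative parameters: relevant scores ~ N(1, 1), non-relevant ~ N(0, 1), 10% relevant
        for r, p in binormal_pr_curve(1.0, 1.0, 0.0, 1.0, 0.1, thresholds=[0.0, 1.0, 2.0]):
            print(f"recall={r:.2f}  precision={p:.2f}")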

      Average precision is also sometimes referred to geometrically as the area under the precision-recall curve.[citation needed]

  1. Zhu, Mu (2004). "Recall, Precision and Average Precision". http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf. 
  2. Turpin, Andrew; Scholer, Falk (2006). "User performance versus precision measures for simple search tasks". Proceedings of the 29th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Seattle, WA, August 06-11, 2006) (New York, NY: ACM): 11–18. doi:10.1145/1148170.1148176. ISBN 1-59593-369-7. 
  3. Everingham, Mark; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew (June 2010). "The PASCAL Visual Object Classes (VOC) Challenge". International Journal of Computer Vision (Springer) 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. http://pascallin.ecs.soton.ac.uk/challenges/VOC/pubs/everingham10.pdf. Retrieved 2011-08-29. 
  4. Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich (2008). Introduction to Information Retrieval. Cambridge University Press. http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html. 
  5. Brodersen, K. H.; Ong, C. S.; Stephan, K. E.; Buhmann, J. M. (2010). "The Binormal Assumption on Precision-Recall Curves". Proceedings of the 20th International Conference on Pattern Recognition: 4263–4266.