2003 ComparingTopKLists

Jump to navigation Jump to search

Subject Headings: Ranking Function, Ranking Algorithm, TF-IDF Ranking Function.


Cited By



Motivated by several applications, we introduce various distance measures between "top k lists." Some of these distance measures are metrics, while others are not. For each of these latter distance measures: we show that it is "almost" a metric in the following two seemingly unrelated aspects:

  • step-(i) it satisfies a relaxed version of the polygonal (hence, triangle) inequality, and
  • step-(ii) there is a metric with positive constant multiples that bounds our measure above and below.

This is not a coincidence---we show that these two notions of almost being a metric are the same. Based on the second notion, we define two distance measures to be equivalent if they are bounded above and below by constant multiples of each other. We thereby identify a large and robust equivalence class of distance measures.


  • Thomas Andreae, Hans-Jurgen Bandelt, Performance Guarantees for Approximation Algorithms Depending on Parametrized Triangle Inequalities, SIAM Journal on Discrete Mathematics, v.8 n.1, p.1-16, Feb. 1995 doi:10.1137/S0895480192240226
  • Michael A. Bender, Chandra Chekuri, Performance guarantees for the TSP with a parameterized triangle inequality, Information Processing Letters, v.73 n.1-2, p.17-21, Jan 2000 doi:10.1016/S0020-0190(99)00160-X
  • David Carmel, Doron Cohen, Ronald Fagin, Eitan Farchi, Michael Herscovici, Yoelle S. Maarek, Aya Soffer, Static index pruning for information retrieval systems, Proceedings of the 24th ACM SIGIR Conference retrieval, p.43-50, September 2001, New Orleans, Louisiana, United States doi:10.1145/383952.383958
  • Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams, Proceedings of the 29th International Colloquium on Automata, Languages and Programming, p.693-703, July 08-13, 2002
  • D. E. Critchlow. Metric Methods for Analyzing Partially Ranked Data. Number 34 in Lecture Notes in Statistics. Springer-Verlag, Berlin, 1980.
  • P. Diaconis. Group Representation in Probability and Statistics. Number 11 in IMS Lecture Series. Institute of Mathematical Statistics, 1988.
  • P. Diaconis and R. Graham. Spearman's footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B, 39(2):262--268, 1977.
  • Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th International Conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong doi:10.1145/371920.372165
  • Ronald Fagin, Ravi Kumar, D. Sivakumar, Comparing Top k Lists, SIAM Journal on Discrete Mathematics, v.17 n.1, p.134-160, 2003 doi:10.1137/S0895480102412856
  • Ronald Fagin, Larry Stockmeyer, Relaxing the Triangle Inequality in Pattern Matching, International Journal of Computer Vision, v.30 n.3, p.219-231, Dec. 1998 doi:10.1023/A:1008023416823
  • M. Kendall and J. D. Gibbons. Rank Correlation Methods. Edward Arnold, London, 1990.
  • Joon Ho Lee, Combining multiple evidence from different properties of weighting schemes, Proceedings of the 18th ACM SIGIR Conference retrieval, p.180-188, July 09-13, 1995, Seattle, Washington, United States doi:10.1145/215206.215358
  • Joon Ho Lee, Combining Multiple Evidence from Different Relevant Feedback Networks, Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA), p.421-430, April 01-04, 1997,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2003 ComparingTopKListsRonald Fagin
Ravi Kumar
D. Sivakumar
Comparing Top k ListsProceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithmhttp://www.almaden.ibm.com/cs/people/fagin/sidma03.pdf2003