2003 ComparingTopKLists

(Fagin et al., 2003b) ⇒ Ronald Fagin, Ravi Kumar, and D. Sivakumar. (2003). “Comparing Top k Lists.” In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003).

Subject Headings: Ranking Function, Ranking Algorithm, TF-IDF Ranking Function.

Notes

Cited By

Quotes

Abstract

Motivated by several applications, we introduce various distance measures between "top k lists.” Some of these distance measures are metrics, while others are not. For each of these latter distance measures: we show that it is "almost" a metric in the following two seemingly unrelated aspects:

step-(i) it satisfies a relaxed version of the polygonal (hence, triangle) inequality, and
step-(ii) there is a metric with positive constant multiples that bounds our measure above and below.

This is not a coincidence---we show that these two notions of almost being a metric are the same. Based on the second notion, we define two distance measures to be equivalent if they are bounded above and below by constant multiples of each other. We thereby identify a large and robust equivalence class of distance measures.

References

Thomas Andreae, Hans-Jurgen Bandelt, Performance Guarantees for Approximation Algorithms Depending on Parametrized Triangle Inequalities, SIAM Journal on Discrete Mathematics, v.8 n.1, p.1-16, Feb. 1995 doi:10.1137/S0895480192240226
Michael A. Bender, Chandra Chekuri, Performance guarantees for the TSP with a parameterized triangle inequality, Information Processing Letters, v.73 n.1-2, p.17-21, Jan 2000 doi:10.1016/S0020-0190(99)00160-X
David Carmel, Doron Cohen, Ronald Fagin, Eitan Farchi, Michael Herscovici, Yoelle S. Maarek, Aya Soffer, Static index pruning for information retrieval systems, Proceedings of the 24th ACM SIGIR Conference retrieval, p.43-50, September 2001, New Orleans, Louisiana, United States doi:10.1145/383952.383958
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams, Proceedings of the 29th International Colloquium on Automata, Languages and Programming, p.693-703, July 08-13, 2002
D. E. Critchlow. Metric Methods for Analyzing Partially Ranked Data. Number 34 in Lecture Notes in Statistics. Springer-Verlag, Berlin, 1980.
P. Diaconis. Group Representation in Probability and Statistics. Number 11 in IMS Lecture Series. Institute of Mathematical Statistics, 1988.
P. Diaconis and R. Graham. Spearman's footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B, 39(2):262--268, 1977.
Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th International Conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong doi:10.1145/371920.372165
Ronald Fagin, Ravi Kumar, D. Sivakumar, Comparing Top k Lists, SIAM Journal on Discrete Mathematics, v.17 n.1, p.134-160, 2003 doi:10.1137/S0895480102412856
Ronald Fagin, Larry Stockmeyer, Relaxing the Triangle Inequality in Pattern Matching, International Journal of Computer Vision, v.30 n.3, p.219-231, Dec. 1998 doi:10.1023/A:1008023416823
M. Kendall and J. D. Gibbons. Rank Correlation Methods. Edward Arnold, London, 1990.
Joon Ho Lee, Combining multiple evidence from different properties of weighting schemes, Proceedings of the 18th ACM SIGIR Conference retrieval, p.180-188, July 09-13, 1995, Seattle, Washington, United States doi:10.1145/215206.215358
Joon Ho Lee, Combining Multiple Evidence from Different Relevant Feedback Networks, Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA), p.421-430, April 01-04, 1997,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2003 ComparingTopKLists	Ronald Fagin Ravi Kumar D. Sivakumar			Comparing Top k Lists		Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithm	http://www.almaden.ibm.com/cs/people/fagin/sidma03.pdf			2003