- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. (2008). “Introduction to Information Retrieval." Cambridge University Press. ISBN:0521865719.
As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval systems. Of course, in that time period, most people also used human travel agents to book their travel. However, during the last decade, relentless optimization of information retrieval effectiveness has driven web search engines to new quality levels where most people are satisfied most of the time, and web search has become a standard and often preferred source of information finding. For example, the 2004 Pew Internet Survey (Fallows, 2004) found that “92% of Internet users say the Internet is a good place to go for getting everyday information.” To the surprise of many, the field of information retrieval has moved from being a primarily academic discipline to being the basis underlying most people's preferred means of information access. This book presents the scientific underpinnings of this field, at a level accessible to graduate students as well as advanced undergraduates.
Information retrieval did not begin with the Web. In response to various challenges of providing information access, the field of information retrieval evolved to give principled approaches to searching various forms of content. The field began with scientific publications and library records, but soon spread to other forms of content, particularly those of information professionals, such as journalists, lawyers, and doctors. Much of the scientific research on information retrieval has occurred in these contexts, and much of the continued practice of information retrieval deals with providing access to unstructured information in various corporate and governmental domains, and this work forms much of the foundation of our book.
Nevertheless, in recent years, a principal driver of innovation has been the World Wide Web, unleashing publication at the scale of tens of millions of content creators. This explosion of published information would be moot if the information could not be found, annotated and analyzed so that each user can quickly find information that is both relevant and comprehensive for their needs. By the late 1990s, many people felt that continuing to index the whole Web would rapidly become impossible, due to the Web's exponential growth in size. But major scientific innovations, superb engineering, the rapidly declining price of computer hardware, and the rise of a commercial underpinning for web search have all conspired to power today's major search engines, which are able to provide high-quality results within subsecond response times for hundreds of millions of searches a day over billions of web pages.
Table of Contents
|Front matter (incl. table of notations)|
|01||Boolean retrieval|
|02||The term vocabulary & postings lists|
|03||Dictionaries and tolerant retrieval|
|04||Index construction|
|05||Index compression|
|06||Scoring, term weighting & the vector space model|
|07||Computing scores in a complete search system|
|08||Evaluation in information retrieval|
|09||Relevance feedback & query expansion|
|10||XML retrieval|
|11||Probabilistic information retrieval|
|12||Language models for information retrieval|
|13||Text classification & Naive Bayes|
|14||Vector space classification|
|15||Support vector machines & machine learning on documents|
|16||Flat clustering|
|17||Hierarchical clustering|
|18||Matrix decompositions & latent semantic indexing|
|19||Web search basics|
|20||Web crawling and indexes|
|21||Link analysis|
|Bibliography & Index|
- (Baeza-Yates & Ribeiro-Neto 1999) ⇒ Ricardo Baeza-Yates, and Berthier Ribeiro-Neto. (1999). “Modern Information Retrieval." Addison-Wesley. ISBN:020139829X.