PageRank Algorithm
Jump to navigation
Jump to search
A PageRank Algorithm is a Directed Graph Node Ranking Algorithm that efficiently calculates a directed graph node's PageRank Value (random arrival likelihood score).
- Context:
- It was proposed in (Page et al., 1998)
- It was originally proposed for a Webpage Retrieval System.
- It requires the calculation of a PageRank Vector.
- It can determine the fraction of time that the random walker (e.g. web surfer) would spend at the page if the random process were iterated ad infinitum.
- Counter-Exsample(s):
- See: Random Walk, Stochastic Matrix, Eigenvector Centrality, Markov Process, MCMC Algorithm, Personalized PageRank.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/PageRank#Algorithm Retrieved:2015-3-12.
- The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a "50% chance" of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.
- The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called "iterations", through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.
2012
- (Wikipedia - PageRank, 2011-Jun-11) ⇒ http://en.wikipedia.org/wiki/PageRank
- PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element $E$ is referred to as the PageRank of E and denoted by [math]\displaystyle{ PR(E). }[/math] The name "PageRank" is a trademark of Google, and the PageRank process has been patented (Template:US patent).
2011
- (Franceschet, 2011) ⇒ Massimo Franceschet. (2011). “PageRank: standing on the shoulders of giants.” In: Communications of the ACM, 54(6) doi:10.1145/1953122.1953146
- QUOTE: We start by providing an intuitive interpretation of PageRank in terms of random walks on graphs. The Web is viewed as a directed graph of pages connected by hyperlinks. A random surfer starts from an arbitrary page and simply keeps clicking on successive links at random, bouncing from page to page. The PageRank value of a page corresponds to the relative frequency the random surfer visits that page, assuming that the surfer goes on infinitely. The more time spent by the random surfer on a page, the higher the PageRank importance of the page. ...
2009
- (Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN:1420089641
2006
- (Langville & Meyer, 2006) ⇒ Amy N. Langville, and Carl Dean Meyer. (2006). “Page Rank and Beyond." Princeton University Press. ISBN:0691122024
2005
- (Getoor & Diehl, 2005) ⇒ Lise Getoor, Christopher P. Diehl. (2005). “Link Mining: A survey.” In: SIGKDD Explorations, 7(2). doi:10.1145/1117454.1117456
- QUOTE: In the context of web information retrieval, the PageRank [91] and HITS [64] algorithms are the most notable approaches to LBR. PageRank models web surfing as a random walk where the surfer randomly selects and follows links and occasionally jumps to a new web page to start another traversal of the link structure. The rank of a given web page in this context is the fraction of time that the random web surfer would spend at the page if the random process were iterated ad infinitum. This can be determined by computing the steady-state distribution of the random process. HITS assumes a slightly more complex process, modeling the web as being composed of two types of web pages: hubs and authorities. Hubs are web pages that link to many authoritative pages. Authorities are web pages that are linked to by many hubs. Each page in the web is assigned hub and authority scores. These scores are computed by an iterative algorithm that updates the scores of a page based on the scores of pages in its immediate neighborhood. This approach bears a relation to PageRank with two separate random walks - one with hub transitions and one with authority transitions - on a corresponding bipartite graph of hubs and authorities [73; 95; 84]. The hub and authority scores are the steady-state distributions of the respective random processes.
2002
- (Haveliwala, 2002) ⇒ Taher H. Haveliwala. (2002). “Topic-sensitive PageRank.” In: Proceedings of the 11th International Conference on World Wide Web. doi:10.1145/511446.511513
- QUOTE: In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query.
2001
- (Ng et al., 2001) ⇒ Andrew Y. Ng, Alice X. Zheng, Michael I. Jordan. (2001). “Stable Algorithms for Link Analysis.” In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2001). doi:10.1162/153244302760185243
- QUOTE: The Kleinberg HITS and the Google PageRank algorithms are eigenvector methods for identifying “authoritative or “influential articles, given hyperlink or citation information.
1998
- (Page et al., 1998) ⇒ Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. (1998). “The PageRank Citation Ranking: Bringing order to the web." Technical Report, Stanford University.