String Matching Algorithm: Difference between revisions

Revision as of 15:32, 19 August 2021

A string matching algorithm is a search algorithm that can be applied by a string matching system (to solve a string matching task).

AKA: String Pattern Search Algorithm, String-Searching Algorithm.
Context:
- It can range from being an Exact String Matching Algorithm to being an Approximate String Matching Algorithm.
Example(s):
- an Exact String Matching Algorithm such as:
  - Naïve String Matching Algorithm 0 (no preprocessing) Θ((n-m+1) m)
  - Aho-Corasick String Matching Algorithm.
  - Rabin-Karp String Matching Algorithm Θ(m) average Θ(n+m), worst Θ((n-m+1) m)
  - Finite State Automaton String Matching Algorithm Θ(m |Σ|) Θ(n)
  - Knuth-Morris-Pratt String Matching Algorithm Θ(m) Θ(n)
  - Boyer-Moore String Matching Algorithm Θ(m + |Σ|) Ω(n/m), O(n)
  - Bitap String Matching Algorithm (shift-or, shift-and, Baeza-Yates-Gonnet) Θ(m + |Σ|) O(mn)
- an Approximate String Matching Algorithm such as:
  - Smith-Waterman Sequence Alignment Algorithm.
  - Needleman-Wunsch Sequence Alignment Algorithm.
- …
Counter-Example(s):
See: Alphabet, Pattern Recognition Task, Information Retrieval Task, Natural Language Processing Algorithm.

References

2020

(Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/String-searching_algorithm Retrieved:2020-2-23.
- In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.
  A basic example of string searching is when the pattern and the searched text are arrays of elements of an alphabet (finite set) Σ. Σ may be a human language alphabet, for example, the letters A through Z and other applications may use a binary alphabet (Σ = {0,1}) or a DNA alphabet (Σ = {A,C,G,T}) in bioinformatics.
  In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a variable-width encoding is in use, then it may be slower to find the Nth character, perhaps requiring time proportional to N. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.

2017

(Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb. (2017). "String Matching Algorithm". In: (Sammut & Webb, 2017). DOI:10.1007/978-1-4899-7687-1_791.
- QUOTE: A string matching algorithm returns parts of text matching a given pattern, such as a regular expression. Such algorithms have countless applications, from file editing to bioinformatics. Many algorithms compute deterministic finite automata, which can be expensive to build, but are usually efficient to use; they include the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, that build the automaton in time $O(m)$ and $O(m + s)$, respectively, where $m$ is the length of the pattern and $s$ the size of the alphabet, and match a text of length $n$ in time $O(n)$ in the worst case.

2002

(Navarro & Raffinot, 2002) ⇒ Gonzalo Navarro, and Mathieu Raffinot. (2002). “Flexible Pattern Matching in Strings." Cambridge University Press.

1987

(Karp & Rabin, 1987) ⇒ Richard M. Karp, and Michael O. Rabin. (1987). “Efficient Randomized Pattern-Matching Algorithms.” In: IBM Journal of Research and Development, 31(2).

1977

(Knuth et al., 1977) ⇒ Donald E. Knuth, James H. Morris, Jr., and Vaughan R. Pratt. (1977). “Fast Pattern Matching in Strings.” In: SIAM Journal on Computing, 6(2).

1975

(Aho & Corasick, 1975) ⇒ Alfred V. Aho, and Margaret J. Corasick. (1975). “Efficient String Matching: An aid to bibliographic search.” In: Communications of the ACM, 18(6). doi:10.1145/360825.360855

@@ Line 29: / Line 29: @@
 === 2020 ===
 * (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/String-searching_algorithm Retrieved:2020-2-23.
-** In [[computer science]], '''string-searching algorithms''', sometimes called '''string-matching algorithms''', are an important class of [[string algorithms]] that try to find a place where one or several [[string (computer science)|strings]] (also called patterns) are found within a larger string or text.         <P>	 A basic example of string searching is when the pattern and the searched text are [[Array data structure|arrays]] of elements of an [[Alphabet (computer science)|alphabet]] ([[finite set]]) Σ. Σ may be a human language alphabet, for example, the letters ''A'' through ''Z'' and other applications may use a ''binary alphabet'' (Σ = {0,1}) or a ''DNA alphabet'' (Σ = {A,C,G,T}) in [[bioinformatics]].         <P>	 In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a [[variable-width encoding]] is in use, then it may be slower to find the ''N''th character, perhaps requiring time proportional to ''N''. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.
+** In [[computer science]], '''string-searching algorithms''', sometimes called '''string-matching algorithms''', are an important class of [[string algorithms]] that try to find a place where one or several [[string (computer science)|strings]] (also called patterns) are found within a larger string or text.         <P>        A basic example of string searching is when the pattern and the searched text are [[Array data structure|arrays]] of elements of an [[Alphabet (computer science)|alphabet]] ([[finite set]]) Σ. Σ may be a human language alphabet, for example, the letters ''A'' through ''Z'' and other applications may use a ''binary alphabet'' (Σ = {0,1}) or a ''DNA alphabet'' (Σ = {A,C,G,T}) in [[bioinformatics]].         <P>        In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a [[variable-width encoding]] is in use, then it may be slower to find the ''N''th character, perhaps requiring time proportional to ''N''. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.
 === 2017 ===

String Matching Algorithm: Difference between revisions

Revision as of 15:32, 19 August 2021

References

2020

2017

2002

1987

1977

1975

Navigation menu

Search