2002 FlexiblePatternMatchingInStrings

Subject Headings: String Pattern Matching Task, String Pattern Matching Algorithm, Multiple String Matching, Extended String Matching, Regular Expression Matching, Approximate String Matching.

Notes

String matching problems range from the relatively simple task of searching a single text for a string of characters to searching a database for approximate occurrences of a complex pattern. Recent years have witnessed a dramatic increase of interest in sophisticated string matching problems, especially in information retrieval and computational biology. This book presents a practical approach to string matching problems, focusing on the algorithms and implementations that perform best in practice. It covers searching for simple, multiple and extended strings, as well as regular expressions, and exact and approximate searching. It includes all the most significant new developments in complex pattern searching. The clear explanations, step-by-step examples, algorithm pseudocode, and implementation efficiency maps will enable researchers, professionals and students in bioinformatics, computer science, and software engineering to choose the most appropriate algorithms for their applications.

Approximate string matching, also called "string matching allowing errors," is the problem of finding a pattern [math]\displaystyle{ p }[/math] in a text [math]\displaystyle{ T }[/math] when a limited number [math]\displaystyle{ k }[/math] of differences is permitted between the pattern and its occurrences in the text.
…
We call α=k/m the "error level.” It gives a measure of the "fraction" of the pattern that can be altered.

Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal.
…
A concept of "distance" between two strings can be defined according to the minimum cost of making them equal.
…
In some cases it is useful to measure the degree of similarity rather than of dissimilarity (i.e., a distance). One example is the LCS, a heavily studeis measure. Other examples are the shortest common supersequence (SCS), longest common substring (LCG, different from LCS because the common string has to be a contiguous substring of both sequences), and shortest common superstring (SCG), as well as their version or more than two strings.

,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2002 FlexiblePatternMatchingInStrings	Mathieu Raffinot Gonzalo Navarro			Flexible Pattern Matching in Strings			http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521813077			2002