Sequence Alignment Task

From GM-RKB

A Sequence Alignment Task is a Range-type String Matching Task that requires the identification of strings with a low string distance to a given string.
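As a minimal sketch of the range-type matching in this definition, the following assumes that the string distance is the unit-cost Levenshtein edit distance and that "range" means a caller-supplied threshold. The names `edit_distance`, `within_distance`, and `max_dist` are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def within_distance(query: str, candidates: list[str], max_dist: int) -> list[str]:
    """Return the candidates whose edit distance to query is at most max_dist."""
    return [c for c in candidates if edit_distance(query, c) <= max_dist]
```

This scan compares the query against every candidate, which is adequate for small collections. Large-scale search typically relies on indexing or heuristic filtering instead.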



References

2021a

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Sequence_alignment Retrieved:2021-2-21.
    • In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.[1] Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.

      Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.

      (...)

      Very short or very similar sequences can be aligned by hand. However, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort. Instead, human knowledge is applied in constructing algorithms to produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect patterns that are difficult to represent algorithmically (especially in the case of nucleotide sequences).

      Computational approaches to sequence alignment generally fall into two categories: global alignments and local alignments. Calculating a global alignment is a form of global optimization that "forces" the alignment to span the entire length of all query sequences. By contrast, local alignments identify regions of similarity within long sequences that are often widely divergent overall. Local alignments are often preferable, but can be more difficult to calculate because of the additional challenge of identifying the regions of similarity.[2]

      A variety of computational algorithms have been applied to the sequence alignment problem. These include slow but formally correct methods like dynamic programming. These also include efficient, heuristic algorithms or probabilistic methods designed for large-scale database search, that do not guarantee to find best matches.

  1. Mount DM. (2004). Bioinformatics: Sequence and Genome Analysis (2nd ed.). Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY. ISBN 978-0-87969-608-5.
  2. Polyanovsky, V. O.; Roytberg, M. A.; Tumanyan, V. G. (2011). "Comparative analysis of the quality of a global algorithm and a local algorithm for alignment of two sequences". Algorithms for Molecular Biology. 6 (1): 25. doi:10.1186/1748-7188-6-25. PMC 3223492. PMID 22032267. S2CID 2658261.
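The contrast between global and local alignment in the excerpt above can be sketched with two dynamic-programming score functions. This is an illustration under assumed parameters, not the method of any particular tool: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function names are hypothetical. The global variant follows the Needleman-Wunsch recurrence and the local variant follows the Smith-Waterman recurrence.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score of an alignment spanning all of a and all of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" to b[:j] costs j gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] to "" costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere in either sequence.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Under these assumed scores, aligning "xxxGATTACAyyy" with "GATTACA" gives a global score of 1, because the six flanking characters must each be paid for with a gap, while the local score is 7, because only the shared region is scored. Both functions fill a full table of size len(a) × len(b), which is why the excerpt calls dynamic programming formally correct but slow for large-scale database search.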

2021b

2020

2019

2015

2011

2002

  • (Navarro & Raffinot, 2002) ⇒ Gonzalo Navarro, and Mathieu Raffinot. (2002). “Flexible Pattern Matching in Strings.” Cambridge University Press.
    • QUOTE: Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operations over the strings until they become equal.

      ... A concept of "distance" between two strings can be defined according to the minimum cost of making them equal.

      ... In some cases it is useful to measure the degree of similarity rather than of dissimilarity (i.e., a distance). One example is the LCS, a heavily studied measure. Other examples are the shortest common supersequence (SCS), longest common substring (LCG, different from LCS because the common string has to be a contiguous substring of both sequences), and shortest common superstring (SCG), as well as their versions for more than two strings.
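The difference between the two similarity measures named in the quote can be sketched as follows, assuming that LCS denotes the longest common subsequence. Both functions return only the length of the shared string, and their names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a character of a or of b
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest string that occurs contiguously in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            # A mismatch resets the run, since a substring must be contiguous.
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best
```

For "ABCBDAB" and "BDCABA", the longest common subsequence has length 4 (for example "BCBA"), while the longest common substring has length 2 (for example "AB"). The only difference between the two recurrences is what happens on a mismatch: the subsequence version carries the best value forward, and the substring version resets to zero.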

1999

  1. Merkel, M. 1999. Understanding and enhancing translation by parallel text processing. Linköping Studies in Science and Technology. Dissertation No. 607. Linköping University. Dept. of Computer and Information Science.
  2. Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. 1999. Evaluating LWA and UWA. PLUG deliverable 3A.1. Internal report.
  3. Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. forthcoming. Evaluation of Word Alignment Systems. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC-2000, Athens, Greece, 2000.

1981

1974