String-based Regular Expression Pattern

A string-based regular expression pattern is a string-based pattern expression that is a regular expression pattern (that can accept a string).



    • In computing, a regular expression (abbreviated regex or regexp) is a sequence of text characters, some of which are understood to be metacharacters with symbolic meaning and some of which have their literal meaning, that together can automatically identify textual material matching a given pattern, or process a number of instances of it, where the match can range from precise equality to a very general similarity to the pattern. The pattern sequence itself is an expression, a statement in a language designed specifically to represent prescribed targets in the most concise and flexible way, in order to direct the automated processing of general text files, specific textual forms, or arbitrary input strings. A regular expression specifies a pattern to be matched against a string. It is employed in a search to identify text for further processing, such as displaying or altering the match, or simply to report the location or count of matches. The concept arose in the 1950s, when Kleene formalized the description of a regular language, and came into common use with the Unix text processing utilities ed, an editor, and grep (global regular expression print), a filter.
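      The uses named above (displaying a match, altering it, reporting its location or count) can be sketched with Python's standard `re` module. This is only an illustration: the sample sentence and the variable names are assumptions made for the example, not part of the quoted description.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "ed and grep are Unix tools; grep filters lines, ed edits them."
pattern = r"\bgrep\b"  # the literal word "grep" between word boundaries

first = re.search(pattern, text)          # find the first occurrence
match_text = first.group()                # the matched text, for display
location = first.span()                   # (start, end) offsets of that match
count = len(re.findall(pattern, text))    # how many matches occur in total
altered = re.sub(pattern, "GREP", text)   # a copy with every match replaced

print(match_text, location, count)
print(altered)
```

      Each call answers one of the questions in the paragraph: what matched, where, how often, and what the text looks like after the matches are altered.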

      A very simple use of a regular expression would be to locate the same word spelled two different ways in a text editor, for example seriali[sz]e. A wildcard match can also achieve this, but wildcards are limited in what they can match (they have fewer metacharacters and a simpler underlying language), whereas regular expressions are far more expressive. Wildcard characters are typically used for globbing similar names in a list of files, whereas regular expressions are usually employed in applications that pattern-match text strings in general. The simple regexp ^[ \t]+ matches excess whitespace at the beginning of a line. An advanced regexp used to match a numeral is ^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$ . (See Examples below.)
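      The three patterns quoted above can be tried directly. The following is a minimal sketch using Python's `re` module; the helper name `is_numeral` and the sample strings are assumptions introduced for the example.

```python
import re

# The three patterns from the text, compiled unchanged.
spelling = re.compile(r"seriali[sz]e")
leading_ws = re.compile(r"^[ \t]+")
numeral = re.compile(r"^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$")

def is_numeral(s):
    """Return True if s as a whole matches the numeral pattern (hypothetical helper)."""
    return numeral.match(s) is not None

# Both spellings are found by the single character class [sz].
found = spelling.findall("We serialise here and serialize there.")

# Leading spaces and tabs are removed; the rest of the line is untouched.
stripped = leading_ws.sub("", " \t indented line")

print(found)
print(repr(stripped))
for candidate in ["3.14", "-2.", ".5e-3", "42", "abc", "1.2.3"]:
    print(candidate, is_numeral(candidate))
```

      The numeral pattern accepts an optional sign, one of four digit layouts (digits on both sides of the point, on one side only, or no point at all), and an optional exponent, so "3.14", "-2.", ".5e-3" and "42" pass while "abc" and "1.2.3" do not.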

      A regular expression processor processes a regular expression statement as a grammar expressed in a formal language. Different languages have different syntax and grammar, so there exist different systems of regular expressions. The regexp processor compiles the given "code" and uses it to examine the target text string, parsing it to identify substrings that are members of the language the regular expression describes. Regular expressions are so useful in computing that a common standard is desirable, and the various systems of regular expressions have evolved to provide a basic and an extended standard for the grammar and syntax. Aficionados use modern regular expressions, which heavily augment the standard and are also similar across computing platforms and applications. Regular expression processors are found in some search engines, in some of the search and replace dialogs of word processors and text editors, and in the command lines of text processing utilities including sed and AWK.
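      The compile-then-examine sequence described above might look as follows in Python, whose `re` module is one such processor among many. The function name `find_substrings` and the sample input are assumptions made for this sketch.

```python
import re

def find_substrings(pattern_source, target):
    """Compile the pattern text, then scan the target for matching substrings.

    Returns a list of (start_offset, matched_text) pairs.
    Hypothetical helper, written only to illustrate the two steps.
    """
    compiled = re.compile(pattern_source)  # step 1: compile the "code"
    # step 2: walk the target and collect each non-overlapping match
    return [(m.start(), m.group()) for m in compiled.finditer(target)]

hits = find_substrings(r"[0-9]+", "sed 4.8 and awk 5.1")
print(hits)

# A malformed pattern is rejected at the compile step, before any text is examined.
try:
    re.compile("(")
    compile_failed = False
except re.error:
    compile_failed = True
print(compile_failed)
```

      Separating the two steps shows that syntax errors in the pattern surface during compilation, and that one compiled pattern can then be reused against any number of target strings.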

      Application developers also use regular expressions in their programming languages. In some languages regular expressions can be used directly, while others require extra assignment or compilation steps, or require loading a library. Perl, Ruby, AWK, and Tcl offer regular expressions directly in the language, while others offer them in their standard library, as do .NET languages, Java, Python and C++ (since C++11). For most other languages, such as Object Pascal (Delphi), C and earlier versions of C++, libraries that implement regular expressions are available. They all offer virtually the same standard of regular expression grammar and syntax, with relatively few exceptions in the higher order grammar.