Pattern | Description |
. | Matches any character except newline. |
[a-z0-9] | Matches any single character in the set. |
[^a-z0-9] | Matches any single character not in the set. |
\d | Matches a digit. Same as [0-9]. |
\D | Matches a non-digit. Same as [^0-9]. |
\w | Matches an alphanumeric (word) character. Same as [a-zA-Z0-9_]. |
\W | Matches a non-word character. Same as [^a-zA-Z0-9_]. |
\s | Matches a whitespace character (space, tab, newline, etc.). |
\S | Matches a non-whitespace character. |
\n | Matches a newline (line feed). |
\r | Matches a carriage return. |
\t | Matches a tab. |
\f | Matches a form feed. |
\b | Matches a backspace (inside [] only). |
\0 | Matches a null character. |
\000 | Also matches a null character because of the following: |
\nnn | Matches an ASCII character of that octal value. |
\xnn | Matches an ASCII character of that hexadecimal value. |
\cX | Matches an ASCII control character. |
\metachar | Matches the meta-character literally (e.g., \, ., |). |
(abc) | Used to create subexpressions. Remembers the match for later backreferences. Referenced by replacement patterns that use \1, \2, etc. |
\1, \2, … | Matches whatever the first (second, and so on) set of parentheses matched. |
x? | Matches 0 or 1 x's, where x is any of the above. |
x* | Matches 0 or more x's. |
x+ | Matches 1 or more x's. |
(x+?) | Turns greediness off so that the minimum number is matched before moving to the next part. |
x{m,n} | Matches at least m x's, but no more than n. |
abc | Matches all of a, b, and c in order. |
a|b|c | Matches one of a, b, or c. |
\b | Matches a word boundary (outside [] only). |
\B | Matches a non-word boundary. |
^ | Anchors match to the beginning of a line or string. |
$ | Anchors match to the end of a line or string. |
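To get a feel for the patterns in the table, it can help to try a few of them outside TextSpresso. The sketch below uses Python's built-in re module as a stand-in. This is an assumption for illustration only: TextSpresso uses a modified PCRE library, and while the constructs shown here behave the same way in both engines for these simple cases, the two are not identical in general. The sample strings and variable names are made up.

```python
import re

# \d+ : one or more digits.
digits = re.findall(r"\d+", "Room 12, floor 3")

# Greedy x+ takes as much as it can; x+? takes as little as it can.
greedy = re.search(r"<.+>", "<b>bold</b>").group(0)
lazy = re.search(r"<.+?>", "<b>bold</b>").group(0)

# (\w+) remembers a word; \1 must then match the same text again.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)

# x{m,n} : at least m and at most n repetitions.
two_to_three = [s for s in ("a", "aa", "aaa", "aaaa")
                if re.fullmatch(r"a{2,3}", s)]

# a|b : one alternative or the other.
pets = re.findall(r"cat|dog", "cat, dog, cow")

# \xnn : a character given by its hexadecimal value (0x41 is "A").
hex_a = re.fullmatch(r"\x41", "A") is not None

# \b : word boundary, so the "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat")

print(digits, greedy, lazy, doubled, two_to_three, pets, hex_a, whole_words)
```

The greedy and lazy lines are the ones most worth experimenting with, since greediness is a common reason a pattern matches more text than expected.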
Pattern | Comment |
(?:pattern) | For grouping without creating backreferences |
(?=pattern) | A zero-width positive look-ahead assertion. For example, \w+(?=\t) matches a word followed by a tab, without including the tab in the match. |
(?!pattern) | A zero-width negative look-ahead assertion. For example, foo(?!bar) matches any occurrence of "foo" that isn't followed by "bar". |
(?<=pattern) | A zero-width positive look-behind assertion. For example, (?<=\t)\w+ matches a word that follows a tab, without including the tab in the match. Works only for fixed-width look-behind. |
(?<!pattern) | A zero-width negative look-behind assertion. For example, (?<!bar)foo matches any occurrence of "foo" that does not follow "bar". Works only for fixed-width look-behind. |
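The four assertions are easier to follow when you can see what does and does not end up in the match. The following is a minimal sketch using Python's re module, which is an assumption made for illustration (TextSpresso itself uses a modified PCRE). Python happens to share the fixed-width restriction on look-behind, so it also shows what a rejected pattern looks like. The sample strings are invented.

```python
import re

# Look-ahead: the tab must follow the word but is not part of the match.
before_tab = re.findall(r"\w+(?=\t)", "key\tvalue")

# Look-behind: the tab must precede the word but is not part of the match.
after_tab = re.findall(r"(?<=\t)\w+", "key\tvalue")

# Negative look-ahead: replace "foo" only when "bar" does not follow it.
not_followed = re.sub(r"foo(?!bar)", "X", "foobar foobaz")

# Negative look-behind: replace "foo" only when "bar" does not precede it.
not_preceded = re.sub(r"(?<!bar)foo", "X", "barfoo bazfoo")

# A look-behind whose length can vary (\t+) is rejected by Python's engine.
try:
    re.compile(r"(?<=\t+)\w+")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

print(before_tab, after_tab, not_followed, not_preceded, variable_width_rejected)
```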
At this time, TextSpresso supports only the \nn and \name syntax for specifying parts of a pattern to be used in the replacement. In addition, you can insert the entire match with the & character. By adding an extra \, you can include a literal '&', '\nn', or '\name'. For example, given the find pattern (a+)(b)(c) matched against the text aaabc:
Pattern | Result | Explanation |
\1\3 | aaac | Builds the replace from the 1st and 3rd parts of the found text. |
\1 & \3 | aaa aaabc c | Includes two spaces and the entire match in the middle. |
\\1\&\2 | \1&b | The \ is used to escape \1 and & so that they are included literally. \2 inserts the second part of the found text. |
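The three rows above can be reproduced with a small expander. This is a rough sketch of the rules as described here, not TextSpresso's actual implementation: the function name expand_replace is hypothetical, the \name form is left out because the way a name ends is not specified in this document, and treating \nn as one or two digits is an assumption.

```python
import re

DIGITS = "0123456789"

def expand_replace(template: str, match: re.Match) -> str:
    """Expand \\nn, & and backslash escapes in a replace pattern (sketch)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt in DIGITS:
                # \nn: read one or two digits (assumed) as a group number.
                j = i + 1
                while j < len(template) and j < i + 3 and template[j] in DIGITS:
                    j += 1
                out.append(match.group(int(template[i + 1:j])) or "")
                i = j
            else:
                # \ followed by anything else: keep that character literally.
                out.append(nxt)
                i += 2
        elif ch == "&":
            # Unescaped & stands for the entire match.
            out.append(match.group(0))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

m = re.search(r"(a+)(b)(c)", "aaabc")
row1 = expand_replace(r"\1\3", m)
row2 = expand_replace(r"\1 & \3", m)
row3 = expand_replace(r"\\1\&\2", m)
print(row1, row2, row3, sep="\n")
```

Note that in the third pattern the digit after \\ is ordinary text, because the backslash in front of it has already been consumed as an escape.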
How To Build It
Enter the text you want to find in the Find Pattern field. The search
text must conform to the regular expression syntax described above.
Note that in TextSpresso you can use the String Code Editor to enter
UTF-8 characters directly, in addition to using the regular expression
syntax for a character's octal or hexadecimal code.
Enter the replacement text in the Replace Pattern
field. Note that while regular expressions are fairly standard across
software, replace patterns tend to vary in syntax and functionality. At
this time TextSpresso only supports the syntax and features specified
above.
If you will not be using the match or any of its parts in the
replacement, you can speed up the operation by checking the
"Simple replace?" option. Simple replace uses the same optimized
replace engine as TextSpresso's other filters and can process large
amounts of text much faster. With simple replace turned on:
Pattern | Comment |
* | Retains the character (not byte) at the same position in the matched text. Inserts nothing if the matched text has no character at that position. |
\* | Includes a literal * in the replacement text. |
\\ | Includes a literal \ in the replacement text. |
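The simple replace rules can be sketched in a few lines. The code below is only one reading of the table: it assumes that "position" means the position of the * within the replacement text, counting an escaped \* or \\ as a single character. That interpretation, and the function name expand_simple, are assumptions rather than documented TextSpresso behavior. Python strings are indexed by character, not by byte, which fits the "character (not byte)" wording.

```python
def expand_simple(template: str, matched: str) -> str:
    """Expand a simple replace pattern against the matched text (sketch)."""
    out = []
    pos = 0  # position of the current replacement character (assumed rule)
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template) and template[i + 1] in "*\\":
            # \* and \\ produce a literal * or \.
            out.append(template[i + 1])
            i += 2
        elif ch == "*":
            # Keep the matched character at this position, if there is one.
            if pos < len(matched):
                out.append(matched[pos])
            i += 1
        else:
            out.append(ch)
            i += 1
        pos += 1
    return "".join(out)

print(expand_simple("*o*", "cat"))   # middle character replaced
print(expand_simple("b**s", "cat"))  # first character replaced, "s" added
print(expand_simple("****", "cat"))  # fourth * has nothing to retain
```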
About The Text Fields
Text fields for text used by the filter are specially designed to
support the display and editing of all characters. This includes
control characters, which are not editable in a normal field. Control
characters are displayed as UTF codes surrounded by dashes. For
example, a line feed character is displayed as "-UTF10-".
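As an illustration of this display convention, the sketch below renders control characters the same way. It assumes the code shown is the decimal Unicode value (which agrees with line feed being 10) and that "control character" means values below 32 plus 127. Both points, and the function name show_controls, are assumptions rather than details taken from TextSpresso.

```python
def show_controls(text: str) -> str:
    """Render control characters as -UTFnn- the way the text fields do (sketch)."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            # Assumed: the code is shown as a decimal number.
            parts.append(f"-UTF{code}-")
        else:
            parts.append(ch)
    return "".join(parts)

print(show_controls("first line\nsecond\tcolumn"))
```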
You can precisely insert, view, and edit any character in Unicode by
clicking the label above a field. This will display the String Code
Editor. The String Code Editor displays all characters in the field
and their UTF codes in a grid. Using this grid you can type characters
directly or indirectly by entering Unicode values. By displaying and
allowing you to edit the UTF codes, the String Code Editor makes it
possible to include and edit characters which cannot be displayed in
the System font and/or cannot normally be displayed (i.e. control
characters).
Regular Expressions vs. Pattern Filters
Regular expressions and TextSpresso patterns can both match complex,
variable text. TextSpresso patterns were originally designed to be
easy for a novice to understand, predict, enter, and use. They
therefore have a simple syntax with no characters that have a special
meaning, and they are edited using a graphical editor. Novice users
don't have to remember any special rules or look up specific syntax to
build their patterns.
Regular expressions are the opposite in that they have a complex
syntax where many characters have a special or double meaning, and
they are entered without the assistance of a graphical editor. If
you're used to the regular expression syntax, you can enter a pattern
faster by typing it than by using a TextSpresso pattern and its
graphical editor. But even expert users are sometimes unable to
predict exactly how a regular expression will match.
From a functionality standpoint, regular expressions can do things
that normal TextSpresso patterns cannot, such as parsing a match into
sub-parts and using those parts in the replacement. But TextSpresso
patterns are often faster for searching and replacing, though this is
only noticeable with large amounts of text. TextSpresso patterns are
also used by some filters, such as the Sort filter type.
You should feel free to use whatever filter type/pattern matching you
feel most comfortable with. That's why TextSpresso offers both
engines.
Notes
For regular expression searching, TextSpresso uses a modified version
of the PCRE library, which is open source software written by Philip
Hazel and copyrighted by the University of Cambridge, England. You can
learn more about PCRE and download the source code at:
http://www.pcre.org/
Special thanks to Philip Hazel and the University of Cambridge for
making this library available as open source under the BSD license.