JCL Help:TURESearch

From Project JEDI Wiki

Class Hierarchy

TSearchEngine
|
TURESearch


Pascal

 public TURESearch = class(TSearchEngine);


Description

TURESearch is a Unicode Regular Expression (URE) search implementation. The class handles high and low surrogates, supports case-sensitive and case-insensitive matching, can ignore non-spacing characters, and can optionally return whole-word matches only. Assumptions:

  • The regular expression and the text are already normalized.
  • Conversion to lower case assumes a 1-1 mapping.
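
The surrogate handling and the two assumptions above can be illustrated with a short sketch. It is written in Python purely for illustration and does not use the JCL API; the function names combine_surrogates and has_one_to_one_lower are hypothetical, and NFC is only one possible normalization form (the page does not say which form is expected).

```python
import unicodedata

def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1D11E (musical symbol G clef) is stored in UTF-16 as D834 DD1E.
clef = combine_surrogates(0xD834, 0xDD1E)

# Assumption 1: pattern and text are already normalized.
# Without normalization, two spellings of the same letter compare unequal.
decomposed = "e\u0301"   # 'e' followed by a combining acute accent
composed = "\u00e9"      # precomposed e with acute accent
same_before = decomposed == composed
same_after = unicodedata.normalize("NFC", decomposed) == composed

# Assumption 2: lower-casing maps one character to exactly one character.
def has_one_to_one_lower(ch):
    """Return True if the full lower-case mapping of ch is a single character."""
    return len(ch.lower()) == 1
```

Characters such as U+0130 (capital I with dot above) have a multi-character lower-case form under full Unicode case mapping, which is the kind of case the 1-1 assumption leaves out.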

Definitions:

  • Separator - any one of U+2028, U+2029, NL, CR.
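
As a minimal sketch of this definition, the separator set can be written as a simple membership test. The code is Python for illustration only, the names are hypothetical, and it assumes that NL means U+000A and CR means U+000D.

```python
# Separator code points as listed in the definition above.
# Assumption: NL is U+000A (line feed) and CR is U+000D (carriage return).
SEPARATORS = frozenset({0x2028, 0x2029, 0x000A, 0x000D})

def is_separator(code_unit):
    """Return True if the code unit is one of the four separators."""
    return code_unit in SEPARATORS

def split_at_separators(text):
    """Split text into the segments that lie between separators."""
    segments, current = [], []
    for ch in text:
        if is_separator(ord(ch)):
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments
```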

Operators:

Operator   Description
.          match any character
*          match zero or more of the last subexpression
+          match one or more of the last subexpression
?          match zero or one of the last subexpression
()         subexpression grouping
{m, n}     match at least m and at most n occurrences. Note: either value can be 0 (zero) or omitted. An omitted lower bound means 0; an upper bound that is 0 or omitted means no upper limit.
             {,}, {0,} and {0, 0} correspond to *
             {, 1} and {0, 1} correspond to ?
             {1,} and {1, 0} correspond to +
{m}        match exactly m occurrences
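
The bound rules for {m, n} can be sketched as a small parser. This is an illustrative Python sketch of the rules as stated in the table, not the JCL implementation; parse_bounds is a hypothetical name, and None is used here to stand for "no upper limit".

```python
import re

# Accepts {m}, {m,n}, {m,}, {,n} and {,} with optional spaces.
_BOUNDS = re.compile(r"\{\s*(\d*)\s*(?:(,)\s*(\d*)\s*)?\}")

def parse_bounds(spec):
    """Return (minimum, maximum) for a bound; maximum None means unlimited."""
    match = _BOUNDS.fullmatch(spec)
    if match is None:
        raise ValueError("malformed bound: %r" % spec)
    low, comma, high = match.groups()
    if comma is None:
        # {m}: exactly m occurrences, so m must be present.
        if not low:
            raise ValueError("malformed bound: %r" % spec)
        exact = int(low)
        return exact, exact
    minimum = int(low) if low else 0      # omitted lower bound means 0
    maximum = int(high) if high else 0    # omitted upper bound is treated like 0
    return minimum, (maximum if maximum != 0 else None)
```

With this reading, {,}, {0,} and {0, 0} all yield (0, None), which is the same range as *; {1,} and {1, 0} yield (1, None), the same range as +.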


Literals and Constants:

Literal/Constant   Description
c                  literal UCS2 character
\x....             hexadecimal number of up to 4 digits
\X....             hexadecimal number of up to 4 digits
\u....             hexadecimal number of up to 4 digits
\U....             hexadecimal number of up to 4 digits
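
A hexadecimal constant of up to four digits can be decoded as sketched below. The sketch is Python for illustration only and assumes that each constant is introduced by a backslash, as in most regular-expression syntaxes; read_hex_escape is a hypothetical name, not a JCL routine.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")

def read_hex_escape(pattern, pos):
    """Decode an escape such as \\x41 at pattern[pos].

    Returns (code_unit, next_pos). At most four hex digits are consumed.
    Assumption: the escape starts with a backslash followed by x, X, u or U.
    """
    if (pos + 1 >= len(pattern) or pattern[pos] != "\\"
            or pattern[pos + 1] not in "xXuU"):
        raise ValueError("no hexadecimal escape at position %d" % pos)
    pos += 2
    start = pos
    while (pos < len(pattern) and pos - start < 4
           and pattern[pos] in HEX_DIGITS):
        pos += 1
    if pos == start:
        raise ValueError("escape has no hexadecimal digits")
    return int(pattern[start:pos], 16), pos
```

Because at most four digits are read, a fifth hexadecimal digit is left in the pattern and treated as whatever follows the constant.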


Character classes:

  • [...] denotes a character class.
  • [^...] denotes a negated character class.
  • Character classes can contain literals, POSIX classes and/or character property classes.
  • POSIX classes are delimited by colons, in the form :name: (for example :alpha:).
  • Character property classes are \p or \P (the latter denotes a negated property class) followed by a comma-separated list of integers between 0 and the maximum entry index in TCharacterCategory. These integers correspond directly to the TCharacterCategory enumeration entries.
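
The property-class notation can be sketched as a small parser that returns the negation flag and the list of category indexes. This is an illustrative Python sketch, not JCL code: parse_property_class is a hypothetical name, the leading backslash is an assumption, and the integers are kept as plain numbers because the TCharacterCategory entries are not listed on this page.

```python
def parse_property_class(spec):
    """Parse a class such as \\p1,2 or \\P1,2.

    Returns (negated, indexes). Assumption: the class starts with a
    backslash followed by p (plain) or P (negated).
    """
    if len(spec) < 3 or spec[0] != "\\" or spec[1] not in "pP":
        raise ValueError("not a property class: %r" % spec)
    negated = spec[1] == "P"
    parts = spec[2:].split(",")
    # Every entry must be a non-empty run of ASCII digits.
    if not all(part.isascii() and part.isdigit() for part in parts):
        raise ValueError("malformed index list: %r" % spec)
    return negated, [int(part) for part in parts]
```

A real implementation would additionally check each index against the highest ordinal value in TCharacterCategory.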

This implementation is an improved translation of the URE package written by Mark Leisher (mleisher@crl.nmsu.edu), who used a variation of the RE->DFA algorithm by Mark Hopkins (markh@csd4.csd.uwm.edu).


About

Unit

JclUnicode


Donator

Mike Lischke

