How to use the NTS interface

The Search function of the NTS interface supports POSIX regular expressions. The most useful patterns for corpus research are explained below.

Note: You can choose regular expression search in the SEARCH TYPE box.

"A regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string." (source)

Basic Patterns
Pattern Description                                               Example       Matches
.       Any single character                                      r.n           run, ran, r1n
*       Zero or more of the preceding element                     .*able        able, table, capable
+       One or more of the preceding element                      .+able        table, capable (but not able)
?       Zero or one of the preceding element (makes it optional)  colou?r       color, colour
{m,n}   Between m and n repetitions of the preceding element      .{1,4}able    cable, table, capable
|       Alternation (OR)                                          was|were      was, were
( … )   Group sub-expression (treated as a single unit)           (pre|re)view  preview, review
[ … ]   Match any one character in the set; use - for ranges      [aeiou]       any single vowel
[^ … ]  Match any one character not in the set                    [^aeiou]      any single non-vowel character
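
The quantifiers above can be tried out offline before running a search. The following is a minimal sketch in Python, whose re module is a different engine from the one behind NTS but treats these basic patterns the same way. The token list and the helper name matching are made up for illustration.

```python
import re

# Hypothetical token list, not taken from the NTS corpus.
tokens = ["able", "table", "capable", "color", "colour", "preview", "review", "view"]

def matching(pattern, items):
    """Return the items in which the pattern matches anywhere (unanchored)."""
    return [t for t in items if re.search(pattern, t)]

print(matching(r".*able", tokens))        # * allows zero characters before "able"
print(matching(r".+able", tokens))        # + requires at least one character before "able"
print(matching(r"colou?r", tokens))       # the "u" is optional
print(matching(r"(pre|re)view", tokens))  # grouped alternation
```

Because the search is unanchored, a pattern such as r.n would also match inside longer tokens like "running".
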

Anchors & Boundaries
Pattern Description                          Example  Matches
^       Beginning of string                  ^the     "the" only at the start of a token
$       End of string                        ing$     tokens ending in "ing"
^word$  Exact match (anchored at both ends)  ^the$    "the" and nothing else
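
The effect of anchoring is easiest to see by running one pattern with and without anchors over the same tokens. This is a small sketch in Python (an assumption for illustration: NTS itself evaluates patterns on the server, and the token list here is invented).

```python
import re

# Hypothetical tokens chosen so that "the" occurs in different positions.
tokens = ["the", "then", "other", "bathe", "sing", "singer"]

def matching(pattern, items):
    """Return the items in which the pattern matches anywhere (unanchored)."""
    return [t for t in items if re.search(pattern, t)]

print(matching(r"the", tokens))    # no anchor: "the" anywhere in the token
print(matching(r"^the", tokens))   # anchored at the start
print(matching(r"the$", tokens))   # anchored at the end
print(matching(r"^the$", tokens))  # anchored at both ends: exact match
print(matching(r"ing$", tokens))   # "singer" is excluded, "sing" is kept
```
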

Character Class Shorthands
Shorthand  Description                                     Equivalent
\d         Any digit                                       [[:digit:]] i.e. [0-9]
\D         Any non-digit                                   [^[:digit:]]
\w         Any word character (letter, digit, underscore)  [[:word:]] i.e. [A-Za-z0-9_]
\W         Any non-word character                          [^[:word:]]
\s         Any whitespace character                        [[:space:]]
\S         Any non-whitespace character                    [^[:space:]]

POSIX Character Classes (use inside brackets)
Class        Description                Example
[[:alpha:]]  Any letter                 [[:alpha:]]+  (one or more letters)
[[:digit:]]  Any digit (0–9)            [[:digit:]]{4}  (exactly four digits, e.g. a year)
[[:lower:]]  Any lower-case letter      ^[[:lower:]]+$  (all-lowercase tokens)
[[:upper:]]  Any upper-case letter      ^[[:upper:]]+$  (all-uppercase tokens, e.g. acronyms)
[[:punct:]]  Any punctuation character  [[:punct:]]  (matches !, ?, the comma, etc.)
[[:alnum:]]  Any letter or digit        ^[[:alnum:]]+$  (tokens with no punctuation)
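
Python's re module does not recognise POSIX class names such as [[:alpha:]], so patterns that use them cannot be tested there directly. One possible workaround, sketched below, is to substitute ASCII ranges before compiling. The helper posix_to_python and the mapping are hypothetical and ASCII-only; the server may classify accented or other non-ASCII letters differently, so treat local results as a rough check only.

```python
import re
import string

# ASCII-only stand-ins for the POSIX classes listed above (an assumption:
# the server may treat non-ASCII letters differently).
POSIX_CLASSES = {
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:alnum:]": "A-Za-z0-9",
    "[:punct:]": re.escape(string.punctuation),
}

def posix_to_python(pattern):
    """Replace POSIX class names inside brackets with ASCII equivalents."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace(name, chars)
    return pattern

def matching(posix_pattern, items):
    """Return the items matched by the translated pattern (unanchored search)."""
    regex = re.compile(posix_to_python(posix_pattern))
    return [t for t in items if regex.search(t)]

# Hypothetical tokens for illustration.
tokens = ["NTS", "corpus", "1984", "Text", "e-mail"]

print(matching("^[[:upper:]]+$", tokens))  # all-uppercase tokens
print(matching("^[[:lower:]]+$", tokens))  # all-lowercase tokens
print(matching("[[:digit:]]{4}", tokens))  # four digits in a row
print(matching("^[[:alnum:]]+$", tokens))  # tokens with no punctuation
print(matching("[[:punct:]]", tokens))     # tokens containing punctuation
```
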

Practical Examples for Corpus Linguistics
Task                                       Pattern                              Explanation
Words ending in -ing                       .*ing$                               Any characters followed by "ing" at end
Words starting with un-                    ^un.*                                Starts with "un" followed by anything
Spelling variation: color/colour           colou?r                              The "u" is optional
Suffix alternation: -ise/-ize              .*i[sz]e$                            Either "s" or "z" before final "e"
First-person singular pronoun forms        ^(I|me|my|mine)$                     Exact match of any listed form
Contracted forms with apostrophe           .*n't$                               don't, won't, can't, etc.
Words of exactly 3 letters                 ^[[:alpha:]]{3}$                     Exactly three letters, anchored
Tokens containing digits                   \d                                   Any token with at least one digit
ALL CAPS tokens (e.g. acronyms)            ^[[:upper:]]{2,}$                    Two or more uppercase letters only
Verb forms: go/goes/going/gone/went        ^(go|goes|going|gone|went)$          Alternation with anchors for exact match
Phrasal verb particle in context position  ^(up|out|off|on|in|down|away|back)$  Common particles (use in context filter with regex toggle)
Reduplicated forms                         ^(.+)\1$                             Back reference: same sequence repeated (e.g. "mama", "byebye")
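
Several of the practical patterns above use only syntax that Python's re module shares with the NTS search, so they can be checked against a small word list first. The sketch below does this with an invented token list and invented labels; it is an illustration and does not query NTS.

```python
import re

# Hypothetical tokens for illustration.
tokens = ["realise", "realize", "size", "don't", "went", "going",
          "mama", "byebye", "bye", "B2B"]

# A subset of the patterns from the table, keyed by a descriptive label.
patterns = {
    "-ise/-ize suffix": r".*i[sz]e$",
    "n't contraction": r".*n't$",
    "forms of go": r"^(go|goes|going|gone|went)$",
    "reduplication": r"^(.+)\1$",
    "contains digit": r"\d",
}

# For each label, collect the tokens that the pattern matches.
hits = {label: [t for t in tokens if re.search(p, t)]
        for label, p in patterns.items()}

for label, found in hits.items():
    print(label, found)
```

Note that "size" is returned for the -ise/-ize pattern: the expression looks only at spelling, so results usually need a manual check for such false positives.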

You can also toggle regex mode for individual context positions (L1–L5, R1–R5), allowing you to use patterns like ^(the|a|an)$ to match determiners in a specific slot.
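
What a regex on a context position does can be pictured as a filter over concordance lines. The sketch below assumes a made-up row layout with keys L1, node and R1; these keys only mirror the slot names and are not an actual NTS export format.

```python
import re

# Hypothetical concordance rows; not an actual NTS data structure.
rows = [
    {"L1": "the", "node": "search", "R1": "function"},
    {"L1": "they", "node": "search", "R1": "for"},
    {"L1": "an", "node": "interface", "R1": "for"},
    {"L1": "another", "node": "interface", "R1": "is"},
]

# Anchored at both ends so that only the determiners themselves match.
determiner = re.compile(r"^(the|a|an)$")

# Keep only the rows whose L1 slot holds a determiner.
kept = [row for row in rows if determiner.search(row["L1"])]
print([row["L1"] for row in kept])
```

Without the anchors, "they" and "another" would also pass the filter, because each contains one of the listed forms as a substring.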

For the full PostgreSQL regular expression reference, see the documentation. If you need help constructing a complex pattern, please contact us.

Contact: