How to use the NTS interface

The Search function of the NTS interface supports POSIX regular expressions. The most useful patterns for corpus research are explained below.

Note: You can choose regular expression search in the SEARCH TYPE box.

"A regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string." (source: PostgreSQL documentation)

Basic Patterns

Pattern	Description	Example	Matches
`.`	Any single character	`r.n`	run, ran, r1n
`*`	Zero or more of the preceding element	`.*able`	able, table, capable
`+`	One or more of the preceding element	`.+able`	table, capable — but not able
`?`	Zero or one of the preceding element (makes it optional)	`colou?r`	color, colour
`{m,n}`	Between m and n repetitions of the preceding element	`.{1,4}able`	cable, table, capable
`|`	Alternation (OR)	`was|were`	was, were
`( … )`	Group sub-expression (treated as a single unit)	`(pre|re)view`	preview, review
`[ … ]`	Match any one character in the set; use `-` for ranges	`[aeiou]`	any single vowel
`[^ … ]`	Match any one character not in the set	`[^aeiou]`	any single character other than a lowercase vowel (consonants, but also digits, punctuation, and uppercase letters)
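
The quantifiers and grouping in the table above can be tried out locally before running a search. The sketch below uses Python's `re` module as a stand-in, which is an assumption: it is not the engine behind the NTS interface, although it agrees with it for these basic patterns. The token list and the `matching` helper are made up for illustration.

```python
import re

def matching(pattern, tokens):
    """Return the tokens in which the pattern matches somewhere."""
    return [t for t in tokens if re.search(pattern, t)]

# Hypothetical tokens, chosen to show where the patterns differ.
tokens = ["able", "table", "capable", "color", "colour",
          "preview", "review", "view"]

print(matching(r".*able", tokens))        # ['able', 'table', 'capable']
print(matching(r".+able", tokens))        # ['table', 'capable']
print(matching(r"colou?r", tokens))       # ['color', 'colour']
print(matching(r"(pre|re)view", tokens))  # ['preview', 'review']
```

The only difference between `.*able` and `.+able` is whether the bare word "able" is included.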

Anchors & Boundaries

Pattern	Description	Example	Matches
`^`	Beginning of string	`^the`	"the" only at the start of a token
`$`	End of string	`ing$`	tokens ending in "ing"
`^word$`	Exact match (anchored at both ends)	`^the$`	"the" and nothing else
`^(w1|w2|w3)$`	Match any one of several whole words (alternation with anchors). Use this for all forms of a lemma, spelling variants, or to compare multiple words in a single search.	`^(say|said|says|saying)$`	say, said, says, saying, and nothing else
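
Because an unanchored pattern matches anywhere in a token, leaving out `^` and `$` usually returns more than intended. The sketch below illustrates this with Python's `re` module (a local stand-in, not the NTS engine) and shows one way to build an anchored alternation from a word list. The helper `whole_word_pattern` and the token list are hypothetical.

```python
import re

def whole_word_pattern(forms):
    """Build ^(w1|w2|...)$ from a list of word forms."""
    # re.escape protects forms that contain regex metacharacters
    # (Python's escaping rules; assumed adequate for plain words).
    return "^(" + "|".join(re.escape(form) for form in forms) + ")$"

tokens = ["say", "says", "essay", "sayings", "said"]

# Unanchored: "say" may match anywhere inside a token.
unanchored = [t for t in tokens if re.search(r"say", t)]
print(unanchored)  # ['say', 'says', 'essay', 'sayings']

# Anchored alternation: only the listed forms, as whole tokens.
pattern = whole_word_pattern(["say", "said", "says", "saying"])
print(pattern)     # ^(say|said|says|saying)$
anchored = [t for t in tokens if re.search(pattern, t)]
print(anchored)    # ['say', 'says', 'said']
```

The printed pattern can be pasted into the search box as it stands.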

Character Class Shorthands

Shorthand	Description	Equivalent
`\d`	Any digit	`[[:digit:]]` i.e. `[0-9]`
`\D`	Any non-digit	`[^[:digit:]]`
`\w`	Any word character (letter, digit, underscore)	`[[:word:]]`, which for ASCII text is `[A-Za-z0-9_]`
`\W`	Any non-word character	`[^[:word:]]`
`\s`	Any whitespace character	`[[:space:]]`
`\S`	Any non-whitespace character	`[^[:space:]]`
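
The shorthands can be checked against a handful of tokens. This is a minimal sketch using Python's `re` module, not the NTS engine. The `re.ASCII` flag is passed so that `\d`, `\w` and `\s` are limited to ASCII, which is an assumption made to mirror the bracket equivalents in the table; the server may treat non-ASCII letters differently. The tokens are invented.

```python
import re

tokens = ["2024", "covid19", "hello", "snake_case", "don't", "two words"]

# Tokens containing at least one digit.
has_digit = [t for t in tokens if re.search(r"\d", t, re.ASCII)]

# Tokens made up only of letters, digits and underscores.
only_word_chars = [t for t in tokens if re.search(r"^\w+$", t, re.ASCII)]

# Tokens containing whitespace.
has_space = [t for t in tokens if re.search(r"\s", t, re.ASCII)]

print(has_digit)        # ['2024', 'covid19']
print(only_word_chars)  # ['2024', 'covid19', 'hello', 'snake_case']
print(has_space)        # ['two words']
```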

POSIX Character Classes (use inside brackets)

Class	Description	Example
`[[:alpha:]]`	Any letter	`[[:alpha:]]+` — one or more letters
`[[:digit:]]`	Any digit (0–9)	`[[:digit:]]{4}` — exactly four digits (e.g. a year)
`[[:lower:]]`	Any lower-case letter	`^[[:lower:]]+$` — all-lowercase tokens
`[[:upper:]]`	Any upper-case letter	`^[[:upper:]]+$` — all-uppercase tokens (e.g. acronyms)
`[[:punct:]]`	Any punctuation character	`[[:punct:]]` — matches `!`, `?`, `,` etc.
`[[:alnum:]]`	Any letter or digit	`^[[:alnum:]]+$` — tokens consisting only of letters and digits
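
Some tools used for testing patterns offline do not understand POSIX class names. Python's `re` module is one of them, so a pattern such as `^[[:upper:]]+$` has to be translated first. The sketch below shows one possible translation. The helper `posix_to_python` and the ASCII-only replacements are assumptions for illustration; on the server, class membership for non-ASCII letters may depend on the database locale.

```python
import re
import string

# ASCII-only stand-ins for the POSIX classes listed above (an assumption).
POSIX_CLASSES = {
    "[:alpha:]": "a-zA-Z",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:alnum:]": "a-zA-Z0-9",
    "[:punct:]": re.escape(string.punctuation),
}

def posix_to_python(pattern):
    """Replace POSIX class names so the pattern can be tried with Python's re."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace(name, chars)
    return pattern

print(posix_to_python("^[[:upper:]]+$"))  # ^[A-Z]+$
print(posix_to_python("[[:digit:]]{4}"))  # [0-9]{4}

tokens = ["NATO", "Nato", "1999", "ok!", "abc123"]
caps = [t for t in tokens if re.search(posix_to_python("^[[:upper:]]+$"), t)]
alnum = [t for t in tokens if re.search(posix_to_python("^[[:alnum:]]+$"), t)]
print(caps)   # ['NATO']
print(alnum)  # ['NATO', 'Nato', '1999', 'abc123']
```

The translation is only needed for local testing. In the NTS search box, the POSIX form can be entered as written.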

Practical Examples for Corpus Linguistics

Task	Pattern	Explanation
Words ending in -ing	`.*ing$`	Any characters followed by "ing" at the end of the token
Words starting with un-	`^un.*`	Starts with "un" followed by anything
Spelling variation: color/colour	`colou?r`	The "u" is optional
Suffix alternation: -ise/-ize	`.*i[sz]e$`	Either "s" or "z" before final "e"
First-person singular pronouns	`^(I|me|my|mine)$`	Exact match of any listed form
All inflected forms of a lemma (e.g. say)	`^(say|said|says|saying)$`	Alternation inside `(...)` with `^…$` anchors returns only those exact word forms. Works for spelling variants too: `^(color|colour|colors|colours)$`.
Contracted forms with apostrophe	`.*n't$`	don't, won't, can't, etc.
Words of exactly 3 letters	`^[[:alpha:]]{3}$`	Exactly three letters, anchored
Tokens containing digits	`\d`	Any token with at least one digit
ALL CAPS tokens (e.g. acronyms)	`^[[:upper:]]{2,}$`	Two or more uppercase letters only
Verb forms: go/goes/going/gone/went	`^(go|goes|going|gone|went)$`	Alternation with anchors for exact match
Phrasal verb particle in context position	`^(up|out|off|on|in|down|away|back)$`	Common particles (use in context filter with regex toggle)
Reduplicated forms	`^(.+)\1$`	Back reference: same sequence repeated (e.g. "mama", "byebye")
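
Several of the patterns above use syntax that Python's `re` module shares with the NTS engine, so they can be sanity-checked offline. The sketch below runs a selection of them over an invented token list; the task labels and the `results` dictionary are only for illustration, and the patterns that rely on POSIX classes are left out because Python does not support that notation.

```python
import re

# Hypothetical tokens covering the cases in the table.
tokens = ["running", "undo", "realise", "realize", "don't", "mama",
          "byebye", "went", "NASA", "cat", "b4"]

patterns = {
    "ends in -ing": r".*ing$",
    "starts with un-": r"^un.*",
    "-ise/-ize": r".*i[sz]e$",
    "contraction": r".*n't$",
    "contains digit": r"\d",
    "reduplicated": r"^(.+)\1$",  # back reference to group 1
}

results = {}
for task, pattern in patterns.items():
    results[task] = [t for t in tokens if re.search(pattern, t)]
    print(f"{task}: {results[task]}")
```

With this token list, the reduplication pattern picks out "mama" and "byebye", because each consists of one sequence followed by an exact copy of itself.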

You can also toggle regex mode for individual context positions (L1–L5, R1–R5), allowing you to use patterns like `^(the|a|an)$` to match determiners in a specific slot.
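
What a regex filter on a context position does can be pictured with a small offline sketch. Everything about the data layout here is an assumption: the dictionaries with `L1`, `node` and `R1` keys are a hypothetical stand-in for concordance hits, not the format the NTS interface uses internally.

```python
import re

# Hypothetical concordance hits: node word plus one token of context per side.
hits = [
    {"L1": "the", "node": "house", "R1": "is"},
    {"L1": "my", "node": "house", "R1": "was"},
    {"L1": "a", "node": "house", "R1": "in"},
    {"L1": "another", "node": "house", "R1": "on"},
]

determiner = re.compile(r"^(the|a|an)$")

# Keep only hits whose L1 slot is exactly "the", "a" or "an".
filtered = [h for h in hits if determiner.search(h["L1"])]
print([h["L1"] for h in filtered])  # ['the', 'a']
```

The anchors matter here as well: without them, "another" would pass the filter because it contains "a", "an" and "the".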

For the full PostgreSQL regular expression reference, see the documentation. If you need help constructing a complex pattern, please contact us.

Contact:

General comments: Mikko [dot] Laitinen [at] uef [dot] fi
Technical comments: Mehrdad [dot] Salimi [at] uef [dot] fi
Technical comments: Masoud [dot] Fatemi [at] uef [dot] fi