Reelex (Reetz - Celex interface) note on "Pattern search".



Words can be searched by their orthographic representation, their 'DISC' (IPA) representation, or their 'CVC' representation, using so-called 'regular expressions'. If Celex gives more than one representation for a word, Reelex uses only the first one. This might be changed in future versions of Reelex.



Orthographic search
Upper / lower case differences in the spelling are ignored.
Umlauts are represented as 'ae', 'oe', 'ue' and the 'sharp s' is represented as 'ss' in Celex.
In English, there are entries with spaces, apostrophes and hyphens (bird's-eye view), with periods (a.m.), with slashes (A/C) and with commas (hop, skip, and jump).
In Dutch, there are entries with apostrophes and hyphens ('s-Gravendeel).
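The case and umlaut conventions above can be sketched in a few lines of Python. This is only an illustration of how a query might be folded to the form Celex uses; the function name normalize_orthography is hypothetical and is not part of Reelex.

```python
def normalize_orthography(query: str) -> str:
    """Fold a query to lower case with 'ae', 'oe', 'ue' and 'ss' spellings."""
    # Assumption: only the German umlauts and the sharp s need rewriting.
    replacements = {
        "ä": "ae", "ö": "oe", "ü": "ue",
        "Ä": "ae", "Ö": "oe", "Ü": "ue",
        "ß": "ss",
    }
    for char, substitute in replacements.items():
        query = query.replace(char, substitute)
    # Case differences are ignored, so compare everything in lower case.
    return query.lower()


print(normalize_orthography("Käse"))    # kaese
print(normalize_orthography("Götter"))  # goetter
```

Spaces, apostrophes, hyphens, periods, slashes and commas are left untouched, so entries such as "bird's-eye view" or "A/C" only change in case.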



DISC (IPA) search
In the present version of Reelex, the 'IPA' search uses the 'DISC' representation of Celex, given in the third column of the Celex 'phonology' files (in a later version, true IPA characters may be used). The DISC notation uses one symbol for each speech sound (including short and long vowels, diphthongs, and affricates). There is a PDF file which lists all DISC symbols in IPA and SAMPA notations. A simple way to find out about a specific DISC symbol is to search for a word in the orthographic notation and check the DISC transcription of the sound in question.
There are three different ways to give DISC search strings: (1) with syllable breaks marked by hyphens (-) and main stress marked by a simple apostrophe (') at the beginning of the stressed syllable, (2) with syllable breaks only, and (3) as the bare sequence of segments.
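As a rough illustration of how the three forms relate, the following Python sketch derives forms (2) and (3) from form (1) by removing the stress marks and then the hyphens. It assumes that neither the apostrophe nor the hyphen is used as a segment symbol; the function name disc_variants is hypothetical, and the example string is artificial rather than a real Celex entry.

```python
from typing import Tuple


def disc_variants(full: str) -> Tuple[str, str, str]:
    """Return (stress + syllable breaks, syllable breaks only, segments only)."""
    # Form (2): drop the main stress marks, keep the syllable breaks.
    with_breaks = full.replace("'", "")
    # Form (3): drop the syllable breaks as well.
    segments_only = with_breaks.replace("-", "")
    return full, with_breaks, segments_only


# Artificial two-syllable string with stress on the first syllable.
print(disc_variants("'+#-t&"))  # ("'+#-t&", '+#-t&', '+#t&')
```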
Here is a list of the least transparent DISC-to-IPA translations (see also the special symbols below). A complete list of all sounds can be downloaded from .
Example        SAMPA    DISC
Pferd          pf       +
Zahl           ts       =
Gin            dZ       _
Advantage      A:       #
Allroundman    O:       $
Käse           E:       )
Öl             |:       |
Götter         /        /
Ragtime        {        {
hat            a        &
Beginn         @        2
Parfum         l/~:     ^
Afront         O~:      ~




CVC search
The CVC search uses the sequence of consonants (C) and vowels (V) as given by Celex in column five of the 'phonology' files. Long vowels and diphthongs are represented by two vowel symbols (VV), and affricates by one consonant (C).
CVC patterns can be searched with or without syllable breaks.
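To make the mapping concrete, here is a small Python sketch that turns a DISC string into a CVC pattern. It does not read the Celex CVC column; it only mimics the rule stated above with a toy classification of a few DISC symbols taken from the table in the DISC section (diphthongs are left out). All names are hypothetical, every symbol not listed as a vowel is assumed to be a single consonant, and the second example string is artificial.

```python
# Toy classification, for illustration only (not the full DISC inventory).
LONG_VOWELS = set("#$)|")   # long vowels listed in the DISC table
SHORT_VOWELS = set("&/{")   # short vowels listed in the DISC table


def cvc_pattern(disc: str) -> str:
    """Map a DISC string to a CVC string, keeping syllable breaks."""
    out = []
    for symbol in disc:
        if symbol == "'":
            continue              # stress marks do not appear in CVC patterns
        if symbol == "-":
            out.append("-")       # keep the syllable break
        elif symbol in LONG_VOWELS:
            out.append("VV")      # long vowel: two vowel symbols
        elif symbol in SHORT_VOWELS:
            out.append("V")
        else:
            out.append("C")       # consonants and affricates: one C
    return "".join(out)


print(cvc_pattern("h&t"))                       # CVC
print(cvc_pattern("'+#-t&"))                    # CVV-CV
print(cvc_pattern("'+#-t&").replace("-", ""))   # CVVCV
```

The last line shows the same pattern without syllable breaks, which is the second way a CVC pattern can be searched.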



Special symbols for orthographic and DISC (IPA) patterns
There are some special pattern symbols for searching the orthographic and DISC (IPA) data (not in the minimal pairs part) for groups of sounds. They are written as capital letters preceded by a backslash. These special symbols are:
symbol    orthography                        IPA
\C        any consonant                      any consonant
\G        any double-consonant (e.g. pp)
\V        any single vowel                   any short vowel
\V\V      any two vowels                     any long vowel or diphthong
\F        any single or double vowel         any short or long vowel or diphthong
\X        any letter                         any sound
\Y        one or more letters                one or more sounds
\Z        zero or more letters               zero or more sounds
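One possible way to read the orthographic column of this table is as a translation into ordinary regular expressions. The Python sketch below is such a reading, not the actual Reelex code: the vowel set (a, e, i, o, u), the restriction of "letter" to a-z, and the names translate and matches are all assumptions made for the example.

```python
import re

VOWELS = "aeiou"                        # assumption: y counts as a consonant
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

SIMPLE = {
    "C": f"[{CONSONANTS}]",
    "V": f"[{VOWELS}]",
    "F": f"[{VOWELS}]{{1,2}}",          # single or double vowel
    "X": "[a-z]",
    "Y": "[a-z]+",
    "Z": "[a-z]*",
}


def translate(pattern: str) -> str:
    """Translate a pattern with special symbols into a regular expression."""
    parts = []
    doubles = 0
    # Tokens are either a special symbol (backslash + capital) or one character.
    for token in re.findall(r"\\[CGVFXYZ]|.", pattern):
        if len(token) == 2:
            letter = token[1]
            if letter == "G":
                # Double consonant: the same consonant twice (back-reference).
                doubles += 1
                name = f"g{doubles}"
                parts.append(f"(?P<{name}>[{CONSONANTS}])(?P={name})")
            else:
                parts.append(SIMPLE[letter])
        else:
            parts.append(re.escape(token))   # literal character
    return "".join(parts)


def matches(pattern: str, word: str) -> bool:
    """True if the whole word (case ignored) fits the pattern."""
    return re.fullmatch(translate(pattern), word.lower()) is not None


print(matches(r"\C\V\G\Y", "kipper"))  # True
print(matches(r"\C\V\G\Y", "kiper"))   # False
print(matches(r"b\V\Vk", "Book"))      # True
```

With this reading, \V\V needs no entry of its own, because two consecutive \V symbols already match any two vowels.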




Regular expressions
Patterns can be given as strings as they appear in the orthographic or DISC notation of Celex. Internally, Reelex uses so-called regular expressions as they are used in the scripting language Perl (which is used to run Reelex). The problem is that Celex's DISC notation itself uses some symbols that have a special meaning in regular expressions (e.g. ')', '{', '|', '$'), which limits the use of regular expressions in Reelex.
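The clash can be demonstrated with Python's re module, whose syntax treats these symbols in the same way as Perl for the purposes of this example. The sketch is not how Reelex handles the problem; it only shows why a raw DISC string cannot be used as a regular expression, and how escaping makes it a literal. The function name contains_literal and the example strings are hypothetical.

```python
import re


def contains_literal(fragment: str, transcription: str) -> bool:
    """Search for a DISC fragment literally, escaping regex metacharacters."""
    return re.search(re.escape(fragment), transcription) is not None


# Used raw, ')' is an unbalanced group delimiter and the pattern is rejected.
try:
    re.compile("k)z")
    raw_pattern_is_valid = True
except re.error:
    raw_pattern_is_valid = False
print(raw_pattern_is_valid)                    # False

# Used raw, '$' means "end of string", so '$t' can never match.
print(re.search("$t", "a$t") is not None)      # False

# Escaped, both fragments are found as plain symbol sequences.
print(contains_literal("k)z", "k)z&"))         # True
print(contains_literal("$t", "a$t"))           # True
```

The price of escaping everything is that no regular-expression operators remain available in the pattern, which is the limitation described above.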