[[regexp-syntax]]
==== Regular expression syntax

Regular expression queries are supported by the `regexp` and the `query_string`
queries.  The Lucene regular expression engine
is not Perl-compatible but supports a smaller range of operators.

[NOTE]
=====
We will not attempt to explain regular expressions, but
just explain the supported operators.
=====

===== Standard operators

Anchoring::
+
--

Most regular expression engines allow you to match any part of a string.
If you want the regexp pattern to start at the beginning of the string or
finish at the end of the string, then you have to _anchor_ it specifically,
using `^` to indicate the beginning or `$` to indicate the end.

Lucene's patterns are always anchored.  The pattern provided must match
the entire string. For string `"abcde"`:

    ab.*     # match
    abcd     # no match

--

Allowed characters::
+
--

Any Unicode characters may be used in the pattern, but certain characters
are reserved and must be escaped.  The standard reserved characters are:

....
. ? + * | { } [ ] ( ) " \
....

If you enable optional features (see below) then these characters may
also be reserved:

    # @ & < >  ~

Any reserved character can be escaped with a backslash, for example `"\*"`,
including a literal backslash character: `"\\"`.

Additionally, any characters (except double quotes) are interpreted literally
when surrounded by double quotes:

    john"@smith.com"

--

Match any character::
+
--

The period `"."` can be used to represent any character.  For string `"abcde"`:

    ab...   # match
    a.c.e   # match

--

One-or-more::
+
--

The plus sign `"+"` can be used to repeat the preceding shortest pattern
one or more times. For string `"aaabbb"`:

    a+b+        # match
    aa+bb+      # match
    a+.+        # match
    aa+bbb+     # match

--

Zero-or-more::
+
--

The asterisk `"*"` can be used to match the preceding shortest pattern
zero or more times.  For string `"aaabbb"`:

    a*b*        # match
    a*b*c*      # match
    .*bbb.*     # match
    aaa*bbb*    # match

--

Zero-or-one::
+
--

The question mark `"?"` makes the preceding shortest pattern optional. It
matches zero or one times.  For string `"aaabbb"`:

    aaa?bbb?    # match
    aaaa?bbbb?  # match
    .....?.?    # match
    aa?bb?      # no match

--

Min-to-max::
+
--

Curly brackets `"{}"` can be used to specify a minimum and (optionally)
a maximum number of times the preceding shortest pattern can repeat.  The
allowed forms are:

    {5}     # repeat exactly 5 times
    {2,5}   # repeat at least twice and at most 5 times
    {2,}    # repeat at least twice

For string `"aaabbb"`:

    a{3}b{3}        # match
    a{2,4}b{2,4}    # match
    a{2,}b{2,}      # match
    .{3}.{3}        # match
    a{4}b{4}        # no match
    a{4,6}b{4,6}    # no match
    a{4,}b{4,}      # no match

--

Grouping::
+
--

Parentheses `"()"` can be used to form sub-patterns. The quantity operators
listed above operate on the shortest previous pattern, which can be a group.
For string `"ababab"`:

    (ab)+       # match
    ab(ab)+     # match
    (..)+       # match
    (...)+      # match
    (ab)*       # match
    abab(ab)?   # match
    ab(ab)?     # no match
    (ab){3}     # match
    (ab){1,2}   # no match

--

Alternation::
+
--

The pipe symbol `"|"` acts as an OR operator. The match will succeed if
the pattern on either the left-hand side OR the right-hand side matches.
The alternation applies to the _longest pattern_, not the shortest.
For string `"aabb"`:

    aabb|bbaa   # match
    aacc|bb     # no match
    aa(cc|bb)   # match
    a+|b+       # no match
    a+b+|b+a+   # match
    a+(b|c)+    # match

--

Character classes::
+
--

Ranges of potential characters may be represented as character classes
by enclosing them in square brackets `"[]"`. A leading `^`
negates the character class. The allowed forms are:

    [abc]    # 'a' or 'b' or 'c'
    [a-c]    # 'a' or 'b' or 'c'
    [-abc]   # '-' or 'a' or 'b' or 'c'
    [abc\-]  # '-' or 'a' or 'b' or 'c'
    [^abc]   # any character except 'a' or 'b' or 'c'
    [^a-c]   # any character except 'a' or 'b' or 'c'
    [^-abc]  # any character except '-' or 'a' or 'b' or 'c'
    [^abc\-] # any character except '-' or 'a' or 'b' or 'c'

Note that the dash `"-"` indicates a range of characters, unless it is
the first character or it is escaped with a backslash.

For string `"abcd"`:

    ab[cd]+     # match
    [a-d]+      # match
    [^a-d]+     # no match

--

===== Optional operators

These operators are available by default as the `flags` parameter defaults to `ALL`.
Different flag combinations (concatenated with `"|"`) can be used to enable/disable
specific operators:

    {
        "regexp": {
            "username": {
                "value": "john~athon<1-5>",
                "flags": "COMPLEMENT|INTERVAL"
            }
        }
    }

Complement::
+
--

The complement is probably the most useful option. The shortest pattern that
follows a tilde `"~"` is negated.  For instance, `"ab~cd"` means:

* Starts with `a`
* Followed by `b`
* Followed by a string of any length that is anything but `c`
* Ends with `d`

For the string `"abcdef"`:

    ab~df     # match
    ab~cf     # match
    ab~cdef   # no match
    a~(cb)def # match
    a~(bc)def # no match

Enabled with the `COMPLEMENT` or `ALL` flags.

--

Interval::
+
--

The interval option enables the use of numeric ranges, enclosed by angle
brackets `"<>"`. For string `"foo80"`:

    foo<1-100>     # match
    foo<01-100>    # match
    foo<001-100>   # no match

Enabled with the `INTERVAL` or `ALL` flags.

--

Intersection::
+
--

The ampersand `"&"` joins two patterns in a way that both of them have to
match. For string `"aaabbb"`:

    aaa.+&.+bbb     # match
    aaa&bbb         # no match

Using this feature usually means that you should rewrite your regular
expression.

Enabled with the `INTERSECTION` or `ALL` flags.

--

Any string::
+
--

The at sign `"@"` matches any string in its entirety.  This could be combined
with the intersection and complement above to express ``everything except''.
For instance:

    @&~(foo.+)      # anything except string beginning with "foo"

Enabled with the `ANYSTRING` or `ALL` flags.

--
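
As a rough illustration of the anchoring rule described under the standard operators, the following sketch uses Python's `re.fullmatch` to mimic a pattern that must match the entire string. This is only an approximation: Python's `re` module is a different engine from Lucene's, the helper name `lucene_like_match` is hypothetical, and the optional operators (`~`, `<>`, `&`, `@`) are not supported by it.

```python
import re


def lucene_like_match(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches all of `text`.

    Assumption: only the standard operators are used. Python's `re` is not
    the Lucene engine; `re.fullmatch` merely imitates its implicit anchoring.
    """
    return re.fullmatch(pattern, text) is not None


# (pattern, string) pairs taken from the examples in this section.
examples = [
    ("ab.*", "abcde"),
    ("abcd", "abcde"),
    ("aa?bb?", "aaabbb"),
    ("(ab){1,2}", "ababab"),
    ("a+|b+", "aabb"),
]

for pattern, text in examples:
    result = "match" if lucene_like_match(pattern, text) else "no match"
    print(f"{pattern!r} against {text!r}: {result}")
```

By contrast, an unanchored search such as `re.search("abcd", "abcde")` would succeed, which is the behaviour most engines show by default and the one Lucene does not offer.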