Added warning messages about the dangers of pathological regexes to:
* pattern-replace charfilter
* pattern-capture and pattern-replace token filters
* pattern tokenizer
* pattern analyzer

Relates to #20038
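
One way to vet a pattern before it goes into analysis settings is to run it against sample text with a cap on how much work the matcher may do. The sketch below is only an illustration of that idea and is not something Elasticsearch or Lucene does: `BoundedRegex`, `CountingCharSequence`, `BudgetExceededException` and `findWithin` are hypothetical names. It relies on the fact that `java.util.regex` reads its input through `CharSequence.charAt`, so a wrapper can count reads and abort once a budget is exceeded. It bounds the amount of backtracking work, and makes no attempt to guard against a StackOverflowError from deep recursion.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for vetting patterns; not part of Elasticsearch or Lucene. */
final class BoundedRegex {

    /** Thrown when the matcher reads more characters than the budget allows. */
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(long budget) {
            super("regex matching exceeded " + budget + " character reads");
        }
    }

    /** Wraps the input and counts every charAt call made by the matcher. */
    static final class CountingCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long budget;
        private final long[] reads; // shared with any subsequences

        CountingCharSequence(CharSequence inner, long budget) {
            this(inner, budget, new long[1]);
        }

        private CountingCharSequence(CharSequence inner, long budget, long[] reads) {
            this.inner = inner;
            this.budget = budget;
            this.reads = reads;
        }

        @Override
        public char charAt(int index) {
            // Abort as soon as the matcher has done more work than allowed.
            if (++reads[0] > budget) {
                throw new BudgetExceededException(budget);
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new CountingCharSequence(inner.subSequence(start, end), budget, reads);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns whether the pattern is found, or throws once the read budget is spent. */
    static boolean findWithin(Pattern pattern, String input, long budget) {
        return pattern.matcher(new CountingCharSequence(input, budget)).find();
    }

    /** Builds a string of n copies of c (avoids String.repeat, which needs Java 11). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The default separator pattern from the docs, with a generous budget.
        System.out.println(findWithin(Pattern.compile("\\W+"), "The quick brown fox", 1_000));

        // A classic nested-quantifier pattern; whether it blows the budget
        // depends on the JDK version, so both outcomes are reported.
        try {
            boolean found = findWithin(Pattern.compile("(a+)+b"), repeat('a', 40), 100_000);
            System.out.println("finished within budget, found=" + found);
        } catch (BudgetExceededException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The budget is counted in character reads rather than wall-clock time so that the check behaves the same on fast and slow machines; a suitable budget for real text is a judgment call and would need tuning against representative documents.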

Clinton Gormley 9 years ago
commit 2f6d0119f1

+ 15 - 0
docs/reference/analysis/analyzers/pattern-analyzer.asciidoc

@@ -5,6 +5,21 @@ The `pattern` analyzer uses a regular expression to split the text into terms.
 The regular expression should match the *token separators*  not the tokens
 themselves. The regular expression defaults to `\W+` (or all non-word characters).
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern analyzer uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
+
 [float]
 === Definition
 

+ 14 - 0
docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc

@@ -5,6 +5,20 @@ The `pattern_replace` character filter uses a regular expression to match
 characters which should be replaced with the specified replacement string.
 The replacement string can refer to capture groups in the regular expression.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern replace character filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 [float]
 === Configuration
 

+ 14 - 0
docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc

@@ -7,6 +7,20 @@ Patterns are not anchored to the beginning and end of the string, so
 each pattern can match multiple times, and matches are allowed to
 overlap.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern capture token filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 For instance a pattern like :
 
 [source,js]

+ 14 - 0
docs/reference/analysis/tokenfilters/pattern_replace-tokenfilter.asciidoc

@@ -7,3 +7,17 @@ defined using the `pattern` parameter, and the replacement string can be
 provided using the `replacement` parameter (supporting referencing the
 original text, as explained
 http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
+
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern replace token filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================

+ 14 - 0
docs/reference/analysis/tokenizers/pattern-tokenizer.asciidoc

@@ -8,6 +8,20 @@ terms.
 The default pattern is `\W+`, which splits text whenever it encounters
 non-word characters.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern tokenizer uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 [float]
 === Example output