
[DOCS] Add concepts section to analysis topic (#50801)

This helps the topic better match the structure of
our machine learning docs, e.g.
https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html

This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts'
child page, but I plan to add other concepts, such as 'Index time vs.
search time', with later PRs.
James Rodewig, 5 years ago
parent commit 0605eb2078

+ 1 - 1
docs/reference/analysis.asciidoc

@@ -144,7 +144,7 @@ looking for:
 
 include::analysis/overview.asciidoc[]
 
-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]
 
 include::analysis/testing.asciidoc[]
 

+ 5 - 13
docs/reference/analysis/anatomy.asciidoc

@@ -1,5 +1,5 @@
 [[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer
 
 An _analyzer_  -- whether built-in or custom -- is just a package which
 contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
 Elasticsearch also exposes the individual building blocks so that they can be
 combined to define new <<analysis-custom-analyzer,`custom`>> analyzers.
 
-[float]
-=== Character filters
+==== Character filters
 
 A _character filter_ receives the original text as a stream of characters and
 can transform the stream by adding, removing, or changing characters.  For
@@ -22,8 +21,7 @@ elements like `<b>` from the stream.
 An analyzer may have *zero or more* <<analysis-charfilters,character filters>>,
 which are applied in order.
 
-[float]
-=== Tokenizer
+==== Tokenizer
 
 A _tokenizer_  receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.
 
 An analyzer must have *exactly one* <<analysis-tokenizers,tokenizer>>.
 
-
-[float]
-=== Token filters
+==== Token filters
 
 A _token filter_ receives the token stream and may add, remove, or change
 tokens.  For example, a <<analysis-lowercase-tokenfilter,`lowercase`>> token
@@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
 each token.
 
 An analyzer may have *zero or more* <<analysis-tokenfilters,token filters>>,
-which are applied in order.
-
-
-
-
+which are applied in order.
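
Not part of the commit, but as a concrete illustration of how the three building blocks described in `anatomy.asciidoc` compose: a minimal sketch of a custom analyzer definition. The index name `my-index` and analyzer name `my_custom_analyzer` are illustrative, not from the docs being changed here.

[source,console]
----
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ], <1>
          "tokenizer": "standard", <2>
          "filter": [ "lowercase" ] <3>
        }
      }
    }
  }
}
----
<1> Zero or more character filters, applied in order.
<2> Exactly one tokenizer.
<3> Zero or more token filters, applied in order.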

+ 11 - 0
docs/reference/analysis/concepts.asciidoc

@@ -0,0 +1,11 @@
+[[analysis-concepts]]
+== Text analysis concepts
+++++
+<titleabbrev>Concepts</titleabbrev>
+++++
+
+This section explains the fundamental concepts of text analysis in {es}.
+
+* <<analyzer-anatomy>>
+
+include::anatomy.asciidoc[]
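
As a follow-up to the tokenizer section's point about token positions and character offsets, the `_analyze` API (covered by the existing `testing.asciidoc` include) can show the token stream an analyzer produces. A sketch, assuming the hypothetical `my_custom_analyzer` definition above exists on `my-index`:

[source,console]
----
POST /my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<b>Some HTML</b> text"
}
----

The response lists each emitted token with its `position` and `start_offset`/`end_offset` fields, which makes it easy to confirm that the `html_strip` character filter ran before tokenization and that the `lowercase` token filter changed the tokens afterward.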