[[analysis]]
= Text analysis

:lucene-analysis-docs: https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis
:lucene-stop-word-link: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/resources/org/apache/lucene/analysis

[partintro]
--

_Text analysis_ is the process of converting unstructured text, like
the body of an email or a product description, into a structured format that's
optimized for search.

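As a rough illustration of what "converting unstructured text into a structured
format" can mean, the following Python sketch splits text into lowercase
tokens. The `analyze` function and its splitting rule are hypothetical and
exist only for this example; they are not how {es} implements analysis, and
real analyzers are configurable as described in the pages of this section.

[source,python]
----
import re


def analyze(text):
    """Toy analyzer (hypothetical): split text into tokens and lowercase them.

    Assumption for this sketch: a token is any run of ASCII letters or digits.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [token.lower() for token in tokens]


# Differently formatted inputs reduce to the same list of tokens,
# which is what makes them comparable at search time.
email_body = "Quick brown FOX!"
product_description = "quick, brown fox"

print(analyze(email_body))
print(analyze(product_description))
----

Because both inputs reduce to the same tokens, a search for one form can match
text written in the other.
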
[discrete]
[[when-to-configure-analysis]]
=== When to configure text analysis

{es} performs text analysis when indexing or searching <<text,`text`>> fields.

If your index doesn't contain `text` fields, no further setup is needed; you can
skip the pages in this section.

However, if you use `text` fields or your text searches aren't returning results
as expected, configuring text analysis can often help. You should also look into
analysis configuration if you're using {es} to:

* Build a search engine
* Mine unstructured data
* Fine-tune search for a specific language
* Perform lexicographic or linguistic research

[discrete]
[[analysis-toc]]
=== In this section

* <<analysis-overview>>
* <<analysis-concepts>>
* <<configure-text-analysis>>
* <<analysis-analyzers>>
* <<analysis-tokenizers>>
* <<analysis-tokenfilters>>
* <<analysis-charfilters>>
* <<analysis-normalizers>>

--

include::analysis/overview.asciidoc[]

include::analysis/concepts.asciidoc[]

include::analysis/configure-text-analysis.asciidoc[]

include::analysis/analyzers.asciidoc[]

include::analysis/tokenizers.asciidoc[]

include::analysis/tokenfilters.asciidoc[]

include::analysis/charfilters.asciidoc[]

include::analysis/normalizers.asciidoc[]