snowball-analyzer.asciidoc 2.5 KB

[[analysis-snowball-analyzer]]
=== Snowball Analyzer

An analyzer of type `snowball` that uses the
<<analysis-standard-tokenizer,standard tokenizer>>, with the
<<analysis-standard-tokenfilter,standard filter>>,
<<analysis-lowercase-tokenfilter,lowercase filter>>,
<<analysis-stop-tokenfilter,stop filter>>, and
<<analysis-snowball-tokenfilter,snowball filter>>.

The Snowball Analyzer is a stemming analyzer from Lucene that is
based on the Snowball project from
http://snowball.tartarus.org[snowball.tartarus.org].
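
The order of the stages described above can be illustrated with a small Python sketch. It is only a toy model: the tokenizer is reduced to a regular expression, the standard filter is left out for brevity, the stopword set is a tiny made-up sample, and `toy_stem` is a hypothetical stand-in that strips a few suffixes. It is not the Snowball algorithm, and none of these names belong to Lucene or Elasticsearch.

```python
import re

# Tiny illustrative stopword set (an assumption, not the real default list).
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def toy_stem(token):
    """Hypothetical stand-in for the stemming stage.

    Strips a few common English suffixes. This is NOT the Snowball
    algorithm; it only marks where stemming happens in the chain.
    """
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    tokens = re.findall(r"\w+", text)                   # tokenizer (simplified)
    tokens = [t.lower() for t in tokens]                # lowercase filter
    tokens = [t for t in tokens if t not in STOPWORDS]  # stop filter
    return [toy_stem(t) for t in tokens]                # stemming filter (toy)


print(analyze("The cats jumped"))
```

Stopwords are removed before stemming, so the stopword list has to contain unstemmed, lowercased forms.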

Sample usage:

[source,js]
--------------------------------------------------
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "snowball",
                    "language" : "English"
                }
            }
        }
    }
}
--------------------------------------------------

The `language` parameter accepts the same values as the
<<analysis-snowball-tokenfilter,snowball filter>> and defaults to
`English`. Note that not all language analyzers provide a default set
of stopwords.

The `stopwords` parameter can be used to provide stopwords for
languages that have no defaults, or to replace the default set
with a custom list. See <<analysis-stop-analyzer,Stop Analyzer>>
for more details. Default stopword sets for many of these
languages are available, for instance, in the Lucene
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/[analysis resources]
and
https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball[snowball resources].

A sample configuration (in YAML format) specifying Swedish with
stopwords:

[source,yaml]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_analyzer:
                type: snowball
                language: Swedish
                stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"
--------------------------------------------------
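
When the settings are generated by a script rather than written by hand, the same structure can be assembled from a language name and a stopword list. The following Python sketch shows one way to do that. The helper `snowball_analyzer_settings` and its parameters are hypothetical and are not part of any client library; the keys mirror the samples above, and the stopwords are joined into a comma-separated string as in the YAML sample.

```python
import json


def snowball_analyzer_settings(name, language="English", stopwords=None):
    """Build a settings body for a `snowball` analyzer.

    Hypothetical helper: `name` is the analyzer name, `language` is
    passed through unchanged, and `stopwords` is an optional list of words.
    """
    analyzer = {"type": "snowball", "language": language}
    if stopwords is not None:
        # Comma-separated string, matching the YAML sample.
        analyzer["stopwords"] = ",".join(stopwords)
    return {"index": {"analysis": {"analyzer": {name: analyzer}}}}


body = snowball_analyzer_settings("my_analyzer", "Swedish", ["och", "det", "att"])
# ensure_ascii=False keeps characters such as "å" readable in the output.
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Leaving `stopwords` as `None` omits the key entirely, so the analyzer falls back to whatever default stopword set exists for the chosen language.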