- [[mixing-exact-search-with-stemming]]
- === Mixing exact search with stemming
- When building a search application, stemming is often a must as it is desirable
- for a query on `skiing` to match documents that contain `ski` or `skis`. But
- what if a user wants to search for `skiing` specifically? The typical way to do
- this would be to use a <<multi-fields,multi-field>> in order to have the same
- content indexed in two different ways:
- [source,js]
- --------------------------------------------------
- PUT index
- {
-   "settings": {
-     "analysis": {
-       "analyzer": {
-         "english_exact": {
-           "tokenizer": "standard",
-           "filter": [
-             "lowercase"
-           ]
-         }
-       }
-     }
-   },
-   "mappings": {
-     "_doc": {
-       "properties": {
-         "body": {
-           "type": "text",
-           "analyzer": "english",
-           "fields": {
-             "exact": {
-               "type": "text",
-               "analyzer": "english_exact"
-             }
-           }
-         }
-       }
-     }
-   }
- }
- PUT index/_doc/1
- {
-   "body": "Ski resort"
- }
- PUT index/_doc/2
- {
-   "body": "A pair of skis"
- }
- POST index/_refresh
- --------------------------------------------------
- // CONSOLE
- With such a setup, searching for `ski` on `body` would return both documents:
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "query": {
-     "simple_query_string": {
-       "fields": [ "body" ],
-       "query": "ski"
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [source,js]
- --------------------------------------------------
- {
-   "took": 2,
-   "timed_out": false,
-   "_shards": {
-     "total": 1,
-     "successful": 1,
-     "skipped" : 0,
-     "failed": 0
-   },
-   "hits": {
-     "total": 2,
-     "max_score": 0.18232156,
-     "hits": [
-       {
-         "_index": "index",
-         "_type": "_doc",
-         "_id": "1",
-         "_score": 0.18232156,
-         "_source": {
-           "body": "Ski resort"
-         }
-       },
-       {
-         "_index": "index",
-         "_type": "_doc",
-         "_id": "2",
-         "_score": 0.18232156,
-         "_source": {
-           "body": "A pair of skis"
-         }
-       }
-     ]
-   }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 2,/"took": "$body.took",/]
- On the other hand, searching for `ski` on `body.exact` would only return
- document `1` since the analysis chain of `body.exact` does not perform
- stemming.
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "query": {
-     "simple_query_string": {
-       "fields": [ "body.exact" ],
-       "query": "ski"
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [source,js]
- --------------------------------------------------
- {
-   "took": 1,
-   "timed_out": false,
-   "_shards": {
-     "total": 1,
-     "successful": 1,
-     "skipped" : 0,
-     "failed": 0
-   },
-   "hits": {
-     "total": 1,
-     "max_score": 0.8025915,
-     "hits": [
-       {
-         "_index": "index",
-         "_type": "_doc",
-         "_id": "1",
-         "_score": 0.8025915,
-         "_source": {
-           "body": "Ski resort"
-         }
-       }
-     ]
-   }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 1,/"took": "$body.took",/]
- This is not something that is easy to expose to end users, as we would need to
- have a way to figure out whether they are looking for an exact match or not and
- redirect to the appropriate field accordingly. And what should happen if only
- parts of the query need to be matched exactly while other parts should still
- take stemming into account?
- Fortunately, the `query_string` and `simple_query_string` queries have a feature
- that solves this exact problem: `quote_field_suffix`. This tells Elasticsearch
- that the words that appear between quotes are to be redirected to a different
- field, as shown below:
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "query": {
-     "simple_query_string": {
-       "fields": [ "body" ],
-       "quote_field_suffix": ".exact",
-       "query": "\"ski\""
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]
- [source,js]
- --------------------------------------------------
- {
-   "took": 2,
-   "timed_out": false,
-   "_shards": {
-     "total": 1,
-     "successful": 1,
-     "skipped" : 0,
-     "failed": 0
-   },
-   "hits": {
-     "total": 1,
-     "max_score": 0.8025915,
-     "hits": [
-       {
-         "_index": "index",
-         "_type": "_doc",
-         "_id": "1",
-         "_score": 0.8025915,
-         "_source": {
-           "body": "Ski resort"
-         }
-       }
-     ]
-   }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 2,/"took": "$body.took",/]
- In the above case, since `ski` was between quotes, it was searched on the
- `body.exact` field due to the `quote_field_suffix` parameter, so only document
- `1` matched. This allows users to mix exact search with stemmed search as they
- like.
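- As a sketch of such a mixed query (an illustration only, whose response has
- not been verified here), the request below combines an unquoted term with a
- quoted one and sets `default_operator` to `and` so that both must match.
- Assuming the index and documents created above, `pairs` would be stemmed and
- searched on `body`, while `"skis"` would be searched as-is on `body.exact`, so
- only document `2` should match:
- [source,js]
- --------------------------------------------------
- GET index/_search
- {
-   "query": {
-     "simple_query_string": {
-       "fields": [ "body" ],
-       "quote_field_suffix": ".exact",
-       "default_operator": "and",
-       "query": "pairs \"skis\""
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[continued]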