123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214 |
- [[index-modules-index-sorting]]
- == Index Sorting
- When creating a new index in Elasticsearch it is possible to configure how the Segments
- inside each Shard will be sorted. By default Lucene does not apply any sort.
- The `index.sort.*` settings define which fields should be used to sort the documents inside each Segment.
- [WARNING]
- nested fields are not compatible with index sorting because they rely on the assumption
- that nested documents are stored in contiguous doc ids, which can be broken by index sorting.
- An error will be thrown if index sorting is activated on an index that contains nested fields.
- For instance the following example shows how to define a sort on a single field:
- [source,console]
- --------------------------------------------------
- PUT my-index-000001
- {
- "settings": {
- "index": {
- "sort.field": "date", <1>
- "sort.order": "desc" <2>
- }
- },
- "mappings": {
- "properties": {
- "date": {
- "type": "date"
- }
- }
- }
- }
- --------------------------------------------------
- <1> This index is sorted by the `date` field
- <2> ... in descending order.
- It is also possible to sort the index by more than one field:
- [source,console]
- --------------------------------------------------
- PUT my-index-000001
- {
- "settings": {
- "index": {
- "sort.field": [ "username", "date" ], <1>
- "sort.order": [ "asc", "desc" ] <2>
- }
- },
- "mappings": {
- "properties": {
- "username": {
- "type": "keyword",
- "doc_values": true
- },
- "date": {
- "type": "date"
- }
- }
- }
- }
- --------------------------------------------------
- <1> This index is sorted by `username` first then by `date`
- <2> ... in ascending order for the `username` field and in descending order for the `date` field.
- Index sorting supports the following settings:
- `index.sort.field`::
- The list of fields used to sort the index.
- Only `boolean`, `numeric`, `date` and `keyword` fields with `doc_values` are allowed here.
- `index.sort.order`::
- The sort order to use for each field.
- The order option can have the following values:
- * `asc`: For ascending order
- * `desc`: For descending order.
- `index.sort.mode`::
- Elasticsearch supports sorting by multi-valued fields.
- The mode option controls what value is picked to sort the document.
- The mode option can have the following values:
- * `min`: Pick the lowest value.
- * `max`: Pick the highest value.
- `index.sort.missing`::
- The missing parameter specifies how docs which are missing the field should be treated.
- The missing value can have the following values:
- * `_last`: Documents without value for the field are sorted last.
- * `_first`: Documents without value for the field are sorted first.
- [WARNING]
- Index sorting can be defined only once at index creation. It is not allowed to add or update
- a sort on an existing index. Index sorting also has a cost in terms of indexing throughput since
- documents must be sorted at flush and merge time. You should test the impact on your application
- before activating this feature.
- [discrete]
- [[early-terminate]]
- === Early termination of search request
- By default in Elasticsearch a search request must visit every document that matches a query to
- retrieve the top documents sorted by a specified sort.
- Though when the index sort and the search sort are the same it is possible to limit
- the number of documents that should be visited per segment to retrieve the N top ranked documents globally.
- For example, let's say we have an index that contains events sorted by a timestamp field:
- [source,console]
- --------------------------------------------------
- PUT events
- {
- "settings": {
- "index": {
- "sort.field": "timestamp",
- "sort.order": "desc" <1>
- }
- },
- "mappings": {
- "properties": {
- "timestamp": {
- "type": "date"
- }
- }
- }
- }
- --------------------------------------------------
- <1> This index is sorted by timestamp in descending order (most recent first)
- You can search for the last 10 events with:
- [source,console]
- --------------------------------------------------
- GET /events/_search
- {
- "size": 10,
- "sort": [
- { "timestamp": "desc" }
- ]
- }
- --------------------------------------------------
- // TEST[continued]
- Elasticsearch will detect that the top docs of each segment are already sorted in the index
- and will only compare the first N documents per segment.
- The rest of the documents matching the query are collected to count the total number of results
- and to build aggregations.
- If you're only looking for the last 10 events and have no interest in
- the total number of documents that match the query you can set `track_total_hits`
- to false:
- [source,console]
- --------------------------------------------------
- GET /events/_search
- {
- "size": 10,
- "sort": [ <1>
- { "timestamp": "desc" }
- ],
- "track_total_hits": false
- }
- --------------------------------------------------
- // TEST[continued]
- <1> The index sort will be used to rank the top documents and each segment will early terminate the collection after the first 10 matches.
- This time, Elasticsearch will not try to count the number of documents and will be able to terminate the query
- as soon as N documents have been collected per segment.
- [source,console-result]
- --------------------------------------------------
- {
- "_shards": ...
- "hits" : { <1>
- "max_score" : null,
- "hits" : []
- },
- "took": 20,
- "timed_out": false
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
- // TESTRESPONSE[s/"took": 20,/"took": "$body.took",/]
- <1> The total number of hits matching the query is unknown because of early termination.
- NOTE: Aggregations will collect all documents that match the query regardless
- of the value of `track_total_hits`
- [[index-modules-index-sorting-conjunctions]]
- === Use index sorting to speed up conjunctions
- Index sorting can be useful in order to organize Lucene doc ids (not to be
- conflated with `_id`) in a way that makes conjunctions (a AND b AND ...) more
- efficient. In order to be efficient, conjunctions rely on the fact that if any
- clause does not match, then the entire conjunction does not match. By using
- index sorting, we can put documents that do not match together, which will
- help skip efficiently over large ranges of doc IDs that do not match the
- conjunction.
- This trick only works with low-cardinality fields. A rule of thumb is that
- you should sort first on fields that both have a low cardinality and are
- frequently used for filtering. The sort order (`asc` or `desc`) does not
- matter as we only care about putting values that would match the same clauses
- close to each other.
- For instance if you were indexing cars for sale, it might be interesting to
- sort by fuel type, body type, make, year of registration and finally mileage.
|