- [[search-field-stats]]
- == Field stats API
- experimental[]
- The field stats API reports statistical properties of a field without
- executing a search, by looking up measurements that are natively available in
- the Lucene index. This is useful for exploring a dataset that you don't know
- much about. For example, it lets you create a histogram aggregation with
- meaningful intervals based on the min/max range of values.
- By default the field stats API executes on all indices, but it can also
- execute on specific indices.
- All indices:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/_field_stats?fields=rating"
- --------------------------------------------------
- Specific indices:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/index1,index2/_field_stats?fields=rating"
- --------------------------------------------------
- Supported request options:
- [horizontal]
- `fields`:: A list of fields to compute stats for.
- `level`:: Defines whether field stats are returned at a per-index level or at
- a cluster-wide level. Valid values are `indices` and `cluster` (default).
- Alternatively, the `fields` option can be defined in the request body:
- [source,js]
- --------------------------------------------------
- curl -XPOST "http://localhost:9200/_field_stats?level=indices" -d '{
-    "fields" : ["rating"]
- }'
- --------------------------------------------------
- This is equivalent to passing `fields=rating` as a query parameter.
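The request forms above differ only in the optional index list in the path and in the query parameters. A minimal Python sketch of that URL layout follows; the helper name `field_stats_url` and its `base` default are hypothetical and exist only for illustration, not as part of any client library.

```python
from urllib.parse import urlencode


def field_stats_url(fields, indices=None, level=None, base="http://localhost:9200"):
    """Build a _field_stats URL; `indices` and `level` are optional."""
    # Comma-join the index names into the path, or omit the segment for all indices.
    path = "/" + ",".join(indices) if indices else ""
    query = {"fields": ",".join(fields)}
    if level is not None:
        query["level"] = level
    # safe="," keeps the comma-separated field list unescaped in the URL.
    return f"{base}{path}/_field_stats?{urlencode(query, safe=',')}"
```

Calling `field_stats_url(["rating"], indices=["index1", "index2"])` reproduces the "specific indices" URL shown earlier.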
- [float]
- === Field statistics
- The field stats API is supported on string-based, number-based, and date-based fields and can return the following statistics per field:
- [horizontal]
- `max_doc`::
- The total number of documents.
- `doc_count`::
- The number of documents that have at least one term for this field, or -1 if
- this measurement isn't available on one or more shards.
- `density`::
- The percentage of documents that have at least one value for this field. This
- is a derived statistic based on `max_doc` and `doc_count`.
- `sum_doc_freq`::
- The sum of each term's document frequency in this field, or -1 if this
- measurement isn't available on one or more shards.
- Document frequency is the number of documents containing a particular term.
- `sum_total_term_freq`::
- The sum of the term frequencies of all terms in this field across all
- documents, or -1 if this measurement isn't available on one or more shards.
- Term frequency is the total number of occurrences of a term in a particular
- document and field.
- `min_value`::
- The lowest value in the field represented in a displayable form.
- `max_value`::
- The highest value in the field represented in a displayable form.
- NOTE: Documents marked as deleted (but not yet removed by the merge process)
- still affect all the mentioned statistics.
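Because `density` is derived from `max_doc` and `doc_count`, it can be recomputed on the client. The sketch below assumes the percentage is truncated to a whole number, which is inferred from the example responses in this document rather than stated by it, and assumes that an unavailable `doc_count` (-1) yields -1; the function name is hypothetical.

```python
def density(doc_count, max_doc):
    """Whole-number percentage of documents that have a value for the field."""
    # Assumption: propagate "not available" instead of computing a bogus ratio.
    if doc_count < 0 or max_doc <= 0:
        return -1
    # Assumption: the percentage is truncated, not rounded.
    return doc_count * 100 // max_doc
```

With `doc_count` 564633 and `max_doc` 1326564 this gives 42.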
- [float]
- === Cluster level field statistics example
- Request:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/_field_stats?fields=rating,answer_count,creation_date,display_name"
- --------------------------------------------------
- Response:
- [source,js]
- --------------------------------------------------
- {
-    "_shards": {
-       "total": 1,
-       "successful": 1,
-       "failed": 0
-    },
-    "indices": {
-       "_all": { <1>
-          "fields": {
-             "creation_date": {
-                "max_doc": 1326564,
-                "doc_count": 564633,
-                "density": 42,
-                "sum_doc_freq": 2258532,
-                "sum_total_term_freq": -1,
-                "min_value": "2008-08-01T16:37:51.513Z",
-                "max_value": "2013-06-02T03:23:11.593Z"
-             },
-             "display_name": {
-                "max_doc": 1326564,
-                "doc_count": 126741,
-                "density": 9,
-                "sum_doc_freq": 166535,
-                "sum_total_term_freq": 166616,
-                "min_value": "0",
-                "max_value": "정혜선"
-             },
-             "answer_count": {
-                "max_doc": 1326564,
-                "doc_count": 139885,
-                "density": 10,
-                "sum_doc_freq": 559540,
-                "sum_total_term_freq": -1,
-                "min_value": 0,
-                "max_value": 160
-             },
-             "rating": {
-                "max_doc": 1326564,
-                "doc_count": 437892,
-                "density": 33,
-                "sum_doc_freq": 1751568,
-                "sum_total_term_freq": -1,
-                "min_value": -14,
-                "max_value": 1277
-             }
-          }
-       }
-    }
- }
- --------------------------------------------------
- <1> The `_all` key indicates that it contains the field stats of all indices in the cluster.
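The introduction mentions choosing histogram intervals from the min/max range. A minimal Python sketch of that idea, applied to an abbreviated copy of the cluster-level response above, is shown here; the helper name `histogram_interval` and the default of ten target buckets are assumptions made for illustration.

```python
import math


def histogram_interval(field_stats, target_buckets=10):
    """Pick a whole-number interval so min..max spans about `target_buckets` buckets."""
    value_range = field_stats["max_value"] - field_stats["min_value"]
    # Never return an interval below 1, even for a constant field.
    return max(1, math.ceil(value_range / target_buckets))


# Abbreviated response: only the keys this sketch reads.
response = {
    "indices": {
        "_all": {
            "fields": {
                "rating": {"min_value": -14, "max_value": 1277},
                "answer_count": {"min_value": 0, "max_value": 160},
            }
        }
    }
}

fields = response["indices"]["_all"]["fields"]
rating_interval = histogram_interval(fields["rating"])
answer_count_interval = histogram_interval(fields["answer_count"])
```

The resulting number could then be supplied as the `interval` of a histogram aggregation on the same field. This sketch only handles numeric fields; date fields would need their string values parsed first.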
- [float]
- === Indices level field statistics example
- Request:
- [source,js]
- --------------------------------------------------
- curl -XGET "http://localhost:9200/_field_stats?fields=rating,answer_count,creation_date,display_name&level=indices"
- --------------------------------------------------
- Response:
- [source,js]
- --------------------------------------------------
- {
-    "_shards": {
-       "total": 1,
-       "successful": 1,
-       "failed": 0
-    },
-    "indices": {
-       "stack": { <1>
-          "fields": {
-             "creation_date": {
-                "max_doc": 1326564,
-                "doc_count": 564633,
-                "density": 42,
-                "sum_doc_freq": 2258532,
-                "sum_total_term_freq": -1,
-                "min_value": "2008-08-01T16:37:51.513Z",
-                "max_value": "2013-06-02T03:23:11.593Z"
-             },
-             "display_name": {
-                "max_doc": 1326564,
-                "doc_count": 126741,
-                "density": 9,
-                "sum_doc_freq": 166535,
-                "sum_total_term_freq": 166616,
-                "min_value": "0",
-                "max_value": "정혜선"
-             },
-             "answer_count": {
-                "max_doc": 1326564,
-                "doc_count": 139885,
-                "density": 10,
-                "sum_doc_freq": 559540,
-                "sum_total_term_freq": -1,
-                "min_value": 0,
-                "max_value": 160
-             },
-             "rating": {
-                "max_doc": 1326564,
-                "doc_count": 437892,
-                "density": 33,
-                "sum_doc_freq": 1751568,
-                "sum_total_term_freq": -1,
-                "min_value": -14,
-                "max_value": 1277
-             }
-          }
-       }
-    }
- }
- --------------------------------------------------
- <1> The `stack` key means it contains all field stats for the `stack` index.
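With `level=indices`, each index gets its own entry under `indices`, so a client that wants an overall range has to combine the entries itself. The following Python sketch shows one way to do that; the function name `overall_range` and the second index `stack_archive` with its values are hypothetical and are not taken from the response above.

```python
def overall_range(response, field):
    """Return (min, max) of `field` across every index in a level=indices response."""
    stats = [
        index["fields"][field]
        for index in response["indices"].values()
        if field in index["fields"]
    ]
    if not stats:
        return None  # no index reported this field
    return (
        min(s["min_value"] for s in stats),
        max(s["max_value"] for s in stats),
    )


# Abbreviated response; "stack_archive" is a made-up second index.
response = {
    "indices": {
        "stack": {"fields": {"rating": {"min_value": -14, "max_value": 1277}}},
        "stack_archive": {"fields": {"rating": {"min_value": -3, "max_value": 2040}}},
    }
}

rating_range = overall_range(response, "rating")
```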
- [float]
- === Field stats index constraints
- Field stats index constraints let you omit all field stats for indices that don't match the constraint. An index
- constraint can exclude an index's field stats based on the `min_value` and `max_value` statistics. This option is only
- useful if the `level` option is set to `indices`.
- For example, index constraints can be useful for finding the min and max value of a particular property of your data in
- a time-based scenario. The following request only returns field stats for the `answer_count` property for indices
- holding questions created in the year 2014:
- [source,js]
- --------------------------------------------------
- curl -XPOST "http://localhost:9200/_field_stats?level=indices" -d '{
-    "fields" : ["answer_count"], <1>
-    "index_constraints" : { <2>
-       "creation_date" : { <3>
-          "min_value" : { <4>
-             "gte" : "2014-01-01T00:00:00.000Z"
-          },
-          "max_value" : {
-             "lt" : "2015-01-01T00:00:00.000Z"
-          }
-       }
-    }
- }'
- --------------------------------------------------
- <1> The fields to compute and return field stats for.
- <2> The set of index constraints. Note that index constraints can be defined for fields that aren't listed in the `fields` option.
- <3> Index constraints for the field `creation_date`.
- <4> An index constraint on the `min_value` property of a field statistic.
- For a field, index constraints can be defined on the `min_value` statistic, the `max_value` statistic, or both.
- Each index constraint supports the following comparisons:
- [horizontal]
- `gte`:: Greater-than or equal to
- `gt`:: Greater-than
- `lte`:: Less-than or equal to
- `lt`:: Less-than
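To make the comparison semantics concrete, the Python sketch below evaluates an `index_constraints` object against one index's field stats on the client side. It mirrors the behaviour described in this section and is not the server implementation. The function name is hypothetical, and two assumptions are made: an index that lacks the constrained field does not match, and date values are ISO-8601 strings in an identical format, so plain string comparison orders them correctly.

```python
import operator

# Map each supported comparison name to its Python operator.
COMPARISONS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def matches_constraints(fields, index_constraints):
    """Return True if an index's field stats satisfy every constraint."""
    for field, by_stat in index_constraints.items():
        stats = fields.get(field)
        if stats is None:
            return False  # assumption: a missing field excludes the index
        for stat_name, comparisons in by_stat.items():  # "min_value" / "max_value"
            for name, bound in comparisons.items():
                if not COMPARISONS[name](stats[stat_name], bound):
                    return False
    return True


constraints = {
    "creation_date": {
        "min_value": {"gte": "2014-01-01T00:00:00.000Z"},
        "max_value": {"lt": "2015-01-01T00:00:00.000Z"},
    }
}

# Hypothetical index holding only 2014 data.
index_2014 = {
    "creation_date": {
        "min_value": "2014-03-01T08:00:00.000Z",
        "max_value": "2014-11-30T21:15:00.000Z",
    }
}
# Values taken from the `stack` index in the examples above.
index_stack = {
    "creation_date": {
        "min_value": "2008-08-01T16:37:51.513Z",
        "max_value": "2013-06-02T03:23:11.593Z",
    }
}

keeps_2014 = matches_constraints(index_2014, constraints)
keeps_stack = matches_constraints(index_stack, constraints)
```

Under these assumptions the 2014 index is kept, and the `stack` index is dropped because its `min_value` is earlier than the `gte` bound.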