
Docs: Refactored the mapping meta-fields docs

Clinton Gormley, 10 years ago
parent
commit
c56ce0e242

+ 1 - 1
docs/reference/index-modules/mapper.asciidoc

@@ -52,7 +52,7 @@ creating a new index.
 [float]
 === Mapper settings
 
-`index.mapper.dynamic` (_static_)::
+`index.mapper.dynamic` (_dynamic_)::
 
     Dynamic creation of mappings for unmapped types can be completely
     disabled by setting `index.mapper.dynamic` to `false`.

+ 61 - 10
docs/reference/mapping/fields.asciidoc

@@ -1,27 +1,78 @@
 [[mapping-fields]]
-== Fields
+== Meta-Fields
 
-Each mapping has a number of fields associated with it
-which can be used to control how the document metadata
-(eg <<mapping-all-field>>) is indexed.
+Each document has metadata associated with it, such as the `_index`, mapping
+<<mapping-type-field,`_type`>>, and `_id` meta-fields.  The behaviour of some of these meta-fields
+can be customised when a mapping type is created.
 
-include::fields/uid-field.asciidoc[]
+The meta-fields are:
 
-include::fields/id-field.asciidoc[]
+[horizontal]
+<<mapping-index-field,`_index`>>::
+
+    The index to which the document belongs.
+
+<<mapping-uid-field,`_uid`>>::
+
+    A composite field consisting of the `_type` and the `_id`.
+
+<<mapping-type-field,`_type`>>::
+
+    The document's <<all-mapping-types,mapping type>>.
+
+<<mapping-id-field,`_id`>>::
+
+    The document's ID.
+
+<<mapping-source-field,`_source`>>::
+
+    The original JSON representing the body of the document.
+
+<<mapping-all-field,`_all`>>::
+
+    A _catch-all_ field that indexes the values of all other fields.
+
+<<mapping-field-names-field,`_field_names`>>::
+
+    All fields in the document which contain non-null values.
+
+<<mapping-parent-field,`_parent`>>::
+
+    Used to create a parent-child relationship between two mapping types.
+
+<<mapping-routing-field,`_routing`>>::
+
+    A custom routing value which routes a document to a particular shard.
+
+<<mapping-size-field,`_size`>>::
+
+    The size of the `_source` field in bytes.
+
+<<mapping-timestamp-field,`_timestamp`>>::
+
+    A timestamp associated with the document, either specified manually or auto-generated.
+
+<<mapping-ttl-field,`_ttl`>>::
+
+    How long a document should live before it is automatically deleted.
+
+include::fields/index-field.asciidoc[]
+
+include::fields/uid-field.asciidoc[]
 
 include::fields/type-field.asciidoc[]
 
+include::fields/id-field.asciidoc[]
+
 include::fields/source-field.asciidoc[]
 
 include::fields/all-field.asciidoc[]
 
-include::fields/parent-field.asciidoc[]
-
 include::fields/field-names-field.asciidoc[]
 
-include::fields/routing-field.asciidoc[]
+include::fields/parent-field.asciidoc[]
 
-include::fields/index-field.asciidoc[]
+include::fields/routing-field.asciidoc[]
 
 include::fields/size-field.asciidoc[]
 

+ 401 - 63
docs/reference/mapping/fields/all-field.asciidoc

@@ -1,78 +1,416 @@
 [[mapping-all-field]]
-=== `_all`
+=== `_all` field
 
-The idea of the `_all` field is that it includes the text of one or more
-other fields within the document indexed. It can come very handy
-especially for search requests, where we want to execute a search query
-against the content of a document, without knowing which fields to
-search on. This comes at the expense of CPU cycles and index size.
+The `_all` field is a special _catch-all_ field which concatenates the values
+of all of the other fields into one big string, which is then
+<<analysis,analyzed>> and indexed, but not stored.  This means that it can be
+searched, but not retrieved.
 
-The `_all` fields can be completely disabled. Explicit field mappings and
-object mappings can be excluded / included in the `_all` field. By
-default, it is enabled and all fields are included in it for ease of
-use.
+The `_all` field allows you to search for values in documents without knowing
+which field contains the value.  This makes it a useful option when getting
+started with a new dataset. For instance:
 
-When disabling the `_all` field, it is a good practice to set
-`index.query.default_field` to a different value (for example, if you
-have a main "message" field in your data, set it to `message`).
+[source,js]
+--------------------------------
+PUT my_index/user/1 <1>
+{
+  "first_name":    "John",
+  "last_name":     "Smith",
+  "date_of_birth": "1970-10-24"
+}
+
+GET my_index/_search
+{
+  "query": {
+    "match": {
+      "_all": "john smith 1970"
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+<1> The `_all` field will contain the terms: [ `"john"`, `"smith"`, `"1970"`, `"10"`, `"24"` ]
+
+[NOTE]
+.All values treated as strings
+=============================================================================
 
-One of the nice features of the `_all` field is that it takes into
-account specific fields boost levels. Meaning that if a title field is
-boosted more than content, the title (part) in the `_all` field will
-mean more than the content (part) in the `_all` field.
+The `date_of_birth` field in the above example is recognised as a `date` field
+and so will index a single term representing `1970-10-24 00:00:00 UTC`. The
+`_all` field, however, treats all values as strings, so the date value is
+indexed as the three string terms: `"1970"`, `"10"`, `"24"`.
 
-Here is a sample mapping:
+It is important to note that the `_all` field combines the original values
+from each field as a string. It does not combine the _terms_ from each field.
+
+=============================================================================
+
+The `_all` field is just a <<string,`string`>> field, and accepts the same
+parameters that other string fields accept, including `analyzer`,
+`term_vector`, `index_options`, and `store`.
+
+The `_all` field can be useful, especially when exploring new data using
+simple filtering.  However, by concatenating field values into one big string,
+the `_all` field loses the distinction between short fields (more relevant)
+and long fields (less relevant). For use cases where search relevance is
+important, it is better to query individual fields specifically.
+
+The `_all` field is not free: it requires extra CPU cycles and uses more disk
+space. If not needed, it can be completely <<disabling-all-field,disabled>> or
+customised on a <<include-in-all,per-field basis>>.
+
+[[querying-all-field]]
+==== Using the `_all` field in queries
+
+The <<query-dsl-query-string-query,`query_string`>> and
+<<query-dsl-simple-query-string-query,`simple_query_string`>> queries query
+the `_all` field by default, unless another field is specified:
 
 [source,js]
---------------------------------------------------
-{
-    "person" : {
-        "_all" : {"enabled" : true},
-        "properties" : {
-            "name" : {
-                "type" : "object",
-                "dynamic" : false,
-                "properties" : {
-                    "first" : {"type" : "string", "store" : true , "include_in_all" : false},
-                    "last" : {"type" : "string", "index" : "not_analyzed"}
-                }
-            },
-            "address" : {
-                "type" : "object",
-                "include_in_all" : false,
-                "properties" : {
-                    "first" : {
-                        "properties" : {
-                            "location" : {"type" : "string", "store" : true}
-                        }
-                    },
-                    "last" : {
-                        "properties" : {
-                            "location" : {"type" : "string"}
-                        }
-                    }
-                }
-            },
-            "simple1" : {"type" : "long", "include_in_all" : true},
-            "simple2" : {"type" : "long", "include_in_all" : false}
+--------------------------------
+GET _search
+{
+  "query": {
+    "query_string": {
+      "query": "john smith 1970"
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+The same goes for the `?q=` parameter in <<search-uri-request, URI search
+requests>> (which is rewritten to a `query_string` query internally):
+
+[source,js]
+--------------------------------
+GET _search?q=john+smith+1970
+--------------------------------
+
+Other queries, such as the <<query-dsl-match-query,`match`>> and
+<<query-dsl-term-query,`term`>> queries require you to specify
+the `_all` field explicitly, as per the
+<<mapping-all-field,first example>>.
+
+[[disabling-all-field]]
+==== Disabling the `_all` field
+
+The `_all` field can be completely disabled per-type by setting `enabled` to
+`false`:
+
+[source,js]
+--------------------------------
+PUT my_index
+{
+  "mappings": {
+    "type_1": { <1>
+      "properties": {...}
+    },
+    "type_2": { <2>
+      "_all": {
+        "enabled": false
+      },
+      "properties": {...}
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+<1> The `_all` field in `type_1` is enabled.
+<2> The `_all` field in `type_2` is completely disabled.
+
+If the `_all` field is disabled, then URI search requests and the
+`query_string` and `simple_query_string` queries will not be able to use it
+for queries (see <<querying-all-field>>).  You can configure them to use a
+different field with the `index.query.default_field` setting:
+
+[source,js]
+--------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "_all": {
+        "enabled": false <1>
+      },
+      "properties": {
+        "content": {
+          "type": "string"
+        }
+      }
+    }
+  },
+  "settings": {
+    "index.query.default_field": "content" <2>
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+<1> The `_all` field is disabled for the `my_type` type.
+<2> The `query_string` query will default to querying the `content` field in this index.
+
+[[include-in-all]]
+==== Including specific fields in `_all`
+
+Individual fields can be included or excluded from the `_all` field with the
+`include_in_all` setting, which defaults to `true`:
+
+[source,js]
+--------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "properties": {
+        "title": { <1>
+          "type": "string"
+        },
+        "content": { <1>
+          "type": "string"
+        },
+        "date": { <2>
+          "type": "date",
+          "include_in_all": false
         }
+      }
     }
+  }
 }
---------------------------------------------------
+--------------------------------
+// AUTOSENSE
 
-The `_all` fields allows for `store`, `term_vector` and `analyzer` (with
-specific `analyzer` and `search_analyzer`) to be set.
+<1> The `title` and `content` fields will be included in the `_all` field.
+<2> The `date` field will not be included in the `_all` field.
+
+The `include_in_all` parameter can also be set at the type level and on
+<<mapping-object-type,`object`>> or <<mapping-nested-type,`nested`>> fields,
+in which case all sub-fields inherit that setting.  For instance:
+
+[source,js]
+--------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "include_in_all": false, <1>
+      "properties": {
+        "title":          { "type": "string" },
+        "author": {
+          "include_in_all": true, <2>
+          "properties": {
+            "first_name": { "type": "string" },
+            "last_name":  { "type": "string" }
+          }
+        },
+        "editor": {
+          "properties": {
+            "first_name": { "type": "string" }, <3>
+            "last_name":  { "type": "string", "include_in_all": true } <3>
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+<1> All fields in `my_type` are excluded from `_all`.
+<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
+<3> Only the `editor.last_name` field is included in `_all`.
+    The `editor.first_name` inherits the type-level setting and is excluded.
+
+[[all-field-and-boosting]]
+==== Index boosting and the `_all` field
+
+Individual fields can be _boosted_  at index time, with the `boost` parameter.
+The `_all` field takes these boosts into account:
+
+[source,js]
+--------------------------------
+PUT myindex
+{
+  "mappings": {
+    "mytype": {
+      "properties": {
+        "title": { <1>
+          "type": "string",
+          "boost": 2
+        },
+        "content": { <1>
+          "type": "string"
+        }
+      }
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+<1> When querying the `_all` field, words that originated in the
+    `title` field are twice as relevant as words that originated in
+    the `content` field.
+
+WARNING: Using index-time boosting with the `_all` field has a significant
+impact on query performance. Usually the better solution is to query fields
+individually, with optional query time boosting.
+
+
+[[custom-all-fields]]
+==== Custom `_all` fields
+
+While there is only a single `_all` field per index, the <<copy-to,`copy_to`>>
+parameter allows the creation of multiple __custom `_all` fields__. For
+instance, `first_name` and `last_name` fields can be combined together into
+the `full_name` field:
+
+[source,js]
+--------------------------------
+PUT myindex
+{
+  "mappings": {
+    "mytype": {
+      "properties": {
+        "first_name": {
+          "type":    "string",
+          "copy_to": "full_name" <1>
+        },
+        "last_name": {
+          "type":    "string",
+          "copy_to": "full_name" <1>
+        },
+        "full_name": {
+          "type":    "string"
+        }
+      }
+    }
+  }
+}
+
+PUT myindex/mytype/1
+{
+  "first_name": "John",
+  "last_name": "Smith"
+}
+
+GET myindex/_search
+{
+  "query": {
+    "match": {
+      "full_name": "John Smith"
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+<1> The `first_name` and `last_name` values are copied to the `full_name` field.
+
+[[highlighting-all-field]]
+==== Highlighting and the `_all` field
+
+A field can only be used for <<search-request-highlighting,highlighting>>  if
+the original string value is available, either from the
+<<mapping-source-field,`_source`>>  field or as a stored field.
+
+The `_all` field is not present in the `_source` field and it is not stored by
+default, and so cannot be highlighted. There are two options. Either
+<<all-field-store,store the `_all` field>> or highlight the
+<<all-highlight-fields,original fields>>.
+
+[[all-field-store]]
+===== Store the `_all` field
+
+If `store` is set to `true`, then the original field value is retrievable and
+can be highlighted:
+
+[source,js]
+--------------------------------
+PUT myindex
+{
+  "mappings": {
+    "mytype": {
+      "_all": {
+        "store": true
+      }
+    }
+  }
+}
+
+PUT myindex/mytype/1
+{
+  "first_name": "John",
+  "last_name": "Smith"
+}
+
+GET _search
+{
+  "query": {
+    "match": {
+      "_all": "John Smith"
+    }
+  },
+  "highlight": {
+    "fields": {
+      "_all": {}
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
+
+Of course, storing the `_all` field will use significantly more disk space
+and, because it is a combination of other fields, it may result in odd
+highlighting results.
+
+The `_all` field also accepts the `term_vector` and `index_options`
+parameters, allowing the use of the fast vector highlighter and the postings
+highlighter.
+
+[[all-highlight-fields]]
+===== Highlight original fields
+
+You can query the `_all` field, but use the original fields for highlighting as follows:
+
+[source,js]
+--------------------------------
+PUT myindex
+{
+  "mappings": {
+    "mytype": {
+      "_all": {}
+    }
+  }
+}
+
+PUT myindex/mytype/1
+{
+  "first_name": "John",
+  "last_name": "Smith"
+}
+
+GET _search
+{
+  "query": {
+    "match": {
+      "_all": "John Smith" <1>
+    }
+  },
+  "highlight": {
+    "fields": {
+      "*_name": { <2>
+        "require_field_match": "false"  <3>
+      }
+    }
+  }
+}
+--------------------------------
+// AUTOSENSE
 
-[float]
-[[highlighting]]
-==== Highlighting
+<1> The query inspects the `_all` field to find matching documents.
+<2> Highlighting is performed on the two name fields, which are available from the `_source`.
+<3> The query wasn't run against the name fields, so set `require_field_match` to `false`.
 
-For any field to allow
-<<search-request-highlighting,highlighting>> it has
-to be either stored or part of the `_source` field. By default the `_all`
-field does not qualify for either, so highlighting for it does not yield
-any data.
 
-Although it is possible to `store` the `_all` field, it is basically an
-aggregation of all fields, which means more data will be stored, and
-highlighting it might produce strange results.
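The `include_in_all` inheritance described in the new all-field text (type level, then object level, then individual field) can be sketched as a small recursive walk over a mapping. This is only an illustration of the rule as the diff states it, not Elasticsearch's implementation; the function name `fields_in_all` is hypothetical, and the mapping is copied from the `my_type` example above.

```python
def fields_in_all(mapping, inherited=True, prefix=""):
    """Return full names of leaf fields whose values would be copied to _all.

    Hypothetical helper: it mirrors the inheritance rule described in the
    docs (the default is true, and a setting on a type or object is
    inherited by its sub-fields unless they override it).
    """
    # A setting at this level overrides whatever was inherited from above.
    include = mapping.get("include_in_all", inherited)
    included = []
    for name, field in mapping.get("properties", {}).items():
        path = prefix + name
        if "properties" in field:
            # Object field: recurse and pass the current setting down.
            included += fields_in_all(field, include, path + ".")
        elif field.get("include_in_all", include):
            included.append(path)
    return included


# The my_type mapping from the example in the diff.
my_type = {
    "include_in_all": False,
    "properties": {
        "title": {"type": "string"},
        "author": {
            "include_in_all": True,
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
        "editor": {
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string", "include_in_all": True},
            },
        },
    },
}

print(fields_in_all(my_type))
```

With the example mapping, the walk selects `author.first_name`, `author.last_name`, and `editor.last_name`, which agrees with the three callouts in the documentation.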

+ 53 - 4
docs/reference/mapping/fields/field-names-field.asciidoc

@@ -1,6 +1,55 @@
 [[mapping-field-names-field]]
-=== `_field_names`
+=== `_field_names` field
+
+The `_field_names` field indexes the names of every field in a document that
+contains any value other than `null`.  This field is used by the
+<<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>>
+queries to find documents that either have or don't have any non-+null+ value
+for a particular field.
+
+The value of the `_field_names` field is accessible in queries, aggregations, and
+scripts:
+
+[source,js]
+--------------------------
+# Example documents
+PUT my_index/my_type/1
+{
+  "title": "This is a document"
+}
+
+PUT my_index/my_type/2
+{
+  "title": "This is another document",
+  "body": "This document has a body"
+}
+
+GET my_index/_search
+{
+  "query": {
+    "terms": {
+      "_field_names": [ "title" ] <1>
+    }
+  },
+  "aggs": {
+    "Field names": {
+      "terms": {
+        "field": "_field_names", <2>
+        "size": 10
+      }
+    }
+  },
+  "script_fields": {
+    "Field names": {
+      "script": "doc['_field_names']" <3>
+    }
+  }
+}
+
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_field_names` field (also see the <<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>> queries)
+<2> Aggregating on the `_field_names` field
+<3> Accessing the `_field_names` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
 
-The `_field_names` field indexes the field names of a document, which can later
-be used to search for documents based on the fields that they contain typically
-using the `exists` and `missing` filters.
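As a rough model of what the new `_field_names` text describes, the sketch below records the names of fields holding a non-null value and uses that list to answer `exists`-style and `missing`-style questions. It assumes flat documents only and ignores arrays and nested objects; the helper names and the document IDs used as dictionary keys are hypothetical.

```python
def field_names(doc):
    """Names of top-level fields that hold a value other than None (null)."""
    return sorted(name for name, value in doc.items() if value is not None)


def exists(docs, field):
    """IDs of documents that have a non-null value for `field`."""
    return sorted(i for i, d in docs.items() if field in field_names(d))


def missing(docs, field):
    """IDs of documents that have no non-null value for `field`."""
    return sorted(i for i, d in docs.items() if field not in field_names(d))


# Two documents shaped like the ones in the example, keyed by made-up IDs.
docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document",
          "body": "This document has a body"},
}

print(exists(docs, "body"))   # documents with a body
print(missing(docs, "body"))  # documents without one
```

Precomputing the list of field names per document is what lets both lookups avoid inspecting the field values themselves.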

+ 40 - 7
docs/reference/mapping/fields/id-field.asciidoc

@@ -1,11 +1,44 @@
 [[mapping-id-field]]
-=== `_id`
+=== `_id` field
 
-Each document indexed is associated with an id and a type. The `_id`
-field allows accessing only the id of a document.
+Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
+<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>.  The
+`_id` field is not indexed as its value can be derived automatically from the
+<<mapping-uid-field,`_uid`>> field.
 
-Note, even though the `_id` is not indexed, all the APIs still work
-(since they work with the `_uid` field), as well as fetching by ids
-using `term`, `terms` or `prefix` queries/filters (including the
-specific `ids` query/filter).
+The value of the `_id` field is accessible in queries and scripts, but _not_
+in aggregations or when sorting, where the <<mapping-uid-field,`_uid`>> field
+should be used instead:
+
+[source,js]
+--------------------------
+# Example documents
+PUT my_index/my_type/1
+{
+  "text": "Document with ID 1"
+}
+
+PUT my_index/my_type/2
+{
+  "text": "Document with ID 2"
+}
+
+GET my_index/_search
+{
+  "query": {
+    "terms": {
+      "_id": [ "1", "2" ] <1>
+    }
+  },
+  "script_fields": {
+    "UID": {
+      "script": "doc['_id']" <2>
+    }
+  }
+}
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
+<2> Accessing the `_id` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
 

+ 49 - 8
docs/reference/mapping/fields/index-field.asciidoc

@@ -1,15 +1,56 @@
 [[mapping-index-field]]
-=== `_index`
+=== `_index` field
 
-The ability to store in a document the index it belongs to. By default
-it is disabled, in order to enable it, the following mapping should be
-defined:
+The name of the index that contains the document.  This field is not indexed
+but can be automatically derived from the index itself.
+
+Its value is accessible in queries, aggregations, scripts, and when sorting:
 
 [source,js]
---------------------------------------------------
+--------------------------
+# Example documents
+PUT index_1/my_type/1
+{
+  "text": "Document in index 1"
+}
+
+PUT index_2/my_type/2
 {
-    "tweet" : {
-        "_index" : { "enabled" : true }
+  "text": "Document in index 2"
+}
+
+GET index_1,index_2/_search
+{
+  "query": {
+    "terms": {
+      "_index": ["index_1", "index_2"] <1>
+    }
+  },
+  "aggs": {
+    "indices": {
+      "terms": {
+        "field": "_index", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_index": { <3>
+        "order": "asc"
+      }
     }
+  ],
+  "script_fields": {
+    "index_name": {
+      "script": "doc['_index']" <4>
+    }
+  }
 }
---------------------------------------------------
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_index` field
+<2> Aggregating on the `_index` field
+<3> Sorting on the `_index` field
+<4> Accessing the `_index` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)

+ 140 - 29
docs/reference/mapping/fields/parent-field.asciidoc

@@ -1,54 +1,165 @@
 [[mapping-parent-field]]
-=== `_parent`
+=== `_parent` field
 
-TIP: It is highly recommend to reindex all indices with `_parent` field created before version 2.x.
-     The reason for this is to gain from all the optimizations added with the 2.0 release.
+added[2.0.0,The parent-child implementation has been completely rewritten. It is advisable to reindex any 1.x indices which use parent-child to take advantage of the new optimizations]
 
-The parent field mapping is defined on a child mapping, and points to
-the parent type this child relates to. For example, in case of a `blog`
-type and a `blog_tag` type child document, the mapping for `blog_tag`
-should be:
+A parent-child relationship can be established between documents in the same
+index by making one mapping type the parent of another:
 
 [source,js]
 --------------------------------------------------
+PUT my_index
 {
-    "blog_tag" : {
-        "_parent" : {
-            "type" : "blog"
+  "mappings": {
+    "my_parent": {},
+    "my_child": {
+      "_parent": {
+        "type": "my_parent" <1>
+      }
+    }
+  }
+}
+
+PUT my_index/my_parent/1 <2>
+{
+  "text": "This is a parent document"
+}
+
+PUT my_index/my_child/2?parent=1 <3>
+{
+  "text": "This is a child document"
+}
+
+PUT my_index/my_child/3?parent=1 <3>
+{
+  "text": "This is another child document"
+}
+
+GET my_index/my_parent/_search
+{
+  "query": {
+    "has_child": { <4>
+      "type": "my_child",
+      "query": {
+        "match": {
+          "text": "child document"
         }
+      }
     }
+  }
 }
 --------------------------------------------------
+// AUTOSENSE
+<1> The `my_parent` type is parent to the `my_child` type.
+<2> Index a parent document.
+<3> Index two child documents, specifying the parent document's ID.
+<4> Find all parent documents that have children which match the query.
 
-The mapping is automatically stored and indexed (meaning it can be
-searched on using the `_parent` field notation).
 
-==== Limitations
+See the <<query-dsl-has-child-query,`has_child`>> and
+<<query-dsl-has-parent-query,`has_parent`>> queries,
+the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation,
+and <<parent-child-inner-hits,inner hits>> for more information.
 
-The `_parent.type` setting can only point to a type that doesn't exist yet.
-This means that a type can't become a parent type after is has been created.
+The value of the `_parent` field is accessible in queries, aggregations, scripts,
+and when sorting:
 
-The `parent.type` setting can't point to itself. This means self referential
-parent/child isn't supported.
+[source,js]
+--------------------------
+GET my_index/_search
+{
+  "query": {
+    "terms": {
+      "_parent": [ "1" ] <1>
+    }
+  },
+  "aggs": {
+    "parents": {
+      "terms": {
+        "field": "_parent", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_parent": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "parent": {
+      "script": "doc['_parent']" <4>
+    }
+  }
+}
+--------------------------
+// AUTOSENSE
 
-==== Global ordinals
+<1> Querying on the `_parent` field (also see the <<query-dsl-has-parent-query,`has_parent` query>> and the <<query-dsl-has-child-query,`has_child` query>>)
+<2> Aggregating on the `_parent` field (also see the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation)
+<3> Sorting on the `_parent` field
+<4> Accessing the `_parent` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
 
-Parent-child uses <<global-ordinals,global ordinals>> to speed up joins and global ordinals need to be rebuilt after any change to a shard.
-The more parent id values are stored in a shard, the longer it takes to rebuild global ordinals for the `_parent` field.
 
-Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals.
-This can introduce a significant latency spike for your users. You can use <<fielddata-loading,eager_global_ordinals>> to shift the cost of building global ordinals
-from query time to refresh time, by mapping the _parent field as follows:
+==== Parent-child restrictions
 
-==== Memory usage
+* The parent and child types must be different -- parent-child relationships
+  cannot be established between documents of the same type.
 
-The only on heap memory used by parent/child is the global ordinals for the `_parent` field.
+* The `_parent.type` setting can only point to a type that doesn't exist yet.
+  This means that a type cannot become a parent type after it has been
+  created.
 
-How much memory is used for the global ordianls for the `_parent` field in the fielddata cache
-can be checked via the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
-APIS, eg:
+* Parent and child documents must be indexed on the same shard.  The `parent`
+  ID is used as the <<mapping-routing-field,routing>> value for the child,
+  to ensure that the child is indexed on the same shard as the parent.
+  This means that the same `parent` value needs to be provided when
+  <<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
+  a child document.
+
+==== Global ordinals
+
+Parent-child uses <<global-ordinals,global ordinals>> to speed up joins.
+Global ordinals need to be rebuilt after any change to a shard. The more
+parent id values are stored in a shard, the longer it takes to rebuild the
+global ordinals for the `_parent` field.
+
+Global ordinals, by default, are built lazily: the first parent-child query or
+aggregation after a refresh will trigger building of global ordinals. This can
+introduce a significant latency spike for your users. You can use
+<<fielddata-loading,eager_global_ordinals>> to shift the cost of building global
+ordinals from query time to refresh time, by mapping the `_parent` field as follows:
 
 [source,js]
 --------------------------------------------------
-curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
+PUT my_index
+{
+  "mappings": {
+    "my_parent": {},
+    "my_child": {
+      "_parent": {
+        "type": "my_parent",
+        "fielddata": {
+          "loading": "eager_global_ordinals"
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+
+The amount of heap used by global ordinals can be checked as follows:
+
+[source,sh]
+--------------------------------------------------
+# Per-index
+GET _stats/fielddata?human&fields=_parent
+
+# Per-node per-index
+GET _nodes/stats/indices/fielddata?human&fields=_parent
 --------------------------------------------------
+// AUTOSENSE
+
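To make the `has_child` example in the parent-field text concrete, here is a toy in-memory join over the same three documents. It is a sketch under simplifying assumptions: `matches` is a crude stand-in for a `match` query (any shared lowercased word), and none of these helper names exist in Elasticsearch.

```python
# Documents from the example: one parent and two children that point at it.
parents = {"1": {"text": "This is a parent document"}}
children = {
    "2": {"parent": "1", "text": "This is a child document"},
    "3": {"parent": "1", "text": "This is another child document"},
}


def matches(text, query):
    """Crude stand-in for a match query: true if any word is shared."""
    return bool(set(text.lower().split()) & set(query.lower().split()))


def has_child(parents, children, query):
    """IDs of parents with at least one child whose text matches the query."""
    # Collect the parent IDs referenced by matching children, then keep
    # only the IDs that correspond to real parent documents.
    hit_parent_ids = {c["parent"] for c in children.values()
                      if matches(c["text"], query)}
    return sorted(pid for pid in parents if pid in hit_parent_ids)


print(has_child(parents, children, "child document"))
```

The join is driven from the child side: matching children contribute their parent ID, which is why the parent ID has to be known for every child document.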

+ 128 - 16
docs/reference/mapping/fields/routing-field.asciidoc

@@ -1,22 +1,134 @@
 [[mapping-routing-field]]
-=== `_routing`
+=== `_routing` field
 
-The routing field allows to control the `_routing` aspect when indexing
-data and explicit routing control is required. It is stored and indexed.
+A document is routed to a particular shard in an index using the following
+formula:
 
-[float]
-==== required
+    shard_num = hash(_routing) % num_primary_shards
 
-Another aspect of the `_routing` mapping is the ability to define it as
-required by setting `required` to `true`. This is very important to set
-when using routing features, as it allows different APIs to make use of
-it. For example, an index operation will be rejected if no routing value
-has been provided.
+The default value used for `_routing` is the document's <<mapping-id-field,`_id`>>
+or the document's <<mapping-parent-field,`_parent`>> ID, if present.
 
-[float]
-==== id uniqueness
+Custom routing patterns can be implemented by specifying a custom `routing`
+value per document.  For instance:
 
-When indexing documents specifying a custom `_routing`, the uniqueness
-of the `_id` is not guaranteed throughout all the shards that the index
-is composed of. In fact, documents with the same `_id` might end up in
-different shards if indexed with different `_routing` values.
+[source,js]
+------------------------------
+PUT my_index/my_type/1?routing=user1 <1>
+{
+  "title": "This is a document"
+}
+
+GET my_index/my_type/1?routing=user1 <2>
+------------------------------
+// AUTOSENSE
+
+<1> This document uses `user1` as its routing value, instead of its ID.
+<2> The same `routing` value needs to be provided when
+    <<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
+    the document.
+
+The value of the `_routing` field is accessible in queries, aggregations, scripts,
+and when sorting:
+
+[source,js]
+--------------------------
+GET my_index/_search
+{
+  "query": {
+    "terms": {
+      "_routing": [ "user1" ] <1>
+    }
+  },
+  "aggs": {
+    "Routing values": {
+      "terms": {
+        "field": "_routing", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_routing": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "Routing value": {
+      "script": "doc['_routing']" <4>
+    }
+  }
+}
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_routing` field
+<2> Aggregating on the `_routing` field
+<3> Sorting on the `_routing` field
+<4> Accessing the `_routing` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
+
+
+==== Searching with custom routing
+
+Custom routing can reduce the impact of searches.  Instead of having to fan
+out a search request to all the shards in an index, the request can be sent to
+just the shard that matches the specific routing value (or values):
+
+[source,js]
+------------------------------
+GET my_index/_search?routing=user1,user2 <1>
+{
+  "query": {
+    "match": {
+      "title": "document"
+    }
+  }
+}
+------------------------------
+// AUTOSENSE
+
+<1> This search request will only be executed on the shards associated with the `user1` and `user2` routing values.
+
+
+==== Making a routing value required
+
+When using custom routing, it is important to provide the routing value
+whenever <<docs-index_,indexing>>, <<docs-get,getting>>,
+<<docs-delete,deleting>>, or <<docs-update,updating>> a document.
+
+Forgetting the routing value can lead to a document being indexed on more than
+one shard.  As a safeguard, the `_routing` field can be configured to make a
+custom `routing` value required for all CRUD operations:
+
+[source,js]
+------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "_routing": {
+        "required": true <1>
+      }
+    }
+  }
+}
+
+PUT my_index/my_type/1 <2>
+{
+  "text": "No routing value provided"
+}
+------------------------------
+// AUTOSENSE
+<1> Routing is required for `my_type` documents.
+<2> This index request throws a `routing_missing_exception`.
+
+==== Unique IDs with custom routing
+
+When indexing documents specifying a custom `_routing`, the uniqueness of the
+`_id` is not guaranteed across all of the shards in the index. In fact,
+documents with the same `_id` might end up on different shards if indexed with
+different `_routing` values.
+
+It is up to the user to ensure that IDs are unique across the index.
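+
+As an illustration (a sketch that assumes `my_index` has more than one primary
+shard), the following two requests use the same `_id` but different routing
+values, so they may create two separate documents on different shards:
+
+[source,js]
+------------------------------
+PUT my_index/my_type/1?routing=user1
+{
+  "title": "Routed with user1"
+}
+
+PUT my_index/my_type/1?routing=user2
+{
+  "title": "Routed with user2"
+}
+------------------------------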

+ 69 - 8
docs/reference/mapping/fields/size-field.asciidoc

@@ -1,15 +1,76 @@
 [[mapping-size-field]]
-=== `_size`
+=== `_size` field
 
-The `_size` field allows to automatically index the size of the original
-`_source` indexed. By default, it's disabled. In order to enable it, set
-the mapping to:
+The `_size` field, when enabled, indexes the size in bytes of the original
+<<mapping-source-field,`_source`>>. In order to enable it, set
+the mapping as follows:
 
 [source,js]
---------------------------------------------------
+--------------------------
+PUT my_index
 {
-    "tweet" : {
-        "_size" : {"enabled" : true}
+  "mappings": {
+    "my_type": {
+      "_size": {
+        "enabled": true
+      }
     }
+  }
 }
---------------------------------------------------
+--------------------------
+// AUTOSENSE
+
+The value of the `_size` field is accessible in queries, aggregations, scripts,
+and when sorting:
+
+[source,js]
+--------------------------
+# Example documents
+PUT my_index/my_type/1
+{
+  "text": "This is a document"
+}
+
+PUT my_index/my_type/2
+{
+  "text": "This is another document"
+}
+
+GET my_index/_search
+{
+  "query": {
+    "range": {
+      "_size": { <1>
+        "gt": 10
+      }
+    }
+  },
+  "aggs": {
+    "Sizes": {
+      "terms": {
+        "field": "_size", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_size": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "Size": {
+      "script": "doc['_size']"  <4>
+    }
+  }
+}
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_size` field
+<2> Aggregating on the `_size` field
+<3> Sorting on the `_size` field
+<4> Accessing the `_size` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
+

+ 20 - 17
docs/reference/mapping/fields/source-field.asciidoc

@@ -1,12 +1,12 @@
 [[mapping-source-field]]
-=== `_source`
+=== `_source` field
 
-The `_source` field is an automatically generated field that stores the actual
-JSON that was used as the indexed document. It is not indexed (searchable),
-just stored. When executing "fetch" requests, like <<docs-get,get>> or
-<<search-search,search>>, the `_source` field is returned by default.
+The `_source` field contains the original JSON document body that was passed
+at index time.  The `_source` field itself is not indexed (and thus is not
+searchable), but it is stored so that it can be returned when executing
+_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
 
-==== Disabling source
+==== Disabling the `_source` field
 
 Though very handy to have around, the source field does incur storage overhead
 within the index. For this reason, it can be disabled as follows:
@@ -26,7 +26,7 @@ PUT tweets
 // AUTOSENSE
 
 [WARNING]
-.Think before disabling the source field
+.Think before disabling the `_source` field
 ==================================================
 
 Users often disable the `_source` field without thinking about the
@@ -46,10 +46,10 @@ available then a number of features are not supported:
 
 * Potentially in the future, the ability to repair index corruption
   automatically.
+==================================================
 
-If disk space is a concern, rather increase the
+TIP: If disk space is a concern, rather increase the
 <<index-codec,compression level>> instead of disabling the `_source`.
-==================================================
 
 .The metrics use case
 **************************************************
@@ -69,12 +69,20 @@ metrics case.
 
 
 [[include-exclude]]
-==== Including / Excluding fields from source
+==== Including / Excluding fields from `_source`
 
 An expert-only feature is the ability to prune the contents of the `_source`
 field after the document has been indexed, but before the `_source` field is
-stored.  The `includes`/`excludes` parameters (which also accept wildcards)
-can be used as follows:
+stored.
+
+WARNING: Removing fields from the `_source` has similar downsides to disabling
+`_source`, especially the fact that you cannot reindex documents from one
+Elasticsearch index to another. Consider using
+<<search-request-source-filtering,source filtering>> or a
+<<mapping-transform,transform script>> instead.
+
+The `includes`/`excludes` parameters (which also accept wildcards) can be used
+as follows:
 
 [source,js]
 --------------------------------------------------
@@ -126,8 +134,3 @@ GET logs/event/_search
 <1> These fields will be removed from the stored `_source` field.
 <2> We can still search on this field, even though it is not in the stored `_source`.
 
-WARNING: Removing fields from the `_source` has similar downsides to disabling
-`_source`, especially the fact that you cannot reindex documents from one
-Elasticsearch index to another. Consider using
-<<search-request-source-filtering,source filtering>> or a
-<<mapping-transform,transform script>> instead.

+ 71 - 67
docs/reference/mapping/fields/timestamp-field.asciidoc

@@ -1,90 +1,94 @@
 [[mapping-timestamp-field]]
-=== `_timestamp`
+=== `_timestamp` field
 
-The `_timestamp` field allows to automatically index the timestamp of a
-document. If it is not provided it will be automatically set
-to a <<mapping-timestamp-field-default,default date>>.
-
-[float]
-==== enabled
-
-By default it is disabled. In order to enable it, the following mapping
-should be defined:
+The `_timestamp` field, when enabled, allows a timestamp to be indexed and
+stored with a document. The timestamp may be specified manually, generated
+automatically, or set to a default value:
 
 [source,js]
---------------------------------------------------
+------------------------------------
+PUT my_index
 {
-    "tweet" : {
-        "_timestamp" : { "enabled" : true }
+  "mappings": {
+    "my_type": {
+      "_timestamp": { <1>
+        "enabled": true
+      }
     }
+  }
 }
---------------------------------------------------
 
-[float]
-[[mapping-timestamp-field-format]]
-==== format
+PUT my_index/my_type/1?timestamp=2015-01-01 <2>
+{ "text": "Timestamp as a formatted date" }
 
-You can define the <<mapping-date-format,date
-format>> used to parse the provided timestamp value. For example:
+PUT my_index/my_type/2?timestamp=1420070400000 <3>
+{ "text": "Timestamp as milliseconds since the epoch" }
 
-[source,js]
---------------------------------------------------
-{
-    "tweet" : {
-        "_timestamp" : {
-            "enabled" : true,
-            "path" : "post_date",
-            "format" : "YYYY-MM-dd"
-        }
-    }
-}
---------------------------------------------------
+PUT my_index/my_type/3 <4>
+{ "text": "Autogenerated timestamp set to now()" }
 
-Note, the default format is `epoch_millis||strictDateOptionalTime`. The timestamp value will
-first be parsed as a number and if it fails the format will be tried.
+------------------------------------
+// AUTOSENSE
 
-[float]
-[[mapping-timestamp-field-default]]
-==== default
+<1> Enable the `_timestamp` field with default settings.
+<2> Set the timestamp manually with a formatted date.
+<3> Set the timestamp with milliseconds since the epoch.
+<4> Auto-generates a timestamp with <<date-math,now()>>.
 
-You can define a default value for when timestamp is not provided
-within the index request or in the `_source` document.
+The behaviour of the `_timestamp` field can be configured with the following parameters:
 
-By default, the default value is `now` which means the date the document was processed by the indexing chain.
+`default`::
 
-You can reject documents which do not provide a `timestamp` value by setting `ignore_missing` to false (default to `true`):
+    A default value to be used if none is provided.  Defaults to <<date-math,now()>>.
 
-[source,js]
---------------------------------------------------
-{
-    "tweet" : {
-        "_timestamp" : {
-            "enabled" : true,
-            "ignore_missing" : false
-        }
-    }
-}
---------------------------------------------------
+`format`::
 
-You can also set the default value to any date respecting <<mapping-timestamp-field-format,timestamp format>>:
+    The <<mapping-date-format,date format>> (or formats) to use when parsing timestamps.  Defaults to `epoch_millis||strictDateOptionalTime`.
+
+`ignore_missing`::
+
+    If `true` (default), replace missing timestamps with the `default` value.  If `false`, throw an exception.
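+
+As a sketch of how these parameters combine (the format shown is only
+illustrative), the following mapping parses timestamps as `YYYY-MM-dd` dates
+and rejects documents that do not provide a timestamp:
+
+[source,js]
+------------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "_timestamp": {
+        "enabled": true,
+        "format": "YYYY-MM-dd",
+        "ignore_missing": false
+      }
+    }
+  }
+}
+------------------------------------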
+
+
+The value of the `_timestamp` field is accessible in queries, aggregations, scripts,
+and when sorting:
 
 [source,js]
---------------------------------------------------
+--------------------------
+GET my_index/_search
 {
-    "tweet" : {
-        "_timestamp" : {
-            "enabled" : true,
-            "format" : "YYYY-MM-dd",
-            "default" : "1970-01-01"
-        }
+  "query": {
+    "range": {
+      "_timestamp": { <1>
+        "gte": "2015-01-01"
+      }
+    }
+  },
+  "aggs": {
+    "Timestamps": {
+      "terms": {
+        "field": "_timestamp", <2>
+        "size": 10
+      }
     }
+  },
+  "sort": [
+    {
+      "_timestamp": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "Timestamp": {
+      "script": "doc['_timestamp']" <4>
+    }
+  }
 }
---------------------------------------------------
-
-If you don't provide any timestamp value, _timestamp will be set to this default value.
-
-In elasticsearch 1.4, we allowed setting explicitly `"default":null` which is not possible anymore
-as we added a new `ignore_missing` setting.
-When reading an index created with elasticsearch 1.4 and using this, we automatically update it by
-removing `"default": null` and setting `"ignore_missing": false`
+--------------------------
+// AUTOSENSE
 
+<1> Querying on the `_timestamp` field
+<2> Aggregating on the `_timestamp` field
+<3> Sorting on the `_timestamp` field
+<4> Accessing the `_timestamp` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)

+ 75 - 36
docs/reference/mapping/fields/ttl-field.asciidoc

@@ -1,67 +1,106 @@
 [[mapping-ttl-field]]
 === `_ttl`
 
-A lot of documents naturally come with an expiration date. Documents can
-therefore have a `_ttl` (time to live), which will cause the expired
-documents to be deleted automatically.
+Some types of documents, such as session data or special offers, come with an
+expiration date. The `_ttl` field allows you to specify the minimum time a
+document should live, after which time the document is deleted automatically.
 
-`_ttl` accepts two parameters which are described below, every other setting will be silently ignored.
+[TIP]
+.Prefer index-per-timeframe to TTL
+======================================================
 
-[float]
-==== enabled
+With TTL, expired documents first have to be marked as deleted then later
+purged from the index when segments are merged.  For append-only time-based
+data such as log events, it is much more efficient to use an index-per-day /
+week / month instead of TTLs.  Old log data can be removed by simply deleting
+old indices.
 
-By default it is disabled, in order to enable it, the following mapping
-should be defined:
+======================================================
+
+The `_ttl` field may be enabled as follows:
 
 [source,js]
---------------------------------------------------
+-------------------------------
+PUT my_index
 {
-    "tweet" : {
-        "_ttl" : { "enabled" : true }
+  "mappings": {
+    "my_type": {
+      "_ttl": {
+        "enabled": true
+      }
     }
+  }
+}
+
+PUT my_index/my_type/1?ttl=10m <1>
+{
+  "text": "Will expire in 10 minutes"
+}
+
+PUT my_index/my_type/2 <2>
+{
+  "text": "Will not expire"
 }
---------------------------------------------------
+-------------------------------
+// AUTOSENSE
+<1> This document will expire 10 minutes after being indexed.
+<2> This document has no TTL set and will not expire.
 
-`_ttl` can only be enabled once and never be disabled again.
+The expiry time is calculated as the value of the
+<<mapping-timestamp-field,`_timestamp`>> field (or `now()` if the `_timestamp`
+is not enabled) plus the `ttl` specified in the indexing request.
 
-[float]
-==== default
+==== Default TTL
 
-You can provide a per index/type default `_ttl` value as follows:
+You can provide a default `_ttl`, which will be applied to indexing requests where the `ttl` is not specified:
 
 [source,js]
---------------------------------------------------
+-------------------------------
+PUT my_index
 {
-    "tweet" : {
-        "_ttl" : { "enabled" : true, "default" : "1d" }
+  "mappings": {
+    "my_type": {
+      "_ttl": {
+        "enabled": true,
+        "default": "5m"
+      }
     }
+  }
 }
---------------------------------------------------
 
-In this case, if you don't provide a `_ttl` value in your query or in
-the `_source` all tweets will have a `_ttl` of one day.
+PUT my_index/my_type/1?ttl=10m <1>
+{
+  "text": "Will expire in 10 minutes"
+}
 
-In case you do not specify a time unit like `d` (days), `m` (minutes),
-`h` (hours), `ms` (milliseconds) or `w` (weeks), milliseconds is used as
-default unit.
+PUT my_index/my_type/2 <2>
+{
+  "text": "Will expire in 5 minutes"
+}
+-------------------------------
+// AUTOSENSE
+<1> This document will expire 10 minutes after being indexed.
+<2> This document has no TTL set and so will expire after the default 5 minutes.
 
-If no `default` is set and no `_ttl` value is given then the document
-has an infinite `_ttl` and will not expire.
+The `default` value can use <<time-units,time units>> like `d` for days, and
+will use `ms` as the default unit if no time unit is provided.
 
 You can dynamically update the `default` value using the put mapping
 API. It won't change the `_ttl` of already indexed documents but will be
 used for future documents.
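+
+As a rough sketch (the `1d` value is illustrative, and the request body assumes
+the `my_index` / `my_type` mapping created above), the default could be changed
+with a put mapping request along these lines:
+
+[source,js]
+-------------------------------
+PUT my_index/_mapping/my_type
+{
+  "my_type": {
+    "_ttl": {
+      "enabled": true,
+      "default": "1d"
+    }
+  }
+}
+-------------------------------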
 
-[float]
 ==== Note on documents expiration
 
-Expired documents will be automatically deleted regularly. You can
-dynamically set the `indices.ttl.interval` to fit your needs. The
-default value is `60s`.
+Expired documents will be automatically deleted periodically. The following
+settings control the expiry process:
+
+`indices.ttl.interval`::
+
+How often the purge process should run. Defaults to `60s`. Expired documents
+may still be retrieved before they are purged.
+
+`indices.ttl.bulk_size`::
 
-The deletion orders are processed by bulk. You can set
-`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
+How many deletions are handled by a single <<docs-bulk,`bulk`>> request. The
+default value is `10000`.
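+
+For instance, assuming the interval is updated through the cluster update
+settings API (the `30s` value is only illustrative), the purge process could be
+made to run more often:
+
+[source,js]
+-------------------------------
+PUT _cluster/settings
+{
+  "transient": {
+    "indices.ttl.interval": "30s"
+  }
+}
+-------------------------------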
 
-Note that the expiration procedure handle versioning properly so if a
-document is updated between the collection of documents to expire and
-the delete order, the document won't be deleted.

+ 58 - 5
docs/reference/mapping/fields/type-field.asciidoc

@@ -1,7 +1,60 @@
 [[mapping-type-field]]
-=== `_type`
+=== `_type` field
+
+Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
+<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>.  The
+`_type` field is indexed in order to make searching by type name fast.
+
+The value of the `_type` field is accessible in queries, aggregations,
+scripts, and when sorting:
+
+[source,js]
+--------------------------
+# Example documents
+PUT my_index/type_1/1
+{
+  "text": "Document with type 1"
+}
+
+PUT my_index/type_2/2
+{
+  "text": "Document with type 2"
+}
+
+GET my_index/type_*/_search
+{
+  "query": {
+    "terms": {
+      "_type": [ "type_1", "type_2" ] <1>
+    }
+  },
+  "aggs": {
+    "types": {
+      "terms": {
+        "field": "_type", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_type": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "type": {
+      "script": "doc['_type']" <4>
+    }
+  }
+}
+
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_type` field
+<2> Aggregating on the `_type` field
+<3> Sorting on the `_type` field
+<4> Accessing the `_type` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
 
-Each document indexed is associated with an id and a type. The `_type`
-field allows accessing only the type of a document. It is indexed
-to allow quickly filtering on type, for example, when performing
-a search request on a single or multiple types.

+ 56 - 7
docs/reference/mapping/fields/uid-field.asciidoc

@@ -1,10 +1,59 @@
 [[mapping-uid-field]]
-=== `_uid`
+=== `_uid` field
 
-Each document indexed is associated with an id and a type, the internal
-`_uid` field is the unique identifier of a document within an index and
-is composed of the type and the id (meaning that different types can
-have the same id and still maintain uniqueness).
+Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
+<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>.  These
+values are combined as `{type}#{id}` and indexed as the `_uid` field.
+
+The value of the `_uid` field is accessible in queries, aggregations, scripts,
+and when sorting:
+
+[source,js]
+--------------------------
+# Example documents
+PUT my_index/my_type/1
+{
+  "text": "Document with ID 1"
+}
+
+PUT my_index/my_type/2
+{
+  "text": "Document with ID 2"
+}
+
+GET my_index/_search
+{
+  "query": {
+    "terms": {
+      "_uid": [ "my_type#1", "my_type#2" ] <1>
+    }
+  },
+  "aggs": {
+    "UIDs": {
+      "terms": {
+        "field": "_uid", <2>
+        "size": 10
+      }
+    }
+  },
+  "sort": [
+    {
+      "_uid": { <3>
+        "order": "desc"
+      }
+    }
+  ],
+  "script_fields": {
+    "UID": {
+      "script": "doc['_uid']" <4>
+    }
+  }
+}
+--------------------------
+// AUTOSENSE
+
+<1> Querying on the `_uid` field (also see the <<query-dsl-ids-query,`ids` query>>)
+<2> Aggregating on the `_uid` field
+<3> Sorting on the `_uid` field
+<4> Accessing the `_uid` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
 
-The `_uid` field is for type based filtering, as well as for
-lookups of `_id` and `_type`.