<titleabbrev>Semantic search with ELSER</titleabbrev>
++++

Elastic Learned Sparse EncodeR - or ELSER - is an NLP model trained by Elastic
that enables you to perform semantic search by using sparse vector
representation. Instead of literal matching on search terms, semantic search
retrieves results based on the intent and the contextual meaning of a search
query.

The instructions in this tutorial show you how to use ELSER to perform semantic
search on your data.

NOTE: Only the first 512 extracted tokens per field are considered during
semantic search with ELSER. Refer to
{ml-docs}/ml-nlp-limitations.html#ml-nlp-elser-v1-limit-512[this page] for more
information.

[[requirements]]
==== Requirements

To perform semantic search by using ELSER, you must have the NLP model deployed
in your cluster. Refer to the
{ml-docs}/ml-nlp-elser.html[ELSER documentation] to learn how to download and
deploy the model.

NOTE: The minimum dedicated ML node size for deploying and using the ELSER model
is 4 GB in Elasticsearch Service if
{cloud}/ec-autoscaling.html[deployment autoscaling] is turned off. Turning on
autoscaling is recommended because it allows your deployment to dynamically
adjust resources based on demand. Better performance can be achieved by using
more allocations or more threads per allocation, which requires bigger ML nodes.
Autoscaling provides bigger nodes when required. If autoscaling is turned off,
you must provide suitably sized nodes yourself.
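
For example, the number of allocations and threads per allocation is set when
you start the model deployment. A minimal sketch using the start trained model
deployment API - the parameter values here are illustrative, not sizing
recommendations:

[source,console]
----
POST _ml/trained_models/.elser_model_2/deployment/_start?number_of_allocations=1&threads_per_allocation=2
----
// TEST[skip:TBD]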

[[elser-mappings]]
==== Create the index mapping

First, the mapping of the destination index - the index that contains the tokens
that the model created based on your text - must be created. The destination
index must have a field with the
<<sparse-vector, `sparse_vector`>> or <<rank-features,`rank_features`>> field
type to index the ELSER output.

NOTE: ELSER output must be ingested into a field with the `sparse_vector` or
`rank_features` field type. Otherwise, {es} interprets the token-weight pairs as
a massive number of fields in a document. If you get an error similar to
`"Limit of total fields [1000] has been exceeded while adding new fields"`, then
the ELSER output field is not mapped properly and has a field type other than
`sparse_vector` or `rank_features`.
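
The following request is a minimal sketch of such a mapping, using the
`ml.tokens` and `text` field names that this tutorial uses throughout:

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "ml.tokens": { <1>
        "type": "sparse_vector" <2>
      },
      "text": { <3>
        "type": "text" <4>
      }
    }
  }
}
----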
// TEST[skip:TBD]
<1> The name of the field to contain the generated tokens.
<2> The field to contain the tokens is a `sparse_vector` field.
<3> The name of the field from which to create the sparse vector representation.
In this example, the name of the field is `text`.
<4> The field type, which is `text` in this example.

To learn how to optimize space, refer to the <<save-space>> section.

[discrete]
[[inference-ingest-pipeline]]
==== Create an ingest pipeline with an inference processor

Create an <<ingest,ingest pipeline>> with an
<<inference-processor,{infer} processor>> to use ELSER to infer against the data
that is being ingested in the pipeline.
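
A sketch of such a pipeline, assuming the `.elser_model_2` model ID that this
tutorial uses and a `tokens` results field under the `ml` target field so that
the output lands in `ml.tokens`:

[source,console]
----
PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "target_field": "ml",
        "field_map": { <1>
          "text": "text_field"
        },
        "inference_config": {
          "text_expansion": { <2>
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
----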
// TEST[skip:TBD]
<1> The `field_map` object maps the input document field name (which is `text`
in this example) to the name of the field that the model expects (which is
always `text_field`).
<2> The `text_expansion` inference type must be used in the {infer} ingest
processor.

[[load-data]]
==== Load data

In this step, you load the data that you later use in the {infer} ingest
pipeline to extract tokens from it.

Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS
MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by
a list of relevant text passages. All unique passages, along with their IDs,
have been extracted from that data set and compiled into a
https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv[tsv file].

Download the file and upload it to your cluster using the
{kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[Data Visualizer]
in the {ml-app} UI. Assign the name `id` to the first column and `text` to the
second column. The index name is `test-data`. Once the upload is complete, you
can see an index named `test-data` with 182469 documents.
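
One way to confirm the document count is the count API:

[source,console]
----
GET test-data/_count
----
// TEST[skip:TBD]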

[[reindexing-data-elser]]
==== Ingest the data through the {infer} ingest pipeline

Create the tokens from the text by reindexing the data through the {infer}
pipeline that uses ELSER as the inference model.
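
A sketch of the reindex request, assuming the `test-data` source index, the
`my-index` destination, and the `elser-v2-test` pipeline created above; the
`size` value is illustrative:

[source,console]
----
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50 <1>
  },
  "dest": {
    "index": "my-index",
    "pipeline": "elser-v2-test"
  }
}
----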
// TEST[skip:TBD]
<1> The default batch size for reindexing is 1000. Reducing `size` to a smaller
number makes the updates of the reindexing process quicker, which enables you to
follow the progress closely and detect errors early.

The call returns a task ID to monitor the progress:

[source,console]
----
GET _tasks/<task_id>
----
// TEST[skip:TBD]

You can also open the Trained Models UI and select the Pipelines tab under
ELSER to follow the progress.

[[text-expansion-query]]
==== Semantic search by using the `text_expansion` query

To perform semantic search, use the `text_expansion` query and provide the
query text and the ELSER model ID. The example below uses the query text "How to
avoid muscle soreness after running?"; the `ml.tokens` field contains the
generated ELSER output.
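
A sketch of such a query, assuming the `.elser_model_2` model ID and the
`ml.tokens` field used throughout this tutorial:

[source,console]
----
GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_text": "How to avoid muscle soreness after running?",
        "model_id": ".elser_model_2"
      }
    }
  }
}
----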
// TEST[skip:TBD]

The result is the top 10 documents that are closest in meaning to your query
text from the `my-index` index, sorted by their relevance. The result also
contains the extracted tokens for each of the relevant search results, with
their weights.

[source,console-result]
----
...
----
// NOTCONSOLE

To learn about optimizing your `text_expansion` query, refer to
<<optimizing-text-expansion>>.

[[text-expansion-compound-query]]
==== Combining semantic search with other queries

You can combine `text_expansion` with other queries in a
<<compound-queries,compound query>>. For example, use a filter clause in a
<<query-dsl-bool-query>> or a full text query that may or may not use the same
query text as the `text_expansion` query. This enables you to combine the search
results from both queries.

The search hits from the `text_expansion` query tend to score higher than those
of other {es} queries. Those scores can be regularized by increasing or
decreasing the relevance scores of each query by using the `boost` parameter.
Recall on the `text_expansion` query can be high where there is a long tail of
less relevant results. Use the `min_score` parameter to prune those less
relevant documents.
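
A sketch of such a compound query; the `query_string` query text here is only
an illustrative placeholder, while the `boost` and `min_score` values match the
callouts below:

[source,console]
----
GET my-index/_search
{
  "query": {
    "bool": { <1>
      "should": [
        {
          "text_expansion": {
            "ml.tokens": {
              "model_text": "How to avoid muscle soreness after running?",
              "model_id": ".elser_model_2",
              "boost": 1 <2>
            }
          }
        },
        {
          "query_string": {
            "query": "toxins",
            "boost": 4 <3>
          }
        }
      ]
    }
  },
  "min_score": 10 <4>
}
----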
// TEST[skip:TBD]
<1> Both the `text_expansion` and the `query_string` queries are in a `should`
clause of a `bool` query.
<2> The `boost` value is `1` for the `text_expansion` query, which is the
default value. This means that the relevance scores of the results of this
query are not boosted.
<3> The `boost` value is `4` for the `query_string` query. The relevance scores
of the results of this query are increased, causing them to rank higher in the
search results.
<4> Only the results with a score equal to or higher than `10` are displayed.

[[save-space]]
==== Saving disk space by excluding the ELSER tokens from document source

The tokens generated by ELSER must be indexed for use in the
<<query-dsl-text-expansion-query, text_expansion query>>. However, it is not
necessary to retain those terms in the document source. You can save disk space
by using the <<include-exclude,source exclude>> mapping to remove the ELSER
terms from the document source.

WARNING: Reindex uses the document source to populate the destination index.
Once the ELSER terms have been excluded from the source, they cannot be
recovered through reindexing. Excluding the tokens from the source is a
space-saving optimization that should only be applied if you are certain that
reindexing will not be required in the future! It's important to carefully
consider this trade-off and make sure that excluding the ELSER terms from the
source aligns with your specific requirements and use case.

The mapping that excludes `ml.tokens` from the `_source` field can be created
by the following API call.
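
A sketch of that call - the `_source.excludes` entry implements the exclusion
described above, and the `properties` section matches the mapping created
earlier in this tutorial:

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "ml.tokens"
      ]
    },
    "properties": {
      "ml.tokens": {
        "type": "sparse_vector"
      },
      "text": {
        "type": "text"
      }
    }
  }
}
----
// TEST[skip:TBD]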