11 months ago · 7b39d3db52
--- a/docs/reference/query-dsl/script-score-query.asciidoc
+++ b/docs/reference/query-dsl/script-score-query.asciidoc
@@ -62,10 +62,17 @@ multiplied by `boost` to produce final documents' scores. Defaults to `1.0`.
 
				 ===== Use relevance scores in a script
			
 
				 
			
 
				 Within a script, you can
			
 
				-{ref}/modules-scripting-fields.html#scripting-score[access] 
			
 
				+{ref}/modules-scripting-fields.html#scripting-score[access]
			
 
				 the `_score` variable which represents the current relevance score of a
			
 
				 document.
			
 
				 
			
 
				+[[script-score-access-term-statistics]]
			
 
				+===== Use term statistics in a script
			
 
				+
			
 
				+Within a script, you can
			
 
				+{ref}/modules-scripting-fields.html#scripting-term-statistics[access]
			
 
				+the `_termStats` variable, which provides statistical information about the terms used in the child query of the `script_score` query.
			
 
				+
			
 
				 [[script-score-predefined-functions]]
			
 
				 ===== Predefined functions
			
 
				 You can use any of the available {painless}/painless-contexts.html[painless
			
@@ -147,7 +154,7 @@ updated since update operations also update the value of the `_seq_no` field.
 
				 
			
 
				 [[decay-functions-numeric-fields]]
			
 
				 ====== Decay functions for numeric fields
			
 
				-You can read more about decay functions 
			
 
				+You can read more about decay functions
			
 
				 {ref}/query-dsl-function-score-query.html#function-decay[here].
			
 
				 
			
 
				 * `double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)`
			
@@ -233,7 +240,7 @@ The `script_score` query calculates the score for
 
				 every matching document, or hit. There are faster alternative query types that
			
 
				 can efficiently skip non-competitive hits:
			
 
				 
			
 
				-* If you want to boost documents on some static fields, use the 
			
 
				+* If you want to boost documents on some static fields, use the
			
 
				  <<query-dsl-rank-feature-query, `rank_feature`>> query.
			
 
				  * If you want to boost documents closer to a date or geographic point, use the
			
 
				  <<query-dsl-distance-feature-query, `distance_feature`>> query.
			
--- a/docs/reference/reranking/learning-to-rank-model-training.asciidoc
+++ b/docs/reference/reranking/learning-to-rank-model-training.asciidoc
@@ -38,11 +38,21 @@ Feature extractors are defined using templated queries. https://eland.readthedoc
 
				 from eland.ml.ltr import QueryFeatureExtractor
			
 
				 
			
 
				 feature_extractors=[
			
 
				-    # We want to use the score of the match query for the title field as a feature:
			
 
				+    # We want to use the BM25 score of the match query for the title field as a feature:
			
 
				     QueryFeatureExtractor(
			
 
				         feature_name="title_bm25",
			
 
				         query={"match": {"title": "{{query}}"}}
			
 
				     ),
			
 
				+    # We want to use the number of matched terms in the title field as a feature:
			
 
				+    QueryFeatureExtractor(
			
 
				+        feature_name="title_matched_term_count",
			
 
				+        query={
			
 
				+            "script_score": {
			
 
				+                "query": {"match": {"title": "{{query}}"}},
			
 
				+                "script": {"source": "return _termStats.matchedTermsCount();"},
			
 
				+            }
			
 
				+        },
			
 
				+    ),
			
 
				     # We can use a script_score query to get the value
			
 
				     # of the field rating directly as a feature:
			
 
				     QueryFeatureExtractor(
			
@@ -54,19 +64,13 @@ feature_extractors=[
 
				             }
			
 
				         },
			
 
				     ),
			
 
				-    # We can execute a script on the value of the query
			
 
				-    # and use the return value as a feature:
			
 
				-    QueryFeatureExtractor(
			
 
				-        feature_name="query_length",
			
 
				+    # We extract the number of unique terms in the query as a feature:
			
 
				+    QueryFeatureExtractor(
			
 
				+        feature_name="query_term_count",
			
 
				         query={
			
 
				             "script_score": {
			
 
				-                "query": {"match_all": {}},
			
 
				-                "script": {
			
 
				-                    "source": "return params['query'].splitOnToken(' ').length;",
			
 
				-                    "params": {
			
 
				-                        "query": "{{query}}",
			
 
				-                    }
			
 
				-                },
			
 
				+                "query": {"match": {"title": "{{query}}"}},
			
 
				+                "script": {"source": "return _termStats.uniqueTermsCount();"},
			
 
				             }
			
 
				         },
			
 
				     ),
			
@@ -74,6 +78,15 @@ feature_extractors=[
 
				 ----
			
 
				 // NOTCONSOLE
			
 
				 
			
 
				+[NOTE]
			
 
				+.Term statistics as features
			
 
				+===================================================
			
 
				+
			
 
				+It is very common for an LTR model to leverage raw term statistics as features.
			
 
				+To extract this information, you can use the {ref}/modules-scripting-fields.html#scripting-term-statistics[term statistics feature] provided as part of the <<query-dsl-script-score-query,`script_score`>> query.
			
 
				+
			
 
				+===================================================
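To make the two term-count features concrete, the following is a minimal sketch of the values they are meant to capture, written in plain Python rather than against Elasticsearch. The helper names (`unique_terms_count`, `matched_terms_count`, `ltr_features`) are hypothetical, and whitespace tokenization with lowercasing is an assumption that stands in for the real analyzer of the `title` field.

```python
def _terms(text):
    # Assumption: lowercase + whitespace split instead of a real analyzer.
    return set(text.lower().split())


def unique_terms_count(query):
    # Rough analogue of _termStats.uniqueTermsCount(): depends on the query only.
    return len(_terms(query))


def matched_terms_count(query, title):
    # Rough analogue of _termStats.matchedTermsCount(): depends on the document.
    return len(_terms(query) & _terms(title))


def ltr_features(query, title):
    # One feature row, keyed by the feature names used in the extractors above.
    return {
        "query_term_count": unique_terms_count(query),
        "title_matched_term_count": matched_terms_count(query, title),
    }


if __name__ == "__main__":
    for title in ("Star Wars", "Star Trek"):
        print(title, ltr_features("star wars movie", title))
```

The query-level count is identical for every candidate document of a query, while the matched count varies per document, which is why the two are extracted as separate features.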
			
 
				+
			
 
				 Once the feature extractors have been defined, they are wrapped in an `eland.ml.ltr.LTRModelConfig` object for use in later training steps:
			
 
				 
			
 
				 [source,python]
			
--- a/docs/reference/reranking/learning-to-rank-search-usage.asciidoc
+++ b/docs/reference/reranking/learning-to-rank-search-usage.asciidoc
@@ -61,10 +61,3 @@ When exposing pagination to users, `window_size` should remain constant as each
 
				 ====== Negative scores
			
 
				 
			
 
				 Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.
			
 
				-
			
 
				-[discrete]
			
 
				-[[learning-to-rank-rescorer-limitations-term-statistics]]
			
 
				-====== Term statistics as features
			
 
				-
			
 
				-We do not currently support term statistics as features, however future releases will introduce this capability.
			
 
				-
			
--- a/docs/reference/scripting/fields.asciidoc
+++ b/docs/reference/scripting/fields.asciidoc
@@ -80,6 +80,79 @@ GET my-index-000001/_search
 
				 }
			
 
				 -------------------------------------
			
 
				 
			
 
				+[discrete]
			
 
				+[[scripting-term-statistics]]
			
 
				+=== Accessing term statistics of a document within a script
			
 
				+
			
 
				+Scripts used in a <<query-dsl-script-score-query,`script_score`>> query have access to the `_termStats` variable, which provides statistical information about the terms in the child query.
			
 
				+
			
 
				+In the following example, `_termStats` is used within a <<query-dsl-script-score-query,`script_score`>> query to retrieve the average term frequency for the terms `quick`, `brown`, and `fox` in the `text` field:
			
 
				+
			
 
				+[source,console]
			
 
				+-------------------------------------
			
 
				+PUT my-index-000001/_doc/1?refresh
			
 
				+{
			
 
				+  "text": "quick brown fox"
			
 
				+}
			
 
				+
			
 
				+PUT my-index-000001/_doc/2?refresh
			
 
				+{
			
 
				+  "text": "quick fox"
			
 
				+}
			
 
				+
			
 
				+GET my-index-000001/_search
			
 
				+{
			
 
				+  "query": {
			
 
				+    "script_score": {
			
 
				+      "query": { <1>
			
 
				+        "match": {
			
 
				+          "text": "quick brown fox"
			
 
				+        }
			
 
				+      },
			
 
				+      "script": {
			
 
				+        "source": "_termStats.termFreq().getAverage()" <2>
			
 
				+      }
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+-------------------------------------
			
 
				+
			
 
				+<1> Child query used to infer the field and the terms considered in term statistics.
			
 
				+
			
 
				+<2> The script calculates the average term frequency for the terms in the query using `_termStats`.
			
 
				+
			
 
				+`_termStats` provides access to the following functions for working with term statistics:
			
 
				+
			
 
				+- `uniqueTermsCount`: Returns the total number of unique terms in the query. This value is the same across all documents.
			
 
				+- `matchedTermsCount`: Returns the count of query terms that matched within the current document.
			
 
				+- `docFreq`: Provides document frequency statistics for the terms in the query, indicating how many documents contain each term. This value is consistent across all documents.
			
 
				+- `totalTermFreq`: Provides the total frequency of terms across all documents, representing how often each term appears in the entire corpus. This value is consistent across all documents.
			
 
				+- `termFreq`: Returns the frequency of query terms within the current document, showing how often each term appears in that document.
			
 
				+
			
 
				+[NOTE]
			
 
				+.Functions returning aggregated statistics
			
 
				+===================================================
			
 
				+
			
 
				+The `docFreq`, `termFreq` and `totalTermFreq` functions return objects that represent statistics across all terms of the child query.
			
 
				+
			
 
				+These statistics objects support the following methods:
			
 
				+
			
 
				+- `getAverage()`: Returns the average value of the metric.
			
 
				+- `getMin()`: Returns the minimum value of the metric.
			
 
				+- `getMax()`: Returns the maximum value of the metric.
			
 
				+- `getSum()`: Returns the sum of the metric values.
			
 
				+- `getCount()`: Returns the count of terms included in the metric calculation.
			
 
				+
			
 
				+===================================================
			
 
				+
			
 
				+
			
 
				+[NOTE]
			
 
				+.Painless language required
			
 
				+===================================================
			
 
				+
			
 
				+The `_termStats` variable is only available when using the <<modules-scripting-painless, Painless>> scripting language.
			
 
				+
			
 
				+===================================================
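As a rough illustration of the statistics described above, the sketch below models `termFreq`, `docFreq`, and `totalTermFreq` in plain Python over the two example documents. It is not Elasticsearch code: the function names are hypothetical, tokenization is a simple whitespace split, and it assumes that a query term missing from a document contributes a frequency of 0.

```python
from collections import Counter

# Toy corpus mirroring the two documents indexed in the example.
DOCS = {1: "quick brown fox", 2: "quick fox"}
QUERY_TERMS = ["quick", "brown", "fox"]  # terms of the child match query


def tokenize(text):
    # Assumption: whitespace tokenization, not a real analyzer.
    return text.lower().split()


def summarize(values):
    # Mirrors getMin/getMax/getSum/getCount/getAverage over per-term values.
    values = list(values)
    total = sum(values)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "count": len(values),
        "average": total / len(values),
    }


def term_freq(doc_id):
    # Per-document: occurrences of each query term in one document.
    # Assumption: an absent term counts as 0.
    counts = Counter(tokenize(DOCS[doc_id]))
    return summarize(counts[t] for t in QUERY_TERMS)


def doc_freq():
    # Corpus-level: number of documents containing each query term.
    return summarize(
        sum(1 for text in DOCS.values() if t in tokenize(text))
        for t in QUERY_TERMS
    )


def total_term_freq():
    # Corpus-level: occurrences of each query term across all documents.
    counts = Counter(tok for text in DOCS.values() for tok in tokenize(text))
    return summarize(counts[t] for t in QUERY_TERMS)


if __name__ == "__main__":
    print("termFreq doc 1:", term_freq(1))
    print("termFreq doc 2:", term_freq(2))
    print("docFreq:", doc_freq())
    print("totalTermFreq:", total_term_freq())
```

In this model only `term_freq` takes a document id, reflecting that `docFreq` and `totalTermFreq` are the same for every document while `termFreq` is computed per hit.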
			
 
				 
			
 
				 [discrete]
			
 
				 [[modules-scripting-doc-vals]]