
Update the how-to section of the docs for 7.0: (#37717)

 - new `rank_feature`/`script_score` queries
 - new `index_phrases`/`index_prefixes` options
 - disabling `_field_names` doesn't help anymore
 - adaptive replica selection is on by default
Adrien Grand, 6 years ago
Commit 8905d77ca4

+ 0 - 7
docs/reference/how-to/indexing-speed.asciidoc

@@ -114,13 +114,6 @@ The default is `10%` which is often plenty: for example, if you give the JVM
 10GB of memory, it will give 1GB to the index buffer, which is enough to host
 two shards that are heavily indexing.
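+
+As an illustrative sketch, this buffer can be raised through the
+`indices.memory.index_buffer_size` node setting in `elasticsearch.yml`
+(the `20%` value below is just an example, not a recommendation):
+
+[source,yaml]
+--------------------------------------------------
+# Give 20% of the heap to the indexing buffer instead of the default 10%
+indices.memory.index_buffer_size: 20%
+--------------------------------------------------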
 
-[float]
-=== Disable `_field_names`
-
-The <<mapping-field-names-field,`_field_names` field>> introduces some
-index-time overhead, so you might want to disable it if you never need to
-run `exists` queries.
-
 [float]
 === Additional optimizations
 

+ 3 - 3
docs/reference/how-to/recipes.asciidoc

@@ -3,9 +3,9 @@
 
 This section includes a few recipes to help with common problems:
 
-* <<mixing-exact-search-with-stemming>>
-* <<consistent-scoring>>
+* <<mixing-exact-search-with-stemming,Mixing exact search with stemming>>
+* <<consistent-scoring,Getting consistent scores>>
+* <<static-scoring-signals,Incorporating static relevance signals into the score>>
 
 include::recipes/stemming.asciidoc[]
 include::recipes/scoring.asciidoc[]
-

+ 124 - 2
docs/reference/how-to/recipes/scoring.asciidoc

@@ -60,8 +60,8 @@ request do not have similar index statistics and relevancy could be bad.
 
 If you have a small dataset, the easiest way to work around this issue is to
 index everything into an index that has a single shard
-(`index.number_of_shards: 1`). Then index statistics will be the same for all
-documents and scores will be consistent.
+(`index.number_of_shards: 1`), which is the default. Then index statistics
+will be the same for all documents and scores will be consistent.
 
 Otherwise the recommended way to work around this issue is to use the
 <<dfs-query-then-fetch,`dfs_query_then_fetch`>> search type. This will make
@@ -78,3 +78,125 @@ queries, beware that gathering statistics alone might not be cheap since all
 terms have to be looked up in the terms dictionaries in order to look up
 statistics.
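+
+For instance, a `dfs_query_then_fetch` request might look like this (the index
+name and query are illustrative):
+
+[source,js]
+--------------------------------------------------
+GET index/_search?search_type=dfs_query_then_fetch
+{
+    "query": {
+        "match": { "body": "elasticsearch" }
+    }
+}
+--------------------------------------------------
+// CONSOLE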
 
+[[static-scoring-signals]]
+=== Incorporating static relevance signals into the score
+
+Many domains have static signals that are known to be correlated with relevance.
+For instance https://en.wikipedia.org/wiki/PageRank[PageRank] and url length are
+two commonly used features for web search in order to tune the score of web
+pages independently of the query.
+
+There are two main queries that allow combining static score contributions with
+textual relevance, e.g. as computed with BM25:
+
+* <<query-dsl-script-score-query,`script_score` query>>
+* <<query-dsl-rank-feature-query,`rank_feature` query>>
+
+For instance imagine that you have a `pagerank` field that you wish to
+combine with the BM25 score so that the final score is equal to
+`score = bm25_score + pagerank / (10 + pagerank)`.
+
+With the <<query-dsl-script-score-query,`script_score` query>> the query would
+look like this:
+
+//////////////////////////
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+    "mappings": {
+        "properties": {
+            "body": {
+                "type": "text"
+            },
+            "pagerank": {
+                "type": "long"
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST
+
+//////////////////////////
+
+[source,js]
+--------------------------------------------------
+GET index/_search
+{
+    "query" : {
+        "script_score" : {
+            "query" : {
+                "match": { "body": "elasticsearch" }
+            },
+            "script" : {
+                "source" : "_score * saturation(doc['pagerank'].value, 10)" <1>
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+//TEST[continued]
+<1> `pagerank` must be mapped as a <<number,numeric>> field
+
+while with the <<query-dsl-rank-feature-query,`rank_feature` query>> it would
+look as follows:
+
+//////////////////////////
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+    "mappings": {
+        "properties": {
+            "body": {
+                "type": "text"
+            },
+            "pagerank": {
+                "type": "rank_feature"
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST
+
+//////////////////////////
+
+[source,js]
+--------------------------------------------------
+GET _search
+{
+    "query" : {
+        "bool" : {
+            "must": {
+                "match": { "body": "elasticsearch" }
+            },
+            "should": {
+                "rank_feature": {
+                    "field": "pagerank", <1>
+                    "saturation": {
+                        "pivot": 10
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE
+<1> `pagerank` must be mapped as a <<rank-feature,`rank_feature`>> field
+
+While both options would return similar scores, there are trade-offs:
+the <<query-dsl-script-score-query,`script_score` query>> provides a lot of
+flexibility, enabling you to combine the text relevance score with static
+signals as you prefer. On the other hand, the
+<<query-dsl-rank-feature-query,`rank_feature` query>> only exposes a couple of
+ways to incorporate static signals into the score. However, it relies on the
+<<rank-feature,`rank_feature`>> and <<rank-features,`rank_features`>> fields,
+which index values in a special way that allows the
+<<query-dsl-rank-feature-query,`rank_feature` query>> to skip over
+non-competitive documents and get the top matches of a query faster.
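+
+As a usage sketch (the document and `pagerank` value below are illustrative),
+values for a <<rank-feature,`rank_feature`>> field are indexed like any other
+field:
+
+[source,js]
+--------------------------------------------------
+PUT index/_doc/1
+{
+    "body": "Elasticsearch is a distributed search engine",
+    "pagerank": 8
+}
+--------------------------------------------------
+// CONSOLE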

+ 14 - 9
docs/reference/how-to/search-speed.asciidoc

@@ -395,15 +395,6 @@ be able to cope with `max_failures` node failures at once at most, then the
 right number of replicas for you is
 `max(max_failures, ceil(num_nodes / num_primaries) - 1)`.
 
-[float]
-=== Turn on adaptive replica selection
-
-When multiple copies of data are present, elasticsearch can use a set of
-criteria called <<search-adaptive-replica,adaptive replica selection>> to select
-the best copy of the data based on response time, service time, and queue size
-of the node containing each copy of the shard. This can improve query throughput
-and reduce latency for search-heavy applications.
-
 === Tune your queries with the Profile API
 
 You can also analyse how expensive each component of your queries and 
@@ -419,3 +410,17 @@ Some caveats to the Profile API are that:
  - the Profile API as a debugging tool adds significant overhead to search execution and can also have a very verbose output
  - given the added overhead, the resulting took times are not reliable indicators of actual took time, but can be used comparatively between clauses for relative timing differences
  - the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause 
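+
+As an illustrative sketch, profiling is enabled by setting `profile` to `true`
+on a search request (the index name and query here are made up):
+
+[source,js]
+--------------------------------------------------
+GET index/_search
+{
+    "profile": true,
+    "query": {
+        "match": { "body": "elasticsearch" }
+    }
+}
+--------------------------------------------------
+// CONSOLE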
+
+[float]
+=== Faster phrase queries with `index_phrases`
+
+The <<text,`text`>> field has an <<index-phrases,`index_phrases`>> option that
+indexes 2-shingles and is automatically leveraged by query parsers to run phrase
+queries that don't have a slop. If your use-case involves running lots of phrase
+queries, this can speed up queries significantly.
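+
+For example (the index name and field are illustrative), `index_phrases` is
+enabled in the field mapping:
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+    "mappings": {
+        "properties": {
+            "body": {
+                "type": "text",
+                "index_phrases": true
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE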
+
+[float]
+=== Faster prefix queries with `index_prefixes`
+
+The <<text,`text`>> field has an <<index-prefixes,`index_prefixes`>> option that
+indexes prefixes of all terms and is automatically leveraged by query parsers to
+run prefix queries. If your use-case involves running lots of prefix queries,
+this can speed up queries significantly.
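+
+For example (the index name and field are illustrative), `index_prefixes` can
+be enabled with its default settings by passing an empty object:
+
+[source,js]
+--------------------------------------------------
+PUT index
+{
+    "mappings": {
+        "properties": {
+            "body": {
+                "type": "text",
+                "index_prefixes": { }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// CONSOLE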