@@ -204,16 +204,16 @@ by a unique ID

Because a search request may be executed against one or more shards in an index, and a search may cover
one or more indices, the top level element in the profile response is an array of `shard` objects.
-Each shard object lists it's `id` which uniquely identifies the shard. The ID's format is
+Each shard object lists its `id` which uniquely identifies the shard. The ID's format is
`[nodeID][indexName][shardID]`.

The profile itself may consist of one or more "searches", where a search is a query executed against the underlying
-Lucene index. Most Search Requests submitted by the user will only execute a single `search` against the Lucene index.
+Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index.
But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute
a secondary "match_all" query for the global context).

Inside each `search` object there will be two arrays of profiled information:
-a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc
+a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc.

There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds).
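The `[nodeID][indexName][shardID]` id format described above can be split apart programmatically. A minimal sketch in Python, assuming the bracketed form shown in the docs (the regex and function name are illustrative, not part of any Elasticsearch API):

```python
import re

# A profile shard id has the form "[nodeID][indexName][shardID]",
# e.g. "[P6-vulHtQRWuD4YnubWb7A][test][0]" (id taken from the example response).
SHARD_ID = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[([^\]]+)\]$")

def parse_shard_id(shard_id):
    """Split a profile shard id into (node_id, index_name, shard_number)."""
    match = SHARD_ID.match(shard_id)
    if match is None:
        raise ValueError("unexpected shard id format: %r" % shard_id)
    node_id, index_name, shard_number = match.groups()
    return node_id, index_name, int(shard_number)

print(parse_shard_id("[P6-vulHtQRWuD4YnubWb7A][test][0]"))
# ('P6-vulHtQRWuD4YnubWb7A', 'test', 0)
```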
@@ -344,12 +344,12 @@ The meaning of the stats are as follows:

`build_scorer`::

This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that
- iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
+ iterates over matching documents and generates a score per-document (e.g. how well does "foo" match the document?).
Note, this records the time required to generate the Scorer object, not actually score the documents. Some
queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
{empty} +
{empty} +
- This may also showing timing associated with caching, if enabled and/or applicable for the query
+ This may also show timing associated with caching, if enabled and/or applicable for the query.

`next_doc`::
@@ -369,7 +369,7 @@ The meaning of the stats are as follows:

`matches`::

- Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
+ Some queries, such as phrase queries, match documents using a "two-phase" process. First, the document is
"approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
(and expensive) process. The second phase verification is what the `matches` statistic measures.
{empty} +
@@ -384,7 +384,7 @@ The meaning of the stats are as follows:

`score`::

- This records the time taken to score a particular document via it's Scorer
+ This records the time taken to score a particular document via its Scorer

`*_count`::
Records the number of invocations of the particular method. For example, `"next_doc_count": 2,`
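Since each timed stat in the `breakdown` has a matching `*_count`, an average time per invocation can be derived. A hedged sketch of that arithmetic (the helper name and the sample numbers are invented for illustration):

```python
def per_call_averages(breakdown):
    """For each timed stat in a profile breakdown dict, divide the total
    nanoseconds by the matching `*_count` to get an average per invocation."""
    averages = {}
    for stat, nanos in breakdown.items():
        if stat.endswith("_count"):
            continue  # count fields are divisors, not timings
        count = breakdown.get(stat + "_count", 0)
        if count:
            averages[stat] = nanos / count
    return averages

# Invented numbers, for illustration only:
breakdown = {"next_doc": 53876, "next_doc_count": 5,
             "score": 16852, "score_count": 5}
print(per_call_averages(breakdown))
```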
@@ -394,7 +394,7 @@ The meaning of the stats are as follows:

==== `collectors` Section

The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
-which is responsible for coordinating the traversal, scoring and collection of matching documents. Collectors
+which is responsible for coordinating the traversal, scoring, and collection of matching documents. Collectors
are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query
filters, etc.
@@ -422,16 +422,16 @@ Looking at the previous example:

// TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]

We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
-`Collector` used by Elasticsearch. The `reason` field attempts to give a plain english description of the class name. The
+`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The
`time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists
all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current
search was cancelled and stop collecting documents as soon as it occurs.

-It should be noted that Collector times are **independent** from the Query times. They are calculated, combined
+It should be noted that Collector times are **independent** from the Query times. They are calculated, combined,
and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times
from the Collectors into the Query section, so they are displayed in separate portions.

-For reference, the various collector reason's are:
+For reference, the various collector reasons are:

[horizontal]
`search_sorted`::
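Because a collector's `time_in_nanos` is inclusive of all its children, the time spent in a collector itself can be estimated by subtracting the children's totals. A sketch of that subtraction (the tree shape mirrors the `CancellableCollector`/`SimpleTopScoreDocCollector` nesting above; the numbers are invented):

```python
def self_time(collector):
    """time_in_nanos is inclusive of all children, so subtract the
    children's inclusive totals to estimate this collector's own time."""
    children = collector.get("children", [])
    return collector["time_in_nanos"] - sum(c["time_in_nanos"] for c in children)

tree = {"name": "CancellableCollector", "time_in_nanos": 3000,
        "children": [{"name": "SimpleTopScoreDocCollector", "time_in_nanos": 2200}]}
print(self_time(tree))  # 800
```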
@@ -457,7 +457,7 @@ For reference, the various collector reason's are:

`search_multi`::

A collector that wraps several other collectors. This is seen when combinations of search, aggregations,
- global aggs and post_filters are combined in a single search.
+ global aggs, and post_filters are combined in a single search.

`search_timeout`::
@@ -473,7 +473,7 @@ For reference, the various collector reason's are:

`global_aggregation`::

A collector that executes an aggregation against the global query scope, rather than the specified query.
- Because the global scope is necessarily different from the executed query, it must execute it's own
+ Because the global scope is necessarily different from the executed query, it must execute its own
match_all query (which you will see added to the Query section) to collect your entire dataset
@@ -648,9 +648,9 @@ And the response:

// TESTRESPONSE[s/\.\.\.//]
// TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
// TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]
-<1> The ``"aggregations"` portion has been omitted because it will be covered in the next section
+<1> The `"aggregations"` portion has been omitted because it will be covered in the next section

-As you can see, the output is significantly verbose from before. All the major portions of the query are
+As you can see, the output is significantly more verbose than before. All the major portions of the query are
represented:

1. The first `TermQuery` (user:test) represents the main `term` query
@@ -662,14 +662,14 @@ The Collector tree is fairly straightforward, showing how a single CancellableCo

==== Understanding MultiTermQuery output

-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex and fuzzy
+A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
queries. These queries emit very verbose responses, and are not overly structured.

Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens
+so Lucene rewrites the query in the context of the segment being evaluated, e.g., one segment may contain the tokens
`[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so query rewrites to a single TermQuery for "bakery".
+token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".

Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
"lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
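The per-segment rewriting described above can be mimicked in miniature: each segment expands the wildcard against only the tokens it actually holds. A toy sketch, not Lucene's real rewrite machinery (the segment contents are taken from the `b*` example in the text):

```python
import fnmatch

def rewrite_per_segment(pattern, segments):
    """Expand a wildcard pattern separately against each segment's token
    list, the way a MultiTermQuery is rewritten per segment."""
    return [[token for token in segment if fnmatch.fnmatch(token, pattern)]
            for segment in segments]

# One segment holds [bar, baz], another holds only [bakery]:
segments = [["bar", "baz"], ["bakery"]]
print(rewrite_per_segment("b*", segments))  # [['bar', 'baz'], ['bakery']]
```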
@@ -729,7 +729,7 @@ GET /twitter/_search

// TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
// TEST[continued]

-Which yields the following aggregation profile output
+This yields the following aggregation profile output:

[source,js]
--------------------------------------------------
@@ -797,7 +797,7 @@ Which yields the following aggregation profile output

From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which from the second terms aggregation on `likes`.
+aggregation then has a child `LongTermsAggregator` which comes from the second `terms` aggregation on `likes`.

The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
the `breakdown` field will give detailed stats about how the time was spent.
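Since an aggregation's `time_in_nanos` is likewise inclusive of its children, a profile tree such as the `GlobalAggregator` with its child `LongTermsAggregator` can be flattened for quick inspection. An illustrative sketch (the structure mirrors the response shape described above; the timing values are invented):

```python
def flatten(aggs, depth=0):
    """Yield (indented aggregator type, inclusive nanoseconds) for every
    aggregation in a profile tree, children listed after their parent."""
    for agg in aggs:
        yield ("  " * depth + agg["type"], agg["time_in_nanos"])
        yield from flatten(agg.get("children", []), depth + 1)

profile = [{"type": "LongTermsAggregator", "time_in_nanos": 4000},
           {"type": "GlobalAggregator", "time_in_nanos": 7000,
            "children": [{"type": "LongTermsAggregator", "time_in_nanos": 2500}]}]
for name, nanos in flatten(profile):
    print(name, nanos)
```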
@@ -859,7 +859,7 @@ The meaning of the stats are as follows:

==== Performance Notes

Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting
-low-level method calls such as `collect`, `advance` and `next_doc` can be fairly expensive, since these methods are called
+low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called
in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not
be compared against non-profiled query times. Profiling is just a diagnostic tool.
@@ -871,11 +871,11 @@ not have a drastic effect compared to other components in the profiled query.

==== Limitations

- Profiling currently does not measure the search fetch phase nor the network overhead
-- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node or
-additional work like e.g. building global ordinals (an internal data structure used to speed up search)
+- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or
+additional work such as building global ordinals (an internal data structure used to speed up search)
- Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch`
- Profiling of the reduce phase of aggregation is currently not available
- The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were
never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed
-diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures or
+diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
other bugs, please report them!