4 years ago · 55d2f86b5b
--- a/docs/reference/transform/index.asciidoc
+++ b/docs/reference/transform/index.asciidoc
@@ -2,6 +2,7 @@ include::transforms.asciidoc[leveloffset=+1]
 
				 include::overview.asciidoc[leveloffset=+2]
			
 
				 include::setup.asciidoc[leveloffset=+2]
			
 
				 include::usage.asciidoc[leveloffset=+2]
			
 
				+include::transforms-at-scale.asciidoc[leveloffset=+2]
			
 
				 include::checkpoints.asciidoc[leveloffset=+2]
			
 
				 include::api-quickref.asciidoc[leveloffset=+2]
			
 
				 include::ecommerce-tutorial.asciidoc[leveloffset=+2]
			
--- a/docs/reference/transform/transforms-at-scale.asciidoc
+++ b/docs/reference/transform/transforms-at-scale.asciidoc
@@ -0,0 +1,201 @@
 
				+[role="xpack"]
			
 
				+[[transform-scale]]
			
 
				+= Working with {transforms} at scale
			
 
				+++++
			
 
				+<titleabbrev>{transforms-cap} at scale</titleabbrev>
			
 
				+++++
			
 
				+
			
 
				+{transforms-cap} convert existing {es} indices into summarized indices, which 
			
 
				+provide opportunities for new insights and analytics. The search and index 
			
 
				+operations performed by {transforms} use standard {es} features so similar 
			
 
				+considerations for working with {es} at scale are often applicable to 
			
 
				+{transforms}. If you experience performance issues, start by identifying the 
			
 
				+bottleneck areas (search, indexing, processing, or storage) then review the 
			
 
				+relevant considerations in this guide to improve performance. It also helps to 
			
 
				+understand how {transforms} work as different considerations apply depending on 
			
 
				+whether your {transform} is running in continuous mode or in batch mode.
			
 
				+
			
 
				+In this guide, you’ll learn how to:
			
 
				+
			
 
				+* Understand the impact of configuration options on the performance of 
			
 
				+  {transforms}.
			
 
				+
			
 
				+**Prerequisites:**
			
 
				+
			
 
				+These guidelines assume you have a {transform} you want to tune, and you’re 
			
 
				+already familiar with: 
			
 
				+
			
 
				+* <<transform-overview,How {transforms} work>>.
			
 
				+* <<transform-setup,How to set up {transforms}>>.
			
 
				+* <<transform-checkpoints,How {transform} checkpoints work in continuous mode>>.
			
 
				+
			
 
				+The following considerations are not sequential – the numbers help to navigate 
			
 
				+between the list items; you can take action on one or more of them in any order. 
			
 
				+Most of the recommendations apply to both continuous and batch {transforms}. If 
			
 
				+a list item only applies to one {transform} type, this exception is highlighted 
			
 
				+in the description.
			
 
				+
			
 
				+The keywords in parentheses at the end of each recommendation title indicate 
			
 
				+the bottleneck area that may be improved by following the given recommendation.
			
 
				+
			
 
				+[discrete]
			
 
				+[[measure-performance]]
			
 
				+== Measure {transforms} performance
			
 
				+
			
 
				+In order to optimize {transform} performance, start by identifying the areas 
			
 
				+where most work is being done. The **Stats** interface of the 
			
 
				+**{transforms-cap}** page in {kib} contains information that covers three main 
			
 
				+areas: indexing, searching, and processing time (alternatively, you can use the 
			
 
				+<<get-transform-stats, {transforms} stats API>>). If, for example, the results 
			
 
				+show that the highest proportion of time is spent on search, then prioritize 
			
 
				+efforts on optimizing the search query of the {transform}. {transforms-cap} also 
			
 
				+have https://esrally.readthedocs.io[Rally support] that makes it possible to run 
			
 
				+performance checks on {transform} configurations if required. If you 
			
 
				+optimized the crucial factors and you still experience performance issues, you 
			
 
				+may also want to consider improving your hardware.
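				+
				+As a minimal sketch of the API route, assuming a {transform} with the 
				+hypothetical ID `my-transform`, the following request returns its statistics. 
				+In the response, compare `search_time_in_ms`, `index_time_in_ms`, and 
				+`processing_time_in_ms` under `stats` to see where most time is spent.
				+
				+[source,console]
				+----
				+# `my-transform` is a placeholder ID, not one defined in this guide
				+GET _transform/my-transform/_stats
				+----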
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[frequency]]
			
 
				+== 1. Optimize `frequency` (index)
			
 
				+
			
 
				+In a {ctransform}, the `frequency` configuration option sets the interval 
			
 
				+between checks for changes in the source indices. If changes are detected, then 
			
 
				+the source data is searched and the changes are applied to the destination 
			
 
				+index. Depending on your use case, you may wish to reduce the frequency at which 
			
 
				+changes are applied. By setting `frequency` to a higher value (maximum is one 
			
 
				+hour), the workload can be spread over time at the cost of less up-to-date data.
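				+
				+For illustration, and assuming a {ctransform} with the hypothetical ID 
				+`my-transform`, the following sketch raises `frequency` to 30 minutes through 
				+the update {transform} API. The value is an example, not a recommendation.
				+
				+[source,console]
				+----
				+# `my-transform` and `30m` are placeholder values
				+POST _transform/my-transform/_update
				+{
				+  "frequency": "30m"
				+}
				+----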
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[increase-shards-dest-index]]
			
 
				+== 2. Increase the number of shards of the destination index (index)
			
 
				+
			
 
				+Depending on the size of the destination index, you may consider increasing its 
			
 
				+shard count. {transforms-cap} use one shard by default when creating the 
			
 
				+destination index. To override the index settings, create the destination index 
			
 
				+before starting the {transform}. For more information about how the number of 
			
 
				+shards affects scalability and resilience, refer to <<scalability>>.
			
 
				+
			
 
				+TIP: Use the <<preview-transform>> to check the settings that the {transform} 
			
 
				+would use to create the destination index. You can copy and adjust these in 
			
 
				+order to create the destination index prior to starting the {transform}.
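				+
				+A minimal sketch of creating the destination index up front might look like 
				+the following. The index name `my-dest-index` and the shard count of 3 are 
				+assumptions for illustration; the mappings are omitted here and would 
				+normally be copied from the preview output.
				+
				+[source,console]
				+----
				+# `my-dest-index` and the shard count are placeholder values
				+PUT my-dest-index
				+{
				+  "settings": {
				+    "index": {
				+      "number_of_shards": 3
				+    }
				+  }
				+}
				+----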
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[search-queries]]
			
 
				+== 3. Profile and optimize your search queries (search)
			
 
				+
			
 
				+If you have defined a {transform} source index `query`, ensure it is as 
			
 
				+efficient as possible. Use the **Search Profiler** under **Dev Tools** in {kib} 
			
 
				+to get detailed timing information about the execution of individual components 
			
 
				+in the search request. Alternatively, you can use the <<search-profile>>. The 
			
 
				+results give you insight into how search requests are executed at a low level so 
			
 
				+that you can understand why certain requests are slow, and take steps to improve 
			
 
				+them.
			
 
				+
			
 
				+{transforms-cap} execute standard {es} search requests. There are different ways 
			
 
				+to write {es} queries, and some of them are more efficient than others. Consult 
			
 
				+<<tune-for-search-speed>> to learn more about {es} performance tuning.
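				+
				+As a rough sketch of the API route, you can run the source query of the 
				+{transform} with `profile` enabled. The index name `my-source-index` and the 
				+`term` query are hypothetical; substitute your own source index and `query`.
				+
				+[source,console]
				+----
				+# `my-source-index` and the query are placeholders for your own source settings
				+GET my-source-index/_search
				+{
				+  "profile": true,
				+  "query": {
				+    "term": { "status": "active" }
				+  }
				+}
				+----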
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[limit-source-query]]
			
 
				+== 4. Limit the scope of the source query (search)
			
 
				+
			
 
				+Imagine your {ctransform} is configured to group by `IP` and calculate the sum 
			
 
				+of `bytes_sent`. For each checkpoint, a {ctransform} detects changes in the 
			
 
				+source data since the previous checkpoint, identifying the IPs for which new 
			
 
				+data has been ingested. Then it performs a second search, filtered for this 
			
 
				+group of IPs, in order to calculate the total `bytes_sent`. If this second 
			
 
				+search matches many shards, then this could be resource intensive. Consider 
			
 
				+limiting the scope that the source index pattern and query will match.
			
 
				+
			
 
				+Use an absolute time value as a date range filter in your source query (for 
			
 
				+example, greater than `2020-01-01T00:00:00`) to limit which historical indices 
			
 
				+are accessed. If you use a relative time value (for example, `now-30d`) then 
			
 
				+this date range is re-evaluated at the point of each checkpoint execution.
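				+
				+The scenario above could be sketched as the following {ctransform}, which 
				+groups by `IP`, sums `bytes_sent`, and uses an absolute date range filter. The 
				+{transform} ID, index names, `@timestamp` field, and `delay` value are 
				+assumptions made for this example.
				+
				+[source,console]
				+----
				+# `bytes-per-ip`, `my-logs-*`, and `@timestamp` are placeholder names
				+PUT _transform/bytes-per-ip
				+{
				+  "source": {
				+    "index": "my-logs-*",
				+    "query": {
				+      "range": {
				+        "@timestamp": { "gte": "2020-01-01T00:00:00" }
				+      }
				+    }
				+  },
				+  "dest": { "index": "bytes-per-ip" },
				+  "sync": {
				+    "time": { "field": "@timestamp", "delay": "60s" }
				+  },
				+  "pivot": {
				+    "group_by": {
				+      "ip": { "terms": { "field": "IP" } }
				+    },
				+    "aggregations": {
				+      "total_bytes_sent": { "sum": { "field": "bytes_sent" } }
				+    }
				+  }
				+}
				+----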
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[optimize-shading-strategy]]
			
 
				+== 5. Optimize the sharding strategy for the source index (search)
			
 
				+
			
 
				+There is no one-size-fits-all sharding strategy. A strategy that works in one 
			
 
				+environment may not scale in another. A good sharding strategy must account for 
			
 
				+your infrastructure, use case, and performance expectations.
			
 
				+
			
 
				+Too few shards may mean that the benefits of distributing the workload cannot be 
			
 
				+realized; however, too many shards may impact your cluster health. To learn more 
			
 
				+about sizing your shards, read this <<size-your-shards,guide>>.
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[tune-max-page-search-size]]
			
 
				+== 6. Tune `max_page_search_size` (search)
			
 
				+
			
 
				+The `max_page_search_size` {transform} configuration option defines the number 
			
 
				+of buckets that are returned for each search request. The default value is 500. 
			
 
				+If you increase this value, you get better throughput at the cost of higher 
			
 
				+latency and memory usage.
			
 
				+
			
 
				+The ideal value of this parameter is highly dependent on your use case. If your 
			
 
				+{transform} executes memory-intensive aggregations – for example, cardinality or 
			
 
				+percentiles – then increasing `max_page_search_size` requires more available 
			
 
				+memory. If memory limits are exceeded, a circuit breaker exception occurs.
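				+
				+As an illustration, assuming a {transform} with the hypothetical ID 
				+`my-transform`, the following sketch changes `max_page_search_size` through 
				+the update {transform} API. The value of 1000 is only an example; test any 
				+change against your own memory limits.
				+
				+[source,console]
				+----
				+# `my-transform` and `1000` are placeholder values
				+POST _transform/my-transform/_update
				+{
				+  "settings": {
				+    "max_page_search_size": 1000
				+  }
				+}
				+----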
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[indexed-fields-in-source]]
			
 
				+== 7. Use indexed fields in your source indices (search)
			
 
				+
			
 
				+Runtime fields and scripted fields are not indexed fields; their values are only 
			
 
				+extracted or computed at search time. While these fields provide flexibility in 
			
 
				+how you access your data, they increase performance costs at search time. If 
			
 
				+{transform} performance using runtime fields or scripted fields is a concern, 
			
 
				+you may wish to consider using indexed fields instead. For performance reasons, 
			
 
				+we do not recommend using a runtime field as the time field that syncs a 
			
 
				+{ctransform}. 
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[index-sorting-group-by-ordering]]
			
 
				+== 8. Use index sorting and `group_by` ordering (search, process)
			
 
				+
			
 
				+If you use more than one `group_by` field in your {transform}, then the order of 
			
 
				+the fields in conjunction with the use of <<index-modules-index-sorting>> may 
			
 
				+improve runtime.
			
 
				+
			
 
				+Index sorting enables you to store documents on disk in a specific order which 
			
 
				+can improve query efficiency. The ideal sorting logic depends on your use case, 
			
 
				+but a rule of thumb is to sort the fields in descending order (high to low 
			
 
				+cardinality) starting with the time-based fields. Then put the time-based 
			
 
				+components first in the `group_by` if you have any, and then apply the same 
			
 
				+order to your `group_by` fields as configured for index sorting. Index sorting 
			
 
				+can be defined only once at index creation. If you don't already have index 
			
 
				+sorting on the index that you want to use as a source, consider reindexing it to 
			
 
				+a new, sorted index.
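				+
				+A sorted source index could be sketched as follows. The index name, the 
				+`@timestamp` and `host` fields, and the sort orders are assumptions for 
				+illustration only. A matching {transform} would then list a `date_histogram` 
				+on `@timestamp` first in its `group_by`, followed by `terms` on `host`.
				+
				+[source,console]
				+----
				+# `my-sorted-source`, the field names, and the sort orders are placeholders
				+PUT my-sorted-source
				+{
				+  "settings": {
				+    "index": {
				+      "sort.field": ["@timestamp", "host"],
				+      "sort.order": ["desc", "desc"]
				+    }
				+  },
				+  "mappings": {
				+    "properties": {
				+      "@timestamp": { "type": "date" },
				+      "host": { "type": "keyword" }
				+    }
				+  }
				+}
				+----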
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+[[disable-source-dest]]
			
 
				+== 9. Disable the `_source` field on the destination index (storage)
			
 
				+
			
 
				+The <<mapping-source-field>> contains the original JSON document body that was 
			
 
				+passed at index time. The `_source` field itself is not indexed (and thus is not 
			
 
				+searchable), but it is still stored in the index and incurs a storage overhead. 
			
 
				+Consider disabling `_source` to save storage space if you have a large 
			
 
				+destination index. Disabling `_source` is only possible during index creation.
			
 
				+
			
 
				+NOTE: When the `_source` field is disabled, a number of features are not 
			
 
				+supported. Consult <<disable-source-field>> to understand the consequences 
			
 
				+before disabling it.
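				+
				+As a minimal sketch, assuming a destination index with the hypothetical name 
				+`my-dest-index` that is created before the {transform} starts, `_source` can 
				+be disabled in the mappings. Field mappings are omitted here for brevity.
				+
				+[source,console]
				+----
				+# `my-dest-index` is a placeholder name
				+PUT my-dest-index
				+{
				+  "mappings": {
				+    "_source": {
				+      "enabled": false
				+    }
				+  }
				+}
				+----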
			
 
				+
			
 
				+
			
 
				+[discrete]
			
 
				+== Further reading
			
 
				+
			
 
				+* <<tune-for-search-speed>>
			
 
				+* <<tune-for-indexing-speed>>
			
 
				+* <<size-your-shards>>
			
 
				+* <<ilm-index-lifecycle>>
			
--- a/docs/reference/transform/transforms.asciidoc
+++ b/docs/reference/transform/transforms.asciidoc
@@ -13,6 +13,8 @@ your data.
 
				 * <<transform-overview>>
			
 
				 * <<transform-setup>>
			
 
				 * <<transform-usage>>
			
 
				+* <<transform-scale>>
			
 
				+* <<transform-checkpoints>>
			
 
				 * <<transform-api-quickref>>
			
 
				 * <<ecommerce-transforms>>
			
 
				 * <<transform-examples>>