
Documentation for geohex_grid over geo_shape (#92999)

* Documentation for geohex_grid over geo_shape

The feature to add support for geohex_grid aggregations over geo_shape
fields was added in https://github.com/elastic/elasticsearch/pull/91956.
This is the associated documentation for that.

* Update docs/reference/aggregations/bucket/geohexgrid-aggregation.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Fix explanation for geo_point vs geo_shape proj

When aggregating geohex over geo_shape, we use the equirectangular projection
because the underlying Lucene index indexes and searches the polygons that way.

* Correct spelling

According to Grammarly, "therefor" is not an alternative spelling
of "therefore". We should use "therefore", the conjunctive adverb, here.

See https://www.grammarly.com/blog/therefore-vs-therefor/

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Craig Taverner 2 years ago
commit e8b4de9a8a

+ 35 - 10
docs/reference/aggregations/bucket/geohexgrid-aggregation.asciidoc

@@ -5,8 +5,8 @@
 <titleabbrev>Geohex grid</titleabbrev>
 ++++
 
-A multi-bucket aggregation that groups <<geo-point,`geo_point`>>
-values into buckets that represent a grid.
+A multi-bucket aggregation that groups <<geo-point,`geo_point`>> and
+<<geo-shape,`geo_shape`>> values into buckets that represent a grid.
 The resulting grid can be sparse and only
 contains cells that have matching data. Each cell corresponds to a
 https://h3geo.org/docs/core-library/h3Indexing#h3-cell-indexp[H3 cell index] and is
@@ -18,7 +18,7 @@ Precision for this aggregation can be between 0 and 15, inclusive.
 
 WARNING: High-precision requests can be very expensive in terms of RAM and
 result sizes. For example, the highest-precision geohex with a precision of 15
-produces cells that cover less than 10cm by 10cm. We recommend you use a
+produces cells that cover less than one square meter. We recommend you use a
 filter to limit high-precision requests to a smaller geographic area. For an example,
 refer to <<geohexgrid-high-precision>>.
 
@@ -220,21 +220,45 @@ Response:
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
 
-[[geohexgrid-options]]
+[discrete]
+[role="xpack"]
+[[geohexgrid-aggregating-geo-shape]]
+==== Aggregating `geo_shape` fields
+
+Aggregating on <<geo-shape>> fields works largely as it does for points. There are two key differences:
+
+* When aggregating over `geo_point` data, points are considered within a hexagonal cell if they lie
+within the edges defined by great circles. In other words, the calculation is done using spherical coordinates.
+However, when aggregating over `geo_shape` data, the shapes are considered within a hexagon if they lie
+within the edges defined as straight lines on an equirectangular projection.
+This is because Elasticsearch and Lucene treat polygon edges as straight lines in the equirectangular projection at index and search time.
+To keep search results and aggregation results aligned, we therefore also use the equirectangular
+projection in aggregations.
+For most data, the difference is subtle or unnoticeable.
+However, at low zoom levels (low precision), especially far from the equator, it can be noticeable.
+For example, if the same point data is indexed as both `geo_point` and `geo_shape`, it is possible to get
+different results when aggregating at lower resolutions.
+* As is the case with <<geotilegrid-aggregating-geo-shape,`geotile_grid`>>,
+a single shape can be counted in multiple cells. A shape contributes to the count of matching values
+if any part of it intersects that cell. Below is an image that demonstrates this:
+
+
+image:images/spatial/geoshape_hexgrid.png[]
+
 ==== Options
 
 [horizontal]
 field::
-(Required, string) Field containing indexed geo-point values. Must be explicitly
-mapped as a <<geo-point,`geo_point`>> field. If the field contains an array,
-`geohex_grid` aggregates all array values.
+(Required, string) Field containing indexed geo-point or geo-shape values.
+Must be explicitly mapped as a <<geo-point,`geo_point`>> or a <<geo-shape,`geo_shape`>> field.
+If the field contains an array, `geohex_grid` aggregates all array values.
 
 precision::
 (Optional, integer) Integer zoom of the key used to define cells/buckets in
 the results. Defaults to `6`. Values outside of [`0`,`15`] will be rejected.
 
 bounds::
-(Optional, object) Bounding box used to filter the geo-points in each bucket.
+(Optional, object) Bounding box used to filter the geo-points or geo-shapes in each bucket.
 Accepts the same bounding box formats as the
 <<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.
 
@@ -245,5 +269,6 @@ documents they contain.
 
 shard_size::
 (Optional, integer) Number of buckets returned from each shard. Defaults to
-`max(10,(size x number-of-shards))` to allow for more a accurate count of the
-top cells in the final result.
+`max(10,(size x number-of-shards))` to allow for a more accurate count of the
+top cells in the final result. Since each shard could have a different top result order,
+using a larger number here reduces the risk of inaccurate counts, but incurs a performance cost.
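To make the great-circle versus equirectangular distinction in the geohex `geo_shape` section more concrete, here is a small geometry sketch in plain Python. It is not Elasticsearch, Lucene, or H3 code, and the function names are hypothetical. It compares the midpoint of the same edge drawn two ways: as a great-circle arc (how `geo_point` cells are evaluated, per the text) and as a straight line in lat/lon space (the equirectangular treatment used for `geo_shape`). A 40° edge along 60° N bulges to roughly 61.5° N as a great circle. A 1° edge at the same latitude, or a long edge on the equator, barely differs. This matches the claim that the effect shows up mostly at low precision and far from the equator.

```python
import math


def to_unit_vector(lat_deg: float, lon_deg: float) -> tuple[float, float, float]:
    """Convert a (lat, lon) pair in degrees to a 3D unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def great_circle_midpoint(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Midpoint of the shorter great-circle arc between two (lat, lon) points.

    The sum of the two unit vectors points at the arc midpoint. This assumes the
    points are not antipodal (the sum would then be zero).
    """
    ax, ay, az = to_unit_vector(*a)
    bx, by, bz = to_unit_vector(*b)
    x, y, z = ax + bx, ay + by, az + bz
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def equirectangular_midpoint(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Midpoint of the straight line between two points drawn in plain lat/lon space.

    Simplification: ignores edges that cross the dateline.
    """
    return (a[0] + b[0]) / 2, (a[1] + b[1]) / 2


if __name__ == "__main__":
    edges = {
        "long edge at 60N": ((60.0, 0.0), (60.0, 40.0)),
        "short edge at 60N": ((60.0, 0.0), (60.0, 1.0)),
        "long edge on equator": ((0.0, 0.0), (0.0, 40.0)),
    }
    for name, (a, b) in edges.items():
        gc_lat, _ = great_circle_midpoint(a, b)
        eq_lat, _ = equirectangular_midpoint(a, b)
        print(f"{name}: great-circle midpoint lat {gc_lat:.4f}, straight-line midpoint lat {eq_lat:.4f}")
```

Both versions of the edge share the same endpoints. Only the path between them differs, and that gap grows with edge length and latitude. This is why larger (lower-precision) cells far from the equator are the most affected.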

+ 36 - 22
docs/reference/aggregations/bucket/geotilegrid-aggregation.asciidoc

@@ -17,7 +17,7 @@ cover only a small area.
 * Low precision keys have a smaller range for x and y, and represent tiles that
 each cover a large area.
 
-See https://wiki.openstreetmap.org/wiki/Zoom_levels[Zoom level documentation]
+See https://wiki.openstreetmap.org/wiki/Zoom_levels[zoom level documentation]
 on how precision (zoom) correlates to size on the ground. Precision for this
 aggregation can be between 0 and 29, inclusive.
 
@@ -102,14 +102,15 @@ Response:
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
 
+[[geotilegrid-high-precision]]
 ==== High-precision requests
 
-When requesting detailed buckets (typically for displaying a "zoomed in" map)
+When requesting detailed buckets (typically for displaying a "zoomed in" map),
 a filter like <<query-dsl-geo-bounding-box-query,geo_bounding_box>> should be
-applied to narrow the subject area otherwise potentially millions of buckets
+applied to narrow the subject area. Otherwise, potentially millions of buckets
 will be created and returned.
 
-[source,console]
+[source,console,id=geotilegrid-high-precision-ex]
 --------------------------------------------------
 POST /museums/_search?size=0
 {
@@ -137,6 +138,8 @@ POST /museums/_search?size=0
 --------------------------------------------------
 // TEST[continued]
 
+Response:
+
 [source,console-result]
 --------------------------------------------------
 {
@@ -166,13 +169,14 @@ POST /museums/_search?size=0
 --------------------------------------------------
 // TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]
 
+[[geotilegrid-addtl-bounding-box-filtering]]
 ==== Requests with additional bounding box filtering
 
 The `geotile_grid` aggregation supports an optional `bounds` parameter
-that restricts the cells considered to those that intersects the
-bounds provided. The `bounds` parameter accepts the bounding box in
-all the same <<query-dsl-geo-bounding-box-query-accepted-formats,accepted formats>> of the
-bounds specified in the Geo Bounding Box Query. This bounding box can be used with or
+that restricts the cells considered to those that intersect the
+provided bounds. The `bounds` parameter accepts the same
+<<query-dsl-geo-bounding-box-query-accepted-formats,bounding box formats>>
+as the geo-bounding box query. This bounding box can be used with or
 without an additional `geo_bounding_box` query for filtering the points prior to aggregating.
 It is an independent bounding box that can intersect with, be equal to, or be disjoint
 to any additional `geo_bounding_box` queries defined in the context of the aggregation.
@@ -197,6 +201,8 @@ POST /museums/_search?size=0
 --------------------------------------------------
 // TEST[continued]
 
+Response:
+
 [source,console-result]
 --------------------------------------------------
 {
@@ -225,9 +231,10 @@ POST /museums/_search?size=0
 
 [discrete]
 [role="xpack"]
+[[geotilegrid-aggregating-geo-shape]]
 ==== Aggregating `geo_shape` fields
 
-Aggregating on <<geo-shape>> fields works just as it does for points, except that a single
+Aggregating on <<geo-shape>> fields works almost as it does for points, except that a single
 shape can be counted for in multiple tiles. A shape will contribute to the count of matching values
 if any part of its shape intersects with that tile. Below is an image that demonstrates this:
 
@@ -237,20 +244,27 @@ image:images/spatial/geoshape_grid.png[]
 ==== Options
 
 [horizontal]
-field::         Mandatory. The name of the field indexed with GeoPoints.
+field::
+(Required, string) Field containing indexed geo-point or geo-shape values.
+Must be explicitly mapped as a <<geo-point,`geo_point`>> or a <<geo-shape,`geo_shape`>> field.
+If the field contains an array, `geotile_grid` aggregates all array values.
 
-precision::     Optional. The integer zoom of the key used to define
-                cells/buckets in the results. Defaults to 7.
-                Values outside of [0,29] will be rejected.
+precision::
+(Optional, integer) Integer zoom of the key used to define cells/buckets in
+the results. Defaults to `7`. Values outside of [`0`,`29`] will be rejected.
 
-bounds:         Optional. The bounding box to filter the points in the bucket.
+bounds::
+(Optional, object) Bounding box used to filter the geo-points or geo-shapes in each bucket.
+Accepts the same bounding box formats as the
+<<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.
 
-size::          Optional. The maximum number of geohash buckets to return
-                (defaults to 10,000). When results are trimmed, buckets are
-                prioritised based on the volumes of documents they contain.
+size::
+(Optional, integer) Maximum number of buckets to return. Defaults to 10,000.
+When results are trimmed, buckets are prioritized based on the volume of
+documents they contain.
 
-shard_size::    Optional. To allow for more accurate counting of the top cells
-                returned in the final result the aggregation defaults to
-                returning `max(10,(size x number-of-shards))` buckets from each
-                shard. If this heuristic is undesirable, the number considered
-                from each shard can be over-ridden using this parameter.
+shard_size::
+(Optional, integer) Number of buckets returned from each shard. Defaults to
+`max(10,(size x number-of-shards))` to allow for a more accurate count of the
+top cells in the final result. Since each shard could have a different top result order,
+using a larger number here reduces the risk of inaccurate counts, but incurs a performance cost.
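Both `geohex_grid` and `geotile_grid` can count a single shape in several cells. The sketch below illustrates the consequence using a deliberately simplified model. A uniform grid of `cell_deg`-sized lat/lon cells stands in for the real H3 or Web Mercator tiles. Only axis-aligned rectangles (envelopes) are used, so a bounding-box overlap test is an exact intersection test. The function names and sample envelopes are hypothetical. Each document is added to every cell its shape touches, so the bucket counts can add up to more than the number of documents.

```python
import math
from collections import Counter


def cells_for_envelope(min_lon, min_lat, max_lon, max_lat, cell_deg):
    """Return the (column, row) keys of every grid cell that the envelope touches.

    Simplified stand-in for real tiles: a uniform cell_deg x cell_deg grid in
    lat/lon space. Assumes the envelope does not cross the dateline and lies
    within [-180, 180) x [-90, 90).
    """
    col_lo = math.floor((min_lon + 180) / cell_deg)
    col_hi = math.floor((max_lon + 180) / cell_deg)
    row_lo = math.floor((min_lat + 90) / cell_deg)
    row_hi = math.floor((max_lat + 90) / cell_deg)
    return {(col, row) for col in range(col_lo, col_hi + 1) for row in range(row_lo, row_hi + 1)}


def grid_counts(envelopes, cell_deg):
    """Count, per cell, how many envelopes intersect it (one increment per shape per cell)."""
    counts = Counter()
    for envelope in envelopes:
        counts.update(cells_for_envelope(*envelope, cell_deg))
    return counts


if __name__ == "__main__":
    # Hypothetical documents as (min_lon, min_lat, max_lon, max_lat) envelopes.
    docs = [
        (2.1, 48.8, 2.2, 48.9),    # small shape: falls inside a single cell
        (5.0, 45.0, 25.0, 55.0),   # large shape: spans several cells
    ]
    counts = grid_counts(docs, cell_deg=10)
    print(f"{len(docs)} documents, {len(counts)} buckets, summed count {sum(counts.values())}")
    for cell, count in sorted(counts.items()):
        print(cell, count)
```

In this example, the two documents contribute a total of 7 to the bucket counts across 6 cells. The cell that holds the small shape also overlaps the large one, so it reports both documents.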

BIN
docs/reference/images/spatial/geoshape_hexgrid.png