[role="xpack"]
[[search-aggregations-bucket-geohexgrid-aggregation]]
=== Geohex grid aggregation
++++
<titleabbrev>Geohex grid</titleabbrev>
++++

A multi-bucket aggregation that groups <<geo-point,`geo_point`>> and
<<geo-shape,`geo_shape`>> values into buckets that represent a grid.
The resulting grid can be sparse and contains only
cells that have matching data. Each cell corresponds to a
https://h3geo.org/docs/core-library/h3Indexing#h3-cell-index[H3 cell index] and is
labeled using the https://h3geo.org/docs/core-library/h3Indexing#h3index-representation[H3Index representation].

See https://h3geo.org/docs/core-library/restable[the table of cell areas for H3
resolutions] for how precision (zoom) correlates to size on the ground.
Precision for this aggregation can be between 0 and 15, inclusive.

WARNING: High-precision requests can be very expensive in terms of RAM and
result sizes. For example, the highest-precision geohex with a precision of 15
produces cells that cover less than one square meter. We recommend you use a
filter to limit high-precision requests to a smaller geographic area. For an example,
refer to <<geohexgrid-high-precision>>.

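
To put the warning above in numbers, the following Python sketch estimates how many
cells exist worldwide at a given precision and how large an average cell is. It assumes
the H3 cell count of `2 + 120 * 7^precision` and a spherical Earth with an authalic
radius of about 6371.007 km. The function names are illustrative and are not part of
any Elasticsearch or H3 API.

[source,python]
--------------------------------------------------
import math

# Assumption: authalic Earth radius in meters, treating the Earth as a sphere.
EARTH_RADIUS_M = 6_371_007.1809


def h3_cell_count(precision: int) -> int:
    """Total number of H3 cells worldwide at the given precision (resolution)."""
    if not 0 <= precision <= 15:
        raise ValueError("precision must be between 0 and 15, inclusive")
    # 122 cells at precision 0; every finer level follows 2 + 120 * 7^precision.
    return 2 + 120 * 7 ** precision


def average_cell_area_m2(precision: int) -> float:
    """Average cell area in square meters; individual cells vary in size."""
    earth_area_m2 = 4 * math.pi * EARTH_RADIUS_M ** 2
    return earth_area_m2 / h3_cell_count(precision)


if __name__ == "__main__":
    for precision in (0, 4, 12, 15):
        area = average_cell_area_m2(precision)
        print(f"precision={precision} cells={h3_cell_count(precision)} avg_area_m2={area:.3g}")
--------------------------------------------------

Under these assumptions, the average cell at precision 15 comes out below one square
meter, which matches the warning. The figures are averages only; refer to the H3
resolution table for authoritative values.
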
[[geohexgrid-low-precision]]
==== Simple low-precision request

[source,console,id=geohexgrid-aggregation-example]
--------------------------------------------------
PUT /museums
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "POINT (4.912350 52.374081)", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "POINT (4.901618 52.369219)", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "POINT (4.914722 52.371667)", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "POINT (4.405200 51.222900)", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "POINT (2.336389 48.861111)", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "POINT (2.327000 48.860000)", "name": "Musée d'Orsay"}

POST /museums/_search?size=0
{
  "aggregations": {
    "large-grid": {
      "geohex_grid": {
        "field": "location",
        "precision": 4
      }
    }
  }
}
--------------------------------------------------

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "large-grid": {
      "buckets": [
        {
          "key": "841969dffffffff",
          "doc_count": 3
        },
        {
          "key": "841fb47ffffffff",
          "doc_count": 2
        },
        {
          "key": "841fa4dffffffff",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

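
Each `key` in the response is an H3Index rendered as a hexadecimal string. As a rough
illustration, the Python sketch below reads a few fields out of such a key. It assumes
the bit layout described in the H3 indexing documentation: 4 mode bits starting at
bit 59, 4 resolution bits starting at bit 52, and 7 base-cell bits starting at bit 45.
The helper name `describe_h3_key` is hypothetical.

[source,python]
--------------------------------------------------
def describe_h3_key(key: str) -> dict:
    """Decode selected fields of a hexadecimal H3Index string (assumed bit layout)."""
    value = int(key, 16)
    return {
        "mode": (value >> 59) & 0xF,        # 1 is assumed to denote a cell index
        "resolution": (value >> 52) & 0xF,  # corresponds to the requested precision
        "base_cell": (value >> 45) & 0x7F,  # one of the coarsest, precision-0 cells
    }


if __name__ == "__main__":
    for key in ("841969dffffffff", "841fb47ffffffff", "841fa4dffffffff"):
        print(key, describe_h3_key(key))
--------------------------------------------------

For the three keys in the response above, the decoded resolution equals the requested
`precision` of 4.
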
[[geohexgrid-high-precision]]
==== High-precision requests

When requesting detailed buckets (typically for displaying a "zoomed in" map),
a filter like <<query-dsl-geo-bounding-box-query,geo_bounding_box>> should be
applied to narrow the subject area. Otherwise, potentially millions of buckets
will be created and returned.

[source,console,id=geohexgrid-high-precision-ex]
--------------------------------------------------
POST /museums/_search?size=0
{
  "aggregations": {
    "zoomed-in": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "POINT (4.9 52.4)",
            "bottom_right": "POINT (5.0 52.3)"
          }
        }
      },
      "aggregations": {
        "zoom1": {
          "geohex_grid": {
            "field": "location",
            "precision": 12
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "zoomed-in": {
      "doc_count": 3,
      "zoom1": {
        "buckets": [
          {
            "key": "8c1969c9b2617ff",
            "doc_count": 1
          },
          {
            "key": "8c1969526d753ff",
            "doc_count": 1
          },
          {
            "key": "8c1969526d26dff",
            "doc_count": 1
          }
        ]
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

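
To get a feel for how many buckets a filtered high-precision request could produce in
the worst case, the Python sketch below divides the area of a bounding box by the
average cell area at a given precision. It is an estimate under stated assumptions
(spherical Earth, average cell area, box that does not cross the antimeridian), not a
description of how Elasticsearch computes buckets. The function names are illustrative.

[source,python]
--------------------------------------------------
import math

# Assumption: authalic Earth radius in meters, treating the Earth as a sphere.
EARTH_RADIUS_M = 6_371_007.1809


def bbox_area_m2(west: float, south: float, east: float, north: float) -> float:
    """Area of a longitude/latitude box on a sphere (no antimeridian crossing)."""
    lon_span = math.radians(east - west)
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_M ** 2 * lon_span * band


def estimated_cell_count(west, south, east, north, precision: int) -> float:
    """Rough upper bound on buckets: box area divided by the average cell area."""
    total_cells = 2 + 120 * 7 ** precision
    average_cell_area = 4 * math.pi * EARTH_RADIUS_M ** 2 / total_cells
    return bbox_area_m2(west, south, east, north) / average_cell_area


if __name__ == "__main__":
    # The bounding box used in the request above, at precision 12.
    print(round(estimated_cell_count(4.9, 52.3, 5.0, 52.4, 12)))
    # The whole globe at the same precision, for comparison.
    print(round(estimated_cell_count(-180.0, -90.0, 180.0, 90.0, 12)))
--------------------------------------------------

Because the grid is sparse, the real bucket count is also bounded by the number of
cells that contain matching data, so a small index like `museums` returns far fewer
buckets than this estimate.
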
[[geohexgrid-addtl-bounding-box-filtering]]
==== Requests with additional bounding box filtering

The `geohex_grid` aggregation supports an optional `bounds` parameter
that restricts the cells considered to those that intersect the
provided bounds. The `bounds` parameter accepts the same
<<query-dsl-geo-bounding-box-query-accepted-formats,bounding box formats>>
as the geo-bounding box query. This bounding box can be used with or
without an additional `geo_bounding_box` query that filters the points before aggregating.
It is an independent bounding box that can intersect with, be equal to, or be disjoint
from any additional `geo_bounding_box` queries defined in the context of the aggregation.

[source,console,id=geohexgrid-aggregation-with-bounds]
--------------------------------------------------
POST /museums/_search?size=0
{
  "aggregations": {
    "tiles-in-bounds": {
      "geohex_grid": {
        "field": "location",
        "precision": 12,
        "bounds": {
          "top_left": "POINT (4.9 52.4)",
          "bottom_right": "POINT (5.0 52.3)"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "tiles-in-bounds": {
      "buckets": [
        {
          "key": "8c1969c9b2617ff",
          "doc_count": 1
        },
        {
          "key": "8c1969526d753ff",
          "doc_count": 1
        },
        {
          "key": "8c1969526d26dff",
          "doc_count": 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"_shards": $body._shards,"hits":$body.hits,"timed_out":false,/]

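
Because the `bounds` parameter and a `geo_bounding_box` query are independent, a client
can set either one or both. The Python sketch below assembles such a search body as a
plain dictionary. The helper `geohex_search_body` and its parameter names are
hypothetical and only mirror the request fields shown above; sending the body to a
cluster is left out.

[source,python]
--------------------------------------------------
import json


def geohex_search_body(field, precision=6, bounds=None, query_box=None):
    """Build a search body with a geohex_grid aggregation (illustrative helper)."""
    if not 0 <= precision <= 15:
        raise ValueError("precision must be between 0 and 15, inclusive")
    grid = {"field": field, "precision": precision}
    if bounds is not None:
        # Restricts the cells considered by the aggregation itself.
        grid["bounds"] = bounds
    body = {
        "size": 0,  # return aggregation results only, no hits
        "aggregations": {"tiles-in-bounds": {"geohex_grid": grid}},
    }
    if query_box is not None:
        # Filters documents before aggregating; independent of `bounds`.
        body["query"] = {"geo_bounding_box": {field: query_box}}
    return body


if __name__ == "__main__":
    box = {"top_left": "POINT (4.9 52.4)", "bottom_right": "POINT (5.0 52.3)"}
    print(json.dumps(geohex_search_body("location", 12, bounds=box), indent=2))
--------------------------------------------------

Validating `precision` on the client is optional, since out-of-range values are
rejected by the aggregation anyway; it is included here only to fail early.
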
[discrete]
[role="xpack"]
[[geohexgrid-aggregating-geo-shape]]
==== Aggregating `geo_shape` fields

Aggregating on <<geo-shape>> fields works almost as it does for points. There are two key differences:

* When aggregating over `geo_point` data, points are considered within a hexagonal tile if they lie
within the edges defined by great circles. In other words, the calculation is done using spherical coordinates.
However, when aggregating over `geo_shape` data, the shapes are considered within a hexagon if they lie
within the edges defined as straight lines on an equirectangular projection.
The reason is that Elasticsearch and Lucene treat edges using the equirectangular projection at index and search time.
To ensure that search results and aggregation results are aligned, aggregations also use the equirectangular
projection.
For most data, the difference is subtle or not noticeable.
However, at low zoom levels (low precision), especially far from the equator, the difference can be noticeable.
For example, if the same point data is indexed as `geo_point` and `geo_shape`, it is possible to get
different results when aggregating at lower resolutions.
* As is the case with <<geotilegrid-aggregating-geo-shape,`geotile_grid`>>,
a single shape can be counted in multiple tiles. A shape contributes to the count of matching values
if any part of it intersects with that tile. The following image demonstrates this:

image:images/spatial/geoshape_hexgrid.png[]

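
The first difference above can be illustrated with a small Python sketch that compares
the midpoint of an edge under the two models: along a great circle, and along a straight
line in longitude/latitude space (the equirectangular view). This is only a geometric
illustration with made-up endpoints, not the code path Elasticsearch uses, and the
function names are hypothetical.

[source,python]
--------------------------------------------------
import math


def great_circle_midpoint(lon1, lat1, lon2, lat2):
    """Midpoint along the great circle between two points, in degrees."""
    def to_xyz(lon, lat):
        lam, phi = math.radians(lon), math.radians(lat)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    x1, y1, z1 = to_xyz(lon1, lat1)
    x2, y2, z2 = to_xyz(lon2, lat2)
    # The normalized sum of the two unit vectors points at the midpoint
    # (undefined for exactly antipodal points, which are not handled here).
    x, y, z = x1 + x2, y1 + y2, z1 + z2
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lon, lat


def equirectangular_midpoint(lon1, lat1, lon2, lat2):
    """Midpoint of a straight line drawn in longitude/latitude space."""
    return (lon1 + lon2) / 2, (lat1 + lat2) / 2


if __name__ == "__main__":
    # A long edge far from the equator versus the same edge on the equator.
    for lat in (60.0, 0.0):
        print(lat, great_circle_midpoint(0.0, lat, 40.0, lat),
              equirectangular_midpoint(0.0, lat, 40.0, lat))
--------------------------------------------------

For a long edge at 60 degrees latitude, the two midpoints differ by more than a degree
of latitude, while on the equator they coincide. Edges this long only occur at low
precision, which is consistent with the note that the effect is most visible for large
cells far from the equator.
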
==== Options

[horizontal]
field::
(Required, string) Field containing indexed geo-point or geo-shape values.
Must be explicitly mapped as a <<geo-point,`geo_point`>> or a <<geo-shape,`geo_shape`>> field.
If the field contains an array, `geohex_grid` aggregates all array values.

precision::
(Optional, integer) Integer zoom of the key used to define cells/buckets in
the results. Defaults to `6`. Values outside the range [`0`,`15`] are rejected.

bounds::
(Optional, object) Bounding box used to filter the geo-points or geo-shapes in each bucket.
Accepts the same bounding box formats as the
<<query-dsl-geo-bounding-box-query-accepted-formats,geo-bounding box query>>.

size::
(Optional, integer) Maximum number of buckets to return. Defaults to 10,000.
When results are trimmed, buckets are prioritized based on the number of
documents they contain.

shard_size::
(Optional, integer) Number of buckets returned from each shard. Defaults to
`max(10,(size x number-of-shards))` to allow for a more accurate count of the
top cells in the final result. Because each shard can have a different top result order,
a larger value reduces the risk of inaccurate counts but incurs a performance cost.

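
The interaction between `size` and `shard_size` can be illustrated with a toy model in
Python. Each shard reports only its own top `shard_size` cells, and the per-shard
counts are then summed and trimmed to `size`. This is a simplified sketch with made-up
cell keys and counts, not the actual Elasticsearch reduce logic; `top_buckets` and
`default_shard_size` are hypothetical names.

[source,python]
--------------------------------------------------
from collections import Counter


def default_shard_size(size: int, number_of_shards: int) -> int:
    """The default stated above: max(10, size x number-of-shards)."""
    return max(10, size * number_of_shards)


def top_buckets(shard_counts, size: int, shard_size: int):
    """Merge per-shard top cells and return the overall top `size` cells."""
    merged = Counter()
    for counts in shard_counts:
        # Each shard only reports its own `shard_size` largest cells.
        for key, doc_count in Counter(counts).most_common(shard_size):
            merged[key] += doc_count
    return merged.most_common(size)


if __name__ == "__main__":
    # Made-up per-shard document counts for three cells.
    shards = [
        {"a": 5, "b": 4, "c": 3},
        {"c": 6, "b": 4, "a": 1},
    ]
    print(top_buckets(shards, size=2, shard_size=1))  # trimmed too early
    print(top_buckets(shards, size=2, shard_size=3))  # every cell reported
--------------------------------------------------

With `shard_size=1`, cell `b` never reaches the merge step even though it has the
second-highest total, and the count for `c` is too low. Raising `shard_size` fixes
both at the cost of moving more buckets between shards.
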