[role="xpack"]
[[search-aggregations-metrics-geo-line]]
=== Geo-Line Aggregation
++++
<titleabbrev>Geo-Line</titleabbrev>
++++

The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a `LineString` ordered
by the chosen `sort` field. This `sort` can be a date field, for example. The bucket returned is a valid
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.

[source,console,id=search-aggregations-metrics-geo-line-simple]
----
PUT test
{
  "mappings": {
    "properties": {
      "my_location": { "type": "geo_point" },
      "group": { "type": "keyword" },
      "@timestamp": { "type": "date" }
    }
  }
}

POST /test/_bulk?refresh
{"index":{}}
{"my_location": {"lat":52.373184, "lon":4.889187}, "@timestamp": "2023-01-02T09:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.370159, "lon":4.885057}, "@timestamp": "2023-01-02T10:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.369219, "lon":4.901618}, "@timestamp": "2023-01-02T13:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.374081, "lon":4.912350}, "@timestamp": "2023-01-02T16:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.371667, "lon":4.914722}, "@timestamp": "2023-01-03T12:00:00Z"}

POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "line": {
      "geo_line": {
        "point": {"field": "my_location"},
        "sort": {"field": "@timestamp"}
      }
    }
  }
}
----

Which returns:

[source,js]
----
{
  "aggregations": {
    "line": {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [ 4.889187, 52.373184 ],
          [ 4.885057, 52.370159 ],
          [ 4.901618, 52.369219 ],
          [ 4.912350, 52.374081 ],
          [ 4.914722, 52.371667 ]
        ]
      },
      "properties": {
        "complete": true
      }
    }
  }
}
----
// TESTRESPONSE

The resulting https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] contains both a `LineString` geometry
for the path generated by the aggregation, as well as a map of `properties`.
The property `complete` indicates whether all matched documents were used to generate the geometry.
The `size` option described below can be used to limit the number of documents included in the aggregation,
leading to results with `complete: false`.
Exactly which documents are dropped from results depends on whether the aggregation is based
on `time_series` or not, and this is discussed in
<<search-aggregations-metrics-geo-line-grouping-time-series-advantages,more detail below>>.

The above result could be displayed in a map user interface:

image:images/spatial/geo_line.png[Kibana map with museum tour of Amsterdam]
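
The ordering and the `complete` flag described above can be illustrated with a small Python model of the first example. This is a sketch under stated assumptions, not the Elasticsearch implementation: the function `build_geo_line` is hypothetical, and it assumes that truncation outside `time_series` keeps the first `size` points in sort order. Note that GeoJSON writes each position as `[lon, lat]`, the reverse of the `lat`/`lon` order used in the indexed documents.

[source,python]
----
from datetime import datetime

# The five documents indexed in the first example.
DOCS = [
    {"my_location": {"lat": 52.373184, "lon": 4.889187}, "@timestamp": "2023-01-02T09:00:00Z"},
    {"my_location": {"lat": 52.370159, "lon": 4.885057}, "@timestamp": "2023-01-02T10:00:00Z"},
    {"my_location": {"lat": 52.369219, "lon": 4.901618}, "@timestamp": "2023-01-02T13:00:00Z"},
    {"my_location": {"lat": 52.374081, "lon": 4.912350}, "@timestamp": "2023-01-02T16:00:00Z"},
    {"my_location": {"lat": 52.371667, "lon": 4.914722}, "@timestamp": "2023-01-03T12:00:00Z"},
]


def parse_timestamp(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_geo_line(docs, point_field, sort_field, size=10000):
    """Hypothetical model of `geo_line` outside `time_series`: sort, truncate, emit a Feature."""
    ordered = sorted(docs, key=lambda doc: parse_timestamp(doc[sort_field]))
    kept = ordered[:size]  # assumption: truncation keeps the first `size` points in sort order
    coordinates = [[doc[point_field]["lon"], doc[point_field]["lat"]] for doc in kept]
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coordinates},
        # `complete` is true only when no matched document was dropped.
        "properties": {"complete": len(kept) == len(ordered)},
    }


# Reverse the input to show that the line order comes from the sort key, not the input order.
feature = build_geo_line(list(reversed(DOCS)), "my_location", "@timestamp")
print(feature["geometry"]["coordinates"][0])  # [4.889187, 52.373184]
print(feature["properties"])                   # {'complete': True}

truncated = build_geo_line(DOCS, "my_location", "@timestamp", size=3)
print(truncated["properties"])                 # {'complete': False}
----

The second call shows how a `size` smaller than the number of matched documents produces `complete: false`.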

[[search-aggregations-metrics-geo-line-options]]
==== Options

`point`::
(Required)
This option specifies the name of the `geo_point` field.

Example usage configuring `my_location` as the point field:

[source,js]
----
"point": {
  "field": "my_location"
}
----
// NOTCONSOLE

`sort`::
(Required outside <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregations)
This option specifies the name of the numeric field to use as the sort key for ordering the points.
When the `geo_line` aggregation is nested inside a
<<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
aggregation, this field defaults to `@timestamp`, and any other value will result in an error.

Example usage configuring `@timestamp` as the sort key:

[source,js]
----
"sort": {
  "field": "@timestamp"
}
----
// NOTCONSOLE

`include_sort`::
(Optional, boolean, default: `false`) When `true`, an additional array of the sort values is included in the
feature properties.

`sort_order`::
(Optional, string, default: `"ASC"`) This option accepts one of two values: `"ASC"` or `"DESC"`.
The line is sorted by the sort key in ascending order when set to `"ASC"`, and in descending order
when set to `"DESC"`.

`size`::
(Optional, integer, default: `10000`) The maximum length of the line represented in the aggregation.
Valid sizes are between one and 10000.
Within a <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregation,
`geo_line` uses line simplification to constrain the size; otherwise, it uses truncation.
See <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,below>>
for a discussion on the subtleties involved.
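
Outside a `time_series` aggregation, the interaction of `sort_order`, `size` and `include_sort` can be sketched in Python as follows. This is a hypothetical model, not the server implementation. Two details are assumptions: truncation keeps the first `size` points in the requested order, and the sort values appear under a `sort_values` property.

[source,python]
----
def geo_line_options(points, sort_order="ASC", size=10000, include_sort=False):
    """Hypothetical model of the `sort_order`, `size` and `include_sort` options.

    `points` is a list of (sort_value, lon, lat) tuples in arbitrary order.
    """
    if sort_order not in ("ASC", "DESC"):
        raise ValueError('sort_order must be "ASC" or "DESC"')
    if not 1 <= size <= 10000:
        raise ValueError("size must be between 1 and 10000")
    ordered = sorted(points, key=lambda p: p[0], reverse=(sort_order == "DESC"))
    kept = ordered[:size]  # assumption: truncation keeps the first `size` points in line order
    properties = {"complete": len(kept) == len(ordered)}
    if include_sort:
        properties["sort_values"] = [p[0] for p in kept]  # assumed property name
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [[lon, lat] for _, lon, lat in kept]},
        "properties": properties,
    }


# Sort values 0..3 (for example, route numbers) supplied out of order.
pts = [(3, 4.912350, 52.374081), (0, 4.889187, 52.373184),
       (1, 4.885057, 52.370159), (2, 4.901618, 52.369219)]
line = geo_line_options(pts, sort_order="DESC", size=2, include_sort=True)
print(line["properties"])  # {'complete': False, 'sort_values': [3, 2]}
----

With `"DESC"`, the line starts at the largest sort value. Because `size` is smaller than the number of points, the model reports `complete: false`.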

[[search-aggregations-metrics-geo-line-grouping]]
==== Grouping

The simple example above will produce a single track for all the data selected by the query. However, it is far more
common to need to group the data into multiple tracks. For example, flight transponder measurements might be grouped
by flight call-sign, with each flight sorted by timestamp to produce a separate track.

In the following examples we will group the locations of points of interest in the cities of
Amsterdam, Antwerp and Paris.
The tracks will be ordered by the planned visit sequence for a walking tour of the museums and other attractions.

In order to demonstrate the difference between a time-series grouping and a non-time-series grouping, we will
first create an index with <<tsds-index-settings,time-series enabled>>,
and then give examples of grouping the same data without time-series and with time-series.

[source,console,id=search-aggregations-metrics-geo-line-grouping-setup]
----
PUT tour
{
  "mappings": {
    "properties": {
      "city": {
        "type": "keyword",
        "time_series_dimension": true
      },
      "category": { "type": "keyword" },
      "route": { "type": "long" },
      "name": { "type": "keyword" },
      "location": { "type": "geo_point" },
      "@timestamp": { "type": "date" }
    }
  },
  "settings": {
    "index": {
      "mode": "time_series",
      "routing_path": [ "city" ],
      "time_series": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      }
    }
  }
}

POST /tour/_bulk?refresh
{"index":{}}
{"@timestamp": "2023-01-02T09:00:00Z", "route": 0, "location": "POINT(4.889187 52.373184)", "city": "Amsterdam", "category": "Attraction", "name": "Royal Palace Amsterdam"}
{"index":{}}
{"@timestamp": "2023-01-02T10:00:00Z", "route": 1, "location": "POINT(4.885057 52.370159)", "city": "Amsterdam", "category": "Attraction", "name": "The Amsterdam Dungeon"}
{"index":{}}
{"@timestamp": "2023-01-02T13:00:00Z", "route": 2, "location": "POINT(4.901618 52.369219)", "city": "Amsterdam", "category": "Museum", "name": "Museum Het Rembrandthuis"}
{"index":{}}
{"@timestamp": "2023-01-02T16:00:00Z", "route": 3, "location": "POINT(4.912350 52.374081)", "city": "Amsterdam", "category": "Museum", "name": "NEMO Science Museum"}
{"index":{}}
{"@timestamp": "2023-01-03T12:00:00Z", "route": 4, "location": "POINT(4.914722 52.371667)", "city": "Amsterdam", "category": "Museum", "name": "Nederlands Scheepvaartmuseum"}
{"index":{}}
{"@timestamp": "2023-01-04T09:00:00Z", "route": 5, "location": "POINT(4.401384 51.220292)", "city": "Antwerp", "category": "Attraction", "name": "Cathedral of Our Lady"}
{"index":{}}
{"@timestamp": "2023-01-04T12:00:00Z", "route": 6, "location": "POINT(4.405819 51.221758)", "city": "Antwerp", "category": "Museum", "name": "Snijders&Rockoxhuis"}
{"index":{}}
{"@timestamp": "2023-01-04T15:00:00Z", "route": 7, "location": "POINT(4.405200 51.222900)", "city": "Antwerp", "category": "Museum", "name": "Letterenhuis"}
{"index":{}}
{"@timestamp": "2023-01-05T10:00:00Z", "route": 8, "location": "POINT(2.336389 48.861111)", "city": "Paris", "category": "Museum", "name": "Musée du Louvre"}
{"index":{}}
{"@timestamp": "2023-01-05T14:00:00Z", "route": 9, "location": "POINT(2.327000 48.860000)", "city": "Paris", "category": "Museum", "name": "Musée dOrsay"}
----

[[search-aggregations-metrics-geo-line-grouping-terms]]
==== Grouping with terms

Using the above data, for a non-time-series use case, the grouping can be done using a
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> based on city name.
This would work whether or not we had defined the `tour` index as a time series index.

[source,console,id=search-aggregations-metrics-geo-line-terms]
----
POST /tour/_search?filter_path=aggregations
{
  "aggregations": {
    "path": {
      "terms": {"field": "city"},
      "aggregations": {
        "museum_tour": {
          "geo_line": {
            "point": {"field": "location"},
            "sort": {"field": "@timestamp"}
          }
        }
      }
    }
  }
}
----
// TEST[continued]

Which returns:

[source,js]
----
{
  "aggregations": {
    "path": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Amsterdam",
          "doc_count": 5,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        {
          "key": "Antwerp",
          "doc_count": 3,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        {
          "key": "Paris",
          "doc_count": 2,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        }
      ]
    }
  }
}
----
// TESTRESPONSE

The above results contain an array of buckets, where each bucket is a JSON object with the `key` showing the value
of the `city` field (the city name), and an inner aggregation result called `museum_tour` containing a
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
actual route between the various attractions in that city.
Each result also includes a `properties` object with a `complete` value, which will be `false` if the geometry
was truncated to the limits specified in the `size` parameter.
Note that when we use `time_series` in the example below, we will get the same results structured a little differently.
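
A minimal Python sketch of what the `terms` grouping computes for the `tour` data may help. It buckets the documents by `city`, then builds one line per bucket ordered by `@timestamp`. The function `group_tracks` is hypothetical. It assumes the default `terms` bucket order of descending `doc_count` with ties broken by ascending key, and for brevity it omits the `complete` flag and truncation.

[source,python]
----
from collections import defaultdict

# (city, @timestamp, lon, lat) for each document in the `tour` index, deliberately out of order.
TOUR = [
    ("Paris", "2023-01-05T14:00:00Z", 2.327000, 48.860000),
    ("Amsterdam", "2023-01-02T10:00:00Z", 4.885057, 52.370159),
    ("Antwerp", "2023-01-04T12:00:00Z", 4.405819, 51.221758),
    ("Amsterdam", "2023-01-02T09:00:00Z", 4.889187, 52.373184),
    ("Amsterdam", "2023-01-03T12:00:00Z", 4.914722, 52.371667),
    ("Antwerp", "2023-01-04T15:00:00Z", 4.405200, 51.222900),
    ("Paris", "2023-01-05T10:00:00Z", 2.336389, 48.861111),
    ("Amsterdam", "2023-01-02T16:00:00Z", 4.912350, 52.374081),
    ("Antwerp", "2023-01-04T09:00:00Z", 4.401384, 51.220292),
    ("Amsterdam", "2023-01-02T13:00:00Z", 4.901618, 52.369219),
]


def group_tracks(docs):
    """Hypothetical model of a `terms` aggregation on `city` with a nested `geo_line`."""
    buckets = defaultdict(list)
    for city, timestamp, lon, lat in docs:
        buckets[city].append((timestamp, lon, lat))
    result = []
    # Largest buckets first, ties broken by key (assumed to mirror the default `terms` order).
    for city, points in sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        # All timestamps share one ISO-8601 format, so string order is chronological order.
        points.sort()
        result.append({
            "key": city,
            "doc_count": len(points),
            "museum_tour": [[lon, lat] for _, lon, lat in points],
        })
    return result


for bucket in group_tracks(TOUR):
    print(bucket["key"], bucket["doc_count"], bucket["museum_tour"][0])
# Amsterdam 5 [4.889187, 52.373184]
# Antwerp 3 [4.401384, 51.220292]
# Paris 2 [2.336389, 48.861111]
----

Every bucket must be held in memory until all documents have been seen and sorted. This is the cost that the `time_series` approach avoids.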

[[search-aggregations-metrics-geo-line-grouping-time-series]]
==== Grouping with time-series

Using the same data as before, we can also perform the grouping with a
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
This will group by TSID, which is defined as the combination of all fields with `time_series_dimension: true`,
in this case the same `city` field used in the previous
<<search-aggregations-bucket-terms-aggregation,terms aggregation>>.
This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.

[source,console,id=search-aggregations-metrics-geo-line-time-series]
----
POST /tour/_search?filter_path=aggregations
{
  "aggregations": {
    "path": {
      "time_series": {},
      "aggregations": {
        "museum_tour": {
          "geo_line": {
            "point": {"field": "location"}
          }
        }
      }
    }
  }
}
----
// TEST[continued]

NOTE: The `geo_line` aggregation no longer requires the `sort` field when nested within a
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
This is because the sort field is set to `@timestamp`, by which all time-series indexes are pre-sorted.
If you set this parameter to anything other than `@timestamp`, you will get an error.

The above query will result in:

[source,js]
----
{
  "aggregations": {
    "path": {
      "buckets": {
        "{city=Paris}": {
          "key": {
            "city": "Paris"
          },
          "doc_count": 2,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        "{city=Antwerp}": {
          "key": {
            "city": "Antwerp"
          },
          "doc_count": 3,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        "{city=Amsterdam}": {
          "key": {
            "city": "Amsterdam"
          },
          "doc_count": 5,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        }
      }
    }
  }
}
----
// TESTRESPONSE

The above results are essentially the same as with the previous `terms` aggregation example, but structured differently.
Here we see the buckets returned as a map, where the key is an internal description of the TSID.
This TSID is unique for each unique combination of fields with `time_series_dimension: true`.
Each bucket contains a `key` field, which is also a map of all dimension values for the TSID; in this case only the city
name is used for grouping.
In addition, there is an inner aggregation result called `museum_tour` containing a
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
actual route between the various attractions in that city.
Each result also includes a `properties` object with a `complete` value, which will be `false` if the geometry
was simplified to the limits specified in the `size` parameter.

[[search-aggregations-metrics-geo-line-grouping-time-series-advantages]]
==== Why group with time-series?

When reviewing the above examples, you might think that there is little difference between using
<<search-aggregations-bucket-terms-aggregation,`terms`>> or
<<search-aggregations-bucket-time-series-aggregation,`time_series`>>
to group the geo-lines. However, there are some important differences in behaviour between the two cases.
Time series indexes are stored in a very specific order on disk.
They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field.
This allows the `geo_line` aggregation to be considerably optimized:

* The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets.
This is substantially less memory than required for non-time-series cases where all buckets are collected
concurrently.
* No sorting needs to be done, since the data is pre-sorted by `@timestamp`.
The time-series data will naturally arrive at the aggregation collector in `DESC` order.
This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order,
but perform an efficient in-memory reverse before generating the final `LineString` geometry.
* The `size` parameter can be used for a streaming line-simplification algorithm.
Without time-series, we are forced to truncate data, by default after 10000 documents per bucket, in order to
prevent memory usage from being unbounded.
This can result in geo-lines being truncated, and therefore losing important data.
With time-series we can run a streaming line-simplification algorithm, retaining control over memory usage,
while also maintaining the overall geometry shape.
In fact, for most use cases it would work to set this `size` parameter to a much lower bound, and save even more
memory. For example, if the `geo_line` is to be drawn on a display map with a specific resolution, it might look
just as good to simplify to as few as 100 or 200 points. This will save memory on the server, on the network and
in the client.
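
The first two bullet points above can be modelled in a few lines of Python. The sketch is hypothetical and only illustrates the idea. Documents arrive already grouped by TSID and in descending `@timestamp` order, one buffer object is reused for every bucket, and each finished bucket is reversed to produce an ascending line. The TSID strings simply mirror the bucket keys in the response above. The sketch deliberately leaves out `size` handling, which is covered by the streaming simplification in the next section.

[source,python]
----
def collect_time_series(stream):
    """Hypothetical model of `geo_line` collection on a time-series index.

    `stream` yields (tsid, timestamp, lon, lat) tuples that are already grouped by
    TSID and sorted by timestamp in descending order within each group.
    """
    buffer = []          # one list object, reused for every bucket
    current_tsid = None
    for tsid, _timestamp, lon, lat in stream:
        if tsid != current_tsid:
            if current_tsid is not None:
                # Reverse the DESC-collected points to emit an ASC line; no sort is needed.
                yield current_tsid, buffer[::-1]
            buffer.clear()  # reuse the same list object for the next bucket
            current_tsid = tsid
        buffer.append([lon, lat])
    if current_tsid is not None:
        yield current_tsid, buffer[::-1]


stream = [
    ("{city=Antwerp}", "2023-01-04T15:00:00Z", 4.405200, 51.222900),
    ("{city=Antwerp}", "2023-01-04T12:00:00Z", 4.405819, 51.221758),
    ("{city=Antwerp}", "2023-01-04T09:00:00Z", 4.401384, 51.220292),
    ("{city=Paris}", "2023-01-05T14:00:00Z", 2.327000, 48.860000),
    ("{city=Paris}", "2023-01-05T10:00:00Z", 2.336389, 48.861111),
]
lines = dict(collect_time_series(stream))
print(lines["{city=Paris}"])  # [[2.336389, 48.861111], [2.327, 48.86]]
----

Only one bucket is open at a time, so working memory is bounded by the largest single bucket rather than by all buckets together.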

NOTE: There are other significant advantages to working with time-series data and using `time_series` index mode.
These are discussed in the documentation on <<tsds,time series data streams>>.

[[search-aggregations-metrics-geo-line-simplification]]
==== Streaming line simplification

Line simplification is a great way to reduce the size of the final results sent to the client, and displayed in a map
user interface. However, these algorithms normally use a lot of memory to perform the simplification, requiring the
entire geometry to be maintained in memory together with supporting data for the simplification itself.

The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification
process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting
is required, which is the case when grouping is done by the
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>,
running on an index with the `time_series` index mode.

Under these conditions the `geo_line` aggregation allocates memory to the `size` specified, and then fills that
memory with the incoming documents.
Once the memory is completely filled, documents from within the line are removed as new documents are added.
The choice of document to remove is made to minimize the visual impact on the geometry.
This process makes use of the
https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm[Visvalingam–Whyatt algorithm].
Essentially this means points are removed if they have the minimum triangle area, with the triangle defined
by the point under consideration and the points immediately before and after it in the line.
In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.
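
The removal rule can be sketched in Python. This is a naive streaming adaptation of Visvalingam–Whyatt written for illustration, not the Elasticsearch code. The class `StreamingSimplifier` is hypothetical. It recomputes every interior area whenever the buffer overflows, where a production version would keep the areas in a priority queue. It also requires `size` of at least 2, so the first point and the newest point can always be kept. Areas are computed on the unit sphere with the Van Oosterom-Strackee solid-angle formula. Scaling by the Earth's radius squared would not change which point has the smallest area.

[source,python]
----
import math


def to_unit_vector(lon, lat):
    """Convert longitude and latitude in degrees to a 3D point on the unit sphere."""
    lon_r, lat_r = math.radians(lon), math.radians(lat)
    return (math.cos(lat_r) * math.cos(lon_r), math.cos(lat_r) * math.sin(lon_r), math.sin(lat_r))


def spherical_triangle_area(p, q, r):
    """Area of the spherical triangle p-q-r on the unit sphere (the solid angle it subtends)."""
    a, b, c = (to_unit_vector(*pt) for pt in (p, q, r))

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    b_cross_c = (b[1] * c[2] - b[2] * c[1], b[2] * c[0] - b[0] * c[2], b[0] * c[1] - b[1] * c[0])
    triple = abs(dot(a, b_cross_c))
    return 2 * math.atan2(triple, 1 + dot(a, b) + dot(b, c) + dot(c, a))


class StreamingSimplifier:
    """Keep at most `size` points, dropping the interior point with the smallest triangle area."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("this sketch needs size >= 2 to keep both end points")
        self.size = size
        self.points = []

    def add(self, lon, lat):
        self.points.append((lon, lat))
        if len(self.points) > self.size:
            pts = self.points
            # Only interior points are candidates, so the first and the newest point are kept.
            victim = min(
                range(1, len(pts) - 1),
                key=lambda i: spherical_triangle_area(pts[i - 1], pts[i], pts[i + 1]),
            )
            del pts[victim]


simplifier = StreamingSimplifier(size=4)
for lon, lat in [(0, 0), (1, 0), (2, 0), (3, 1), (4, 0)]:
    simplifier.add(lon, lat)
# (1, 0) lies on the same great circle (the equator) as its neighbours, so its area is zero.
print(simplifier.points)  # [(0, 0), (2, 0), (3, 1), (4, 0)]
----

Memory stays bounded by `size` at every step, and the points that survive are those contributing most to the shape. This is the property that the Kodiak Island comparison below illustrates.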

In order to demonstrate how much better line simplification is than line truncation, consider this example of the north
shore of Kodiak Island.
The data for this is only 209 points, but if we want to set `size` to `100` we get dramatic truncation.

image:images/spatial/kodiak_geo_line_truncated.png[North shore of Kodiak Island truncated to 100 points]

The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a geometry very
different from the original.

Now consider the same geometry simplified to 100 points.

image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]

For comparison we have shown the original in grey, the truncated line in blue and the new simplified geometry
in magenta. It is possible to see where the new simplified line deviates from the original, but the overall
geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.