- [role="xpack"]
- [[search-aggregations-metrics-geo-line]]
- === Geo-Line Aggregation
- ++++
- <titleabbrev>Geo-Line</titleabbrev>
- ++++
- The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a `LineString` ordered
- by the chosen `sort` field. This `sort` can be a date field, for example. The bucket returned is a valid
- https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.
- [source,console,id=search-aggregations-metrics-geo-line-simple]
- ----
- PUT test
- {
- "mappings": {
- "properties": {
- "my_location": { "type": "geo_point" },
- "group": { "type": "keyword" },
- "@timestamp": { "type": "date" }
- }
- }
- }
- POST /test/_bulk?refresh
- {"index":{}}
- {"my_location": {"lat":52.373184, "lon":4.889187}, "@timestamp": "2023-01-02T09:00:00Z"}
- {"index":{}}
- {"my_location": {"lat":52.370159, "lon":4.885057}, "@timestamp": "2023-01-02T10:00:00Z"}
- {"index":{}}
- {"my_location": {"lat":52.369219, "lon":4.901618}, "@timestamp": "2023-01-02T13:00:00Z"}
- {"index":{}}
- {"my_location": {"lat":52.374081, "lon":4.912350}, "@timestamp": "2023-01-02T16:00:00Z"}
- {"index":{}}
- {"my_location": {"lat":52.371667, "lon":4.914722}, "@timestamp": "2023-01-03T12:00:00Z"}
- POST /test/_search?filter_path=aggregations
- {
- "aggs": {
- "line": {
- "geo_line": {
- "point": {"field": "my_location"},
- "sort": {"field": "@timestamp"}
- }
- }
- }
- }
- ----
- Which returns:
- [source,js]
- ----
- {
- "aggregations": {
- "line": {
- "type": "Feature",
- "geometry": {
- "type": "LineString",
- "coordinates": [
- [ 4.889187, 52.373184 ],
- [ 4.885057, 52.370159 ],
- [ 4.901618, 52.369219 ],
- [ 4.912350, 52.374081 ],
- [ 4.914722, 52.371667 ]
- ]
- },
- "properties": {
- "complete": true
- }
- }
- }
- }
- ----
- // TESTRESPONSE
- The resulting https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] contains both a `LineString` geometry
- for the path generated by the aggregation, as well as a map of `properties`.
- The `complete` property indicates whether all matched documents were used to generate the geometry.
- The `size` option described below can be used to limit the number of documents included in the aggregation,
- leading to results with `complete: false`.
- Exactly which documents are dropped from results depends on whether the aggregation is based
- on `time_series` or not, and this is discussed in
- <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,more detail below>>.
- The above result could be displayed in a map user interface:
- image:images/spatial/geo_line.png[Kibana map with museum tour of Amsterdam]
- [[search-aggregations-metrics-geo-line-options]]
- ==== Options
- `point`::
- (Required)
- This option specifies the name of the `geo_point` field.
- Example usage configuring `my_location` as the point field:
- [source,js]
- ----
- "point": {
- "field": "my_location"
- }
- ----
- // NOTCONSOLE
- `sort`::
- (Required outside <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregations)
- This option specifies the name of the numeric field to use as the sort key for ordering the points.
- When the `geo_line` aggregation is nested inside a
- <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
- aggregation, this field defaults to `@timestamp`, and any other value will result in an error.
- Example usage configuring `@timestamp` as the sort key:
- [source,js]
- ----
- "sort": {
- "field": "@timestamp"
- }
- ----
- // NOTCONSOLE
- `include_sort`::
- (Optional, boolean, default: `false`) When `true`, this option includes an additional array of the sort values in the
- feature properties.
- `sort_order`::
- (Optional, string, default: `"ASC"`) This option accepts one of two values: "ASC", "DESC".
- The line is sorted in ascending order by the sort key when set to "ASC", and in descending
- order when set to "DESC".
- `size`::
- (Optional, integer, default: `10000`) The maximum number of points in the line produced by the aggregation.
- Valid sizes are between 1 and 10000.
- Within <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
- the aggregation uses line simplification to constrain the size, otherwise it uses truncation.
- See <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,below>>
- for a discussion on the subtleties involved.
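To make the interplay of these options concrete, the following is a minimal client-side sketch in Python of the non-time-series behaviour. It is not the Elasticsearch implementation. The function name `geo_line` and its signature are hypothetical, the `sort_values` property name is an assumption about how `include_sort` is reported, and the sketch assumes that truncation keeps the first `size` points in the requested sort order.

```python
def geo_line(docs, point_field, sort_field,
             include_sort=False, sort_order="ASC", size=10000):
    """Model of the geo_line options on a plain list of documents (illustrative only)."""
    if sort_order not in ("ASC", "DESC"):
        raise ValueError('sort_order must be "ASC" or "DESC"')
    if not 1 <= size <= 10000:
        raise ValueError("size must be between 1 and 10000")

    # Order the documents by the sort key, then truncate to `size` points.
    ordered = sorted(docs, key=lambda d: d[sort_field],
                     reverse=(sort_order == "DESC"))
    kept = ordered[:size]

    # `complete` is false as soon as any matched document was dropped.
    properties = {"complete": len(kept) == len(ordered)}
    if include_sort:
        # Assumed property name; this sketch simply echoes the raw sort values.
        properties["sort_values"] = [d[sort_field] for d in kept]

    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude].
            "coordinates": [[d[point_field]["lon"], d[point_field]["lat"]]
                            for d in kept],
        },
        "properties": properties,
    }


docs = [
    {"my_location": {"lat": 52.373184, "lon": 4.889187}, "@timestamp": "2023-01-02T09:00:00Z"},
    {"my_location": {"lat": 52.370159, "lon": 4.885057}, "@timestamp": "2023-01-02T10:00:00Z"},
    {"my_location": {"lat": 52.369219, "lon": 4.901618}, "@timestamp": "2023-01-02T13:00:00Z"},
]
feature = geo_line(docs, "my_location", "@timestamp",
                   include_sort=True, sort_order="DESC", size=2)
```

With `size` set to `2`, only the two most recent points survive in this sketch, and `complete` becomes `false`.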
- [[search-aggregations-metrics-geo-line-grouping]]
- ==== Grouping
- The simple example above will produce a single track for all the data selected by the query. However, it is far more
- common to need to group the data into multiple tracks. For example, grouping flight transponder measurements by
- flight call-sign before sorting each flight by timestamp and producing a separate track for each.
- In the following examples we will group the locations of points of interest in the cities of
- Amsterdam, Antwerp and Paris.
- The tracks will be ordered by the planned visit sequence for a walking tour of the museums and other attractions.
- In order to demonstrate the difference between a time-series grouping and a non-time-series grouping, we will
- first create an index with <<tsds-index-settings,time-series enabled>>,
- and then give examples of grouping the same data without time-series and with time-series.
- [source,console,id=search-aggregations-metrics-geo-line-grouping-setup]
- ----
- PUT tour
- {
- "mappings": {
- "properties": {
- "city": {
- "type": "keyword",
- "time_series_dimension": true
- },
- "category": { "type": "keyword" },
- "route": { "type": "long" },
- "name": { "type": "keyword" },
- "location": { "type": "geo_point" },
- "@timestamp": { "type": "date" }
- }
- },
- "settings": {
- "index": {
- "mode": "time_series",
- "routing_path": [ "city" ],
- "time_series": {
- "start_time": "2023-01-01T00:00:00Z",
- "end_time": "2024-01-01T00:00:00Z"
- }
- }
- }
- }
- POST /tour/_bulk?refresh
- {"index":{}}
- {"@timestamp": "2023-01-02T09:00:00Z", "route": 0, "location": "POINT(4.889187 52.373184)", "city": "Amsterdam", "category": "Attraction", "name": "Royal Palace Amsterdam"}
- {"index":{}}
- {"@timestamp": "2023-01-02T10:00:00Z", "route": 1, "location": "POINT(4.885057 52.370159)", "city": "Amsterdam", "category": "Attraction", "name": "The Amsterdam Dungeon"}
- {"index":{}}
- {"@timestamp": "2023-01-02T13:00:00Z", "route": 2, "location": "POINT(4.901618 52.369219)", "city": "Amsterdam", "category": "Museum", "name": "Museum Het Rembrandthuis"}
- {"index":{}}
- {"@timestamp": "2023-01-02T16:00:00Z", "route": 3, "location": "POINT(4.912350 52.374081)", "city": "Amsterdam", "category": "Museum", "name": "NEMO Science Museum"}
- {"index":{}}
- {"@timestamp": "2023-01-03T12:00:00Z", "route": 4, "location": "POINT(4.914722 52.371667)", "city": "Amsterdam", "category": "Museum", "name": "Nederlands Scheepvaartmuseum"}
- {"index":{}}
- {"@timestamp": "2023-01-04T09:00:00Z", "route": 5, "location": "POINT(4.401384 51.220292)", "city": "Antwerp", "category": "Attraction", "name": "Cathedral of Our Lady"}
- {"index":{}}
- {"@timestamp": "2023-01-04T12:00:00Z", "route": 6, "location": "POINT(4.405819 51.221758)", "city": "Antwerp", "category": "Museum", "name": "Snijders&Rockoxhuis"}
- {"index":{}}
- {"@timestamp": "2023-01-04T15:00:00Z", "route": 7, "location": "POINT(4.405200 51.222900)", "city": "Antwerp", "category": "Museum", "name": "Letterenhuis"}
- {"index":{}}
- {"@timestamp": "2023-01-05T10:00:00Z", "route": 8, "location": "POINT(2.336389 48.861111)", "city": "Paris", "category": "Museum", "name": "Musée du Louvre"}
- {"index":{}}
- {"@timestamp": "2023-01-05T14:00:00Z", "route": 9, "location": "POINT(2.327000 48.860000)", "city": "Paris", "category": "Museum", "name": "Musée d'Orsay"}
- ----
- [[search-aggregations-metrics-geo-line-grouping-terms]]
- ==== Grouping with terms
- Using the above data, for a non-time-series use case, the grouping can be done using a
- <<search-aggregations-bucket-terms-aggregation,terms aggregation>> based on city name.
- This would work whether or not we had defined the `tour` index as a time series index.
- [source,console,id=search-aggregations-metrics-geo-line-terms]
- ----
- POST /tour/_search?filter_path=aggregations
- {
- "aggregations": {
- "path": {
- "terms": {"field": "city"},
- "aggregations": {
- "museum_tour": {
- "geo_line": {
- "point": {"field": "location"},
- "sort": {"field": "@timestamp"}
- }
- }
- }
- }
- }
- }
- ----
- // TEST[continued]
- Which returns:
- [source,js]
- ----
- {
- "aggregations": {
- "path": {
- "doc_count_error_upper_bound": 0,
- "sum_other_doc_count": 0,
- "buckets": [
- {
- "key": "Amsterdam",
- "doc_count": 5,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- },
- {
- "key": "Antwerp",
- "doc_count": 3,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- },
- {
- "key": "Paris",
- "doc_count": 2,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- }
- ]
- }
- }
- }
- ----
- // TESTRESPONSE
- The above results contain an array of buckets, where each bucket is a JSON object with the `key` showing the value
- of the `city` field, and an inner aggregation result called `museum_tour` containing a
- https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
- actual route between the various attractions in that city.
- Each result also includes a `properties` object with a `complete` value which will be `false` if the geometry
- was truncated to the limits specified in the `size` parameter.
- Note that when we use `time_series` in the example below, we will get the same results structured a little differently.
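As a rough mental model of what the nested `terms` and `geo_line` aggregations compute, the grouping can be sketched in plain Python. This is an illustration only, not how Elasticsearch executes the request. The helper name `group_lines` is hypothetical, locations are given as `(lon, lat)` tuples for brevity, and buckets are ordered by descending document count and then by key, which is our understanding of the default `terms` ordering.

```python
from collections import defaultdict


def group_lines(docs, group_field, point_field, sort_field):
    """Group documents by a keyword field, then order each group by the sort field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)

    buckets = []
    # Largest groups first, ties broken by key (assumed default terms ordering).
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        members = sorted(groups[key], key=lambda d: d[sort_field])
        buckets.append({
            "key": key,
            "doc_count": len(members),
            "coordinates": [list(d[point_field]) for d in members],
        })
    return buckets


# A deliberately unordered subset of the tour data, with (lon, lat) tuples.
docs = [
    {"city": "Paris", "@timestamp": "2023-01-05T14:00:00Z", "location": (2.327, 48.86)},
    {"city": "Amsterdam", "@timestamp": "2023-01-02T10:00:00Z", "location": (4.885057, 52.370159)},
    {"city": "Paris", "@timestamp": "2023-01-05T10:00:00Z", "location": (2.336389, 48.861111)},
    {"city": "Amsterdam", "@timestamp": "2023-01-02T09:00:00Z", "location": (4.889187, 52.373184)},
    {"city": "Amsterdam", "@timestamp": "2023-01-02T13:00:00Z", "location": (4.901618, 52.369219)},
]
buckets = group_lines(docs, "city", "location", "@timestamp")
```

Each bucket ends up with its own independently sorted track, which is the property that makes grouping useful for data such as flight tracks.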
- [[search-aggregations-metrics-geo-line-grouping-time-series]]
- ==== Grouping with time-series
- Using the same data as before, we can also perform the grouping with a
- <<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
- This will group by TSID, which is defined as the combination of the values of all fields with `time_series_dimension: true`,
- in this case the same `city` field used in the previous
- <<search-aggregations-bucket-terms-aggregation,terms aggregation>>.
- This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.
- [source,console,id=search-aggregations-metrics-geo-line-time-series]
- ----
- POST /tour/_search?filter_path=aggregations
- {
- "aggregations": {
- "path": {
- "time_series": {},
- "aggregations": {
- "museum_tour": {
- "geo_line": {
- "point": {"field": "location"}
- }
- }
- }
- }
- }
- }
- ----
- // TEST[continued]
- NOTE: The `geo_line` aggregation no longer requires the `sort` field when nested within a
- <<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
- This is because the sort field is set to `@timestamp`, which all time-series indexes are pre-sorted by.
- If you do set this parameter, and set it to something other than `@timestamp`, you will get an error.
- The above query will result in:
- [source,js]
- ----
- {
- "aggregations": {
- "path": {
- "buckets": {
- "{city=Paris}": {
- "key": {
- "city": "Paris"
- },
- "doc_count": 2,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- },
- "{city=Antwerp}": {
- "key": {
- "city": "Antwerp"
- },
- "doc_count": 3,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- },
- "{city=Amsterdam}": {
- "key": {
- "city": "Amsterdam"
- },
- "doc_count": 5,
- "museum_tour": {
- "type": "Feature",
- "geometry": {
- "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
- "type": "LineString"
- },
- "properties": {
- "complete": true
- }
- }
- }
- }
- }
- }
- }
- ----
- // TESTRESPONSE
- The above results are essentially the same as with the previous `terms` aggregation example, but structured differently.
- Here we see the buckets returned as a map, where the key is an internal description of the TSID.
- This TSID is unique for each unique combination of fields with `time_series_dimension: true`.
- Each bucket contains a `key` field which is also a map of all dimension values for the TSID. In this case only the city
- name is used for grouping.
- In addition, there is an inner aggregation result called `museum_tour` containing a
- https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
- actual route between the various attractions in that city.
- Each result also includes a `properties` object with a `complete` value which will be `false` if the geometry
- was simplified to the limits specified in the `size` parameter.
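Because the two groupings return differently shaped `buckets`, client code that consumes both may want to normalize them. The following Python sketch assumes the response shapes shown in the examples above, with the `path` and `museum_tour` aggregation names and the `city` dimension. The helper name `lines_by_city` is hypothetical.

```python
def lines_by_city(response, agg_name="path", line_name="museum_tour"):
    """Return {city: coordinates} from either a terms or a time_series response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    if isinstance(buckets, dict):
        # time_series: a map keyed by a TSID description; each bucket's
        # "key" is itself a map of dimension values.
        items = [(b["key"]["city"], b) for b in buckets.values()]
    else:
        # terms: an array of buckets; each bucket's "key" is the city name.
        items = [(b["key"], b) for b in buckets]
    return {city: b[line_name]["geometry"]["coordinates"] for city, b in items}


paris = {
    "type": "Feature",
    "geometry": {"type": "LineString",
                 "coordinates": [[2.336389, 48.861111], [2.327, 48.86]]},
    "properties": {"complete": True},
}
terms_response = {"aggregations": {"path": {"buckets": [
    {"key": "Paris", "doc_count": 2, "museum_tour": paris},
]}}}
time_series_response = {"aggregations": {"path": {"buckets": {
    "{city=Paris}": {"key": {"city": "Paris"}, "doc_count": 2, "museum_tour": paris},
}}}}

from_terms = lines_by_city(terms_response)
from_time_series = lines_by_city(time_series_response)
```

Both calls yield the same mapping from city name to coordinates, which reflects the point that the results are equivalent and only structured differently.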
- [[search-aggregations-metrics-geo-line-grouping-time-series-advantages]]
- ==== Why group with time-series?
- When reviewing the above examples, you might think that there is little difference between using
- <<search-aggregations-bucket-terms-aggregation,`terms`>> or
- <<search-aggregations-bucket-time-series-aggregation,`time_series`>>
- to group the geo-lines. However, there are some important differences in behaviour between the two cases.
- Time series indexes are stored in a very specific order on disk.
- They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field.
- This allows the `geo_line` aggregation to be considerably optimized:
- * The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets.
- This is substantially less memory than required for non-time-series cases where all buckets are collected
- concurrently.
- * No sorting needs to be done, since the data is pre-sorted by `@timestamp`.
- The time-series data will naturally arrive at the aggregation collector in `DESC` order.
- This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order,
- but perform an efficient in-memory reversal before generating the final `LineString` geometry.
- * The `size` parameter can be used for a streaming line-simplification algorithm.
- Without time-series, we are forced to truncate data, by default after 10000 documents per bucket, in order to
- prevent memory usage from being unbounded.
- This can result in geo-lines being truncated, and therefore losing important data.
- With time-series we can run a streaming line-simplification algorithm, retaining control over memory usage,
- while also maintaining the overall geometry shape.
- In fact, for most use cases it would work to set this `size` parameter to a much lower bound, and save even more
- memory. For example, if the `geo_line` is to be drawn on a display map with a specific resolution, it might look
- just as good to simplify to as few as 100 or 200 points. This will save memory on the server, on the network and
- in the client.
- NOTE: There are other significant advantages to working with time-series data and using `time_series` index mode.
- These are discussed in the documentation on <<tsds,time series data streams>>.
- [[search-aggregations-metrics-geo-line-simplification]]
- ==== Streaming line simplification
- Line simplification is a great way to reduce the size of the final results sent to the client, and displayed in a map
- user interface. However, normally these algorithms use a lot of memory to perform the simplification, requiring the
- entire geometry to be maintained in memory together with supporting data for the simplification itself.
- The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification
- process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting
- is required, which is the case when grouping is done by the
- <<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>,
- running on an index with the `time_series` index mode.
- Under these conditions the `geo_line` aggregation allocates memory to the `size` specified, and then fills that
- memory with the incoming documents.
- Once the memory is completely filled, documents from within the line are removed as new documents are added.
- The choice of document to remove is made to minimize the visual impact on the geometry.
- This process makes use of the
- https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm[Visvalingam–Whyatt algorithm].
- Essentially this means points are removed if they have the minimum triangle area, with the triangle defined
- by the point under consideration and the two points before and after it in the line.
- In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.
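The idea can be illustrated with a small Python sketch of a Visvalingam–Whyatt-style streaming simplifier. This is a simplified illustration under stated assumptions, not the Elasticsearch implementation: the class name `StreamingSimplifier` is hypothetical, triangle areas are computed in planar coordinates rather than the spherical coordinates described above, the minimum is found by a linear scan, and the sketch requires `size` to be at least 2 so that an interior point always exists to remove.

```python
def triangle_area(a, b, c):
    """Planar area of the triangle a-b-c (the aggregation uses spherical areas)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


class StreamingSimplifier:
    """Keeps at most `size` points, dropping the least significant interior point."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("this sketch needs size >= 2")
        self.size = size
        self.points = []
        self.complete = True

    def add(self, point):
        self.points.append(point)
        if len(self.points) > self.size:
            # The buffer overflowed: remove the interior point whose triangle
            # with its two neighbours has the smallest area. Endpoints stay.
            self.complete = False
            idx = min(
                range(1, len(self.points) - 1),
                key=lambda i: triangle_area(
                    self.points[i - 1], self.points[i], self.points[i + 1]
                ),
            )
            del self.points[idx]


# A flat line with a single spike at x=2, simplified to 4 points.
line = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0), (5, 0)]
simplifier = StreamingSimplifier(size=4)
for p in line:
    simplifier.add(p)

simplified = simplifier.points
truncated = line[:4]
```

In this example the simplified line keeps both endpoints and the spike, whereas truncating to the first four points loses the end of the line entirely. Memory never exceeds `size + 1` points, which is the property that makes the streaming approach attractive.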
- In order to demonstrate how much better line simplification is than line truncation, consider this example of the north
- shore of Kodiak Island.
- The data for this is only 209 points, but if we want to set `size` to `100` we get dramatic truncation.
- image:images/spatial/kodiak_geo_line_truncated.png[North shore of Kodiak Island truncated to 100 points]
- The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a very different
- geometry than the original.
- Now consider the same geometry simplified to 100 points.
- image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]
- For comparison we have shown the original in grey, the truncated in blue and the new simplified geometry
- in magenta. It is possible to see where the new simplified line deviates from the original, but the overall
- geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.