[[geoip-processor]]
=== GeoIP processor
++++
<titleabbrev>GeoIP</titleabbrev>
++++

The `geoip` processor adds information about the geographical location of an
IPv4 or IPv6 address.

[[geoip-automatic-updates]]
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
ASN GeoIP2 databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to the `storage.googleapis.com` domain and either:

* `ingest.geoip.downloader.eager.download` is set to `true`
* your cluster has at least one pipeline with a `geoip` processor

{es} automatically downloads updates for these databases from the Elastic GeoIP endpoint:
https://geoip.elastic.co/v1/database. To get download statistics for these
updates, use the <<geoip-stats-api,GeoIP stats API>>.

If your cluster can't connect to the Elastic GeoIP endpoint or you want to
manage your own updates, see <<manage-geoip-database-updates>>.

If {es} can't connect to the endpoint for 30 days, all updated databases become
invalid. {es} then stops enriching documents with geoip data and adds a
`tags: ["_geoip_expired_database"]` field instead.

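
As a rough sketch of the setting and API mentioned above, the following requests
turn on eager downloads through the cluster update settings API and then fetch
the download statistics. Using `persistent` rather than `transient` is only an
assumption made for this illustration.

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.eager.download": true
  }
}

GET _ingest/geoip/stats
--------------------------------------------------
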
[[using-ingest-geoip]]
==== Using the `geoip` Processor in a Pipeline

[[ingest-geoip-options]]
.`geoip` options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to get the IP address from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document.
| `first_only` | no | `true` | If `true`, only the first found geoip data is returned, even if `field` contains an array.
| `download_database_on_pipeline_creation` | no | `true` | If `true` (and if `ingest.geoip.downloader.eager.download` is `false`), the missing database is downloaded when the pipeline is created. Otherwise, the download is triggered when the pipeline is used as the `default_pipeline` or `final_pipeline` in an index.
|======

*Depends on what is available in `database_file`:

* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.

Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------

Which returns:

[source,console-result]
--------------------------------------------------
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 55,
  "_primary_term": 1,
  "_source": {
    "ip": "89.160.20.128",
    "geoip": {
      "continent_name": "Europe",
      "country_name": "Sweden",
      "country_iso_code": "SE",
      "city_name" : "Linköping",
      "region_iso_code" : "SE-E",
      "region_name" : "Östergötland County",
      "location": { "lat": 58.4167, "lon": 15.6167 }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]

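
The `properties` option can narrow what is written to the `target_field`. The
following is a minimal sketch, not part of the original examples: it uses the
simulate pipeline API so that no pipeline is stored, and the three properties
chosen here are an arbitrary selection from the defaults listed in the options
table.

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline" : {
    "description" : "Add only selected geoip properties",
    "processors" : [
      {
        "geoip" : {
          "field" : "ip",
          "properties" : ["country_iso_code", "city_name", "location"]
        }
      }
    ]
  },
  "docs" : [
    { "_source" : { "ip" : "89.160.20.128" } }
  ]
}
--------------------------------------------------

Properties that the selected database cannot supply are not added, as
described in the notes under the options table.
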
Here is an example that uses the default country database and adds the
geographical information to the `geo` field based on the `ip` field. Note that
this database is included in the module. So this:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "target_field" : "geo",
        "database_file" : "GeoLite2-Country.mmdb"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------

returns this:

[source,console-result]
--------------------------------------------------
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 65,
  "_primary_term": 1,
  "_source": {
    "ip": "89.160.20.128",
    "geo": {
      "continent_name": "Europe",
      "country_name": "Sweden",
      "country_iso_code": "SE"
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

Not every IP address has geo information in the database. When no information
is found, no `target_field` is inserted into the document.

Here is an example of how a document is indexed when information for "80.231.5.0"
cannot be found:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "80.231.5.0"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------

Which returns:

[source,console-result]
--------------------------------------------------
{
  "_index" : "my-index-000001",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 71,
  "_primary_term": 1,
  "found" : true,
  "_source" : {
    "ip" : "80.231.5.0"
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

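
An address with no database entry is a different case from a document that has
no `field` at all, or one whose `field` holds several addresses. The sketch
below, which is an illustration rather than one of the original examples, uses
the simulate pipeline API to try `ignore_missing` and `first_only` on two
made-up documents: one with an array of addresses and one without an `ip`
field.

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline" : {
    "processors" : [
      {
        "geoip" : {
          "field" : "ip",
          "ignore_missing" : true,
          "first_only" : false
        }
      }
    ]
  },
  "docs" : [
    { "_source" : { "ip" : ["89.160.20.128", "80.231.5.0"] } },
    { "_source" : { "message" : "no ip field here" } }
  ]
}
--------------------------------------------------

With `ignore_missing` set to `true`, the second document should pass through
unmodified instead of causing a failure.
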
[[ingest-geoip-mappings-note]]
===== Recognizing Location as a Geopoint

Although this processor enriches your document with a `location` field containing
the estimated latitude and longitude of the IP address, this field will not be
indexed as a {ref}/geo-point.html[`geo_point`] type in Elasticsearch without explicitly defining it
as such in the mapping.

You can use the following mapping for the example index above:

[source,console]
--------------------------------------------------
PUT my_ip_locations
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
--------------------------------------------------

////
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}

PUT my_ip_locations/_doc/1?refresh=true&pipeline=geoip
{
  "ip": "89.160.20.128"
}

GET /my_ip_locations/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "1m",
          "geoip.location": {
            "lon": 15.6167,
            "lat": 58.4167
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

[source,console-result]
--------------------------------------------------
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_ip_locations",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "geoip" : {
            "continent_name" : "Europe",
            "country_name" : "Sweden",
            "country_iso_code" : "SE",
            "city_name" : "Linköping",
            "region_iso_code" : "SE-E",
            "region_name" : "Östergötland County",
            "location" : {
              "lon" : 15.6167,
              "lat" : 58.4167
            }
          },
          "ip" : "89.160.20.128"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 3/"took" : $body.took/]
////

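
Once `geoip.location` is mapped as a `geo_point`, geo queries can target it.
The following is a minimal sketch, assuming a document has been indexed into
`my_ip_locations` through a `geoip` pipeline. The 100km radius and the center
coordinates are illustrative values only.

[source,console]
--------------------------------------------------
GET my_ip_locations/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "100km",
          "geoip.location": { "lat": 58.4167, "lon": 15.6167 }
        }
      }
    }
  }
}
--------------------------------------------------
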
[[manage-geoip-database-updates]]
==== Manage your own GeoIP2 database updates

If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
databases from the Elastic endpoint, you have a few other options:

* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>

[[use-proxy-geoip-endpoint]]
**Use a proxy endpoint**

If you can't connect directly to the Elastic GeoIP endpoint, consider setting up
a secure proxy. You can then specify the proxy endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each node’s `elasticsearch.yml` file.

In a strict setup, you may need to add the following domains to the list of
allowed domains:

* `geoip.elastic.co`
* `storage.googleapis.com`

[[use-custom-geoip-endpoint]]
**Use a custom endpoint**

You can create a service that mimics the Elastic GeoIP endpoint. You can then
get automatic updates from this service.

. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].

. Copy your database files to a single directory.

. From your {es} directory, run:
+
[source,sh]
----
./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]
----

. Serve the static database files from your directory. For example, you can use
Docker to serve the files from an nginx server:
+
[source,sh]
----
docker run -v my/source/dir:/usr/share/nginx/html:ro nginx
----

. Specify the service's endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each node’s `elasticsearch.yml` file.
+
By default, {es} checks the endpoint for updates every three days. To use
another polling interval, use the <<cluster-update-settings,cluster update
settings API>> to set
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.

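
For instance, the polling interval from the last step could be changed with a
request along these lines. The `7d` value and the use of `persistent` are
assumptions made for illustration.

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.poll.interval": "7d"
  }
}
--------------------------------------------------
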
[[manually-update-geoip-databases]]
**Manually update your GeoIP2 databases**

. Use the <<cluster-update-settings,cluster update settings API>> to set
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
that may overwrite your database changes. This also deletes all downloaded
databases.

. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
+
You can also use custom city, country, and ASN `.mmdb` files. These files must
be uncompressed and use the respective `-City.mmdb`, `-Country.mmdb`, or
`-ASN.mmdb` extensions.

. On {ess} deployments, upload the database using
a {cloud}/ec-custom-bundles.html[custom bundle].

. On self-managed deployments, copy the database files to `$ES_CONFIG/ingest-geoip`.

. In your `geoip` processors, configure the `database_file` parameter to use a
custom database file.

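
The first and last steps above might look like the following sketch. The
pipeline name `geoip-custom` and the file name `MyCompany-City.mmdb` are
hypothetical placeholders; substitute the name of the file you copied to the
`ingest-geoip` config directory.

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": false
  }
}

PUT _ingest/pipeline/geoip-custom
{
  "description" : "Add geoip info from a custom database",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "database_file" : "MyCompany-City.mmdb"
      }
    }
  ]
}
--------------------------------------------------
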
[[ingest-geoip-settings]]
===== Node Settings

The `geoip` processor supports the following setting:

`ingest.geoip.cache_size`::

    The maximum number of results that should be cached. Defaults to `1000`.

Note that this is a node setting and applies to all `geoip` processors. That is, there is one cache for all defined `geoip` processors.

[[geoip-cluster-settings]]
===== Cluster settings

[[ingest-geoip-downloader-enabled]]
`ingest.geoip.downloader.enabled`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
updates and deletes all downloaded databases. Defaults to `true`.

[[ingest-geoip-downloader-eager-download]]
`ingest.geoip.downloader.eager.download`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} downloads GeoIP2 databases immediately, regardless of whether a
pipeline exists with a geoip processor. If `false`, {es} only begins downloading
the databases if a pipeline with a geoip processor exists or is added. Defaults
to `false`.

[[ingest-geoip-downloader-endpoint]]
`ingest.geoip.downloader.endpoint`::
(<<static-cluster-setting,Static>>, string)
Endpoint URL used to download updates for GeoIP2 databases. For example, `https://myDomain.com/overview.json`.
Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
each node's <<es-tmpdir,temporary directory>> at `$ES_TMPDIR/geoip-databases/<node_id>`.
Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`,
expecting the list of metadata about databases typically found in `overview.json`.

[[ingest-geoip-downloader-poll-interval]]
`ingest.geoip.downloader.poll.interval`::
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
How often {es} checks for GeoIP2 database updates at the
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
to `3d` (three days).