[[getting-started]]
= Getting started with {es}

[partintro]
--
Ready to take {es} for a test drive and see for yourself how you can use the
REST APIs to store, search, and analyze data?

Follow this getting started tutorial to:

. Get an {es} cluster up and running
. Index some sample documents
. Search for documents using the {es} query language
. Analyze the results using bucket and metrics aggregations

Need more context?
Check out the <<elasticsearch-intro,
Elasticsearch Introduction>> to learn the lingo and understand the basics of
how {es} works. If you're already familiar with {es} and want to see how it works
with the rest of the stack, you might want to jump to the
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
{beats}, and {ls}.

TIP: The fastest way to get started with {es} is to
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
trial of Elasticsearch Service] in the cloud.

--
[[getting-started-install]]
== Get {es} up and running

To take {es} for a test drive, you can create a one-click cloud deployment
on the https://www.elastic.co/cloud/elasticsearch-service/signup[Elasticsearch Service],
or <<run-elasticsearch-local, set up a multi-node {es} cluster>> on your own
Linux, macOS, or Windows machine.

[float]
[[run-elasticsearch-local]]
=== Run {es} locally on Linux, macOS, or Windows

When you create a cluster on the Elasticsearch Service, you automatically
get a three-node cluster. By installing from the tar or zip archive, you can
start multiple instances of {es} locally to see how a multi-node cluster behaves.

To run a three-node {es} cluster locally:

. Download the {es} archive for your OS:
+
Linux: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz[elasticsearch-{version}-linux-x86_64.tar.gz]
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz
--------------------------------------------------
// NOTCONSOLE
+
macOS: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-darwin-x86_64.tar.gz[elasticsearch-{version}-darwin-x86_64.tar.gz]
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-darwin-x86_64.tar.gz
--------------------------------------------------
// NOTCONSOLE
+
Windows:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-windows-x86_64.zip[elasticsearch-{version}-windows-x86_64.zip]

. Extract the archive:
+
Linux:
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
tar -xvf elasticsearch-{version}-linux-x86_64.tar.gz
--------------------------------------------------
+
macOS:
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
tar -xvf elasticsearch-{version}-darwin-x86_64.tar.gz
--------------------------------------------------
+
Windows PowerShell:
+
["source","powershell",subs="attributes,callouts"]
--------------------------------------------------
Expand-Archive elasticsearch-{version}-windows-x86_64.zip
--------------------------------------------------
. Start {es} from the `bin` directory:
+
Linux and macOS:
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
cd elasticsearch-{version}/bin
./elasticsearch
--------------------------------------------------
+
Windows:
+
["source","powershell",subs="attributes,callouts"]
--------------------------------------------------
cd elasticsearch-{version}\bin
.\elasticsearch.bat
--------------------------------------------------
+
You now have a single-node {es} cluster up and running!

. Start two more instances of {es} so you can see how a typical multi-node
cluster behaves. You need to specify unique data and log paths
for each node.
+
Linux and macOS:
+
["source","sh",subs="attributes,callouts"]
--------------------------------------------------
./elasticsearch -Epath.data=data2 -Epath.logs=log2
./elasticsearch -Epath.data=data3 -Epath.logs=log3
--------------------------------------------------
+
Windows:
+
["source","powershell",subs="attributes,callouts"]
--------------------------------------------------
.\elasticsearch.bat -E path.data=data2 -E path.logs=log2
.\elasticsearch.bat -E path.data=data3 -E path.logs=log3
--------------------------------------------------
+
The additional nodes are assigned unique IDs. Because you're running all three
nodes locally, they automatically join the cluster with the first node.
. Use the cat health API to verify that your three-node cluster is up and running.
The cat APIs return information about your cluster and indices in a
format that's easier to read than raw JSON.
+
You can interact directly with your cluster by submitting HTTP requests to
the {es} REST API. Most of the examples in this guide enable you to copy the
appropriate cURL command and submit the request to your local {es} instance from
the command line. If you have Kibana installed and running, you can also
open Kibana and submit requests through the Dev Console.
+
TIP: You'll want to check out the
https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} language
clients] when you're ready to start using {es} in your own applications.
+
[source,js]
--------------------------------------------------
GET /_cat/health?v
--------------------------------------------------
// CONSOLE
+
The response should indicate that the status of the `elasticsearch` cluster
is `green` and it has three nodes:
+
[source,txt]
--------------------------------------------------
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1565052807 00:53:27 elasticsearch green 3 3 6 3 0 0 0 0 - 100.0%
--------------------------------------------------
// TESTRESPONSE[s/1565052807 00:53:27 elasticsearch/\\d+ \\d+:\\d+:\\d+ integTest/]
// TESTRESPONSE[s/3 3 6 3/\\d+ \\d+ \\d+ \\d+/]
// TESTRESPONSE[s/0 0 -/0 \\d+ -/]
// TESTRESPONSE[non_json]
+
NOTE: The cluster status will remain yellow if you are only running a single
instance of {es}. A single node cluster is fully functional, but data
cannot be replicated to another node to provide resiliency. Replica shards must
be available for the cluster status to be green. If the cluster status is red,
some data is unavailable.
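The cat health response is plain whitespace-separated text, so it is easy to consume from scripts. As a rough illustration (not part of any official client; the column list simply mirrors the `?v` header above), one could map a response row to named fields like this:

```python
# Illustrative only: split one line of `_cat/health` output into named fields.
# The column names mirror the `?v` header row shown above.
COLUMNS = [
    "epoch", "timestamp", "cluster", "status", "node.total", "node.data",
    "shards", "pri", "relo", "init", "unassign", "pending_tasks",
    "max_task_wait_time", "active_shards_percent",
]

def parse_cat_health(line):
    """Map a whitespace-separated cat health row to a dict."""
    return dict(zip(COLUMNS, line.split()))

row = parse_cat_health(
    "1565052807 00:53:27 elasticsearch green 3 3 6 3 0 0 0 0 - 100.0%"
)
print(row["status"], row["node.total"])  # green 3
```

A green status with `node.total` of 3 confirms the three-node cluster is healthy.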
[float]
[[gs-other-install]]
=== Other installation options

Installing {es} from an archive file enables you to easily install and run
multiple instances locally so you can try things out. To run a single instance,
you can run {es} in a Docker container, install {es} using the DEB or RPM
packages on Linux, install using Homebrew on macOS, or install using the MSI
package installer on Windows. See <<install-elasticsearch>> for more information.
[[getting-started-index]]
== Index some documents

Once you have a cluster up and running, you're ready to index some data.
There are a variety of ingest options for {es}, but in the end they all
do the same thing: put JSON documents into an {es} index.

You can do this directly with a simple PUT request that specifies
the index you want to add the document to, a unique document ID, and one or more
`"field": "value"` pairs in the request body:

[source,js]
--------------------------------------------------
PUT /customer/_doc/1
{
  "name": "John Doe"
}
--------------------------------------------------
// CONSOLE

This request automatically creates the `customer` index if it doesn't already
exist, adds a new document that has an ID of `1`, and stores and
indexes the `name` field.
Since this is a new document, the response shows that the result of the
operation was that version 1 of the document was created:

[source,js]
--------------------------------------------------
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 26,
  "_primary_term" : 4
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/]
// TESTRESPONSE[s/"successful" : \d+/"successful" : $body._shards.successful/]
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]
The new document is available immediately from any node in the cluster.
You can retrieve it with a GET request that specifies its document ID:

[source,js]
--------------------------------------------------
GET /customer/_doc/1
--------------------------------------------------
// CONSOLE
// TEST[continued]

The response indicates that a document with the specified ID was found
and shows the original source fields that were indexed:

[source,js]
--------------------------------------------------
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ ]
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]
[float]
[[getting-started-batch-processing]]
=== Indexing documents in bulk

If you have a lot of documents to index, you can submit them in batches with
the {ref}/docs-bulk.html[bulk API]. Using bulk to batch document
operations is significantly faster than submitting requests individually as it minimizes network roundtrips.

The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents
and a total payload between 5MB and 15MB. From there, you can experiment
to find the sweet spot.
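Splitting a document stream into fixed-size batches is a small amount of client-side code. A minimal sketch (illustrative only; the batch size of 1,000 is just the starting point suggested above):

```python
# Illustrative sketch: split a document stream into batches for bulk indexing.
# 1,000 is the suggested starting batch size from the text above.
def batched(docs, batch_size=1000):
    """Yield successive lists of at most `batch_size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # don't drop a final partial batch
        yield batch

sizes = [len(b) for b in batched(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```

From there you can tune `batch_size` (and the resulting payload bytes) toward the 5MB-15MB sweet spot.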
To get some data into {es} that you can start searching and analyzing:

. Download the https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[`accounts.json`] sample data set. The documents in this randomly generated data set represent user accounts with the following information:
+
[source,js]
--------------------------------------------------
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}
--------------------------------------------------
// NOTCONSOLE

. Index the account data into the `bank` index with the following `_bulk` request:
+
[source,sh]
--------------------------------------------------
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
--------------------------------------------------
// NOTCONSOLE
+
////
This replicates the above in a document-testing friendly way but isn't visible
in the docs:
+
[source,js]
--------------------------------------------------
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE
// TEST[setup:bank]
////
+
The response indicates that 1,000 documents were indexed successfully.
+
[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
--------------------------------------------------
// TESTRESPONSE[s/128.6kb/\\d+(\\.\\d+)?[mk]?b/]
// TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ non_json]
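The payload in `accounts.json` is newline-delimited JSON: an action line followed by the document source, terminated by a trailing newline. As a hedged sketch of how such a payload could be assembled (illustrative only, not the official client's bulk helper):

```python
import json

# Illustrative sketch: build a newline-delimited `_bulk` payload
# (action metadata line, then document source line, trailing newline).
def build_bulk_payload(index, docs):
    """docs is an iterable of (doc_id, source_dict) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline

payload = build_bulk_payload("bank", [(0, {"account_number": 0, "balance": 16623})])
print(payload)
```

The resulting string is what `--data-binary "@accounts.json"` sends in the curl command above.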
[[getting-started-search]]
== Start searching

Now let's start with some simple searches. There are two basic ways to run searches: one is by sending search parameters through the {ref}/search-uri-request.html[REST request URI] and the other by sending them through the {ref}/search-request-body.html[REST request body]. The request body method allows you to be more expressive and also to define your searches in a more readable JSON format. We'll try one example of the request URI method, but for the remainder of this tutorial, we will exclusively use the request body method.

The REST API for search is accessible from the `_search` endpoint. This example returns all documents in the `bank` index:

[source,js]
--------------------------------------------------
GET /bank/_search?q=*&sort=account_number:asc&pretty
--------------------------------------------------
// CONSOLE
// TEST[continued]

Let's first dissect the search call. We are searching (`_search` endpoint) in the `bank` index, and the `q=*` parameter instructs Elasticsearch to match all documents in the index. The `sort=account_number:asc` parameter indicates to sort the results using the `account_number` field of each document in ascending order. The `pretty` parameter, again, just tells Elasticsearch to return pretty-printed JSON results.
And the response (partially shown):

[source,js]
--------------------------------------------------
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value": 1000,
      "relation": "eq"
    },
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
As for the response, we see the following parts:

* `took` – time in milliseconds for Elasticsearch to execute the search
* `timed_out` – tells us if the search timed out or not
* `_shards` – tells us how many shards were searched, as well as a count of the successful/failed searched shards
* `hits` – search results
* `hits.total` – an object that contains information about the total number of documents matching our search criteria
** `hits.total.value` - the value of the total hit count (must be interpreted in the context of `hits.total.relation`).
** `hits.total.relation` - whether `hits.total.value` is the exact hit count, in which case it is equal to `"eq"`, or a
lower bound of the total hit count (greater than or equal to), in which case it is equal to `"gte"`.
* `hits.hits` – actual array of search results (defaults to first 10 documents)
* `hits.sort` - sort value of the sort key for each result (missing if sorting by score)
* `hits._score` and `max_score` - ignore these fields for now

The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`. When set to `true`,
the request tracks the total hits accurately (`"relation": "eq"`). It defaults to `10,000`,
which means that the total hit count is accurately tracked up to `10,000` documents.
You can force an accurate count by setting `track_total_hits` to `true` explicitly.
See the <<request-body-search-track-total-hits, request body>> documentation
for more details.
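Because `hits.total.value` is only meaningful together with `hits.total.relation`, client code should check both. A minimal illustrative helper (not from any official client) might read:

```python
# Illustrative sketch: render `hits.total` from a search response,
# honoring the eq/gte `relation` semantics described above.
def describe_total(total):
    value, relation = total["value"], total["relation"]
    if relation == "eq":
        return f"exactly {value} hits"
    if relation == "gte":
        return f"at least {value} hits (lower bound)"
    raise ValueError(f"unknown relation: {relation!r}")

print(describe_total({"value": 1000, "relation": "eq"}))    # exactly 1000 hits
print(describe_total({"value": 10000, "relation": "gte"}))  # at least 10000 hits (lower bound)
```

The `gte` branch is exactly the case where the default `track_total_hits` threshold of 10,000 was exceeded.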
Here is the same exact search above using the alternative request body method:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The difference here is that instead of passing `q=*` in the URI, we provide a JSON-style query request body to the `_search` API. We'll discuss this JSON query in the next section.
////
Hidden response just so we can assert that it is indeed the same but don't have
to clutter the docs with it:

[source,js]
--------------------------------------------------
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value": 1000,
      "relation": "eq"
    },
    "max_score": null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score": null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score": null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
////

It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any kind of server-side resources or open cursors into your results. This is in stark contrast to many other platforms such as SQL, wherein you may initially get a partial subset of your query results up-front and then have to continuously go back to the server if you want to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.
[float]
[[getting-started-query-lang]]
=== Introducing the Query Language

Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the {ref}/query-dsl.html[Query DSL]. The query language is quite comprehensive and can be intimidating at first glance, but the best way to actually learn it is to start with a few basic examples.

Going back to our last example, we executed this query:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Dissecting the above, the `query` part tells us what our query definition is and the `match_all` part is simply the type of query that we want to run. The `match_all` query is a search for all documents in the specified index.
In addition to the `query` parameter, we also can pass other parameters to
influence the search results. In the example in the section above we passed in
`sort`; here we pass in `size`:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} },
  "size": 1
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Note that if `size` is not specified, it defaults to 10.

This example does a `match_all` and returns documents 10 through 19:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The `from` parameter (0-based) specifies which document index to start from and the `size` parameter specifies how many documents to return starting from the `from` parameter. This feature is useful when implementing paging of search results. Note that if `from` is not specified, it defaults to 0.
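The page arithmetic is simple enough to sketch. As an illustration (a hypothetical helper, not part of any client library), converting a 1-based page number into the zero-based `from`/`size` parameters looks like this:

```python
# Illustrative sketch: compute zero-based `from`/`size` request parameters
# for a 1-based page number, matching the paging semantics described above.
def page_params(page, page_size=10):
    """Return the from/size body parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"from": (page - 1) * page_size, "size": page_size}

print(page_params(2))  # {'from': 10, 'size': 10} -- documents 10 through 19
```

Page 2 with the default size reproduces the `"from": 10, "size": 10` request shown above.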
This example does a `match_all`, sorts the results by account balance in descending order, and returns the top 10 (default size) documents:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Now that we have seen a few of the basic search parameters, let's dig some more into the Query DSL. Let's first take a look at the returned document fields. By default, the full JSON document is returned as part of all searches. This is referred to as the source (`_source` field in the search hits). If we don't want the entire source document returned, we have the ability to request only a few fields from within source to be returned.

This example shows how to return two fields, `account_number` and `balance` (inside of `_source`), from the search:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Note that the above example simply reduces the `_source` field. It will still only return one field named `_source`, but within it, only the fields `account_number` and `balance` are included.

If you come from a SQL background, the above is somewhat similar in concept to the SQL `SELECT FROM` field list.
Now let's move on to the query part. Previously, we've seen how the `match_all` query is used to match all documents. Let's now introduce a new query called the {ref}/query-dsl-match-query.html[`match` query], which can be thought of as a basic fielded search query (i.e. a search done against a specific field or set of fields).

This example returns the account numbered 20:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

This example returns all accounts containing the term "mill" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

This example returns all accounts containing the term "mill" or "lane" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

This example is a variant of `match` (`match_phrase`) that returns all accounts containing the phrase "mill lane" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Let's now introduce the {ref}/query-dsl-bool-query.html[`bool` query]. The `bool` query allows us to compose smaller queries into bigger queries using boolean logic.

This example composes two `match` queries and returns all accounts containing "mill" and "lane" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

In the above example, the `bool must` clause specifies all the queries that must be true for a document to be considered a match.

In contrast, this example composes two `match` queries and returns all accounts containing "mill" or "lane" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

In the above example, the `bool should` clause specifies a list of queries, any of which must be true for a document to be considered a match.

This example composes two `match` queries and returns all accounts that contain neither "mill" nor "lane" in the address:

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

In the above example, the `bool must_not` clause specifies a list of queries, none of which may be true for a document to be considered a match.

We can combine `must`, `should`, and `must_not` clauses simultaneously inside a `bool` query. Furthermore, we can compose `bool` queries inside any of these `bool` clauses to mimic any complex multi-level boolean logic.
  582. This example returns all accounts of anybody who is 40 years old but doesn't live in ID(aho):
  583. [source,js]
  584. --------------------------------------------------
  585. GET /bank/_search
  586. {
  587. "query": {
  588. "bool": {
  589. "must": [
  590. { "match": { "age": "40" } }
  591. ],
  592. "must_not": [
  593. { "match": { "state": "ID" } }
  594. ]
  595. }
  596. }
  597. }
  598. --------------------------------------------------
  599. // CONSOLE
  600. // TEST[continued]
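To make the combination rules concrete, here is an illustrative Python evaluator (not Elasticsearch code; the simplified `match` and the sample documents are invented for the sketch) for how `must`, `should`, and `must_not` clauses decide whether a single document matches:

```python
def match(doc, field, value):
    # Simplified `match`: term containment in the analyzed field.
    return str(value).lower() in str(doc.get(field, "")).lower().split()

def bool_matches(doc, must=(), should=(), must_not=()):
    # `must`: all clauses must match; `must_not`: none may match.
    # `should`: when there is no `must` clause, at least one
    # should-clause must match (a simplification of the full rules).
    if not all(c(doc) for c in must):
        return False
    if any(c(doc) for c in must_not):
        return False
    if should and not must:
        return any(c(doc) for c in should)
    return True

doc = {"age": "40", "state": "WA"}
hit = bool_matches(
    doc,
    must=[lambda d: match(d, "age", 40)],
    must_not=[lambda d: match(d, "state", "ID")],
)
# A 40-year-old outside Idaho matches; a 40-year-old in ID would not.
```

Because the clause lists can themselves contain `bool_matches` calls, the same shape nests to arbitrary depth, just as `bool` queries nest inside `bool` clauses.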
[float]
[[getting-started-filters]]
=== Executing filters

In the previous section, we skipped over a little detail called the document score (the `_score` field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.

But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution so that it does not compute useless scores.

The {ref}/query-dsl-bool-query.html[`bool` query] that we introduced in the previous section also supports `filter` clauses, which allow us to use a query to restrict the documents that will be matched by other clauses, without changing how scores are computed. As an example, let's introduce the {ref}/query-dsl-range-query.html[`range` query], which allows us to filter documents by a range of values. This is generally used for numeric or date filtering.

This example uses a `bool` query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Dissecting the above, the `bool` query contains a `match_all` query (the query part) and a `range` query (the filter part). We can substitute any other queries into the query and the filter parts. In this case, the `range` query makes perfect sense, since documents falling into the range all match "equally", i.e., no document is more relevant than another.
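Conceptually, the filter part is just an inclusive bounds check that selects documents without affecting their relative ranking. A minimal Python sketch of `range` semantics (illustrative only; the balance values are made up):

```python
def range_filter(docs, field, gte=None, lte=None):
    # Inclusive bounds check, like a `range` query with gte/lte.
    out = []
    for d in docs:
        v = d[field]
        if gte is not None and v < gte:
            continue
        if lte is not None and v > lte:
            continue
        out.append(d)
    return out

accounts = [
    {"balance": 16418},
    {"balance": 20000},
    {"balance": 25571},
    {"balance": 32838},
]
hits = range_filter(accounts, "balance", gte=20000, lte=30000)
# Keeps the 20000 and 25571 balances; both bounds are inclusive.
```

Note that the sketch returns only a yes/no decision per document, which is exactly why a filter context needs no scores.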
In addition to the `match_all`, `match`, `bool`, and `range` queries, there are many other query types available that we won't cover here. Since we already have a basic understanding of how they work, it shouldn't be too difficult to apply this knowledge to learning and experimenting with the other query types.
[[getting-started-aggregations]]
== Analyze results with aggregations

Aggregations provide the ability to group and extract statistics from your data. The easiest way to think about aggregations is by roughly equating them to the SQL GROUP BY clause and the SQL aggregate functions. In Elasticsearch, you can execute searches that return hits and, at the same time, return aggregated results that are separate from the hits, all in one response. This is very powerful and efficient: you can run queries and multiple aggregations and get the results of both (or either) operation in one shot, avoiding network roundtrips, using a concise and simplified API.

To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default):
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
In SQL, the above aggregation is similar in concept to:

[source,sh]
--------------------------------------------------
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
--------------------------------------------------

And the response (partially shown):
[source,js]
--------------------------------------------------
{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : {
      "value": 1000,
      "relation": "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 29/"took": $body.took/]
We can see that there are 27 accounts in `ID` (Idaho), followed by 27 accounts in `TX` (Texas), followed by 25 accounts in `AL` (Alabama), and so forth.

Note that we set `size=0` so that search hits are not shown, because we only want to see the aggregation results in the response.
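Conceptually, a `terms` aggregation is a bucketed count, much like the SQL query above. An illustrative Python equivalent over in-memory documents (sample data is made up; this is not how Elasticsearch computes it internally):

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    # Count documents per distinct field value and keep the top
    # `size` buckets, ordered by doc_count descending (the default).
    counts = Counter(d[field] for d in docs)
    return [{"key": k, "doc_count": n} for k, n in counts.most_common(size)]

accounts = [{"state": "ID"}, {"state": "TX"}, {"state": "ID"}, {"state": "AL"}]
buckets = terms_agg(accounts, "state")
# ID has 2 documents; TX and AL have one each.
```

Each bucket mirrors the `key`/`doc_count` pairs seen in the response above.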
Building on the previous aggregation, this example calculates the average account balance by state (again, only for the top 10 states sorted by count in descending order):

[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Notice how we nested the `average_balance` aggregation inside the `group_by_state` aggregation. This is a common pattern for all aggregations: you can nest aggregations inside aggregations arbitrarily to extract the pivoted summarizations you require from your data.
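The nesting can be pictured as "for each bucket, run the sub-aggregation over that bucket's documents". A minimal Python sketch (illustrative only; the accounts are invented sample data):

```python
from collections import defaultdict

def avg_by_group(docs, group_field, value_field):
    # Outer `terms`-style bucketing, inner `avg` sub-aggregation
    # computed over the documents that fell into each bucket.
    groups = defaultdict(list)
    for d in docs:
        groups[d[group_field]].append(d[value_field])
    return {k: {"average_balance": sum(v) / len(v)} for k, v in groups.items()}

accounts = [
    {"state": "ID", "balance": 20000},
    {"state": "ID", "balance": 30000},
    {"state": "TX", "balance": 10000},
]
result = avg_by_group(accounts, "state", "balance")
# ID averages 25000.0 and TX averages 10000.0.
```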
Building on the previous aggregation, let's now sort on the average balance in descending order:
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
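Ordering buckets by a sub-aggregation value amounts to computing each bucket's metric first and then sorting the buckets on it. A brief illustrative sketch (sample data invented, not Elasticsearch code):

```python
from collections import defaultdict

def groups_sorted_by_avg(docs, group_field, value_field):
    # Bucket by the group field, compute the per-bucket average,
    # then order buckets by that average, descending.
    groups = defaultdict(list)
    for d in docs:
        groups[d[group_field]].append(d[value_field])
    avgs = [(k, sum(v) / len(v)) for k, v in groups.items()]
    return sorted(avgs, key=lambda kv: kv[1], reverse=True)

accounts = [
    {"state": "ID", "balance": 20000},
    {"state": "TX", "balance": 45000},
    {"state": "ID", "balance": 30000},
]
ranked = groups_sorted_by_avg(accounts, "state", "balance")
# TX (average 45000.0) now ranks above ID (average 25000.0),
# even though ID holds more documents.
```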
This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and finally get the average account balance per age bracket, per gender:
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
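The same intuition applies at each level: the `range` aggregation buckets documents with an inclusive `from` and an exclusive `to`, and each bracket is then sub-bucketed by gender before the average is taken. An illustrative sketch (sample accounts are invented):

```python
def range_gender_avg(docs, brackets):
    # Outer `range`-style buckets ("from" inclusive, "to" exclusive),
    # inner `terms` on gender, innermost `avg` on balance.
    out = {}
    for lo, hi in brackets:
        in_bracket = [d for d in docs if lo <= d["age"] < hi]
        by_gender = {}
        for d in in_bracket:
            by_gender.setdefault(d["gender"], []).append(d["balance"])
        out[f"{lo}-{hi}"] = {g: sum(v) / len(v) for g, v in by_gender.items()}
    return out

accounts = [
    {"age": 25, "gender": "F", "balance": 20000},
    {"age": 25, "gender": "M", "balance": 10000},
    {"age": 32, "gender": "F", "balance": 40000},
]
result = range_gender_avg(accounts, [(20, 30), (30, 40), (40, 50)])
# The 20-30 bracket averages 20000.0 for F and 10000.0 for M;
# the 40-50 bracket is empty for this sample.
```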
There are many other aggregation capabilities that we won't go into in detail here. The {ref}/search-aggregations.html[aggregations reference guide] is a great starting point if you want to do further experimentation.
[[getting-started-next-steps]]
== Where to go from here

Now that you've set up a cluster, indexed some documents, and run some searches and aggregations, you might want to:

* {stack-gs}/get-started-elastic-stack.html#install-kibana[Dive in to the Elastic Stack Tutorial] to install Kibana, Logstash, and Beats and set up a basic system monitoring solution.
* {kibana-ref}/add-sample-data.html[Load one of the sample data sets into Kibana] to see how you can use {es} and Kibana together to visualize your data.
* Try out one of the Elastic search solutions:
** https://swiftype.com/documentation/site-search/crawler-quick-start[Site Search]
** https://swiftype.com/documentation/app-search/getting-started[App Search]
** https://swiftype.com/documentation/enterprise-search/getting-started[Enterprise Search]