[[getting-started]]
= Getting started with {es}

[partintro]
--
Ready to take {es} for a test drive and see for yourself how you can use the
REST APIs to store, search, and analyze data?

Follow this getting started tutorial to:

. Get an {es} instance up and running
. Index some sample documents
. Search for documents using the {es} query language
. Analyze the results using bucket and metrics aggregations

Need more context?

Check out the <<elasticsearch-intro,
Elasticsearch Introduction>> to learn the lingo and understand the basics of
how {es} works. If you're already familiar with {es} and want to see how it works
with the rest of the stack, you might want to jump to the
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
{beats}, and {ls}.

TIP: The fastest way to get started with {es} is to
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
trial of Elasticsearch Service] in the cloud.
--

[[getting-started-install]]
== Get {es} up and running

To take {es} for a test drive, you can create a one-click cloud deployment on
the https://www.elastic.co/cloud/elasticsearch-service/signup[Elasticsearch
Service], or set up a basic {es} cluster on your own
<<run-elasticsearch-linux,Linux>>,
macOS, or
<<run-elasticsearch-win,Windows>> machine.

[float]
[[run-elasticsearch-linux]]
=== Run {es} on Linux

For simplicity, let's use the {ref}/targz.html[tar] file.

Download the Elasticsearch {version} Linux tar file as follows:

["source","sh",subs="attributes,callouts"]
--------------------------------------------------
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz
--------------------------------------------------
// NOTCONSOLE

Then extract it as follows:

["source","sh",subs="attributes,callouts"]
--------------------------------------------------
tar -xvf elasticsearch-{version}-linux-x86_64.tar.gz
--------------------------------------------------

This creates a number of files and folders in your current directory. Change into the `bin` directory as follows:

["source","sh",subs="attributes,callouts"]
--------------------------------------------------
cd elasticsearch-{version}/bin
--------------------------------------------------

Now we are ready to start our node and single-node cluster:

[source,sh]
--------------------------------------------------
./elasticsearch
--------------------------------------------------

[float]
[[run-elasticsearch-win]]
=== Run {es} on Windows

For Windows users, we recommend using the {ref}/windows.html[MSI Installer package]. The package contains a graphical user interface (GUI) that guides you through the installation process.

First, download the Elasticsearch {version} MSI from
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}.msi.

Then double-click the downloaded file to launch the GUI. On the first screen, select the deployment directories:

[[getting-started-msi-installer-locations]]
image::images/msi_installer/msi_installer_locations.png[]

Then select whether to install as a service or start Elasticsearch manually as needed.
To align with the Linux example, choose not to install as a service:

[[getting-started-msi-installer-service]]
image::images/msi_installer/msi_installer_no_service.png[]

For configuration, simply leave the default values:

[[getting-started-msi-installer-configuration]]
image::images/msi_installer/msi_installer_configuration.png[]

Again, to align with the tar example, uncheck all plugins so that none are installed:

[[getting-started-msi-installer-plugins]]
image::images/msi_installer/msi_installer_plugins.png[]

After you click the install button, Elasticsearch will be installed:

[[getting-started-msi-installer-success]]
image::images/msi_installer/msi_installer_success.png[]

By default, Elasticsearch will be installed at `%PROGRAMFILES%\Elastic\Elasticsearch`. Navigate there and change into the `bin` directory as follows:

**with Command Prompt:**

[source,sh]
--------------------------------------------------
cd %PROGRAMFILES%\Elastic\Elasticsearch\bin
--------------------------------------------------

**with PowerShell:**

[source,powershell]
--------------------------------------------------
cd $env:PROGRAMFILES\Elastic\Elasticsearch\bin
--------------------------------------------------

Now we are ready to start our node and single-node cluster:

[source,sh]
--------------------------------------------------
.\elasticsearch.exe
--------------------------------------------------

[float]
[[successfully-running-node]]
=== Verify that {es} is running

If the installation went well, you should see a series of log messages similar to the following:

["source","sh",subs="attributes,callouts"]
--------------------------------------------------
[2018-09-13T12:20:01,766][INFO ][o.e.e.NodeEnvironment ] [localhost.localdomain] using [1] data paths, mounts [[/home (/dev/mapper/fedora-home)]], net usable_space [335.3gb], net total_space [410.3gb], types [ext4]
[2018-09-13T12:20:01,772][INFO ][o.e.e.NodeEnvironment ] [localhost.localdomain] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-09-13T12:20:01,774][INFO ][o.e.n.Node ] [localhost.localdomain] node name [localhost.localdomain], node ID [B0aEHNagTiWx7SYj-l4NTw]
[2018-09-13T12:20:01,775][INFO ][o.e.n.Node ] [localhost.localdomain] version[{version}], pid[13030], build[oss/zip/77fc20e/2018-09-13T15:37:57.478402Z], OS[Linux/4.16.11-100.fc26.x86_64/amd64], JVM["Oracle Corporation"/OpenJDK 64-Bit Server VM/10/10+46]
[2018-09-13T12:20:01,775][INFO ][o.e.n.Node ] [localhost.localdomain] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.LN1ctLCi, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Dio.netty.allocator.type=unpooled, -Des.path.home=/home/manybubbles/Workspaces/Elastic/master/elasticsearch/qa/unconfigured-node-name/build/cluster/integTestCluster node0/elasticsearch-7.0.0-alpha1-SNAPSHOT, -Des.path.conf=/home/manybubbles/Workspaces/Elastic/master/elasticsearch/qa/unconfigured-node-name/build/cluster/integTestCluster node0/elasticsearch-7.0.0-alpha1-SNAPSHOT/config, -Des.distribution.flavor=oss, -Des.distribution.type=zip]
[2018-09-13T12:20:02,543][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [aggs-matrix-stats]
[2018-09-13T12:20:02,543][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [analysis-common]
[2018-09-13T12:20:02,543][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [ingest-common]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [lang-expression]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [lang-mustache]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [lang-painless]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [mapper-extras]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [parent-join]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [percolator]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [rank-eval]
[2018-09-13T12:20:02,544][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [reindex]
[2018-09-13T12:20:02,545][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [repository-url]
[2018-09-13T12:20:02,545][INFO ][o.e.p.PluginsService ] [localhost.localdomain] loaded module [transport-netty4]
[2018-09-13T12:20:02,545][INFO ][o.e.p.PluginsService ] [localhost.localdomain] no plugins loaded
[2018-09-13T12:20:04,657][INFO ][o.e.d.DiscoveryModule ] [localhost.localdomain] using discovery type [zen]
[2018-09-13T12:20:05,006][INFO ][o.e.n.Node ] [localhost.localdomain] initialized
[2018-09-13T12:20:05,007][INFO ][o.e.n.Node ] [localhost.localdomain] starting ...
[2018-09-13T12:20:05,202][INFO ][o.e.t.TransportService ] [localhost.localdomain] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-09-13T12:20:05,221][WARN ][o.e.b.BootstrapChecks ] [localhost.localdomain] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[2018-09-13T12:20:05,221][WARN ][o.e.b.BootstrapChecks ] [localhost.localdomain] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-09-13T12:20:08,355][INFO ][o.e.c.s.MasterService ] [localhost.localdomain] elected-as-master ([0] nodes joined)[, ], reason: master node changed {previous [], current [{localhost.localdomain}{B0aEHNagTiWx7SYj-l4NTw}{hzsQz6CVQMCTpMCVLM4IHg}{127.0.0.1}{127.0.0.1:9300}{testattr=test}]}
[2018-09-13T12:20:08,360][INFO ][o.e.c.s.ClusterApplierService] [localhost.localdomain] master node changed {previous [], current [{localhost.localdomain}{B0aEHNagTiWx7SYj-l4NTw}{hzsQz6CVQMCTpMCVLM4IHg}{127.0.0.1}{127.0.0.1:9300}{testattr=test}]}, reason: apply cluster state (from master [master {localhost.localdomain}{B0aEHNagTiWx7SYj-l4NTw}{hzsQz6CVQMCTpMCVLM4IHg}{127.0.0.1}{127.0.0.1:9300}{testattr=test} committed version [1] source [elected-as-master ([0] nodes joined)[, ]]])
[2018-09-13T12:20:08,384][INFO ][o.e.h.n.Netty4HttpServerTransport] [localhost.localdomain] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-09-13T12:20:08,384][INFO ][o.e.n.Node ] [localhost.localdomain] started
--------------------------------------------------

Without going into too much detail, we can see that our node, named "localhost.localdomain", has started and elected itself as master of a single-node cluster. Don't worry for now about what master means. The important thing is that we have started one node within one cluster.

We can override the default cluster name or node name (or both). This can be done from the command line when starting Elasticsearch as follows:

[source,sh]
--------------------------------------------------
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name
--------------------------------------------------

Also note the `Netty4HttpServerTransport` line, which reports the HTTP address (`127.0.0.1`) and port (`9200`) on which our node is reachable. By default, Elasticsearch uses port `9200` to provide access to its REST API. This port is configurable if necessary.

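As a rough illustration of reading that log line, the following Python sketch pulls the HTTP host and port out of it with a regular expression. The function name `http_publish_address` and the pattern are assumptions made for this example, written against the sample log output shown above. They are not part of Elasticsearch, and the log format is not guaranteed to stay stable.

```python
import re

# Matches the HTTP transport line from the sample log above, for example:
#   ...[o.e.h.n.Netty4HttpServerTransport] [...] publish_address {127.0.0.1:9200}, ...
# Assumption: the publish address is an IPv4 address or host name (no colons).
_HTTP_LINE = re.compile(
    r"Netty4HttpServerTransport\]\s.*?"
    r"publish_address \{(?P<host>[^:}]+):(?P<port>\d+)\}"
)


def http_publish_address(log_line):
    """Return (host, port) if the line announces the HTTP address, else None."""
    match = _HTTP_LINE.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


sample = (
    "[2018-09-13T12:20:08,384][INFO ][o.e.h.n.Netty4HttpServerTransport] "
    "[localhost.localdomain] publish_address {127.0.0.1:9200}, "
    "bound_addresses {[::1]:9200}, {127.0.0.1:9200}"
)
host, port = http_publish_address(sample)
print(f"http://{host}:{port}/")
```

The transport line on port `9300` is deliberately not matched, because it is logged by `TransportService` rather than by the HTTP server.
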
[[getting-started-explore]]
== Exploring Your Cluster

[float]
=== The REST API

Now that we have our node (and cluster) up and running, the next step is to understand how to communicate with it. Fortunately, Elasticsearch provides a comprehensive and powerful REST API that you can use to interact with your cluster. Among other things, you can use the API to:

* Check your cluster, node, and index health, status, and statistics
* Administer your cluster, node, and index data and metadata
* Perform CRUD (Create, Read, Update, and Delete) and search operations against your indices
* Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others

[[getting-started-cluster-health]]
=== Cluster Health

Let's start with a basic health check, which we can use to see how our cluster is doing. We'll be using curl to do this, but you can use any tool that allows you to make HTTP/REST calls. Let's assume that we are still on the same node where we started Elasticsearch and open another command shell window.

To check the cluster health, we will be using the {ref}/cat.html[`_cat` API]. You can
run the command below in {kibana-ref}/console-kibana.html[Kibana's Console]
by clicking "VIEW IN CONSOLE" or with `curl` by clicking the "COPY AS CURL"
link below and pasting it into a terminal.

[source,js]
--------------------------------------------------
GET /_cat/health?v
--------------------------------------------------
// CONSOLE

And the response:

[source,txt]
--------------------------------------------------
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%
--------------------------------------------------
// TESTRESPONSE[s/1475247709 17:01:49 elasticsearch/\\d+ \\d+:\\d+:\\d+ integTest/]
// TESTRESPONSE[s/0 0 -/0 \\d+ -/]
// TESTRESPONSE[non_json]

We can see that our cluster named "elasticsearch" is up with a green status.

Whenever we ask for the cluster health, we get either green, yellow, or red.

* Green - everything is good (cluster is fully functional)
* Yellow - all data is available but some replicas are not yet allocated (cluster is fully functional)
* Red - some data is not available for whatever reason (cluster is partially functional)

NOTE: When a cluster is red, it continues to serve search requests from the available shards, but you will likely need to fix it as soon as possible since there are unassigned shards.

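To make the health output easier to work with in a script, here is a minimal Python sketch that turns the `_cat/health?v` text shown above into a dictionary and looks up what the status means. The helper `parse_cat_table` and the `STATUS_MEANING` table are names invented for this example. The parser simply splits on whitespace, which is an assumption that holds for the health output above but not for `_cat` responses with empty columns.

```python
# Short descriptions of the three health colors explained above.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all data is available, but some replicas are not allocated",
    "red": "some data is not available",
}


def parse_cat_table(text):
    """Parse whitespace-delimited `_cat` output (requested with ?v) into dicts.

    Assumes every column has a value in every row; a row with an empty
    column would be misaligned by this naive split.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]


sample = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%
"""

row = parse_cat_table(sample)[0]
print(row["cluster"], row["status"], "->", STATUS_MEANING[row["status"]])
```

For anything beyond a quick check, requesting JSON output from the API is likely to be more robust than parsing the text table.
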
From the above response, we can also see a total of 1 node and 0 shards, since the cluster holds no data yet. Note that since we are using the default cluster name (elasticsearch) and since Elasticsearch uses unicast network discovery by default to find other nodes on the same machine, it is possible that you could accidentally start up more than one node on your computer and have them all join a single cluster. In this scenario, you may see more than 1 node in the above response.

We can also get a list of nodes in our cluster as follows:

[source,js]
--------------------------------------------------
GET /_cat/nodes?v
--------------------------------------------------
// CONSOLE

And the response:

[source,txt]
--------------------------------------------------
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1 10 5 5 4.46 dim * PB2SGZY
--------------------------------------------------
// TESTRESPONSE[s/10 5 5 4.46/\\d+ \\d+ \\d+ (\\d+\\.\\d+)? (\\d+\\.\\d+)? (\\d+\.\\d+)?/]
// TESTRESPONSE[s/[*]/[*]/ s/PB2SGZY/.+/ non_json]

Here, we can see our one node named "PB2SGZY", which is the single node that is currently in our cluster.

[[getting-started-list-indices]]
=== List All Indices

Now let's take a peek at our indices:

[source,js]
--------------------------------------------------
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE

And the response:

[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
--------------------------------------------------
// TESTRESPONSE[non_json]

Which simply means we have no indices yet in the cluster.

[[getting-started-create-index]]
=== Create an Index

Now let's create an index named "customer" and then list all the indices again:

[source,js]
--------------------------------------------------
PUT /customer?pretty
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE

The first command creates the index named "customer" using the PUT verb. We simply append `pretty` to the end of the call to tell it to pretty-print the JSON response (if any).

And the response:

[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer 95SQ4TSUT7mWBT7VNHH67A 1 1 0 0 260b 260b
--------------------------------------------------
// TESTRESPONSE[s/95SQ4TSUT7mWBT7VNHH67A/.+/ s/260b/\\d+\\.?\\d?k?b/ non_json]

The results of the second command tell us that we now have one index named customer, that it has one primary shard and one replica (the defaults), and that it contains zero documents.

You might also notice that the customer index has a yellow health status. Recall from our previous discussion that yellow means that some replicas are not (yet) allocated. This happens because Elasticsearch by default created one replica for this index. Since we only have one node running at the moment, that one replica cannot be allocated (for high availability) until another node joins the cluster. Once that replica gets allocated onto a second node, the health status for this index will turn to green.

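The reasoning above can be captured in a tiny model. The following Python sketch is a deliberate simplification, and `expected_index_health` is a name made up for this example: it assumes the only constraint is that two copies of the same shard never share a node, and it ignores every other allocation setting a real cluster applies.

```python
def expected_index_health(data_nodes, replicas):
    """Rough model of index health from node and replica counts.

    Assumption: a replica is never placed on a node that already holds a
    copy of the same shard, so `replicas` replicas per primary need at
    least `replicas + 1` data nodes.
    """
    if data_nodes < 1:
        return "red"      # nowhere to allocate even the primary
    if data_nodes < replicas + 1:
        return "yellow"   # primary allocated, at least one replica is not
    return "green"        # every copy has a node of its own


print(expected_index_health(data_nodes=1, replicas=1))  # the tutorial's situation
print(expected_index_health(data_nodes=2, replicas=1))  # after a second node joins
```
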
[[getting-started-query-document]]
=== Index and Query a Document

Let's now put something into our customer index. We'll index a simple customer document into the customer index, with an ID of 1, as follows:

[source,js]
--------------------------------------------------
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
--------------------------------------------------
// CONSOLE

And the response:

[source,js]
--------------------------------------------------
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

From the above, we can see that a new customer document was successfully created inside the customer index. The document also has an internal ID of 1, which we specified at index time.

It is important to note that Elasticsearch does not require you to explicitly create an index before you can index documents into it. In the previous example, Elasticsearch would have created the customer index automatically if it had not already existed.

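If you would rather issue the same request from a script than from Console or curl, the following Python sketch builds it with the standard library. It only constructs the request so that it can be inspected without a running node. `BASE_URL` and `index_document_request` are assumptions for this example (a local node on the default port), not names defined by Elasticsearch.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def index_document_request(index, doc_id, document):
    """Build (but do not send) the request for PUT /<index>/_doc/<id>."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}?pretty",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = index_document_request("customer", 1, {"name": "John Doe"})
print(request.get_method(), request.full_url)

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
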
Let's now retrieve the document that we just indexed:

[source,js]
--------------------------------------------------
GET /customer/_doc/1?pretty
--------------------------------------------------
// CONSOLE
// TEST[continued]

And the response:

[source,js]
--------------------------------------------------
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

Nothing out of the ordinary here other than a field, `found`, stating that we found a document with the requested ID 1, and another field, `_source`, which returns the full JSON document that we indexed in the previous step.

[[getting-started-delete-index]]
=== Delete an Index

Now let's delete the index that we just created and then list all the indices again:

[source,js]
--------------------------------------------------
DELETE /customer?pretty
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE
// TEST[continued]

And the response:

[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
--------------------------------------------------
// TESTRESPONSE[non_json]

Which means that the index was deleted successfully and we are now back to where we started, with nothing in our cluster.

Before we move on, let's take a closer look at some of the API commands that we have learned so far:

[source,js]
--------------------------------------------------
PUT /customer
PUT /customer/_doc/1
{
  "name": "John Doe"
}
GET /customer/_doc/1
DELETE /customer
--------------------------------------------------
// CONSOLE

If we study the above commands carefully, we can see a pattern in how we access data in Elasticsearch. That pattern can be summarized as follows:

[source,js]
--------------------------------------------------
<HTTP Verb> /<Index>/<Endpoint>/<ID>
--------------------------------------------------
// NOTCONSOLE

This REST access pattern is so pervasive throughout the API commands that simply remembering it gives you a good head start at mastering Elasticsearch.

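The pattern is regular enough to express as a small helper. This Python sketch, with the made-up name `rest_call`, formats a request line from its parts and leaves out whichever trailing parts are absent. It only produces strings for illustration and does not contact a cluster.

```python
def rest_call(verb, index, endpoint=None, doc_id=None):
    """Format `<HTTP Verb> /<Index>/<Endpoint>/<ID>`, skipping absent parts."""
    parts = [index, endpoint, doc_id]
    path = "/" + "/".join(str(part) for part in parts if part is not None)
    return f"{verb.upper()} {path}"


# The four commands recapped above, rebuilt from the pattern.
for call in (
    rest_call("put", "customer"),
    rest_call("put", "customer", "_doc", 1),
    rest_call("get", "customer", "_doc", 1),
    rest_call("delete", "customer"),
):
    print(call)
```
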
[[getting-started-modify-data]]
== Modifying Your Data

Elasticsearch provides data manipulation and search capabilities in near real time. By default, you can expect a one second delay (the refresh interval) from the time you index, update, or delete your data until the time that it appears in your search results. This is an important distinction from platforms such as SQL databases, in which data is immediately available after a transaction is completed.

[float]
[[indexing-replacing-documents]]
=== Indexing/Replacing Documents

We've previously seen how we can index a single document. Let's recall that command again:

[source,js]
--------------------------------------------------
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
--------------------------------------------------
// CONSOLE

Again, the above indexes the specified document into the customer index, with the ID of 1. If we then execute the above command again with a different (or the same) document, Elasticsearch replaces (that is, reindexes) the existing document with the ID of 1:

[source,js]
--------------------------------------------------
PUT /customer/_doc/1?pretty
{
  "name": "Jane Doe"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The above changes the name of the document with the ID of 1 from "John Doe" to "Jane Doe". If, on the other hand, we use a different ID, a new document is indexed and the documents already in the index remain untouched.

[source,js]
--------------------------------------------------
PUT /customer/_doc/2?pretty
{
  "name": "Jane Doe"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The above indexes a new document with an ID of 2.

When indexing, the ID part is optional. If not specified, Elasticsearch will generate a random ID and then use it to index the document. The actual ID Elasticsearch generates (or whatever we specified explicitly in the previous examples) is returned as part of the index API call.

This example shows how to index a document without an explicit ID:

[source,js]
--------------------------------------------------
POST /customer/_doc?pretty
{
  "name": "Jane Doe"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Note that in the above case, we are using the `POST` verb instead of `PUT` since we didn't specify an ID.

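The choice between the two verbs, as used in this tutorial, comes down to whether an ID is supplied. A minimal Python sketch of that rule follows; `index_call` is a hypothetical helper written for this example and only returns the verb and path as strings.

```python
def index_call(index, doc_id=None):
    """Return (verb, path) for indexing one document, as in the examples above.

    With an explicit ID the tutorial uses PUT on the document's own path.
    Without one it uses POST and lets Elasticsearch generate the ID.
    """
    if doc_id is None:
        return "POST", f"/{index}/_doc"
    return "PUT", f"/{index}/_doc/{doc_id}"


print(index_call("customer", 2))
print(index_call("customer"))
```
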
[[getting-started-update-documents]]
=== Updating Documents

In addition to being able to index and replace documents, we can also update documents. Note though that Elasticsearch does not actually do in-place updates under the hood. Whenever we do an update, Elasticsearch deletes the old document and then indexes a new document with the update applied to it in one shot.

This example shows how to update our previous document (ID of 1) by changing the name field to "Jane Doe":

[source,js]
--------------------------------------------------
POST /customer/_update/1?pretty
{
  "doc": { "name": "Jane Doe" }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

This example shows how to update our previous document (ID of 1) by changing the name field to "Jane Doe" and at the same time add an age field to it:

[source,js]
--------------------------------------------------
POST /customer/_update/1?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Updates can also be performed by using simple scripts. This example uses a script to increment the age by 5:

[source,js]
--------------------------------------------------
POST /customer/_update/1?pretty
{
  "script" : "ctx._source.age += 5"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

In the above example, `ctx._source` refers to the current source document that is about to be updated.

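To make the "replace rather than modify in place" behavior concrete, here is a local Python model of the two update styles shown in this section. It is only an analogy that runs on plain dictionaries: `apply_partial_doc` and the `stored` record are invented for this sketch, and the version bump on each update is a simplifying assumption about how a changed document is tracked.

```python
import copy


def apply_partial_doc(source, partial):
    """Return a new source with `partial` merged in.

    Nested objects are merged recursively; any other value is replaced.
    The original `source` is left untouched, mirroring the idea that an
    update produces a new document instead of editing the old one.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"_version": 1, "_source": {"name": "John Doe"}}

# Counterpart of {"doc": {"name": "Jane Doe", "age": 20}}
new_source = apply_partial_doc(stored["_source"], {"name": "Jane Doe", "age": 20})
stored = {"_version": stored["_version"] + 1, "_source": new_source}

# Counterpart of the script "ctx._source.age += 5"
new_source = copy.deepcopy(stored["_source"])
new_source["age"] += 5
stored = {"_version": stored["_version"] + 1, "_source": new_source}

print(stored)
```
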
Elasticsearch can also update multiple documents that match a query condition (like an `SQL UPDATE-WHERE` statement). See the {ref}/docs-update-by-query.html[`_update_by_query` API].

[[getting-started-delete-documents]]
=== Deleting Documents

Deleting a document is fairly straightforward. This example shows how to delete our previous customer with the ID of 2:

[source,js]
--------------------------------------------------
DELETE /customer/_doc/2?pretty
--------------------------------------------------
// CONSOLE
// TEST[continued]

See the {ref}/docs-delete-by-query.html[`_delete_by_query` API] to delete all documents matching a specific query.
It is worth noting that it is much more efficient to delete a whole index
than to delete all of its documents with the Delete By Query API.

  410. [[getting-started-batch-processing]]
  411. === Batch Processing
In addition to indexing, updating, and deleting individual documents, Elasticsearch can perform any of these operations in batches using the {ref}/docs-bulk.html[`_bulk` API]. Batching matters because it executes multiple operations with as few network round trips as possible.
  413. As a quick example, the following call indexes two documents (ID 1 - John Doe and ID 2 - Jane Doe) in one bulk operation:
  414. [source,js]
  415. --------------------------------------------------
  416. POST /customer/_bulk?pretty
  417. {"index":{"_id":"1"}}
  418. {"name": "John Doe" }
  419. {"index":{"_id":"2"}}
  420. {"name": "Jane Doe" }
  421. --------------------------------------------------
  422. // CONSOLE
  423. This example updates the first document (ID of 1) and then deletes the second document (ID of 2) in one bulk operation:
[source,js]
  425. --------------------------------------------------
  426. POST /customer/_bulk?pretty
  427. {"update":{"_id":"1"}}
  428. {"doc": { "name": "John Doe becomes Jane Doe" } }
  429. {"delete":{"_id":"2"}}
  430. --------------------------------------------------
  431. // CONSOLE
  432. // TEST[continued]
Note that the delete action has no corresponding source document after it, since a delete only requires the ID of the document to be deleted.
The Bulk API does not fail as a whole when one of its actions fails. If a single action fails for whatever reason, the remaining actions are still processed. When the Bulk API returns, it provides a status for each action, in the same order the actions were sent, so that you can check whether a specific action failed.
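As a rough sketch of how a client might work with the Bulk API, the following Python builds the newline-delimited request body used above and scans a response for failed actions. The helper names are hypothetical, and the sample response is hand-written for illustration rather than captured from a cluster.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, source) pairs as newline-delimited JSON.

    `source` is None for actions, such as delete, that carry no document.
    """
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"

def failed_items(bulk_response):
    """Return (position, action_type, status) for each failed action."""
    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # Each item has a single key naming the action type.
        action_type, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action_type, result["status"]))
    return failures

body = build_bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ({"delete": {"_id": "2"}}, None),
])

# Hand-written response for illustration only.
sample_response = {
    "errors": True,
    "items": [
        {"update": {"_id": "1", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
        {"delete": {"_id": "2", "status": 200}},
    ],
}
failures = failed_items(sample_response)
```

Because the items come back in request order, the position of a failure identifies the action that caused it, which lets a client retry only the actions that failed.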
  435. [[getting-started-explore-data]]
  436. == Exploring Your Data
  437. [float]
  438. === Sample Dataset
Now that we've gotten a glimpse of the basics, let's try to work on a more realistic dataset. We've prepared a sample of fictitious JSON documents of customer bank account information. Each document has the following schema:
  440. [source,js]
  441. --------------------------------------------------
  442. {
  443. "account_number": 0,
  444. "balance": 16623,
  445. "firstname": "Bradshaw",
  446. "lastname": "Mckenzie",
  447. "age": 29,
  448. "gender": "F",
  449. "address": "244 Columbus Place",
  450. "employer": "Euron",
  451. "email": "bradshawmckenzie@euron.com",
  452. "city": "Hobucken",
  453. "state": "CO"
  454. }
  455. --------------------------------------------------
  456. // NOTCONSOLE
  457. For the curious, this data was generated using http://www.json-generator.com/[`www.json-generator.com/`], so please ignore the actual values and semantics of the data as these are all randomly generated.
  458. [float]
  459. === Loading the Sample Dataset
You can download the sample dataset (accounts.json) from https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[here]. Save it to our current directory and let's load it into our cluster as follows:
  461. [source,sh]
  462. --------------------------------------------------
  463. curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
  464. curl "localhost:9200/_cat/indices?v"
  465. --------------------------------------------------
  466. // NOTCONSOLE
  467. ////
  468. This replicates the above in a document-testing friendly way but isn't visible
  469. in the docs:
  470. [source,js]
  471. --------------------------------------------------
  472. GET /_cat/indices?v
  473. --------------------------------------------------
  474. // CONSOLE
  475. // TEST[setup:bank]
  476. ////
  477. And the response:
  478. [source,txt]
  479. --------------------------------------------------
  480. health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
  481. yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
  482. --------------------------------------------------
  483. // TESTRESPONSE[s/128.6kb/\\d+(\\.\\d+)?[mk]?b/]
  484. // TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ non_json]
This means that we just successfully bulk indexed 1000 documents into the bank index.
  486. [[getting-started-search-API]]
  487. === The Search API
  488. Now let's start with some simple searches. There are two basic ways to run searches: one is by sending search parameters through the {ref}/search-uri-request.html[REST request URI] and the other by sending them through the {ref}/search-request-body.html[REST request body]. The request body method allows you to be more expressive and also to define your searches in a more readable JSON format. We'll try one example of the request URI method but for the remainder of this tutorial, we will exclusively be using the request body method.
  489. The REST API for search is accessible from the `_search` endpoint. This example returns all documents in the bank index:
  490. [source,js]
  491. --------------------------------------------------
  492. GET /bank/_search?q=*&sort=account_number:asc&pretty
  493. --------------------------------------------------
  494. // CONSOLE
  495. // TEST[continued]
Let's first dissect the search call. We are searching (`_search` endpoint) in the bank index, and the `q=*` parameter instructs Elasticsearch to match all documents in the index. The `sort=account_number:asc` parameter tells Elasticsearch to sort the results by the `account_number` field of each document in ascending order. The `pretty` parameter, again, just tells Elasticsearch to return pretty-printed JSON results.
  497. And the response (partially shown):
  498. [source,js]
  499. --------------------------------------------------
  500. {
  501. "took" : 63,
  502. "timed_out" : false,
  503. "_shards" : {
  504. "total" : 5,
  505. "successful" : 5,
  506. "skipped" : 0,
  507. "failed" : 0
  508. },
  509. "hits" : {
  510. "total" : {
  511. "value": 1000,
  512. "relation": "eq"
  513. },
  514. "max_score" : null,
  515. "hits" : [ {
  516. "_index" : "bank",
  517. "_type" : "_doc",
  518. "_id" : "0",
  519. "sort": [0],
  520. "_score" : null,
  521. "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
  522. }, {
  523. "_index" : "bank",
  524. "_type" : "_doc",
  525. "_id" : "1",
  526. "sort": [1],
  527. "_score" : null,
  528. "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
  529. }, ...
  530. ]
  531. }
  532. }
  533. --------------------------------------------------
  534. // TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
  535. // TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
  536. As for the response, we see the following parts:
* `took` – time in milliseconds for Elasticsearch to execute the search
* `timed_out` – tells us whether the search timed out
* `_shards` – tells us how many shards were searched, as well as how many of them succeeded or failed
* `hits` – search results
* `hits.total` – an object that contains information about the total number of documents matching our search criteria
** `hits.total.value` – the value of the total hit count (must be interpreted in the context of `hits.total.relation`)
** `hits.total.relation` – whether `hits.total.value` is the exact hit count, in which case it is equal to `"eq"`, or a lower bound of the total hit count (greater than or equal to), in which case it is equal to `"gte"`
* `hits.hits` – actual array of search results (defaults to first 10 documents)
* `hits.sort` – sort value of the sort key for each result (missing if sorting by score)
* `hits._score` and `max_score` – ignore these fields for now
The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`. It defaults to `10,000`,
which means that the total hit count is tracked accurately up to `10,000` documents; beyond that,
`hits.total.value` is only a lower bound (`"relation": "gte"`).
You can force an accurate count (`"relation": "eq"`) by setting `track_total_hits` to true explicitly.
See the <<request-body-search-track-total-hits, request body>> documentation
for more details.
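Client code that reports a hit count has to look at both fields of `hits.total`. A minimal sketch in Python, using a hypothetical helper named `describe_total`:

```python
def describe_total(total):
    """Turn a `hits.total` object into a human-readable phrase."""
    value, relation = total["value"], total["relation"]
    if relation == "eq":
        return f"exactly {value} hits"
    if relation == "gte":
        return f"at least {value} hits"
    raise ValueError(f"unexpected relation: {relation!r}")

exact = describe_total({"value": 1000, "relation": "eq"})
lower_bound = describe_total({"value": 10000, "relation": "gte"})
```

Treating `value` as exact without checking `relation` would under-report large result sets.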
Here is the exact same search using the alternative request body method:
  555. [source,js]
  556. --------------------------------------------------
  557. GET /bank/_search
  558. {
  559. "query": { "match_all": {} },
  560. "sort": [
  561. { "account_number": "asc" }
  562. ]
  563. }
  564. --------------------------------------------------
  565. // CONSOLE
  566. // TEST[continued]
  567. The difference here is that instead of passing `q=*` in the URI, we provide a JSON-style query request body to the `_search` API. We'll discuss this JSON query in the next section.
  568. ////
  569. Hidden response just so we can assert that it is indeed the same but don't have
  570. to clutter the docs with it:
  571. [source,js]
  572. --------------------------------------------------
  573. {
  574. "took" : 63,
  575. "timed_out" : false,
  576. "_shards" : {
  577. "total" : 5,
  578. "successful" : 5,
  579. "skipped" : 0,
  580. "failed" : 0
  581. },
  582. "hits" : {
  583. "total" : {
  584. "value": 1000,
  585. "relation": "eq"
  586. },
  587. "max_score": null,
  588. "hits" : [ {
  589. "_index" : "bank",
  590. "_type" : "_doc",
  591. "_id" : "0",
  592. "sort": [0],
  593. "_score": null,
  594. "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
  595. }, {
  596. "_index" : "bank",
  597. "_type" : "_doc",
  598. "_id" : "1",
  599. "sort": [1],
  600. "_score": null,
  601. "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
  602. }, ...
  603. ]
  604. }
  605. }
  606. --------------------------------------------------
  607. // TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
  608. // TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
  609. ////
It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any server-side resources or open cursors into your results. This is in stark contrast to many other platforms, such as SQL databases, where you may initially get a partial subset of your query results and then have to keep going back to the server to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.
  611. [[getting-started-query-lang]]
  612. === Introducing the Query Language
  613. Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the {ref}/query-dsl.html[Query DSL]. The query language is quite comprehensive and can be intimidating at first glance but the best way to actually learn it is to start with a few basic examples.
  614. Going back to our last example, we executed this query:
  615. [source,js]
  616. --------------------------------------------------
  617. GET /bank/_search
  618. {
  619. "query": { "match_all": {} }
  620. }
  621. --------------------------------------------------
  622. // CONSOLE
  623. // TEST[continued]
  624. Dissecting the above, the `query` part tells us what our query definition is and the `match_all` part is simply the type of query that we want to run. The `match_all` query is simply a search for all documents in the specified index.
In addition to the `query` parameter, we can also pass other parameters to
influence the search results. In the example in the section above we passed in
`sort`; here we pass in `size`:
  628. [source,js]
  629. --------------------------------------------------
  630. GET /bank/_search
  631. {
  632. "query": { "match_all": {} },
  633. "size": 1
  634. }
  635. --------------------------------------------------
  636. // CONSOLE
  637. // TEST[continued]
  638. Note that if `size` is not specified, it defaults to 10.
  639. This example does a `match_all` and returns documents 10 through 19:
  640. [source,js]
  641. --------------------------------------------------
  642. GET /bank/_search
  643. {
  644. "query": { "match_all": {} },
  645. "from": 10,
  646. "size": 10
  647. }
  648. --------------------------------------------------
  649. // CONSOLE
  650. // TEST[continued]
The `from` parameter (0-based) specifies which document index to start from and the `size` parameter specifies how many documents to return starting at that offset. This feature is useful when implementing paging of search results. Note that if `from` is not specified, it defaults to 0.
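For paging, the `from` value is simply the page number multiplied by the page size. A small sketch in Python that builds the request body for a zero-based page number (the function name `page_request` is an assumption for illustration):

```python
def page_request(page, per_page=10):
    """Build a match_all search body for a zero-based page number."""
    if page < 0 or per_page < 1:
        raise ValueError("page must be >= 0 and per_page must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": page * per_page,
        "size": per_page,
    }

# The second page of ten results: documents 10 through 19.
second_page = page_request(1)
```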
This example does a `match_all`, sorts the results by account balance in descending order, and returns the top 10 (default size) documents:
  653. [source,js]
  654. --------------------------------------------------
  655. GET /bank/_search
  656. {
  657. "query": { "match_all": {} },
  658. "sort": { "balance": { "order": "desc" } }
  659. }
  660. --------------------------------------------------
  661. // CONSOLE
  662. // TEST[continued]
  663. [[getting-started-search]]
  664. === Executing Searches
  665. Now that we have seen a few of the basic search parameters, let's dig in some more into the Query DSL. Let's first take a look at the returned document fields. By default, the full JSON document is returned as part of all searches. This is referred to as the source (`_source` field in the search hits). If we don't want the entire source document returned, we have the ability to request only a few fields from within source to be returned.
  666. This example shows how to return two fields, `account_number` and `balance` (inside of `_source`), from the search:
  667. [source,js]
  668. --------------------------------------------------
  669. GET /bank/_search
  670. {
  671. "query": { "match_all": {} },
  672. "_source": ["account_number", "balance"]
  673. }
  674. --------------------------------------------------
  675. // CONSOLE
  676. // TEST[continued]
  677. Note that the above example simply reduces the `_source` field. It will still only return one field named `_source` but within it, only the fields `account_number` and `balance` are included.
If you come from a SQL background, the above is somewhat similar in concept to the field list of a SQL `SELECT` statement.
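The effect of the `_source` parameter can be pictured as a projection over each document. The following Python is a simplified local model that handles only top-level field names; `filter_source` is a hypothetical helper.

```python
def filter_source(source, fields):
    """Keep only the listed top-level fields of a document's source."""
    return {name: source[name] for name in fields if name in source}

# The sample account shown earlier in this tutorial.
account = {
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO",
}

hit_source = filter_source(account, ["account_number", "balance"])
```

Each hit still carries a single `_source` object; it just contains fewer fields.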
  679. Now let's move on to the query part. Previously, we've seen how the `match_all` query is used to match all documents. Let's now introduce a new query called the {ref}/query-dsl-match-query.html[`match` query], which can be thought of as a basic fielded search query (i.e. a search done against a specific field or set of fields).
  680. This example returns the account numbered 20:
  681. [source,js]
  682. --------------------------------------------------
  683. GET /bank/_search
  684. {
  685. "query": { "match": { "account_number": 20 } }
  686. }
  687. --------------------------------------------------
  688. // CONSOLE
  689. // TEST[continued]
  690. This example returns all accounts containing the term "mill" in the address:
  691. [source,js]
  692. --------------------------------------------------
  693. GET /bank/_search
  694. {
  695. "query": { "match": { "address": "mill" } }
  696. }
  697. --------------------------------------------------
  698. // CONSOLE
  699. // TEST[continued]
  700. This example returns all accounts containing the term "mill" or "lane" in the address:
  701. [source,js]
  702. --------------------------------------------------
  703. GET /bank/_search
  704. {
  705. "query": { "match": { "address": "mill lane" } }
  706. }
  707. --------------------------------------------------
  708. // CONSOLE
  709. // TEST[continued]
  710. This example is a variant of `match` (`match_phrase`) that returns all accounts containing the phrase "mill lane" in the address:
  711. [source,js]
  712. --------------------------------------------------
  713. GET /bank/_search
  714. {
  715. "query": { "match_phrase": { "address": "mill lane" } }
  716. }
  717. --------------------------------------------------
  718. // CONSOLE
  719. // TEST[continued]
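The difference between `match` and `match_phrase` can be sketched locally in Python. This is a deliberately simplified model: it lowercases and splits on whitespace instead of using a real analyzer, and it ignores scoring. The function names are hypothetical, and apart from "880 Holmes Lane" the addresses are made up for illustration.

```python
def tokenize(text):
    """Very rough stand-in for text analysis: lowercase, split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """True if the field contains at least one of the query terms."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))

def match_phrase(field_value, query):
    """True if the query terms appear consecutively and in order."""
    field_terms = tokenize(field_value)
    phrase = tokenize(query)
    n = len(phrase)
    return any(
        field_terms[i:i + n] == phrase
        for i in range(len(field_terms) - n + 1)
    )

addresses = ["990 Mill Road", "880 Holmes Lane", "198 Mill Lane"]

any_term = [a for a in addresses if match(a, "mill lane")]
phrase_only = [a for a in addresses if match_phrase(a, "mill lane")]
```

Here `any_term` keeps all three addresses because each contains "mill" or "lane", while `phrase_only` keeps only the address where the two terms are adjacent.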
  720. Let's now introduce the {ref}/query-dsl-bool-query.html[`bool` query]. The `bool` query allows us to compose smaller queries into bigger queries using boolean logic.
  721. This example composes two `match` queries and returns all accounts containing "mill" and "lane" in the address:
  722. [source,js]
  723. --------------------------------------------------
  724. GET /bank/_search
  725. {
  726. "query": {
  727. "bool": {
  728. "must": [
  729. { "match": { "address": "mill" } },
  730. { "match": { "address": "lane" } }
  731. ]
  732. }
  733. }
  734. }
  735. --------------------------------------------------
  736. // CONSOLE
  737. // TEST[continued]
  738. In the above example, the `bool must` clause specifies all the queries that must be true for a document to be considered a match.
  739. In contrast, this example composes two `match` queries and returns all accounts containing "mill" or "lane" in the address:
  740. [source,js]
  741. --------------------------------------------------
  742. GET /bank/_search
  743. {
  744. "query": {
  745. "bool": {
  746. "should": [
  747. { "match": { "address": "mill" } },
  748. { "match": { "address": "lane" } }
  749. ]
  750. }
  751. }
  752. }
  753. --------------------------------------------------
  754. // CONSOLE
  755. // TEST[continued]
In the above example, the `bool should` clause specifies a list of queries at least one of which must be true for a document to be considered a match.
  757. This example composes two `match` queries and returns all accounts that contain neither "mill" nor "lane" in the address:
  758. [source,js]
  759. --------------------------------------------------
  760. GET /bank/_search
  761. {
  762. "query": {
  763. "bool": {
  764. "must_not": [
  765. { "match": { "address": "mill" } },
  766. { "match": { "address": "lane" } }
  767. ]
  768. }
  769. }
  770. }
  771. --------------------------------------------------
  772. // CONSOLE
  773. // TEST[continued]
In the above example, the `bool must_not` clause specifies a list of queries none of which may be true for a document to be considered a match.
  775. We can combine `must`, `should`, and `must_not` clauses simultaneously inside a `bool` query. Furthermore, we can compose `bool` queries inside any of these `bool` clauses to mimic any complex multi-level boolean logic.
This example returns all accounts of anybody who is 40 years old but doesn't live in Idaho (`ID`):
  777. [source,js]
  778. --------------------------------------------------
  779. GET /bank/_search
  780. {
  781. "query": {
  782. "bool": {
  783. "must": [
  784. { "match": { "age": "40" } }
  785. ],
  786. "must_not": [
  787. { "match": { "state": "ID" } }
  788. ]
  789. }
  790. }
  791. }
  792. --------------------------------------------------
  793. // CONSOLE
  794. // TEST[continued]
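The way the `bool` clauses combine can be modelled with a few lines of Python. This sketch treats each clause as a true/false test on a document and ignores scoring entirely; `bool_matches` and `term` are hypothetical helpers, `term` uses exact equality rather than an analyzed `match`, and the three accounts are made up for illustration.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Evaluate a simplified bool query against one document.

    Each clause is a function that takes the document and returns True/False.
    """
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # When there are no `must` clauses, at least one `should` has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

def term(field, value):
    """Clause that is true when the field equals the value exactly."""
    return lambda doc: doc.get(field) == value

accounts = [
    {"account_number": 1, "age": 40, "state": "ID"},
    {"account_number": 2, "age": 40, "state": "TX"},
    {"account_number": 3, "age": 32, "state": "TX"},
]

# 40 years old, but not living in Idaho.
selected = [
    a["account_number"]
    for a in accounts
    if bool_matches(a, must=[term("age", 40)], must_not=[term("state", "ID")])
]
```

Only account 2 is selected: account 1 is excluded by `must_not` and account 3 fails the `must` clause.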
  795. [[getting-started-filters]]
  796. === Executing Filters
In the previous section, we skipped over a little detail called the document score (`_score` field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query that we specified. The higher the score, the more relevant the document; the lower the score, the less relevant the document.
  798. But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution in order not to compute useless scores.
  799. The {ref}/query-dsl-bool-query.html[`bool` query] that we introduced in the previous section also supports `filter` clauses which allow us to use a query to restrict the documents that will be matched by other clauses, without changing how scores are computed. As an example, let's introduce the {ref}/query-dsl-range-query.html[`range` query], which allows us to filter documents by a range of values. This is generally used for numeric or date filtering.
  800. This example uses a bool query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Dissecting the above, the bool query contains a `match_all` query (the query part) and a `range` query (the filter part). We can substitute any other queries into the query and the filter parts. In the above case, using the range query as a filter makes sense because documents falling into the range all match "equally", i.e., no document is more relevant than another.
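When the bounds come from user input, it can help to build the request body programmatically. A minimal Python sketch, with hypothetical helper names, that produces the same body as above and mirrors the inclusive `gte`/`lte` bounds locally:

```python
def balance_range_query(minimum, maximum):
    """Build a bool query that filters accounts by an inclusive balance range."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "range": {"balance": {"gte": minimum, "lte": maximum}}
                },
            }
        }
    }

def in_range(balance, minimum, maximum):
    """Local equivalent of the gte/lte bounds, useful for checking expectations."""
    return minimum <= balance <= maximum

body = balance_range_query(20000, 30000)
```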
  823. In addition to the `match_all`, `match`, `bool`, and `range` queries, there are a lot of other query types that are available and we won't go into them here. Since we already have a basic understanding of how they work, it shouldn't be too difficult to apply this knowledge in learning and experimenting with the other query types.
  824. [[getting-started-aggregations]]
  825. === Executing Aggregations
Aggregations provide the ability to group and extract statistics from your data. The easiest way to think about aggregations is to roughly equate them to the SQL GROUP BY clause and the SQL aggregate functions. In Elasticsearch, you can execute searches that return hits and, at the same time, return aggregated results separate from the hits, all in one response. This is powerful and efficient: you can run queries and multiple aggregations and get the results of both (or either) operations back in one shot, avoiding network round trips, using a concise and simplified API.
  827. To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default):
  828. [source,js]
  829. --------------------------------------------------
  830. GET /bank/_search
  831. {
  832. "size": 0,
  833. "aggs": {
  834. "group_by_state": {
  835. "terms": {
  836. "field": "state.keyword"
  837. }
  838. }
  839. }
  840. }
  841. --------------------------------------------------
  842. // CONSOLE
  843. // TEST[continued]
  844. In SQL, the above aggregation is similar in concept to:
[source,sql]
  846. --------------------------------------------------
  847. SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;
  848. --------------------------------------------------
  849. And the response (partially shown):
  850. [source,js]
  851. --------------------------------------------------
  852. {
  853. "took": 29,
  854. "timed_out": false,
  855. "_shards": {
  856. "total": 5,
  857. "successful": 5,
  858. "skipped" : 0,
  859. "failed": 0
  860. },
  861. "hits" : {
  862. "total" : {
  863. "value": 1000,
  864. "relation": "eq"
  865. },
  866. "max_score" : null,
  867. "hits" : [ ]
  868. },
  869. "aggregations" : {
  870. "group_by_state" : {
  871. "doc_count_error_upper_bound": 20,
  872. "sum_other_doc_count": 770,
  873. "buckets" : [ {
  874. "key" : "ID",
  875. "doc_count" : 27
  876. }, {
  877. "key" : "TX",
  878. "doc_count" : 27
  879. }, {
  880. "key" : "AL",
  881. "doc_count" : 25
  882. }, {
  883. "key" : "MD",
  884. "doc_count" : 25
  885. }, {
  886. "key" : "TN",
  887. "doc_count" : 23
  888. }, {
  889. "key" : "MA",
  890. "doc_count" : 21
  891. }, {
  892. "key" : "NC",
  893. "doc_count" : 21
  894. }, {
  895. "key" : "ND",
  896. "doc_count" : 21
  897. }, {
  898. "key" : "ME",
  899. "doc_count" : 20
  900. }, {
  901. "key" : "MO",
  902. "doc_count" : 20
  903. } ]
  904. }
  905. }
  906. }
  907. --------------------------------------------------
  908. // TESTRESPONSE[s/"took": 29/"took": $body.took/]
  909. We can see that there are 27 accounts in `ID` (Idaho), followed by 27 accounts
  910. in `TX` (Texas), followed by 25 accounts in `AL` (Alabama), and so forth.
  911. Note that we set `size=0` to not show search hits because we only want to see the aggregation results in the response.
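For intuition, the grouping performed by the `terms` aggregation can be reproduced locally in Python with `collections.Counter`. This is only a model of the result shape on a handful of made-up documents. Breaking ties by key is an assumption of this sketch, chosen because it is consistent with the bucket order in the response above.

```python
from collections import Counter

def terms_buckets(docs, field, size=10):
    """Count documents per field value and return the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest count first; ties are broken by key in ascending order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# Made-up documents for illustration.
accounts = [{"state": s} for s in ["TX", "ID", "TX", "AL", "ID", "MD"]]

buckets = terms_buckets(accounts, "state", size=3)
```

With `size=3`, the `MD` bucket is cut off, in the same way that states outside the top 10 are left out of the response above.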
  912. Building on the previous aggregation, this example calculates the average account balance by state (again only for the top 10 states sorted by count in descending order):
  913. [source,js]
  914. --------------------------------------------------
  915. GET /bank/_search
  916. {
  917. "size": 0,
  918. "aggs": {
  919. "group_by_state": {
  920. "terms": {
  921. "field": "state.keyword"
  922. },
  923. "aggs": {
  924. "average_balance": {
  925. "avg": {
  926. "field": "balance"
  927. }
  928. }
  929. }
  930. }
  931. }
  932. }
  933. --------------------------------------------------
  934. // CONSOLE
  935. // TEST[continued]
  936. Notice how we nested the `average_balance` aggregation inside the `group_by_state` aggregation. This is a common pattern for all the aggregations. You can nest aggregations inside aggregations arbitrarily to extract pivoted summarizations that you require from your data.
  937. Building on the previous aggregation, let's now sort on the average balance in descending order:
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
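The nested `avg` aggregation and the ordering by its value can be modelled in Python as a group-then-average step followed by a sort. The function name and the four sample accounts are assumptions for illustration only.

```python
from collections import defaultdict

def average_balance_by(docs, field):
    """Group documents by `field` and order the groups by average balance."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc[field]].append(doc["balance"])
    buckets = [
        {
            "key": key,
            "doc_count": len(values),
            "average_balance": sum(values) / len(values),
        }
        for key, values in balances.items()
    ]
    # Highest average first, like "order": { "average_balance": "desc" }.
    buckets.sort(key=lambda bucket: bucket["average_balance"], reverse=True)
    return buckets

# Made-up accounts for illustration.
accounts = [
    {"state": "TX", "balance": 1000},
    {"state": "TX", "balance": 3000},
    {"state": "ID", "balance": 5000},
    {"state": "AL", "balance": 1500},
]

buckets = average_balance_by(accounts, "state")
```

Note that the order now depends on the nested metric rather than on the document count, so `ID` comes first here even though `TX` has more accounts.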
  964. This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and then finally get the average account balance, per age bracket, per gender:
[source,js]
--------------------------------------------------
GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
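The three levels of this aggregation (age bracket, then gender, then average balance) can be sketched locally in Python. Each range includes its `from` value and excludes its `to` value, which is why the brackets cover ages 20-29, 30-39, and 40-49. The helper names and bracket labels are hypothetical; the first two sample accounts reuse values from the documents shown earlier and the other two are made up.

```python
def age_bracket(age, brackets=((20, 30), (30, 40), (40, 50))):
    """Return a label for the bracket containing `age`, or None if outside."""
    for lower, upper in brackets:
        if lower <= age < upper:  # lower bound inclusive, upper bound exclusive
            return f"{lower}-{upper}"
    return None

def average_balance_by_age_and_gender(docs):
    """Average balance per (age bracket, gender) pair."""
    totals = {}
    for doc in docs:
        bracket = age_bracket(doc["age"])
        if bracket is None:
            continue  # documents outside every range fall into no bucket
        key = (bracket, doc["gender"])
        count, total = totals.get(key, (0, 0))
        totals[key] = (count + 1, total + doc["balance"])
    return {key: total / count for key, (count, total) in totals.items()}

accounts = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 30, "gender": "M", "balance": 1000},   # made up
    {"age": 55, "gender": "F", "balance": 500},    # made up, outside all ranges
]

averages = average_balance_by_age_and_gender(accounts)
```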
There are many other aggregation capabilities that we won't cover in detail here. The {ref}/search-aggregations.html[aggregations reference guide] is a great starting point if you want to do further experimentation.
  1010. [[getting-started-conclusion]]
  1011. == Conclusion
  1012. Elasticsearch is both a simple and complex product. We've so far learned the basics of what it is, how to look inside of it, and how to work with it using some of the REST APIs. Hopefully this tutorial has given you a better understanding of what Elasticsearch is and more importantly, inspired you to further experiment with the rest of its great features!