@@ -22,7 +22,7 @@ how {es} works. If you're already familiar with {es} and want to see how it work
with the rest of the stack, you might want to jump to the
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
-{beats}, and {ls}.
+{beats}, and {ls}.

TIP: The fastest way to get started with {es} is to
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
@@ -135,8 +135,8 @@ Windows:
The additional nodes are assigned unique IDs. Because you're running all three
nodes locally, they automatically join the cluster with the first node.

-. Use the `cat health` API to verify that your three-node cluster is up running.
-The `cat` APIs return information about your cluster and indices in a
+. Use the cat health API to verify that your three-node cluster is up and running.
+The cat APIs return information about your cluster and indices in a
format that's easier to read than raw JSON.
+
You can interact directly with your cluster by submitting HTTP requests to
@@ -155,8 +155,8 @@ GET /_cat/health?v
--------------------------------------------------
// CONSOLE
+
-The response should indicate that the status of the _elasticsearch_ cluster
-is _green_ and it has three nodes:
+The response should indicate that the status of the `elasticsearch` cluster
+is `green` and it has three nodes:
+
[source,txt]
--------------------------------------------------
@@ -191,8 +191,8 @@ Once you have a cluster up and running, you're ready to index some data.
There are a variety of ingest options for {es}, but in the end they all
do the same thing: put JSON documents into an {es} index.

-You can do this directly with a simple POST request that identifies
-the index you want to add the document to and specifies one or more
+You can do this directly with a simple PUT request that specifies
+the index you want to add the document to, a unique document ID, and one or more
`"field": "value"` pairs in the request body:

[source,js]
@@ -204,9 +204,9 @@ PUT /customer/_doc/1
--------------------------------------------------
// CONSOLE

-This request automatically creates the _customer_ index if it doesn't already
+This request automatically creates the `customer` index if it doesn't already
exist, adds a new document that has an ID of `1`, and stores and
-indexes the _name_ field.
+indexes the `name` field.

Since this is a new document, the response shows that the result of the
operation was that version 1 of the document was created:
@@ -264,46 +264,22 @@ and shows the original source fields that were indexed.
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ ]
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]

-
[float]
[[getting-started-batch-processing]]
-=== Batch processing
-
-In addition to being able to index, update, and delete individual documents, Elasticsearch also provides the ability to perform any of the above operations in batches using the {ref}/docs-bulk.html[`_bulk` API]. This functionality is important in that it provides a very efficient mechanism to do multiple operations as fast as possible with as few network roundtrips as possible.
-
-As a quick example, the following call indexes two documents (ID 1 - John Doe and ID 2 - Jane Doe) in one bulk operation:
-
-[source,js]
---------------------------------------------------
-POST /customer/_bulk?pretty
-{"index":{"_id":"1"}}
-{"name": "John Doe" }
-{"index":{"_id":"2"}}
-{"name": "Jane Doe" }
---------------------------------------------------
-// CONSOLE
-
-This example updates the first document (ID of 1) and then deletes the second document (ID of 2) in one bulk operation:
-
-[source,sh]
---------------------------------------------------
-POST /customer/_bulk
-{"update":{"_id":"1"}}
-{"doc": { "name": "John Doe becomes Jane Doe" } }
-{"delete":{"_id":"2"}}
---------------------------------------------------
-// CONSOLE
-// TEST[continued]
+=== Indexing documents in bulk

-Note above that for the delete action, there is no corresponding source document after it since deletes only require the ID of the document to be deleted.
+If you have a lot of documents to index, you can submit them in batches with
+the {ref}/docs-bulk.html[bulk API]. Batching document operations this way
+is significantly faster than submitting requests individually because it minimizes network roundtrips.
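+
+For example, a minimal bulk request that indexes two documents into the
+`customer` index in a single call might look like this. Each action line is
+immediately followed by the source of the document it applies to:
+
+[source,js]
+--------------------------------------------------
+POST /customer/_bulk
+{"index":{"_id":"1"}}
+{"name": "John Doe" }
+{"index":{"_id":"2"}}
+{"name": "Jane Doe" }
+--------------------------------------------------
+// CONSOLE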

-The Bulk API does not fail due to failures in one of the actions. If a single action fails for whatever reason, it will continue to process the remainder of the actions after it. When the bulk API returns, it will provide a status for each action (in the same order it was sent in) so that you can check if a specific action failed or not.
+The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents
+and a total payload between 5MB and 15MB. From there, you can experiment
+to find the sweet spot.
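+
+A bulk request doesn't fail as a whole when individual actions fail. The
+response reports a status for each action, in the order the actions were
+submitted, so you can check whether any specific action failed. A trimmed,
+illustrative response might look something like this (field values will vary):
+
+[source,js]
+--------------------------------------------------
+{
+  "took": 30,
+  "errors": false,
+  "items": [
+    {
+      "index": {
+        "_index": "customer",
+        "_id": "1",
+        "status": 201
+      }
+    }
+  ]
+}
+--------------------------------------------------
+// NOTCONSOLE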

-[float]
-=== Sample dataset
-
-Now that we've gotten a glimpse of the basics, let's try to work on a more realistic dataset. I've prepared a sample of fictitious JSON documents of customer bank account information. Each document has the following schema:
+To get some data into {es} that you can start searching and analyzing:

+. Download the https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[`accounts.json`] sample data set. The documents in this randomly-generated data set represent user accounts with the following information:
++
[source,js]
--------------------------------------------------
{
@@ -322,21 +298,19 @@ Now that we've gotten a glimpse of the basics, let's try to work on a more reali
--------------------------------------------------
// NOTCONSOLE

-For the curious, this data was generated using http://www.json-generator.com/[`www.json-generator.com/`], so please ignore the actual values and semantics of the data as these are all randomly generated.
-
-You can download the sample dataset (accounts.json) from https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[here]. Extract it to our current directory and let's load it into our cluster as follows:
-
+. Index the account data into the `bank` index with the following `_bulk` request:
++
[source,sh]
--------------------------------------------------
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
--------------------------------------------------
// NOTCONSOLE
-
++
////
This replicates the above in a document-testing friendly way but isn't visible
in the docs:
-
++
[source,js]
--------------------------------------------------
GET /_cat/indices?v
@@ -344,9 +318,9 @@ GET /_cat/indices?v
// CONSOLE
// TEST[setup:bank]
////
-
-And the response:
-
++
+The response indicates that 1,000 documents were indexed successfully.
++
[source,txt]
--------------------------------------------------
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
|
|
|
// TESTRESPONSE[s/128.6kb/\\d+(\\.\\d+)?[mk]?b/]
|
|
|
// TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ non_json]
|
|
|
|
|
|
-Which means that we just successfully bulk indexed 1000 documents into the bank index.
|
|
|
-
|
|
|
[[getting-started-search]]
|
|
|
== Start searching
|
|
|
|