lqb
/
elasticsearch
mirror of https://gitee.com/mirrors/elasticsearch.git


			
				
					
						
						
							1234567891011121314151617181920212223242526272829303132333435363738
							[[ingest-attachment]]
=== Ingest Attachment Processor Plugin

The ingest attachment plugin lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by
using the Apache text extraction library http://lucene.apache.org/tika/[Tika].

You can use the ingest attachment plugin as a replacement for the mapper attachment plugin.

The source field must be a base64 encoded binary.

[[ingest-attachment-options]]
.Attachment options
[options="header"]
|======
| Name                   | Required  | Default          | Description
| `source_field`         | yes       | -                | The field to get the base64 encoded field from
| `target_field`         | no        | attachment       | The field that will hold the attachment information
| `indexed_chars`        | no        | 100000           | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit.
| `fields`               | no        | all              | Properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data"
      }
    }
  ]
}
--------------------------------------------------

NOTE: Extracting contents from binary data is a resource intensive operation and
      consumes a lot of resources. It is highly recommended to run pipelines
      using this processor in a dedicated ingest node.