[[ingest-attachment]]
=== Ingest Attachment Processor Plugin

The ingest attachment plugin lets Elasticsearch extract the contents of file attachments in common formats (such as PPT, XLS, and PDF) by
using the Apache text extraction library http://lucene.apache.org/tika/[Tika].

You can use the ingest attachment plugin as a replacement for the mapper attachment plugin.

The source field must contain base64-encoded binary data.
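
As an illustration, a document passed to the processor might look like the following sketch. The field name `data` is an arbitrary choice that has to match the processor's `source_field`, and the value is the base64 encoding of the plain-text string `Hello world`, chosen only to keep the example short:

[source,js]
--------------------------------------------------
{
  "data" : "SGVsbG8gd29ybGQ="
}
--------------------------------------------------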

[[ingest-attachment-options]]
.Attachment options
[options="header"]
|======
| Name            | Required | Default      | Description
| `source_field`  | yes      | -            | The field to read the base64-encoded data from
| `target_field`  | no       | attachment   | The field that will hold the attachment information
| `indexed_chars` | no       | 100000       | The maximum number of characters to extract, which prevents huge fields. Use `-1` for no limit.
| `fields`        | no       | all          | The properties to extract and store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data"
      }
    }
  ]
}
--------------------------------------------------
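
The optional settings from the options table can be combined in a single processor definition. The following is a sketch rather than a recommended configuration: the `file_info` target name, the 20000-character limit, and the selected properties are arbitrary choices, and it assumes that `fields` accepts a JSON array of property names.

[source,js]
--------------------------------------------------
{
  "description" : "Illustrative only: option values are arbitrary and the array form of fields is an assumption",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data",
        "target_field" : "file_info",
        "indexed_chars" : 20000,
        "fields" : ["content", "title", "content_type"]
      }
    }
  ]
}
--------------------------------------------------

With a definition along these lines, the extracted information would be stored under `file_info` instead of the default `attachment` field, and extraction would stop after 20000 characters.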

NOTE: Extracting content from binary data is a resource-intensive operation.
It is highly recommended to run pipelines using this processor
on a dedicated ingest node.