[[ingest-attachment]]
== Ingest Attachment Processor Plugin

The ingest attachment plugin lets Elasticsearch extract the contents of file attachments in common formats (such as PPT, XLS, and PDF)
using the Apache text extraction library http://lucene.apache.org/tika/[Tika].

It can be used as a replacement for the mapper attachment plugin.

The source field must contain base64-encoded binary data.
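
Because the source field has to hold base64-encoded data, a client must encode the raw file bytes before sending the document. The following is a minimal sketch of that step in Python. The helper name `build_document` and the in-memory sample payload are illustrative assumptions, not part of the plugin; the field name `data` matches the `source_field` used in the pipeline example in this section.

```python
import base64
import json


def build_document(raw_bytes: bytes, source_field: str = "data") -> str:
    """Return a JSON document body whose source field holds base64 text.

    `build_document` is a hypothetical helper used for illustration only.
    """
    # b64encode returns bytes; decode to ASCII so the value is JSON-serializable.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({source_field: encoded})


# A tiny in-memory payload stands in for the bytes of a real PDF or PPT file.
body = build_document(b"hello attachment")
print(body)
```

In practice the bytes would typically come from reading the attachment file in binary mode, and the resulting JSON body would be indexed through a pipeline that contains the attachment processor.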

[[ingest-attachment-options]]
.Attachment options
[options="header"]
|======
| Name | Required | Default | Description
| `source_field` | yes | - | The field that contains the base64-encoded binary to extract from
| `target_field` | no | attachment | The field that will hold the attachment information
| `indexed_chars` | no | 100000 | The maximum number of characters to extract, which prevents huge fields. Use `-1` for no limit.
| `fields` | no | all | The properties to store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "attachment" : {
        "source_field" : "data"
      }
    }
  ]
}
--------------------------------------------------
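
The pipeline definition above sets only the required `source_field`. As a rough sketch of how the optional settings from the options table could be combined, the Python snippet below builds the same structure and adds `indexed_chars` and `fields`. The helper `attachment_pipeline` and the description text are hypothetical, and passing `fields` as a JSON array of property names is an assumption; the table only lists the allowed values.

```python
import json


def attachment_pipeline(source_field, target_field=None, indexed_chars=None,
                        fields=None, description="Extract attachment information"):
    """Build a pipeline body with one attachment processor.

    Hypothetical helper: only options that are explicitly passed are emitted,
    so the documented defaults apply to everything else.
    """
    options = {"source_field": source_field}
    if target_field is not None:
        options["target_field"] = target_field
    if indexed_chars is not None:
        options["indexed_chars"] = indexed_chars
    if fields is not None:
        # Assumption: `fields` is given as an array of property names.
        options["fields"] = list(fields)
    return {"description": description, "processors": [{"attachment": options}]}


# No character limit (-1), and only the content and title properties are kept.
pipeline = attachment_pipeline("data", indexed_chars=-1, fields=["content", "title"])
print(json.dumps(pipeline, indent=2))
```

Leaving out the optional arguments reproduces the minimal definition, with `target_field` falling back to `attachment` and `indexed_chars` to `100000` as listed in the options table.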

NOTE: Extracting content from binary data is a resource-intensive operation.
It is highly recommended to run pipelines
using this processor on a dedicated ingest node.