[[repository-hdfs]]
=== Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using HDFS as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

[[repository-hdfs-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager. Run the one command that matches the flavor
you need (the flavors are described under _Getting started with HDFS_):

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs          # default flavor, bundles Hadoop 1.x dependencies
sudo bin/plugin install repository-hdfs-hadoop2  # bundles Hadoop 2.x (YARN) dependencies
sudo bin/plugin install repository-hdfs-lite     # plugin jar only, no Hadoop dependencies
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed with the command that matches the installed flavor:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
sudo bin/plugin remove repository-hdfs-hadoop2
sudo bin/plugin remove repository-hdfs-lite
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin comes in three _flavors_:

* Default / Hadoop 1.x::
The default version contains the plugin jar alongside Apache Hadoop 1.x (stable) dependencies.
* YARN / Hadoop 2.x::
The `hadoop2` version contains the plugin jar plus the Apache Hadoop 2.x (also known as YARN) dependencies.
* Lite::
The `lite` version contains just the plugin jar, without any Hadoop dependencies. The user should provide these (read below).

[[repository-hdfs-flavor]]
===== What version to use?

The choice depends on whether Hadoop is installed locally on the Elasticsearch nodes and, if it is not, whether your
distribution is compatible with the Apache Hadoop clients.

* Are you using Apache Hadoop (or a _compatible_ distro) and do not have it installed on the Elasticsearch nodes?::
+
If the answer is yes, use the default `repository-hdfs` for Apache Hadoop 1, or `repository-hdfs-hadoop2` for Apache Hadoop 2.
+
* Do you have Hadoop installed locally on the Elasticsearch nodes, or are you using a distro that is not compatible with the Apache Hadoop clients?::
+
Use the `lite` version and place your Hadoop _client_ jars and their dependencies in the plugin folder under `hadoop-libs`.
For large deployments, it is recommended to package the libraries in the plugin zip and deploy it manually across nodes,
which avoids having to set up the libraries on each node.
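
For the `lite` flavor, the copy step described above could be scripted roughly as follows. This is only a sketch:
the function name `copy_hadoop_libs` and both example paths are hypothetical, and the set of jars your Hadoop
client actually needs depends on your distribution.

[source,sh]
----
#!/bin/sh
# Sketch only: the paths in the example call are assumptions; adjust them to your layout.

# copy_hadoop_libs SRC_DIR PLUGIN_DIR
# Copies every *.jar found directly in SRC_DIR into PLUGIN_DIR/hadoop-libs.
copy_hadoop_libs() {
  src_dir=$1
  plugin_dir=$2
  mkdir -p "$plugin_dir/hadoop-libs" || return 1
  for jar in "$src_dir"/*.jar; do
    # Skip the unexpanded pattern when SRC_DIR contains no jars.
    [ -e "$jar" ] || continue
    cp "$jar" "$plugin_dir/hadoop-libs/" || return 1
  done
}

# Example call (hypothetical paths):
# copy_hadoop_libs /opt/hadoop/client-libs /usr/share/elasticsearch/plugins/repository-hdfs
----

Run it on every node, or run it once against an unpacked copy of the plugin before re-zipping it for distribution.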

[[repository-hdfs-security]]
==== Handling JVM Security and Permissions

Out of the box, Elasticsearch runs in a JVM with the security manager turned _on_ to make sure that unsafe or sensitive actions
are allowed only from trusted code. Hadoop, however, is not designed to run under a security manager: it does not wrap its
sensitive code, of which there is plenty, in privileged blocks.

The `repository-hdfs` plugin provides the permissions that the latest versions of Apache Hadoop 1.x and 2.x need to run
in a secured JVM, as one can tell from the number of permissions requested when installing the plugin.

However, using a Hadoop file-system other than DFS, a particular distro, or a particular operating system (Windows in particular)
might require additional permissions that the plugin does not provide.

In this case there are two workarounds:

* add the permission to `plugin-security.policy` (available in the plugin folder)
* disable the security manager through the `es.security.manager.enabled=false` configuration setting - NOT RECOMMENDED

If you find yourself in such a situation, please let us know which Hadoop distro and version and which OS you are using, and which
permission is missing, by raising an issue. Thank you!
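
For the first workaround, an entry added to `plugin-security.policy` follows the standard Java policy-file syntax.
The fragment below is illustrative only: the permission shown is a placeholder, not one the plugin is known to
lack. Replace it with the permission reported in the access-denied error you actually see.

[source]
----
grant {
  // Placeholder permission: substitute the one reported as denied.
  permission java.lang.RuntimePermission "getClassLoader";
};
----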

[[repository-hdfs-config]]
==== Configuration Properties

Once installed, define the configuration for the `hdfs` repository through `elasticsearch.yml` or the
{ref}/modules-snapshots.html[REST API]:

[source]
----
repositories:
  hdfs:
    uri: "hdfs://<host>:<port>/"    # optional - Hadoop file-system URI
    path: "some/path"               # required - path within the file-system where data is stored/loaded
    load_defaults: "true"           # optional - whether to load the default Hadoop configuration (defaults to true)
    conf_location: "extra-cfg.xml"  # optional - Hadoop configuration XML to be loaded (use commas for multi values)
    conf.<key> : "<value>"          # optional - 'inlined' key=value added to the Hadoop configuration
    concurrent_streams: 5           # optional - the number of concurrent streams (defaults to 5)
    compress: "false"               # optional - whether to compress the metadata (defaults to false)
    chunk_size: "10mb"              # optional - chunk size (disabled by default)
----

NOTE: Be careful when including a path within the `uri` setting; some implementations ignore it completely while
others take it into account. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
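
The same settings can be sent through the snapshot REST API instead of `elasticsearch.yml`. The sketch below
assumes a node reachable over HTTP and uses hypothetical helper names (`hdfs_repo_body`, `register_hdfs_repo`),
a hypothetical repository name and a hypothetical name node address. It does no JSON escaping, so it is only
suitable for values without quotes or backslashes.

[source,sh]
----
#!/bin/sh
# hdfs_repo_body URI PATH
# Prints the JSON body that registers a repository of type "hdfs".
# Note: values are inserted verbatim, without JSON escaping.
hdfs_repo_body() {
  printf '{ "type": "hdfs", "settings": { "uri": "%s", "path": "%s" } }\n' "$1" "$2"
}

# register_hdfs_repo ES_URL REPO_NAME URI PATH
# Sends the body to the snapshot API of a running node.
register_hdfs_repo() {
  curl -XPUT "$1/_snapshot/$2" \
       -H 'Content-Type: application/json' \
       -d "$(hdfs_repo_body "$3" "$4")"
}

# Example call (hypothetical host, repository name and name node):
# register_hdfs_repo http://localhost:9200 my_hdfs_repo hdfs://namenode:8020/ some/path
----

The other properties listed above go into the `settings` object in the same way.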

===== Plugging other file-systems

Any HDFS-compatible file-system (like Amazon `s3://` or Google `gs://`) can be used as long as the proper Hadoop
configuration is passed to the Elasticsearch plugin. In practice, this means making sure the correct Hadoop configuration
files (`core-site.xml` and `hdfs-site.xml`) and the file-system's jars are available on the plugin classpath, just as you would with any
other Hadoop client or job.

Otherwise, the plugin will only read the _default_, vanilla configuration of Hadoop and will not be able to recognize
the plugged-in file-system.
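
As a rough illustration, a repository definition for a plugged-in file-system might look like the fragment below.
It reuses only the properties listed under _Configuration Properties_; the URI is a placeholder, and how the
names given in `conf_location` are resolved (classpath or file system) is an assumption to verify against your
installation.

[source]
----
repositories:
  hdfs:
    uri: "<scheme>://<authority>/"                # placeholder, for example an s3:// or gs:// URI
    path: "some/path"                             # path within that file-system
    conf_location: "core-site.xml,hdfs-site.xml"  # Hadoop configuration files, comma-separated
----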