[[repository-hdfs]]
=== Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using the HDFS file system as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

[[repository-hdfs-install]]
[float]
==== Installation

This plugin can be installed through the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs
----------------------------------------------------------------

The plugin must be installed on _every_ node in the cluster, and each node must
be restarted after installation.

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed by specifying the _installed_ package:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distribution you are using is not protocol
compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the required security permissions).

Even if Hadoop is already installed on the Elasticsearch nodes, the required libraries must be placed under the plugin folder for security reasons.
In most cases, if the distribution is compatible, you only need to configure the repository with the appropriate Hadoop configuration files (see below).

Windows Users::
Using Apache Hadoop on Windows is problematic and is therefore not recommended. For those _really_ wanting to use it, make sure you place the elusive `winutils.exe` under the
plugin folder and point the `HADOOP_HOME` variable to it; this should minimize the number of permissions Hadoop requires (though you would still have to add some more).

[[repository-hdfs-config]]
==== Configuration Properties

Once installed, define the configuration for the `hdfs` repository through `elasticsearch.yml` or the
{ref}/modules-snapshots.html[REST API]:

[source,yaml]
----
repositories:
  hdfs:
    uri: "hdfs://<host>:<port>/"    # optional - Hadoop file-system URI
    path: "some/path"               # required - path within the file-system where data is stored/loaded
    load_defaults: "true"           # optional - whether to load the default Hadoop configuration (default) or not
    conf_location: "extra-cfg.xml"  # optional - Hadoop configuration XML to be loaded (use commas for multiple values)
    conf.<key> : "<value>"          # optional - 'inlined' key=value added to the Hadoop configuration
    concurrent_streams: 5           # optional - the number of concurrent streams (defaults to 5)
    compress: "false"               # optional - whether to compress the metadata or not (default)
    chunk_size: "10mb"              # optional - chunk size (disabled by default)
----
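
The same settings can, in principle, be sent through the snapshot REST API instead of `elasticsearch.yml`. The following is a minimal sketch, not a definitive recipe: the node address `localhost:9200`, the repository name `my_hdfs_repository`, and the helper function `register_hdfs_repository` are assumptions chosen for illustration, and the `<host>:<port>` placeholder must be replaced with your own name node address.

[source,sh]
----
# Assumed values (not part of the plugin): adjust to your own cluster.
ES_URL="http://localhost:9200"
REPO_NAME="my_hdfs_repository"

# Repository definition as a JSON request body, using the `uri` and `path`
# settings from the YAML example. <host>:<port> is a placeholder.
REPO_BODY='{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://<host>:<port>/",
    "path": "some/path"
  }
}'

# Hypothetical helper: registers the repository under $REPO_NAME.
register_hdfs_repository() {
  curl -XPUT "$ES_URL/_snapshot/$REPO_NAME" \
       -H 'Content-Type: application/json' \
       -d "$REPO_BODY"
}
----

Calling `register_hdfs_repository` against a running cluster should register the repository, and `curl -XGET "$ES_URL/_snapshot/$REPO_NAME"` can then be used to read the stored definition back.
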
NOTE: Be careful when including a path within the `uri` setting; some implementations ignore it completely while
others take it into account. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
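
To illustrate the recommendation above, the sketch below splits a full HDFS URL into a minimal `uri` (scheme, host, and port only) and a separate `path`. The function name `split_hdfs_url` and the address `namenode:8020` are hypothetical and used only for this example; the plugin itself does not ship such a helper.

[source,sh]
----
# Hypothetical helper: split a full HDFS URL into a minimal `uri` and a `path`.
split_hdfs_url() {
  url="$1"
  rest="${url#hdfs://}"        # drop the scheme: host:port/some/path
  authority="${rest%%/*}"      # text before the first slash: host:port
  path="${rest#"$authority"}"  # remainder after host:port: /some/path
  path="${path#/}"             # drop the leading slash: some/path
  printf 'uri: "hdfs://%s/"\npath: "%s"\n' "$authority" "$path"
}

# Example with an assumed name node address.
split_hdfs_url "hdfs://namenode:8020/user/elasticsearch/snapshots"
----

The two printed lines follow the layout of the `uri` and `path` entries in the YAML example, so the directory portion ends up in `path` rather than in `uri`.
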