[[repository-hdfs]]
=== Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using HDFS as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

[[repository-hdfs-install]]
[float]
==== Installation

This plugin can be installed through the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs
----------------------------------------------------------------

The plugin must be installed on _every_ node in the cluster, and each node must
be restarted after installation.
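
To verify the installation on a node, list the installed plugins; a minimal sketch, assuming the default directory layout (the restart command depends on how the node is managed and is only illustrative):

[source,sh]
----------------------------------------------------------------
sudo bin/plugin list                  # repository-hdfs should appear in the output
sudo systemctl restart elasticsearch  # illustrative - adjust to your service manager
----------------------------------------------------------------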

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed by specifying the _installed_ package:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
----------------------------------------------------------------

The node must be stopped before removing the plugin.
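
Putting the two steps in order, removal on a node might look like this (the stop command is an assumption; adjust to how the node is managed):

[source,sh]
----------------------------------------------------------------
sudo systemctl stop elasticsearch       # stop the node first
sudo bin/plugin remove repository-hdfs  # then remove the plugin
----------------------------------------------------------------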

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distro you are using is not protocol-compatible
with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions required).

Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons the required libraries need to be placed under the plugin folder.
Note that in most cases, if the distro is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files (see below).
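
As a sketch of such a library swap (all paths and jar names below are assumptions that depend on your distro and installation layout):

[source,sh]
----------------------------------------------------------------
# inspect the Hadoop client jars bundled with the plugin
ls plugins/repository-hdfs/
# replace them with the jars shipped by your distro
rm plugins/repository-hdfs/hadoop-*.jar
cp /path/to/distro/hadoop/client/lib/*.jar plugins/repository-hdfs/
----------------------------------------------------------------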

Windows Users::
Using Apache Hadoop on Windows is problematic and thus it is not recommended. For those _really_ wanting to use it, make sure you place the elusive `winutils.exe` under the
plugin folder and point the `HADOOP_HOME` variable to it; this should minimize the amount of permissions Hadoop requires (though one would still have to add some more).
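
For illustration only, such a setup might look like the following on the Windows command line (the paths are assumptions):

[source,bat]
----------------------------------------------------------------
:: copy winutils.exe into the plugin folder and point HADOOP_HOME at it
copy winutils.exe %ES_HOME%\plugins\repository-hdfs\
set HADOOP_HOME=%ES_HOME%\plugins\repository-hdfs
----------------------------------------------------------------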

[[repository-hdfs-config]]
==== Configuration Properties

Once installed, define the configuration for the `hdfs` repository through `elasticsearch.yml` or the
{ref}/modules-snapshots.html[REST API]:

[source,yaml]
----
repositories:
  hdfs:
    uri: "hdfs://<host>:<port>/"    # optional - Hadoop file-system URI
    path: "some/path"               # required - path within the file-system where data is stored/loaded
    load_defaults: "true"           # optional - whether to load the default Hadoop configuration (default) or not
    conf_location: "extra-cfg.xml"  # optional - Hadoop configuration XML to be loaded (use commas for multi values)
    conf.<key>: "<value>"           # optional - 'inlined' key=value added to the Hadoop configuration
    concurrent_streams: 5           # optional - the number of concurrent streams (defaults to 5)
    compress: "false"               # optional - whether to compress the metadata or not (default)
    chunk_size: "10mb"              # optional - chunk size (disabled by default)
----

NOTE: Be careful when including a path within the `uri` setting; some implementations ignore it completely while
others honor it. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
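
As a sketch, the same repository could be registered at runtime through the snapshot REST API; the repository name, namenode address, and HDFS path below are placeholders:

[source,sh]
----------------------------------------------------------------
curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository"
  }
}'
----------------------------------------------------------------

Once registered, a snapshot can be taken against it, for example with `curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository/snapshot_1?wait_for_completion=true'`.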