
  1. [#es-connectors-s3]
  2. === Elastic S3 connector reference
  3. ++++
  4. <titleabbrev>S3</titleabbrev>
  5. ++++
  6. // Attributes used in this file:
  7. :service-name: Amazon S3
  8. :service-name-stub: s3
  9. The _Elastic S3 connector_ is a <<es-connectors,connector>> for https://aws.amazon.com/s3/[Amazon S3^] data sources.
  10. // //////// //// //// //// //// //// //// ////////
  11. // //////// NATIVE CONNECTOR REFERENCE (MANAGED SERVICE) ///////
  12. // //////// //// //// //// //// //// //// ////////
  13. [discrete#es-connectors-s3-native-connector-reference]
  14. ==== *Elastic managed connector reference*
  15. .View *Elastic managed connector* reference
  16. [%collapsible]
  17. ===============
  18. [discrete#es-connectors-s3-prerequisites]
  19. ===== Availability and prerequisites
  20. This connector is available natively in Elastic Cloud as of version *8.12.0*.
  21. To use this connector, satisfy all <<es-native-connectors, managed connector requirements>>.
  22. [discrete#es-connectors-s3-create-native-connector]
  23. ===== Create a {service-name} connector
  24. include::_connectors-create-native.asciidoc[]
  25. [discrete#es-connectors-s3-usage]
  26. ===== Usage
  27. To use this managed connector, see <<es-native-connectors>>.
  28. For additional operations, see <<es-connectors-usage>>.
  29. S3 users will also need to <<es-connectors-s3-usage-create-iam, Create an IAM identity>>.
  30. [discrete#es-connectors-s3-usage-create-iam]
  31. ====== Create an IAM identity
  32. Users need to create an IAM identity to use this connector.
  33. Refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-set-up.html[the AWS documentation^].
  34. The https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html[policy^] associated with the IAM identity must have the following *AWS permissions*:
  35. * `ListAllMyBuckets`
  36. * `ListBucket`
  37. * `GetBucketLocation`
  38. * `GetObject`
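As a rough illustration, a policy granting these permissions could look like the following sketch. It is not taken from the connector's source: the bucket name `my-bucket` and the helper `build_policy` are placeholders, and scoping each action to the narrowest resource is only one possible design choice, assuming the standard S3 ARN formats.

```python
import json

# Placeholder bucket name (an assumption for illustration only).
BUCKET = "my-bucket"


def build_policy(bucket: str) -> dict:
    """Build an IAM policy document granting the four permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing all buckets is an account-level action.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = build_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

If the connector reads from several buckets, the bucket-level and object-level statements would need to cover each of them.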
  39. [discrete#es-connectors-s3-compatibility]
  40. ===== Compatibility
  41. Currently the connector does not support S3-compatible vendors.
  42. [discrete#es-connectors-s3-configuration]
  43. ===== Configuration
  44. The following configuration fields are required to *set up* the connector:
  45. AWS Buckets::
  46. List of S3 bucket names.
  47. `*` will fetch data from all buckets.
  48. Examples:
  49. +
  50. * `testbucket, prodbucket`
  51. * `testbucket`
  52. * `*`
  53. [NOTE]
  54. ====
  55. This field is ignored when using advanced sync rules.
  56. ====
  57. AWS Access Key ID::
  58. Access Key ID for the AWS identity that will be used for bucket access.
  59. AWS Secret Key::
  60. Secret Access Key for the AWS identity that will be used for bucket access.
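To make the bucket list format concrete, the sketch below shows one way a comma-separated value such as `testbucket, prodbucket` could be normalized. The helper `parse_buckets` is hypothetical and is not the connector's own parsing code. Treating any `*` entry as "all buckets" is an assumption based on the examples above.

```python
def parse_buckets(raw: str) -> list[str]:
    """Split a comma-separated bucket field into a clean list of names."""
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Assumption: any "*" entry means "all buckets", so other names are dropped.
    if "*" in names:
        return ["*"]
    return names


print(parse_buckets("testbucket, prodbucket"))
print(parse_buckets("testbucket"))
print(parse_buckets("*"))
```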
  61. [discrete#es-connectors-s3-documents-syncs]
  62. ===== Documents and syncs
  63. [NOTE]
  64. ====
  65. * Content from files bigger than 10 MB won't be extracted. (Self-managed connectors can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.)
  66. * Permissions are not synced.
  67. **All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic deployment.
  68. ====
  69. [discrete#es-connectors-s3-sync-rules]
  70. ===== Sync rules
  71. <<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
  72. [discrete#es-connectors-s3-sync-rules-advanced]
  73. ====== Advanced sync rules
  74. [NOTE]
  75. ====
  76. A <<es-connectors-sync-types-full, full sync>> is required for advanced sync rules to take effect.
  77. ====
  78. Advanced sync rules are defined through a source-specific DSL JSON snippet.
  79. Use advanced sync rules to filter data to be fetched from Amazon S3 buckets.
  80. They take the following parameters:
  81. 1. `bucket`: S3 bucket the rule applies to.
  82. 2. `extension` (optional): Lists which file types to sync. Defaults to syncing all types.
  83. 3. `prefix` (optional): String of prefix characters.
  84. The connector will fetch file and folder data that matches the string.
  85. Defaults to `""` (syncs all bucket objects).
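The way `prefix` and `extension` combine can be sketched as a simple predicate over object keys. This is an illustration only: `matches_rule` is a hypothetical helper, and the connector's actual matching (for example, case sensitivity or how it queries S3) may differ.

```python
def matches_rule(key: str, rule: dict) -> bool:
    """Return True if an object key satisfies one advanced sync rule."""
    prefix = rule.get("prefix", "")      # default "" matches every key
    extensions = rule.get("extension")   # None means all file types
    if not key.startswith(prefix):
        return False
    if extensions is None:
        return True
    # str.endswith accepts a tuple of candidate suffixes.
    return key.endswith(tuple(extensions))


rule = {"bucket": "bucket2", "prefix": "abc", "extension": [".txt", ".png"]}
keys = ["abc/report.txt", "abc/data.csv", "abcdef.png", "xyz/notes.txt"]
selected = [k for k in keys if matches_rule(k, rule)]
print(selected)
```

In this sketch, a rule without `extension` accepts every key under the prefix, and a rule without `prefix` accepts every key in the bucket.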
  86. [discrete#es-connectors-s3-sync-rules-advanced-examples]
  87. *Advanced sync rules examples*
  88. *Fetching files and folders recursively by prefix*
  89. *Example*: Fetch files/folders in `folder1/docs`.
  90. [source,js]
  91. ----
  92. [
  93. {
  94. "bucket": "bucket1",
  95. "prefix": "folder1/docs"
  96. }
  97. ]
  98. ----
  99. // NOTCONSOLE
  100. *Example*: Fetch files/folders starting with `folder1`.
  101. [source,js]
  102. ----
  103. [
  104. {
  105. "bucket": "bucket2",
  106. "prefix": "folder1"
  107. }
  108. ]
  109. ----
  110. // NOTCONSOLE
  111. *Fetching files and folders by specifying extensions*
  112. *Example*: Fetch all objects whose keys start with `abc`, then keep only those with the listed file extensions.
  113. [source,js]
  114. ----
  115. [
  116. {
  117. "bucket": "bucket2",
  118. "prefix": "abc",
  119. "extension": [".txt", ".png"]
  120. }
  121. ]
  122. ----
  123. // NOTCONSOLE
  124. [discrete#es-connectors-s3-content-extraction]
  125. ===== Content extraction
  126. See <<es-connectors-content-extraction>>.
  127. [discrete#es-connectors-s3-known-issues]
  128. ===== Known issues
  129. There are no known issues for this connector.
  130. See <<es-connectors-known-issues>> for any issues affecting all connectors.
  131. [discrete#es-connectors-s3-troubleshooting]
  132. ===== Troubleshooting
  133. See <<es-connectors-troubleshooting>>.
  134. [discrete#es-connectors-s3-security]
  135. ===== Security
  136. See <<es-connectors-security>>.
  137. [discrete#es-connectors-s3-source]
  138. ===== Framework and source
  139. This connector is built with the {connectors-python}[Elastic connector framework^].
  140. View the {connectors-python}/connectors/sources/s3.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
  141. // Closing the collapsible section
  142. ===============
  143. // //////// //// //// //// //// //// //// ////////
  144. // //////// CONNECTOR CLIENT REFERENCE (SELF-MANAGED) ///////
  145. // //////// //// //// //// //// //// //// ////////
  146. [discrete#es-connectors-s3-connector-client-reference]
  147. ==== *Self-managed connector reference*
  148. .View *self-managed connector* reference
  149. [%collapsible]
  150. ===============
  151. [discrete#es-connectors-s3-client-prerequisites]
  152. ===== Availability and prerequisites
  153. This connector is available as a *self-managed connector*.
  154. This self-managed connector is compatible with Elastic versions *8.6.0+*.
  155. To use this connector, satisfy all <<es-build-connector,self-managed connector requirements>>.
  156. [discrete#es-connectors-s3-create-connector-client]
  157. ===== Create a {service-name} connector
  158. include::_connectors-create-client.asciidoc[]
  159. [discrete#es-connectors-s3-client-usage]
  160. ===== Usage
  161. To use this connector as a *self-managed connector*, see <<es-build-connector>>.
  162. For additional operations, see <<es-connectors-usage>>.
  163. S3 users will also need to <<es-connectors-s3-client-usage-create-iam, Create an IAM identity>>.
  164. [discrete#es-connectors-s3-client-usage-create-iam]
  165. ====== Create an IAM identity
  166. Users need to create an IAM identity to use this connector as a *self-managed connector*.
  167. Refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-set-up.html[the AWS documentation^].
  168. The https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html[policy^] associated with the IAM identity must have the following *AWS permissions*:
  169. * `ListAllMyBuckets`
  170. * `ListBucket`
  171. * `GetBucketLocation`
  172. * `GetObject`
  173. [discrete#es-connectors-s3-client-compatibility]
  174. ===== Compatibility
  175. Currently the connector does not support S3-compatible vendors.
  176. [discrete#es-connectors-s3-client-configuration]
  177. ===== Configuration
  178. [TIP]
  179. ====
  180. When using the <<es-build-connector, self-managed connector>> workflow, these fields will use the default configuration set in the https://github.com/elastic/connectors/blob/a5976d20cd8277ae46511f7176662afc889e56ec/connectors/sources/s3.py#L231-L258[connector source code^].
  181. These configurable fields will be rendered with their respective *labels* in the Kibana UI.
  182. Once connected, you'll be able to update these values in Kibana.
  183. ====
  184. The following configuration fields are required to *set up* the connector:
  185. `buckets`::
  186. List of S3 bucket names.
  187. `*` will fetch data from all buckets.
  188. Examples:
  189. +
  190. * `testbucket, prodbucket`
  191. * `testbucket`
  192. * `*`
  193. [NOTE]
  194. ====
  195. This field is ignored when using advanced sync rules.
  196. ====
  197. `aws_access_key_id`::
  198. Access Key ID for the AWS identity that will be used for bucket access.
  199. `aws_secret_access_key`::
  200. Secret Access Key for the AWS identity that will be used for bucket access.
  201. `read_timeout`::
  202. Read timeout for requests to Amazon S3.
  203. Default value is `90`.
  204. `connect_timeout`::
  205. Connection timeout for crawling S3.
  206. Default value is `90`.
  207. `max_attempts`::
  208. Maximum retry attempts.
  209. Default value is `5`.
  210. `page_size`::
  211. Page size for iterating bucket objects in Amazon S3.
  212. Default value is `100`.
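The fields and defaults above can be summarized in a small sketch. The class `S3ConnectorSettings` is hypothetical and does not exist in the connector framework. The credential strings are placeholders, and the rule that numeric fields must be positive is an assumption added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3ConnectorSettings:
    """Hypothetical container mirroring the fields documented above."""

    buckets: list[str]
    aws_access_key_id: str
    aws_secret_access_key: str
    read_timeout: int = 90
    connect_timeout: int = 90
    max_attempts: int = 5
    page_size: int = 100

    def __post_init__(self) -> None:
        # Assumption: timeouts, retries, and page size only make sense when positive.
        for name in ("read_timeout", "connect_timeout", "max_attempts", "page_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")


settings = S3ConnectorSettings(
    buckets=["testbucket"],
    aws_access_key_id="EXAMPLE_KEY_ID",          # placeholder, not a real key
    aws_secret_access_key="EXAMPLE_SECRET_KEY",  # placeholder, not a real key
)
print(settings.read_timeout, settings.connect_timeout, settings.max_attempts, settings.page_size)
```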
  213. [discrete#es-connectors-s3-client-docker]
  214. ===== Deployment using Docker
  215. include::_connectors-docker-instructions.asciidoc[]
  216. [discrete#es-connectors-s3-client-documents-syncs]
  217. ===== Documents and syncs
  218. [NOTE]
  219. ====
  220. * Content from files bigger than 10 MB won't be extracted by default. You can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.
  221. * Permissions are not synced.
  222. **All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic deployment.
  223. ====
  224. [discrete#es-connectors-s3-client-sync-rules]
  225. ===== Sync rules
  226. <<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
  227. [discrete#es-connectors-s3-client-sync-rules-advanced]
  228. ====== Advanced sync rules
  229. [NOTE]
  230. ====
  231. A <<es-connectors-sync-types-full, full sync>> is required for advanced sync rules to take effect.
  232. ====
  233. Advanced sync rules are defined through a source-specific DSL JSON snippet.
  234. Use advanced sync rules to filter data to be fetched from Amazon S3 buckets.
  235. They take the following parameters:
  236. 1. `bucket`: S3 bucket the rule applies to.
  237. 2. `extension` (optional): Lists which file types to sync. Defaults to syncing all types.
  238. 3. `prefix` (optional): String of prefix characters.
  239. The connector will fetch file and folder data that matches the string.
  240. Defaults to `""` (syncs all bucket objects).
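Before pasting a rule snippet into Kibana, it can help to check its shape locally. The sketch below is one way to do that, assuming only the three parameters described above. `validate_rules` is a hypothetical helper and not part of the connector framework, which applies its own validation.

```python
import json


def validate_rules(raw: str) -> list[dict]:
    """Parse a rule snippet and check it against the parameters described above."""
    rules = json.loads(raw)
    if not isinstance(rules, list):
        raise ValueError("rules must be a JSON array")
    for rule in rules:
        if not isinstance(rule, dict):
            raise ValueError("each rule must be a JSON object")
        # `bucket` is the only required parameter.
        if not isinstance(rule.get("bucket"), str) or not rule["bucket"]:
            raise ValueError("each rule needs a non-empty 'bucket' string")
        if not isinstance(rule.get("prefix", ""), str):
            raise ValueError("'prefix' must be a string")
        extensions = rule.get("extension", [])
        if not isinstance(extensions, list) or not all(
            isinstance(ext, str) for ext in extensions
        ):
            raise ValueError("'extension' must be a list of strings")
    return rules


snippet = '[{"bucket": "bucket1", "prefix": "folder1/docs"}]'
print(validate_rules(snippet))
```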
  241. [discrete#es-connectors-s3-client-sync-rules-advanced-examples]
  242. *Advanced sync rules examples*
  243. *Fetching files and folders recursively by prefix*
  244. *Example*: Fetch files/folders in `folder1/docs`.
  245. [source,js]
  246. ----
  247. [
  248. {
  249. "bucket": "bucket1",
  250. "prefix": "folder1/docs"
  251. }
  252. ]
  253. ----
  254. // NOTCONSOLE
  255. *Example*: Fetch files/folders starting with `folder1`.
  256. [source,js]
  257. ----
  258. [
  259. {
  260. "bucket": "bucket2",
  261. "prefix": "folder1"
  262. }
  263. ]
  264. ----
  265. // NOTCONSOLE
  266. *Fetching files and folders by specifying extensions*
  267. *Example*: Fetch all objects whose keys start with `abc`, then keep only those with the listed file extensions.
  268. [source,js]
  269. ----
  270. [
  271. {
  272. "bucket": "bucket2",
  273. "prefix": "abc",
  274. "extension": [".txt", ".png"]
  275. }
  276. ]
  277. ----
  278. // NOTCONSOLE
  279. [discrete#es-connectors-s3-client-content-extraction]
  280. ===== Content extraction
  281. See <<es-connectors-content-extraction>>.
  282. [discrete#es-connectors-s3-client-testing]
  283. ===== End-to-end testing
  284. The connector framework enables operators to run functional tests against a real data source.
  285. Refer to <<es-build-connector-testing>> for more details.
  286. To execute a functional test for the Amazon S3 *self-managed connector*, run the following command:
  287. [source,shell]
  288. ----
  289. make ftest NAME=s3
  290. ----
  291. By default, this will use a medium-sized dataset.
  292. To make the test faster, add the `DATA_SIZE=small` argument:
  293. [source,shell]
  294. ----
  295. make ftest NAME=s3 DATA_SIZE=small
  296. ----
  297. [discrete#es-connectors-s3-client-known-issues]
  298. ===== Known issues
  299. There are no known issues for this connector.
  300. See <<es-connectors-known-issues>> for any issues affecting all connectors.
  301. [discrete#es-connectors-s3-client-troubleshooting]
  302. ===== Troubleshooting
  303. See <<es-connectors-troubleshooting>>.
  304. [discrete#es-connectors-s3-client-security]
  305. ===== Security
  306. See <<es-connectors-security>>.
  307. [discrete#es-connectors-s3-client-source]
  308. ===== Framework and source
  309. This connector is built with the {connectors-python}[Elastic connector framework^].
  310. View the {connectors-python}/connectors/sources/s3.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
  311. // Closing the collapsible section
  312. ===============