|
@@ -16,6 +16,13 @@ suitable to be ingested into {es}.
|
|
|
|
|
|
`POST _ml/find_file_structure`
|
|
|
|
|
|
+[[ml-find-file-structure-prereqs]]
|
|
|
+==== {api-prereq-title}
|
|
|
+
|
|
|
+* If the {es} {security-features} are enabled, you must have `monitor_ml` or
|
|
|
+`monitor` cluster privileges to use this API. See
|
|
|
+{stack-ov}/security-privileges.html[Security privileges].
|
|
|
+
|
|
|
[[ml-find-file-structure-desc]]
|
|
|
==== {api-description-title}
|
|
|
|
|
@@ -51,36 +58,36 @@ chosen.
|
|
|
[[ml-find-file-structure-query-parms]]
|
|
|
==== {api-query-parms-title}
|
|
|
|
|
|
-`charset`::
|
|
|
+`charset` (Optional)::
|
|
|
(string) The file's character set. It must be a character set that is supported
|
|
|
by the JVM that {es} uses. For example, `UTF-8`, `UTF-16LE`, `windows-1252`, or
|
|
|
`EUC-JP`. If this parameter is not specified, the structure finder chooses an
|
|
|
appropriate character set.
|
|
|
|
|
|
-`column_names`::
|
|
|
+`column_names` (Optional)::
|
|
|
(string) If you have set `format` to `delimited`, you can specify the column names
|
|
|
in a comma-separated list. If this parameter is not specified, the structure
|
|
|
finder uses the column names from the header row of the file. If the file does
|
|
|
not have a header row, columns are named "column1", "column2", "column3", etc.
|
|
|
|
|
|
-`delimiter`::
|
|
|
+`delimiter` (Optional)::
|
|
|
(string) If you have set `format` to `delimited`, you can specify the character used
|
|
|
to delimit the values in each row. Only a single character is supported; the
|
|
|
delimiter cannot have multiple characters. If this parameter is not specified,
|
|
|
the structure finder considers the following possibilities: comma, tab,
|
|
|
semicolon, and pipe (`|`).
|
|
|
|
|
|
-`explain`::
|
|
|
+`explain` (Optional)::
|
|
|
(boolean) If this parameter is set to `true`, the response includes a field
|
|
|
named `explanation`, which is an array of strings that indicate how the
|
|
|
structure finder produced its result. The default value is `false`.
|
|
|
|
|
|
-`format`::
|
|
|
+`format` (Optional)::
|
|
|
(string) The high-level structure of the file. Valid values are `ndjson`, `xml`,
|
|
|
`delimited`, and `semi_structured_text`. If this parameter is not specified,
|
|
|
the structure finder chooses one.
|
|
|
|
|
|
-`grok_pattern`::
|
|
|
+`grok_pattern` (Optional)::
|
|
|
(string) If you have set `format` to `semi_structured_text`, you can specify a Grok
|
|
|
pattern that is used to extract fields from every message in the file. The
|
|
|
name of the timestamp field in the Grok pattern must match what is specified
|
|
@@ -88,20 +95,20 @@ chosen.
|
|
|
name of the timestamp field in the Grok pattern must match "timestamp". If
|
|
|
`grok_pattern` is not specified, the structure finder creates a Grok pattern.
|
|
|
|
|
|
-`has_header_row`::
|
|
|
+`has_header_row` (Optional)::
|
|
|
(boolean) If you have set `format` to `delimited`, you can use this parameter to
|
|
|
indicate whether the column names are in the first row of the file. If this
|
|
|
parameter is not specified, the structure finder guesses based on the similarity of
|
|
|
the first row of the file to other rows.
|
|
|
|
|
|
-`line_merge_size_limit`::
|
|
|
+`line_merge_size_limit` (Optional)::
|
|
|
(unsigned integer) The maximum number of characters in a message when lines are
|
|
|
merged to form messages while analyzing semi-structured files. The default
|
|
|
is 10000. If you have extremely long messages, you may need to increase this, but
|
|
|
be aware that this may lead to very long processing times if the way to group
|
|
|
lines into messages is misdetected.
|
|
|
|
|
|
-`lines_to_sample`::
|
|
|
+`lines_to_sample` (Optional)::
|
|
|
(unsigned integer) The number of lines to include in the structural analysis,
|
|
|
starting from the beginning of the file. The minimum is 2; the default
|
|
|
is 1000. If the value of this parameter is greater than the number of lines in
|
|
@@ -117,7 +124,7 @@ efficient to upload a sample file with more variety in the first 1000 lines than
|
|
|
to request analysis of 100000 lines to achieve some variety.
|
|
|
--
|
|
|
|
|
|
-`quote`::
|
|
|
+`quote` (Optional)::
|
|
|
(string) If you have set `format` to `delimited`, you can specify the character used
|
|
|
to quote the values in each row if they contain newlines or the delimiter
|
|
|
character. Only a single character is supported. If this parameter is not
|
|
@@ -125,18 +132,18 @@ to request analysis of 100000 lines to achieve some variety.
|
|
|
format does not use quoting, a workaround is to set this argument to a
|
|
|
character that does not appear anywhere in the sample.
|
|
|
|
|
|
-`should_trim_fields`::
|
|
|
+`should_trim_fields` (Optional)::
|
|
|
(boolean) If you have set `format` to `delimited`, you can specify whether values
|
|
|
between delimiters should have whitespace trimmed from them. If this parameter
|
|
|
is not specified and the delimiter is pipe (`|`), the default value is `true`.
|
|
|
Otherwise, the default value is `false`.
|
|
|
|
|
|
-`timeout`::
|
|
|
+`timeout` (Optional)::
|
|
|
(time) Sets the maximum amount of time that the structure analysis may take.
|
|
|
If the analysis is still running when the timeout expires then it will be
|
|
|
aborted. The default value is 25 seconds.
|
|
|
|
|
|
-`timestamp_field`::
|
|
|
+`timestamp_field` (Optional)::
|
|
|
(string) The name of the field that contains the primary timestamp of each
|
|
|
record in the file. In particular, if the file were ingested into an index,
|
|
|
this is the field that would be used to populate the `@timestamp` field. +
|
|
@@ -155,7 +162,7 @@ field (if any) is the primary timestamp field. For structured file formats, it
|
|
|
is not compulsory to have a timestamp in the file.
|
|
|
--
|
|
|
|
|
|
-`timestamp_format`::
|
|
|
+`timestamp_format` (Optional)::
|
|
|
(string) The Java time format of the timestamp field in the file. +
|
|
|
+
|
|
|
--
|
|
@@ -207,13 +214,6 @@ be ingested into {es}. It does not need to be in JSON format and it does not
|
|
|
need to be UTF-8 encoded. The size is limited to the {es} HTTP receive buffer
|
|
|
size, which defaults to 100 MB.
|
|
|
|
|
|
-[[ml-find-file-structure-prereqs]]
|
|
|
-==== {api-prereq-title}
|
|
|
-
|
|
|
-You must have `monitor_ml`, or `monitor` cluster privileges to use this API.
|
|
|
-For more information, see {stack-ov}/security-privileges.html[Security Privileges].
|
|
|
-
|
|
|
-
|
|
|
[[ml-find-file-structure-examples]]
|
|
|
==== {api-examples-title}
|
|
|
|
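As a rough illustration of how the optional query parameters described above combine into a request, the following sketch builds the URL for `POST _ml/find_file_structure`. The helper name `build_find_file_structure_url` and the base URL `http://localhost:9200` are assumptions made for this example and are not part of the API. The sketch only constructs the URL; it does not send the file or handle authentication.

```python
from urllib.parse import urlencode


def build_find_file_structure_url(base_url, **params):
    """Return the endpoint URL with the given query parameters appended.

    Hypothetical helper for illustration only; parameters set to None are
    omitted so that the structure finder applies its documented defaults.
    """
    # Booleans are lowercased because the API documents `true`/`false` values.
    query = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in params.items()
        if value is not None
    }
    endpoint = base_url.rstrip("/") + "/_ml/find_file_structure"
    return endpoint + ("?" + urlencode(query) if query else "")


# Assumed local cluster address; adjust for your deployment.
url = build_find_file_structure_url(
    "http://localhost:9200",
    format="delimited",
    delimiter="|",
    has_header_row=True,
    lines_to_sample=500,
)
print(url)
```

The pipe delimiter is percent-encoded as `%7C` in the resulting query string. The file contents would then be sent as the body of a `POST` request to this URL.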