[role="xpack"]
[testenv="platinum"]
[[ml-dfanalytics-resources]]
=== {dfanalytics-cap} job resources

{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
<<get-dfanalytics>>.

[discrete]
[[ml-dfanalytics-properties]]
==== {api-definitions-title}

`analysis`::
(object) The type of analysis that is performed on the `source`. For example:
`outlier_detection`. For more information, see <<dfanalytics-types>>.

`analyzed_fields`::
(object) You can specify `includes` patterns, `excludes` patterns, or both. If
`analyzed_fields` is not set, only the relevant fields are included. For
example, all the numeric fields for {oldetection}.

`analyzed_fields.includes`:::
(array) An array of strings that defines the fields that are included in
the analysis.

`analyzed_fields.excludes`:::
(array) An array of strings that defines the fields that are excluded
from the analysis.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
    }
  },
  "analyzed_fields": {
    "includes": [ "request.bytes", "response.counts.error" ],
    "excludes": [ "source.geo" ]
  }
}
--------------------------------------------------
// TEST[setup:setup_logdata]

`description`::
(Optional, string) A description of the job.

`dest`::
(object) The destination configuration of the analysis.

`index`:::
(Required, string) Defines the _destination index_ to store the results of
the {dfanalytics-job}.

`results_field`:::
(Optional, string) Defines the name of the field in which to store the
results of the analysis. Defaults to `ml`.

`id`::
(string) The unique identifier for the {dfanalytics-job}. This identifier can
contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
underscores. It must start and end with alphanumeric characters. This property
is informational; you cannot change the identifier for existing jobs.

`model_memory_limit`::
(string) The approximate maximum amount of memory resources that are
permitted for analytical processing. The default value for {dfanalytics-jobs}
is `1gb`. If your `elasticsearch.yml` file contains an
`xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
create {dfanalytics-jobs} that have `model_memory_limit` values greater than
that setting. For more information, see <<ml-settings>>.

`source`::
(object) The source configuration, consisting of an `index` and optionally a
`query` object.

`index`:::
(Required, string or array) Index or indices on which to perform the
analysis. It can be a single index or index pattern, or an array of
indices or patterns.

`query`:::
(Optional, object) The {es} query domain-specific language
(<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
search POST body. All the options that are supported by {es} can be used,
as this object is passed verbatim to {es}. By default, this property has
the following value: `{"match_all": {}}`.
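
The following sketch shows how the `source`, `dest`, and `model_memory_limit`
properties described above might be combined in a single request. Only the
`logdata` index comes from the earlier example. The job ID, the destination
index, the `response.status` field, the results field name, and the memory
limit are hypothetical values chosen for illustration.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics_errors
{
  "description": "Outlier detection restricted to error responses",
  "source": {
    "index": [ "logdata" ],
    "query": {
      "range": { "response.status": { "gte": 500 } } <1>
    }
  },
  "dest": {
    "index": "logdata_errors_out",
    "results_field": "outliers" <2>
  },
  "analysis": {
    "outlier_detection": {
    }
  },
  "model_memory_limit": "500mb" <3>
}
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index and field names]

<1> Assumes that the source documents contain a numeric `response.status`
field. Any query that {es} supports can be used here instead.
<2> Overrides the default results field, `ml`.
<3> An arbitrary value below the `1gb` default. If
`xpack.ml.max_model_memory_limit` is set, this value must not exceed it.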

[[dfanalytics-types]]
==== Analysis objects

{dfanalytics-cap} resources contain `analysis` objects. For example, when you
create a {dfanalytics-job}, you must define the type of analysis it performs.
Currently, `outlier_detection` is the only available type of analysis. Other
types, such as `regression`, will be added.

[discrete]
[[oldetection-resources]]
==== {oldetection-cap} configuration objects

An {oldetection} configuration object has the following properties:

`compute_feature_influence`::
(boolean) If `true`, the feature influence calculation is enabled. Defaults to
`true`.

`feature_influence_threshold`::
(double) The minimum {olscore} that a document must have for its {fiscore} to
be calculated. Value range: 0-1 (`0.1` by default).

`method`::
(string) Sets the method that {oldetection} uses. If the method is not set,
{oldetection} uses an ensemble of different methods and normalizes and
combines their individual {olscores} to obtain the overall {olscore}. We
recommend using the ensemble method. Available methods are `lof`, `ldof`,
`distance_kth_nn`, and `distance_knn`.

`n_neighbors`::
(integer) Defines how many nearest neighbors each method of {oldetection}
uses to calculate its {olscore}. When the value is not set, different values
are used for different ensemble members, which helps improve diversity in the
ensemble. Override this value only if you are confident that the value you
choose is appropriate for the data set.

`outlier_fraction`::
(double) Sets the proportion of the data set that is assumed to be outlying
prior to {oldetection}. For example, 0.05 means it is assumed that 5% of
values are real outliers and 95% are inliers.

`standardize_columns`::
(boolean) If `true`, then the following operation is performed on the columns
before computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to
`true`. For more information, see
https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
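
As a sketch of how these properties fit together, the following request sets
several of them explicitly instead of relying on the defaults. The job ID, the
destination index, and all the parameter values are hypothetical; they
illustrate the syntax only and are not tuning recommendations.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/loganalytics_lof
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_lof_out"
  },
  "analysis": {
    "outlier_detection": {
      "method": "lof", <1>
      "n_neighbors": 20, <2>
      "feature_influence_threshold": 0.2, <3>
      "outlier_fraction": 0.05, <4>
      "standardize_columns": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative example with hypothetical index names and values]

<1> Selects a single method. Omit `method` to use the recommended ensemble.
<2> An arbitrary value. When a single method is selected, ensemble diversity
is no longer a consideration, but the value must still suit the data set.
<3> Must be in the range 0-1.
<4> Assumes that 5% of the values are real outliers.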