[role="xpack"]
[testenv="platinum"]
[[explain-dfanalytics]]
= Explain {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Explain {dfanalytics}</titleabbrev>
++++

Explains a {dataframe-analytics-config}.

[[ml-explain-dfanalytics-request]]
== {api-request-title}

`GET _ml/data_frame/analytics/_explain` +
`POST _ml/data_frame/analytics/_explain` +
`GET _ml/data_frame/analytics/<data_frame_analytics_id>/_explain` +
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain`

[[ml-explain-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`
* source indices: `read`, `view_index_metadata`

For more information, see <<security-privileges>> and
{ml-docs-setup-privileges}.

[[ml-explain-dfanalytics-desc]]
== {api-description-title}

This API explains a {dataframe-analytics-config} that either already exists or
has not been created yet. It provides the following explanations:

* which fields are included in the analysis, which are excluded, and why;
* how much memory the analysis is estimated to require. You can use this
estimate when you choose a value for the `model_memory_limit` setting.

Object fields and fields that are excluded via source filtering are not
included in the explanation.

[[ml-explain-dfanalytics-path-params]]
== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]

[[ml-explain-dfanalytics-request-body]]
== {api-request-body-title}

A {dataframe-analytics-config} as described in <<put-dfanalytics>>.
In the context of this API, you do not need to provide the `id` and `dest`
properties.
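
To call this API from a script instead of the console, you can build the
request yourself. The following is a minimal Python sketch that uses only the
standard library. The `build_explain_request` helper and the
`http://localhost:9200` base URL are hypothetical and assume a cluster with
security disabled; with security enabled you would also need to send
credentials. The helper only builds the request, so the sketch runs without a
cluster.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL: assumes an unsecured cluster listening on localhost.
ES_URL = "http://localhost:9200"


def build_explain_request(config: Optional[dict] = None,
                          analytics_id: Optional[str] = None,
                          base_url: str = ES_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request to the explain endpoint."""
    path = "/_ml/data_frame/analytics/"
    if analytics_id is not None:
        # Target an existing configuration by its ID.
        path += analytics_id + "/"
    path += "_explain"
    headers = {}
    data = None
    if config is not None:
        # A configuration to explain; `id` and `dest` may be left out.
        data = json.dumps(config).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method="POST")


# The same request body as the example on this page.
body = {
    "source": {"index": "houses_sold_last_10_yrs"},
    "analysis": {"regression": {"dependent_variable": "price"}},
}
request = build_explain_request(body)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Passing an ID instead of a body builds a request for the
`_ml/data_frame/analytics/<data_frame_analytics_id>/_explain` endpoint, which
explains a configuration that already exists.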

[role="child_attributes"]
[[ml-explain-dfanalytics-results]]
== {api-response-body-title}

The API returns a response that contains the following:

`field_selection`::
(array)
An array of objects that explain the selection of each field, sorted by
field name.
+
.Properties of `field_selection` objects
[%collapsible%open]
====
`is_included`:::
(Boolean) Whether the field is selected to be included in the analysis.

`is_required`:::
(Boolean) Whether the field is required.

`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical`
or `numerical`.

`mapping_types`:::
(string) The mapping types of the field.

`name`:::
(string) The field name.

`reason`:::
(string) The reason a field is not selected to be included in the analysis.
====

`memory_estimation`::
(object)
An object containing the memory estimates.
+
.Properties of `memory_estimation`
[%collapsible%open]
====
`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk is
allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
than `expected_memory_without_disk` because using disk limits the main memory
needed to perform {dfanalytics}.

`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole
{dfanalytics} should happen in memory (that is, without overflowing to disk).
====
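
When you use these estimates to choose a `model_memory_limit`, it can help to
compare them as numbers rather than strings. The following minimal Python
sketch converts byte-size strings such as `128MB` into bytes. The
`parse_byte_size` helper is hypothetical, and it assumes 1024-based units,
which is how {es} interprets byte-size values.

```python
import re

# Assumes 1024-based units, matching how Elasticsearch interprets
# byte-size values such as "128mb".
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
          "tb": 1024 ** 4, "pb": 1024 ** 5}


def parse_byte_size(value: str) -> int:
    """Convert a byte-size string such as "128MB" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgtp]?b)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized byte size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


# Values taken from the example response on this page.
memory_estimation = {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB",
}
with_disk = parse_byte_size(memory_estimation["expected_memory_with_disk"])
without_disk = parse_byte_size(memory_estimation["expected_memory_without_disk"])
print(f"with disk: {with_disk} bytes, without disk: {without_disk} bytes")
```

Which estimate to base `model_memory_limit` on depends on whether you are
willing to let the analysis overflow to disk; the sketch only makes the
comparison explicit.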

[[ml-explain-dfanalytics-example]]
== {api-examples-title}

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/_explain
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "field_selection": [
    {
      "name": "number_of_bedrooms",
      "mapping_types": ["integer"],
      "is_included": true,
      "is_required": false,
      "feature_type": "numerical"
    },
    {
      "name": "postcode",
      "mapping_types": ["text"],
      "is_included": false,
      "is_required": false,
      "reason": "[postcode.keyword] is preferred because it is aggregatable"
    },
    {
      "name": "postcode.keyword",
      "mapping_types": ["keyword"],
      "is_included": true,
      "is_required": false,
      "feature_type": "categorical"
    },
    {
      "name": "price",
      "mapping_types": ["float"],
      "is_included": true,
      "is_required": true,
      "feature_type": "numerical"
    }
  ],
  "memory_estimation": {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB"
  }
}
----
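
If you explain configurations programmatically, you may want a quick overview of
which fields the analysis will use. The following minimal Python sketch groups
the entries of `field_selection` into included fields (by feature type),
excluded fields (with their reasons), and required fields. It assumes the
response has already been parsed into a dict, for example with `json.loads`. The
sample data follows the `field_selection` properties documented in the response
body section, and the `summarize_field_selection` helper is hypothetical.

```python
from collections import defaultdict

# Sample response using the documented field_selection properties
# (name, mapping_types, is_included, is_required, feature_type, reason).
response = {
    "field_selection": [
        {"name": "number_of_bedrooms", "mapping_types": ["integer"],
         "is_included": True, "is_required": False, "feature_type": "numerical"},
        {"name": "postcode", "mapping_types": ["text"],
         "is_included": False, "is_required": False,
         "reason": "[postcode.keyword] is preferred because it is aggregatable"},
        {"name": "postcode.keyword", "mapping_types": ["keyword"],
         "is_included": True, "is_required": False, "feature_type": "categorical"},
        {"name": "price", "mapping_types": ["float"],
         "is_included": True, "is_required": True, "feature_type": "numerical"},
    ]
}


def summarize_field_selection(resp: dict) -> dict:
    """Group included fields by feature type and collect exclusion reasons."""
    included = defaultdict(list)  # feature_type -> list of field names
    excluded = {}                 # field name -> reason it was not selected
    required = []                 # fields flagged with is_required
    for field in resp.get("field_selection", []):
        name = field["name"]
        if field.get("is_required"):
            required.append(name)
        if field.get("is_included"):
            included[field.get("feature_type", "unknown")].append(name)
        else:
            excluded[name] = field.get("reason", "")
    return {"included": dict(included), "excluded": excluded,
            "required": required}


summary = summarize_field_selection(response)
for name, reason in summary["excluded"].items():
    print(f"{name} excluded: {reason}")
```

In this sample, `price` is the only required field because it is the
`dependent_variable` of the regression, and `postcode` is excluded in favor of
its aggregatable `postcode.keyword` subfield.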