[role="xpack"]
[[explain-dfanalytics]]
= Explain {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Explain {dfanalytics}</titleabbrev>
++++

Explains a {dataframe-analytics-config}.

[[ml-explain-dfanalytics-request]]
== {api-request-title}

`GET _ml/data_frame/analytics/_explain` +
`POST _ml/data_frame/analytics/_explain` +
`GET _ml/data_frame/analytics/<data_frame_analytics_id>/_explain` +
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain`

[[ml-explain-dfanalytics-prereq]]
== {api-prereq-title}

Requires the following privileges:

* cluster: `monitor_ml` (the `machine_learning_user` built-in role grants this
  privilege)
* source indices: `read`, `view_index_metadata`

[[ml-explain-dfanalytics-desc]]
== {api-description-title}

This API provides explanations for a {dataframe-analytics-config} that either
already exists or has not been created yet.
The following explanations are provided:

* which fields are included in or excluded from the analysis, and why;
* how much memory the analysis is estimated to require. You can use this
  estimate later when choosing a value for the `model_memory_limit` setting.

If you have object fields or fields that are excluded via source filtering,
they are not included in the explanation.

[[ml-explain-dfanalytics-path-params]]
== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]

[[ml-explain-dfanalytics-request-body]]
== {api-request-body-title}

A {dataframe-analytics-config} as described in <<put-dfanalytics>>.
Note that `id` and `dest` don't need to be provided in the context of this API.

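
As a rough illustration of the note above, the following Python sketch takes a
config written for <<put-dfanalytics>> and drops the `id` and `dest` keys, which
this API does not need. The helper name `to_explain_body` and the sample config
(including its index names) are assumptions made for this example; they are not
part of the API.

```python
import copy


def to_explain_body(config):
    """Return a copy of a data frame analytics config without `id` and `dest`.

    Hypothetical helper: the explain API does not need either key,
    so both are dropped if present.
    """
    body = copy.deepcopy(config)
    for key in ("id", "dest"):
        body.pop(key, None)
    return body


# Illustrative config; the id and the dest index name are made up.
full_config = {
    "id": "house_price_regression",
    "source": {"index": "houses_sold_last_10_yrs"},
    "dest": {"index": "house_price_predictions"},
    "analysis": {"regression": {"dependent_variable": "price"}},
}

explain_body = to_explain_body(full_config)
```

The original config is left untouched, so the same object can still be used to
create the job afterwards.
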
[role="child_attributes"]
[[ml-explain-dfanalytics-results]]
== {api-response-body-title}

The API returns a response that contains the following:

`field_selection`::
(array)
An array of objects that explain the selection for each field, sorted by
field name.
+
.Properties of `field_selection` objects
[%collapsible%open]
====
`is_included`:::
(Boolean) Whether the field is selected to be included in the analysis.

`is_required`:::
(Boolean) Whether the field is required.

`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical`
or `numerical`.

`mapping_types`:::
(array of strings) The mapping types of the field.

`name`:::
(string) The field name.

`reason`:::
(string) The reason a field is not selected to be included in the analysis.
====

`memory_estimation`::
(object)
An object containing the memory estimates.
+
.Properties of `memory_estimation`
[%collapsible%open]
====
`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk is
allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
than `expected_memory_without_disk` because using disk limits the main
memory needed to perform {dfanalytics}.

`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole
{dfanalytics} should happen in memory (i.e. without overflowing to disk).
====

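
Both estimates are returned as strings. The following Python sketch converts
them to bytes so that they can be compared before a `model_memory_limit` is
chosen. It assumes each value is a whole number followed by a `B`, `KB`, `MB`,
`GB`, or `TB` unit in powers of 1024; that unit handling is an assumption made
for this example, not a specification of the response format, and `parse_size`
is a hypothetical helper.

```python
import re

# Assumed unit multipliers (powers of 1024); not taken from the API specification.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size string such as "128MB" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return int(number) * _UNITS[unit]


# Sample values shaped like the `memory_estimation` object.
memory_estimation = {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB",
}

without_disk = parse_size(memory_estimation["expected_memory_without_disk"])
with_disk = parse_size(memory_estimation["expected_memory_with_disk"])

# Main memory saved by allowing the analysis to overflow to disk.
saving = without_disk - with_disk
```
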
[[ml-explain-dfanalytics-example]]
== {api-examples-title}

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/_explain
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "field_selection": [
    {
      "name": "number_of_bedrooms",
      "mapping_types": ["integer"],
      "is_included": true,
      "is_required": false,
      "feature_type": "numerical"
    },
    {
      "name": "postcode",
      "mapping_types": ["text"],
      "is_included": false,
      "is_required": false,
      "reason": "[postcode.keyword] is preferred because it is aggregatable"
    },
    {
      "name": "postcode.keyword",
      "mapping_types": ["keyword"],
      "is_included": true,
      "is_required": false,
      "feature_type": "categorical"
    },
    {
      "name": "price",
      "mapping_types": ["float"],
      "is_included": true,
      "is_required": true,
      "feature_type": "numerical"
    }
  ],
  "memory_estimation": {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB"
  }
}
----
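
One way to read the `field_selection` array is sketched below in Python. The
sketch relies only on the `name`, `is_included`, and `reason` properties listed
under the response body. The helper name `summarize_field_selection` and the
trimmed-down sample list are assumptions made for this example.

```python
def summarize_field_selection(field_selection):
    """Split field_selection entries into included names and excluded reasons."""
    included = [f["name"] for f in field_selection if f["is_included"]]
    excluded = {
        f["name"]: f.get("reason", "")
        for f in field_selection
        if not f["is_included"]
    }
    return included, excluded


# Trimmed-down sample shaped like the documented response properties.
sample = [
    {"name": "number_of_bedrooms", "is_included": True, "is_required": False},
    {
        "name": "postcode",
        "is_included": False,
        "is_required": False,
        "reason": "[postcode.keyword] is preferred because it is aggregatable",
    },
    {"name": "postcode.keyword", "is_included": True, "is_required": False},
    {"name": "price", "is_included": True, "is_required": True},
]

included, excluded = summarize_field_selection(sample)
print(included)
print(excluded)
```

Keeping the excluded fields in a dictionary keyed by field name makes it easy to
look up why a particular field was left out of the analysis.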