[role="xpack"]
[testenv="platinum"]
[[explain-dfanalytics]]
=== Explain {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Explain {dfanalytics} API</titleabbrev>
++++

Explains a {dataframe-analytics-config}.

experimental[]

[[ml-explain-dfanalytics-request]]
==== {api-request-title}

`GET _ml/data_frame/analytics/_explain` +
`POST _ml/data_frame/analytics/_explain` +
`GET _ml/data_frame/analytics/<data_frame_analytics_id>/_explain` +
`POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain`

[[ml-explain-dfanalytics-prereq]]
==== {api-prereq-title}

* You must have the `monitor_ml` privilege to use this API. For more
information, see <<security-privileges>> and <<built-in-roles>>.

[[ml-explain-dfanalytics-desc]]
==== {api-description-title}

This API provides explanations for a {dataframe-analytics-config} that either
already exists or has not been created yet.
The following explanations are provided:

* which fields are included in or excluded from the analysis, and why
* how much memory is estimated to be required. The estimate can be used when
deciding the appropriate value for the `model_memory_limit` setting later on.

[[ml-explain-dfanalytics-path-params]]
==== {api-path-parms-title}

`<data_frame_analytics_id>`::
(Optional, string) A string that uniquely identifies the existing
{dfanalytics-job} to explain. This identifier can contain lowercase alphanumeric
characters (a-z and 0-9), hyphens, and underscores. It must start and end with
alphanumeric characters.
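
The identifier rules above, together with the request paths listed earlier, can be sketched in Python as follows. This is a minimal illustration, not part of the API: the helper names `is_valid_job_id` and `explain_path` are hypothetical, and the check covers only the character rules stated in this section (any other constraint the server enforces, such as a length limit, is not modeled).

```python
import re

# Lowercase alphanumerics, hyphens, and underscores; must start and end
# with an alphanumeric character (rules taken from the text above).
_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the character rules described above."""
    return _ID_PATTERN.fullmatch(job_id) is not None


def explain_path(job_id=None) -> str:
    """Build the request path, with or without an existing job ID (hypothetical helper)."""
    if job_id is None:
        # No ID: the configuration is supplied in the request body instead.
        return "_ml/data_frame/analytics/_explain"
    if not is_valid_job_id(job_id):
        raise ValueError(f"invalid data frame analytics ID: {job_id!r}")
    return f"_ml/data_frame/analytics/{job_id}/_explain"
```

Omitting the ID yields the path used when the configuration is sent in the request body; passing an ID yields the path that explains an existing {dfanalytics-job}.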

[[ml-explain-dfanalytics-request-body]]
==== {api-request-body-title}

`data_frame_analytics_config`::
(Optional, object) The intended configuration of the {dfanalytics-job}. For more
information, see <<ml-dfanalytics-resources>>.
Note that `id` and `dest` don't need to be provided in the context of this API.
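
As a rough sketch of how a client might prepare this request body, the Python snippet below wraps a job configuration under `data_frame_analytics_config` and leaves out `id` and `dest`, since they are not needed here. The helper name `build_explain_body`, the job ID, and the destination index are hypothetical values used only for illustration.

```python
import json


def build_explain_body(config: dict) -> dict:
    """Wrap a job configuration for the explain request (hypothetical helper)."""
    # `id` and `dest` are not needed by this API, so they are omitted.
    trimmed = {key: value for key, value in config.items() if key not in ("id", "dest")}
    return {"data_frame_analytics_config": trimmed}


full_config = {
    "id": "house_price_regression",                # hypothetical job ID
    "source": {"index": "houses_sold_last_10_yrs"},
    "dest": {"index": "house_price_predictions"},  # hypothetical destination index
    "analysis": {"regression": {"dependent_variable": "price"}},
}

body = build_explain_body(full_config)
print(json.dumps(body, indent=2))
```

The original configuration is left unmodified, so the same dictionary can later be used to create the job.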

[[ml-explain-dfanalytics-results]]
==== {api-response-body-title}

The API returns a response that contains the following:

`field_selection`::
(array) An array of objects that explain the selection for each field, sorted by field name.
Each object in the array has the following properties:

`name`:::
(string) The field name.

`mapping_types`:::
(array of strings) The mapping types of the field.

`is_included`:::
(boolean) Whether the field is selected to be included in the analysis.

`is_required`:::
(boolean) Whether the field is required.

`feature_type`:::
(string) The feature type of this field for the analysis. May be `categorical` or `numerical`.

`reason`:::
(string) The reason a field is not selected to be included in the analysis.

`memory_estimation`::
(object) An object containing the memory estimates. The object has the following properties:

`expected_memory_without_disk`:::
(string) Estimated memory usage under the assumption that the whole {dfanalytics} should happen in memory
(that is, without overflowing to disk).

`expected_memory_with_disk`:::
(string) Estimated memory usage under the assumption that overflowing to disk is allowed during {dfanalytics}.
`expected_memory_with_disk` is usually smaller than `expected_memory_without_disk` because overflowing to disk
limits the main memory needed to perform {dfanalytics}.
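
One possible way to turn the two estimates into a `model_memory_limit` value is sketched below in Python. The selection policy and the helper names `parse_size` and `pick_memory_limit` are assumptions made for illustration, not behavior of the API. The sketch also assumes that size strings use binary multiples (for example, 1 kb = 1024 bytes).

```python
import re

# Assumed unit table: binary multiples of a byte.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(value: str) -> int:
    """Convert a size string such as '128MB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgtp]?b)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def pick_memory_limit(estimation: dict, available: str):
    """Return the largest estimate that fits into the available memory, or None.

    Hypothetical policy: prefer running fully in memory, fall back to the
    estimate that allows overflowing to disk.
    """
    budget = parse_size(available)
    for key in ("expected_memory_without_disk", "expected_memory_with_disk"):
        if parse_size(estimation[key]) <= budget:
            return estimation[key]
    return None
```

For example, with estimates of `128MB` and `32MB` and a budget of `64mb`, this policy would select `32MB`; with a budget below both estimates it returns `None`, signaling that the analysis is unlikely to fit.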

[[ml-explain-dfanalytics-example]]
==== {api-examples-title}

[source,console]
--------------------------------------------------
POST _ml/data_frame/analytics/_explain
{
  "data_frame_analytics_config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "field_selection": [
    {
      "name": "number_of_bedrooms",
      "mapping_types": ["integer"],
      "is_included": true,
      "is_required": false,
      "feature_type": "numerical"
    },
    {
      "name": "postcode",
      "mapping_types": ["text"],
      "is_included": false,
      "is_required": false,
      "reason": "[postcode.keyword] is preferred because it is aggregatable"
    },
    {
      "name": "postcode.keyword",
      "mapping_types": ["keyword"],
      "is_included": true,
      "is_required": false,
      "feature_type": "categorical"
    },
    {
      "name": "price",
      "mapping_types": ["float"],
      "is_included": true,
      "is_required": true,
      "feature_type": "numerical"
    }
  ],
  "memory_estimation": {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB"
  }
}
----
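
A client might post-process such a response to see at a glance which fields take part in the analysis and why others were left out. The Python sketch below assumes the property names listed in the response body section (`name`, `is_included`, `feature_type`, `reason`); the helper name `split_fields` is hypothetical, and the sample data is an abbreviated copy of the example values.

```python
def split_fields(field_selection: list) -> tuple:
    """Separate included fields (with feature type) from excluded ones (with reason)."""
    included = {}
    excluded = {}
    for field in field_selection:
        if field["is_included"]:
            included[field["name"]] = field.get("feature_type")
        else:
            excluded[field["name"]] = field.get("reason")
    return included, excluded


# Abbreviated sample mirroring the example response.
sample = [
    {"name": "number_of_bedrooms", "is_included": True, "is_required": False,
     "feature_type": "numerical"},
    {"name": "postcode", "is_included": False, "is_required": False,
     "reason": "[postcode.keyword] is preferred because it is aggregatable"},
    {"name": "postcode.keyword", "is_included": True, "is_required": False,
     "feature_type": "categorical"},
    {"name": "price", "is_included": True, "is_required": True,
     "feature_type": "numerical"},
]

included, excluded = split_fields(sample)
for name, reason in excluded.items():
    print(f"{name} excluded: {reason}")
```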