ml-functions.asciidoc 2.2 KB

[role="xpack"]
[[ml-functions]]
= Function reference

The {ml-features} include analysis functions that provide a wide variety of
flexible ways to analyze data for anomalies.

When you create {anomaly-jobs}, you specify one or more detectors, which define
the type of analysis that needs to be done. If you are creating your job by
using {ml} APIs, you specify the functions in detector configuration objects.
If you are creating your job in {kib}, you specify the functions differently
depending on whether you are creating single metric, multi-metric, or advanced
jobs.
//For a demonstration of creating jobs in {kib}, see <<ml-getting-started>>.
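
As a minimal sketch of where functions appear when you use the APIs, the
following Python snippet builds a job body as a plain dictionary. The property
names under `analysis_config` follow the {anomaly-jobs} API; the job
description, the `responsetime` and `airline` fields, and the
`detector_functions` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical job body. The data fields ("responsetime", "airline",
# "timestamp") are illustrative assumptions, not part of any real data set.
job_body = {
    "description": "Illustrative job with two detectors",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            # Each detector configuration object names the function it applies.
            {"function": "count"},
            {
                "function": "mean",
                "field_name": "responsetime",
                "by_field_name": "airline",
            },
        ],
    },
    "data_description": {"time_field": "timestamp"},
}


def detector_functions(body):
    """Return the function named by each detector, in order (hypothetical helper)."""
    return [d["function"] for d in body["analysis_config"]["detectors"]]


print(json.dumps(job_body, indent=2))
print(detector_functions(job_body))
```

The printed JSON is the kind of body you might send when you create a job
through the API. The sketch does not send any request.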

Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, `count`, `low_count`, and `high_count`). These variations
apply one-sided tests, detecting anomalies only when the values are low or
high, depending on which alternative is used.
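
The difference between two-sided and one-sided variations can be illustrated
with a toy check. This sketch is not how the {ml-features} model data: the
`flags_anomaly` function, the fixed `expected` value, and the `tolerance` are
assumptions made up for illustration. It shows only which direction of
deviation each variation reacts to.

```python
def flags_anomaly(function, value, expected, tolerance):
    """Toy check: does `value` deviate from `expected` by more than
    `tolerance` in a direction that the given function variation considers?

    This is an illustrative stand-in, not the real anomaly detection model.
    """
    too_low = value < expected - tolerance
    too_high = value > expected + tolerance
    if function == "count":        # two-sided: low or high
        return too_low or too_high
    if function == "low_count":    # one-sided: low only
        return too_low
    if function == "high_count":   # one-sided: high only
        return too_high
    raise ValueError(f"unsupported function in this sketch: {function}")


# An unusually high value is flagged by `count` and `high_count` only.
for name in ("count", "low_count", "high_count"):
    print(name, flags_anomaly(name, value=500, expected=100, tolerance=30))
```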

You can specify a `summary_count_field_name` with any function except `metric`.
When you use `summary_count_field_name`, the {ml} features expect the input
data to be pre-aggregated. The value of the `summary_count_field_name` field
must contain the count of raw events that were summarized. In {kib}, use the
**summary_count_field_name** in advanced {anomaly-jobs}. Analyzing aggregated
input data provides a significant boost in performance. For more information, see
<<ml-configuring-aggregation>>.
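
To make the idea of pre-aggregated input concrete, the following sketch
summarizes raw event timestamps into one record per bucket and stores the
number of summarized events in a count field. The `pre_aggregate` helper, the
sample timestamps, and the `doc_count` field name are assumptions chosen for
illustration; the `analysis_config` dictionary shows where
`summary_count_field_name` would then point.

```python
from collections import Counter


def pre_aggregate(event_times, bucket_seconds):
    """Summarize raw event timestamps (epoch seconds) into one record per
    non-empty bucket. Each record carries the count of raw events it replaces.
    """
    counts = Counter((t // bucket_seconds) * bucket_seconds for t in event_times)
    return [
        {"timestamp": start, "doc_count": n}
        for start, n in sorted(counts.items())
    ]


raw_event_times = [0, 10, 59, 60, 61, 185]  # hypothetical raw events
summarized = pre_aggregate(raw_event_times, bucket_seconds=60)

# The count field in the summarized records is what
# `summary_count_field_name` refers to.
analysis_config = {
    "bucket_span": "1m",
    "summary_count_field_name": "doc_count",
    "detectors": [{"function": "count"}],
}

print(summarized)
```

Six raw events become three summarized records, and the counts still add up to
six, so no information about event volume is lost.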

If your data is sparse, there may be gaps in the data, which means you might have
empty buckets. You might want to treat these as anomalies or you might want these
gaps to be ignored. Your decision depends on your use case and what is important
to you. It also depends on which functions you use. The `sum` and `count`
functions are strongly affected by empty buckets. For this reason, there are
`non_null_sum` and `non_zero_count` functions, which are tolerant to sparse data.
These functions effectively ignore empty buckets.
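
The effect of empty buckets can be seen in a toy calculation. The bucket
counts and the `typical_count` helper below are made up for illustration, and
a plain average is only a stand-in for the real model. The sketch contrasts a
baseline that includes empty buckets, as `count` would see them, with one that
skips them, in the spirit of `non_zero_count`.

```python
# Hypothetical event counts per bucket; zeros represent empty buckets.
bucket_counts = [12, 0, 11, 0, 0, 13]


def typical_count(counts, ignore_empty):
    """Average count per bucket, optionally skipping empty buckets.

    A plain average is an illustrative assumption, not the real model.
    """
    considered = [c for c in counts if c > 0] if ignore_empty else list(counts)
    return sum(considered) / len(considered)


with_gaps = typical_count(bucket_counts, ignore_empty=False)
without_gaps = typical_count(bucket_counts, ignore_empty=True)

print(with_gaps, without_gaps)
```

With the empty buckets included, the baseline drops to half of what the
non-empty buckets alone suggest, which is why sparse data affects `count` and
`sum` so strongly.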

* <<ml-count-functions>>
* <<ml-geo-functions>>
* <<ml-info-functions>>
* <<ml-metric-functions>>
* <<ml-rare-functions>>
* <<ml-sum-functions>>
* <<ml-time-functions>>