
Using the Anomaly Detection Package PyCuliarity

2020-03-31 20:00

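An earlier article surveying time-series anomaly detection algorithms introduced Twitter's anomaly detection tool AnomalyDetection and briefly covered its Python port, PyCuliarity. Since neither AnomalyDetection nor PyCuliarity comes with much documentation, this post pulls the comments out of the source code and walks through them.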

PyCuliarity has two top-level functions, detect_ts and detect_vec: one for time series data and one for vector data.

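  • detect_ts: the input DataFrame needs two columns, one holding the timestamps and the other holding the value observed at each timestamp.
  • detect_vec: the input does not need a time column; a time index is generated automatically from the length of the DataFrame.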
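
To make the two-column input concrete, here is a minimal sketch that builds hourly (timestamp, value) pairs with the standard library and, in a separate function, wraps them in a DataFrame for detect_ts. The import path `from pyculiarity import detect_ts`, the column names, the helper names `make_rows` and `run_detect_ts`, and the synthetic data are all my assumptions for illustration. The 14 days of hourly points are a guess at enough history for the seasonal step, not a documented minimum.

```python
from datetime import datetime, timedelta
import math


def make_rows(days=14, spike_at=200):
    """Build hourly (timestamp, value) pairs with a daily cycle and one injected spike."""
    start = datetime(2020, 3, 1)
    rows = []
    for i in range(days * 24):
        ts = start + timedelta(hours=i)
        value = 100.0 + 10.0 * math.sin(2 * math.pi * (i % 24) / 24)
        if i == spike_at:
            value += 80.0  # synthetic anomaly, for illustration only
        rows.append((ts, value))
    return rows


def run_detect_ts(rows):
    """Wrap the rows in a two-column DataFrame and call detect_ts.

    Imports are local so the data-building part above works even when
    pandas or pyculiarity is not installed. The import path is assumed.
    """
    import pandas as pd
    from pyculiarity import detect_ts

    # Column names are my own choice: timestamps first, observations second.
    df = pd.DataFrame(rows, columns=["timestamp", "value"])
    results = detect_ts(df, max_anoms=0.05, direction="both")
    return results["anoms"]  # data frame of detected anomalies


rows = make_rows()
print(len(rows), rows[0], rows[200])
```

With the library installed, `run_detect_ts(rows)` would be the call to try; it is deliberately not executed here.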

The two are used in much the same way, so this post focuses on detect_ts. Its source comment reads:

def detect_ts(df, max_anoms=0.10, direction='pos',
              alpha=0.05, only_last=None, threshold=None,
              e_value=False, longterm=False,
              piecewise_median_period_weeks=2, plot=False,
              y_log=False, xlabel = '', ylabel = 'count',
              title=None, verbose=False):
    """
    Anomaly Detection Using Seasonal Hybrid ESD Test
    A technique for detecting anomalies in seasonal univariate time series where the input is a
    series of <timestamp, value> pairs.

    Args:

    x: Time series as a two column data frame where the first column consists of the
    timestamps and the second column consists of the observations.

    max_anoms: Maximum number of anomalies that S-H-ESD will detect as a percentage of the
    data.

    direction: Directionality of the anomalies to be detected. Options are: ('pos' | 'neg' | 'both').

    alpha: The level of statistical significance with which to accept or reject anomalies.

    only_last: Find and report anomalies only within the last day or hr in the time series. Options: (None | 'day' | 'hr')

    threshold: Only report positive going anoms above the threshold specified. Options are: (None | 'med_max' | 'p95' | 'p99')

    e_value: Add an additional column to the anoms output containing the expected value.

    longterm: Increase anom detection efficacy for time series that are greater than a month.
    See Details below.

    piecewise_median_period_weeks: The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014). Defaults to 2.

    plot: (Currently unsupported) A flag indicating if a plot with both the time series and the estimated anoms,
    indicated by circles, should also be returned.

    y_log: Apply log scaling to the y-axis. This helps with viewing plots that have extremely
    large positive anomalies relative to the rest of the data.

    xlabel: X-axis label to be added to the output plot.
    ylabel: Y-axis label to be added to the output plot.

    Details


    'longterm' This option should be set when the input time series is longer than a month.
    The option enables the approach described in Vallis, Hochenbaum, and Kejariwal (2014).
    'threshold' Filter all negative anomalies and those anomalies whose magnitude is smaller
    than one of the specified thresholds which include: the median
    of the daily max values (med_max), the 95th percentile of the daily max values (p95), and the
    99th percentile of the daily max values (p99).
    'title' Title for the output plot.
    'verbose' Enable debug messages

    The returned value is a dictionary with the following components:
      anoms: Data frame containing timestamps, values, and optionally expected values.
      plot: A graphical object if plotting was requested by the user. The plot contains
      the estimated anomalies annotated on the input time series
    """

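The docstring's `direction` and `max_anoms` options can be pictured with a much simplified sketch. It scores each point by its distance from the median, scaled by the median absolute deviation, and keeps at most `max_anoms * n` points. This is not the library's S-H-ESD implementation: there is no seasonal decomposition and no significance test against `alpha`, and the function name `candidate_anomalies` is hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation (unscaled)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def candidate_anomalies(values, max_anoms=0.10, direction="pos"):
    """Rank points by a robust score and keep at most max_anoms * n of them.

    Simplified illustration only: no seasonal decomposition and no
    significance test (alpha) is applied here.
    """
    m = median(values)
    scale = mad(values) or 1.0  # avoid dividing by zero on constant data
    if direction == "pos":
        scores = [(v - m) / scale for v in values]      # only increases score high
    elif direction == "neg":
        scores = [(m - v) / scale for v in values]      # only drops score high
    elif direction == "both":
        scores = [abs(v - m) / scale for v in values]   # either side scores high
    else:
        raise ValueError("direction must be 'pos', 'neg' or 'both'")
    k = int(max_anoms * len(values))  # upper bound on the number of reported points
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return sorted(i for i in ranked[:k] if scores[i] > 0)


data = [10, 11, 9, 10, 50, 10, 11, 9, 10, -30,
        10, 11, 9, 10, 10, 11, 9, 10, 10, 11]
print(candidate_anomalies(data, 0.05, "pos"))
print(candidate_anomalies(data, 0.05, "neg"))
print(candidate_anomalies(data, 0.10, "both"))
```

Because this sketch has no significance test, the cap is the only thing limiting how many points come back, so a generous `max_anoms` would also return ordinary points. In the real function `alpha` is what rejects those.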
Parameter details:

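  • df: DataFrame containing the timestamps and values
  • max_anoms=0.10: maximum number of anomalies to detect, as a fraction of the data (0.10 means at most 10% of the points)
  • direction='pos': 'pos' finds sudden increases, 'neg' finds sudden drops, 'both' finds both
  • alpha=0.05: significance level used to accept or reject a candidate anomaly (the level the test is held to, not the p-value itself)
  • only_last=None: only look for anomalies in the last day ('day') or last hour ('hr') of the time series
  • threshold=None: only report positive anomalies above the given threshold. Options:
    • med_max: median of the daily maximum values
    • p95: 95th percentile of the daily maximum values
    • p99: 99th percentile of the daily maximum values
  • e_value=False: add an expected-value column to the returned data (when I enabled it the column was all NaN; I have not worked out why)
  • longterm=False: set this when the time series spans more than one month
  • piecewise_median_period_weeks=2: used together with longterm; sets the size of the piecewise median window in weeks. Note that it must be >= 2
  • plot=False: output a plot; no longer supported
  • y_log=False: apply log scaling to the Y axis
  • xlabel='': X-axis label for the output plot
  • ylabel='count': Y-axis label for the output plot
  • title=None: title of the output plot
  • verbose=False: whether to print debug messages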
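
The three `threshold` options are all statistics over the per-day maxima, which a short standard-library sketch can show. It follows the definitions in the docstring rather than the library's internal code; the helper names and the linear-interpolation percentile are my assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def daily_max_thresholds(rows):
    """Return med_max, p95 and p99 computed over the per-day maxima."""
    per_day = defaultdict(list)
    for ts, value in rows:
        per_day[ts.date()].append(value)  # group observations by calendar day
    maxima = sorted(max(v) for v in per_day.values())
    return {
        "med_max": median(maxima),
        "p95": percentile(maxima, 95),
        "p99": percentile(maxima, 99),
    }


# Five days of synthetic hourly data; day d peaks at 10 * (d + 1) at noon.
start = datetime(2020, 3, 1)
rows = []
for d in range(5):
    for h in range(24):
        peak = 10.0 * (d + 1)
        rows.append((start + timedelta(days=d, hours=h), peak if h == 12 else 1.0))

print(daily_max_thresholds(rows))
```

With a threshold set, a positive anomaly would only be reported if its value exceeds the chosen statistic, so p99 is the strictest of the three and med_max the most permissive.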

