Anomaly Detection module

class cherrypick.anomaly.OutlierPruner(method: Literal['iqr', 'zscore', 'mod_zscore', 'isoforest', 'lof'], df: DataFrame, col: str)

Bases: object

OutlierPruner provides statistical and ML-based methods for detecting and removing outliers from a dataset.

Parameters:
  • method ({'iqr', 'zscore', 'mod_zscore', 'isoforest', 'lof'}) –

    Method used for outlier detection.

    • iqr : Interquartile Range method

    • zscore : Standard Z-score, (X - mean) / standard deviation

    • mod_zscore : Modified Z-score

    \[Z = \frac{0.6745\,(X - \mathrm{median})}{\mathrm{MAD}}\]

    Where:

    • median : median of the sample data

    • MAD : median absolute deviation, median(|X - median|)

    • X : sample data points

    • isoforest : Isolation Forest (ensemble-based anomaly detection)

    • lof : Local Outlier Factor (density-based detection)

  • df (pandas.DataFrame) – Input dataset on which outlier pruning will be applied.

  • col (str) – Column name used for outlier detection in statistical methods.
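The three statistical options can be sketched in plain Python as below. This is an illustrative sketch, not cherrypick's implementation: the helper names are hypothetical, and the cut-offs (1.5 × IQR fences, |z| > 3, |modified Z| > 3.5) are common conventions assumed here because this page does not state which thresholds OutlierPruner uses.

```python
import statistics

# Hypothetical helpers. Thresholds are assumed conventions, not cherrypick's documented values.


def iqr_outliers(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose standard Z-score (X - mean) / std exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return [False] * len(values)  # no spread, nothing to flag
    return [abs((v - mean) / std) > threshold for v in values]


def mod_zscore_outliers(values, threshold=3.5):
    """Flag values whose modified Z-score 0.6745 * (X - median) / MAD exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # assumed handling of a zero MAD
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]
```

On a DataFrame, these would run on `df[col].tolist()`, with the flags used to decide which rows to drop. The zero-spread guards are one possible choice; OutlierPruner's own handling of that case is not described here.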

Notes

  • Statistical methods (iqr, zscore, mod_zscore) require a specific column, passed as col.

  • ML-based methods (Isolation Forest, Local Outlier Factor) operate on numerical features.

  • The modified Z-score is robust to extreme values because it uses the median and MAD instead of the mean and standard deviation.
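A small worked example with illustrative numbers (not taken from the library) shows the effect. For the sample X = (10, 11, 12, 13, 100), the single extreme value pulls the mean up and inflates the standard deviation, so its standard Z-score (using the population standard deviation) stays near 2, below a commonly used cut-off of 3:

\[\bar{X} = 29.2,\quad \sigma \approx 35.41,\quad z_{100} = \frac{100 - 29.2}{35.41} \approx 2.00\]

The median and MAD are barely affected by that value, so the modified Z-score flags it clearly:

\[\mathrm{median} = 12,\quad \mathrm{MAD} = \mathrm{median}(2, 1, 0, 1, 88) = 1,\quad Z_{100} = \frac{0.6745\,(100 - 12)}{1} \approx 59.36\]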

Code

>>> pruner = OutlierPruner(df=df, method='isoforest', col=column)
>>> pruner.remove_outlier()  # removes outliers using Isolation Forest
remove_outlier()

Removes outliers from the dataset using the method and column configured on the OutlierPruner instance.
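The flag-then-drop behaviour of remove_outlier() can be sketched at the DataFrame level with pandas. This is a hypothetical stand-in (remove_outliers_sketch), not the library's code: it assumes flagged rows are dropped and a new DataFrame is returned (this page does not say whether remove_outlier() returns a copy or modifies df in place), covers only the statistical methods, and uses assumed thresholds.

```python
import pandas as pd


def remove_outliers_sketch(df: pd.DataFrame, col: str, method: str) -> pd.DataFrame:
    """Hypothetical illustration: return df without the rows flagged as outliers in df[col]."""
    s = df[col]
    if method == "iqr":
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        outlier = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    elif method == "zscore":
        # Population standard deviation (ddof=0); assumed cut-off |z| > 3
        outlier = ((s - s.mean()) / s.std(ddof=0)).abs() > 3.0
    elif method == "mod_zscore":
        med = s.median()
        mad = (s - med).abs().median()
        if mad == 0:
            outlier = pd.Series(False, index=s.index)  # assumed handling of a zero MAD
        else:
            outlier = (0.6745 * (s - med) / mad).abs() > 3.5  # assumed cut-off
    else:
        raise ValueError(f"method {method!r} is not covered by this sketch")
    # Keep only the rows that were not flagged
    return df[~outlier]
```

Filtering with a boolean mask keeps the original index labels of the surviving rows, which makes it easy to see which rows were removed.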