Anomaly Detection module
- class cherrypick.anomaly.OutlierPruner(method: Literal['iqr', 'zscore', 'mod_zscore', 'isoforest', 'lof'], df: DataFrame, col: str)
Bases: object
OutlierPruner provides statistical and ML-based methods for detecting and removing outliers from a dataset.
- Parameters:
method ({'iqr', 'zscore', 'mod_zscore', 'isoforest', 'lof'}) –
Method used for outlier detection.
iqr: Interquartile Range method
zscore: Standard Z-score normalization
mod_zscore: Modified Z-score
\[Z = \frac{0.6745 \, (X - \text{median})}{\text{MAD}}\]
Where:
median : median of the sample data
MAD : median absolute deviation
X : sample data points
isoforest: Isolation Forest (ensemble-based anomaly detection)
lof: Local Outlier Factor (density-based detection)
df (pandas.DataFrame) – Input dataset on which outlier pruning will be applied.
col (str) – Column name used for outlier detection in statistical methods.
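The modified Z-score formula listed under mod_zscore can be sketched in plain Python. This is an illustrative, standalone computation and not OutlierPruner's own implementation: the helper name modified_zscores, the sample values, and the cutoff of 3.5 are assumptions (the source does not state which threshold the class applies).

```python
from statistics import median


def modified_zscores(values):
    """Return 0.6745 * (x - median) / MAD for each x (hypothetical helper)."""
    med = median(values)
    # MAD: median of the absolute deviations from the median
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]


sample = [10, 12, 11, 13, 12, 11, 95]
scores = modified_zscores(sample)

# Assumed cutoff: 3.5 is a commonly used value, not one confirmed by the source.
kept = [x for x, z in zip(sample, scores) if abs(z) <= 3.5]
```

Here the median is 12 and the MAD is 1, so only the value 95 exceeds the assumed cutoff and is dropped from kept.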
Notes
Statistical methods require a specific column (col).
ML-based methods (Isolation Forest, Local Outlier Factor) operate on numerical features.
Modified Z-score is robust to extreme values as it uses the median instead of the mean.
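The robustness note can be illustrated by scoring the same extreme value both ways. This is a minimal sketch using only the Python standard library; the sample values are made up for illustration and nothing here reflects how OutlierPruner computes its scores internally.

```python
from statistics import mean, median, pstdev

sample = [10, 12, 11, 13, 12, 11, 95]
extreme = sample[-1]

# Standard Z-score: the extreme value inflates both the mean and the
# standard deviation, which shrinks its own score.
z = (extreme - mean(sample)) / pstdev(sample)

# Modified Z-score: the median and the MAD barely move when a single
# value is extreme, so the score stays large.
med = median(sample)
mad = median(abs(x - med) for x in sample)
mod_z = 0.6745 * (extreme - med) / mad

print(f"standard z = {z:.2f}, modified z = {mod_z:.2f}")
```

On this sample the standard score of 95 stays below 3, while its modified score is far above 3.5, so a mean-based rule with a cutoff of 3 would miss a point that the median-based rule flags.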
Code
>>> from cherrypick.anomaly import OutlierPruner
>>> pruner = OutlierPruner(method='isoforest', df=df, col=column)
>>> pruner.remove_outlier()  # removes outliers using Isolation Forest
- remove_outlier()
Transforms the dataset using the configuration passed to OutlierPruner, removing the detected outliers.