Data Splitter module

cherrypick.splits.splitter(df, target: str, test_size: float) → tuple[tuple, tuple]

Split dataset into training and testing sets.

Parameters:
  • df (pandas.DataFrame) – Input dataset containing features and target variable.

  • target (str) – Column name of the target variable.

  • test_size (float) – Proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
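
To make the effect of test_size concrete, the sketch below calls scikit-learn's train_test_split directly on a made-up 10-row frame. It assumes splitter passes test_size through unchanged; the column names and random_state are illustrative and not part of cherrypick.

```python
import math

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row dataset; the column names are made up for illustration.
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

test_size = 0.25
train_df, test_df = train_test_split(df, test_size=test_size, random_state=0)

# For a float test_size, scikit-learn rounds the test row count up
# and assigns the remaining rows to the training set.
expected_test_rows = math.ceil(test_size * len(df))
print(len(train_df), len(test_df))
```

With 10 rows and test_size=0.25, the test set gets 3 rows and the training set the remaining 7.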

Examples

>>> train, test = splitter(df=df, target=target_column, test_size=0.25)

Returns:

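A tuple of two tuples, (X_train, y_train) and (X_test, y_test):

  • X_train (pandas.DataFrame) – Features of the training set.

  • y_train (pandas.Series) – Target values of the training set.

  • X_test (pandas.DataFrame) – Features of the test set.

  • y_test (pandas.Series) – Target values of the test set.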

Return type:

tuple[tuple, tuple]

Notes

  • Features are obtained by dropping the target column from df.

  • Internally uses train_test_split from scikit-learn.
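
The two notes above can be sketched as follows. This is a minimal approximation, not the cherrypick source: splitter_sketch, the sample column names, the explicit range check, and the fixed random_state are all assumptions made for illustration. The nested return shape follows the documented tuple[tuple, tuple] signature.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def splitter_sketch(df: pd.DataFrame, target: str, test_size: float):
    """Hypothetical stand-in for cherrypick.splits.splitter."""
    # Assumed validation; the real function may rely on scikit-learn's own checks.
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")

    # Features are everything except the target column.
    X = df.drop(columns=[target])
    y = df[target]

    # random_state is fixed here only to make the sketch reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    return (X_train, y_train), (X_test, y_test)


# Made-up data for illustration.
df = pd.DataFrame(
    {
        "age": [22, 35, 41, 29, 53, 38, 47, 31],
        "income": [30, 52, 61, 45, 80, 58, 72, 49],
        "churned": [0, 1, 0, 0, 1, 0, 1, 0],
    }
)

train, test = splitter_sketch(df, target="churned", test_size=0.25)
X_train, y_train = train
X_test, y_test = test
print(X_train.shape, X_test.shape)
```

Unpacking train and test separately keeps each feature frame paired with its target series, which matches the train, test form used in the example call.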