Data Splitter module
- cherrypick.splits.splitter(df, target: str, test_size: float) → tuple[tuple, tuple]
Split dataset into training and testing sets.
- Parameters:
df (pandas.DataFrame) – Input dataset containing features and target variable.
target (str) – Column name of the target variable.
test_size (float) – Proportion of the dataset to include in the test split. Must be between 0.0 and 1.0.
Examples
>>> train, test = splitter(df=df, target=target_column, test_size=0.25)
- Returns:
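A tuple of two tuples, (X_train, y_train) and (X_test, y_test):
X_train (pandas.DataFrame) – Features of the training split.
y_train (pandas.Series) – Target values of the training split.
X_test (pandas.DataFrame) – Features of the test split.
y_test (pandas.Series) – Target values of the test split.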
- Return type:
tuple
Notes
Features are obtained by dropping the target column from df.
Internally uses train_test_split from scikit-learn.
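The behaviour described in the notes can be approximated with the sketch below. It is not the library's source: the name splitter_sketch and the toy DataFrame are hypothetical, and the grouping of the return value into (X_train, y_train) and (X_test, y_test) is inferred from the documented return type tuple[tuple, tuple]. Whether the real function sets random_state, shuffle, or stratify is not stated here, so the sketch leaves them at scikit-learn's defaults.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def splitter_sketch(df: pd.DataFrame, target: str, test_size: float):
    """Hypothetical stand-in for cherrypick.splits.splitter (assumed behaviour)."""
    # Features: every column except the target.
    X = df.drop(columns=[target])
    # Target: the single named column, as a Series.
    y = df[target]
    # train_test_split returns X_train, X_test, y_train, y_test in that order.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size
    )
    # Assumed grouping: one (features, target) pair per split.
    return (X_train, y_train), (X_test, y_test)


# Toy data for illustration only.
df = pd.DataFrame(
    {"a": range(8), "b": range(8, 16), "label": [0, 1] * 4}
)
(X_train, y_train), (X_test, y_test) = splitter_sketch(
    df, target="label", test_size=0.25
)
print(X_train.shape, X_test.shape)
```

Unpacking each split as a (features, target) pair keeps the two halves of a split together, which matches the train, test = splitter(...) form shown in the example.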