2 回答
TA贡献1921条经验 获得超9个赞
当然,您可以使用任何sklearn操作并将其应用于groupby对象。
首先,一个方便的包装器:
import typing
import pandas as pd
class SklearnWrapper:
def __init__(self, transform: typing.Callable):
self.transform = transform
def __call__(self, df):
transformed = self.transform.fit_transform(df.values)
return pd.DataFrame(transformed, columns=df.columns, index=df.index)
这将应用sklearn您传递给它的任何变换到一个组。
最后简单的用法:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
data = load_iris()
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["class"] = data["target"]
df_rescaled = (
df.groupby("class")
.apply(SklearnWrapper(StandardScaler()))
.drop("class", axis="columns")
)
编辑:你几乎可以用SklearnWrapper. 这是为每个组转换和反转此操作的示例(例如,不要覆盖转换对象) - 每次看到新组时只需重新拟合对象(并将其添加到list)。
sklearn's为了更容易使用,我复制了一些功能(您可以通过将适当的传递string给_call_with_function内部方法来使用您想要的任何功能扩展它):
class SklearnWrapper:
def __init__(self, transformation: typing.Callable):
self.transformation = transformation
self._group_transforms = []
# Start with -1 and for each group up the pointer by one
self._pointer = -1
def _call_with_function(self, df: pd.DataFrame, function: str):
# If pointer >= len we are making a new apply, reset _pointer
if self._pointer >= len(self._group_transforms):
self._pointer = -1
self._pointer += 1
return pd.DataFrame(
getattr(self._group_transforms[self._pointer], function)(df.values),
columns=df.columns,
index=df.index,
)
def fit(self, df):
self._group_transforms.append(self.transformation.fit(df.values))
return self
def transform(self, df):
return self._call_with_function(df, "transform")
def fit_transform(self, df):
self.fit(df)
return self.transform(df)
def inverse_transform(self, df):
return self._call_with_function(df, "inverse_transform")
用法(组变换,逆运算并再次应用):
data = load_iris()
df = pd.DataFrame(data["data"], columns=data["feature_names"])
df["class"] = data["target"]
# Create scaler outside the class
scaler = SklearnWrapper(StandardScaler())
# Fit and transform data (holding state)
df_rescaled = df.groupby("class").apply(scaler.fit_transform)
# Inverse the operation
df_inverted = df_rescaled.groupby("class").apply(scaler.inverse_transform)
# Apply transformation once again
df_transformed = (
df_inverted.groupby("class")
.apply(scaler.transform)
.drop("class", axis="columns")
)
TA贡献1856条经验 获得超5个赞
我更新了@Szymon Maszke 代码:
class SklearnWrapper:
def __init__(self, transformation: typing.Callable):
self.transformation = transformation
self._group_transforms = []
# Start with -1 and for each group up the pointer by one
self._pointer = -1
def _call_with_function(self, df: pd.DataFrame, function: str):
# If pointer >= len we are making a new apply, reset _pointer
if self._pointer == len(self._group_transforms)-1 and function=="inverse_transform":
self._pointer = -1
self._pointer += 1
print(self._pointer)
return pd.DataFrame(
getattr(self._group_transforms[self._pointer], function)(df.values),
columns=df.columns,
index=df.index,
)
def fit(self, df):
scaler = copy(self.transformation)
self._group_transforms.append(scaler.fit(df.values))
return self
def transform(self, df):
return self._call_with_function(df, "transform")
def fit_transform(self, df):
self.fit(df)
return self.transform(df)
def inverse_transform(self, df):
return self._call_with_function(df, "inverse_transform")
StandardScaler()中没有正确存储_group_transforms,所以我创建了一个副本(使用副本库)并存储它(也许使用 OOP 有更好的方法来做到这一点)。
添加回答
举报