I am trying to use scikit-learn's OneHotEncoder to preprocess my data. Clearly I am doing something wrong. Here is my sample program:

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

cat = ['ok', 'ko', 'maybe', 'maybe']
label_encoder = LabelEncoder()
label_encoder.fit(cat)
cat = label_encoder.transform(cat)
# returns [2 0 1 1], which seems good.
print(cat)
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
res = ct.fit_transform([cat])
print(res)

Actual result:
[[1.0 0 1 1]]

Expected result, something like:
[
 [ 1 0 0 ]
 [ 0 0 1 ]
 [ 0 1 0 ]
 [ 0 1 0 ]
]

Can anyone point out what I am missing?
1 Answer
慕码人2483693
You could consider using numpy and MultiLabelBinarizer instead.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
cat = np.array([['ok', 'ko', 'maybe', 'maybe']])
m = MultiLabelBinarizer()
print(m.fit_transform(cat.T))
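The column order of that output follows the sorted labels, which the fitted binarizer exposes through its `classes_` attribute. A minimal sketch, reusing the same data, of how one might read that mapping (the variable names `encoded` and `labels` are only illustrative):

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

cat = np.array([['ok', 'ko', 'maybe', 'maybe']])
m = MultiLabelBinarizer()

# cat.T has shape (4, 1): each row is one sample holding a single label.
encoded = m.fit_transform(cat.T)

# classes_ lists the labels in the same order as the output columns.
labels = [str(c) for c in m.classes_]
print(labels)   # ['ko', 'maybe', 'ok']
print(encoded)  # row 0 is 'ok', so its 1 sits in the last column
```

Note that MultiLabelBinarizer is designed for samples that can carry several labels at once; here each sample carries exactly one, so the result happens to coincide with a one-hot encoding.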
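If you still want to stick with your own approach, you only need to change one line. Passing [cat] gives the transformer a single row with four features, so only column 0 is encoded and the rest is passed through. The labels have to be a column instead:
# [cat] is still a single row, not a column
# res = ct.fit_transform([cat])  => remove this line
# this should work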
res = ct.fit_transform(np.array([cat]).T)
Out[2]:
array([[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.],
[0., 1., 0.]])
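As a side note, OneHotEncoder accepts string categories directly, so the LabelEncoder and ColumnTransformer steps may not be needed for a single column. The sketch below is one possible simplification under that assumption, not the original poster's code; it also prints the two shapes to show the row-versus-column difference described above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

cat = ['ok', 'ko', 'maybe', 'maybe']

# One sample with four features: the shape that [cat] produced.
row = np.array([cat])
# Four samples with one feature: the shape the encoder needs.
column = np.array(cat).reshape(-1, 1)
print(row.shape, column.shape)  # (1, 4) (4, 1)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for display.
one_hot = encoder.fit_transform(column).toarray()

# categories_[0] gives the label behind each output column.
categories = [str(c) for c in encoder.categories_[0]]
print(categories)  # ['ko', 'maybe', 'ok']
print(one_hot)
```

The printed matrix should match the `Out[2]` array above, since both approaches order the columns by sorted category.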