1 Answer
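From the documentation:
The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one-dimensional array is passed; in the second case, a two-dimensional array with one column, i.e. a column vector.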
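The shape difference can be seen with plain pandas indexing. This is only a sketch that mimics the two selector styles on a hypothetical toy frame; it does not call sklearn_pandas itself. scikit-learn transformers such as PolynomialFeatures expect the 2-D form.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({"median_income": [8.3252, 8.3014, 7.2574]})

# Selecting with a plain string gives a 1-D array, shape (n,).
as_string = toy["median_income"].to_numpy()

# Selecting with a one-element list gives a 2-D column vector, shape (n, 1).
as_list = toy[["median_income"]].to_numpy()

print(as_string.shape)
print(as_list.shape)
```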
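All columns must be passed with the same type of column selector. In this case that type is a list, because a list is needed to keep some columns untransformed.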
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn_pandas import DataFrameMapper
# load data
df = pd.read_csv('https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv')
# create houseAge_income
df['houseAge_income'] = df.housing_median_age.mul(df.median_income)
# configure mapper with all columns passed as lists
mapper = DataFrameMapper([
    (['houseAge_income'], PolynomialFeatures(2)),
    (['median_income'], PolynomialFeatures(2)),
    (['latitude', 'housing_median_age', 'total_rooms', 'population',
      'median_house_value', 'ocean_proximity'], None),
])
# fit
poly_feature = mapper.fit_transform(df)
# display(pd.DataFrame(poly_feature).head())
   0       1           2  3       4       5      6   7     8     9         10        11
0  1  341.33  1.1651e+05  1  8.3252  69.309  37.88  41   880   322  4.526e+05  NEAR BAY
1  1  174.33       30391  1  8.3014  68.913  37.86  21  7099  2401  3.585e+05  NEAR BAY
2  1  377.38  1.4242e+05  1  7.2574   52.67  37.85  52  1467   496  3.521e+05  NEAR BAY
3  1  293.44       86108  1  5.6431  31.845  37.85  52  1274   558  3.413e+05  NEAR BAY
4  1     200       40001  1  3.8462  14.793  37.85  52  1627   565  3.422e+05  NEAR BAY
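Reading the output: columns 0 to 2 are the expansion of houseAge_income, columns 3 to 5 are the expansion of median_income, and the remaining columns are passed through unchanged. The sketch below checks that pattern by hand for a single column, assuming scikit-learn's default include_bias=True. The three input values are copied from the median_income column above.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 2-D column vector, shape (3, 1), with values taken from the output above.
x = np.array([[8.3252], [8.3014], [7.2574]])

expanded = PolynomialFeatures(2).fit_transform(x)

# For one input column, degree 2 yields the columns [1, x, x**2].
by_hand = np.column_stack([np.ones(len(x)), x[:, 0], x[:, 0] ** 2])

print(np.allclose(expanded, by_hand))
print(expanded.round(3))
```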