I have data for machine learning research, but I am stuck on these string features. I want to map them (object) to numbers (int64). For example, for the feature workclass, build a map (dict) such as {'private': 0, 'State-gov': 1, etc}. So how should I handle this in a DataFrame? Should I write a for loop that finds the n distinct classes in a feature and builds an n-key map for each object feature?

# Code for reading the data
import pandas as pd

df_trainFeatures = pd.read_csv('data/trainFeatures.csv')
object_features = ['workclass', 'education', 'Marital-status', 'occupation',
                   'relationship', 'race', 'sex', 'native-country']

# List the data type of each column
for i in df_trainFeatures:
    print(df_trainFeatures[i].dtype, i)

# Output
int64 age
object workclass
int64 fnlwgt
object education
int64 education-num
object Marital-status
object occupation
object relationship
object race
object sex
int64 capital-gain
int64 capital-loss
int64 hours-per-week
object native-country

A subset of the DataFrame is as follows:
1 Answer
眼眸繁星
pandas.get_dummies(data)
It converts categorical variables into dummy/indicator variables.
Or, in your case:
pandas.get_dummies(df_trainFeatures['workclass'])
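Note that get_dummies produces one indicator column per category, not the single integer column the question describes. A minimal sketch of both approaches follows, assuming a small made-up DataFrame in place of df_trainFeatures (the rows, and the names df, encoded, mappings and one_hot, are illustrative and not from the original post). It uses pandas.factorize for the integer mapping, which is one possible choice rather than the only one.

```python
import pandas as pd

# Toy stand-in for df_trainFeatures; the values are made up for illustration.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "Private", "Private", "State-gov"],
    "sex": ["Male", "Male", "Female", "Male"],
})
object_cols = ["workclass", "sex"]

# Option 1: integer codes, one dict per feature (what the question asks for).
# pd.factorize numbers the labels in order of first appearance;
# missing values would be coded as -1.
encoded = df.copy()
mappings = {}
for col in object_cols:
    codes, uniques = pd.factorize(encoded[col])
    encoded[col] = codes
    mappings[col] = {label: code for code, label in enumerate(uniques)}

# Option 2: one-hot encoding (what the answer suggests).
# Each listed column is replaced by one indicator column per category.
one_hot = pd.get_dummies(df, columns=object_cols)

print(mappings)
print(encoded)
print(one_hot)
```

Integer codes impose an arbitrary order on the categories, which is usually harmless for tree-based models but can mislead linear models; one-hot encoding avoids that at the cost of more columns.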