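How can I evaluate the correlation between each category of each variable?

I have a DataFrame df:

   level         job
0  good          golfer
1  bad           footballer
2  intermediate  musician
...

The expected output is a correlation table or something similar:

              golfer  footballer  musician  ...
good
bad
intermediate

I tried:

df['level'] = df['level'].astype('category').cat.codes
df['job'] = df['job'].astype('category').cat.codes
df.corr()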
2 Answers
qq_遁去的一_1
Contributed 1725 experience points · 7+ upvotes
You can use pd.crosstab:
df1 = pd.crosstab(df.level, df.job)
df1
With my sample data, this gives the output:
job footballer golfer musician
level
bad 1 3 3
good 3 3 2
intermediate 1 2 2
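Then divide by each column's total (df1.sum() sums down each column by default, so the values in every job column add up to 1):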
df1 / df1.sum()
Output:
job footballer golfer musician
level
bad 0.2 0.375 0.428571
good 0.6 0.375 0.285714
intermediate 0.2 0.250 0.285714
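The proportions above can also be requested directly from pd.crosstab through its normalize parameter. The following is a minimal sketch, not the answerer's own code: the sample DataFrame is hypothetical, built only so that its counts match the table shown in this answer.

```python
import pandas as pd

# Hypothetical sample data, constructed to reproduce the counts shown above
counts = {
    ("bad", "footballer"): 1, ("bad", "golfer"): 3, ("bad", "musician"): 3,
    ("good", "footballer"): 3, ("good", "golfer"): 3, ("good", "musician"): 2,
    ("intermediate", "footballer"): 1, ("intermediate", "golfer"): 2,
    ("intermediate", "musician"): 2,
}
rows = [pair for pair, n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["level", "job"])

# Share of each level within a job: every column sums to 1
by_job = pd.crosstab(df.level, df.job, normalize="columns")

# Share of each job within a level: every row sums to 1
by_level = pd.crosstab(df.level, df.job, normalize="index")

print(by_job)
print(by_level)
```

Which direction to normalize in depends on the question being asked: normalize="columns" answers "among footballers, what fraction are good?", while normalize="index" answers "among good players, what fraction are footballers?".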
慕丝7291255
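Contributed 1859 experience points · 6+ upvotes
Judging from the expected output, you need a frequency table. I suspect this could be done more neatly, but one way is:
count_combos = pd.Series(zip(df.level, df.job)).value_counts()
count_combos.index = pd.MultiIndex.from_tuples(count_combos.index)
count_combos.unstack()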
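One possible tidier variant of the same idea is to group by both columns and unstack the group sizes. This is only a sketch under assumptions: the five-row DataFrame is hypothetical, and fill_value=0 is an added choice so that combinations that never occur show 0 rather than NaN.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "level": ["good", "bad", "intermediate", "good", "bad"],
    "job": ["golfer", "footballer", "musician", "golfer", "musician"],
})

# Count each (level, job) pair, then pivot job into columns;
# fill_value=0 replaces the NaN that unstack() would leave for unseen pairs
freq = df.groupby(["level", "job"]).size().unstack(fill_value=0)
print(freq)
```

For this data the result holds the same counts as pd.crosstab(df.level, df.job).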