IIUC, you are mainly looking for the `corr` method of `DataFrame`. Consider this example:
```python
import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame(np.random.rand(30, 5)).add_prefix("feature_")
df["year"] = np.repeat(["2012", "2013", "2014"], 10)
print(df.head())  # first 5 rows. Note that there are 30 rows
```

```
   feature_0  feature_1  feature_2  feature_3  feature_4  year
0   0.548814   0.715189   0.602763   0.544883   0.423655  2012
1   0.645894   0.437587   0.891773   0.963663   0.383442  2012
2   0.791725   0.528895   0.568045   0.925597   0.071036  2012
3   0.087129   0.020218   0.832620   0.778157   0.870012  2012
4   0.978618   0.799159   0.461479   0.780529   0.118274  2012
```
Subset the numeric columns you want in the correlation matrix (in this example, I use `.filter` to keep only the "feature_X" columns) and then call `DataFrame.corr`:
```python
cormat = df.filter(like="feature").corr()
print(cormat)
```

```
           feature_0  feature_1  feature_2  feature_3  feature_4
feature_0   1.000000   0.004582   0.412658   0.269969   0.151162
feature_1   0.004582   1.000000  -0.200808   0.140620  -0.138652
feature_2   0.412658  -0.200808   1.000000  -0.019439   0.284211
feature_3   0.269969   0.140620  -0.019439   1.000000  -0.063653
feature_4   0.151162  -0.138652   0.284211  -0.063653   1.000000
```
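Two related points, shown as a minimal sketch that rebuilds the same example data. On pandas 1.5 or later, `df.corr(numeric_only=True)` should drop the non-numeric `year` column for you and give the same matrix as the `.filter` approach (as far as I know, pandas 2.0+ defaults `numeric_only` to `False`, so calling plain `df.corr()` on this frame would fail on the string column). Also, `corr` computes Pearson correlation by default, so `np.corrcoef` on the same columns should reproduce the numbers. The names `cormat_numeric` and `np_cormat` are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the same example data as above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(30, 5)).add_prefix("feature_")
df["year"] = np.repeat(["2012", "2013", "2014"], 10)

# Approach from the answer: keep only the "feature_*" columns, then correlate
cormat = df.filter(like="feature").corr()

# Alternative (assumes pandas >= 1.5): let corr skip the non-numeric "year" column
cormat_numeric = df.corr(numeric_only=True)

# corr defaults to Pearson; NumPy's corrcoef computes the same matrix
# (rowvar=False treats each column as one variable)
np_cormat = np.corrcoef(df.filter(like="feature").to_numpy(), rowvar=False)

print(np.allclose(cormat_numeric.to_numpy(), cormat.to_numpy()))
print(np.allclose(cormat.to_numpy(), np_cormat))
```

Explicit `.filter` is still handy when the frame also contains numeric columns you do not want in the matrix, since `numeric_only=True` would keep all of them.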
If you want a separate correlation matrix for each group of some other variable, you can call `.groupby` first:
```python
annual_cormat = df.groupby("year").corr()
print(annual_cormat)
```

```
                feature_0  feature_1  feature_2  feature_3  feature_4
year
2012 feature_0   1.000000   0.359721  -0.266740   0.285998  -0.526528
     feature_1   0.359721   1.000000  -0.330484   0.180620  -0.580236
     feature_2  -0.266740  -0.330484   1.000000   0.262000   0.428895
     feature_3   0.285998   0.180620   0.262000   1.000000  -0.144745
     feature_4  -0.526528  -0.580236   0.428895  -0.144745   1.000000
2013 feature_0   1.000000   0.135499   0.704653   0.081326   0.453111
     feature_1   0.135499   1.000000  -0.385677   0.732700  -0.065941
     feature_2   0.704653  -0.385677   1.000000  -0.607016   0.143572
     feature_3   0.081326   0.732700  -0.607016   1.000000   0.107971
     feature_4   0.453111  -0.065941   0.143572   0.107971   1.000000
2014 feature_0   1.000000  -0.624004   0.056185   0.351376  -0.038286
     feature_1  -0.624004   1.000000   0.103911  -0.284685   0.266124
     feature_2   0.056185   0.103911   1.000000   0.249860   0.145773
     feature_3   0.351376  -0.284685   0.249860   1.000000  -0.347361
     feature_4  -0.038286   0.266124   0.145773  -0.347361   1.000000
```
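The grouped result has a two-level row index (year, feature). A minimal sketch of pulling pieces back out of it, assuming the same example data: `.loc` with a year label returns that year's 5x5 matrix, and `.xs` on the inner level (position 1, since that level is unnamed) collects one feature's row across all years. The names `cormat_2013` and `feature_1_by_year` are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(30, 5)).add_prefix("feature_")
df["year"] = np.repeat(["2012", "2013", "2014"], 10)

annual_cormat = df.groupby("year").corr()

# Outer index level = year: select one year's full correlation matrix
cormat_2013 = annual_cormat.loc["2013"]

# Inner index level = feature: one feature's correlations, one row per year
feature_1_by_year = annual_cormat.xs("feature_1", level=1)

print(cormat_2013)
print(feature_1_by_year["feature_3"])
```

Correlating the 2013 rows directly with `df[df["year"] == "2013"].filter(like="feature").corr()` should give the same matrix as `annual_cormat.loc["2013"]`, which is a quick way to sanity-check the grouped result.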