
Correlation between two variables (categorical and continuous) in Python


Python

白猪掌柜的 2022-08-25 15:31:23

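I'm struggling with what should be a simple problem: I need to check whether the customer's location has an effect on the number of defects. The dataset looks like this. Location has 50 values and is categorical by nature; defects is continuous.

location  defects
a         20
b         30
c         40
d         50
e         60
f         70
g         80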
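The thread does not mention it, but a standard way to measure association between a categorical variable and a continuous one is the correlation ratio η (eta). Here η² is the share of the total variation in defects that is explained by differences between the location means, which is the same decomposition one-way ANOVA is built on. Below is a minimal numpy sketch, swapped in as an illustration rather than taken from either answer; the function name correlation_ratio and the sample data are hypothetical.

```python
import numpy as np


def correlation_ratio(categories, values):
    """Correlation ratio eta between a categorical and a continuous variable.

    eta**2 = between-group sum of squares / total sum of squares, so eta lies
    in [0, 1]: 0 means all group means are equal, 1 means all variation is
    between groups. Unlike Pearson r on label codes, it does not depend on
    how the categories are numbered.
    """
    categories = np.asarray(categories)
    values = np.asarray(values, dtype=float)
    # Map each category label to an integer group index (the order is irrelevant here)
    _, group_idx = np.unique(categories, return_inverse=True)
    grand_mean = values.mean()
    # Per-group sizes and means
    counts = np.bincount(group_idx)
    group_means = np.bincount(group_idx, weights=values) / counts
    ss_between = np.sum(counts * (group_means - grand_mean) ** 2)
    ss_total = np.sum((values - grand_mean) ** 2)
    if ss_total == 0:
        return 0.0  # defects do not vary at all, so nothing to explain
    return float(np.sqrt(ss_between / ss_total))


# Hypothetical data: several observations per location
location = ['a', 'a', 'b', 'b', 'c', 'c']
defects = [20, 22, 30, 28, 40, 41]
print(correlation_ratio(location, defects))
```

Note that if each location appears only once, as in the sample table above, every group mean equals its single value, so η is 1 whenever the defect counts are not all equal, no matter what they are. You need several observations per location before η (or a significance test such as scipy.stats.f_oneway) says anything about the effect of location.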


2 Answers

慕容3067478


It's quite simple. You can use LabelEncoder to convert the categorical variable into numeric values.

Example:

```python
from sklearn.preprocessing import LabelEncoder
import numpy as np

# data
location = np.array(['a', 'b', 'a'])
defects = np.array([1, 2, 1])

# the encoder
lb_make = LabelEncoder()
converted = lb_make.fit_transform(location)  # convert to numerical
print(converted)  # [0 1 0]

print(np.corrcoef(defects, converted)[0][1])  # 0.9999999999999998, i.e. 1 up to floating-point rounding
```
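One caveat about this approach: LabelEncoder numbers the categories in sorted order, which is an arbitrary choice for a nominal variable like location. With only two categories (as in the example) a different coding can only flip the sign of r, but with three or more categories the magnitude changes as well, so the resulting correlation is not a property of the data alone. The numpy sketch below makes this concrete with hypothetical data for three locations; encoded_correlation is an illustrative helper, not a library function.

```python
import numpy as np


def encoded_correlation(location, defects, order):
    """Pearson r between defects and integer codes given by each label's position in `order`.

    Hypothetical helper: passing the sorted labels reproduces LabelEncoder's codes;
    any other order is an equally valid integer coding of the same categories.
    """
    code = {label: i for i, label in enumerate(order)}
    converted = np.array([code[label] for label in location])
    return np.corrcoef(defects, converted)[0, 1]


# Hypothetical data: three locations, two observations each
location = ['a', 'b', 'c', 'a', 'b', 'c']
defects = [10, 30, 20, 12, 28, 22]

r_sorted = encoded_correlation(location, defects, ['a', 'b', 'c'])    # LabelEncoder's coding
r_shuffled = encoded_correlation(location, defects, ['a', 'c', 'b'])  # same categories, different codes
print(round(r_sorted, 3), round(r_shuffled, 3))  # 0.549 0.989
```

Because the number changes with the coding, a measure that does not depend on how categories are numbered (for example the correlation ratio, or a one-way ANOVA across locations) is usually a better fit for a nominal variable.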

2022-08-25

万千封印


So you basically want to compute ratio_for_location = number_of_defects_for_location / total_number_of_whatever_for_location, and then check for outliers, i.e. find the function defect_ratio(location)?
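The ratio this answer describes can be sketched as follows, assuming you have some per-location total to normalize by (units delivered, orders, or whatever total_number_of_whatever_for_location stands for); the totals below are made-up numbers. The z-score rule for flagging outliers, its threshold of 2, and the function name defect_ratio_outliers are hypothetical choices for illustration.

```python
import numpy as np


def defect_ratio_outliers(locations, defects, totals, z_threshold=2.0):
    """Compute defect_ratio(location) and flag locations with unusual ratios.

    `totals` stands in for "total_number_of_whatever_for_location"; it is a
    hypothetical exposure count per location. The z-score rule and threshold
    are illustrative assumptions, not a standard test.
    """
    defects = np.asarray(defects, dtype=float)
    totals = np.asarray(totals, dtype=float)
    ratios = defects / totals  # ratio_for_location
    ratio_by_location = dict(zip(locations, ratios.tolist()))

    spread = ratios.std()
    if spread == 0:
        return ratio_by_location, []  # all ratios identical: nothing stands out
    z_scores = (ratios - ratios.mean()) / spread
    outliers = [loc for loc, z in zip(locations, z_scores) if abs(z) > z_threshold]
    return ratio_by_location, outliers


# Defect counts from the question; the totals are made-up numbers for illustration
locations = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
defects = [20, 30, 40, 50, 60, 70, 80]
totals = [1000, 1500, 2000, 2500, 3000, 3500, 400]

ratios, outliers = defect_ratio_outliers(locations, defects, totals)
print(ratios)
print(outliers)  # ['g']
```

Normalizing by a total keeps a location from looking bad merely because it handles more volume, which the raw defect counts in the question cannot distinguish: here g has the highest raw count and also the only unusual ratio, while a to f all sit at 0.02.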

2022-08-25
