3 answers
User with 1868 experience points, 4+ likes
Your function is missing an else branch:
def cluster_name(df):
    if df['cluster'] == 1:
        value = 'A'
    elif df['cluster'] == 2:
        value = 'B'
    elif df['cluster'] == 3:
        value = 'C'
    elif df['cluster'] == 4:
        value = 'D'
    elif df['cluster'] == 5:
        value = 'E'
    elif df['cluster'] == 6:
        value = 'F'
    elif df['cluster'] == 7:
        value = 'G'
    else:
        value = ...
    return value
Otherwise, value is never set when df['cluster'] is not one of the values {1, 2, ..., 7}, and you get an exception.
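To make the failure concrete, here is a minimal sketch that reproduces the exception without pandas. The names cluster_name_no_else and try_label are hypothetical, the function is shortened to two branches, and plain dicts stand in for DataFrame rows; the exact error message text varies between Python versions, so only the exception type is reported.

```python
def cluster_name_no_else(row):
    # Same shape as the function in the question, shortened to two branches
    # and deliberately left without an else branch.
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    # If no branch matched, 'value' was never assigned, so this line raises.
    return value


def try_label(row):
    # Hypothetical helper: return the label, or the exception type on failure.
    try:
        return cluster_name_no_else(row)
    except UnboundLocalError as exc:
        return f"error: {type(exc).__name__}"


print(try_label({'cluster': 1}))   # a value covered by the branches
print(try_label({'cluster': 99}))  # a value no branch handles
```

Adding an else branch that assigns a default gives every path through the function a value to return.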
User with 1111 experience points, 0+ likes
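Manually writing an if-else function is overrated, and it can easily miss a condition. Since you are assigning letters as the 'cluster_name', use string.ascii_uppercase to get the uppercase letters, zip them with the sorted unique values in 'cluster', build a dict from the zipped pairs, and use .map to create the 'cluster_name' column. Note that zip stops at the shorter input, so this covers at most 26 distinct cluster values.

This implementation builds the mapping from the unique values in the column, so "local variable 'value' referenced before assignment" cannot occur. In your case, the error happens because return value is executed when the column contains a value that matches none of your if-else conditions, which means value was never assigned in the function.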
import pandas as pd
import string
# test dataframe
df = pd.DataFrame({'cluster': range(1, 11)})
# unique values from the cluster column
clusters = sorted(df.cluster.unique())
# create a dict to map
cluster_map = dict(zip(clusters, string.ascii_uppercase))
# create the cluster_name column
df['cluster_name'] = df.cluster.map(cluster_map)
# df
   cluster cluster_name
0        1            A
1        2            B
2        3            C
3        4            D
4        5            E
5        6            F
6        7            G
7        8            H
8        9            I
9       10            J
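One point worth checking before adopting this: zipping the sorted unique values with the alphabet assigns letters by rank, not by the cluster number itself. The sketch below, using a made-up column with gaps, contrasts that with a fixed number-to-letter mapping; fixed_map, by_rank and by_value are illustrative names, not part of the answer above.

```python
import string

import pandas as pd

# Made-up data with gaps: clusters 1, 3, 4 and 6 never occur.
df = pd.DataFrame({'cluster': [2, 5, 5, 7]})

# Rank-based mapping, as in the answer: the smallest value present gets 'A'.
clusters = sorted(df['cluster'].unique())
cluster_map = dict(zip(clusters, string.ascii_uppercase))
by_rank = df['cluster'].map(cluster_map).tolist()

# Value-based mapping (an assumption about what might be wanted): 1 -> 'A', 2 -> 'B', ...
fixed_map = dict(enumerate(string.ascii_uppercase, start=1))
by_value = df['cluster'].map(fixed_map).tolist()

print(by_rank)   # letters follow the order of the values present
print(by_value)  # letters follow the cluster number itself
```

When the column contains every value from 1 upward, as in the test DataFrame above, the two mappings coincide.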
User with 1796 experience points, 10+ likes
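It seems your question has already been answered in the comments, so I will suggest a more pandas-oriented way to solve the problem. Using apply(axis=1) on a DataFrame is very slow and almost never necessary (it amounts to iterating over the rows of the DataFrame), so a better approach is to use a vectorized method. The simplest is to define the cluster -> cluster_name mapping in a dictionary and use the map method: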
import pandas as pd

df = pd.DataFrame({"cluster": [1, 2, 3, 4, 5, 6, 7]})
# repeat this dataframe 10000 times
df = pd.concat([df] * 10000)
The apply method:
def mapping_func(row):
    if row['cluster'] == 1:
        value = 'A'
    elif row['cluster'] == 2:
        value = 'B'
    elif row['cluster'] == 3:
        value = 'C'
    elif row['cluster'] == 4:
        value = 'D'
    elif row['cluster'] == 5:
        value = 'E'
    elif row['cluster'] == 6:
        value = 'F'
    elif row['cluster'] == 7:
        value = 'G'
    else:
        # This is a "catch-all" for any row whose cluster value is not 1-7
        value = "Z"
    return value
%timeit df.apply(mapping_func, axis=1)
# 1.32 s ± 91.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The .map method:
mapping_dict = {
    1: "A",
    2: "B",
    3: "C",
    4: "D",
    5: "E",
    6: "F",
    7: "G"
}
# the `fillna` is our "catch-all" statement.
# essentially if `map` encounters a value not in the dictionary
# it will place a NaN there. So I fill those NaNs with "Z" to
# be consistent with the above example
%timeit df["cluster"].map(mapping_dict).fillna("Z")
# 4.87 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
We can see that map with a dictionary is much faster than apply, and it also avoids a long chain of if/elif statements.
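As a sanity check that the two approaches label rows identically, including the catch-all case, here is a small sketch on a four-row frame. It is an illustration rather than a benchmark: the value 8 is a made-up out-of-range cluster, and mapping_func is condensed to a dictionary lookup with a default, which behaves the same as the if/elif chain for these inputs.

```python
import pandas as pd

# 8 is deliberately outside the mapping to exercise the catch-all.
df = pd.DataFrame({"cluster": [1, 2, 7, 8]})

mapping_dict = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G"}


def mapping_func(row):
    # Condensed stand-in for the if/elif chain: unknown clusters become "Z".
    return mapping_dict.get(row["cluster"], "Z")


via_apply = df.apply(mapping_func, axis=1).tolist()
via_map = df["cluster"].map(mapping_dict).fillna("Z").tolist()

print(via_apply)
print(via_map)
```

One difference to keep in mind: fillna("Z") also fills rows where the cluster value itself is missing, since map leaves those as NaN as well.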