6 Answers
Contributor: 2003 experience points · 2+ upvotes
You can do this with a dot product:
import pandas as pd
df = pd.DataFrame(
{
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'F': [1,0,1]
}
)
df['new_column'] = df.dot(df.columns).str.join(",")
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
Update: for column names with more than one letter, @BEN_YO suggested a very nice solution:
df.dot(df.columns+',').str[:-1]
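To see why the dot product works, and why the separator matters for longer names, here is a minimal sketch. The `col1`-style column names and the variable names are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 1], "col2": [1, 0, 0], "col3": [1, 0, 1]})

# For each row, dot() computes sum(value * column_name). In Python,
# 1 * "col1" == "col1" and 0 * "col1" == "", so the sum concatenates
# the names of the columns that hold 1.
joined = df.dot(df.columns)

# str.join(",") puts a comma between every character of the string,
# which is only correct when each name is a single character.
per_char = joined.str.join(",")

# Appending the separator to each name first keeps longer names intact;
# str[:-1] then drops the one trailing comma.
per_name = df.dot(df.columns + ",").str[:-1]

print(per_char.tolist())
print(per_name.tolist())
```

Rows with no 1 values simply yield an empty string in both variants.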
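Contributor: 1836 experience points · 3+ upvotes
If the column names are longer than one character, use DataFrame.dot after appending a separator to each column name, then strip the trailing separator with Series.str.rstrip: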
df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")
print (df)
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
df = pd.DataFrame({
'col1': [0,0,1],
'col2': [1,0,0],
'col3': [0,0,0,],
'col4': [1,0,1],
'col5': [1,0,1]})
df['new_column'] = df.dot(df.columns + ',').str.rstrip(",")
#alternative
#df['new_column'] = (df @ (df.columns + ',')).str.rstrip(",")
print (df)
col1 col2 col3 col4 col5 new_column
0 0 1 0 1 1 col2,col4,col5
1 0 0 0 0 0
2 1 0 0 1 1 col1,col4,col5
Alternative solution:
cols = df.columns.to_numpy()
df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]
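The list comprehension relies on NumPy boolean-mask indexing: each row of 0/1 values becomes a mask that selects the matching column names. A small sketch, assuming illustrative sample data and taking the names and mask before the new column is added so the result column is not treated as an input:

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 1], "col2": [1, 0, 0], "col3": [1, 0, 1]})

cols = df.columns.to_numpy()        # array of column names
mask = df.to_numpy().astype(bool)   # shape (n_rows, n_cols), True where nonzero

# cols[row_mask] keeps only the names whose position is True in that row
labels = [", ".join(cols[row_mask]) for row_mask in mask]
df["new_column"] = labels

print(df)
```

Any nonzero value counts as True here, so this matches the dot approach only when the data is strictly 0/1.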
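Performance:
sammywemmy's first solution cannot be used here: with 50 columns, some column names have two or more characters. footfalcon's solution creates lists rather than strings, so it is not tested either.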
df = pd.DataFrame({
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'E': [1,0,1]})
df = pd.concat([df] * 10, ignore_index=True, axis=1)
df = pd.concat([df] * 10000, ignore_index=True).add_prefix('col')
# resulting shape: [30000 rows x 50 columns]
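The list comprehension solution is the fastest, though on this sample data only by about 10 ms. The dot solution is nearly as fast, and the apply solutions are far slower: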
In [70]: %%timeit
...: cols = df.columns.to_numpy()
...: df["new_column"] = [', '.join(cols[x]) for x in df.to_numpy().astype(bool)]
...:
128 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# values are converted to boolean for this test (otherwise it fails)
In [72]: %timeit df['new_column'] = df.astype(bool).dot(df.columns + ',').str.rstrip(",")
138 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#Dishin H Goyani
In [73]: %timeit df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
3.98 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Akshay Sehgal
In [75]: %timeit df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
11 s ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Rajith Thennakoon
In [78]: %%timeit
...: df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
...: df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))
...:
...:
25.9 s ± 709 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
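The %timeit magics above only work in IPython or Jupyter. In a plain script, a rough comparison could be sketched with the standard timeit module. The helper names `with_dot` and `with_listcomp` and the smaller data size are assumptions for illustration, and the timings will differ by machine and library version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"A": [0, 0, 1], "B": [1, 0, 0], "C": [0, 0, 0],
                     "D": [1, 0, 1], "E": [1, 0, 1]})
# 3000 x 50 instead of 30000 x 50, so the sketch runs quickly
wide = pd.concat([base] * 10, ignore_index=True, axis=1)
data = pd.concat([wide] * 1000, ignore_index=True).add_prefix("col")


def with_dot(df):
    """Dot product with a separator appended to each column name."""
    return df.astype(bool).dot(df.columns + ",").str.rstrip(",")


def with_listcomp(df):
    """Boolean-mask list comprehension over the rows."""
    cols = df.columns.to_numpy()
    rows = df.to_numpy().astype(bool)
    return pd.Series([",".join(cols[m]) for m in rows], index=df.index)


# Both approaches should agree before their timings are compared
assert with_dot(data).tolist() == with_listcomp(data).tolist()

for fn in (with_dot, with_listcomp):
    seconds = timeit.timeit(lambda: fn(data), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```

Neither helper modifies the frame, so repeated runs measure the same input each time.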
Contributor: 1784 experience points · 7+ upvotes
Not sure whether this is the best solution, but it gets the job done:
import pandas as pd
df = pd.DataFrame(
{
'A': [0,0,1],
'B': [1,0,0],
'C': [0,0,0,],
'D': [1,0,1],
'F': [1,0,1]
}
)
df1 = df.T
new_cells = []
for c in df1.columns:
new_cells.append(df1[df1[c] == 1].index.tolist())
df['New_column'] = new_cells
Output:
A B C D F New_column
0 0 1 0 1 1 [B, D, F]
1 0 0 0 0 0 []
2 1 0 0 1 1 [A, D, F]
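Note that this produces a column of Python lists rather than strings. If a comma-separated string is wanted, as in the other answers, the lists can be joined afterwards. A small sketch on the same sample data, where the `as_text` column name is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1], "B": [1, 0, 0], "C": [0, 0, 0],
                   "D": [1, 0, 1], "F": [1, 0, 1]})

# Same idea as the loop above, written as a comprehension over the rows
value_cols = list(df.columns)
df["New_column"] = [[c for c in value_cols if row[c] == 1]
                    for _, row in df.iterrows()]

# Series.str.join concatenates the elements of each list into one string
df["as_text"] = df["New_column"].str.join(",")

print(df)
```

An empty list becomes an empty string, which matches the output of the dot-based answers.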
Contributor: 1865 experience points · 7+ upvotes
If you have Python >= 3.5, you can use the matmul operator (@) to compute the dot product:
df['new_column'] = (df @ df.columns).str.join(', ')
A B C D E new_column
0 0 1 0 1 1 B, D, E
1 0 0 0 0 0
2 1 0 0 1 1 A, D, E
Alternatively, you can use apply with axis=1, as shown below:
df['new_column'] = df.apply(lambda x: ', '.join(list(x[x!=0].index)), axis=1)
A B C D E new_column
0 0 1 0 1 1 B, D, E
1 0 0 0 0 0
2 1 0 0 1 1 A, D, E
Contributor: 1831 experience points · 10+ upvotes
You can use apply with a lambda function on axis=1:
df["New_column"] = df.apply(lambda x: ','.join(df.columns[x==1]), axis=1)
df
A B C D F New_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F
Contributor: 1820 experience points · 10+ upvotes
Try this approach.
df = pd.DataFrame({"A":[0,0,1],"B":[1,0,0],"C":[0,0,0],"D":[1,0,1],"F":[1,0,1]})
df["new_column"] = df.apply(lambda x: (pd.DataFrame(x[x==1]).index.values),axis=1)
df["new_column"] = df["new_column"].apply(lambda x: ','.join(map(str, x)))
Output:
A B C D F new_column
0 0 1 0 1 1 B,D,F
1 0 0 0 0 0
2 1 0 0 1 1 A,D,F