3 Answers
Answer 1
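Set the index to ID and use DataFrame.stack to reshape the frame, then use Series.factorize to create a numeric array that identifies the distinct values, giving a Series s. Then use Series.groupby on s.index // 2 (which pairs the two stacked values of each row) and aggregate with first, because the value in column End1 takes priority over the one in End2: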
s = pd.Series(df.set_index('ID').stack().factorize()[0] + 1)
df['Order'] = s.groupby(s.index // 2).first()
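A self-contained sketch of the approach above, run on rows 0-8 of the question's data (ID 1 only). It assumes df has a default RangeIndex, because s.groupby(s.index // 2).first() is indexed 0, 1, 2, ... and is aligned to df by that index when assigned.

```python
import pandas as pd

# Rows 0-8 of the question's data (ID 1 only).
df = pd.DataFrame({
    'ID':   [1] * 9,
    'End1': list('AABACCDCD'),
    'End2': list('BBABBDCDC'),
})

# stack() interleaves End1, End2 row by row: A, B, A, B, B, A, A, B, C, B, ...
stacked = df.set_index('ID').stack()

# factorize() numbers values 0, 1, 2, ... in order of first appearance; +1 makes it 1-based.
s = pd.Series(stacked.factorize()[0] + 1)

# Positions 2*i and 2*i+1 come from row i; first() keeps the End1 code of each row.
df['Order'] = s.groupby(s.index // 2).first()
print(df['Order'].tolist())
```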
Edit: if the distinct values have to be numbered separately within each ID group:
import numpy as np
s = pd.Series(np.hstack([g.factorize()[0] + 1 for _, g in
                         df.set_index('ID').stack().groupby(level=0)]))
df['Order'] = s.groupby(s.index // 2).first()
Result:
    ID  End1  End2  Order
0    1     A     B      1
1    1     A     B      1
2    1     B     A      2
3    1     A     B      1
4    1     C     B      3
5    1     C     D      3
6    1     D     C      4
7    1     C     D      3
8    1     D     C      4
9    2     A     B      1
10   2     A     B      1
11   2     A     C      1
12   2     A     C      1
13   2     C     A      3
14   2     C     A      3
15   2     D     C      4
16   2     C     D      3
17   2     D     C      4
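To see when the Edit matters, here is a sketch on a small hypothetical frame where the two IDs encounter the values in different orders. With one global factorize the codes are shared across IDs; with the per-group version each ID starts again from 1. It assumes the rows are sorted by ID, since groupby(level=0) yields groups in sorted key order and np.hstack joins them in that order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 1 sees A before B, ID 2 sees B before C.
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2],
    'End1': ['A', 'B', 'B', 'C'],
    'End2': ['B', 'A', 'C', 'B'],
})

stacked = df.set_index('ID').stack()

# Global numbering: one code table shared by every ID.
s_global = pd.Series(stacked.factorize()[0] + 1)
global_order = s_global.groupby(s_global.index // 2).first()

# Per-ID numbering: factorize each ID's stacked values separately, then rejoin.
s_group = pd.Series(np.hstack([g.factorize()[0] + 1
                               for _, g in stacked.groupby(level=0)]))
group_order = s_group.groupby(s_group.index // 2).first()

print(global_order.tolist())  # [1, 2, 2, 3]
print(group_order.tolist())   # [1, 2, 1, 2]
```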
Answer 2
import pandas as pd
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 2, 10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2},
'End1': {0: 'A', 1: 'A', 2: 'B', 3: 'A', 4: 'C', 5: 'C', 6: 'D', 7: 'C', 8: 'D', 9: 'A', 10: 'A', 11: 'A', 12: 'A', 13: 'C', 14: 'C', 15: 'D', 16: 'C', 17: 'D'},
'End2': {0: 'B', 1: 'B', 2: 'A', 3: 'B', 4: 'B', 5: 'D', 6: 'C', 7: 'D', 8: 'C', 9: 'B', 10: 'B', 11: 'C', 12: 'C', 13: 'A', 14: 'A', 15: 'C', 16: 'D', 17: 'C'}})
pandas.unique returns the values in order of appearance.
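A quick illustration of that property on hypothetical values: pandas.unique keeps first-appearance order, whereas numpy.unique sorts, which is why the former is the one that can serve as a lookup sequence here.

```python
import numpy as np
import pandas as pd

values = pd.Series(['C', 'A', 'C', 'B', 'A'])

# pandas.unique: order of first appearance (C, A, B)
print(list(pd.unique(values)))

# numpy.unique: sorted (A, B, C)
print(list(np.unique(values)))
```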
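sequence is used to look up the index of each value in column End1. Grouping by 'ID' makes the order specific to each 'ID'. Stacking each group/DataFrame flattens the columns ['End1','End2'] into a single sequence, row by row.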
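df = df.set_index('ID')
gb = df.groupby('ID')
for k, g in gb:
    # flatten End1/End2 row by row; stack only these two columns so a
    # partially filled 'Order' column cannot leak into the sequence
    sequence = pd.unique(g[['End1', 'End2']].stack())
    # 1-based position of each End1 value within sequence
    order = (g.End1.to_numpy() == sequence[:, None]).argmax(0) + 1
    df.loc[k, 'Order'] = order
df.Order = df.Order.astype(int)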
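Alternatively, with groupby.apply:
import numpy as np
def f(g):
    sequence = pd.unique(g[['End1', 'End2']].stack())
    order = (g.End1.to_numpy() == sequence[:, None]).argmax(0) + 1
    return order
gb = df.groupby('ID')
orders = gb.apply(f)  # one array of orders per ID
# orders.index holds the IDs; .loc on the ID index selects every row of each ID
df.loc[orders.index, 'foo'] = np.concatenate(orders.values)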
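The lookup (g.End1.to_numpy() == sequence[:, None]).argmax(0) + 1 relies on NumPy broadcasting. A minimal sketch with hypothetical arrays standing in for the output of pd.unique and for one group's End1 column:

```python
import numpy as np

sequence = np.array(['A', 'B', 'C', 'D'])   # stands in for pd.unique(...) output
end1 = np.array(['A', 'A', 'B', 'D', 'C'])  # stands in for g.End1.to_numpy()

# sequence[:, None] has shape (4, 1); comparing it with end1 (shape (5,))
# broadcasts to a (4, 5) boolean matrix where mask[i, j] is sequence[i] == end1[j].
mask = end1 == sequence[:, None]

# argmax(0) returns, for each column j, the first row i that is True,
# i.e. the position of end1[j] in sequence; +1 makes it 1-based.
order = mask.argmax(0) + 1
print(order.tolist())  # [1, 1, 2, 4, 3]
```

Because every End1 value is part of the stacked data, each column of the mask contains at least one True, so argmax never falls back to its "no match" result of 0.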
Answer 3
One possible approach is to concatenate the string values of End1 + End2 and use the result as a dictionary key. The algorithm looks like this:
counter = 1
new_column = []
my_dict = dict()
for _, row in df.iterrows():  # assumes the data is the DataFrame df from the question
    key_to_check = row['End1'] + row['End2']
    if key_to_check not in my_dict:
        # first time this End1+End2 combination is seen: give it the next number
        my_dict[key_to_check] = counter
        counter += 1
    new_column.append(my_dict[key_to_check])
# append new_column to the data
df['Order'] = new_column
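The counter/dictionary loop numbers each distinct End1+End2 key in order of first appearance, which is what pandas.factorize does in a single call. A sketch on hypothetical data (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({
    'End1': ['A', 'A', 'B', 'C', 'C', 'D'],
    'End2': ['B', 'B', 'A', 'B', 'D', 'C'],
})

# Concatenated key per row: 'AB', 'AB', 'BA', 'CB', 'CD', 'DC'
keys = df['End1'] + df['End2']

# factorize() assigns 0, 1, 2, ... in order of first appearance,
# matching the numbering of the dictionary loop once shifted by 1.
df['Order'] = pd.factorize(keys)[0] + 1
print(df['Order'].tolist())  # [1, 1, 2, 3, 4, 5]
```

Note that this numbers ordered (End1, End2) pairs and does not restart per ID, so 'CB' and 'CD' receive different numbers. With multi-character values, plain concatenation can also make different pairs collide ('AB' + 'C' and 'A' + 'BC'); using tuples as keys avoids that.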