You do need to iterate over every row, but it is not complicated: the idea is to build a dict holding the data you want and then use DataFrame.from_dict.
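import pandas as pd
from io import StringIO

data = """
Index Uniprot P1 P2 ID1 ID2
1 O00141 2r5tA_1 3hdmA_9 2r5tA 3hdmA
2 O00141 2r5tA_2 3hdmA_1 2r5tA 3hdmA
3 O00141 2r5tA_7 3hdmA_7 2r5tA 3hdmA
4 O15021 2w7rB_2 2w7rA_2 2w7rB 2w7rA
"""
# Create the sample DataFrame (io.StringIO stands in for pd.compat.StringIO,
# which newer pandas versions no longer provide)
df = pd.read_csv(StringIO(data), sep=r'\s+')
# Sort by Uniprot so rows for the same protein are grouped together;
# mergesort is stable, so the original row order within each group is kept
df = df.sort_values(by='Uniprot', kind='mergesort')
dico = {}
for i, row in df.iterrows():
    key1 = row.Uniprot + 'C1'
    key2 = row.Uniprot + 'C2'
    if key1 not in dico:
        # First row for this Uniprot: start each list with Uniprot, ID and P value
        dico[key1] = [row.Uniprot, row.ID1, row.P1]
        dico[key2] = [row.Uniprot, row.ID2, row.P2]
    else:
        # Later rows: append only the P values
        dico[key1] = dico[key1] + [row.P1]
        dico[key2] = dico[key2] + [row.P2]
# Pad the shorter lists with '' so that all lists have the same length
maxlen = max(len(l) for l in dico.values())
for k in dico:
    dico[k] = dico[k] + [''] * (maxlen - len(dico[k]))
# Each dict key becomes a column, so transpose to get one row per key
df_result = pd.DataFrame.from_dict(dico).T.reset_index(drop=True)
print(df_result)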
Output:
        0      1        2        3        4
0  O00141  2r5tA  2r5tA_1  2r5tA_2  2r5tA_7
1  O00141  3hdmA  3hdmA_9  3hdmA_1  3hdmA_7
2  O15021  2w7rB  2w7rB_2
3  O15021  2w7rA  2w7rA_2
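
As a possible simplification, the manual padding and the transpose can probably be skipped by passing orient="index" to DataFrame.from_dict, which treats each dict key as a row and fills shorter lists with missing values. The sketch below is not part of the original answer; the `ragged` dict is a hypothetical stand-in shaped like `dico` before the padding step, and `wide` is an illustrative name.

```python
import pandas as pd

# Hypothetical ragged dict, shaped like `dico` before any padding is applied
ragged = {
    "O00141C1": ["O00141", "2r5tA", "2r5tA_1", "2r5tA_2", "2r5tA_7"],
    "O15021C1": ["O15021", "2w7rB", "2w7rB_2"],
}

# orient="index" makes each key a row; shorter lists are filled with missing values
wide = pd.DataFrame.from_dict(ragged, orient="index").reset_index(drop=True)

# Replace the missing values with '' so the cells print as blanks
wide = wide.fillna("")
print(wide)
```

Whether this is preferable depends on whether you want empty strings or real missing values in the result; dropping the fillna call keeps the missing values.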