2 Answers

Contributed 1828 experience points · received over 13 upvotes
Here is one possible solution. It may not be especially elegant, but it works.
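import pandas as pd

grouped = df.groupby('Patient')
col = ['Patient']  # output column order, extended as new names appear
data = []          # one dict per patient
for p, g in grouped:
    d = {'Patient': p}
    # Renumber this patient's rows 0, 1, 2, ...; the old index becomes
    # column 0 and 'Patient' column 1, so data columns start at position 2.
    g = g.reset_index()
    for i, row in g.iterrows():
        for c in range(2, len(g.columns)):
            # e.g. 'gene_1' for the patient's first row
            col_name = g.columns[c] + '_' + str(i + 1)
            d[col_name] = row[g.columns[c]]
            if col_name not in col:
                col.append(col_name)
    data.append(d)
df = pd.DataFrame(data, columns=col)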
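The same "one row per patient, columns suffixed by row number" layout can probably also be reached without explicit loops. The following is a sketch of a different technique (groupby().cumcount() plus unstack), not the answerer's method; the sample frame is copied from the question's table, and the names n, fields, order and wide are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Test": list("ABCABACCA"),
    "panel": [54] * 9,
    "gene": ["APC", "TP53", "APC", "KRAS", "PTEN",
             "KRAS", "TP53", "APC", "ERBB2"],
    "alteration": ["E1345*", "Y205H", "V2278V", "G12D", "L25L",
                   "G13D", "C141W", "R876*", "L663P"],
})

# Number each patient's rows 1, 2, 3, ... in their original order.
n = df.groupby("Patient").cumcount() + 1

# Index by (Patient, row number), then move the row number into the columns.
wide = df.set_index(["Patient", n]).unstack()

# Reorder so that all fields of row 1 come first, then row 2, and so on.
fields = ["Test", "panel", "gene", "alteration"]
order = [(f, k) for k in range(1, int(n.max()) + 1) for f in fields]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))

# Flatten the two-level columns into names such as 'gene_1'.
wide.columns = [f"{f}_{k}" for f, k in wide.columns]
wide = wide.reset_index()

print(wide)
```

Patients with fewer rows than the maximum get NaN in the trailing columns, as in the loop version.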

Contributed 1852 experience points · received over 7 upvotes
An approach using melt, groupby and unstack:
Data
Original
In []: df
Out[]:
   Patient Test  panel   gene alteration
0        1    A     54    APC     E1345*
1        1    B     54   TP53      Y205H
2        1    C     54    APC     V2278V
3        2    A     54   KRAS       G12D
4        2    B     54   PTEN       L25L
5        3    A     54   KRAS       G13D
6        3    C     54   TP53      C141W
7        3    C     54    APC      R876*
8        3    A     54  ERBB2      L663P
Tidy data
pd.DataFrame.melt reshapes this table into tidy (long) form:
In []: tidy = df.melt(id_vars=['Patient', 'Test'], value_vars=['panel', 'gene', 'alteration'])
In []: tidy
Out[]:
    Patient Test    variable   value
 0        1    A       panel      54
 1        1    B       panel      54
 2        1    C       panel      54
 3        2    A       panel      54
 4        2    B       panel      54
 5        3    A       panel      54
 6        3    C       panel      54
 7        3    C       panel      54
 8        3    A       panel      54
 9        1    A        gene     APC
10        1    B        gene    TP53
11        1    C        gene     APC
12        2    A        gene    KRAS
13        2    B        gene    PTEN
14        3    A        gene    KRAS
15        3    C        gene    TP53
16        3    C        gene     APC
17        3    A        gene   ERBB2
18        1    A  alteration  E1345*
19        1    B  alteration   Y205H
20        1    C  alteration  V2278V
21        2    A  alteration    G12D
22        2    B  alteration    L25L
23        3    A  alteration    G13D
24        3    C  alteration   C141W
25        3    C  alteration   R876*
26        3    A  alteration   L663P
Reshaping
Using groupby and unstack
In []: (tidy.groupby(['Patient', 'Test', 'variable']) # group by three levels of interest
...: .first() # access values as a dataframe
...: .unstack(level=[1,2])) # pivot on levels [1, 2] of multiindex
Out[]:
               value
Test               A                        B                        C
variable  alteration  gene  panel  alteration  gene  panel  alteration  gene  panel
Patient
1             E1345*   APC     54       Y205H  TP53     54      V2278V   APC     54
2               G12D  KRAS     54        L25L  PTEN     54         NaN   NaN    NaN
3               G13D  KRAS     54         NaN   NaN    NaN       C141W  TP53     54
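Note what the output above implies for Patient 3: the original table has two Test A rows and two Test C rows for that patient, and .first() keeps only the first of each (G13D and C141W), so R876* and L663P do not appear. A minimal sketch for spotting such rows before reshaping, assuming the sample data from the question (the name dupes is made up here):

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Test": list("ABCABACCA"),
    "panel": [54] * 9,
    "gene": ["APC", "TP53", "APC", "KRAS", "PTEN",
             "KRAS", "TP53", "APC", "ERBB2"],
    "alteration": ["E1345*", "Y205H", "V2278V", "G12D", "L25L",
                   "G13D", "C141W", "R876*", "L663P"],
})

# keep=False marks every row that shares its (Patient, Test) pair
# with another row, not only the later occurrences.
dupes = df[df.duplicated(subset=["Patient", "Test"], keep=False)]
print(dupes)

# Same reshaping as in the answer: only the first row of each
# duplicated (Patient, Test) pair survives.
tidy = df.melt(id_vars=["Patient", "Test"],
               value_vars=["panel", "gene", "alteration"])
wide = (tidy.groupby(["Patient", "Test", "variable"])
            .first()
            .unstack(level=[1, 2]))
print(wide)
```

If every row must be kept, the (Patient, Test) pair is not a sufficient key and an extra per-row counter would be needed.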
Using crosstab
This gives an equivalent result:
In []: pd.crosstab(tidy.Patient, # index
[tidy.Test, tidy.variable], # columns
values=tidy.value,
aggfunc='first') # get first value
Out[]:
Test               A                        B                        C
variable  alteration  gene  panel  alteration  gene  panel  alteration  gene  panel
Patient
1             E1345*   APC     54       Y205H  TP53     54      V2278V   APC     54
2               G12D  KRAS     54        L25L  PTEN     54         NaN   NaN    NaN
3               G13D  KRAS     54         NaN   NaN    NaN       C141W  TP53     54
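Both results carry a two-level column index (Test, variable). If single-level column names are preferred, the levels can be joined. The sketch below uses the groupby/unstack route on the "value" column only, with the question's sample data; the "A_gene" naming scheme is an assumption for illustration, not something the answer specifies.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Test": list("ABCABACCA"),
    "panel": [54] * 9,
    "gene": ["APC", "TP53", "APC", "KRAS", "PTEN",
             "KRAS", "TP53", "APC", "ERBB2"],
    "alteration": ["E1345*", "Y205H", "V2278V", "G12D", "L25L",
                   "G13D", "C141W", "R876*", "L663P"],
})

tidy = df.melt(id_vars=["Patient", "Test"],
               value_vars=["panel", "gene", "alteration"])

# Selecting the "value" column first yields columns with two levels
# (Test, variable) instead of three.
wide = (tidy.groupby(["Patient", "Test", "variable"])["value"]
            .first()
            .unstack(level=[1, 2]))

# Join the two levels into one label per column, e.g. ('A', 'gene') -> 'A_gene'.
wide.columns = ["_".join(pair) for pair in wide.columns]

print(wide)
```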