2 Answers
Contributed 1853 experience points, received 18+ likes
Let's use cumsum to identify the blocks, then group by them:
blocks = df['C'].notna().cumsum()
agg_dict = {col:' '.join if col=='B' else 'first' for col in df}
df.groupby(blocks).agg(agg_dict).reset_index(drop=True)
Output:
       A                                                 B     C     D
0  Train  Superfast Convernient Newest model Year 2002/099  10.0  20.0
1    Car            Fastest Can be more fast Year/2020/AYD  20.0  30.0
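To make the grouping step easier to follow, here is a minimal self-contained sketch of the same idea. It assumes the sample DataFrame used in the second answer below (the first answer does not show its input), and the name `result` is introduced here only for illustration. The key point is that `notna().cumsum()` increases by one at each row where `C` has a value, so every continuation row inherits the label of the block above it.

```python
import numpy as np
import pandas as pd

# Sample data assumed to match the question (taken from the second answer)
df = pd.DataFrame([
    ['Train', 'Superfast', 10, 20],
    [np.nan, 'Convernient', np.nan, np.nan],
    [np.nan, 'Newest model', np.nan, np.nan],
    [np.nan, 'Year 2002/099', np.nan, np.nan],
    ['Car', 'Fastest', 20, 30],
    [np.nan, 'Can be more fast', np.nan, np.nan],
    [np.nan, 'Year/2020/AYD', np.nan, np.nan],
], columns=['A', 'B', 'C', 'D'])

# True where a new block starts; the running sum gives each row a block label
blocks = df['C'].notna().cumsum()
print(blocks.tolist())  # [1, 1, 1, 1, 2, 2, 2]

# Join the strings in B; take the first (non-NaN) value for every other column
agg_dict = {col: ' '.join if col == 'B' else 'first' for col in df}
result = df.groupby(blocks).agg(agg_dict).reset_index(drop=True)
print(result)
```

Note that this relies on `C` being filled only on the first row of each block; if that is not guaranteed in your data, a different marker column (for example `A`) may be the safer choice.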
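Contributed 1829 experience points, received 6+ likes
A somewhat more involved solution that uses only NumPy for the merging step, but runs very fast on large data: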
import pandas as pd, numpy as np
df = pd.DataFrame([
['Train', 'Superfast', 10, 20],
[np.nan, 'Convernient', np.nan, np.nan],
[np.nan, 'Newest model', np.nan, np.nan],
[np.nan, 'Year 2002/099', np.nan, np.nan],
['Car', 'Fastest', 20, 30],
[np.nan, 'Can be more fast', np.nan, np.nan],
[np.nan, 'Year/2020/AYD', np.nan, np.nan],
], columns = ['A', 'B', 'C', 'D'])
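a = df.values
# Indices of rows where column A is not NaN (NaN != NaN), i.e. where each block starts,
# followed by a sentinel marking the end of the last block
i = np.append(np.flatnonzero(~(a[:, 0] != a[:, 0])), a.shape[0])
b = a[i[:-1], :]  # first row of each block (fancy indexing returns a copy)
diffs = np.diff(i)  # number of rows in each block
maxs = np.amax(diffs)
begs, ends = i[:-1], i[1:]
# Append the j-th continuation row to every block that has one
for j in range(1, maxs):
    chosen = begs + j < ends
    b[chosen, 1] += ' ' + a[begs[chosen] + j, 1]
df = pd.DataFrame(b, columns = df.columns.values.tolist())
print(df)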
Code output:
       A                                                 B   C   D
0  Train  Superfast Convernient Newest model Year 2002/099  10  20
1    Car            Fastest Can be more fast Year/2020/AYD  20  30
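The block-boundary computation is the least obvious part of the NumPy answer, so here is a small sketch that isolates it. The names `col_a`, `bounds`, `sizes` and `has_second_row` are illustrative and not from the original answer. It relies on the fact that NaN is the only value not equal to itself, so `col_a != col_a` marks the continuation rows.

```python
import numpy as np

# Column A of the sample data: a value on the first row of each block, NaN elsewhere
col_a = np.array(['Train', np.nan, np.nan, np.nan, 'Car', np.nan, np.nan],
                 dtype=object)

is_nan = col_a != col_a              # True only for NaN entries
starts = np.flatnonzero(~is_nan)     # row index where each block begins
bounds = np.append(starts, col_a.shape[0])  # add end-of-data sentinel
sizes = np.diff(bounds)              # number of rows in each block

begs, ends = bounds[:-1], bounds[1:]
# Mask used inside the loop for offset j: does block k have a row at begs[k] + j?
has_second_row = begs + 1 < ends

print(bounds.tolist())  # [0, 4, 7]
print(sizes.tolist())   # [4, 3]
```

The loop in the answer then runs once per row offset within a block (at most the largest block size minus one iterations) rather than once per row of data, which is presumably where the speed on large inputs comes from.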