First, build a Series by splitting with Series.str.split and reshaping with DataFrame.stack:
s = df['station'].str.split(expand=True).stack()
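To see what this first step produces, here is a minimal sketch. The sample DataFrame is an assumption (the question's actual data is not shown here), chosen so that it matches the output printed at the end of this answer. The extra .dropna() is also an addition of mine: depending on the pandas version, stack may keep the missing values that split(expand=True) pads shorter rows with, so dropping them explicitly keeps the result the same either way.

```python
import pandas as pd

# Hypothetical sample data; the original question's DataFrame is assumed.
df = pd.DataFrame({
    'station': [
        'A-line 3min B-station 5min C-station',
        'D-line 2min E-station 4min F-line 1min G-station',
        'G-line 6min H-station 2min I-station 3min J-station',
    ]
})

# split(expand=True): one column per whitespace-separated token.
# stack(): one row per token, indexed by (original row, token position).
# dropna(): added safeguard against padding values in shorter rows.
s = df['station'].str.split(expand=True).stack().dropna()

print(s.head())
```

The result is a Series with a two-level MultiIndex, which is what the later steps group on.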
Then use boolean indexing with Series.str.endswith to remove the values that end with min:
df1 = s[~s.str.endswith('min')].to_frame('data').rename_axis(('a','b'))
Then create counters for the line values and for the station values by filtering those rows and applying GroupBy.cumcount:
df1['Line'] = (df1[df1['data'].str.endswith('line')]
                  .groupby(level=0)
                  .cumcount()
                  .add(1)
                  .astype(str))
df1['Line'] = df1['Line'].ffill()
df1['station'] = (df1[df1['data'].str.endswith('station')]
                     .groupby(['a','Line'])
                     .cumcount()
                     .add(1)
                     .astype(str))
Build the final labels by joining the two counters, and replace the resulting missing values with df1['Line'] using Series.fillna:
df1['station'] = (df1['Line'] + '-' + df1['station']).fillna(df1['Line'])
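The counter logic may be easier to follow on a single row. The sketch below rebuilds it on hand-made tokens (hypothetical data, with the min values assumed to be removed already) and collects the labels in a separate variable named labels, which is my own name for illustration.

```python
import pandas as pd

# Hypothetical tokens for one input row; index levels named as in the answer.
tokens = ['D-line', 'E-station', 'F-line', 'G-station']
idx = pd.MultiIndex.from_product([[0], range(len(tokens))], names=('a', 'b'))
df1 = pd.DataFrame({'data': tokens}, index=idx)

is_line = df1['data'].str.endswith('line')
is_station = df1['data'].str.endswith('station')

# Number the lines within each original row, then carry the number forward
# so that every station knows which line it belongs to.
df1['Line'] = df1[is_line].groupby(level=0).cumcount().add(1).astype(str)
df1['Line'] = df1['Line'].ffill()

# Number the stations within each (original row, line) group.
df1['station'] = (df1[is_station]
                  .groupby(['a', 'Line'])
                  .cumcount()
                  .add(1)
                  .astype(str))

# Line rows have no station counter, so the join is missing there
# and falls back to the line number alone.
labels = (df1['Line'] + '-' + df1['station']).fillna(df1['Line'])
print(labels.tolist())  # ['1', '1-1', '2', '2-1']
```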
Reshape with DataFrame.set_index and DataFrame.unstack:
df1 = df1.set_index('station', append=True)['data'].reset_index(level=1, drop=True).unstack()
Rename the columns. This is deliberately done only now, after reshaping, to avoid a wrong sort order:
df1 = df1.rename(columns = lambda x: 'Station' + x if '-' in x else 'Line' + x)
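Why the late rename matters can be illustrated with plain string sorting, assuming (as the output suggests) that unstack orders the new columns lexicographically. The label lists below are just the ones from this answer's output.

```python
# Short labels as they exist before the rename.
short = ['2-1', '1', '2', '1-1', '1-3', '1-2']

# Sorting the short labels keeps each line next to its own stations.
sorted_short = sorted(short)
print(sorted_short)

# Renaming first and sorting afterwards would group all Line columns
# ahead of all Station columns instead.
renamed = ['Station' + x if '-' in x else 'Line' + x for x in short]
sorted_renamed = sorted(renamed)
print(sorted_renamed)
```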
Remove the column and index names:
df1.columns.name = None
df1.index.name = None
print(df1)
    Line1 Station1-1 Station1-2 Station1-3   Line2 Station2-1
0  A-line  B-station  C-station        NaN     NaN        NaN
1  D-line  E-station        NaN        NaN  F-line  G-station
2  G-line  H-station  I-station  J-station     NaN        NaN
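For convenience, the steps can be collected into one runnable sketch. The function name reshape_stations and the sample input are hypothetical, and the .dropna() after stack is an added safeguard for pandas versions where stack keeps missing values; everything else follows the code above.

```python
import pandas as pd


def reshape_stations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper wrapping the steps of this answer."""
    # One row per token; dropna() is an added safeguard (see lead-in).
    s = df['station'].str.split(expand=True).stack().dropna()
    df1 = s[~s.str.endswith('min')].to_frame('data').rename_axis(('a', 'b'))

    # Counter for lines per original row, carried forward to the stations.
    df1['Line'] = (df1[df1['data'].str.endswith('line')]
                   .groupby(level=0).cumcount().add(1).astype(str))
    df1['Line'] = df1['Line'].ffill()

    # Counter for stations per (original row, line).
    df1['station'] = (df1[df1['data'].str.endswith('station')]
                      .groupby(['a', 'Line']).cumcount().add(1).astype(str))
    df1['station'] = (df1['Line'] + '-' + df1['station']).fillna(df1['Line'])

    # Labels become columns; rename only after unstack has ordered them.
    out = (df1.set_index('station', append=True)['data']
              .reset_index(level=1, drop=True)
              .unstack())
    out = out.rename(columns=lambda x: 'Station' + x if '-' in x else 'Line' + x)
    out.columns.name = None
    out.index.name = None
    return out


# Assumed input, constructed to match the printed output above.
df = pd.DataFrame({
    'station': [
        'A-line 3min B-station 5min C-station',
        'D-line 2min E-station 4min F-line 1min G-station',
        'G-line 6min H-station 2min I-station 3min J-station',
    ]
})
result = reshape_stations(df)
print(result)
```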