3 Answers
Contributed 1826 experience points · received 6+ likes
import pandas as pd
df = pd.DataFrame({
"ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
'Sender': [28, 'delete', 'flag_source', 56, 28, 312, 'delete', 'flag_source', 78, 102, 26, 101, 96],
'Receiver': [129, 28, 'delete', 172, 56, 28, 61, 'delete', 12, 78, 98, 26, 101],
'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})
flaggedData = df[df["Sender"] == "flag_source"]
for i, row in flaggedData.iterrows():  # row: the row whose Sender is 'flag_source'
    deleteRow = df.loc[i - 1]  # the row directly above, whose Sender is 'delete'
    combined = [row["ID"],
                row["Sender"],
                deleteRow["Receiver"],
                deleteRow["Date"],
                row["Sender_type"],
                deleteRow["Receiver_type"],
                deleteRow["Price"]]
    df.loc[i - 1] = combined  # overwrite the 'delete' row with the merged values
    df = df.drop(index=i)  # drop the now-redundant 'flag_source' row
df = df.reset_index(drop=True)  # renumber rows without keeping the old index as a column
print(df.loc[1])
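I am assuming that every 'delete' row sits directly above its 'flag_source' row. If anything is still unclear, read the code comments and leave a comment with your question.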
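Since the loop depends on that ordering assumption, it may be worth checking it before running the merge. The following is only a minimal sketch on a small made-up frame in the same shape as the question's data; the names prev_sender, is_flag and assumption_holds are illustrative and not part of the original answer.

```python
import pandas as pd

# Small hypothetical frame shaped like the question's data.
df = pd.DataFrame({
    "ID": ["company A", "company A", "company A", "company B"],
    "Sender": [28, "delete", "flag_source", 56],
    "Receiver": [129, 28, "delete", 172],
})

# Sender of the row directly above each row (NaN for the first row).
prev_sender = df["Sender"].shift(1)

# Boolean mask of the rows whose Sender is 'flag_source'.
is_flag = df["Sender"] == "flag_source"

# True only if every 'flag_source' row sits directly below a 'delete' row.
assumption_holds = bool((prev_sender[is_flag] == "delete").all())
print(assumption_holds)
```

If this prints False, the row order needs fixing before the loop is applied. Note that this check does not verify that the two rows share the same ID.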
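Contributed 1845 experience points · received 8+ likes
If the delete/flag_source rows always share the same date, and there are no other rows for that date + ID, you can group by ID and Date and use aggregate functions to avoid a long loop. If your data is not in the right order, you can always call sort_values beforehand to fix it.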
cols = df.columns
new_df = df.groupby(['ID', 'Date']).aggregate({
    'Sender': 'last',         # taken from the 'flag_source' row (second of the pair)
    'Receiver': 'first',      # taken from the 'delete' row (first of the pair)
    'Sender_type': 'last',
    'Receiver_type': 'first',
    'Price': 'first'
}).reset_index()
# Reorder as per original data
new_df = new_df[cols].sort_values(['ID', 'Date'], ascending=[True, False])
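The 'first'/'last' aggregations depend on the 'delete' row coming before the 'flag_source' row inside each ID + Date group. One possible way to enforce that with sort_values is sketched below on a small made-up frame. The helper column is_flag is my own assumption, introduced because the mixed int/str Sender column cannot be sorted directly.

```python
import pandas as pd

# Hypothetical data where the pair is stored in the wrong order.
df = pd.DataFrame({
    "ID": ["company A", "company A", "company A"],
    "Sender": ["flag_source", 28, "delete"],
    "Receiver": ["delete", 129, 28],
    "Date": ["2020-03-20", "2020-04-12", "2020-03-20"],
    "Price": [47, 32, 50],
})

# Helper column: False sorts before True, so 'flag_source' rows
# end up last within each ID + Date group.
ordered = (
    df.assign(is_flag=df["Sender"] == "flag_source")
      .sort_values(["ID", "Date", "is_flag"])
      .drop(columns="is_flag")
)

# Same idea as the aggregation above, on fewer columns.
merged = ordered.groupby(["ID", "Date"], as_index=False).agg(
    {"Sender": "last", "Receiver": "first", "Price": "first"}
)
print(merged)
```

The pair on 2020-03-20 collapses into one row that keeps the 'flag_source' sender together with the receiver and price of the 'delete' row.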
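Contributed 1773 experience points · received 3+ likes
It looks like you only need to drop the second row of each pair and replace a few values in the rows that remain.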
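df = df[df.Receiver != 'delete'].copy()  # drop the second row of each pair
is_pair = df.Sender == 'delete'  # the surviving row of each pair
df.loc[is_pair, 'Sender_type'] = 'house'  # only touch paired rows, not every 'temp'
df.Sender = df.Sender.replace('delete', 'flag_source')  # Series.replace leaves the integer senders intact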
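One thing to watch for when replacing values in a column like Sender, which mixes integers and strings: the .str accessor returns NaN for every non-string entry, while Series.replace matches whole values and leaves the rest alone. A small sketch on a made-up Series (the names sender, via_str and via_replace are illustrative):

```python
import pandas as pd

# Hypothetical mixed-type column, like Sender in the question.
sender = pd.Series([28, "delete", 56])

# .str methods only operate on strings; other entries become NaN.
via_str = sender.str.replace("delete", "flag_source")

# Series.replace swaps whole values and keeps everything else.
via_replace = sender.replace("delete", "flag_source")

print(via_str.tolist())
print(via_replace.tolist())
```

For an all-string column such as Sender_type either approach works, but .str.replace also rewrites substrings, so Series.replace is probably the safer choice when whole values are meant.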