为了账号安全,请及时绑定邮箱和手机立即绑定

如何使用 Pandas 合并具有重复时间戳的元素

如何使用 Pandas 合并具有重复时间戳的元素

慕田峪4524236 2023-07-18 16:42:01
我有以下代表加密交易的数据:1599177600000,381.52,1.425,s1599177600000,381.49,0.828,s1599177600000,381.48,0.747,s1599177600212,381.53,3.225,s1599177600560,381.53,0.226,s1599177600560,381.45,0.637,s1599177600560,381.44,11.431,s1599177600560,381.38,2.153,s1599177600560,381.37,0.569,s1599177600560,381.35,150,s1599177600560,381.33,1.056,s1599177600560,381.32,8.581,s1599177600560,381.31,16.947,s1599177600560,381.29,15.877,s1599177600590,381.45,2.586,s1599177600652,381.54,0.03,b1599177600826,381.39,0.5,s1599177601166,381.39,0.139,s1599177601304,381.39,1.445,s1599177601306,381.35,2.555,s1599177601624,381.3,1.552,s1599177601706,381.29,2,s1599177601868,381.31,0.262,s1599177602108,381.29,0.092,s1599177602242,381.3,0.05,b1599177602296,381.31,2.228,b1599177602312,381.32,0.05,b1599177602386,381.33,0.639,b1599177602388,381.29,7.901,s1599177602388,381.25,12.099,s这些列是:unix 时间戳(毫秒)、价格、数量和代表交易是买入还是卖出事件的字母(b 或 s)。使用 Pandas,如何将具有相同时间戳的行合并在一起,同时添加额外的列?合并规则为:new quantity = sum quantity for all rowsnew price = sum (quantity * price) for all rows / new quantity例外的是:if there is a duplicate timestamp with different letters, the one with the letter 'b' has to be pushed ahead by 1ms额外的列是:if a row is a result of a merge, the extra columns needs to have a bool True in it然后使用该时间戳作为索引?我不确定这是否可以一次性完成,但我也不太熟悉 Pandas 的语法来弄清楚如何做到这一点,所以任何带有解释的答案都会很棒。
查看完整描述

2 回答

?
杨魅力

TA贡献1811条经验 获得超6个赞

这是一种方法:


w_avg = df.groupby("time").apply(lambda d: sum(d["price"]*d["volume"])/d["volume"].sum())

s = df.loc[df["time"].duplicated(keep=False),"time"].unique()

df = df.groupby("time", as_index=False).agg({"volume": "sum"})


print (df.assign(w_avg=df["time"].map(w_avg), boolean=df["time"].isin(s)))


             time   volume       w_avg  boolean

0   1599177600000    3.000  381.501760     True

1   1599177600212    3.225  381.530000    False

2   1599177600560  207.477  381.346627     True

3   1599177600590    2.586  381.450000    False

4   1599177600652    0.030  381.540000    False

5   1599177600826    0.500  381.390000    False

6   1599177601166    0.139  381.390000    False

7   1599177601304    1.445  381.390000    False

8   1599177601306    2.555  381.350000    False

9   1599177601624    1.552  381.300000    False

10  1599177601706    2.000  381.290000    False

11  1599177601868    0.262  381.310000    False

12  1599177602108    0.092  381.290000    False

13  1599177602242    0.050  381.300000    False

14  1599177602296    2.228  381.310000    False

15  1599177602312    0.050  381.320000    False

16  1599177602386    0.639  381.330000    False

17  1599177602388   20.000  381.265802     True


查看完整回答
反对 回复 2023-07-18
?
30秒到达战场

TA贡献1828条经验 获得超6个赞

IIUC,这是一种包含业务逻辑的方法(使用加权平均值;如果同时有买入和卖出,则向前移动一毫秒的时间戳):


# create data frame

df = pd.read_csv(StringIO(data), sep=',')

#df['timestamp'] -= df['timestamp'].min()


# find buy, sell timestamps

buy_timestamps = df.loc[ df['buy_sell'] == 'b', 'timestamp']

sell_timestamps = df.loc[ df['buy_sell'] == 's', 'timestamp']

bs_timestamps = set(buy_timestamps) & set(sell_timestamps)


# adjust timestamps

df.loc[ df['timestamp'].isin(bs_timestamps), 'timestamp' ] += 1


# helper columns

df['price_quantity'] = df['price'] * df['quantity']

df['multiple_trades'] = df.groupby('timestamp')['buy_sell'].transform('count')

现在执行聚合计算:


g = df.groupby(['timestamp', 'buy_sell'])


t = (pd.concat([

    (g['price_quantity'].sum() / g['quantity'].sum()).rename('price'),

    g['quantity'].sum().rename('quantity'),

    g['timestamp'].count().apply(lambda x: True if x > 1 else False).rename('trade_count')], 

    axis=1)

     .reset_index()

     .filter(['timestamp', 'price', 'buy_sell', 'trade_count'])

    )

print(t)

结果是:


        timestamp       price buy_sell  trade_count

0   1599177600000  381.501760        s         True

1   1599177600212  381.530000        s        False

2   1599177600560  381.346627        s         True

3   1599177600590  381.450000        s        False

4   1599177600652  381.540000        b        False

5   1599177600826  381.390000        s        False

6   1599177601166  381.390000        s        False

7   1599177601304  381.390000        s        False

8   1599177601306  381.350000        s        False

9   1599177601624  381.300000        s        False

10  1599177601706  381.290000        s        False

11  1599177601868  381.310000        s        False

12  1599177602108  381.290000        s        False

13  1599177602242  381.300000        b        False

14  1599177602296  381.310000        b        False

15  1599177602312  381.320000        b        False

16  1599177602386  381.330000        b        False

17  1599177602388  381.265802        s         True


查看完整回答
反对 回复 2023-07-18
  • 2 回答
  • 0 关注
  • 94 浏览
慕课专栏
更多

添加回答

举报

0/150
提交
取消
意见反馈 帮助中心 APP下载
官方微信