
Pandas - Combine even/odd columns and aggregate by hour


不负相思意 2022-07-05 16:59:30
I have a device that reports footfall counts (in and out). It generates a CSV whose layout depends on the number of counting lines drawn. With a single line, the CSV format is:

timestamp, in, out

However, I have several in/out pairs per row, in the following format:

timestamp, in, out, in, out, in, out, in, out

Sample input:

12/01/2020,16:02:00,0,0,0,2,0,0,0,0
12/01/2020,16:03:00,0,0,0,0,0,0,0,0
12/01/2020,16:04:00,0,0,0,0,0,0,0,0
12/01/2020,16:05:00,0,0,0,0,0,0,0,0
12/01/2020,17:06:00,0,0,0,0,0,0,0,0
12/01/2020,17:07:06,1,0,0,0,0,0,0,0
12/01/2020,17:08:00,0,0,0,0,0,0,0,0
12/01/2020,17:09:01,0,0,0,0,0,0,0,1
12/01/2020,18:10:00,0,0,0,0,0,0,0,0
12/01/2020,18:11:00,1,0,0,0,0,0,0,0

I want to compute the sum of in and out for each hour. The result should be in the following format:

timestamp, ins, outs
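The column layout described in the question can be made explicit before any aggregation. The following is a minimal sketch, assuming the file has no header row and that the counters always come in in/out pairs after the date and time fields. The labels `in_1`, `out_1`, ... are hypothetical names introduced here for illustration; they are not part of the device's output.

```python
import io
import pandas as pd

# Three rows copied from the sample input in the question.
SAMPLE = """12/01/2020,16:02:00,0,0,0,2,0,0,0,0
12/01/2020,17:07:06,1,0,0,0,0,0,0,0
12/01/2020,17:09:01,0,0,0,0,0,0,0,1
"""

raw = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Everything after the date and time columns is an in/out pair,
# one pair per counting line (assumption based on the question).
n_lines = (raw.shape[1] - 2) // 2

# Hypothetical labels: in_1, out_1, in_2, out_2, ...
labels = ["date", "time"]
for i in range(1, n_lines + 1):
    labels += [f"in_{i}", f"out_{i}"]
raw.columns = labels

in_cols = [c for c in raw.columns if c.startswith("in_")]
out_cols = [c for c in raw.columns if c.startswith("out_")]
print(raw[["date", "time"] + in_cols + out_cols])
```

Naming the columns this way is optional, but it makes it harder to mix up even and odd positions later on.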

2 Answers

Smart猫小萌

1911 contributions · more than 7 upvotes

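After reading my_csv_file.csv, sum the in columns and the out columns separately, create a timestamp column, and group by the timestamp at hourly resolution: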


import pandas as pd

# Read file, no header!
df = pd.read_csv('my_csv_file.csv', header=None)
n_cols = len(df.columns)

# Sum all inputs (even column positions) and outputs (odd column positions)
df['in'] = df.iloc[:, range(2, n_cols, 2)].sum(axis=1)
df['out'] = df.iloc[:, range(3, n_cols, 2)].sum(axis=1)
df = df.drop(columns=list(range(2, n_cols)))

# Create a timestamp from the date and time columns
df['timestamp'] = pd.to_datetime(df[0] + ' ' + df[1])
df = df.drop(columns=[0, 1])

# Group by date and hour, and sum only the in/out columns
# (summing the datetime column itself raises a TypeError in pandas 2.x)
df_grouped = df.groupby([df.timestamp.dt.date, df.timestamp.dt.hour])[['in', 'out']].sum()

# Prettify the output
df_grouped.index.names = ['date', 'hour']
df_grouped = df_grouped.reset_index()

#          date  hour  in  out
# 0  2020-12-01    16   0    2
# 1  2020-12-01    17   1    1
# 2  2020-12-01    18   1    0

Note: to recreate the data I used for this example, you can use the following line (instead of read_csv):


df = pd.DataFrame({0: {0: '12/01/2020', 1: '12/01/2020', 2: '12/01/2020', 3: '12/01/2020', 4: '12/01/2020', 5: '12/01/2020', 6: '12/01/2020', 7: '12/01/2020', 8: '12/01/2020', 9: '12/01/2020'}, 1: {0: '16:02:00', 1: '16:03:00', 2: '16:04:00', 3: '16:05:00', 4: '17:06:00', 5: '17:07:06', 6: '17:08:00', 7: '17:09:01', 8: '18:10:00', 9: '18:11:00'}, 2: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0, 8: 0, 9: 1}, 3: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 4: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 5: {0: 2, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 6: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 7: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 8: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 9: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 0, 9: 0}})
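As a cross-check, the same idea can be wrapped in a small function and run against the ten sample rows from the question. This is only a sketch under stated assumptions: `hourly_totals` is a hypothetical helper name, the dates are assumed to be MM/DD/YYYY (consistent with the 2020-12-01 output shown above), and the hour is kept as a single floored timestamp column to match the `timestamp, ins, outs` layout the question asks for.

```python
import io
import pandas as pd

# The ten sample rows from the question, with no header row.
CSV_TEXT = """\
12/01/2020,16:02:00,0,0,0,2,0,0,0,0
12/01/2020,16:03:00,0,0,0,0,0,0,0,0
12/01/2020,16:04:00,0,0,0,0,0,0,0,0
12/01/2020,16:05:00,0,0,0,0,0,0,0,0
12/01/2020,17:06:00,0,0,0,0,0,0,0,0
12/01/2020,17:07:06,1,0,0,0,0,0,0,0
12/01/2020,17:08:00,0,0,0,0,0,0,0,0
12/01/2020,17:09:01,0,0,0,0,0,0,0,1
12/01/2020,18:10:00,0,0,0,0,0,0,0,0
12/01/2020,18:11:00,1,0,0,0,0,0,0,0
"""


def hourly_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum every 'in' and 'out' counter per hour (hypothetical helper)."""
    counters = df.iloc[:, 2:]  # everything after the date and time columns
    # Assumption: dates are month-first (MM/DD/YYYY).
    ts = pd.to_datetime(df[0] + " " + df[1], format="%m/%d/%Y %H:%M:%S")
    per_row = pd.DataFrame({
        "timestamp": ts.dt.floor("h"),           # truncate to the hour
        "ins": counters.iloc[:, 0::2].sum(axis=1),   # 1st, 3rd, ... counter
        "outs": counters.iloc[:, 1::2].sum(axis=1),  # 2nd, 4th, ... counter
    })
    return per_row.groupby("timestamp", as_index=False)[["ins", "outs"]].sum()


result = hourly_totals(pd.read_csv(io.StringIO(CSV_TEXT), header=None))
print(result)
```

Flooring the timestamp keeps date and hour in one column, so hours from different days never merge into the same group.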



Answered 2022-07-05
哆啦的时光机

1779 contributions · more than 6 upvotes

See the code below. Each step is explained in the comments.


import pandas as pd

# path_here is the path to the CSV file (no header row)
df = pd.read_csv(path_here, sep=",", header=None)
df = df.rename(columns={0: "date", 1: "timestamp"})

# Combine date and time and truncate to the hour, so rows aggregate per hour
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["timestamp"]).dt.floor("h")

# Get all headers that are not timestamp and date
headers = list(df.columns)
headers.remove("timestamp")
headers.remove("date")

# Unpivot the data so each value is in a single record
df = df.melt(id_vars=["date", "timestamp"], value_vars=headers, var_name="type", value_name="value")

# Label each original column for aggregation (even is in, odd is out)
df["type"] = df["type"].apply(lambda x: "in" if x % 2 == 0 else "out")

# Group by hourly timestamp and type, and sum the values
df = df.groupby(["date", "timestamp", "type"], as_index=False)["value"].sum()

# Pivot so the in and out totals of each hour end up in one record
df = df.pivot_table(index=["date", "timestamp"], columns="type", values="value", aggfunc="sum")
df = df.reset_index()
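To see what the unpivot step does, here is a minimal sketch on a tiny made-up frame with one in column (label 2) and one out column (label 3). The values are hypothetical and only serve to show the wide-to-long reshaping that `melt` performs before the parity-based relabelling.

```python
import pandas as pd

# Tiny hypothetical frame: column 2 is an "in" counter, column 3 an "out" counter.
wide = pd.DataFrame({
    "date": ["12/01/2020", "12/01/2020"],
    "timestamp": ["16:02:00", "16:03:00"],
    2: [0, 1],
    3: [2, 0],
})

# Wide to long: one record per (row, counter column).
long = wide.melt(
    id_vars=["date", "timestamp"],
    value_vars=[2, 3],
    var_name="type",
    value_name="value",
)

# Even column labels are "in", odd labels are "out".
long["type"] = long["type"].map(lambda c: "in" if c % 2 == 0 else "out")
print(long)
```

Once every counter sits in its own record with an in/out label, a single groupby covers any number of counting lines.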


Answered 2022-07-05