2 Answers
Contributor with 1853 experience points, 9+ upvotes
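You can use groupby to compute the total volume for each week. You can then join that total back onto the original dataframe and compute the ratio in a vectorized way.
Assuming the original dataframe is df (all columns have dtype int):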
   Week  Year  Weekday  Volume
0     1  2000        1       0
1     1  2000        2      10
2     1  2000        3      10
3     2  2000        1      10
4     2  2000        2       0
5     1  2001        1       0
6     1  2001        2      10
7     1  2001        3      10
8     2  2001        1      10
9     2  2001        2       0
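To follow along, the sample frame above can be rebuilt by hand. This is only a convenience sketch: the values are copied from the table, and the dict-based constructor is one of several ways to build it.

```python
import pandas as pd

# Rebuild the sample frame shown above; all four columns hold integers.
df = pd.DataFrame({
    "Week":    [1, 1, 1, 2, 2, 1, 1, 1, 2, 2],
    "Year":    [2000] * 5 + [2001] * 5,
    "Weekday": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2],
    "Volume":  [0, 10, 10, 10, 0, 0, 10, 10, 10, 0],
})
print(df)
```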
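You can use:
# total Volume per (Week, Year); the summed Weekday column is meaningless, so drop it
s = df.groupby(['Week', 'Year']).sum().drop('Weekday', axis=1)
# join the totals back on the (Week, Year) index; the total column becomes Volume_tot
df2 = df.set_index(['Week', 'Year']).join(s,rsuffix='_tot').sort_index(level=1)
# vectorized ratio of each day's volume to its weekly total
df2['ratio'] = df2.Volume / df2.Volume_tot
print(df2)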
which gives:
           Weekday  Volume  Volume_tot  ratio
Week Year
1    2000        1       0          20    0.0
     2000        2      10          20    0.5
     2000        3      10          20    0.5
2    2000        1      10          10    1.0
     2000        2       0          10    0.0
1    2001        1       0          20    0.0
     2001        2      10          20    0.5
     2001        3      10          20    0.5
2    2001        1      10          10    1.0
     2001        2       0          10    0.0
You can get the expected output with:
print(df2.drop('Volume_tot', axis=1).reset_index())
which gives:
   Week  Year  Weekday  Volume  ratio
0     1  2000        1       0    0.0
1     1  2000        2      10    0.5
2     1  2000        3      10    0.5
3     2  2000        1      10    1.0
4     2  2000        2       0    0.0
5     1  2001        1       0    0.0
6     1  2001        2      10    0.5
7     1  2001        3      10    0.5
8     2  2001        1      10    1.0
9     2  2001        2       0    0.0
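As a possible alternative (not part of the answer above), groupby(...).transform('sum') returns each group's total already aligned to the original rows, so the join and the index juggling can be skipped. The sketch below rebuilds the same sample data so it runs on its own; the name weekly_total is just an illustrative choice.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "Week":    [1, 1, 1, 2, 2, 1, 1, 1, 2, 2],
    "Year":    [2000] * 5 + [2001] * 5,
    "Weekday": [1, 2, 3, 1, 2, 1, 2, 3, 1, 2],
    "Volume":  [0, 10, 10, 10, 0, 0, 10, 10, 10, 0],
})

# transform("sum") broadcasts each (Week, Year) total back to every row of that group.
weekly_total = df.groupby(["Week", "Year"])["Volume"].transform("sum")

# Element-wise division; the original row order and index are preserved.
df["ratio"] = df["Volume"] / weekly_total
print(df)
```

Note that a week whose total volume is 0 would produce NaN here (0 divided by 0), so such weeks may need separate handling.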
Contributor with 1810 experience points, 5+ upvotes
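You can use Pandas indexing together with groupby to perform the grouped operation.
Assuming you have a dataframe df with the columns ['week','year','weekday','volume'], your solution would look like this: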
import numpy as np
import pandas as pd
# make up some data, only 1000 groups not your 30000, but it gets the point across
dates = pd.date_range(start = '2000-01-01', end = '2019-02-28', freq = 'D')
volume = np.random.randint(0,100,len(dates))
# DatetimeIndex.week was removed in pandas 2.0; isocalendar().week (pandas >= 1.1) gives the same ISO week number
df = pd.DataFrame(list(zip(dates.isocalendar().week,dates.year,dates.dayofweek,volume)),
                  columns = ['week','year','weekday','volume'])
# group
grp = df.groupby(['year','week'])
grp_vol = grp['volume'].sum()
# rename to avoid overlap in names
grp_vol.name = 'weekly_volume'
# rejoin to calculate your ratio
df = df.join(grp_vol, on = ['year','week'])
df['ratio'] = df['volume']/df['weekly_volume']
Then time it (%timeit is an IPython/Jupyter magic command):
%timeit df['ratio'] = df['volume']/df['weekly_volume']
196 µs ± 4.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
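Outside IPython or Jupyter the %timeit magic is not available. A rough equivalent using the standard-library timeit module might look like the sketch below. The seed, the helper name compute_ratio, and the repeat count n are arbitrary choices for illustration, volumes are drawn from 1 to 99 so that no weekly total is zero, and the timing will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic daily data; the seed only makes runs repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range(start="2000-01-01", end="2019-02-28", freq="D")
df = pd.DataFrame({
    "week": dates.isocalendar().week.astype("int64").to_numpy(),
    "year": dates.year,
    "weekday": dates.dayofweek,
    "volume": rng.integers(1, 100, len(dates)),
})

# Weekly totals, renamed to avoid clashing with the "volume" column, then joined back.
weekly = df.groupby(["year", "week"])["volume"].sum().rename("weekly_volume")
df = df.join(weekly, on=["year", "week"])

def compute_ratio():
    # The vectorized division that is being timed.
    return df["volume"] / df["weekly_volume"]

n = 1000
seconds = timeit.timeit(compute_ratio, number=n)
print(f"{seconds / n * 1e6:.1f} microseconds per call")
```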