1 Answer
I think you need the opposite of pivot, which is melt. Your data is in "wide" format, and this aggregation is much easier if you convert it to "tall" format. Once the data is tall, you can use groupby to sum the values within each year.
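To illustrate the idea before looking at the real dataset, here is a minimal sketch on a made-up toy frame (the region names and month columns are invented for the example, not taken from your data):

import pandas as pd

# A tiny "wide" frame: one row per region, one column per month
wide = pd.DataFrame({
    "RegionName": ["Akron, OH", "Albany, NY"],
    "2019-01-31": [100, 200],
    "2019-02-28": [110, 210],
    "2020-01-31": [120, 220],
})

# melt turns the month columns into rows: one (RegionName, Date, value) row per cell
tall = wide.melt(id_vars=["RegionName"], var_name="Date", value_name="value")

# Extract the year from the date and sum within each (region, year) pair
tall["Year"] = pd.to_datetime(tall["Date"]).dt.year
print(tall.groupby(["RegionName", "Year"], as_index=False)["value"].sum())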
I downloaded the housing for-sale inventory dataset and wrote a short program that sums all the values within the same year.
Code:
import pandas as pd
df = pd.read_csv("Metro_invt_fs_uc_sfrcondo_smoothed_month.csv")
# Take all of the columns after the index and convert them into additional rows
df = df.melt(id_vars=["RegionID", "SizeRank", "RegionName", "RegionType", "StateName"], var_name="Date")
# Drop date, but keep year
df["Year"] = pd.to_datetime(df["Date"]).dt.year
df = df.drop("Date", axis="columns")
# Aggregate each year
df = df.groupby(["RegionID", "SizeRank", "RegionName", "RegionType", "StateName", "Year"], as_index=False).sum()
print(df)
Output:
RegionID SizeRank RegionName RegionType StateName Year value
0 394304 74 Akron, OH Msa OH 2017 3576.0
1 394304 74 Akron, OH Msa OH 2018 42625.0
2 394304 74 Akron, OH Msa OH 2019 39078.0
3 394304 74 Akron, OH Msa OH 2020 21532.0
4 394308 60 Albany, NY Msa NY 2017 2969.0
.. ... ... ... ... ... ... ...
475 753906 75 North Port-Sarasota-Bradenton, FL Msa FL 2020 73953.0
476 753924 54 Urban Honolulu, HI Msa HI 2017 3735.0
477 753924 54 Urban Honolulu, HI Msa HI 2018 50079.0
478 753924 54 Urban Honolulu, HI Msa HI 2019 57413.0
479 753924 54 Urban Honolulu, HI Msa HI 2020 35522.0
[480 rows x 7 columns]
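As a side note (not something the question asked for), if you later want each year laid out as its own column again, you can pivot the aggregated frame back to wide format. A minimal sketch, assuming the df produced by the program above:

# Pivot the yearly totals back to wide format: one column per year
df_wide = df.pivot_table(
    index=["RegionID", "SizeRank", "RegionName", "RegionType", "StateName"],
    columns="Year",
    values="value",
).reset_index()
print(df_wide)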