
Pandas - loop over a directory with read_excel and add a date value to the dataframe from the workbook's month

Python

qq_笑_17 2023-11-09 21:40:19

I have a directory of Excel files that I loop through, reading one sheet from each file into a Pandas dataframe. Each file contains one month of data (example name = "Savings January 2019.xlsx"). The worksheets have no date column, so I want to add a "Date" column to the dataframe: for each file, take the month and year from the workbook name (e.g. "January 2019") and add it in "MM-DD-YYYY" form (e.g. "01-01-2019") as the date value of every row read in.

Below is my working loop. It reads the 12 Excel workbooks without dates and only produces totals across all 12 months. I need the dates so that I can visualize the data by month.

    df_total = pd.DataFrame()
    for file in files:  # loop through Excel files (each file adds date value based on file name)
        if file.endswith('.xlsx'):
            excel_file = pd.ExcelFile(file)
            sheets = excel_file.sheet_names
            for sheet in sheets:  # loop through sheets inside an Excel file
                df = excel_file.parse(sheet_name = "Group Savings")
                df_total = df_total.append(df)

Current df:

       State      Group      Value
    0  Illinois   000000130  470.93
    1  Illinois   000000130  948.33
    2  Illinois   000000784  3498.42
    3  Illinois   000000784  16808.16
    4  Illinois   000002077  7.00

Needed df:

       State      Group      Date        Value
    0  Illinois   000000130  01-01-2019  470.93
    1  Illinois   000000130  01-01-2019  948.33
    2  Illinois   000000784  01-01-2019  3498.42
    3  Illinois   000000784  02-01-2019  6808.16
    4  Illinois   000002077  02-01-2019  7.00

I did some research and think this comes down to creating the column and then filling in the date value, but I can't figure out how to parse the file name to do that, and I am clearly a beginner here.

    for sheet in sheets:  # loop through sheets inside an Excel file
        df = excel_file.parse(sheet_name = "Group Savings")
        df_total = df_total.append(df)
        df_total['Date'] = #if excel_file contains 'January 2019', then df_total['Date'] == '01-01-2019'


2 Answers

阿晨1998

Contributed 2037 experience points, received more than 6 likes

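Your idea is right and your code is nearly complete. All you need to add now is the date parsing.

You can use Python's strptime() to parse the date out of the file name.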

https://docs.python.org/3/library/datetime.html

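For example, if your file name looks like "Savings January 2019.xlsx", you can parse it as follows. Note that this is not the only way to parse the string; several variations on this approach would work as well.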

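from datetime import datetime

string = 'Savings January 2019.xlsx'

# the second space-separated token is the month name
month_str = string.split(' ')[1]

# the third token is "2019.xlsx"; drop the extension to get the year
year_str = string.split(' ')[2].split('.')[0]

# "%B" is the full month name, "%Y" the four-digit year; the day defaults to 1
date_object = datetime.strptime(month_str + year_str, "%B%Y")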

Here is a good overview of Python's date format codes: https://strftime.org/
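As one of the possible variations mentioned above, the month and year can also be located with a regular expression instead of fixed split positions, which may hold up better when file names differ in length. This is only a sketch: the helper name date_from_filename and the pattern are assumptions made for illustration, and "%B" matches English month names only under an English locale.

```python
import re
from datetime import datetime

# Hypothetical pattern: a word followed by a space and a four-digit year.
_MONTH_YEAR = re.compile(r"([A-Za-z]+) (\d{4})")

def date_from_filename(filename):
    """Return the first day of the month named in `filename` as a datetime."""
    for month_str, year_str in _MONTH_YEAR.findall(filename):
        try:
            return datetime.strptime(f"{month_str} {year_str}", "%B %Y")
        except ValueError:
            continue  # the word before the year was not a month name
    raise ValueError(f"no month and year found in {filename!r}")

# Format as MM-DD-YYYY, the layout requested in the question.
print(date_from_filename("Savings January 2019.xlsx").strftime("%m-%d-%Y"))
```

Because candidates that are not month names are skipped, a longer name with extra words before the month should still resolve to the right date.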

Once you have the date object, you just add it to the dataframe.

df['Date'] = date_object
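To illustrate what that assignment does, here is a small sketch on a made-up two-row frame (the sample values are borrowed from the question). Assigning a single datetime broadcasts it to every row. The extra "Date_str" column is a hypothetical name, added only to show how the "MM-DD-YYYY" text form could be produced if a string is really wanted; keeping the datetime column is usually more convenient for grouping and plotting.

```python
from datetime import datetime

import pandas as pd

# Small stand-in for one workbook's sheet.
df = pd.DataFrame({
    "State": ["Illinois", "Illinois"],
    "Group": ["000000130", "000000784"],
    "Value": [470.93, 3498.42],
})

date_object = datetime(2019, 1, 1)

# A scalar on the right-hand side is repeated for every row.
df["Date"] = date_object

# Optional text form of the same date (hypothetical column name).
df["Date_str"] = df["Date"].dt.strftime("%m-%d-%Y")

print(df)
```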

Answered 2023-11-09

哔哔one

Contributed 1854 experience points, received more than 8 likes

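Here is the final code. Note that the real file names are longer and I have left out some company information, hence the different indices in .split.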

import pandas as pd
from datetime import datetime

# collect one dataframe per workbook
frames = []

# loop through Excel files (`files` is the list of workbook names from the question)
for file in files:
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)

        # parse the file name to take month and year and save as a date object for the Date column
        month_str = file.split(' ')[4]
        year_str = file.split(' ')[5].split('.')[0]
        date_object = datetime.strptime(month_str + year_str, "%B%Y")

        # read the "Group Savings" sheet once (looping over every sheet name would re-read it and duplicate rows)
        df = excel_file.parse(sheet_name="Group Savings")

        # set the date on this file's rows only, so earlier files keep their own month
        df['Date'] = date_object
        frames.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat combines the frames instead
df_total = pd.concat(frames, ignore_index=True)
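The point that matters for the monthly view is that each file's rows get their own date before the frames are combined. The sketch below shows this without touching any Excel files: frames_by_file is a hypothetical stand-in for the sheets already read from two workbooks (values borrowed from the question's sample), and the month and year are taken as the last two words of the file name, which is an assumption about the naming scheme.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-ins for the "Group Savings" sheet of two workbooks.
frames_by_file = {
    "Savings January 2019.xlsx": pd.DataFrame(
        {"Group": ["000000130", "000000784"], "Value": [470.93, 3498.42]}
    ),
    "Savings February 2019.xlsx": pd.DataFrame(
        {"Group": ["000000784", "000002077"], "Value": [16808.16, 7.00]}
    ),
}

tagged = []
for name, df in frames_by_file.items():
    # Drop the extension, then take the last two words as month and year.
    month_str, year_str = name.rsplit(".", 1)[0].split(" ")[-2:]
    file_date = datetime.strptime(month_str + year_str, "%B%Y")
    # assign() returns a copy with the Date column set for this file only.
    tagged.append(df.assign(Date=file_date))

df_total = pd.concat(tagged, ignore_index=True)

# One total per month, ready for plotting.
monthly = df_total.groupby("Date")["Value"].sum()
print(monthly)
```

With the Date column kept as a datetime, something like monthly.plot() should give the by-month view the question asks for, provided a plotting backend such as matplotlib is installed.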

Answered 2023-11-09
