I need to read a list of HTML files into pandas DataFrames. Each HTML file contains several tables, which I combine with pd.concat. Each file name contains a string that I want to add as a column.

# Read all files into a list
files = glob.glob('monthly_*.html')
# Zip the dfs with the desired string segment
zipped_dfs = [zip(pd.concat(pd.read_html(file)), file.split('_')[1]) for file in files]

I'm having trouble unpacking the zipped list of (df, product).

dfs = []
# Loop through the list of zips
for _zip in zipped_dfs:
    # Unpack the zip
    for _df, product in _zip:
        # Adding the product string as a new column
        _df['Product'] = product
        dfs.append(_df)

However, I get the error 'str' object does not support item assignment

Can someone explain the best way to add the new column?
1 Answer
繁华开满天机
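You should remove zip from the list comprehension. Iterating over a DataFrame yields its column labels, so zip(df, string) pairs each column label with one character of the string. Inside your loop, _df is therefore a column label (a str) rather than a DataFrame, which is why the assignment fails. If you want tuples of the concatenated DataFrame and the product name, write: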
zipped_dfs = [(pd.concat(pd.read_html(file)), file.split('_')[1])
              for file in files]
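To see why zip caused the error, here is a minimal sketch using a small made-up DataFrame in place of a parsed HTML table. The column names and the string "widget.html" are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for one concatenated table from an HTML file.
df = pd.DataFrame({"month": ["Jan", "Feb"], "sales": [10, 20]})

# Iterating over a DataFrame yields its column labels, so zip() pairs
# each label with one character of the string.
pairs = list(zip(df, "widget.html"))
print(pairs)  # [('month', 'w'), ('sales', 'i')]

# A plain tuple keeps the DataFrame and the string together instead.
frame, product = (df, "widget.html")
frame["Product"] = product
```

With zip, the first element of each pair is a column label such as 'month', so indexing into it with ['Product'] raises the error from the question. With a plain tuple, the first element is the DataFrame itself and the column assignment works.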
However, the intermediate step of building a list of tuples is unnecessary. The whole approach can be simplified as follows:
import glob

import pandas as pd

dfs = []
for file in glob.glob('monthly_*.html'):
    # NOTE: your code seemingly keeps .html in the product name,
    # so I modified the split operation
    df = pd.concat(pd.read_html(file))
    df['Product'] = file.split('.html')[0].split('_')[1]
    dfs.append(df)
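As a variation on the loop above, the sketch below uses DataFrame.assign to add the column and pathlib to strip the extension. The helper names product_from_filename and tag_tables are hypothetical, and the in-memory tables are made-up stand-ins for what pd.read_html(file) would return, so the example runs without any HTML files.

```python
from pathlib import Path

import pandas as pd


def product_from_filename(filename):
    """Return the part of the file stem after the first underscore.

    Assumes names shaped like 'monthly_<product>.html'.
    """
    return Path(filename).stem.split("_", 1)[1]


def tag_tables(tables, filename):
    """Concatenate one file's tables and add a constant Product column."""
    combined = pd.concat(tables, ignore_index=True)
    return combined.assign(Product=product_from_filename(filename))


# Made-up tables standing in for the output of pd.read_html(file).
tables_by_file = {
    "monthly_widget.html": [
        pd.DataFrame({"sales": [1]}),
        pd.DataFrame({"sales": [2]}),
    ],
    "monthly_gadget.html": [pd.DataFrame({"sales": [3]})],
}

# Tag each file's tables, then combine everything into one DataFrame.
result = pd.concat(
    [tag_tables(tables, name) for name, tables in tables_by_file.items()],
    ignore_index=True,
)
print(result)
```

Note one difference from the answer's split: split("_", 1) keeps any later underscores in the product name, whereas split('_')[1] returns only the segment between the first and second underscore. Which behaviour is appropriate depends on how the files are actually named.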