
How to concatenate multiple CSV files into an xarray and define the coordinates?

Python

慕桂英546537 2022-06-22 15:34:32

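I have multiple csv files with the same rows and columns, and the data they contain differs by date. Each csv file belongs to a different date, which is given in its name, e.g. data.2018-06-01.csv. A minimal example of my data: I have 2 files, data.2018-06-01.csv and data.2019-06-01.csv, which contain, respectively,

```
user_id, weight, status
001, 70, healthy
002, 90, healthy
```

and

```
user_id, weight, status
001, 72, healthy
002, 103, obese
```

My question: how can I concatenate the csv files into an xarray and define the xarray's coordinates as user_id and date? I tried the following code:

```python
df_all = []
date_arr = []
for f in ['data.2018-06-01.csv', 'data.2019-06-01.csv']:
    date = f.split('.')[1]
    df = pd.read_csv(f)
    df_all.append(df)
    date_arr.append(date)

x_arr = xr.concat([df.to_xarray() for df in df_all], coords=[date_arr, 'user_id'])
```

but coords=[...] raises an error. What can I do? Thanks.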
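To make the example reproducible, the two files can be written with a short sketch like the one below. It assumes a temporary directory as the data folder (the question does not say where the files live), and the names ROWS and write_example_csvs are illustrative. It writes the rows without the spaces after the commas: with those spaces, pd.read_csv keeps them in the column names (e.g. " weight") unless skipinitialspace=True is passed.

```python
import csv
import tempfile
from pathlib import Path

# Rows from the question; the date lives only in the file name, not in the data
ROWS = {
    "2018-06-01": [("001", 70, "healthy"), ("002", 90, "healthy")],
    "2019-06-01": [("001", 72, "healthy"), ("002", 103, "obese")],
}


def write_example_csvs(folder: Path) -> list[Path]:
    """Write one data.<date>.csv file per date into `folder` and return the paths."""
    paths = []
    for date, rows in ROWS.items():
        path = folder / f"data.{date}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["user_id", "weight", "status"])
            writer.writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Assumption: a throwaway temporary directory stands in for the real data folder
    folder = Path(tempfile.mkdtemp())
    for p in write_example_csvs(folder):
        print(p.name)
```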


2 Answers

慕的地8271018


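Recall that xarray is inspired by pandas: it adds labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays. So, to answer the question, you can proceed as follows.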

```python
import os
from glob import glob

import pandas as pd

data_path = "path/to/your/data"  # folder containing the data.<date>.csv files

# Get the list of all the csv files in data_path
csv_flist = glob(os.path.join(data_path, "*.csv"))

df_list = []

for _file in csv_flist:
    # get the file name from the full path (works on Windows and POSIX paths)
    file_name = os.path.basename(_file)
    # extract the date from a file name, e.g. "data.2018-06-01.csv"
    date = file_name.split(".")[1]
    # read the data in _file
    df = pd.read_csv(_file)
    # add a date column: all the rows in df were recorded on the same date,
    # so the single converted date value is broadcast to every row
    df["date"] = pd.to_datetime(date)
    # append df to df_list
    df_list.append(df)
```
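Splitting on "." works for names shaped exactly like data.2018-06-01.csv, but it silently picks the wrong piece if a name contains an extra dot (for data.v2.2019-06-01.csv, split(".")[1] is "v2"). A slightly more defensive sketch, assuming the date is always written as YYYY-MM-DD somewhere in the file name, searches for it with a regular expression; the helper name date_from_filename is hypothetical.

```python
import os
import re

import pandas as pd

# Assumption: every file name contains exactly one date written as YYYY-MM-DD
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def date_from_filename(path: str) -> pd.Timestamp:
    """Return the YYYY-MM-DD date embedded in a file name as a pandas Timestamp."""
    name = os.path.basename(path)
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f"no YYYY-MM-DD date in file name: {name!r}")
    return pd.Timestamp(match.group())


print(date_from_filename("some/dir/data.2018-06-01.csv"))
print(date_from_filename("data.v2.2019-06-01.csv"))
```

Inside the loop above, the two lines that compute date and fill the column would then become df["date"] = date_from_filename(_file).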

Let's check, for example, the first df in df_list:

```python
print(df_list[0])
```

```
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
```

Concatenate all the dfs along axis=0:

```python
df_all = pd.concat(df_list, ignore_index=True).sort_index()

print(df_all)
```

```
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
2  healthy        1      70 2018-06-01
3  healthy        2      90 2018-06-01
```

Set the index of df_all to a two-level MultiIndex, with levels[0] = "date" and levels[1] = "user_id":

```python
data = df_all.set_index(["date", "user_id"]).sort_index()

print(data)
```

```
                     status  weight
date       user_id
2018-06-01 1        healthy      70
           2        healthy      90
2019-06-01 1        healthy      72
           2          obese     103
```
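.to_xarray() hands the DataFrame to xarray's Dataset.from_dataframe, which raises a ValueError when the MultiIndex is not unique; that happens, for example, if the same file is picked up twice. A small guard, sketched below, reports the offending (date, user_id) pairs before the conversion. The helper check_unique_index and the in-memory sample frame are illustrative assumptions, not part of the answer.

```python
import pandas as pd


def check_unique_index(data: pd.DataFrame) -> pd.DataFrame:
    """Raise ValueError if any (date, user_id) pair appears more than once."""
    dupes = data.index[data.index.duplicated()]
    if len(dupes) > 0:
        raise ValueError(f"duplicate index entries: {list(dupes)}")
    return data


# Assumption: a tiny in-memory frame shaped like `data` above
data = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2019-06-01", "2019-06-01"]),
        "user_id": [1, 2, 1, 2],
        "status": ["healthy", "healthy", "healthy", "obese"],
        "weight": [70, 90, 72, 103],
    }
).set_index(["date", "user_id"])

check_unique_index(data)  # passes: every (date, user_id) pair occurs once

try:
    # Appending the first row again creates a duplicate (date, user_id) pair
    check_unique_index(pd.concat([data, data.iloc[:1]]))
except ValueError as err:
    print(err)
```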

Finally, you can convert the resulting pandas.DataFrame to an xarray.Dataset using .to_xarray(), as follows.

```python
xds = data.to_xarray()

print(xds)
```

```
<xarray.Dataset>
Dimensions:  (date: 2, user_id: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-06-01 2019-06-01
  * user_id  (user_id) int64 1 2
Data variables:
    status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
    weight   (date, user_id) int64 70 90 72 103
```

This fully answers the question.
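The asker's own attempt can also be repaired with xr.concat directly. In xr.concat, coords only chooses which existing coordinate variables get concatenated; the new dimension comes from the required dim argument, which also accepts a pandas.Index whose name becomes the dimension name and whose values become its coordinate. A minimal sketch, assuming the file contents are held in memory (FILES is a hypothetical stand-in for the two csv files) and user_id is read as the index so that it becomes a coordinate:

```python
import io

import pandas as pd
import xarray as xr

# Assumption: in-memory stand-ins for data.2018-06-01.csv and data.2019-06-01.csv
FILES = {
    "data.2018-06-01.csv": "user_id,weight,status\n001,70,healthy\n002,90,healthy\n",
    "data.2019-06-01.csv": "user_id,weight,status\n001,72,healthy\n002,103,obese\n",
}

datasets = []
dates = []
for name, text in FILES.items():
    # Reading user_id as the index makes it a dimension/coordinate of the Dataset
    df = pd.read_csv(io.StringIO(text), index_col="user_id")
    datasets.append(df.to_xarray())
    dates.append(pd.Timestamp(name.split(".")[1]))

# A named pandas.Index passed as `dim` creates the "date" dimension and its coordinate
xds = xr.concat(datasets, dim=pd.Index(dates, name="date"))
print(xds)
```

Without index_col="user_id", to_xarray() would use the default 0..n-1 row index, named "index", as the dimension instead of user_id.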

Answered 2022-06-22

阿晨1998


Try this:

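```python
import glob

import pandas as pd

path = r'path/to/your/folder'  # folder containing your csv files

all_file = glob.glob(path + "/*.csv")

li = []
for filename in all_file:
    # read each csv file and collect the resulting DataFrames
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

# stack all the DataFrames row-wise into one DataFrame
frame = pd.concat(li, axis=0, ignore_index=True)
```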
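As written, this produces one flat DataFrame whose rows no longer record which file, and therefore which date, they came from. One way to keep that information and still end up with date and user_id coordinates is to pass pd.concat a dict keyed by date, which adds the date as an outer index level. The sketch below assumes the data.<date>.csv naming scheme from the question; the function names csvs_to_xarray and write_example_files are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd
import xarray as xr


def csvs_to_xarray(folder: str) -> xr.Dataset:
    """Read every data.<date>.csv in `folder` into one Dataset with date and user_id coordinates."""
    frames = {}
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # "data.2018-06-01.csv" -> Timestamp("2018-06-01"); assumes this naming scheme
        date = pd.Timestamp(os.path.basename(filename).split(".")[1])
        frames[date] = pd.read_csv(filename, index_col="user_id")
    # The dict keys become the outer index level; name both levels explicitly
    data = pd.concat(frames).sort_index().rename_axis(["date", "user_id"])
    return data.to_xarray()


def write_example_files(folder: str) -> None:
    """Write the two example files from the question into `folder`."""
    contents = {
        "2018-06-01": "user_id,weight,status\n001,70,healthy\n002,90,healthy\n",
        "2019-06-01": "user_id,weight,status\n001,72,healthy\n002,103,obese\n",
    }
    for date, text in contents.items():
        with open(os.path.join(folder, f"data.{date}.csv"), "w") as fh:
            fh.write(text)


if __name__ == "__main__":
    # Assumption: a temporary folder stands in for the real data directory
    folder = tempfile.mkdtemp()
    write_example_files(folder)
    print(csvs_to_xarray(folder))
```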

Answered 2022-06-22
