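2 Answers
Contributor with 1796 experience points · more than 4 upvotes
Recall that although xarray introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, it is inspired by pandas. So, to answer the question, you can proceed as follows.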
from glob import glob
import numpy as np
import pandas as pd

# Get the list of all the csv files in data path
csv_flist = glob(data_path + "/*.csv")
df_list = []
for _file in csv_flist:
    # get the file name from the data path
    file_name = _file.split("/")[-1]
    # extract the date from a file name, e.g. "data.2018-06-01.csv"
    date = file_name.split(".")[1]
    # read the data in _file
    df = pd.read_csv(_file)
    # add a date column, knowing that all the data in df were recorded on the same date
    df["date"] = np.repeat(date, df.shape[0])
    # convert the date column to a proper datetime dtype
    df["date"] = df.date.astype("datetime64[ns]")
    # append df to df_list
    df_list.append(df)
Let's check, for example, the first df in df_list:
print(df_list[0])
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
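The loading loop above splits the path on "/", which assumes POSIX-style separators. Here is a minimal sketch of a more portable variant using pathlib. The helper name load_dated_csvs and the two sample files are made up for illustration, and it assumes every file follows the "<prefix>.<date>.csv" naming pattern from the example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dated_csvs(data_dir):
    """Read every CSV in data_dir and tag its rows with the date in the file name."""
    frames = []
    # sorted() makes the file order deterministic
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        # "data.2018-06-01.csv" -> "2018-06-01" (assumes "<prefix>.<date>.csv")
        date = csv_path.name.split(".")[1]
        df = pd.read_csv(csv_path)
        # a scalar timestamp is broadcast to every row of the new column
        df["date"] = pd.to_datetime(date)
        frames.append(df)
    return frames


# Hypothetical sample files, written to a temporary directory for illustration only
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.2018-06-01.csv").write_text(
        "status,user_id,weight\nhealthy,1,70\nhealthy,2,90\n"
    )
    Path(tmp, "data.2019-06-01.csv").write_text(
        "status,user_id,weight\nhealthy,1,72\nobese,2,103\n"
    )
    frames = load_dated_csvs(tmp)

print(frames[0])
```

Because the files are sorted by name here, the 2018 file comes first, whereas plain glob makes no promise about order.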
Concatenate all the dfs along axis=0:
df_all = pd.concat(df_list, ignore_index=True).sort_index()
print(df_all)
    status  user_id  weight       date
0  healthy        1      72 2019-06-01
1    obese        2     103 2019-06-01
2  healthy        1      70 2018-06-01
3  healthy        2      90 2018-06-01
Set the index of df_all to a two-level MultiIndex with levels[0] = "date" and levels[1] = "user_id":
data = df_all.set_index(["date", "user_id"]).sort_index()
print(data)
                     status  weight
date       user_id
2018-06-01 1        healthy      70
           2        healthy      90
2019-06-01 1        healthy      72
           2          obese     103
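One thing worth checking at this point: as far as I know, the conversion to xarray expects each (date, user_id) pair to appear only once in the MultiIndex. The sketch below shows one way to detect and drop repeated pairs. The sample rows are invented, and keeping the first occurrence is just one possible policy (taking the mean or the last row would be equally reasonable).

```python
import pandas as pd

# Hypothetical data in which user 2 was recorded twice on the same date
df_all = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-06-01", "2018-06-01", "2018-06-01"]),
        "user_id": [1, 2, 2],
        "weight": [70, 90, 91],
    }
)
data = df_all.set_index(["date", "user_id"])

# False here, because the pair (2018-06-01, 2) occurs twice
print(data.index.is_unique)

# Keep only the first row for each (date, user_id) pair
deduped = data[~data.index.duplicated(keep="first")]
print(deduped)
```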
Subsequently, you can convert the resulting pandas.DataFrame into an xarray.Dataset using .to_xarray(), as follows.
xds = data.to_xarray()
print(xds)
<xarray.Dataset>
Dimensions:  (date: 2, user_id: 2)
Coordinates:
  * date     (date) datetime64[ns] 2018-06-01 2019-06-01
  * user_id  (user_id) int64 1 2
Data variables:
    status   (date, user_id) object 'healthy' 'healthy' 'healthy' 'obese'
    weight   (date, user_id) int64 70 90 72 103
This fully answers the question.
Contributor with 2037 experience points · more than 6 upvotes
Try this:
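import glob
import pandas as pd

path = r'ur file'  # placeholder: replace with the directory that holds the csv files
all_file = glob.glob(path + "/*.csv")
li = []
for filename in all_file:
    # read each csv, using its first row as the header
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
# stack all the frames row-wise into a single DataFrame
frame = pd.concat(li, axis=0, ignore_index=True)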
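Note that the loop above does not keep track of which file each row came from, so the date encoded in the file name is lost. A possible way to retain it, sketched here with invented in-memory frames standing in for the csv files, is to pass a dict to pd.concat so that its keys become an outer index level. The dict name frames and the level names "date" and "row" are my own choices.

```python
import pandas as pd

# Hypothetical frames, keyed by the date that would be parsed from each file name
frames = {
    "2018-06-01": pd.DataFrame({"user_id": [1, 2], "weight": [70, 90]}),
    "2019-06-01": pd.DataFrame({"user_id": [1, 2], "weight": [72, 103]}),
}

# The dict keys become the outer level of a two-level index
frame = pd.concat(frames, names=["date", "row"])

# Move "date" into a regular column and discard the per-file row numbers
frame = frame.reset_index(level="date").reset_index(drop=True)
frame["date"] = pd.to_datetime(frame["date"])
print(frame)
```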