
Subset a dataset based on two conditions, save each data frame to a .csv file, iterate over each file, and plot a graph

Python

www说 2024-01-16 09:51:34

I'm new to data science and need help with the following:

(I) Split the dataset by the unique groups in one column (Region, in my case) and by another column (country).

(II) Save each resulting data frame as a .csv file named regionname_country.csv, for example west_GER.csv, east_POL.csv.

(III) If possible, iterate over each .csv file with a for loop and draw a scatter plot of Education vs Age for each df.

(IV) Finally, save the plots/figures in a PDF file (4 figures per page).

'df'

    Region, country, Age, Education, Income, FICO, Target
1   west, GER, 43, 1, 47510, 710, 1
2   east, POL, 32, 2, 73640, 723, 1
3   east, POL, 22, 2, 88525, 610, 0
4   west, GER, 55, 0, 31008, 592, 0
5   north, USA, 19, 0, 18007, 599, 1
6   south, PER, 27, 2, 68850, 690, 0
7   south, BRZ, 56, 3, 71065, 592, 0
8   north, USA, 39, 1, 98004, 729, 1
9   east, JPN, 36, 2, 51361, 692, 0
10  west, ESP, 59, 1, 98643, 729, 1

Expected result:

# df_to_csv: 'west_GER.csv'
west, GER, 43, 1, 47510, 710, 1
west, GER, 55, 0, 31008, 592, 0

# west_ESP.csv
west, ESP, 59, 1, 98643, 729, 1

# east_POL.csv
east, POL, 32, 2, 73640, 723, 1
...

# north_USA.csv
north, USA, 39, 1, 98004, 729, 1
north, USA, 19, 0, 18007, 599, 1


2 answers

呼如林

Contributed 1798 experience points, received over 3 likes

For Python:

(I) and (II):

import pandas as pd

# Each group key is a (Region, country) tuple; write one CSV per group.
for (region, country), group in df.groupby(["Region", "country"]):
    group.to_csv(f"{region}_{country}.csv", index=False)  # index=False keeps the row index out of the file
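Steps (I) and (II) can be tried end to end with the sample data from the question. This is only a minimal sketch: the helper name `save_groups`, its `out_dir` parameter, and the use of a temporary folder are assumptions made for illustration, not part of the original answer.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question.
CSV_TEXT = """Region,country,Age,Education,Income,FICO,Target
west,GER,43,1,47510,710,1
east,POL,32,2,73640,723,1
east,POL,22,2,88525,610,0
west,GER,55,0,31008,592,0
north,USA,19,0,18007,599,1
south,PER,27,2,68850,690,0
south,BRZ,56,3,71065,592,0
north,USA,39,1,98004,729,1
east,JPN,36,2,51361,692,0
west,ESP,59,1,98643,729,1
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))


def save_groups(frame, out_dir):
    """Write one CSV per (Region, country) pair and return the written paths.

    Hypothetical helper, named here only for this sketch.
    """
    out_dir = Path(out_dir)
    paths = []
    for (region, country), group in frame.groupby(["Region", "country"]):
        path = out_dir / f"{region}_{country}.csv"
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


# A temporary folder keeps the sketch from writing into the working directory.
out_dir = tempfile.mkdtemp()
written = save_groups(df, out_dir)
print(sorted(p.name for p in written))
```

Unpacking the group key into `region` and `country` avoids rebuilding the key from the column values, so the file name does not depend on how the two strings happen to sort.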

(III) and (IV):

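import glob

import matplotlib.pyplot as plt
import pandas as pd

# One figure with a 2x2 grid of axes, i.e. four plots on a page.
fig, axs = plt.subplots(nrows=2, ncols=2)

# zip() stops at the shorter input, so only the first four CSV files are plotted.
for ax, file in zip(axs.flatten(), glob.glob("./*.csv")):
    df_temp = pd.read_csv(file)
    # Every row in a file shares the same Region and country, so take the first.
    region_temp = df_temp['Region'][0]
    country_temp = df_temp['country'][0]
    ax.scatter(df_temp["Age"], df_temp["Education"])
    ax.set_title(f"Region:{region_temp}, Country:{country_temp}")
    ax.set_xlabel("Age")
    ax.set_ylabel("Education")

plt.tight_layout()
fig.savefig("scatter.pdf")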
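With more than four CSV files, one way to get four plots per page is matplotlib's `PdfPages`, writing one 2x2 figure per page. The sketch below is not the answerer's code; the function name `plot_csvs_to_pdf` and the small demo files (built from rows in the question) are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def plot_csvs_to_pdf(csv_files, pdf_path):
    """Scatter Age vs Education for each CSV, four plots (2x2) per PDF page.

    Hypothetical helper for this sketch; returns the number of pages written.
    """
    csv_files = sorted(csv_files)
    n_pages = 0
    with PdfPages(str(pdf_path)) as pdf:
        # Walk through the files in chunks of four, one figure per chunk.
        for start in range(0, len(csv_files), 4):
            chunk = csv_files[start:start + 4]
            fig, axs = plt.subplots(nrows=2, ncols=2)
            axes = axs.flatten()
            for ax, file in zip(axes, chunk):
                df_temp = pd.read_csv(file)
                region = df_temp["Region"].iloc[0]
                country = df_temp["country"].iloc[0]
                ax.scatter(df_temp["Age"], df_temp["Education"])
                ax.set_title(f"Region:{region}, Country:{country}")
                ax.set_xlabel("Age")
                ax.set_ylabel("Education")
            for ax in axes[len(chunk):]:
                ax.set_visible(False)  # hide unused panels on the last page
            fig.tight_layout()
            pdf.savefig(fig)
            plt.close(fig)
            n_pages += 1
    return n_pages


# Demo: five small files, one row each, using values from the question.
tmp = Path(tempfile.mkdtemp())
rows = [("west", "GER", 43, 1), ("west", "ESP", 59, 1), ("east", "POL", 32, 2),
        ("east", "JPN", 36, 2), ("north", "USA", 19, 0)]
for region, country, age, edu in rows:
    pd.DataFrame(
        {"Region": [region], "country": [country], "Age": [age], "Education": [edu]}
    ).to_csv(tmp / f"{region}_{country}.csv", index=False)

pdf_path = tmp / "scatter.pdf"
pages = plot_csvs_to_pdf(tmp.glob("*.csv"), pdf_path)
print(pages)  # five files at four per page give two pages
```

Closing each figure after `pdf.savefig` keeps memory use flat when there are many groups.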

Posted 2024-01-16

慕侠2389804

Contributed 1719 experience points, received over 6 likes

In R, you can do this:

library(tidyverse)

# Split the data into a list of data frames, one per Region/country pair.
# Note: select() keeps only these four columns, so the CSVs written below
# contain only them; drop the select() step to keep every column.
split_data <- df %>%
  select(Region, country, Education, Age) %>%
  group_split(Region, country)

# From the list of data frames, create a list of plots.
list_plots <- map(split_data, ~ggplot(.) + aes(Education, Age) +
  geom_point() +
  ggtitle(sprintf('Plot for region %s and country %s',
                  first(.$Region), first(.$country))))

# Write the plots to one PDF (one plot per page) and write the CSVs.
pdf("plots.pdf", onefile = TRUE)
for (i in seq_along(list_plots)) {
  # Write only the i-th data frame, not the whole list.
  write.csv(split_data[[i]],
            sprintf('%s_%s.csv', split_data[[i]]$Region[1], split_data[[i]]$country[1]),
            row.names = FALSE)
  print(list_plots[[i]])
}
dev.off()
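Since the question is tagged Python, the same single-pass idea (split once, then write the CSV and draw the plot in one loop, one plot per PDF page) might look roughly like this in pandas and matplotlib. It is a sketch under assumptions: the small `df` holds only the first four rows and four columns of the question's data, and the output goes to a temporary folder.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# First four rows of the question's data, restricted to four columns.
df = pd.DataFrame({
    "Region": ["west", "east", "east", "west"],
    "country": ["GER", "POL", "POL", "GER"],
    "Age": [43, 32, 22, 55],
    "Education": [1, 2, 2, 0],
})

out_dir = Path(tempfile.mkdtemp())
n_plots = 0
with PdfPages(str(out_dir / "plots.pdf")) as pdf:
    for (region, country), group in df.groupby(["Region", "country"]):
        # Write the CSV and draw the plot from the same in-memory group.
        group.to_csv(out_dir / f"{region}_{country}.csv", index=False)
        fig, ax = plt.subplots()
        ax.scatter(group["Education"], group["Age"])
        ax.set_xlabel("Education")
        ax.set_ylabel("Age")
        ax.set_title(f"Plot for region {region} and country {country}")
        pdf.savefig(fig)  # each savefig call adds one page
        plt.close(fig)
        n_plots += 1

print(n_plots)
```

Plotting from the groups directly skips the round trip of reading the CSV files back in, at the cost of coupling the export and plotting steps.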

Posted 2024-01-16
