5 Answers
Contributor with 1829 experience points · received more than 6 likes
Here is a fairly simple approach using `concat`:
f = (df[["jobTitle", "female_count", "mean_z_score_female"]]
     .rename(columns={"female_count": "count",
                      "mean_z_score_female": "mean_z_score"})
     .assign(gender="female"))
m = (df[["jobTitle", "male_count", "mean_z_score_male"]]
     .rename(columns={"male_count": "count",
                      "mean_z_score_male": "mean_z_score"})
     .assign(gender="male"))
pd.concat([m, f]).sort_values("jobTitle")
The output is:
jobTitle count mean_z_score gender
0 Associate 44.0 -0.047592 male
0 Associate 65.0 0.000000 female
1 Intern 17.0 0.000000 male
1 Intern 13.0 0.000000 female
2 Key Holder 32.0 -0.288726 male
2 Key Holder 51.0 -0.352018 female
3 Retail Store Manager 6.0 -0.002756 male
3 Retail Store Manager 19.0 0.082110 female
4 Seasonal Sales Associate 26.0 0.000000 male
4 Seasonal Sales Associate 53.0 -0.109181 female
5 other_jobTitles 125.0 0.613314 male
5 other_jobTitles 146.0 0.231569 female
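To try this locally, the input can be rebuilt from the values printed in the outputs on this page. The sketch below does that and reruns the concat approach. The column order follows the printed `df.columns` further down. `long_df`, `kind="stable"` and `reset_index(drop=True)` are additions for illustration: the stable sort keeps the male row ahead of the female row for each job title, and the index reset replaces the duplicated labels (0, 0, 1, 1, ...) seen above.

```python
import pandas as pd

# Input reconstructed from the numbers shown in the answers' outputs.
df = pd.DataFrame({
    "female_count": [65.0, 13.0, 51.0, 19.0, 53.0, 146.0],
    "jobTitle": ["Associate", "Intern", "Key Holder", "Retail Store Manager",
                 "Seasonal Sales Associate", "other_jobTitles"],
    "male_count": [44.0, 17.0, 32.0, 6.0, 26.0, 125.0],
    "mean_z_score_female": [0.0, 0.0, -0.352018, 0.082110, -0.109181, 0.231569],
    "mean_z_score_male": [-0.047592, 0.0, -0.288726, -0.002756, 0.0, 0.613314],
})

# One frame per gender, with the gender-specific columns renamed to shared names.
f = (df[["jobTitle", "female_count", "mean_z_score_female"]]
     .rename(columns={"female_count": "count",
                      "mean_z_score_female": "mean_z_score"})
     .assign(gender="female"))
m = (df[["jobTitle", "male_count", "mean_z_score_male"]]
     .rename(columns={"male_count": "count",
                      "mean_z_score_male": "mean_z_score"})
     .assign(gender="male"))

# Stack the two frames, sort by job title, and give the result a fresh 0..n-1 index.
long_df = (pd.concat([m, f])
           .sort_values("jobTitle", kind="stable")
           .reset_index(drop=True))
print(long_df)
```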
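Contributor with 1794 experience points · received more than 8 likes
This is a job for pd.wide_to_long, but you first have to rename some columns, namely female_count and male_count to count_female and count_male: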
df.columns = ["_".join(entry.split("_")[::-1])
              if "count" in entry else entry
              for entry in df]
print(df.columns)
Index(['count_female', 'jobTitle', 'count_male', 'mean_z_score_female',
'mean_z_score_male'],
dtype='object')
print(pd.wide_to_long(df, stubnames=["count", "mean_z_score"],
                      i="jobTitle", j="gender", sep="_", suffix=r"\w+"))
count mean_z_score
jobTitle gender
Associate female 65.0 0.000000
Intern female 13.0 0.000000
Key Holder female 51.0 -0.352018
Retail Store Manager female 19.0 0.082110
Seasonal Sales Associate female 53.0 -0.109181
other_jobTitles female 146.0 0.231569
Associate male 44.0 -0.047592
Intern male 17.0 0.000000
Key Holder male 32.0 -0.288726
Retail Store Manager male 6.0 -0.002756
Seasonal Sales Associate male 26.0 0.000000
other_jobTitles male 125.0 0.613314
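For reference, here is a self-contained sketch of the same wide_to_long idea. It assumes the input reconstructed from the values printed on this page, renames the two count columns with an explicit mapping instead of reversing the name parts, and calls `reset_index()` at the end so that jobTitle and gender come back as ordinary columns. The names `renamed` and `long_df` are only illustrative.

```python
import pandas as pd

# Input reconstructed from the numbers shown in the answers' outputs.
df = pd.DataFrame({
    "female_count": [65.0, 13.0, 51.0, 19.0, 53.0, 146.0],
    "jobTitle": ["Associate", "Intern", "Key Holder", "Retail Store Manager",
                 "Seasonal Sales Associate", "other_jobTitles"],
    "male_count": [44.0, 17.0, 32.0, 6.0, 26.0, 125.0],
    "mean_z_score_female": [0.0, 0.0, -0.352018, 0.082110, -0.109181, 0.231569],
    "mean_z_score_male": [-0.047592, 0.0, -0.288726, -0.002756, 0.0, 0.613314],
})

# wide_to_long expects "<stub><sep><suffix>", so the gender must come last.
renamed = df.rename(columns={"female_count": "count_female",
                             "male_count": "count_male"})

long_df = (
    pd.wide_to_long(renamed, stubnames=["count", "mean_z_score"],
                    i="jobTitle", j="gender", sep="_", suffix=r"\w+")
    .reset_index()  # turn the (jobTitle, gender) MultiIndex back into columns
)
print(long_df)
```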
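Contributor with 1820 experience points · received more than 9 likes
This answer is very similar to Roy2012's, but it uses append. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code below needs an older pandas; on newer versions pd.concat does the same job.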
df_new = None
for gender in ['male', 'female']:
    df_gender = (df[['jobTitle', f'{gender}_count', f'mean_z_score_{gender}']]
                 .rename(columns={f'{gender}_count': 'count',
                                  f'mean_z_score_{gender}': 'mean_z_score'}))
    df_gender['gender'] = gender
    # DataFrame.append only exists in pandas < 2.0
    df_new = df_gender if df_new is None else df_new.append(df_gender)
df_new = df_new.sort_values(by=['jobTitle', 'gender'],
                            axis=0).reset_index(drop=True)
print(df_new)
The output is:
jobTitle count mean_z_score gender
0 Associate 65.0 0.000000 female
1 Associate 44.0 -0.047592 male
2 Intern 13.0 0.000000 female
3 Intern 17.0 0.000000 male
4 Key Holder 51.0 -0.352018 female
5 Key Holder 32.0 -0.288726 male
6 Retail Store Manager 19.0 0.082110 female
7 Retail Store Manager 6.0 -0.002756 male
8 Seasonal Sales Associate 53.0 -0.109181 female
9 Seasonal Sales Associate 26.0 0.000000 male
10 other_jobTitles 146.0 0.231569 female
11 other_jobTitles 125.0 0.613314 male
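Since DataFrame.append is no longer available in pandas 2.0 and later, here is a minimal sketch of the same loop that collects the per-gender frames in a list and stacks them once with pd.concat. The input is reconstructed from the values printed above, and `pieces` is just an illustrative name.

```python
import pandas as pd

# Input reconstructed from the numbers shown in the answers' outputs.
df = pd.DataFrame({
    "female_count": [65.0, 13.0, 51.0, 19.0, 53.0, 146.0],
    "jobTitle": ["Associate", "Intern", "Key Holder", "Retail Store Manager",
                 "Seasonal Sales Associate", "other_jobTitles"],
    "male_count": [44.0, 17.0, 32.0, 6.0, 26.0, 125.0],
    "mean_z_score_female": [0.0, 0.0, -0.352018, 0.082110, -0.109181, 0.231569],
    "mean_z_score_male": [-0.047592, 0.0, -0.288726, -0.002756, 0.0, 0.613314],
})

pieces = []
for gender in ["male", "female"]:
    piece = (df[["jobTitle", f"{gender}_count", f"mean_z_score_{gender}"]]
             .rename(columns={f"{gender}_count": "count",
                              f"mean_z_score_{gender}": "mean_z_score"})
             .assign(gender=gender))
    pieces.append(piece)  # list.append, not DataFrame.append

# A single concat at the end avoids growing a DataFrame inside the loop.
df_new = (pd.concat(pieces, ignore_index=True)
          .sort_values(by=["jobTitle", "gender"])
          .reset_index(drop=True))
print(df_new)
```

Sorting by both jobTitle and gender makes the row order deterministic, which matches the output shown above.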
Contributor with 1810 experience points · received more than 4 likes
pd.melt(df, id_vars=['jobTitle', 'mean_z_score_female', 'mean_z_score_male'],
        value_vars=['female_count', 'male_count'],
        var_name="gender", value_name='count').melt(
    id_vars=['jobTitle', 'gender', 'count'],
    value_vars=['mean_z_score_female', 'mean_z_score_male'],
    value_name='mean_z_score').drop('variable', axis=1)
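One caveat with chaining two melts like this: the second melt pairs every count row with both z-score columns, so the result has 24 rows rather than 12, and the gender column holds the labels female_count / male_count. The sketch below is one possible way to tidy that up, not part of the original answer: it keeps the `variable` column long enough to drop the mismatched pairs. The input is reconstructed from the values printed on this page, and `step1`, `step2` and `tidy` are illustrative names.

```python
import pandas as pd

# Input reconstructed from the numbers shown in the answers' outputs.
df = pd.DataFrame({
    "female_count": [65.0, 13.0, 51.0, 19.0, 53.0, 146.0],
    "jobTitle": ["Associate", "Intern", "Key Holder", "Retail Store Manager",
                 "Seasonal Sales Associate", "other_jobTitles"],
    "male_count": [44.0, 17.0, 32.0, 6.0, 26.0, 125.0],
    "mean_z_score_female": [0.0, 0.0, -0.352018, 0.082110, -0.109181, 0.231569],
    "mean_z_score_male": [-0.047592, 0.0, -0.288726, -0.002756, 0.0, 0.613314],
})

# First melt: one row per (jobTitle, count column) -> 12 rows.
step1 = pd.melt(df, id_vars=["jobTitle", "mean_z_score_female", "mean_z_score_male"],
                value_vars=["female_count", "male_count"],
                var_name="gender", value_name="count")

# Second melt: each of those rows is paired with BOTH z-score columns -> 24 rows.
step2 = step1.melt(id_vars=["jobTitle", "gender", "count"],
                   value_vars=["mean_z_score_female", "mean_z_score_male"],
                   value_name="mean_z_score")

# Reduce both label columns to plain "female" / "male".
step2["gender"] = step2["gender"].str.replace("_count", "", regex=False)
step2["variable"] = step2["variable"].str.replace("mean_z_score_", "", regex=False)

# Keep only rows where the count and the z-score belong to the same gender.
tidy = (step2[step2["gender"] == step2["variable"]]
        .drop(columns="variable")
        .sort_values(["jobTitle", "gender"])
        .reset_index(drop=True))
print(tidy)
```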
Contributor with 1856 experience points · received more than 17 likes
Here is another way, using lreshape:
newdf = pd.lreshape(df, {'count': ['female_count', 'male_count'],
                         'mean_z_score': ['mean_z_score_female', 'mean_z_score_male']})\
    .sort_values('jobTitle')
newdf['genre'] = ['female', 'male'] * (len(newdf) // 2)
Output:
newdf
jobTitle count mean_z_score genre
0 Associate 65.0 0.000000 female
6 Associate 44.0 -0.047592 male
1 Intern 13.0 0.000000 female
7 Intern 17.0 0.000000 male
2 Key Holder 51.0 -0.352018 female
8 Key Holder 32.0 -0.288726 male
3 Retail Store Manager 19.0 0.082110 female
9 Retail Store Manager 6.0 -0.002756 male
4 Seasonal Sales Associate 53.0 -0.109181 female
10 Seasonal Sales Associate 26.0 0.000000 male
5 other_jobTitles 146.0 0.231569 female
11 other_jobTitles 125.0 0.613314 male
Note: lreshape is currently undocumented and may be removed.
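The alternating ['female', 'male'] labels above depend on the row order after sorting. As the index values in the output suggest (0 to 5 female, 6 to 11 male), lreshape stacks the first column of each group on top of the second, so a slightly more defensive sketch is to label the rows before sorting. This is only an illustration on the input reconstructed from this page; it assumes pd.lreshape is still available in your pandas version, and it passes `dropna=False` so that both blocks keep exactly `len(df)` rows.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the numbers shown in the answers' outputs.
df = pd.DataFrame({
    "female_count": [65.0, 13.0, 51.0, 19.0, 53.0, 146.0],
    "jobTitle": ["Associate", "Intern", "Key Holder", "Retail Store Manager",
                 "Seasonal Sales Associate", "other_jobTitles"],
    "male_count": [44.0, 17.0, 32.0, 6.0, 26.0, 125.0],
    "mean_z_score_female": [0.0, 0.0, -0.352018, 0.082110, -0.109181, 0.231569],
    "mean_z_score_male": [-0.047592, 0.0, -0.288726, -0.002756, 0.0, 0.613314],
})

groups = {"count": ["female_count", "male_count"],
          "mean_z_score": ["mean_z_score_female", "mean_z_score_male"]}

# dropna=False keeps every row, so each gender block has len(df) rows.
newdf = pd.lreshape(df, groups, dropna=False)

# Label the blocks while they are still in stacking order: female first, then male.
newdf["gender"] = np.repeat(["female", "male"], len(df))

newdf = newdf.sort_values(["jobTitle", "gender"]).reset_index(drop=True)
print(newdf)
```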