Match a DataFrame column against list values and append the matching rows

RISEBY 2022-10-18 15:09:52
I have two different CSV files that I read into two DataFrames. I want to match the column df1['building_type'] against df2['model'] and append the corresponding rows to df1.

DataFrame 1:

data = [{'length': '34', 'width': '58.5', 'height': '60.2', 'building_type': ['concrete','wood','steel','laminate']},
        {'length': '42', 'width': '33', 'height': '23', 'building_type': ['concrete_double','wood_double','steel_double']}]
df1 = pd.DataFrame(data)
print(df1)

DataFrame 2:

data2 = [{'type': 'A1', 'floor': '2', 'model': ['wood','laminate','concrete','steel']},
         {'type': 'B3', 'floor': '4', 'model': ['wood_double','concrete_double','steel_double']}]
df2 = pd.DataFrame(data2)
print(df2)

Final DataFrame:

   length  width  height  building_type                                 type  floor
0  34      58.5   60.2    [concrete, wood, steel, laminate]             A1    2
1  42      33     23      [concrete_double, wood_double, steel_double]  B3    4

1 Answer

交互式爱情


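pd.merge looks like the right tool here, but the join keys must be hashable. A list is mutable, and therefore unhashable, so it cannot be used as a join key. We can convert each list to a tuple or a frozenset, both of which are immutable and hashable and can be joined on. Since the sample output suggests that the order of the elements does not matter, I chose frozenset.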
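As a quick illustration of why the conversion matters, here is a minimal pure-Python sketch (no pandas involved) using the first row of each DataFrame from the question. The variable names are only for illustration. It shows that a list cannot be hashed, that tuples compare order-sensitively, and that frozensets compare equal regardless of element order.

```python
# First-row values taken from the question's two DataFrames.
building_type = ['concrete', 'wood', 'steel', 'laminate']
model = ['wood', 'laminate', 'concrete', 'steel']

# A list is mutable, so Python refuses to hash it.
try:
    hash(building_type)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples are hashable but order-sensitive: same elements, different order.
same_as_tuple = tuple(building_type) == tuple(model)

# Frozensets are hashable and ignore order (and duplicates).
same_as_frozenset = frozenset(building_type) == frozenset(model)

# Equal objects must have equal hashes, which is what a hash join relies on.
same_hash = hash(frozenset(building_type)) == hash(frozenset(model))

print(list_is_hashable, same_as_tuple, same_as_frozenset, same_hash)
# prints: False False True True
```

Note that a frozenset also discards duplicate elements, so if repeated values inside a list are meaningful, a sorted tuple may be a safer key.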


Here is the code:


import pandas as pd

data = [{'length': '34', 'width': '58.5', 'height': '60.2', 'building_type': ['concrete','wood','steel','laminate']},
        {'length': '42', 'width': '33', 'height': '23', 'building_type': ['concrete_double','wood_double','steel_double']}]
df1 = pd.DataFrame(data)
print(df1)

data2 = [{'type': 'A1', 'floor': '2', 'model': ['wood','laminate','concrete','steel']},
         {'type': 'B3', 'floor': '4', 'model': ['wood_double','concrete_double','steel_double']}]
df2 = pd.DataFrame(data2)
print(df2)

# Note: Merge fails on mutable dtype
# pd.merge(df1, df2, left_on='building_type', right_on='model')
# Produces `TypeError: unhashable type: 'list'`

# Convert mutable type to immutable type and merge.
# `tuple` is best if order matters for you. I am assuming that the
# order doesn't matter based on the sample output, so `frozenset` is more
# appropriate.
df1['building_type'] = df1['building_type'].apply(frozenset)
df2['model'] = df2['model'].apply(frozenset)

# Now, merge. Note that since column names are different both
# 'building_type' and 'model' would be retained. You can remove one of them.
final_df = pd.merge(df1, df2, left_on='building_type', right_on='model')
final_df = final_df.drop(['model'], axis=1)
print(final_df)


Output on my machine:


  length width height                                 building_type
0     34  58.5   60.2             [concrete, wood, steel, laminate]
1     42    33     23  [concrete_double, wood_double, steel_double]

  type floor                                         model
0   A1     2             [wood, laminate, concrete, steel]
1   B3     4  [wood_double, concrete_double, steel_double]

  length width height                                 building_type type floor
0     34  58.5   60.2             (laminate, wood, steel, concrete)   A1     2
1     42    33     23  (concrete_double, steel_double, wood_double)   B3     4
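The desired output in the question keeps building_type as a list in its original order, whereas the merge above leaves it as a frozenset. One possible variation is sketched below. It is not part of the original answer: it assumes a temporary key column is acceptable (the name _match_key is hypothetical) and uses how='left' so that rows of df1 without a match are kept rather than dropped.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'length': '34', 'width': '58.5', 'height': '60.2',
     'building_type': ['concrete', 'wood', 'steel', 'laminate']},
    {'length': '42', 'width': '33', 'height': '23',
     'building_type': ['concrete_double', 'wood_double', 'steel_double']},
])
df2 = pd.DataFrame([
    {'type': 'A1', 'floor': '2',
     'model': ['wood', 'laminate', 'concrete', 'steel']},
    {'type': 'B3', 'floor': '4',
     'model': ['wood_double', 'concrete_double', 'steel_double']},
])

KEY = '_match_key'  # hypothetical helper column name, dropped after the merge

# Build an order-insensitive, hashable key without touching the list columns.
left = df1.assign(**{KEY: df1['building_type'].apply(frozenset)})
right = df2.assign(**{KEY: df2['model'].apply(frozenset)}).drop(columns='model')

# A left join keeps every row of df1; unmatched rows would get NaN in
# 'type' and 'floor'. The helper key is removed once the join is done.
final_df = left.merge(right, on=KEY, how='left').drop(columns=KEY)
print(final_df)
```

Because the key lives in its own column, df1 and df2 are left unmodified and building_type still holds the original lists after the merge.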


Answered 2022-10-18