I have a purchases DataFrame with several columns, including these three:

    PURCHASE_ID (index of purchase)
    WORKER_ID (index of worker)
    ACCOUNT_ID (index of account)

A worker can have multiple associated accounts, and an account can have multiple workers. If I create WORKER and ACCOUNT entities and add the relationships, I get an error:

    KeyError: 'Variable: ACCOUNT_ID not found in entity'

Here is my code so far:

    import pandas as pd
    import featuretools as ft
    import featuretools.variable_types as vtypes

    d = {'PURCHASE_ID': [1, 2],
         'WORKER_ID': [0, 0],
         'ACCOUNT_ID': [1, 2],
         'COST': [5, 10],
         'PURCHASE_TIME': ['2018-01-01 01:00:00', '2016-01-01 02:00:00']}
    df = pd.DataFrame(data=d)

    data_variable_types = {'PURCHASE_ID': vtypes.Id,
                           'WORKER_ID': vtypes.Id,
                           'ACCOUNT_ID': vtypes.Id,
                           'COST': vtypes.Numeric,
                           'PURCHASE_TIME': vtypes.Datetime}

    es = ft.EntitySet('Purchase')
    es = es.entity_from_dataframe(entity_id='purchases',
                                  dataframe=df,
                                  index='PURCHASE_ID',
                                  time_index='PURCHASE_TIME',
                                  variable_types=data_variable_types)

    es.normalize_entity(base_entity_id='purchases',
                        new_entity_id='workers',
                        index='WORKER_ID',
                        additional_variables=['ACCOUNT_ID'],
                        make_time_index=False)

    es.normalize_entity(base_entity_id='purchases',
                        new_entity_id='accounts',
                        index='ACCOUNT_ID',
                        additional_variables=['WORKER_ID'],
                        make_time_index=False)

    fm, features = ft.dfs(entityset=es,
                          target_entity='purchases',
                          agg_primitives=['mean'],
                          trans_primitives=[],
                          verbose=True)
    features

How do I separate the entities to include a many-to-many relationship?
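A likely cause of the KeyError, offered as a reading of the code rather than a confirmed diagnosis: in `normalize_entity`, the columns listed in `additional_variables` are moved out of the base entity into the new one. The first call would therefore take `ACCOUNT_ID` out of `purchases`, leaving nothing for the second call to use as an index. Omitting `additional_variables` from both calls should keep both keys in `purchases`, which then acts as the link between workers and accounts. The pandas-only sketch below illustrates that table layout with the sample data from the question; the names `workers`, `accounts`, `worker_account_pairs`, and `mean_cost_per_worker` are illustrative and not part of featuretools.

```python
import pandas as pd

# Same sample data as in the question.
purchases = pd.DataFrame({
    'PURCHASE_ID': [1, 2],
    'WORKER_ID': [0, 0],
    'ACCOUNT_ID': [1, 2],
    'COST': [5, 10],
    'PURCHASE_TIME': pd.to_datetime(['2018-01-01 01:00:00',
                                     '2016-01-01 02:00:00']),
})

# Each parent table holds only its own key; neither carries the other's ID.
workers = purchases[['WORKER_ID']].drop_duplicates().reset_index(drop=True)
accounts = purchases[['ACCOUNT_ID']].drop_duplicates().reset_index(drop=True)

# purchases keeps both foreign keys, so it links workers and accounts
# in both directions (the many-to-many relationship).
worker_account_pairs = purchases[['WORKER_ID', 'ACCOUNT_ID']].drop_duplicates()

# An aggregation over the child table, grouped by one of the parents.
mean_cost_per_worker = purchases.groupby('WORKER_ID')['COST'].mean()
```

With this layout, one worker row can match several account rows through `purchases`, and the reverse holds as well, without either parent table needing a column from the other.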