I have a DataFrame with a large number of rows. An excerpt of the DF looks like this:

    DF_OLD =
    ...
    sID  tID  NER     token       Prediction
    274  79   U-Peop  khrushchev  Live_In-ARG2+B
    274  79   O       's          Live_IN-ARG2+L
    807  53   U-Loc   louisiana   Live_IN-ARG2+U
    807  56   B-Peop  earl        Live_IN-ARG1+B
    807  57   L-Peop  long        Live_IN-ARG1+L
    807  13   B-Peop  dwight      Live_IN-ARG1+B
    807  13   I-Peop  d.          Live_IN-ARG1+I
    807  13   L-Peop  eisenhower  Live_IN-ARG1+L
    ...

The column `sID` separates the different sentences. The column `Prediction` holds the output of a machine-learning classifier, and these predictions can be nonsensical. My goal is to group all predicted labels according to the following scheme:

    DF_Expected =
    ...
    sID  entity1               tID1   entity2        tID2  Relation
    274  NaN                   NaN    khrushchev 's  79    Live_In
    807  earl long             56 57  louisiana      53    Live_In
    807  dwight d. eisenhower  13     louisiana      53    Live_In
    ...

The "-ARGX-" part indicates where the entity goes in the table (`entity1` or `entity2`), and the part before the first "-" indicates the relation. If one of the argument parts is missing, the corresponding cells should be empty.

Here is what I have tried:

    DF["Live_In_Predict_Split"] = DF["Prediction"].str.split("+").str[0]
    DF["token2"] = DF["token"]
    DF["tokenID2"] = DF["tokenID"]
    DF["Live_In_Predict2"] = DF["Live_In_Predict"]
    data_tokeni_map = DF.groupby(["Live_In_Predict_Split", "sentenceID"], as_index=True, sort=False).agg(" ".join).reset_index()
    s = data_tokeni_map.loc[:, ['sentenceID', 'token2', "tokenID2", "Live_In_Predict2"]].merge(
        data_tokeni_map.loc[:, ['sentenceID', 'token', "tokenID", "Live_In_Predict"]], on='sentenceID')
    s = s.loc[s.token2 != s.token].drop_duplicates()

What I am missing is some kind of counter that distinguishes the different "-ARGX-" entities, plus a suitable GroupBy (grouping by `tokenID` is not a good idea, because it produces wrong results). As a result, my new DF is wrong:

    DF_EDITED =
    ...
    sID  entity1                        tID1      entity2                        tID2  ...
    807  dwight d eisenhower earl long  13 56 57  louisiana                      53
    807  louisiana                      13 56 57  dwight d eisenhower earl long  53
    ...
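One possible way to build the missing counter is sketched below, under a few assumptions that are not stated in the question. It assumes the suffix after "+" is a BILOU tag (B = begin, I = inside, L = last, U = unit), so a new entity starts at every `B` or `U`. A cumulative sum of those starts per (`sID`, relation, ARG) then numbers the entities, and that number can be grouped on instead of `tID`. It also assumes the `Live_IN`/`Live_In` casing difference is classifier noise, normalized here with `str.title()` (which would also alter relation names with inner capitals). Rows whose `Prediction` does not match the pattern are dropped, and every ARG1 entity is paired with every ARG2 entity of the same sentence and relation via an outer merge. The function name `group_relations` is hypothetical.

```python
import pandas as pd


def group_relations(df: pd.DataFrame) -> pd.DataFrame:
    # Split e.g. "Live_IN-ARG1+B" into relation, argument number and BILOU tag.
    # Predictions that do not match this pattern (e.g. a plain "O") become NaN.
    parts = df["Prediction"].str.extract(r"^([^-]+)-ARG(\d)\+([BILU])$")
    parts.columns = ["relation", "arg", "bilou"]
    tagged = df.join(parts).dropna(subset=["relation"]).copy()
    # Assumption: casing differences in the relation name are noise.
    tagged["relation"] = tagged["relation"].str.title()

    # The counter: a new entity starts at every B or U tag, so a running sum
    # of those starts within (sentence, relation, argument) numbers the entities.
    # Stray I/L tags before the first B end up in entity number 0.
    tagged["start"] = tagged["bilou"].isin(["B", "U"]).astype(int)
    tagged["ent_id"] = tagged.groupby(["sID", "relation", "arg"])["start"].cumsum()

    # Collapse each entity into one row: tokens joined by spaces and the
    # distinct token IDs joined by spaces in order of first appearance.
    entities = (
        tagged.groupby(["sID", "relation", "arg", "ent_id"], sort=False)
        .agg(entity=("token", lambda s: " ".join(s)),
             tID=("tID", lambda s: " ".join(dict.fromkeys(s.astype(str)))))
        .reset_index()
    )

    def side(arg: str, n: int) -> pd.DataFrame:
        # Entities of one argument slot, renamed to entity{n} / tID{n}.
        sub = entities[entities["arg"] == arg]
        return sub[["sID", "relation", "entity", "tID"]].rename(
            columns={"entity": f"entity{n}", "tID": f"tID{n}"})

    # Outer merge: pairs every ARG1 with every ARG2 of the same sentence and
    # relation, and leaves NaN where one of the two arguments is missing.
    result = side("1", 1).merge(side("2", 2), on=["sID", "relation"], how="outer")
    return (result[["sID", "entity1", "tID1", "entity2", "tID2", "relation"]]
            .rename(columns={"relation": "Relation"})
            .sort_values("sID", kind="stable")
            .reset_index(drop=True))


df_old = pd.DataFrame({
    "sID": [274, 274, 807, 807, 807, 807, 807, 807],
    "tID": [79, 79, 53, 56, 57, 13, 13, 13],
    "NER": ["U-Peop", "O", "U-Loc", "B-Peop", "L-Peop", "B-Peop", "I-Peop", "L-Peop"],
    "token": ["khrushchev", "'s", "louisiana", "earl", "long", "dwight", "d.", "eisenhower"],
    "Prediction": ["Live_In-ARG2+B", "Live_IN-ARG2+L", "Live_IN-ARG2+U", "Live_IN-ARG1+B",
                   "Live_IN-ARG1+L", "Live_IN-ARG1+B", "Live_IN-ARG1+I", "Live_IN-ARG1+L"],
})

print(group_relations(df_old))
```

On the excerpt this should reproduce the three rows of `DF_Expected`, although the order of the two `807` rows is whatever the merge yields. If the real data can contain several relations or arguments beyond ARG1/ARG2, the regex and the `side()` calls would need to be extended accordingly.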