2 Answers

Contributor has 1757 experience points and more than 8 upvotes
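Use defaultdict to sum all values in each list, then convert the result to a list of tuples and pass it to the DataFrame constructor:

from collections import defaultdict
import pandas as pd

# zipped (pairs of name/probability lists) comes from the question
out = []
for a, b in zipped:
    d = defaultdict(int)
    for x, y in zip(a, b):
        if x in team1:
            # pool every team1 member under a single key
            d['Team_One'] += y
        else:
            d[x] = y
    out.append((list(d.keys()), list(d.values())))
df = pd.DataFrame(out, columns=['Names','Prob'])
print(df)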
                          Names                    Prob
0                    [Team_One]                 [100.0]
1  [Team_One, Andy, Vera, Kate]  [30.0, 4.5, 5.5, 60.0]
2              [Josh, Team_One]                [51, 49]
An alternative solution, which works only if there are no 0 values in Prob:

out = []
for a, b in zipped:
    n, p = [], []
    tot = 0
    for x, y in zip(a, b):
        if x in team1:
            tot += y
        else:
            n.append(x)
            p.append(y)
    # a nonzero total means at least one team1 member was found
    if tot != 0:
        p.append(tot)
        n.append('Team_One')
    out.append((n, p))
df = pd.DataFrame(out, columns=['Names','Prob'])
print(df)
                          Names                    Prob
0                    [Team_One]                 [100.0]
1  [Andy, Vera, Kate, Team_One]  [4.5, 5.5, 60.0, 30.0]
2              [Josh, Team_One]                [51, 49]
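The loop above relies on `tot != 0` to detect a team member, which is why it breaks when a probability is 0. One possible way around that, shown here only as a sketch, is to track membership with a boolean flag instead. The input data and the `found` variable are made up for illustration; the real `zipped` and `team1` come from the question.

```python
# Hypothetical input; the real `zipped` and `team1` are defined in the question
team1 = {'Anne', 'Mike', 'Sophie'}
zipped = [
    (['Anne', 'Mike'], [0.0, 0.0]),      # team members with zero probability
    (['Sophie', 'Andy'], [30.0, 70.0]),
    (['Josh'], [100.0]),                 # no team member at all
]

out = []
for a, b in zipped:
    n, p = [], []
    tot = 0
    found = False  # records whether any team1 member appeared in this row
    for x, y in zip(a, b):
        if x in team1:
            tot += y
            found = True
        else:
            n.append(x)
            p.append(y)
    # append the pooled entry based on the flag, not on the value of tot
    if found:
        n.append('Team_One')
        p.append(tot)
    out.append((n, p))
print(out)
```

With the flag, a row whose team members all have probability 0 still gets a `Team_One` entry, and a row without any team member gets none.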
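Working with lists inside columns is slow in pandas, so it is better to flatten the lists first:

from itertools import chain
import numpy as np
import pandas as pd

# number of items in each row, used to repeat the row label
lens = [len(x) for x in df['Names']]
df = pd.DataFrame({
    'row' : np.arange(len(df)).repeat(lens),
    'Names' : list(chain.from_iterable(df['Names'].tolist())),
    'Prob' : list(chain.from_iterable(df['Prob'].tolist()))
})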
Then replace the values using isin, and finally aggregate with sum:

team1 = ['Anne', 'Mike', 'Sophie']
df.loc[df['Names'].isin(team1), 'Names'] = 'Team_One'
df = df.groupby(['row','Names'], as_index=False, sort=False)['Prob'].sum()
print(df)
   row     Names   Prob
0    0  Team_One  100.0
1    1  Team_One   30.0
2    1      Andy    4.5
3    1      Vera    5.5
4    1      Kate   60.0
5    2      Josh   51.0
6    2  Team_One   49.0
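If the list-per-row layout is needed again afterwards, the flat frame can be regrouped. The following is only a sketch of the whole round trip under some assumptions: the input values are invented, the variable names `nested`, `flat` and `result` are hypothetical, and `DataFrame.explode` with a list of columns is swapped in for the `chain`-based flattening (it needs pandas 1.3 or newer).

```python
import pandas as pd

# Hypothetical input in the nested format; names and values are made up
team1 = ['Anne', 'Mike', 'Sophie']
nested = pd.DataFrame({
    'Names': [['Anne', 'Mike'], ['Sophie', 'Andy', 'Vera', 'Kate'], ['Josh', 'Mike']],
    'Prob': [[60.0, 40.0], [30.0, 4.5, 5.5, 60.0], [51.0, 49.0]],
})

# Flatten both list columns at once; the repeated index becomes the 'row' column
flat = nested.explode(['Names', 'Prob']).rename_axis('row').reset_index()
flat['Prob'] = flat['Prob'].astype(float)  # explode leaves the column as object dtype

# Same replacement and aggregation as above
flat.loc[flat['Names'].isin(team1), 'Names'] = 'Team_One'
flat = flat.groupby(['row', 'Names'], as_index=False, sort=False)['Prob'].sum()

# Collect the values of each original row back into Python lists
result = flat.groupby('row', sort=False)[['Names', 'Prob']].agg(list)
print(result)
```

Keeping the data flat for as long as possible and converting back to lists only at the end keeps the slow list-in-column handling to a single step.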

Contributor has 1851 experience points and more than 4 upvotes
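There seems to be no way around creating new lists to replace the old ones, because deleting items from the original lists is too expensive. I think a workable solution is to iterate over the names and probabilities: if a name is not in team1, append the name and its probability to the new lists. If the name is in team1, do not add the name; instead keep a running sum of the probabilities seen for team1 names. If that sum is nonzero after going through every name in the row, at least one team1 member was found (this assumes all probabilities are positive; I don't know whether that holds). Finally, append 'Team_One' to the names list and the sum to the probabilities list (if the sum is nonzero), and replace the DataFrame's lists with the newly created ones.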
def pool(df):
    # Set of team1 names for faster lookup than a list
    team1 = {'Anne', 'Mike', 'Sophie'}
    all_names = []
    all_probs = []
    for names, probs in zip(df['Names'], df['Prob']):
        # iterating through every row and initializing new lists to replace the name/prob lists
        new_names = []
        new_probs = []
        team1_prob = 0
        for name, prob in zip(names, probs):
            # iterating through every name/prob pair
            if name not in team1:
                # add the pair to the new lists if not in team1
                new_names.append(name)
                new_probs.append(prob)
            else:
                # keep a sum of probs for all team1 members found, but don't append their name
                team1_prob += prob
        if team1_prob != 0:
            # assuming all probs are positive, thus if any team1 member was found, team1_prob must be nonzero
            new_names.append('Team_One')
            new_probs.append(team1_prob)
        all_names.append(new_names)
        all_probs.append(new_probs)
    # replace the list columns in the original df in one step
    # (chained assignment such as df['Names'][i] = ... is unreliable in pandas)
    df['Names'] = all_names
    df['Prob'] = all_probs
    return df