2 Answers
Answer 1 (author: 2036 reputation points, 8+ upvotes)
You can manipulate your dictionary before handing it to Pandas:
from operator import itemgetter
# sort by value descending
items_sorted = sorted(d.items(), key=itemgetter(1), reverse=True)
# calculate sum of others
others = ('Other', sum(map(itemgetter(1), items_sorted[5:])))
# construct dictionary
d = dict([*items_sorted[:5], others])
print(d)
{'Games': 715067930.8599964,
'Design': 705237125.089998,
'Technology': 648570433.7599969,
'Film & Video': 379559714.56000066,
'Music': 191227757.8699999,
'Other': 658334549.8999995}
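The same steps can be wrapped in a small helper so the cutoff is not hard-coded. This is only a sketch: the function name `top_n_with_other` and the parameters `n` and `other_label` are made up for illustration and are not part of the original answer.

```python
from operator import itemgetter


def top_n_with_other(d, n=5, other_label="Other"):
    """Keep the n largest values of d and sum the rest under other_label.

    Hypothetical helper; assumes other_label is not already a key in d.
    """
    # sort by value, descending
    items_sorted = sorted(d.items(), key=itemgetter(1), reverse=True)
    result = dict(items_sorted[:n])
    rest = items_sorted[n:]
    if rest:  # only add the bucket when something was actually grouped
        result[other_label] = sum(map(itemgetter(1), rest))
    return result


sample = {"a": 10, "b": 7, "c": 5, "d": 2, "e": 1}
print(top_n_with_other(sample, n=2))  # {'a': 10, 'b': 7, 'Other': 8}
```

The resulting dictionary can then be passed to `pd.Series` in the same way as the original `d`.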
Answer 2 (author: 1936 reputation points, 6+ upvotes)
Based on @jpp's idea, but using a heap:
import heapq
d = {'Games': 715067930.8599964,
'Design': 705237125.089998,
'Technology': 648570433.7599969,
'Film & Video': 379559714.56000066,
'Music': 191227757.8699999,
'Publishing': 130763828.65999977,
'Fashion': 125678824.47999984,
'Food': 122781563.58000016,
'Art': 89078801.8599998,
'Comics': 70600202.99999984,
'Theater': 42662109.69999992,
'Photography': 37709926.38000007,
'Crafts': 13953818.35000002,
'Dance': 12908120.519999994,
'Journalism': 12197353.370000007}
top_5 = set(heapq.nlargest(5, d, key=d.get))
groups = {}
for category, pledge in d.items():
    new_category = category if category in top_5 else 'Other'
    groups.setdefault(new_category, []).append(pledge)
result = {k: sum(v) for k, v in groups.items()}
print(result)
Output
{'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999994, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066, 'Music': 191227757.8699999}
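If the intermediate lists in `groups` are not needed, the totals can be accumulated directly in one pass. The following is a minimal sketch of that variant; the function name `group_small_categories` and its parameters `k` and `other_label` are assumptions for illustration, not something from the answer above.

```python
import heapq


def group_small_categories(d, k=5, other_label="Other"):
    """Sum every value outside the k largest into a single other_label entry.

    Hypothetical helper; assumes other_label is not already a key in d.
    """
    # keys of the k largest values, selected with a heap
    top_k = set(heapq.nlargest(k, d, key=d.get))
    result = {}
    for category, value in d.items():
        key = category if category in top_k else other_label
        # accumulate a running total instead of building a list per group
        result[key] = result.get(key, 0) + value
    return result


sample = {"a": 10, "b": 7, "c": 5, "d": 2, "e": 1}
print(group_small_categories(sample, k=2))  # {'a': 10, 'b': 7, 'Other': 8}
```

With floating-point values, the last digits of the `'Other'` total can differ slightly depending on the order in which the values are added.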
Or, if you prefer numpy:
import numpy as np
d = {'Games': 715067930.8599964,
'Design': 705237125.089998,
'Technology': 648570433.7599969,
'Film & Video': 379559714.56000066,
'Music': 191227757.8699999,
'Publishing': 130763828.65999977,
'Fashion': 125678824.47999984,
'Food': 122781563.58000016,
'Art': 89078801.8599998,
'Comics': 70600202.99999984,
'Theater': 42662109.69999992,
'Photography': 37709926.38000007,
'Crafts': 13953818.35000002,
'Dance': 12908120.519999994,
'Journalism': 12197353.370000007}
categories, pledge_values = map(np.array, zip(*d.items()))
partition = np.argpartition(pledge_values, -5)
top_5 = set(categories[partition][-5:])
groups = {}
for category, pledge in d.items():
    new_category = category if category in top_5 else 'Other'
    groups.setdefault(new_category, []).append(pledge)
result = {k: sum(v) for k, v in groups.items()}
print(result)
Output
{'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999995, 'Music': 191227757.8699999, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066}
The second proposal (using numpy) has O(n) complexity, where n is the number of key-value pairs in d.