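3 Answers
Contributed 1871 experience points · Received more than 8 upvotes
This is the most Pythonic approach I could come up with. I first sort the list of lists by sublist[3] (the review count) in descending order, which means that as we iterate over the list, we reach the sublist with the highest review count before any of its duplicates. This property is what we use to build the final list.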
meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
# Sort by review count, then by name, in descending order so that the
# entry with the highest review count comes first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)
# This is the list we'll use to store the final data in
final_list = []
# Go through all the items in meta_list
for meta in meta_list:
    # If no entry with the same name (index 0) already exists
    # in final_list, add this one
    if meta[0] not in [item[0] for item in final_list]:
        final_list.append(meta)
Output:
[['Instagram',
'SOCIAL',
'4.5',
66577446,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Teen',
'Social',
'July 31, 2018',
'Varies with device',
'Varies with device'],
['Gmail',
'COMMUNICATION',
'4.3',
4604483,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Everyone',
'Communication',
'August 2, 2018',
'Varies with device',
'Varies with device'],
['Coloring book moana',
'FAMILY',
'3.9',
974,
'14M',
'500,000+',
'Free',
'0',
'Everyone',
'Art & Design;Pretend Play',
'January 15, 2018',
'2.0.0',
'4.0.3 and up']]
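Essentially, it appends every entry whose name is not yet present in final_list. Why does this work? Because the list is sorted, the first entry you encounter for each name during the loop is the one with the highest review count. Once that one has been added, its duplicates can no longer be added, and we're done.
Note: this does not preserve the original order of the entries. It only guarantees that when several entries share the same name, only the one with the highest review count is kept.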
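The sort-then-keep-first idea described above can also be sketched with a set of already-seen names, which avoids rebuilding the list of names on every iteration. This is only a sketch under stated assumptions: the function name `keep_highest_reviews`, its parameters, and the shortened three-column sample rows are hypothetical and not part of the original answer.

```python
def keep_highest_reviews(rows, name_index=0, count_index=1):
    """Return one row per name, keeping the row with the highest count.

    Hypothetical helper: it sorts a copy, so the caller's list is not mutated.
    """
    # Highest count first, so the first row seen for each name is the winner.
    ordered = sorted(rows, key=lambda row: int(row[count_index]), reverse=True)
    seen = set()
    result = []
    for row in ordered:
        name = row[name_index]
        if name not in seen:
            seen.add(name)
            result.append(row)
    return result


# Shortened sample rows (name, review count, category), assumed for illustration.
sample = [
    ["Gmail", 4604324, "COMMUNICATION"],
    ["Instagram", 66577313, "SOCIAL"],
    ["Gmail", 4604483, "COMMUNICATION"],
    ["Instagram", 66577446, "SOCIAL"],
]
deduplicated = keep_highest_reviews(sample)
print(deduplicated)
```

As in the answer, the result comes back ordered by review count rather than in the original order of the input.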
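Contributed 1869 experience points · Received more than 4 upvotes
There may be a more elegant or Pythonic solution to this problem, but here is one possible approach: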
my_list = [...]  # Nested list here

def compare_duplicates(nested_list, name_index=0, compare_index=3):
    max_values = dict()  # Used two dictionaries for readability
    final_indexes = dict()
    for i, item in enumerate(nested_list):
        name, value = item[name_index], item[compare_index]
        # Also record the first entry for a name, even if its value is 0
        if name not in max_values or value > max_values[name]:
            max_values[name] = value
            final_indexes[name] = i
    return [nested_list[i] for i in final_indexes.values()]

print(compare_duplicates(my_list))
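The same single-pass idea could also be sketched with one dictionary that stores the best row itself instead of its index. This is a variant, not the answer's own code: the name `highest_per_name` and the two-column sample rows are assumptions for illustration, and it relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
def highest_per_name(rows, name_index=0, compare_index=1):
    """Keep, for each name, the row with the largest value at compare_index."""
    best = {}
    for row in rows:
        name = row[name_index]
        current = best.get(name)
        # Store the row if the name is new or this row beats the stored one.
        if current is None or row[compare_index] > current[compare_index]:
            best[name] = row
    # Names come back in order of first appearance in the input.
    return list(best.values())


# Shortened sample rows (name, review count), assumed for illustration.
sample = [
    ["Gmail", 4604324],
    ["Instagram", 66577313],
    ["Gmail", 4604483],
    ["Notes", 0],
]
result = highest_per_name(sample)
print(result)
```

Unlike the sorting approach, this keeps names in the order they first appear, and a row whose count is 0 is still retained.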
Contributed 1806 experience points · Received more than 5 upvotes
Something like this:
_DATA = [
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
]
def print_highest(data):
    # Maps a key built from every field except the review count to the best row
    list_map = {}
    for d in data:
        key = str(d[0:3] + d[4:])
        if key not in list_map:
            list_map[key] = d
            continue
        if d[3] > list_map[key][3]:
            list_map[key] = d
    for l in list_map.values():
        print(l)

print_highest(_DATA)
Output:
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
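One detail of the approach above is worth noting: the key is built from every field except the review count, so rows that share a name but differ in some other field (such as the category) are treated as separate entries. The following sketch contrasts the two key choices; the helper `dedupe_by_key` and the shortened rows are hypothetical and only meant to illustrate the difference.

```python
def dedupe_by_key(rows, key_func, count_index=1):
    """Keep the row with the highest count for each key produced by key_func."""
    best = {}
    for row in rows:
        key = key_func(row)
        if key not in best or row[count_index] > best[key][count_index]:
            best[key] = row
    return list(best.values())


# Shortened sample rows (name, review count, category), assumed for illustration.
rows = [
    ["Coloring book moana", 967, "ART_AND_DESIGN"],
    ["Coloring book moana", 974, "FAMILY"],
    ["Gmail", 4604324, "COMMUNICATION"],
    ["Gmail", 4604483, "COMMUNICATION"],
]

# Key on the name only: the two "Coloring book moana" rows collapse into one.
by_name = dedupe_by_key(rows, key_func=lambda row: row[0])
# Key on everything except the count: differing categories stay separate.
by_all_but_count = dedupe_by_key(rows, key_func=lambda row: tuple(row[:1] + row[2:]))

print(len(by_name))
print(len(by_all_but_count))
```

Which key is appropriate depends on whether two rows with the same name but a different category should count as duplicates.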