
Create a nested dictionary from similar values in a column, using each value as the key for all rows that contain it

Python

慕村225694 2022-10-05 18:25:44

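Hi all, I have a DataFrame of bounding boxes in images, with each box in its own row. What I want to do is combine all the rows for a given image.

        image                    xmin  ymin  xmax  ymax  label
    0   bookstore_video0_40.jpg   763   899   806   940  pedestrian
    3   bookstore_video0_40.jpg  1026   754  1075   797  pedestrian
    4   bookstore_video0_40.jpg   868   770   927   822  biker
    5   bookstore_video0_40.jpg   413  1010   433  1040  pedestrian
    21  bookstore_video0_80.jpg   866   278   917   328  pedestrian
    22  bookstore_video0_80.jpg   761   825   820   865  biker

What I was thinking of is perhaps turning it into a single-level nested dictionary, like this (note that I'm open to any solution and am not fixed on a dictionary):

    {'bookstore_video0_40.jpg': {'xmin': 1066, 'ymin': 802, 'xmax': 1093, 'ymax': 829, 'label': 'pedestrian'}}

but holding all the row data for each image, keyed by image name. My end goal is to pass this to a function that will write each row's data to a file in order.

That said, I'm lost on how to group the data into chunks. I did groupby('image'), but I don't know how to turn the result into what I want. Does anyone have ideas? I'm pretty sure this is easy; I looked around, but most of the replies I found were for more complex problems. Thanks in advance.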


4 Answers

跃然一笑

1826 contribution points, 6+ upvotes

How about this?

    new_dict = df.set_index('image').stack().groupby('image').apply(list).to_dict()
    print(new_dict)

Output:

    {'bookstore_video0_40.jpg': [763, 899, 806, 940, 'pedestrian',
                                 1026, 754, 1075, 797, 'pedestrian',
                                 868, 770, 927, 822, 'biker',
                                 413, 1010, 433, 1040, 'pedestrian'],
     'bookstore_video0_80.jpg': [866, 278, 917, 328, 'pedestrian',
                                 761, 825, 820, 865, 'biker']}
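Note that the list above is flat, so the boundaries between individual boxes are lost. If one record per box is preferred, a minimal sketch is to group by image and convert each group to a list of row dictionaries. It assumes the column names from the question; `boxes_by_image` is a name chosen here for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "image": ["bookstore_video0_40.jpg"] * 4 + ["bookstore_video0_80.jpg"] * 2,
    "xmin": [763, 1026, 868, 413, 866, 761],
    "ymin": [899, 754, 770, 1010, 278, 825],
    "xmax": [806, 1075, 927, 433, 917, 820],
    "ymax": [940, 797, 822, 1040, 328, 865],
    "label": ["pedestrian", "pedestrian", "biker",
              "pedestrian", "pedestrian", "biker"],
})

# Map each image name to a list of dicts, one dict per bounding box.
boxes_by_image = {
    image: group.drop(columns="image").to_dict("records")
    for image, group in df.groupby("image")
}

# Each box keeps its own fields, so it can be written out row by row.
for image, boxes in boxes_by_image.items():
    for box in boxes:
        print(image, box["label"], box["xmin"], box["ymin"], box["xmax"], box["ymax"])
```

This keeps the single-level nesting the question asks for (image name, then a list of boxes) while preserving the row order within each image.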

Answered 2022-10-05

开心每一天1111

1836 contribution points, 13+ upvotes

Here is a working example based on your example, except that it does not read actual XML files. Thanks very much. I suspect your answer will be useful to others, since this is a problem that people in machine vision run into when doing things like cutting up 4K images that have already been annotated.

    import pandas as pd
    from pathlib import Path
    from xml.etree import ElementTree as ET

    df = pd.DataFrame(dict(
        image='40.jpg 40.jpg 40.jpg 40.jpg 80.jpg 80.jpg'.split(),
        xmin=[763, 1026, 868, 413, 866, 761],
        ymin=[899, 754, 770, 1010, 278, 825],
        xmax=[806, 1075, 927, 433, 917, 820],
        ymax=[940, 797, 822, 1040, 328, 865],
        label='pedestrian pedestrian biker pedestrian pedestrian biker'.split(),
    ))

    # tree.write() does not create missing directories, so make sure it exists.
    out_dir = Path('./mergedxml')
    out_dir.mkdir(parents=True, exist_ok=True)

    for img in df['image'].unique():
        # All boxes for this image, re-indexed from 0.
        img_df = df[df['image'] == img].drop(columns='image').reset_index(drop=True)
        print(img, '\n', img_df)

        # Start a new VOC annotation tree for this image.
        print('New custom VOC Writer instance inited here!')
        annotation = ET.Element('annotation')
        ET.SubElement(annotation, 'folder').text = str(img)
        ET.SubElement(annotation, 'filename').text = str(img)
        ET.SubElement(annotation, 'segmented').text = '0'
        size = ET.SubElement(annotation, 'size')
        ET.SubElement(size, 'width').text = '0'   # placeholder: image size is not in the DataFrame
        ET.SubElement(size, 'height').text = '0'  # placeholder
        ET.SubElement(size, 'depth').text = '3'

        # Add one <object> element per bounding box.
        for box in range(len(img_df)):
            xmin = img_df.loc[box, 'xmin']
            ymin = img_df.loc[box, 'ymin']
            xmax = img_df.loc[box, 'xmax']
            ymax = img_df.loc[box, 'ymax']
            label = img_df.loc[box, 'label']
            print(xmin, ymin, xmax, ymax)

            ob = ET.SubElement(annotation, 'object')
            ET.SubElement(ob, 'name').text = str(label)
            ET.SubElement(ob, 'pose').text = 'Unspecified'
            ET.SubElement(ob, 'truncated').text = '0'
            ET.SubElement(ob, 'difficult').text = '0'
            bbox = ET.SubElement(ob, 'bndbox')
            ET.SubElement(bbox, 'xmin').text = str(xmin)
            ET.SubElement(bbox, 'ymin').text = str(ymin)
            ET.SubElement(bbox, 'xmax').text = str(xmax)
            ET.SubElement(bbox, 'ymax').text = str(ymax)

        # After the inner loop, save this image's annotation to its own .xml file.
        # Path(img).stem drops the '.jpg' extension, so '40.jpg' becomes '40.xml'.
        tree = ET.ElementTree(annotation)
        tree.write(str(out_dir / f'{Path(img).stem}.xml'), encoding='utf8')
        print('.xml file saved here!')

Answered 2022-10-05

繁花如伊

2012 contribution points, 12+ upvotes

Maybe you need to apply dict and tuple/list to the groupby:

images_dict = dict(tuple(df.groupby('image')))
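A sketch of how the resulting dictionary might be consumed, assuming the column names from the question: each value is the sub-DataFrame for one image (it still carries the `image` column and the original row index), so its rows can be walked with `itertuples`. The shortened file names are illustrative.

```python
import pandas as pd

# Sample data based on the question, with shortened file names.
df = pd.DataFrame({
    "image": ["40.jpg", "40.jpg", "40.jpg", "40.jpg", "80.jpg", "80.jpg"],
    "xmin": [763, 1026, 868, 413, 866, 761],
    "ymin": [899, 754, 770, 1010, 278, 825],
    "xmax": [806, 1075, 927, 433, 917, 820],
    "ymax": [940, 797, 822, 1040, 328, 865],
    "label": ["pedestrian", "pedestrian", "biker",
              "pedestrian", "pedestrian", "biker"],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs, which dict() accepts.
images_dict = dict(tuple(df.groupby("image")))

# Walk every box of every image in order.
for image, img_df in images_dict.items():
    for row in img_df.itertuples(index=False):
        print(image, row.label, row.xmin, row.ymin, row.xmax, row.ymax)
```

Keeping each group as a DataFrame, rather than converting it to plain lists, means the usual pandas operations remain available per image.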

Answered 2022-10-05

侃侃尔雅

1801 contribution points, 16+ upvotes

I wanted to post this as a comment rather than an answer, but the link was too long:

"I wrote a VOC writer. I just need to be able to pass the data in such a way that I can iterate over it. I have a different dataset where I do something similar, but the data is already in an easy-to-use form. For my project I have spent a lot of time editing, cleaning, transforming, etc. the data. Not fun for me 😁" – Robi Sen

How does your VOC writer work? Is it similar to the one I linked to (i.e., using OOP, with a class method for adding bbox data to an XML writer instance and another method for saving that instance to an XML file)? My comment was poorly written, so here is a better example of what I mean:

    import pandas as pd

    df = pd.DataFrame(dict(
        image='40.jpg 40.jpg 40.jpg 40.jpg 80.jpg 80.jpg'.split(),
        xmin=[763, 1026, 868, 413, 866, 761],
        ymin=[899, 754, 770, 1010, 278, 825],
        xmax=[806, 1075, 927, 433, 917, 820],
        ymax=[940, 797, 822, 1040, 328, 865],
        label='pedestrian pedestrian biker pedestrian pedestrian biker'.split(),
    ))

    for img in df['image'].unique():
        # All boxes for this image, re-indexed from 0.
        img_df = df[df['image'] == img].drop(columns='image').reset_index(drop=True)
        print(img, '\n', img_df)

        # Ideally your custom VOC writer can be initialized here
        # with something like:
        # v_writer = VocWriter(f'path/{img[:-4]}.xml')
        print('New custom VOC XML Writer instance inited here!')

        for box in range(len(img_df)):
            xmin = img_df.loc[box, 'xmin']
            ymin = img_df.loc[box, 'ymin']
            xmax = img_df.loc[box, 'xmax']
            ymax = img_df.loc[box, 'ymax']
            label = img_df.loc[box, 'label']
            print(xmin, ymin, xmax, ymax)

            # Inside this loop, you can add each box to your VocWriter object
            # with something like:
            # v_writer.addObject(label, xmin, ymin, xmax, ymax)
            print('New bbox object added to writer instance here!')

        # Once you exit the inner loop, you can save your data to your .xml file
        # with something like:
        # v_writer.save(f'path/{img[:-4]}.xml')
        print(f'path/{img[:-4]}.xml file saved here!')

Step through the example in Python Tutor to get a better sense of what I mean.
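To make the commented-out calls concrete, here is a minimal sketch of what such a writer could look like. The `VocWriter` class below is hypothetical: it is neither the asker's class nor the one from the linked library, and its `addObject`, `save`, and `to_string` methods are assumptions modeled only on the call names used in the comments above.

```python
import pandas as pd
from xml.etree import ElementTree as ET


class VocWriter:
    """Hypothetical minimal VOC-style writer, for illustration only."""

    def __init__(self, filename):
        self.root = ET.Element("annotation")
        ET.SubElement(self.root, "filename").text = str(filename)

    def addObject(self, label, xmin, ymin, xmax, ymax):
        # Append one <object> with its label and bounding box.
        obj = ET.SubElement(self.root, "object")
        ET.SubElement(obj, "name").text = str(label)
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    def to_string(self):
        # Serialize the annotation tree to a str.
        return ET.tostring(self.root, encoding="unicode")

    def save(self, path):
        # Write the annotation tree to a file.
        ET.ElementTree(self.root).write(path, encoding="utf-8")


df = pd.DataFrame({
    "image": ["40.jpg", "40.jpg", "40.jpg", "40.jpg", "80.jpg", "80.jpg"],
    "xmin": [763, 1026, 868, 413, 866, 761],
    "ymin": [899, 754, 770, 1010, 278, 825],
    "xmax": [806, 1075, 927, 433, 917, 820],
    "ymax": [940, 797, 822, 1040, 328, 865],
    "label": ["pedestrian", "pedestrian", "biker",
              "pedestrian", "pedestrian", "biker"],
})

xml_by_image = {}
for img, img_df in df.groupby("image"):
    writer = VocWriter(img)  # one writer per image
    for row in img_df.itertuples(index=False):
        writer.addObject(row.label, row.xmin, row.ymin, row.xmax, row.ymax)
    xml_by_image[img] = writer.to_string()
    # writer.save(f"{img[:-4]}.xml")  # uncomment to write the file to disk

print(xml_by_image["80.jpg"])
```

The sketch keeps the XML in memory so it can be inspected before anything is written; a real VOC file would also need fields such as image size, which are not available in the DataFrame.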

Answered 2022-10-05
