Is there a way to pass a dictionary in tf.data.Dataset w/ tf.py_func?

交互式爱情 2021-12-21 16:13:18
I use tf.data.Dataset for data processing and want to apply some Python code with tf.py_func. However, I found that tf.py_func cannot return a dictionary. Is there a way to do this, or a workaround? My code looks like this:

def map_func(images, labels):
    """mapping python function"""
    # do something
    # cannot be expressed as a tensor graph
    return {
        'images': images,
        'labels': labels,
        'new_key': new_value}

def tf_py_func(images, labels):
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')

return dataset.map(tf_py_func)

===========================================================================

It has been a while, and I forgot that I had asked this question. I solved it another way, and the solution is so simple that I almost feel like a fool.

The problem: tf.py_func cannot return a dictionary, but dataset.map can.

The answer: map twice.

def map_func(images, labels):
    """mapping python function"""
    # do something
    # cannot be expressed as a tensor graph
    return processed_images, processed_labels

def tf_py_func(images, labels):
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')

def _to_dict(images, labels):
    return { 'images': images, 'labels': labels }

return dataset.map(tf_py_func).map(_to_dict)
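The two-step idea in the question comes down to this: the Python function hands back a flat tuple in a fixed order, and a second step reattaches the keys. Here is a minimal plain-Python sketch of just that flatten/rebuild step. It deliberately imports no TensorFlow so that it runs anywhere, and `KEYS`, `to_tuple`, and `to_dict` are hypothetical names chosen for illustration, not part of any library.

```python
from typing import Any, Dict, Sequence, Tuple

# Hypothetical fixed key order shared by the flatten and rebuild steps.
KEYS: Tuple[str, ...] = ("images", "labels")


def to_tuple(d: Dict[str, Any], keys: Sequence[str] = KEYS) -> Tuple[Any, ...]:
    """Flatten a dict into a tuple ordered by `keys` (what a py_func-style function can return)."""
    return tuple(d[k] for k in keys)


def to_dict(*values: Any, keys: Sequence[str] = KEYS) -> Dict[str, Any]:
    """Reattach keys to positional values (the role of the second map step)."""
    if len(values) != len(keys):
        raise ValueError(f"expected {len(keys)} values, got {len(values)}")
    return dict(zip(keys, values))


# The insertion order of the input dict does not matter; KEYS decides the order.
flat = to_tuple({"labels": "cat", "images": [0, 255]})
restored = to_dict(*flat)
print(flat)
print(restored)
```

Keeping the key order in one shared constant means the two steps cannot silently drift apart when a field is added.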

2 Answers

MYYA

Contributed 1868 experience points · received 4+ upvotes

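You can convert the dictionary into a string, return that, and then split it back into a dictionary.

It might look something like this:

return (images + " " + labels + " " + new_value)

Then, in your other function:

l = map_func(image, label).split(" ")
d['images'] = l[0]
d[
...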
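One caveat with joining on spaces is that the split goes wrong as soon as any value itself contains a space. A possible alternative, sketched here under the assumption that every value is JSON-serializable (raw image arrays are not, without extra encoding), is to pack the dictionary with Python's standard `json` module. The names `pack` and `unpack` are hypothetical helpers, not part of TensorFlow.

```python
import json
from typing import Any, Dict


def pack(d: Dict[str, Any]) -> str:
    """Serialize a dict to a single string that can be returned as one value."""
    return json.dumps(d, sort_keys=True)


def unpack(s: str) -> Dict[str, Any]:
    """Rebuild the dict from the string produced by `pack`."""
    return json.loads(s)


record = {"images": "img 001.png", "labels": "tabby cat", "new_key": 3}

# Naive space-joining: values containing spaces are split into extra pieces.
naive = " ".join(str(v) for v in record.values()).split(" ")
print(len(naive))  # 5 pieces for 3 fields

# The JSON round trip keeps keys, values, and value types intact.
restored = unpack(pack(record))
print(restored)
```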


Answered 2021-12-21
小唯快跑啊

Contributed 1863 experience points · received 2+ upvotes

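I ran into this problem too (I wanted to preprocess text data with non-TF functions while keeping everything under the umbrella of TensorFlow's Dataset object). In fact, the double map() workaround is not needed. Just embed the Python function while processing each example.

Here is the complete example code; it was also tested on Colab (the first two lines install the dependencies).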


!pip install tensorflow-gpu==2.0.0b1
!pip install tensorflow-datasets==1.0.2

from typing import Dict

import tensorflow as tf
import tensorflow_datasets as tfds

# Get a textual dataset using the 'tensorflow_datasets' library
dataset_builder = tfds.text.IMDBReviews()
dataset_builder.download_and_prepare()

# Do not randomly shuffle examples for demonstration purposes
ds = dataset_builder.as_dataset(shuffle_files=False)
training_ds = ds[tfds.Split.TRAIN]

print(training_ds)
# <_OptionsDataset shapes: {text: (), label: ()}, types: {text: tf.string, 
# label: tf.int64}>

# Print the first training example
for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"As a lifelong fan of Dickens, I have ... realised.",
    # shape=(), dtype=string)

# some global configuration or object which we want to access in the
# processing function
we_want_upper_case = True


def process_string(t: tf.Tensor) -> str:
    # This function must have been called as tf.py_function which means
    # it's always eagerly executed and we can access the .numpy() content
    string_content = t.numpy().decode('utf-8')

    # Now we can do what we want in Python, i.e. upper-case or lower-case
    # depending on the external parameter.
    # Note that 'we_want_upper_case' is a variable defined in the outer scope
    # of the function! We cannot pass non-Tensor objects as parameters here.
    if we_want_upper_case:
        return string_content.upper()
    else:
        return string_content.lower()


def process_example(example: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]:
    # I'm using typing (Dict, etc.) just for clarity, it's not necessary

    result = {}
    # First, simply copy all the tensor values
    for key in example:
        result[key] = tf.identity(example[key])

    # Now let's process the 'text' Tensor.
    # Call the 'process_string' function as 'tf.py_function'. Make sure the
    # output type matches the 'Tout' parameter (string and tf.string).
    # The inputs must be in a list: here we pass the string-typed Tensor 'text'.
    result['text'] = tf.py_function(func=process_string,
                                    inp=[example['text']],
                                    Tout=tf.string)
    return result


# We can call the 'map' function which consumes and produces dictionaries
training_ds = training_ds.map(lambda x: process_example(x))

for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"AS A LIFELONG FAN OF DICKENS, I HAVE ...  REALISED.",
    # shape=(), dtype=string)
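The example above reads its configuration from a global variable because non-Tensor objects cannot be passed through `inp`. One possible alternative, sketched here in plain Python, is to bind the setting in a closure so that each processing function carries its own configuration. `make_process_string` is a hypothetical helper, and `FakeTensor` is a hypothetical stand-in that exposes only `.numpy()`, so the sketch runs without TensorFlow installed.

```python
from typing import Callable


class FakeTensor:
    """Hypothetical stand-in for an eager string tensor; exposes only .numpy()."""

    def __init__(self, value: bytes) -> None:
        self._value = value

    def numpy(self) -> bytes:
        return self._value


def make_process_string(upper: bool) -> Callable[[FakeTensor], str]:
    """Return a processing function with `upper` bound in a closure instead of a global."""

    def process_string(t: FakeTensor) -> str:
        # Same decoding step as in the answer's process_string.
        text = t.numpy().decode("utf-8")
        return text.upper() if upper else text.lower()

    return process_string


to_upper = make_process_string(upper=True)
to_lower = make_process_string(upper=False)
print(to_upper(FakeTensor(b"As a lifelong fan")))
print(to_lower(FakeTensor(b"As a lifelong fan")))
```

In the TensorFlow version, the function returned by `make_process_string(True)` would presumably be passed as `func` to `tf.py_function` in place of the global-reading `process_string`; that wiring is an assumption and is not exercised by this sketch.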


Answered 2021-12-21