You need to convert these strings into integer vectors and pad them to equal length. Here is an example using partial_x_train_actors_array:
import tensorflow as tf
partial_x_train_actors_array = [b'victor mclaglen', b'jon hall', b'frances farmer',
b'olympe bradna', b'gene lockhart', b'douglass dumbrille',
b'francis ford', b'ben welden', b'abner biberman',
b'pedro de cordoba', b'rudy robles', b'bobby stone',
b'nellie duran', b'james flavin', b'nina campana']
tok = tf.keras.preprocessing.text.Tokenizer(char_level=True)
tok.fit_on_texts(partial_x_train_actors_array)
seq = tok.texts_to_sequences(partial_x_train_actors_array)
The resulting seq (one list of character indices per name) looks like this:
[[20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
[21, 7, 3, 5, 22, 1, 6, 6],
[14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
[7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
[17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
[9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
[14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
[8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
[1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
[19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
[4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
[8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
[3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
[21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
[3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]]
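To make the character-level step less opaque, here is a minimal pure-Python sketch of roughly what it does: count characters, give the most frequent ones the smallest ids, and start at 1 so that 0 stays free for padding. The helpers fit_char_index and texts_to_sequences are hypothetical names written for this illustration, not part of TensorFlow, and the sketch uses plain str names instead of bytes, so its ids will not match the output above.

```python
from collections import Counter


def fit_char_index(texts):
    """Map each character to an integer id, most frequent first, starting at 1."""
    counts = Counter()
    for text in texts:
        # Lowercase first, then count every character (spaces included).
        counts.update(text.lower())
    # Sort by descending count; ties keep first-seen order (sorted is stable).
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    # Id 0 is deliberately left unused so it can act as the padding value.
    return {ch: i for i, (ch, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, char_index):
    """Replace every known character by its id; unknown characters are skipped."""
    return [[char_index[ch] for ch in text.lower() if ch in char_index]
            for text in texts]


# Illustrative input: two of the names from the answer, as str rather than bytes.
names = ["jon hall", "ben welden"]
char_index = fit_char_index(names)
seqs = texts_to_sequences(names, char_index)
print(seqs)
```

Each name becomes a list as long as the name itself, which is why the rows have different lengths and need padding before they can form one tensor.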
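Then pad the sequences to equal length. By default, zeros are added at the front, up to the length of the longest sequence: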
padded = tf.keras.preprocessing.sequence.pad_sequences(seq)
array([[ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 7, 3, 5, 22, 1, 6, 6],
[ 0, 0, 0, 0, 14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
[ 0, 0, 0, 0, 0, 7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
[ 0, 0, 0, 0, 0, 17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
[ 9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
[ 0, 0, 0, 0, 0, 0, 14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
[ 0, 0, 0, 0, 0, 0, 0, 0, 8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
[ 0, 0, 0, 0, 1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
[ 0, 0, 19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
[ 0, 0, 0, 0, 0, 0, 0, 4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
[ 0, 0, 0, 0, 0, 0, 0, 8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
[ 0, 0, 0, 0, 0, 0, 3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
[ 0, 0, 0, 0, 0, 0, 21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
[ 0, 0, 0, 0, 0, 0, 3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]])
Finally, build the dataset from the padded array:
ds = tf.data.Dataset.from_tensor_slices(padded)
next(iter(ds))
<tf.Tensor: shape=(18,), dtype=int32, numpy=
array([ 0, 0, 0, 20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2,
3])>
If for any reason you need all of your inputs (not just partial_x_train_actors_array) to have the same padded shape, you can pass the maxlen argument to pad_sequences.
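As a rough illustration of what maxlen changes, the sketch below mimics the default behavior of pad_sequences in plain Python: zeros are added at the front, and sequences longer than maxlen lose their leading items. The function pad_pre, the second input other_feature, and the width 20 are assumptions made for this example; in real code you would call tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=...) on each input instead.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and, if needed, left-truncate) integer sequences to one length."""
    if maxlen is None:
        # Without maxlen, the longest sequence decides the width.
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        s = list(s)
        if len(s) > maxlen:
            # Too long: drop items from the front, keep the last maxlen.
            s = s[len(s) - maxlen:]
        # Too short: fill the front with the padding value.
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


# 'jon hall' from the answer, plus a hypothetical second input of other lengths.
actors = [[21, 7, 3, 5, 22, 1, 6, 6]]
other_feature = [[4, 2], [9, 9, 9, 1, 5]]

# Without maxlen each input would get its own width (8 and 5 here);
# with a shared maxlen both end up with rows of length 20.
actors_padded = pad_pre(actors, maxlen=20)
other_padded = pad_pre(other_feature, maxlen=20)
print(len(actors_padded[0]), len(other_padded[0]))
```

Choosing one maxlen for every input keeps the shapes compatible, at the cost of truncating anything longer than that value.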