4 Answers
This user has contributed 1815 answers and earned 13+ upvotes
Try changing batch_size to something like 32, 16, or 8. Apparently there is a TensorFlow bug on RTX 2060/2070/2080 cards that causes it to run out of memory.
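As a minimal sketch of that advice (the helper name is invented for illustration), you can step the batch size down by halves and retry training until it fits into memory:

```python
# Hypothetical helper: list progressively smaller batch sizes to try
# (32 -> 16 -> 8) until model.fit no longer runs out of GPU memory.
def candidate_batch_sizes(start=32, minimum=8):
    sizes = []
    size = start
    while size >= minimum:
        sizes.append(size)
        size //= 2
    return sizes

print(candidate_batch_sizes())  # [32, 16, 8]

# Usage idea (pseudocode):
# for bs in candidate_batch_sizes():
#     try:
#         model.fit(x, y, batch_size=bs)
#         break
#     except tf.errors.ResourceExhaustedError:
#         continue  # out of memory -> try the next smaller size
```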
This user has contributed 1831 answers and earned 4+ upvotes
In a similar situation, the following snippet helped:

import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:  # guard against machines with no visible GPU
    tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
This user has contributed 1804 answers and earned 3+ upvotes
I think most of the answers here miss the crux of the problem. The TensorFlow model is attempting an embedding lookup for an index that does not exist in the defined Embedding layer. Most answers point to VRAM issues, but this message can very well arise from a simple out-of-range lookup.
To fix it, you can define your own dictionary to encode the labels, returning 0 (or -1) for every unknown label, i.e. reserving one slot for an unknown category.
Some sample code for solving this kind of problem (inspired by this post; it seems to work on test data):
A custom dictionary-mapping class:
from typing import Dict

import pandas as pd


class EmbeddingMapping:
    """
    An instance of this class should be defined
    for each categorical variable you want to use.
    """
    def __init__(self, series: pd.Series) -> None:
        # get a list of unique values
        values = series.unique().tolist()
        # dictionary mapping value -> integer index (0 is reserved for unknowns)
        self.embedding_dict: Dict[str, int] = {value: int_value + 1 for int_value, value in enumerate(values)}
        self.num_values: int = len(values) + 1  # +1 for unknown categories

    def get_mapping(self, value: str) -> int:
        # return the index if the value was seen in training
        if value in self.embedding_dict:
            return self.embedding_dict[value]
        # otherwise fall back to the reserved unknown index
        return 0
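A quick toy check of the mapping behavior (this is a condensed, self-contained restatement of the class above; the color values are invented for illustration):

```python
import pandas as pd

class EmbeddingMapping:  # condensed version of the class above
    def __init__(self, series):
        values = series.unique().tolist()
        # index 0 is reserved for unknown categories
        self.embedding_dict = {v: i + 1 for i, v in enumerate(values)}
        self.num_values = len(values) + 1
    def get_mapping(self, value):
        return self.embedding_dict.get(value, 0)

colors = pd.Series(["red", "green", "red", "blue"])
m = EmbeddingMapping(colors)
print(m.num_values)             # 4: three seen categories + 1 unknown slot
print(m.get_mapping("green"))   # 2
print(m.get_mapping("purple"))  # 0: unseen label -> reserved unknown index
```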
Building the mappings:
# build mappings and encode each categorical column
res_dict_train: Dict[str, pd.Series] = {}
for var in categorical_features:
    embd_train = EmbeddingMapping(X_train_categorical[var])
    temp_series_train = X_train_categorical[var].apply(embd_train.get_mapping)
    res_dict_train[var] = temp_series_train
X_train_categorical = X_train_categorical.assign(**res_dict_train)
A model combining the categorical and numerical features:
# Keras
import numpy as np
from tensorflow.keras import models
from tensorflow.keras.layers import Dense, Embedding, Input, Reshape, concatenate

# Categorical vars
models_lst = []
inputs = []
for cat_feature in categorical_features:
    print('---------------------------------------')
    print(f'Info for categorical feature {cat_feature}')
    input_i = Input(shape=(1,), dtype='int32')
    inputs.append(input_i)
    num_categories = EmbeddingMapping(X_train_categorical[cat_feature]).num_values
    print(f"Number of categories: {num_categories}")
    embedding_size = int(min(np.ceil(num_categories / 2), 50))  # rule of thumb
    print(f'Embedding size: {embedding_size}')
    model_i = Embedding(input_dim=num_categories, output_dim=embedding_size, input_length=1, name=f'embedding_{cat_feature}')(input_i)
    model_i2 = Reshape(target_shape=(embedding_size,))(model_i)
    models_lst.append(model_i2)

# layer for numerical features
input_numerical = Input(shape=(len(numerical_features),), dtype='float32')
numerical_model = Reshape(target_shape=(len(numerical_features),))(input_numerical)
models_lst.append(numerical_model)
inputs.append(input_numerical)

concatenated = concatenate(models_lst, axis=-1)
mymodel = Dense(50, activation="relu")(concatenated)
mymodel2 = Dense(15, activation="relu")(mymodel)
mymodel3 = Dense(1, activation='sigmoid')(mymodel2)

final_model = models.Model(inputs, mymodel3)
final_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc', 'binary_accuracy'])
# train_input_list: one array per categorical input plus the numerical matrix,
# in the same order as `inputs`
final_model.fit(x=train_input_list, validation_split=0.2, y=y_train, epochs=1, batch_size=128)
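The fit call above references train_input_list without defining it. A hedged sketch of how it might be assembled (the feature names and data here are invented for illustration; the key point is that the list order must match the order of the Input layers):

```python
import numpy as np
import pandas as pd

categorical_features = ["color", "size"]   # hypothetical feature names
numerical_features = ["price", "weight"]

# already integer-encoded categorical columns and raw numerical columns
X_train_categorical = pd.DataFrame({"color": [1, 2, 0], "size": [1, 1, 2]})
X_train_numerical = pd.DataFrame({"price": [9.5, 3.0, 7.2], "weight": [1.1, 0.4, 2.0]})

# one array per categorical Input, in the same order as the model's `inputs`
train_input_list = [X_train_categorical[var].values for var in categorical_features]
# followed by the numerical feature matrix for the numerical Input
train_input_list.append(X_train_numerical[numerical_features].values)

print(len(train_input_list))       # 3: two categorical arrays + one numerical matrix
print(train_input_list[-1].shape)  # (3, 2)
```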
To explain the code: it creates one embedding layer per categorical feature, and whenever an embedding lookup would fail, the value is routed to the reserved unknown index. If you have a custom data object such as a pandas DataFrame, you can separate your numerical and categorical features and apply the model this way, or use only the categorical part of the code with the mapping above. An alternative is Scikit-Learn's OrdinalEncoder (added as of SKLearn 0.24.2), but I found this approach simpler, as it is easy to maintain.
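For comparison, a minimal sketch of the OrdinalEncoder alternative mentioned above, using its handle_unknown option to send unseen labels to -1 (the color data is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# unseen categories at transform time get the sentinel value -1
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

X_train = np.array([["red"], ["green"], ["blue"]])
enc.fit(X_train)  # categories are sorted: blue=0, green=1, red=2

out = enc.transform(np.array([["green"], ["purple"]]))
print(out)  # [[ 1.] [-1.]]: "green" was seen, "purple" falls back to -1
```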