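MobileNet is a classification network for mobile devices proposed by Google. MobileNet V1 uses depthwise separable convolution and introduces two hyperparameters to control network capacity; the assumption behind this convolution is that cross-channel correlations and spatial correlations can be decoupled. Depthwise separable convolution reduces the parameter count, so the model reaches fairly high accuracy while keeping a complexity that is acceptable on mobile devices. MobileNet V2 introduces a new unit, the inverted residual with linear bottleneck. The main changes are a linear (activation-free) output for the bottleneck and moving the residual network's skip connection to the low-dimensional bottleneck layers.
Paper: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
The overall structure of MobileNetV2 follows the architecture table in the paper. Each row describes a sequence of one or more identical (modulo stride) layers, with each bottleneck repeated n times. All layers in the same sequence have the same number of output channels. The first layer of each sequence uses stride s, and all other layers use stride 1. All spatial convolutions use 3 * 3 kernels. The expansion factor t is always applied to the input size: if the tensor entering a layer has k channels, the number of filters applied in that layer is k * t.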
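To make the parameter saving of depthwise separable convolution concrete, here is a minimal sketch that counts weights for a standard convolution and for its depthwise separable counterpart. The layer sizes (3 x 3 kernel, 128 input channels, 256 output channels) are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = depthwise_separable_params(3, 128, 256)
print("standard:", std)
print("depthwise separable:", sep)
print("ratio:", std / sep)
```

The ratio approaches k * k as the number of output channels grows, which is why 3 x 3 depthwise separable layers are roughly 8 to 9 times cheaper in parameters.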
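The stride and expansion rules above can be traced by hand. The sketch below walks through the (t, c, n, s) settings used in the Keras code of this post and reports, for each bottleneck layer, the spatial size, the input channels k, the expanded channels k * t, and the output channels. It assumes 'same' padding, so each strided layer maps a size to ceil(size / stride); the helper name `trace_shapes` is made up for this illustration.

```python
import math

# (t, c, n, s) per bottleneck sequence, copied from the Keras code in this post.
BOTTLENECKS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def trace_shapes(input_size=224, stem_channels=32, stem_stride=2):
    """Return (size, in_channels, expanded_channels, out_channels) per bottleneck layer."""
    size = math.ceil(input_size / stem_stride)  # first 3 x 3 conv with stride 2
    channels = stem_channels
    rows = []
    for t, c, n, s in BOTTLENECKS:
        for i in range(n):
            stride = s if i == 0 else 1  # only the first layer of a sequence uses stride s
            expanded = channels * t      # k * t filters in the expansion step
            size = math.ceil(size / stride)
            rows.append((size, channels, expanded, c))
            channels = c
    return rows


rows = trace_shapes()
for row in rows:
    print(row)
```

With a 224 * 224 input the trace ends at a 7 * 7 feature map, which is what the final 1 * 1 convolution and global average pooling operate on.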
OpenCV 3.4
Python 3.5
Tensorflow-gpu 1.2.0
Keras 2.1.3
Based on the parameters given in the paper, I implemented the network structure in Keras 2, as follows:

    from keras.models import Model
    from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dropout
    from keras.layers import Activation, BatchNormalization, add, Reshape
    from keras.applications.mobilenet import relu6, DepthwiseConv2D
    from keras.utils.vis_utils import plot_model
    from keras import backend as K


    def _conv_block(inputs, filters, kernel, strides):
        """Convolution Block
        This function defines a 2D convolution operation with BN and relu6.

        # Arguments
            inputs: Tensor, input tensor of conv layer.
            filters: Integer, the dimensionality of the output space.
            kernel: An integer or tuple/list of 2 integers, specifying the
                width and height of the 2D convolution window.
            strides: An integer or tuple/list of 2 integers, specifying the
                strides of the convolution along the width and height.
                Can be a single integer to specify the same value for
                all spatial dimensions.

        # Returns
            Output tensor.
        """
        channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

        x = Conv2D(filters, kernel, padding='same', strides=strides)(inputs)
        x = BatchNormalization(axis=channel_axis)(x)
        return Activation(relu6)(x)


    def _bottleneck(inputs, filters, kernel, t, s, r=False):
        """Bottleneck
        This function defines a basic bottleneck structure.

        # Arguments
            inputs: Tensor, input tensor of conv layer.
            filters: Integer, the dimensionality of the output space.
            kernel: An integer or tuple/list of 2 integers, specifying the
                width and height of the 2D convolution window.
            t: Integer, expansion factor.
                t is always applied to the input size.
            s: Integer, stride of the depthwise convolution, applied to
                both spatial dimensions.
            r: Boolean, whether to use the residual connection.

        # Returns
            Output tensor.
        """
        channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
        tchannel = K.int_shape(inputs)[channel_axis] * t

        # 1 x 1 expansion to k * t channels, followed by BN and relu6.
        x = _conv_block(inputs, tchannel, (1, 1), (1, 1))

        # 3 x 3 depthwise convolution; the stride is applied here.
        x = DepthwiseConv2D(kernel, strides=(s, s), depth_multiplier=1, padding='same')(x)
        x = BatchNormalization(axis=channel_axis)(x)
        x = Activation(relu6)(x)

        # 1 x 1 linear projection: no activation after this BN.
        x = Conv2D(filters, (1, 1), strides=(1, 1), padding='same')(x)
        x = BatchNormalization(axis=channel_axis)(x)

        if r:
            x = add([x, inputs])
        return x


    def _inverted_residual_block(inputs, filters, kernel, t, strides, n):
        """Inverted Residual Block
        This function defines a sequence of 1 or more identical layers.

        # Arguments
            inputs: Tensor, input tensor of conv layer.
            filters: Integer, the dimensionality of the output space.
            kernel: An integer or tuple/list of 2 integers, specifying the
                width and height of the 2D convolution window.
            t: Integer, expansion factor.
                t is always applied to the input size.
            strides: Integer, stride of the first layer in the sequence;
                all remaining layers use stride 1.
            n: Integer, layer repeat times.

        # Returns
            Output tensor.
        """
        x = _bottleneck(inputs, filters, kernel, t, strides)

        # The remaining n - 1 layers keep the shape, so they can use the residual.
        for i in range(1, n):
            x = _bottleneck(x, filters, kernel, t, 1, True)

        return x


    def MobileNetv2(input_shape, k):
        """MobileNetv2
        This function defines a MobileNetv2 architecture.

        # Arguments
            input_shape: A tuple/list of 3 integers, shape of the input tensor.
            k: Integer, number of output classes.

        # Returns
            MobileNetv2 model.
        """
        inputs = Input(shape=input_shape)
        x = _conv_block(inputs, 32, (3, 3), strides=(2, 2))

        x = _inverted_residual_block(x, 16, (3, 3), t=1, strides=1, n=1)
        x = _inverted_residual_block(x, 24, (3, 3), t=6, strides=2, n=2)
        x = _inverted_residual_block(x, 32, (3, 3), t=6, strides=2, n=3)
        x = _inverted_residual_block(x, 64, (3, 3), t=6, strides=2, n=4)
        x = _inverted_residual_block(x, 96, (3, 3), t=6, strides=1, n=3)
        x = _inverted_residual_block(x, 160, (3, 3), t=6, strides=2, n=3)
        x = _inverted_residual_block(x, 320, (3, 3), t=6, strides=1, n=1)

        x = _conv_block(x, 1280, (1, 1), strides=(1, 1))
        x = GlobalAveragePooling2D()(x)
        x = Reshape((1, 1, 1280))(x)
        x = Dropout(0.3, name='Dropout')(x)
        x = Conv2D(k, (1, 1), padding='same')(x)

        x = Activation('softmax', name='softmax')(x)
        output = Reshape((k,))(x)

        model = Model(inputs, output)
        # Writes a diagram of the model; the images/ directory must already exist.
        plot_model(model, to_file='images/MobileNetv2.png', show_shapes=True)

        return model


    if __name__ == '__main__':
        MobileNetv2((224, 224, 3), 1000)
The paper recommends an input size of 224 * 224, so the training set should ideally use the same size. data\
    | - data/
        | - train/
            | - class 0/
                | - image.jpg
                ....
            | - class 1/
            ....
            | - class n/
        | - validation/
            | - class 0/
            | - class 1/
            ....
            | - class n/
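Before training, it can help to confirm that the folders on disk match this layout. The following is a minimal sketch that counts images per class folder and checks that train/ and validation/ contain the same classes. The function names, the accepted file extensions, and the default root "data" are assumptions for illustration; they are not part of the original training script.

```python
from pathlib import Path

# Extensions treated as images; an assumption, adjust to your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(split_dir):
    """Map each class sub-folder of split_dir to the number of image files in it."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_dataset(root="data"):
    """Return per-class counts for train/ and validation/, verifying the class folders match."""
    train = count_images(Path(root) / "train")
    val = count_images(Path(root) / "validation")
    if set(train) != set(val):
        raise ValueError("train/ and validation/ contain different class folders")
    return train, val


# Example usage (requires the data/ folder described above):
# train_counts, val_counts = check_dataset("data")
```

A mismatch between the two splits usually shows up later as a label-index mismatch, so failing early is the safer design choice here.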
python --classes num_classes --batch batch_size --epochs epochs --size image_size
To fine-tune from a trained .h5 model:
python --classes num_classes --batch batch_size --epochs epochs --size image_size --weights weights_path --tclasses pre_classes
--classes, number of classes in the current training set.
--size, image size.
--batch, batch size.
--epochs, number of epochs.
--weights, the model to fine-tune.
--tclasses, number of output classes in the trained model.
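For readers who want to see how such flags are typically wired up, here is a hypothetical `argparse` sketch that mirrors the option names listed above. The actual training script's parser is not shown in this post, so the defaults, types, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags above; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Train or fine-tune MobileNetV2")
    parser.add_argument("--classes", type=int, required=True,
                        help="number of classes in the current training set")
    parser.add_argument("--size", type=int, default=224, help="image size")
    parser.add_argument("--batch", type=int, default=32, help="batch size")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--weights", default=None,
                        help="path to the trained model to fine-tune (optional)")
    parser.add_argument("--tclasses", type=int, default=0,
                        help="number of output classes in the trained model")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(
    ["--classes", "100", "--batch", "128", "--epochs", "50", "--size", "224"]
)
print(args)
```

Leaving `--weights` unset would correspond to training from scratch, while passing both `--weights` and `--tclasses` would correspond to the fine-tuning command.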
device: Tesla K80
dataset: cifar-100
optimizer: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
batch_size: 128
| Dataset | Loss | Top-1 Accuracy | Top-5 Accuracy |
| --- | --- | --- | --- |
| cifar-100 | 0.195 | 94.42% | 99.82% |
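As a reminder of what the Top-1 and Top-5 columns measure, the sketch below computes top-k accuracy in plain Python: a sample counts as correct when its true label is among the k highest-scoring classes. The scores and labels are made-up toy values, not output from the model above, and `top_k_accuracy` is a helper written for this illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for 3 samples over 4 classes (made-up numbers).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.2, 0.3, 0.4, 0.1],
]
labels = [1, 2, 0]

print("top-1:", top_k_accuracy(scores, labels, 1))
print("top-2:", top_k_accuracy(scores, labels, 2))
```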