[论文阅读-14] DFNet : Discriminative feature extraction and integration network for salient object detection

会议：Engineering Applications of Artificial Intelligence -2020 （SCI）

简介：

尽管卷积神经网络拥有很强提取特征的能力，但在显著性目标检测领域仍存在着挑战。1）显著性物体的尺寸不同，运用单一尺寸的卷积不能很好的捕捉到正确的大小，同时不同尺寸卷积在不考虑它们权重的情况下可能会影响最终的结果（Since salient objects appear in various sizes, using single-scale convolution would not capture the right size. Moreover, using multiscale convolutions without considering their importance may confuse the model）。2）相同地处理所有特征会有信息冗余（treating all features equally results in information redundancy）。

存在的问题

通过深度学习模型来提取不同分辨率的特征，尽管效果好，但是由于全连接层的使用，使得模型效率低下。（Due to the use of dense layers, these methods were not very efficient.）
通常的方法是将低层的特征与高层特征直接结合，无差别的处理特征。
交叉熵损失函数的使用会导致预测模糊（using this loss function leads to blurry predictions）

贡献

本文提出了多尺寸注意力引导模块，用来让模型有捕捉正确显著性物体尺寸的能力。（capture the right size for the salient object）
本文提出了基于注意力的多层集成模块，用来给不同层级的特征分配权重。（assign different weights to multi-level feature maps）
本文设计了损失函数用来引导网络输出的显著性图更准确。（design a loss function which guides our network to output saliency maps with higher certainty.）

网络结构

特征提取网络

模型由backbone和多尺寸注意力引导模块组成。

Backbone：

VGG-16，VGG-16将最后一个max pooling层移除用来保留更好的空间表示性，特征的来源是VGG-16最后的三层，conv3-3，conv4-3，conv5-3。

针对ResNet50 / NASNet 会做不同的处理。

大尺寸的卷积核适合捕捉大物体，小尺寸的卷积核适合捕捉小物体（It is evident that large kernels are suitable to capture the large objects, and small kernels are appropriate to capture the small ones.）本文采用了多种尺寸大小的卷积核来捕捉特征（Inception 的思路）。

特征融合模块

在低层的时候，网络可以捕捉局部信息（capture such local structures as textures and edges）但是很难识别整体的依赖（fails to recognize global dependencies）；在深层时，模型可以捕捉语义信息和整体背景信息，但是信息粗糙并且缺少局部的连贯性。

通过CA来分配权重，同时考虑到了语义信息以及空间细节；在经过CA块之后，用一个3*3大小的卷积层来优化特征。

损失函数

通过F值与MAE损失函数的结合来计算损失。

论文阅读-14：DFNet:显著目标检测的判别特征提取与集成网络