[Paper Reading 11] F3Net: Fusion, Feedback and Focus for Salient Object Detection

Conference: AAAI 2020

Abstract:

Most existing salient object detection models "have achieved great progress by aggregating multi-level features extracted from convolutional neural network." However, different convolutional layers have receptive fields of different sizes, so the features they produce differ considerably, and "common feature fusion strategies ignore these differences and may cause suboptimal solutions." The paper proposes F3Net to address this problem. F3Net consists of three parts: the CFM (Cross Feature Module), the CFD (Cascaded Feedback Decoder), and the PPA (Pixel Position Aware) loss.

Introduction:

Existing salient object detection methods face two main challenges:

First, features of different levels have different distribution characteristics:
- Deep features carry rich semantics but lack accurate location information.
- Shallow features carry rich details but are full of background noise.
Without delicate control of the information flow in the model, redundant features pass through during fusion and may degrade performance.
Second, existing models are trained with BCE (binary cross-entropy), which treats every pixel identically. Intuitively, however, pixels should contribute differently: "… pixel at the boundary are more discriminative and should be attached with more importance …".
- Several boundary losses have been proposed, but many pixels near the boundaries are also prone to wrong predictions.
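To make the complaint about BCE concrete, here is a minimal pure-Python sketch (our own toy illustration, not code from the paper). On a 5×5 mask it flips one pixel right next to the object boundary and, separately, one pixel deep inside the object. Plain BCE averages every pixel with the same weight 1/N, so both mistakes cost exactly the same. The helpers `bce`, `mean_bce` and `make_pred` are hypothetical names.

```python
import math

def bce(p: float, g: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single pixel (p = predicted probability, g = 0/1 label)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(g * math.log(p) + (1 - g) * math.log(1 - p))

def mean_bce(pred, gt) -> float:
    """Plain BCE over a 2-D map: every pixel contributes with the same weight 1/N."""
    losses = [bce(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return sum(losses) / len(losses)

def make_pred(gt, wrong_at):
    """Confident, correct prediction (0.9 / 0.1) except for one flipped pixel."""
    pred = [[0.9 if g == 1 else 0.1 for g in row] for row in gt]
    r, c = wrong_at
    pred[r][c] = 1.0 - pred[r][c]
    return pred

# Toy 5x5 mask: object in columns 0-2, background in columns 3-4.
gt = [[1, 1, 1, 0, 0] for _ in range(5)]
on_boundary = mean_bce(make_pred(gt, (2, 2)), gt)  # error right next to the edge
in_interior = mean_bce(make_pred(gt, (2, 0)), gt)  # error far from the edge
print(on_boundary, in_interior)  # equal up to float rounding
```

Because the loss cannot tell the two mistakes apart, nothing in the training signal pushes the network to care more about the boundary.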

Contributions:

The paper proposes the Cross Feature Module (CFM) to fuse features of different levels. It can "extract the shared parts between features and suppress each other's background noises and complement each other's missing parts."
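The fusion idea can be sketched on two tiny feature vectors. This assumes both inputs have already been brought to the same resolution and flattened into 1-D lists. The element-wise multiply-then-add pattern is our reading of how "shared parts" can be extracted; the real CFM wraps this step in learned convolutional layers, which are omitted here, and `cross_feature_fuse` is a hypothetical name.

```python
from typing import List, Tuple

Feature = List[float]

def cross_feature_fuse(f_low: Feature, f_high: Feature) -> Tuple[Feature, Feature, Feature]:
    """Toy cross-feature fusion on two aligned, flattened feature maps.

    The element-wise product is large only where BOTH features respond
    (their shared part) and near zero where either one is silent.
    Adding it back lets each branch borrow what the other one confirms.
    """
    assert len(f_low) == len(f_high), "features must be aligned to the same resolution"
    shared = [a * b for a, b in zip(f_low, f_high)]
    f_low_out = [a + s for a, s in zip(f_low, shared)]
    f_high_out = [b + s for b, s in zip(f_high, shared)]
    return f_low_out, f_high_out, shared

# Positions: 0 = object edge, 1 = object body, 2 = background texture, 3 = plain background.
f_low = [0.9, 0.6, 0.8, 0.0]   # shallow: sharp edge, but also fires on background texture
f_high = [0.5, 0.9, 0.0, 0.1]  # deep: knows where the object is, but weak at the edge
low_out, high_out, shared = cross_feature_fuse(f_low, f_high)
print(shared)    # positive at object positions 0-1, zero at background positions 2-3
print(low_out)   # texture at position 2 is not amplified, so it shrinks relative to the object
print(high_out)  # the deep branch gains response at the edge position 0
```

The product acts as a soft "AND": noise present in only one branch gets no boost, while responses confirmed by both branches are reinforced in each.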
The paper proposes the Cascaded Feedback Decoder (CFD), which can "feedback features of both high resolutions and high semantics to previous ones to correct and refine them for better saliency maps generation."
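The cascade-plus-feedback structure can be sketched with 1-D toy features of decreasing resolution. This is a simplified reading of the description above, not the paper's implementation: plain addition stands in for the CFM, resampling is nearest-neighbour up and average-pool down, two sub-decoders are used, and all function names are hypothetical.

```python
from typing import List

Feature = List[float]

def upsample2(f: Feature) -> Feature:
    """Nearest-neighbour x2 upsampling of a 1-D feature."""
    return [v for v in f for _ in range(2)]

def downsample_to(f: Feature, length: int) -> Feature:
    """Average-pool f down to `length` entries (assumes len(f) is a multiple of length)."""
    k = len(f) // length
    return [sum(f[i * k:(i + 1) * k]) / k for i in range(length)]

def fuse(a: Feature, b: Feature) -> Feature:
    """Stand-in for a CFM: plain element-wise addition (an assumption)."""
    return [x + y for x, y in zip(a, b)]

def decode(levels: List[Feature]) -> Feature:
    """Top-down pass from the deepest level up to full resolution.
    levels[0] is the shallowest / highest-resolution feature."""
    x = levels[-1]
    for f in reversed(levels[:-1]):
        x = fuse(upsample2(x), f)
    return x

def cascaded_feedback_decoder(levels: List[Feature], stages: int = 2) -> List[Feature]:
    """Run `stages` sub-decoders; after each one, feed the full-resolution output
    back (pooled to each level's size) to refine the inputs of the next one."""
    outputs = []
    for _ in range(stages):
        out = decode(levels)
        outputs.append(out)
        levels = [fuse(f, downsample_to(out, len(f))) for f in levels]  # feedback step
    return outputs

# Three levels at resolutions 8, 4 and 2 (shallow -> deep).
levels = [[0.1] * 8, [0.2] * 4, [0.5] * 2]
preds = cascaded_feedback_decoder(levels, stages=2)
for i, p in enumerate(preds):
    print(f"sub-decoder {i}: {p}")
```

The feedback step is what separates this from a single top-down pass: the second sub-decoder starts from inputs that already carry the full-resolution output aggregated from every level, including the deepest one.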
The paper designs the Pixel Position Aware (PPA) loss, which assigns different weights to pixels at different positions. "It can better mine the structure information contained in the features and help the network focus more on detail regions."
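The position-dependent weighting can be sketched as follows. This assumes the form w = 1 + γ·|local mean of the ground truth − the pixel's label| that we recall from the F3Net paper, with γ = 5. The 3×3 window, the clipping at the image border, the toy 5×5 mask and all helper names are our own choices. Only the weighted BCE term is shown; as we understand it, the paper's full PPA loss also adds a weighted IoU term, omitted here.

```python
import math
from typing import List

Grid = List[List[float]]

def bce(p: float, g: float, eps: float = 1e-7) -> float:
    """Per-pixel binary cross-entropy (p = predicted probability, g = 0/1 label)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(g * math.log(p) + (1 - g) * math.log(1 - p))

def local_mean(g: Grid, r: int, c: int, radius: int) -> float:
    """Mean label in the (2*radius+1)^2 window around (r, c), clipped at the border."""
    rows = range(max(0, r - radius), min(len(g), r + radius + 1))
    cols = range(max(0, c - radius), min(len(g[0]), c + radius + 1))
    vals = [g[i][j] for i in rows for j in cols]
    return sum(vals) / len(vals)

def pixel_weights(gt: Grid, radius: int = 1, gamma: float = 5.0) -> Grid:
    """w = 1 + gamma * |local mean - label|: 1 in flat regions, larger near boundaries."""
    return [[1.0 + gamma * abs(local_mean(gt, r, c, radius) - gt[r][c])
             for c in range(len(gt[0]))] for r in range(len(gt))]

def weighted_bce(pred: Grid, gt: Grid, w: Grid) -> float:
    """BCE with each pixel's loss scaled by its weight, normalised by the total weight."""
    num = sum(wv * bce(p, g)
              for pr, gr, wr in zip(pred, gt, w) for p, g, wv in zip(pr, gr, wr))
    den = sum(wv for wr in w for wv in wr)
    return num / den

def make_pred(gt: Grid, wrong_at) -> Grid:
    """Confident, correct prediction (0.9 / 0.1) except for one flipped pixel."""
    pred = [[0.9 if g == 1 else 0.1 for g in row] for row in gt]
    r, c = wrong_at
    pred[r][c] = 1.0 - pred[r][c]
    return pred

# Toy 5x5 mask: object in columns 0-2, background in columns 3-4.
gt = [[1, 1, 1, 0, 0] for _ in range(5)]
weights = pixel_weights(gt)
pred_boundary = make_pred(gt, (2, 2))  # mistake next to the edge
pred_interior = make_pred(gt, (2, 0))  # same mistake far from the edge
print(weights[2][0], weights[2][2])    # interior weight vs boundary weight
print(weighted_bce(pred_boundary, gt, weights), weighted_bce(pred_interior, gt, weights))
```

Unlike plain BCE, where both one-pixel mistakes would cost the same, the boundary mistake now costs more, because pixels whose neighbourhood disagrees with their own label receive larger weights.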