【Semantic Segmentation】Classic Semantic Segmentation Modules

This article introduces several classic modules used in deep-learning semantic segmentation, namely ASPP, PP, Encoding, JPU, DCM, and Criss-Cross Attention, and provides an implementation of each module.

Contents

1. ASPP (Atrous Spatial Pyramid Pooling)

2. PP (Pyramid Pooling Module)

3. DCM (Dynamic Convolutional Module)

4. JPU (Joint Pyramid Upsampling)

5. Encoding (Context Encoding Module)

6. Criss-Cross Attention Module

Notice


1. ASPP (Atrous Spatial Pyramid Pooling)

The ASPP module was first proposed in DeepLabV2. It consists of 4 parallel atrous (dilated) convolution branches with different dilation rates, which enlarge the receptive field and capture more contextual information. The original structure is shown below:

DeepLabV3 improves this module: the dilation rates are adjusted, a global average pooling branch is added to capture image-level information, and Batch Normalization is introduced inside ASPP. The structure is as follows:

DeepLabV3+ further proposes replacing the convolutions in ASPP with depthwise separable convolutions to reduce the number of parameters and speed up computation. The structure is shown below (same as the figure above):

ASPP code (from PaddleSeg's ASPP):

# requires PaddlePaddle and PaddleSeg (the `layers` helpers come from PaddleSeg)
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers


class ASPPModule(nn.Layer):
    def __init__(self,
                 aspp_ratios,
                 in_channels,
                 out_channels,
                 align_corners,
                 use_sep_conv=False,
                 image_pooling=False,
                 data_format='NCHW'):
        super().__init__()
        self.align_corners = align_corners
        self.data_format = data_format
        self.aspp_blocks = nn.LayerList()

        # one branch per dilation rate; rate 1 uses a 1x1 conv
        for ratio in aspp_ratios:
            if use_sep_conv and ratio > 1:
                conv_func = layers.SeparableConvBNReLU
            else:
                conv_func = layers.ConvBNReLU

            block = conv_func(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1 if ratio == 1 else 3,
                dilation=ratio,
                padding=0 if ratio == 1 else ratio,
                data_format=data_format)
            self.aspp_blocks.append(block)

        out_size = len(self.aspp_blocks)

        # optional global-average-pooling branch
        if image_pooling:
            self.global_avg_pool = nn.Sequential(
                nn.AdaptiveAvgPool2D(output_size=(1, 1), data_format=data_format),
                layers.ConvBNReLU(
                    in_channels,
                    out_channels,
                    kernel_size=1,
                    bias_attr=False,
                    data_format=data_format))
            out_size += 1
        self.image_pooling = image_pooling

        self.conv_bn_relu = layers.ConvBNReLU(
            in_channels=out_channels * out_size,
            out_channels=out_channels,
            kernel_size=1,
            data_format=data_format)

        self.dropout = nn.Dropout(p=0.1)  # drop rate

    def forward(self, x):
        outputs = []
        if self.data_format == 'NCHW':
            interpolate_shape = paddle.shape(x)[2:]
            axis = 1
        else:
            interpolate_shape = paddle.shape(x)[1:3]
            axis = -1
        for block in self.aspp_blocks:
            y = block(x)
            outputs.append(y)

        if self.image_pooling:
            img_avg = self.global_avg_pool(x)
            img_avg = F.interpolate(
                img_avg,
                interpolate_shape,
                mode='bilinear',
                align_corners=self.align_corners,
                data_format=self.data_format)
            outputs.append(img_avg)

        x = paddle.concat(outputs, axis=axis)
        x = self.conv_bn_relu(x)
        x = self.dropout(x)
        return x
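
A minimal usage sketch of the module above (the rate set (1, 6, 12, 18) and the tensor shapes are illustrative assumptions, following the common DeepLabV3 configuration at output_stride=16):

aspp = ASPPModule(
    aspp_ratios=(1, 6, 12, 18),   # rate 1 -> 1x1 conv branch, others -> dilated 3x3 branches
    in_channels=2048,
    out_channels=256,
    align_corners=False,
    use_sep_conv=True,            # separable convolutions, as in DeepLabV3+
    image_pooling=True)           # add the global-average-pooling branch

x = paddle.rand([2, 2048, 33, 33])   # backbone feature map (illustrative shape)
y = aspp(x)                          # [2, 256, 33, 33]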

2. PP (Pyramid Pooling Module)

The PP module was first proposed in the PSPNet paper. It consists of 4 parallel adaptive pooling branches; adaptive pooling at different output sizes captures context at different scales, which improves segmentation performance. The structure is as follows:

The input feature map first passes through adaptive pooling layers with different output sizes, producing feature maps at several resolutions. A convolution layer then reduces the channel number of each pooled map, the pooled maps are upsampled back to the size of the input, and finally the input feature map and the 4 pooled feature maps are concatenated and fed to the following layers.

PP Module code (from PaddleSeg's PP):

# requires PaddlePaddle and PaddleSeg (the `layers` helpers come from PaddleSeg)
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers


class PPModule(nn.Layer):
    def __init__(self, in_channels, out_channels, bin_sizes, dim_reduction,
                 align_corners):
        super().__init__()
        self.bin_sizes = bin_sizes

        inter_channels = in_channels
        if dim_reduction:
            inter_channels = in_channels // len(bin_sizes)

        # we use dimension reduction after pooling mentioned in original implementation.
        self.stages = nn.LayerList([
            self._make_stage(in_channels, inter_channels, size)
            for size in bin_sizes
        ])

        self.conv_bn_relu2 = layers.ConvBNReLU(
            in_channels=in_channels + inter_channels * len(bin_sizes),
            out_channels=out_channels,
            kernel_size=3,
            padding=1)

        self.align_corners = align_corners

    def _make_stage(self, in_channels, out_channels, size):
        # one pyramid level: adaptive pooling followed by a 1x1 conv
        prior = nn.AdaptiveAvgPool2D(output_size=(size, size))
        conv = layers.ConvBNReLU(
            in_channels=in_channels, out_channels=out_channels, kernel_size=1)
        return nn.Sequential(prior, conv)

    def forward(self, input):
        cat_layers = []
        for stage in self.stages:
            x = stage(input)
            x = F.interpolate(
                x,
                paddle.shape(input)[2:],
                mode='bilinear',
                align_corners=self.align_corners)
            cat_layers.append(x)
        cat_layers = [input] + cat_layers[::-1]
        cat = paddle.concat(cat_layers, axis=1)
        out = self.conv_bn_relu2(cat)
        return out
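
A minimal usage sketch (the bin sizes (1, 2, 3, 6) follow PSPNet; the channel numbers and shapes are illustrative assumptions):

ppm = PPModule(
    in_channels=2048,
    out_channels=512,
    bin_sizes=(1, 2, 3, 6),   # adaptive pooling output sizes
    dim_reduction=True,       # 2048 -> 512 channels inside each pooling branch
    align_corners=False)

x = paddle.rand([2, 2048, 60, 60])
y = ppm(x)                    # [2, 512, 60, 60]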

3. DCM (Dynamic Convolutional Module)

The DCM module was proposed in DMNet (Dynamic Multi-scale Filters for Semantic Segmentation). ASPP and PP obtain multi-scale information through dilated convolution and adaptive pooling, but their parameters are fixed at inference time and cannot adapt to each input image. DCM, in contrast, generates its filters dynamically, so its parameters adapt to the input image.

The figure below shows the DMNet architecture: several DCM modules run in parallel to extract semantic information at different scales.

As shown below, the DCM module has 2 branches. The upper branch reduces the number of feature channels (to cut computation), while the lower branch uses adaptive pooling to generate the weights of a depthwise convolution; the two branch outputs are then combined through that depthwise convolution. Because the weights are generated dynamically, the module adapts better to different input images, effectively changing its own parameters per input.

Network structure

DCM code (my own implementation of DMNet's DCM):

# requires PaddlePaddle and PaddleSeg (the `layers` helpers come from PaddleSeg)
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers


class DCM(nn.Layer):
    def __init__(self, filter_size, fusion, in_channels, channels):
        super().__init__()
        self.filter_size = filter_size
        self.fusion = fusion
        self.channels = channels

        # lower branch: generates the depth-wise filter weights from pooled context
        self.filter_gen_conv = nn.Conv2D(in_channels, channels, 1)
        # upper branch: reduces the channel number of the input feature map
        self.input_redu_conv = layers.ConvBNReLU(in_channels, channels, 1)

        self.norm = layers.SyncBatchNorm(channels)
        self.act = nn.ReLU()

        if self.fusion:
            self.fusion_conv = layers.ConvBNReLU(channels, channels, 1)

    def forward(self, x):
        # generate one filter_size x filter_size filter per (sample, channel)
        generated_filter = self.filter_gen_conv(
            F.adaptive_avg_pool2d(x, self.filter_size))
        x = self.input_redu_conv(x)
        b, c, h, w = x.shape
        # fold the batch into the channel axis so a grouped conv applies a
        # different dynamically generated filter to every sample/channel
        x = x.reshape([1, b * c, h, w])
        generated_filter = generated_filter.reshape(
            [b * c, 1, self.filter_size, self.filter_size])
        pad = (self.filter_size - 1) // 2
        if (self.filter_size - 1) % 2 == 0:
            pad = (pad, pad, pad, pad)
        else:
            pad = (pad + 1, pad, pad + 1, pad)
        x = F.pad(x, pad, mode='constant', value=0)  # [1, b * c, h, w]
        output = F.conv2d(x, weight=generated_filter, groups=b * c)
        output = output.reshape([b, self.channels, h, w])
        output = self.norm(output)
        output = self.act(output)
        if self.fusion:
            output = self.fusion_conv(output)
        return output
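
A minimal usage sketch of a single DCM branch (DMNet runs several such branches with different filter sizes in parallel; the filter size and the shapes here are illustrative assumptions):

dcm = DCM(filter_size=3, fusion=True, in_channels=2048, channels=512)

x = paddle.rand([2, 2048, 60, 60])
y = dcm(x)   # [2, 512, 60, 60]; the 3x3 depth-wise filters were generated from x itself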

4. JPU (Joint Pyramid Upsampling)

JPU was first proposed in FastFCN. DeepLab removes part of the backbone's downsampling (the original output_stride of 32 becomes 8 or 16 in DeepLab) to keep high-resolution feature maps, and adds dilated convolutions to enlarge the receptive field; however, the larger feature maps greatly increase the computational cost. With a JPU module, the backbone keeps its normal downsampling and JPU produces the high-resolution feature map instead, also using dilated convolutions to enlarge the receptive field.

Network structure

The FastFCN architecture is shown below: a JPU module is attached after the backbone, and its high-resolution output feature map is fed into the rest of the network:

JPU (Joint Pyramid Upsampling)

Feature maps of different resolutions are first passed through convolution layers to reduce their channels, upsampled to the same resolution, and concatenated. The concatenated feature map is then fed into 4 parallel separable convolution branches (depthwise convolution + pointwise convolution) with different dilation rates; the outputs of these branches are concatenated and passed through a convolution layer into the rest of the network.

JPU code (from:):

# PyTorch implementation
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeparableConv2d(nn.Module):
    def __init__(self, inplanes, planes, kernel_size=3, stride=1, padding=1,
                 dilation=1, bias=False, norm_layer=nn.BatchNorm2d):
        super(SeparableConv2d, self).__init__()
        # depth-wise conv followed by a point-wise (1x1) conv
        self.conv1 = nn.Conv2d(inplanes, inplanes, kernel_size, stride, padding,
                               dilation, groups=inplanes, bias=bias)
        self.bn = norm_layer(inplanes)
        self.pointwise = nn.Conv2d(inplanes, planes, 1, 1, 0, 1, 1, bias=bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.pointwise(x)
        return x


class JPU(nn.Module):
    def __init__(self, in_channels, width=512, norm_layer=None, up_kwargs=None):
        super(JPU, self).__init__()
        self.up_kwargs = up_kwargs

        # channel-reduction convs for the three deepest backbone feature maps
        self.conv5 = nn.Sequential(
            nn.Conv2d(in_channels[-1], width, 3, padding=1, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels[-2], width, 3, padding=1, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels[-3], width, 3, padding=1, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))

        # four parallel separable convs with different dilation rates
        self.dilation1 = nn.Sequential(
            SeparableConv2d(3 * width, width, kernel_size=3, padding=1, dilation=1, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))
        self.dilation2 = nn.Sequential(
            SeparableConv2d(3 * width, width, kernel_size=3, padding=2, dilation=2, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))
        self.dilation3 = nn.Sequential(
            SeparableConv2d(3 * width, width, kernel_size=3, padding=4, dilation=4, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))
        self.dilation4 = nn.Sequential(
            SeparableConv2d(3 * width, width, kernel_size=3, padding=8, dilation=8, bias=False),
            norm_layer(width),
            nn.ReLU(inplace=True))

    def forward(self, *inputs):
        feats = [self.conv5(inputs[-1]), self.conv4(inputs[-2]), self.conv3(inputs[-3])]
        _, _, h, w = feats[-1].size()
        # upsample the two coarser maps to the finest of the three resolutions
        feats[-2] = F.interpolate(feats[-2], (h, w), **self.up_kwargs)
        feats[-3] = F.interpolate(feats[-3], (h, w), **self.up_kwargs)
        feat = torch.cat(feats, dim=1)
        feat = torch.cat([self.dilation1(feat), self.dilation2(feat),
                          self.dilation3(feat), self.dilation4(feat)], dim=1)

        return inputs[0], inputs[1], inputs[2], feat
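
A minimal usage sketch (the channel numbers follow a ResNet-101-style backbone and the up_kwargs values are assumptions; adjust them to your own setup):

up_kwargs = {'mode': 'bilinear', 'align_corners': True}
jpu = JPU(in_channels=[512, 1024, 2048], width=512,
          norm_layer=nn.BatchNorm2d, up_kwargs=up_kwargs)

c1 = torch.rand(2, 256, 120, 120)    # stride-4 feature map
c2 = torch.rand(2, 512, 60, 60)      # stride-8
c3 = torch.rand(2, 1024, 30, 30)     # stride-16
c4 = torch.rand(2, 2048, 15, 15)     # stride-32
_, _, _, feat = jpu(c1, c2, c3, c4)  # feat: [2, 4 * 512, 60, 60], i.e. a stride-8 map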

5. Encoding (Context Encoding Module)

The Encoding module was first proposed in ENCNet. Context Encoding can selectively highlight the class-dependent feature maps, i.e. features that co-occur with particular categories.

The input feature map has shape [N, C, H, W]; flattening the spatial dimensions gives a set of C-dimensional descriptors X = {x_1, ..., x_M} with M = H×W. A learnable codebook D = {d_1, ..., d_K} is introduced (each d_k is a codeword, also called a visual center in the paper), together with a learnable smoothing factor s_k for each visual center. The encoded residuals are computed as:

e_k = \sum_{i=1}^{M} e_{ik}, \qquad e_{ik} = \frac{\exp(-s_k \lVert x_i - d_k \rVert^2)}{\sum_{j=1}^{K} \exp(-s_j \lVert x_i - d_j \rVert^2)} \, (x_i - d_k)

Because the K visual centers in the codebook interact with the feature maps of all training data, they can learn global semantic information.
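
To make the formula concrete, here is a toy NumPy sketch of the soft assignment and aggregation (random numbers, K = 2 codewords, M = 4 descriptors; this is only an illustration of the equation, not the Paddle implementation below):

import numpy as np

M, C, K = 4, 3, 2                  # descriptors (H*W), channels, codewords
x = np.random.rand(M, C)           # flattened feature map
d = np.random.rand(K, C)           # codebook (visual centers)
s = np.random.rand(K)              # smoothing factors

r = x[:, None, :] - d[None, :, :]             # residuals r_ik, shape (M, K, C)
logits = -s[None, :] * (r ** 2).sum(-1)       # -s_k * ||r_ik||^2, shape (M, K)
w = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # soft-assignment weights
e = (w[:, :, None] * r).sum(axis=0)           # encoded residuals e_k, shape (K, C)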

The ENCNet architecture is shown below:

The Context Encoding Module takes an [N, C, H, W] feature map as input and produces two outputs: an [N, C, 1, 1] channel-attention weight that rescales the input feature map, and an encoded [N, C] feature vector which a fully connected layer then maps to [N, num_classes] for computing the SE-loss.

Context Encoding Module code (from Encoding):

# requires PaddlePaddle and PaddleSeg (the `layers` helpers come from PaddleSeg)
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddleseg.models import layers


class Encoding(nn.Layer):
    def __init__(self, channels, num_codes):
        super().__init__()
        self.channels, self.num_codes = channels, num_codes

        std = 1 / ((channels * num_codes) ** 0.5)
        # codebook: the set of K learnable visual centers (codewords)
        self.codewords = self.create_parameter(
            shape=(num_codes, channels),
            default_initializer=nn.initializer.Uniform(-std, std),
        )
        # learnable smoothing factor for each codeword
        self.scale = self.create_parameter(
            shape=(num_codes,),
            default_initializer=nn.initializer.Uniform(-1, 0),
        )
        self.channels = channels

    def scaled_l2(self, x, codewords, scale):
        # the exponent term inside the softmax of the formula above:
        # scale_k * ||x_i - d_k||^2 (scale plays the role of -s_k)
        num_codes, channels = paddle.shape(codewords)
        reshaped_scale = scale.reshape([1, 1, num_codes])
        expanded_x = paddle.tile(x.unsqueeze(2), [1, 1, num_codes, 1])
        reshaped_codewords = codewords.reshape([1, 1, num_codes, channels])

        scaled_l2_norm = paddle.multiply(
            reshaped_scale,
            (expanded_x - reshaped_codewords).pow(2).sum(axis=3))  # N, H*W, num_codes
        return scaled_l2_norm

    def aggregate(self, assignment_weights, x, codewords):
        # weighted sum of the residuals (x_i - d_k) over all spatial positions
        num_codes, channels = paddle.shape(codewords)
        reshaped_codewords = codewords.reshape([1, 1, num_codes, channels])
        expanded_x = paddle.tile(x.unsqueeze(2), [1, 1, num_codes, 1])
        encoded_feat = paddle.multiply(
            assignment_weights.unsqueeze(3),
            (expanded_x - reshaped_codewords)).sum(axis=1)  # N, num_codes, C
        encoded_feat = paddle.reshape(encoded_feat, [-1, self.num_codes, self.channels])
        return encoded_feat

    def forward(self, x):
        x_dims = x.ndim
        assert x_dims == 4, "The dimension of input tensor must equal 4, but got {}.".format(x_dims)
        assert paddle.shape(x)[1] == self.channels, \
            "Encoding channels error, expected {} but got {}.".format(
                self.channels, paddle.shape(x)[1])
        batch_size = paddle.shape(x)[0]
        x = x.reshape([batch_size, self.channels, -1]).transpose([0, 2, 1])  # N, H*W, C
        assignment_weights = F.softmax(
            self.scaled_l2(x, self.codewords, self.scale), axis=2)  # N, H*W, num_codes
        encoded_feat = self.aggregate(assignment_weights, x, self.codewords)  # N, num_codes, C
        return encoded_feat


class EncModule(nn.Layer):
    def __init__(self, in_channels, num_codes):
        super().__init__()
        self.encoding_project = layers.ConvBNReLU(
            in_channels,
            in_channels,
            1,
        )
        self.encoding = nn.Sequential(
            Encoding(channels=in_channels, num_codes=num_codes),
            nn.BatchNorm1D(num_codes),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels),
            nn.Sigmoid(),
        )
        self.in_channels = in_channels
        self.num_codes = num_codes

    def forward(self, x):
        encoding_projection = self.encoding_project(x)
        encoding_feat = self.encoding(encoding_projection)  # N, num_codes, C
        encoding_feat = encoding_feat.mean(axis=1)  # N, C
        batch_size, channels, _, _ = paddle.shape(x)

        # channel attention: rescale the input feature map
        gamma = self.fc(encoding_feat)
        y = gamma.reshape([batch_size, self.in_channels, 1, 1])
        output = F.relu(x + x * y)
        return encoding_feat, output
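
A minimal usage sketch (num_codes=32 follows the value reported in the ENCNet paper; the shapes are illustrative). The second output is the channel-reweighted feature map; the first is the N x C encoded vector that the SE-loss head is applied to:

enc = EncModule(in_channels=512, num_codes=32)

x = paddle.rand([2, 512, 60, 60])
encoded_feat, out = enc(x)   # encoded_feat: [2, 512], out: [2, 512, 60, 60]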

6. Criss-Cross Attention Module

The Criss-Cross Attention Module, proposed in CCNet, is a module for building long-range (and ultimately global) dependencies. Its principle is illustrated below:

Q, K and V are computed from the input; through matrix multiplications, each pixel aggregates the semantic information of all pixels in its row and its column. Stacking just 2 criss-cross attention modules is enough for every pixel to obtain global semantic information.

For a detailed explanation of this module, see: 【论文笔记】CCNet阅读笔记 (CCNet reading notes).

Module code:

# PyTorch implementation
import torch
import torch.nn as nn
from torch.nn import Softmax


def INF(B, H, W):
    # a (B*W, H, H) tensor with -inf on the diagonal, used to mask out the
    # duplicated "self" position in the vertical (column) attention branch
    return -torch.diag(torch.tensor(float("inf")).cuda().repeat(H), 0).unsqueeze(0).repeat(B * W, 1, 1)


class CrissCrossAttention(nn.Module):
    """ Criss-Cross Attention Module"""
    def __init__(self, in_dim):
        super(CrissCrossAttention, self).__init__()
        self.query_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
        self.softmax = Softmax(dim=3)
        self.INF = INF
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        m_batchsize, _, height, width = x.size()
        # project Q/K/V and rearrange them into per-column and per-row batches
        proj_query = self.query_conv(x)
        proj_query_H = proj_query.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height).permute(0, 2, 1)
        proj_query_W = proj_query.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width).permute(0, 2, 1)
        proj_key = self.key_conv(x)
        proj_key_H = proj_key.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_key_W = proj_key.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        proj_value = self.value_conv(x)
        proj_value_H = proj_value.permute(0, 3, 1, 2).contiguous().view(m_batchsize * width, -1, height)
        proj_value_W = proj_value.permute(0, 2, 1, 3).contiguous().view(m_batchsize * height, -1, width)
        # vertical (column) energies; INF masks the duplicated center pixel
        energy_H = (torch.bmm(proj_query_H, proj_key_H) + self.INF(m_batchsize, height, width)).view(m_batchsize, width, height, height).permute(0, 2, 1, 3)
        # horizontal (row) energies
        energy_W = torch.bmm(proj_query_W, proj_key_W).view(m_batchsize, height, width, width)
        # softmax over the concatenated column + row positions
        concate = self.softmax(torch.cat([energy_H, energy_W], 3))

        att_H = concate[:, :, :, 0:height].permute(0, 2, 1, 3).contiguous().view(m_batchsize * width, height, height)
        att_W = concate[:, :, :, height:height + width].contiguous().view(m_batchsize * height, width, width)
        out_H = torch.bmm(proj_value_H, att_H.permute(0, 2, 1)).view(m_batchsize, width, -1, height).permute(0, 2, 3, 1)
        out_W = torch.bmm(proj_value_W, att_W.permute(0, 2, 1)).view(m_batchsize, height, -1, width).permute(0, 2, 1, 3)

        return self.gamma * (out_H + out_W) + x
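
A minimal usage sketch (it needs a CUDA device, because INF above builds its mask with .cuda(); the shapes are illustrative):

x = torch.rand(2, 512, 32, 32).cuda()
cca = CrissCrossAttention(in_dim=512).cuda()
out = cca(cca(x))    # stacking the module twice gives each pixel full-image context
print(out.shape)     # torch.Size([2, 512, 32, 32])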

Notice

Reposting is prohibited.
