Deep Learning Models: CNN (12) Building ResNeXt with PyTorch and Training It with Transfer Learning

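ResNet-50 vs. ResNeXt-50 (32x4d)

Network structure parameters of ResNet-50 and ResNeXt-50 (32x4d)

The deeper ResNet variants (50 layers and above) use the block shown on the far left of the figure above. ResNeXt uses the block shown in the middle instead.

The difference lies in the second layer of the block, the 3x3 convolution. The regular block (far left) uses an ordinary 3x3 convolution, whereas in the ResNeXt block (middle) the second layer is a grouped convolution (group conv).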
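To make the difference concrete, here is a minimal sketch that builds both kinds of 3x3 layer with `nn.Conv2d` and compares their weights. The channel counts (64 for the regular layer, 128 split into 32 groups for the grouped one) are assumed from the conv2 stage of ResNet-50 and ResNeXt-50 (32x4d), and the 56x56 input is only an illustrative size.

```python
import torch
import torch.nn as nn

# Regular 3x3 convolution, as in the ResNet block: 64 -> 64 channels.
regular = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
# Grouped 3x3 convolution, as in the ResNeXt block: 128 -> 128 channels in 32 groups.
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32, bias=False)

# PyTorch stores conv weights as (out_channels, in_channels // groups, kH, kW),
# so each kernel of the grouped layer only sees 128 / 32 = 4 input channels.
regular_shape = tuple(regular.weight.shape)
grouped_shape = tuple(grouped.weight.shape)

regular_params = regular.weight.numel()  # 64 * 64 * 3 * 3
grouped_params = grouped.weight.numel()  # 128 * 4 * 3 * 3

# Grouping changes neither the spatial size nor the number of output channels.
out_shape = tuple(grouped(torch.randn(1, 128, 56, 56)).shape)

print(regular_shape, regular_params)
print(grouped_shape, grouped_params)
print(out_shape)
```

Although the grouped layer has twice as many kernels, each kernel is much thinner, so the layer ends up with far fewer weights than the regular one.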

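Structure of ResNet-50 and ResNeXt

Similarities:

  • The overall framework is the same. A 7x7 convolution (stride 2) first changes the depth of the input feature map from 3 to 64 and halves the height and width from 224 to 112. A 3x3 max-pooling layer then downsamples the height and width from 112 to 56 while keeping the depth unchanged. Blocks are then stacked the same number of times in both networks, [3, 4, 6, 3] in the figure, followed by average pooling, a fully connected layer, and the softmax probability output.
  • In each corresponding stage, the output feature maps of the two networks have the same depth.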
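The spatial sizes quoted above can be checked with the usual output-size formula for convolution and pooling layers. This is a small sketch assuming a 224x224 input and the kernel, stride, and padding values that model.py uses for conv1 and maxpool; `conv_out_size` is a helper name introduced here for illustration.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


input_size = 224
after_conv1 = conv_out_size(input_size, kernel=7, stride=2, padding=3)
after_pool = conv_out_size(after_conv1, kernel=3, stride=2, padding=1)

# conv2 keeps the resolution (stride 1); conv3, conv4 and conv5 each halve it
# through the stride-2 3x3 convolution in their first block.
stage_sizes = [after_pool]
for _ in range(3):
    stage_sizes.append(conv_out_size(stage_sizes[-1], kernel=3, stride=2, padding=1))

print(after_conv1, after_pool, stage_sizes)
```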

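Differences:

Take conv2 as an example. In ResNet the first 1x1 convolution layer has 64 kernels, while in ResNeXt it has 128, twice as many as in the regular block. In the following 3x3 convolution layer, ResNet uses 64 kernels, whereas ResNeXt splits the layer into 32 groups with 4 kernels each, so its second layer uses 128 kernels in total.

Therefore, in every block, the first and second layers of ResNeXt use twice as many kernels as the corresponding layers of the ResNet block.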
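Doubling the kernel count does not make the block much heavier, because the grouped 3x3 layer is so cheap. The sketch below counts the convolution weights of one conv2 bottleneck in each network. It assumes a 256-channel input (a block that is not the first of its stage) and ignores BatchNorm parameters; both helper functions are introduced here for illustration.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a bias-free convolution layer."""
    return kernel * kernel * (c_in // groups) * c_out


def bottleneck_params(c_in, width, c_out, groups=1):
    """Weights of the 1x1 -> 3x3 -> 1x1 convolutions in one bottleneck."""
    return (conv_params(c_in, width, 1)
            + conv_params(width, width, 3, groups)
            + conv_params(width, c_out, 1))


resnet_block = bottleneck_params(256, 64, 256)                # regular block
resnext_block = bottleneck_params(256, 128, 256, groups=32)   # 32x4d block

print(resnet_block, resnext_block)
```

The two totals come out almost equal, which is why ResNeXt-50 (32x4d) can be compared with ResNet-50 at roughly the same model size.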

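Project directory

├── Test5_resnext
│   ├── model.py (model definition)
│   ├── train.py (trains the model; automatically generates class_indices.json and resNext.pth)
│   ├── predict.py (runs prediction with the trained model)
│   ├── tulip.jpg (sample image to classify with the trained weights)
│   ├── resnext50_32x4d.pth (official pretrained ResNeXt weights, downloaded in advance for transfer learning)
│   ├── batch_predict.py (classifies a batch of images)
│   └── data
│       └── imgs (images for batch prediction)
└── data_set
    └── data (dataset)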

model.py

Modifying the Bottleneck class

Add groups and width_per_group to the parameters of the initializer:

  • groups: the number of groups (for example, the 32 in the figure above)
  • width_per_group: the number of kernels in each group (for example, the 4 in conv2 in the figure above)

def __init__(self, in_channel, out_channel, stride=1, downsample=None,
             groups=1, width_per_group=64):
    super(Bottleneck, self).__init__()

    # number of kernels used by conv1 and conv2
    width = int(out_channel * (width_per_group / 64.)) * groups

    self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                           kernel_size=1, stride=1, bias=False)
    self.bn1 = nn.BatchNorm2d(width)
    # -----------------------------------------
    # grouped 3x3 convolution; groups=1 gives a regular convolution
    self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                           kernel_size=3, stride=stride, bias=False, padding=1)
    self.bn2 = nn.BatchNorm2d(width)
    # -----------------------------------------
    self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                           kernel_size=1, stride=1, bias=False)
    self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
    self.relu = nn.ReLU(inplace=True)
    self.downsample = downsample

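With the default values of groups and width_per_group, width equals out_channel. With the ResNeXt-50 (32x4d) configuration, taking conv2 as an example, out_channel = 64, groups = 32 and width_per_group = 4, so width = int(64 * (4 / 64)) * 32 = 4 * 32 = 128.

In other words, in ResNeXt-50 (32x4d) width is twice out_channel, so this single statement yields the number of kernels used by the first and second convolution layers of a block for both ResNet and ResNeXt.

width = int(out_channel * (width_per_group / 64.)) * groups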
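As a quick check, the statement can be evaluated for all four stages. This sketch wraps it in a standalone function (`bottleneck_width` is a name introduced here, not part of model.py) and uses the out_channel values 64, 128, 256 and 512 that ResNet passes to `_make_layer`.

```python
def bottleneck_width(out_channel, groups=1, width_per_group=64):
    """Same formula as in Bottleneck.__init__, extracted for illustration."""
    return int(out_channel * (width_per_group / 64.)) * groups


stage_channels = [64, 128, 256, 512]

resnet50_widths = [bottleneck_width(c) for c in stage_channels]
resnext50_32x4d_widths = [bottleneck_width(c, groups=32, width_per_group=4)
                          for c in stage_channels]
resnext101_32x8d_widths = [bottleneck_width(c, groups=32, width_per_group=8)
                           for c in stage_channels]

print(resnet50_widths)
print(resnext50_32x4d_widths)
print(resnext101_32x8d_widths)
```

With 32x4d the width is always twice out_channel, and with 32x8d it is four times out_channel.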

Note: conv3 uses out_channels=out_channel*self.expansion rather than width. Taking conv2 as an example, out_channel = 64, so the block still outputs 4 times out_channel, that is, 256 channels, exactly as in ResNet.

self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                       kernel_size=1, stride=1, bias=False)

Instantiating Bottleneck

ResNet network structure

def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top)

ResNeXt network structure

def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)
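The function names encode the configuration: in the suffix 32x4d, 32 is groups and 4 is width_per_group. The helper below is purely illustrative (`parse_resnext_suffix` is a hypothetical name, not part of model.py or torchvision) and only shows how the suffix maps to the two arguments.

```python
import re


def parse_resnext_suffix(name):
    """Hypothetical helper: 'resnext50_32x4d' -> (groups, width_per_group)."""
    match = re.search(r"_(\d+)x(\d+)d$", name)
    if match is None:
        raise ValueError(f"no '<groups>x<width>d' suffix in {name!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_resnext_suffix("resnext50_32x4d"))
print(parse_resnext_suffix("resnext101_32x8d"))
```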

Modifying the ResNet class

Add groups and width_per_group to the initializer, and pass them on to each block in _make_layer:

class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        # the first block of the stage handles the stride and the shortcut projection
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        # the remaining blocks keep the shape unchanged
        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)
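To see when `_make_layer` creates a downsample branch, the following sketch replays its bookkeeping without building any layers. It assumes Bottleneck's expansion of 4 and the channel and stride arguments used in `__init__`; `trace_layers` is a helper name introduced here for illustration.

```python
EXPANSION = 4  # assumed value of Bottleneck.expansion


def trace_layers(stage_channels=(64, 128, 256, 512),
                 strides=(1, 2, 2, 2),
                 in_channel=64):
    """Return (in_channel, out_channel, stride, needs_downsample) per stage."""
    rows = []
    for channel, stride in zip(stage_channels, strides):
        out_channel = channel * EXPANSION
        # same condition as in _make_layer
        needs_downsample = stride != 1 or in_channel != out_channel
        rows.append((in_channel, out_channel, stride, needs_downsample))
        in_channel = out_channel  # mirrors self.in_channel = channel * block.expansion
    return rows


for row in trace_layers():
    print(row)
```

Every stage needs the 1x1 projection in its first block: layer1 because the channel count changes from 64 to 256 even though the stride is 1, and the other three because of the stride of 2. The trace does not depend on groups or width_per_group, since those only affect the inner width of the block.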

train.py

Change the function imported from model.py:

from model import resnext50_32x4d
net = resnext50_32x4d()

Change the path of the pretrained weights used for transfer learning:

model_weight_path = "./resnext50_32x4d.pth"
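The rest of train.py is not shown here, so the following is only a sketch of the usual transfer-learning pattern, not the author's exact code: load the pretrained weights into a 1000-class model, then replace the final fully connected layer with one sized for the new task. `TinyNet` is a hypothetical stand-in for the ResNeXt backbone so the example runs quickly, an in-memory buffer stands in for the .pth file, and the 5 output classes are taken from the `num_classes=5` used in predict.py.

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNeXt backbone (conv -> pool -> fc)."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.avgpool(torch.relu(self.conv1(x)))
        return self.fc(torch.flatten(x, 1))


# Stand-in for the downloaded checkpoint: a 1000-class state dict held in memory.
buffer = io.BytesIO()
torch.save(TinyNet(num_classes=1000).state_dict(), buffer)
buffer.seek(0)

# 1) Build the model with the same head as the checkpoint and load the weights.
net = TinyNet(num_classes=1000)
net.load_state_dict(torch.load(buffer, map_location="cpu"))

# 2) Replace the classification head for the new 5-class task.
in_features = net.fc.in_features
net.fc = nn.Linear(in_features, 5)

logits = net(torch.randn(2, 3, 32, 32))
print(logits.shape)
```

Creating the model with the default 1000 classes first matters: the checkpoint contains a 1000-way fc layer, so loading it into a 5-class model would fail on the shape mismatch.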

Also, because my machine could not cope with a larger batch, I reduced batch_size to 4 here:

train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=4, shuffle=True,
                                           num_workers=nw)

validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                              batch_size=4, shuffle=False,
                                              num_workers=nw)
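A smaller batch lowers the memory needed per step but increases the number of steps per epoch. With the default `drop_last=False`, a DataLoader yields ceil(N / batch_size) batches. The sketch below uses a hypothetical dataset size of 1000 images purely for illustration; it is not the size of the dataset used in this post.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches a DataLoader yields with drop_last=False."""
    return math.ceil(num_samples / batch_size)


num_samples = 1000  # hypothetical dataset size, for illustration only
steps_bs4 = steps_per_epoch(num_samples, 4)
steps_bs16 = steps_per_epoch(num_samples, 16)

print(steps_bs4, steps_bs16)
```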

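Training results

Training output of train.py

predict.py

Change the function imported from model.py:

from model import resnext50_32x4d
model = resnext50_32x4d(num_classes=5).to(device)

Change the weights path (the .pth file produced by transfer learning in train.py):

weights_path = "./resNext50.pth"

Prediction result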

Batch prediction: batch_predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms

from model import resnext50_32x4d


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    # folder containing the images to predict
    imgs_root = "data/imgs"
    assert os.path.exists(imgs_root), f"file: '{imgs_root}' does not exist."
    # collect the paths of all .jpg images in the folder
    img_path_list = [os.path.join(imgs_root, i) for i in os.listdir(imgs_root) if i.endswith(".jpg")]

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), f"file: '{json_path}' does not exist."

    with open(json_path, "r") as json_file:
        class_indict = json.load(json_file)

    # create model
    model = resnext50_32x4d(num_classes=5).to(device)

    # load model weights
    # the file name must match the one saved by train.py (case-sensitive on Linux)
    weights_path = "./resnext50.pth"
    assert os.path.exists(weights_path), f"file: '{weights_path}' does not exist."
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # prediction
    model.eval()
    batch_size = 8  # number of images packed into one batch per forward pass
    with torch.no_grad():
        # step through the list in strides of batch_size so that the final,
        # possibly smaller, batch is predicted as well
        for start in range(0, len(img_path_list), batch_size):
            batch_paths = img_path_list[start: start + batch_size]
            img_list = []
            for img_path in batch_paths:
                assert os.path.exists(img_path), f"file: '{img_path}' does not exist."
                img = Image.open(img_path).convert("RGB")  # ensure 3 channels
                img = data_transform(img)
                img_list.append(img)

            # batch img
            # stack all images in img_list into a single batch
            batch_img = torch.stack(img_list, dim=0)
            # predict class
            output = model(batch_img.to(device)).cpu()
            predict = torch.softmax(output, dim=1)
            probs, classes = torch.max(predict, dim=1)

            for img_path, pro, cla in zip(batch_paths, probs, classes):
                print("image: {} class: {} prob: {:.3}".format(img_path,
                                                                 class_indict[str(cla.item())],
                                                                 pro.item()))


if __name__ == '__main__':
    main()
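One detail of batching is worth checking in isolation. A loop bound computed with floor division, `len(paths) // batch_size`, covers only full batches, so any images left over in a final partial batch are never predicted. Stepping through the list in strides of batch_size covers everything. A minimal sketch, with hypothetical file names:

```python
def chunk(items, batch_size):
    """Yield consecutive batches; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"data/imgs/{i}.jpg" for i in range(19)]  # hypothetical file names
batch_size = 8

batches = list(chunk(paths, batch_size))
sizes = [len(batch) for batch in batches]

full_batches_only = len(paths) // batch_size  # what a floor-division bound would cover

print(sizes, full_batches_only)
```

With 19 images and a batch size of 8, the floor-division bound gives only 2 batches (16 images), while the stride-based loop also yields the last 3 images.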

Batch prediction results