(1) I challenged myself to hand-write the code that verifies the theory, and gained some insights and reflections that AI tools cannot provide; for the questions I could not answer myself, please advise in the comments;

(2) This series involves many details worth clarifying, but considering readers' absorption and the length limit, only the key points are shared here; if anything is unclear or explained incorrectly, please point it out in the comments;

(3) The write-up was originally composed in English and is not translated into Chinese here; I hope readers will understand;

(4) This series is based on Mu Li's textbook Dive into Deep Learning (《动手学深度学习》), available at:

《动手学深度学习》 — 动手学深度学习 2.0.0 documentation

(5) Since the amount of code is large, it is uploaded to my personal space as a free resource, so that readers can run and use it conveniently.

Note: AlexNet is implemented in both PyTorch and MXNet; LeNet is implemented only on the MXNet framework.

The model parameters trained by the original authors can also be downloaded directly through the deep-learning frameworks, but this experiment aims to explore the theoretical foundations and implementation ideas of CNNs, so different versions of "LeNet" and "AlexNet" are trained from scratch.

Different implementation schemes and analysis approaches are also proposed. For models that are expensive to train, the free compute platform provided by Google Colaboratory is recommended; it is essentially an Ubuntu-based server preconfigured with deep-learning frameworks such as PyTorch and TensorFlow.

This article mainly analyzes:

【1】The working principles of the convolutional, pooling, batch-normalization, activation, and dropout layers in a CNN;

The next article will mainly analyze:

【2】The composition of the time cost when training on a single CPU core, its experimental verification, and the speed-up obtained from library function interfaces;

【3】Tuning methods for hyperparameters such as the learning rate, optimization method, batch size, and activation function;

【4】The performance of the convolutional neural network (LeNet, 1998) and the deep convolutional neural network (AlexNet, 2012) on the MNIST, Fashion_MNIST, and CIFAR100 datasets, and a possibly feasible method for adaptively adjusting the parameter size;

【5】Visualization of CNN activation-layer features, intuitive comparison against the filtering effect of hand-designed kernels, and understanding of the CNN information-extraction process;

【6】The role of the confusion matrix, and plotting a custom confusion matrix.

从零开始搭建深度学习大厦系列-3.卷积神经网络基础(5-9)-CSDN博客: https://blog.csdn.net/2302_80464577/article/details/149260898

A Quick Look

LeNet (based on MXNet; textbook: 2019 + GPU, max-pooling; mine: max-pooling + 2 prefetching processes / 5 prefetching processes)

2 prefetching processes

5 prefetching processes

AlexNet (based on PyTorch; textbook: original; mine: parameter size nearly 1/256 of the original design)

2 prefetching processes (batch size = 64)

5 prefetching processes (batch size = 64/32, initial learning rate = 0.01/0.03)

Figure 1 Results of the textbook's implementation vs. mine

Content

Environment Setting
Experiment Goals
1. Edge Detection
1.1 Basic Principle
1.2 Function Design
1.3 Carrying-out Result
2. Shape of layers and kernels in a CNN
2.1 Basic Theories
2.2 Code implementation (numpy, mxnet.gluon.nn, mxnet.nd)
2.3 Result
3. 1x1 Convolution
3.1 Basic Theory
3.2 Code implementation (3 lines)
3.3 Result
4-5 CNN Architecture Implementation and Evaluation
About data loaders
About num_workers and prefetching processes
4. LeNet Implementation (MXNet based)
4.1 Basic Theories
4.2 Code Implementation
4.3 Model Evaluation on Fashion-MNIST dataset
4.3.1 Pooling: Maximum-pooling vs Average-pooling
4.3.2 Optimization: sgd vs sgd+momentum (nag)
4.3.3 Activation Function: ReLU vs sigmoid
4.3.4 Normalization Layer: Batch Normalization vs None
4.3.5 Batch size: 64 vs 128
4.3.6 Textbook Result (Batch Normalization) & Running Snapshot
4.4 LeNet Evaluation on MNIST dataset
4.5 Evaluating LeNet on CIFAR100
4.5.1 Coarse Classification (20 classes)
4.5.2 Fine Classification (100 classes)
4.5.3 Running Snapshot
5. AlexNet Architecture
5.1 Code Implementation
5.2 Fashion_MNIST Dataset (MXNet vs PyTorch)
5.3 MNIST Dataset (PyTorch only)
5.4 CIFAR100 (100 classes, fine labels), PyTorch only
5.4.1 Learning rate setting
6. CNN activation layer characteristics visualization
6.1 MNIST Dataset
6.2 Fashion_MNIST Dataset
7. Confusion Matrix
7.1 MNIST
7.2 Fashion_MNIST
References

Environment Setting

All four experiments are carried out in a virtual environment based on the Python 3.7.0 interpreter. The main packages used are the deep-learning package mxnet 1.7.0.post2 (CPU version), the visualization package matplotlib.pyplot, the image-processing package opencv-python, and the array-manipulation package numpy.

Experiment Goals

  1. Design appropriate kernels with fixed parameters and detect edges of horizontal, vertical, and diagonal orientations separately;
  2. Derive the shape-transformation formula for the forward-propagation process of a CNN (convolutional neural network) and verify the result both by fundamental hand-coding and by calling library scripts;
  3. Understand the effect and principle of 1x1 kernels, then explore different implementations of 1x1 convolution on the 2-dimensional plane, such as cross-correlation calculation and matrix multiplication;
  4. Construct LeNet [2] by hand using mxnet.gluon.nn and explore how different hyperparameter settings impact the training result and model performance;
  5. Construct AlexNet [3] by hand using torch.nn and explore how different hyperparameter settings impact the training result and model performance.

1. Edge Detection

1.1 Basic Principle

According to the corresponding theories in DIP (digital image processing), first-order difference operators, or kernels, such as the Prewitt and Sobel kernels, designed in horizontal, vertical, and two diagonal versions, can be used to detect edges in gray-scale images.

These kernels filter out the transitions between different objects, or between parts of one object, because the intensity levels of the pixels distributed along both sides of an edge change rapidly.

In addition, a comprehensive-orientation algorithm combines the information from all directions; see the 'combimg' implementation for details.

1.2 Function Design

This section employs two tool functions to accomplish the goal: get_data(input_dir) for image loading (similar to building a dataset), and edge_detect(input_dir) for cross-correlation calculation under different settings of kernel shape and layer shape. A sketch of the idea follows.
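The actual bodies of both functions appear in Figure 2 and the uploaded code; below is a minimal sketch of the same pipeline, assuming gray-scale loading with opencv-python. The four Prewitt kernels and the pixel-wise-maximum rule for 'combimg' are my illustrative choices and may differ in detail from the original implementation.

```python
import os
import cv2
import numpy as np

# Four Prewitt kernels: horizontal, vertical, and the two diagonal orientations.
PREWITT = {
    "horizontal": np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=np.float32),
    "vertical":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32),
    "diag_main":  np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], dtype=np.float32),
    "diag_anti":  np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]], dtype=np.float32),
}

def get_data(input_dir):
    """Load every image under input_dir as a gray-scale float32 array."""
    imgs = {}
    for name in os.listdir(input_dir):
        img = cv2.imread(os.path.join(input_dir, name), cv2.IMREAD_GRAYSCALE)
        if img is not None:
            imgs[name] = img.astype(np.float32)
    return imgs

def edge_detect(input_dir):
    """Cross-correlate each image with all four kernels, then combine them."""
    results = {}
    for name, img in get_data(input_dir).items():
        # cv2.filter2D computes cross-correlation, not flipped convolution.
        responses = {k: cv2.filter2D(img, -1, kern) for k, kern in PREWITT.items()}
        # 'combimg': merge all orientations via the pixel-wise maximum magnitude.
        combimg = np.max([np.abs(r) for r in responses.values()], axis=0)
        results[name] = (responses, combimg)
    return results
```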

Figure 2 Code implementation

1.3 Carrying-out Result

Six scenery photos with obvious edge information, posted by professional photographers on the web, are chosen as a mini-dataset.

Figure 3 Mini-dataset in Mission 1

Only the 'combimgs' are saved. 'canyon.jpg' illustrates the directional attribute of the Prewitt kernel vividly.

Figure 4 canyon.jpg

Other examples are as follows; the orientation of the textures verifies the DIP theory to some extent.

Figure 5 Galaxy; notice the small dot in the picture with interesting behavior (the combined dot has a black circle within it, while the others contain only rectangular-line-like shapes)

Figure 6 Bungalow lying in the embrace of lake and mountains

Figure 7 grassland and night sky in an estate

Figure 8 Clouds

2. Shape of layers and kernels in a CNN

2.1 Basic Theories

Figure 9 Kernel and Layer in a CNN

Unlike the hidden neurons (intermediate outputs) and the lines (weights) fully connecting them in a multilayer perceptron (MLP), a CNN is mainly characterized by kernels (similar to the weights of an MLP) and feature maps (similar to the nodes of an MLP); together with activation functions, normalization layers, and some other designs, these construct the architecture. Kernels can also be understood as components of certain CNN layers.

Kernels exist mainly to reduce the otherwise overwhelming parameter size and to reuse parameters scientifically, following the spatial locality and adjacency principles of images. Input images are transformed into different feature maps after going through the convolution, or cross-correlation, operations of kernels. Notice that kernels can have either adjustable parameters (convolutional kernels) or non-adjustable ones (pooling kernels).

These feature maps can carry implicit information of any kind, such as the edges of objects. Part 1 of this experiment demonstrates the effect of human-designed edge-detection kernels. For layers near the top of deeper neural networks, the feature maps may encode rather global information (sometimes nothing can be learned, possibly because the input images are small and the network is deep, which is one motivation behind ResNet); AlexNet and LeNet serve as examples.

Figure 10 Characteristics Visualization & Understanding [1]

Figure 11 Primitive CNN architectures proposed (1998, 2012)

A common design problem is to estimate the parameter size (storage amount) and training time (measured in CPU/GPU hours) of a 2D-CNN architecture. The shape of a feature map is fixed in the 'NCHW' format (or 'NHWC'), while the shape of a kernel is denoted 'CoCiKhKw' (or 'KhKwCiCo'). See Figure 9 for a graphic explanation.

Figure 12 Cross-correlation calculation at a 2D-convolutional layer

According to the academic design, and that of the textbook, NCHW and CoCiKhKw should satisfy C == Ci. When Co == 1, Ci different kernels perform convolution (equivalent to cross-correlation operations in implementation) separately, in correspondence with the input feature maps; each kernel aims at one feature map of size Nx1xHxW.

The result is obtained by pixel-wise summation over the Ci different Nx1xH2xW2 maps, giving a composite Nx1xH2xW2 feature map with richer information. Repeating this process Co times yields the final output of size NxCoxH2xW2. Kernel size, padding, and stride are the three basic settings of a convolution operation, and they determine the mappings H->H2 and W->W2.
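Written out explicitly, and assuming ph and pw count the padding added on each side (the convention of nn.Conv2D, and the one consistent with the numbers in Section 2.3):

$$H_2 = \left\lfloor \frac{H + 2p_h - K_h}{s_h} \right\rfloor + 1, \qquad W_2 = \left\lfloor \frac{W + 2p_w - K_w}{s_w} \right\rfloor + 1$$

As a sanity check against Section 2.3: H2 = ⌊(360 + 2 − 3)/1⌋ + 1 = 360 and W2 = ⌊(480 + 2 − 3)/1⌋ + 1 = 480, so a 3x3 kernel with unit padding and unit stride preserves the spatial size.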

A pooling layer has kernels with unlearnable parameters; it is generally divided into max-pooling and average-pooling, as sketched below.
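For concreteness, a minimal numpy sketch of 2x2 max-pooling with stride 2 on a single feature map (the helper name max_pool2d is mine; swapping .max() for .mean() gives average-pooling):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Naive max-pooling over a 2-D feature map; no learnable parameters."""
    h2 = (x.shape[0] - k) // s + 1
    w2 = (x.shape[1] - k) // s + 1
    out = np.empty((h2, w2), dtype=x.dtype)
    for i in range(h2):
        for j in range(w2):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out

x = np.arange(16, dtype=np.float32).reshape(4, 4)
print(max_pool2d(x))  # [[ 5.  7.] [13. 15.]]
```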

2.2 Code implementation (numpy, mxnet.gluon.nn, mxnet.nd)

Two ways are used to verify the formula: direct hand-coding and package calling.

The input images are random values generated by numpy that simulate noise; they serve only to verify the shapes of the feature maps at the current layer. The kernels vary across the Ci input channels and are identical across the Co output channels. Five nested loops accomplish the computation, as in the sketch below.
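A hedged reconstruction of the hand-coded check (the body below is my sketch of the convsize_verify() function mentioned in Section 3.2): five nested loops over batch, output channel, input channel, and the two output coordinates, with the window product vectorized.

```python
import numpy as np

def convsize_verify(x, kernels, ph=1, pw=1, sh=1, sw=1):
    """Cross-correlation with explicit loops; x is NCHW, kernels is CoCiKhKw.
    Slow on real image sizes -- intended only as a one-off shape check."""
    n, ci, h, w = x.shape
    co, ci2, kh, kw = kernels.shape
    assert ci == ci2, "kernel Ci must match input channels"
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))  # pad H and W per side
    h2 = (h + 2 * ph - kh) // sh + 1
    w2 = (w + 2 * pw - kw) // sw + 1
    out = np.zeros((n, co, h2, w2), dtype=x.dtype)
    for b in range(n):                        # 1: batch
        for o in range(co):                   # 2: output channels
            for c in range(ci):               # 3: input channels (summed)
                for i in range(h2):           # 4: output rows
                    for j in range(w2):       # 5: output cols
                        window = xp[b, c, i * sh:i * sh + kh, j * sw:j * sw + kw]
                        out[b, o, i, j] += np.sum(window * kernels[o, c])
    return out

x = np.random.uniform(size=(2, 3, 360, 480)).astype(np.float32)
k = np.random.uniform(size=(4, 3, 3, 3)).astype(np.float32)
print(convsize_verify(x, k).shape)  # (2, 4, 360, 480)
```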

2.3 Result

These hand-coded kernels can actually be interpreted as smoothing filters with small variance because of the k_base setting in the code block; the parameters of nn.Conv2D, by contrast, are initialized randomly and carry no meaning at the start of training. By the way, the network layer is not initialized in this section because it is not necessary to do so here.

nn.Conv2D can detect in_channels automatically, in which case the layer goes through deferred initialization. Re-initializing, or assigning in_channels by hand, avoids the deferred initialization.
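A small gluon sketch of that deferred-initialization behavior, using the same shapes as the simulation below:

```python
from mxnet import nd
from mxnet.gluon import nn

conv = nn.Conv2D(channels=4, kernel_size=3, padding=1)  # in_channels omitted
conv.initialize()                  # deferred: weights not allocated yet
x = nd.random.uniform(shape=(2, 3, 360, 480))
y = conv(x)                        # first forward pass infers in_channels=3
print(conv.weight.shape, y.shape)  # (4, 3, 3, 3) (2, 4, 360, 480)

# Supplying in_channels up front avoids the deferred step entirely.
conv2 = nn.Conv2D(channels=4, kernel_size=3, padding=1, in_channels=3)
conv2.initialize()                 # parameters allocated immediately
```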

In the simulation, N=2, Ci=3, Co=4, H=360, W=480, Kh=Kw=3, ph=pw=1 (padding per side), sh=sw=1. The result shows that the shape formula is correct.

Figure 13 H2 and W2 should be floored to integers [1]

3. 1x1 Convolution

3.1 Basic Theory

1x1 convolution is used specifically to compress the channel count C of feature maps and thus contract the number of parameters needed. In this case, ph=pw=0, sh=sw=1, and Co<Ci.

3.2 Code implementation (3 lines)

The NHWC format is used in the matrix-multiplication implementation of 1x1 convolution, sketched below.
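A minimal sketch of the three-line idea: flatten the spatial grid, multiply once by a Ci x Co weight matrix, and reshape back (the function name conv1x1_nhwc is mine):

```python
import numpy as np

def conv1x1_nhwc(x, w):
    """x: (N, H, W, Ci); w: (Ci, Co). 1x1 convolution as a single matmul."""
    n, h, wd, ci = x.shape
    out = x.reshape(-1, ci) @ w        # (N*H*W, Ci) @ (Ci, Co)
    return out.reshape(n, h, wd, -1)   # back to (N, H, W, Co)

x = np.random.uniform(size=(2, 360, 480, 3)).astype(np.float32)
w = np.random.uniform(size=(3, 2)).astype(np.float32)  # Co=2 < Ci=3: compression
print(conv1x1_nhwc(x, w).shape)  # (2, 360, 480, 2)
```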

A much slower implementation of 1x1 convolution simply adjusts the parameters and calls the hand-coded convsize_verify().

3.3 Result

The result indicates that mxnet.gluon.nn implements convolution in the form of matrix multiplication. A similar method generalizes to convolutions of arbitrary kernel size:

Given a layer of N feature maps, first divide the input feature maps into M (= H2xW2) flattened pixel vectors (length = KhxKw);

then take the dot product with the flattened kernels along the two dimensions;

finally, reshape the output feature maps to obtain the result.

A more detailed sketch is as follows.
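The scraped page ends here, so the following is a hedged reconstruction of the three steps above as an im2col-style sketch (per sample, HWC layout, no padding, unit stride for brevity). Note that the common formulation folds the Ci channels into each patch vector, so its length is KhxKwxCi rather than the per-channel KhxKw described above:

```python
import numpy as np

def im2col_conv(x, kernels):
    """x: (H, W, Ci); kernels: (Kh, Kw, Ci, Co). Convolution as one matmul."""
    h, w, ci = x.shape
    kh, kw, _, co = kernels.shape
    h2, w2 = h - kh + 1, w - kw + 1
    # Step 1: gather H2*W2 flattened patches, each of length Kh*Kw*Ci.
    cols = np.empty((h2 * w2, kh * kw * ci), dtype=x.dtype)
    for i in range(h2):
        for j in range(w2):
            cols[i * w2 + j] = x[i:i + kh, j:j + kw, :].ravel()
    # Step 2: one dot product with the flattened kernels.
    out = cols @ kernels.reshape(kh * kw * ci, co)
    # Step 3: reshape the flat result back into a feature map.
    return out.reshape(h2, w2, co)

x = np.random.uniform(size=(8, 8, 3)).astype(np.float32)
k = np.random.uniform(size=(3, 3, 3, 4)).astype(np.float32)
print(im2col_conv(x, k).shape)  # (6, 6, 4)
```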
