From RNN to Transformer

Contents

  1. Fundamentals: An Overview of Sequence Models
  2. RNN: Recurrent Neural Networks
  3. LSTM: Long Short-Term Memory Networks
  4. The Transformer Architecture
  5. Applications in Time Series Forecasting
  6. Applications in Computer Vision
  7. Applications in Large Language Models
  8. Practice and Optimization
  9. Frontier Developments

Fundamentals: An Overview of Sequence Models {#基础篇}

What is sequence data?

Sequence data is a collection of data points arranged in a specific order, where the ordering itself carries essential information:

  • Time series: stock prices, temperature readings, sales figures
  • Text sequences: sentences, paragraphs, documents
  • Video sequences: consecutive image frames
  • Audio sequences: sound waveforms

Why do we need dedicated sequence models?

Traditional feed-forward neural networks have the following limitations:

  1. Fixed input size: they cannot handle variable-length sequences
  2. Position invariance: they ignore the ordering of elements in a sequence
  3. No memory: they cannot exploit information from earlier inputs
  4. Parameter explosion: long sequences would require an enormous number of parameters

Core challenges of sequence modeling

  • Long-term dependencies: capturing relationships between elements that are far apart in a sequence
  • Vanishing/exploding gradients: deep networks are hard to train
  • Computational efficiency: the cost of processing long sequences
  • Generalization: adapting to sequences of different lengths and types

RNN: Recurrent Neural Networks {#rnn}

Basic concept

By introducing recurrent connections, an RNN maintains an internal state (memory) that lets it process sequence data.

Core idea

# Basic RNN equations
h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t + b_h)
y_t = W_hy @ h_t + b_y
# where:
# h_t: hidden state at time t
# x_t: input at time t
# y_t: output at time t
# W_*: weight matrices
# b_*: bias vectors
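To make the recurrence concrete, here is a minimal NumPy sketch of one RNN cell stepped over a toy sequence; all dimensions and the random initialization are illustrative assumptions, not tied to any library:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One RNN time step: update the hidden state, emit an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Toy dimensions (assumed for illustration)
input_size, hidden_size, output_size, seq_len = 3, 4, 2, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size)) * 0.1
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_hy = rng.normal(size=(output_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

# The same parameters are reused at every time step (parameter sharing)
h = np.zeros(hidden_size)
for t in range(seq_len):
    x_t = rng.normal(size=input_size)
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)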

Structure of an RNN

1. The basic RNN cell
  • Input layer: receives the input x_t at the current time step
  • Hidden layer: combines the current input with the previous hidden state
  • Output layer: produces the output for the current time step
2. The unrolled RNN

An RNN can be unrolled into a deep network in which every time step shares the same parameters:

x_0 → [RNN] → h_0 → y_0
         ↓
x_1 → [RNN] → h_1 → y_1
         ↓
x_2 → [RNN] → h_2 → y_2

RNN variants

1. Many-to-One
  • Applications: sentiment analysis, text classification
  • Characteristics: an entire sequence in, a single output out
2. One-to-Many
  • Applications: image caption generation, music generation
  • Characteristics: a single input, a sequence as output
3. Many-to-Many
  • Synchronized (aligned input/output): per-frame video classification, sequence labeling
  • Asynchronous (unaligned): sequence-to-sequence tasks such as machine translation

Training RNNs

Backpropagation Through Time (BPTT)

# BPTT pseudocode
def bptt(sequences, targets, rnn_params):
    for seq, target in zip(sequences, targets):
        # Forward pass
        hidden_states = []
        h = initial_hidden_state
        for x in seq:
            h = rnn_cell(x, h, rnn_params)
            hidden_states.append(h)
        # Compute the loss
        loss = compute_loss(hidden_states[-1], target)
        # Backward pass through the unrolled graph
        gradients = backward_pass(loss, hidden_states)
        # Update the parameters
        update_parameters(rnn_params, gradients)

Problems with RNNs

  1. Vanishing gradients: gradients decay exponentially when training on long sequences
  2. Exploding gradients: gradients grow very large and destabilize training (commonly mitigated by clipping, as sketched below)
  3. Difficulty with long-term dependencies: long-range relationships in a sequence are hard to capture
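A minimal PyTorch sketch of gradient-norm clipping; the model, data, and loss here are placeholders chosen only for illustration:

import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 50, 8)   # (batch, seq_len, features), toy data
out, h_n = model(x)
loss = out.pow(2).mean()    # placeholder loss

optimizer.zero_grad()
loss.backward()
# Rescale the global gradient norm to at most 1.0 before the update
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()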

LSTM: Long Short-Term Memory Networks {#lstm}

Motivation for LSTM

The LSTM was designed specifically to address the RNN's long-term dependency problem by introducing gating mechanisms and a memory cell.

Core components of the LSTM

1. Cell state
  • A long-term memory channel for information
  • Can selectively retain or forget information
2. The three gates
Forget gate
f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f)
# Decides what information to discard from the cell state
Input gate
i_t = sigmoid(W_i @ [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C @ [h_{t-1}, x_t] + b_C)
# Decides what new information to store in the cell state
# The cell state is then updated as: C_t = f_t * C_{t-1} + i_t * C̃_t
Output gate
o_t = sigmoid(W_o @ [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
# Decides what information to output

The complete LSTM computation

def lstm_cell(x_t, h_prev, C_prev, W, b):
    # W and b bundle the per-gate parameters W_f/W_i/W_C/W_o and b_f/b_i/b_C/b_o
    z = concat([h_prev, x_t])
    # 1. Forget gate: decide what to discard
    f_t = sigmoid(W_f @ z + b_f)
    # 2. Input gate: decide what to store
    i_t = sigmoid(W_i @ z + b_i)
    C_tilde = tanh(W_C @ z + b_C)
    # 3. Update the cell state
    C_t = f_t * C_prev + i_t * C_tilde
    # 4. Output gate: decide what to emit
    o_t = sigmoid(W_o @ z + b_o)
    h_t = o_t * tanh(C_t)
    return h_t, C_t

LSTM variants

1. GRU (Gated Recurrent Unit)
  • A simplified LSTM with only two gates
  • An update gate and a reset gate
  • Fewer parameters, faster to train

def gru_cell(x_t, h_prev, W, b):
    # Reset gate
    r_t = sigmoid(W_r @ concat([h_prev, x_t]) + b_r)
    # Update gate
    z_t = sigmoid(W_z @ concat([h_prev, x_t]) + b_z)
    # Candidate hidden state
    h_tilde = tanh(W_h @ concat([r_t * h_prev, x_t]) + b_h)
    # Final hidden state
    h_t = (1 - z_t) * h_prev + z_t * h_tilde
    return h_t
2. Bidirectional LSTM (BiLSTM)
  • Processes the sequence in both the forward and backward directions
  • Captures context from both sides (see the sketch after this list)
  • Performs very well on NLP tasks
3. Multi-layer LSTM
  • Stacks several LSTM layers vertically
  • Learns more abstract feature representations
  • Increases model capacity
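Both variants come for free with PyTorch's nn.LSTM; a minimal sketch, with arbitrary assumed dimensions:

import torch
import torch.nn as nn

# Two stacked layers, each processing the sequence in both directions
bilstm = nn.LSTM(input_size=32, hidden_size=64,
                 num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 32)   # (batch, seq_len, features)
out, (h_n, c_n) = bilstm(x)
print(out.shape)             # torch.Size([8, 20, 128]): forward and backward states concatenated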

Strengths and limitations of LSTM

Strengths:

  • Effectively mitigates the vanishing gradient problem
  • Can capture long-term dependencies
  • Performs well across many sequence tasks

Limitations:

  • High computational cost
  • Hard to parallelize
  • Still struggles with very long sequences

The Transformer Architecture {#transformer}

The Transformer's revolutionary innovation

The 2017 paper "Attention Is All You Need" fundamentally changed the sequence-modeling paradigm.

Core concept: the self-attention mechanism

1. Scaled dot-product attention

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: query matrix [batch, seq_len, d_k]
    # K: key matrix   [batch, seq_len, d_k]
    # V: value matrix [batch, seq_len, d_v]
    d_k = K.shape[-1]
    # Attention scores
    scores = Q @ K.transpose(-2, -1) / sqrt(d_k)
    # Optional mask
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    # Softmax normalization
    attention_weights = softmax(scores, dim=-1)
    # Weighted sum of the values
    output = attention_weights @ V
    return output, attention_weights
2. Multi-head attention

class MultiHeadAttention:
    def __init__(self, d_model, n_heads):
        self.d_model = d_model
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        # Linear projection layers
        self.W_q = Linear(d_model, d_model)
        self.W_k = Linear(d_model, d_model)
        self.W_v = Linear(d_model, d_model)
        self.W_o = Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.shape[0]
        # 1. Project and reshape into heads
        Q = self.W_q(query).view(batch_size, -1, self.n_heads, self.d_k)
        K = self.W_k(key).view(batch_size, -1, self.n_heads, self.d_k)
        V = self.W_v(value).view(batch_size, -1, self.n_heads, self.d_k)
        # 2. Transpose so all heads are computed in parallel
        Q = Q.transpose(1, 2)
        K = K.transpose(1, 2)
        V = V.transpose(1, 2)
        # 3. Attention
        attn_output, _ = scaled_dot_product_attention(Q, K, V, mask)
        # 4. Concatenate the head outputs
        attn_output = attn_output.transpose(1, 2).contiguous()
        attn_output = attn_output.view(batch_size, -1, self.d_model)
        # 5. Final linear projection
        output = self.W_o(attn_output)
        return output

Components of the Transformer

1. Encoder

class TransformerEncoder:
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        # Sublayers
        self.self_attention = MultiHeadAttention(d_model, n_heads)
        self.feed_forward = FeedForward(d_model, d_ff)
        # Layer normalization
        self.norm1 = LayerNorm(d_model)
        self.norm2 = LayerNorm(d_model)
        # Dropout
        self.dropout = Dropout(dropout)

    def forward(self, x, mask=None):
        # 1. Self-attention sublayer
        attn_output = self.self_attention(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        # 2. Feed-forward sublayer
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        return x

2. Decoder

class TransformerDecoder:
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        # Three sublayers
        self.masked_self_attention = MultiHeadAttention(d_model, n_heads)
        self.cross_attention = MultiHeadAttention(d_model, n_heads)
        self.feed_forward = FeedForward(d_model, d_ff)
        # Layer normalization
        self.norm1 = LayerNorm(d_model)
        self.norm2 = LayerNorm(d_model)
        self.norm3 = LayerNorm(d_model)
        self.dropout = Dropout(dropout)

    def forward(self, x, encoder_output, src_mask=None, tgt_mask=None):
        # 1. Masked self-attention
        attn1 = self.masked_self_attention(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn1))
        # 2. Cross-attention over the encoder output
        attn2 = self.cross_attention(x, encoder_output, encoder_output, src_mask)
        x = self.norm2(x + self.dropout(attn2))
        # 3. Feed-forward network
        ff_output = self.feed_forward(x)
        x = self.norm3(x + self.dropout(ff_output))
        return x
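The tgt_mask used by the masked self-attention is typically a lower-triangular (causal) mask, so each position attends only to itself and earlier positions; a minimal sketch of its construction, assuming the mask == 0 convention used in scaled_dot_product_attention above:

import torch

def causal_mask(seq_len):
    # 1 where attention is allowed (key position <= query position), 0 where it is blocked
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.int))

print(causal_mask(4))
# tensor([[1, 0, 0, 0],
#         [1, 1, 0, 0],
#         [1, 1, 1, 0],
#         [1, 1, 1, 1]])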

Positional encoding

Because the Transformer has no recurrent structure, positional information must be injected explicitly:

import numpy as np

def positional_encoding(seq_len, d_model):
    position = np.arange(seq_len)[:, np.newaxis]
    div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
    pos_encoding = np.zeros((seq_len, d_model))
    pos_encoding[:, 0::2] = np.sin(position * div_term)   # even dimensions: sine
    pos_encoding[:, 1::2] = np.cos(position * div_term)   # odd dimensions: cosine
    return pos_encoding

Advantages of the Transformer

  1. Parallel computation: all positions can be processed simultaneously
  2. Long-range dependencies: relationships between any two positions are modeled directly
  3. Interpretability: attention weights offer a basis for visualization
  4. Transfer learning: pretrained models adapt to many downstream tasks

Applications in Time Series Forecasting {#时间序列}

Classical time series forecasting

1. Problem definition
  • Univariate forecasting: predict future values of a single series
  • Multivariate forecasting: predict several related series simultaneously
  • Multi-step forecasting: predict several future time points
2. Data preprocessing

class TimeSeriesPreprocessor:
    def __init__(self, window_size, horizon):
        self.window_size = window_size
        self.horizon = horizon

    def create_sequences(self, data):
        # Slide a window over the series to build (input, target) pairs
        X, y = [], []
        for i in range(len(data) - self.window_size - self.horizon + 1):
            X.append(data[i:i + self.window_size])
            y.append(data[i + self.window_size:i + self.window_size + self.horizon])
        return np.array(X), np.array(y)

    def normalize(self, data):
        self.mean = np.mean(data)
        self.std = np.std(data)
        return (data - self.mean) / self.std

    def denormalize(self, data):
        return data * self.std + self.mean
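A quick usage sketch of the preprocessor on a toy series; the numbers are arbitrary:

import numpy as np

prep = TimeSeriesPreprocessor(window_size=4, horizon=2)
data = prep.normalize(np.arange(20, dtype=float))  # toy upward-trending series
X, y = prep.create_sequences(data)
print(X.shape, y.shape)  # (15, 4) input windows, (15, 2) forecast targets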

RNN/LSTM time series models

1. Single-step LSTM forecaster

class LSTMForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: [batch, seq_len, input_size]
        lstm_out, (h_n, c_n) = self.lstm(x)
        # Use the output at the last time step
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions

2. Seq2Seq model

class Seq2SeqForecaster(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, horizon):
        super().__init__()
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(output_size, hidden_size, batch_first=True)
        self.output_layer = nn.Linear(hidden_size, output_size)
        self.horizon = horizon

    def forward(self, x):
        # Encode the historical sequence
        _, (h_n, c_n) = self.encoder(x)
        # Decode the forecast one step at a time
        decoder_input = torch.zeros(x.size(0), 1, self.output_layer.out_features,
                                    device=x.device)
        predictions = []
        for _ in range(self.horizon):
            output, (h_n, c_n) = self.decoder(decoder_input, (h_n, c_n))
            prediction = self.output_layer(output)
            predictions.append(prediction)
            decoder_input = prediction
        return torch.cat(predictions, dim=1)

Transformer time series models

1. Temporal Fusion Transformer (TFT)

class TemporalFusionTransformer:
    """Google's Transformer designed for time series forecasting
    (high-level sketch; lstm_encoder and output_layer are omitted from __init__)"""
    def __init__(self, config):
        # Variable selection network
        self.vsn = VariableSelectionNetwork(config)
        # Gated residual network
        self.grn = GatedResidualNetwork(config)
        # Multi-head attention
        self.attention = InterpretableMultiHeadAttention(config)
        # Positional encoding
        self.positional_encoding = PositionalEncoding(config)
        # Quantile loss
        self.quantile_loss = QuantileLoss(config.quantiles)

    def forward(self, x_static, x_historical, x_future):
        # 1. Variable selection
        selected_historical = self.vsn(x_historical)
        selected_future = self.vsn(x_future)
        # 2. Static feature encoding
        static_encoding = self.grn(x_static)
        # 3. LSTM encoding of the history
        historical_features = self.lstm_encoder(selected_historical)
        # 4. Self-attention
        temporal_features = self.attention(historical_features,
                                           static_context=static_encoding)
        # 5. Forecast
        predictions = self.output_layer(temporal_features)
        return predictions

2. Autoformer

class Autoformer:
    """Transformer variant built on an auto-correlation mechanism"""
    def __init__(self, config):
        # Series decomposition
        self.decomposition = SeriesDecomposition(config.kernel_size)
        # Auto-correlation mechanism
        self.auto_correlation = AutoCorrelation(factor=config.factor,
                                                attention_dropout=config.dropout)
        # Encoder
        self.encoder = AutoformerEncoder(config)
        # Decoder
        self.decoder = AutoformerDecoder(config)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        # 1. Decompose the input series
        enc_seasonal, enc_trend = self.decomposition(x_enc)
        # 2. Encode
        enc_out = self.encoder(enc_seasonal, x_mark_enc)
        # 3. Decode into forecasts
        seasonal_output, trend_output = self.decoder(x_dec, x_mark_dec, enc_out, enc_trend)
        # 4. Combine the components
        predictions = seasonal_output + trend_output
        return predictions

Key techniques for time series forecasting

1. Feature engineering

def create_time_features(df, date_column):
    """Extract calendar features from a datetime column"""
    df['hour'] = df[date_column].dt.hour
    df['dayofweek'] = df[date_column].dt.dayofweek
    df['month'] = df[date_column].dt.month
    df['dayofyear'] = df[date_column].dt.dayofyear
    df['weekofyear'] = df[date_column].dt.isocalendar().week
    # Cyclical encoding
    df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
    return df
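A quick check on a toy hourly index (pandas assumed; the column name 'timestamp' is arbitrary):

import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': pd.date_range('2024-01-01', periods=48, freq='h')})
df = create_time_features(df, 'timestamp')
# Hour 23 and hour 0 end up close together in (sin, cos) space
print(df[['hour', 'hour_sin', 'hour_cos']].head())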
2. Handling multi-scale patterns

class MultiScaleBlock(nn.Module):
    def __init__(self, in_channels, out_channels, scales=[1, 4, 8]):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList([
            nn.Conv1d(in_channels, out_channels, kernel_size=s, stride=s)
            for s in scales
        ])

    def forward(self, x):
        multi_scale_features = []
        for scale, conv in zip(self.scales, self.convs):
            features = conv(x)
            # Upsample back to the original resolution
            features = F.interpolate(features, size=x.size(-1))
            multi_scale_features.append(features)
        return torch.cat(multi_scale_features, dim=1)

Applications in Computer Vision {#视觉}

Challenges of visual sequence modeling

  1. Spatio-temporal modeling: handling the spatial and temporal dimensions together
  2. Computational cost: video data is enormous
  3. Long-range dependencies: an action may span many frames

RNN/LSTM applications in vision

1. Image captioning

class ImageCaptioningModel(nn.Module):
    def __init__(self, encoder, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        # CNN encoder
        self.encoder = encoder  # e.g. a ResNet
        self.encoder_fc = nn.Linear(encoder.output_dim, hidden_dim)
        # LSTM decoder
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.output_layer = nn.Linear(hidden_dim, vocab_size)
        # Attention mechanism
        self.attention = AdditiveAttention(hidden_dim)

    def forward(self, images, captions=None):
        # 1. Encode the image
        features = self.encoder(images)  # [batch, features_dim, H, W]
        features = features.view(features.size(0), features.size(1), -1)
        features = features.permute(0, 2, 1)  # [batch, H*W, features_dim]
        # 2. Initialize the LSTM state
        h_0 = self.encoder_fc(features.mean(dim=1))  # global feature
        c_0 = torch.zeros_like(h_0)
        if self.training and captions is not None:
            # Teacher forcing during training
            embedded = self.embedding(captions)
            outputs = []
            h_t, c_t = h_0.unsqueeze(0), c_0.unsqueeze(0)
            for t in range(embedded.size(1)):
                # Attention over the spatial features
                context, _ = self.attention(h_t.squeeze(0), features)
                # One LSTM step
                input_t = torch.cat([embedded[:, t], context], dim=1)
                output, (h_t, c_t) = self.lstm(input_t.unsqueeze(1), (h_t, c_t))
                # Predict the next word
                prediction = self.output_layer(output.squeeze(1))
                outputs.append(prediction)
            return torch.stack(outputs, dim=1)
        else:
            # Autoregressive generation at inference time
            return self.generate(features, h_0, c_0)
2. Video understanding

class VideoUnderstandingModel(nn.Module):
    def __init__(self, feature_extractor, hidden_dim, num_classes):
        super().__init__()
        # 3D-CNN or 2D-CNN feature extractor
        self.feature_extractor = feature_extractor
        # Bidirectional LSTM
        self.bilstm = nn.LSTM(feature_extractor.output_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        # Temporal attention
        self.temporal_attention = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1)
        )
        # Classifier
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, video_frames):
        # 1. Extract per-frame features
        batch_size, num_frames = video_frames.shape[:2]
        frame_features = []
        for t in range(num_frames):
            features = self.feature_extractor(video_frames[:, t])
            frame_features.append(features)
        frame_features = torch.stack(frame_features, dim=1)
        # 2. Temporal modeling
        lstm_out, _ = self.bilstm(frame_features)
        # 3. Temporal attention pooling
        attention_weights = self.temporal_attention(lstm_out)
        attention_weights = F.softmax(attention_weights, dim=1)
        # Weighted average over time
        video_representation = (lstm_out * attention_weights).sum(dim=1)
        # 4. Classify
        output = self.classifier(video_representation)
        return output

Vision Transformer (ViT)

1. The basic ViT architecture

class VisionTransformer(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_channels=3,
                 embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0,
                 num_classes=1000):
        super().__init__()
        # Patch embedding
        self.patch_embed = PatchEmbedding(img_size, patch_size, in_channels, embed_dim)
        num_patches = (img_size // patch_size) ** 2
        # Positional encoding
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # CLS token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Transformer encoder
        self.transformer = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, mlp_ratio)
            for _ in range(depth)
        ])
        # Classification head
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # 1. Patch embedding
        x = self.patch_embed(x)  # [B, num_patches, embed_dim]
        # 2. Prepend the CLS token
        cls_tokens = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat((cls_tokens, x), dim=1)
        # 3. Add the positional encoding
        x = x + self.pos_embed
        # 4. Transformer encoder
        for block in self.transformer:
            x = block(x)
        # 5. Take the CLS token for classification
        x = self.norm(x)
        cls_token_final = x[:, 0]
        # 6. Classify
        output = self.head(cls_token_final)
        return output

2. The PatchEmbedding implementation

class PatchEmbedding(nn.Module):
    def __init__(self, img_size, patch_size, in_channels, embed_dim):
        super().__init__()
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution implements the patch embedding
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: [B, C, H, W]
        x = self.proj(x)       # [B, embed_dim, H/P, W/P]
        x = x.flatten(2)       # [B, embed_dim, num_patches]
        x = x.transpose(1, 2)  # [B, num_patches, embed_dim]
        return x
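A quick shape check of the patch embedding on a toy batch (224 / 16 = 14 patches per side):

import torch

patch_embed = PatchEmbedding(img_size=224, patch_size=16, in_channels=3, embed_dim=768)
x = torch.randn(2, 3, 224, 224)
tokens = patch_embed(x)
print(tokens.shape)  # torch.Size([2, 196, 768]): a 14x14 grid of patches as a token sequence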

Improvements on the Vision Transformer

1. Swin Transformer

class SwinTransformer(nn.Module):
    """A hierarchical Vision Transformer with shifted windows"""
    def __init__(self, img_size, patch_size, embed_dim, depths, num_heads):
        super().__init__()
        # Patch partition
        self.patch_partition = PatchPartition(patch_size)
        # Several stages
        self.stages = nn.ModuleList()
        for i, (depth, num_head) in enumerate(zip(depths, num_heads)):
            stage = nn.ModuleList([
                SwinTransformerBlock(
                    dim=embed_dim * (2 ** i),
                    num_heads=num_head,
                    window_size=7,
                    shift_size=0 if j % 2 == 0 else 3)
                for j in range(depth)
            ])
            self.stages.append(stage)
            # Patch merging (after every stage except the last)
            if i < len(depths) - 1:
                self.stages.append(PatchMerging(embed_dim * (2 ** i)))

    def forward(self, x):
        x = self.patch_partition(x)
        for stage in self.stages:
            if isinstance(stage, nn.ModuleList):
                for block in stage:
                    x = block(x)
            else:
                x = stage(x)  # patch merging
        return x

2. DETR (Detection Transformer)

class DETR(nn.Module):
    """Transformer for object detection"""
    def __init__(self, backbone, transformer, num_classes, hidden_dim, num_queries=100):
        super().__init__()
        self.backbone = backbone
        self.conv = nn.Conv2d(backbone.num_channels, hidden_dim, 1)
        self.transformer = transformer
        # Object queries
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        # Prediction heads
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)  # +1 for "no object"
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)

    def forward(self, images):
        # 1. CNN backbone features
        features = self.backbone(images)
        # 2. Project to the transformer dimension
        h = self.conv(features)
        # 3. Add positional encoding
        pos_embed = self.positional_encoding(h)
        # 4. Transformer
        hs = self.transformer(self.flatten(h),
                              query_embed=self.query_embed.weight,
                              pos_embed=self.flatten(pos_embed))
        # 5. Predict classes and bounding boxes
        outputs_class = self.class_embed(hs)
        outputs_coord = self.bbox_embed(hs).sigmoid()
        return {'pred_logits': outputs_class, 'pred_boxes': outputs_coord}

Applications in Large Language Models {#语言模型}

The evolution of language models

1. From N-grams to neural language models
  • N-gram models: statistics-based methods
  • Word vectors: Word2Vec, GloVe
  • RNN language models: handling variable-length sequences
  • Transformer language models: parallelism and long-range dependencies

The GPT family (generative pretraining)

1. The GPT architecture

class GPT(nn.Module):
    def __init__(self, vocab_size, n_layer, n_head, n_embd, block_size):
        super().__init__()
        # Token and position embeddings
        self.token_embedding = nn.Embedding(vocab_size, n_embd)
        self.position_embedding = nn.Embedding(block_size, n_embd)
        # Transformer blocks
        self.blocks = nn.ModuleList([
            TransformerBlock(n_embd, n_head) for _ in range(n_layer)
        ])
        # Final layer norm and output projection
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.block_size = block_size

    def forward(self, idx, targets=None):
        B, T = idx.shape
        # Token and position embeddings
        tok_emb = self.token_embedding(idx)
        pos_emb = self.position_embedding(torch.arange(T, device=idx.device))
        x = tok_emb + pos_emb
        # Transformer blocks
        for block in self.blocks:
            x = block(x)
        # Final projection
        x = self.ln_f(x)
        logits = self.lm_head(x)
        # Loss (if targets are given)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                                   targets.view(-1))
        return logits, loss

    def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
        """Autoregressive generation"""
        for _ in range(max_new_tokens):
            # Crop the context to the block size
            idx_cond = idx if idx.size(1) <= self.block_size else idx[:, -self.block_size:]
            # Forward pass
            logits, _ = self(idx_cond)
            logits = logits[:, -1, :] / temperature
            # Optional top-k sampling
            if top_k is not None:
                v, _ = torch.topk(logits, top_k)
                logits[logits < v[:, [-1]]] = -float('Inf')
            # Softmax and sample
            probs = F.softmax(logits, dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
            # Append the new token
            idx = torch.cat((idx, idx_next), dim=1)
        return idx
2. GPT training techniques

class GPTTrainer:
    def __init__(self, model, train_dataset, config):
        self.model = model
        self.train_dataset = train_dataset
        self.config = config
        # Optimizer
        self.optimizer = self.configure_optimizers()
        # Learning-rate schedule
        self.scheduler = CosineAnnealingLR(self.optimizer, T_max=config.max_iters)

    def configure_optimizers(self):
        """Configure AdamW with selective weight decay"""
        decay = set()
        no_decay = set()
        for name, param in self.model.named_parameters():
            # Biases, layer norms, and embeddings are exempt from weight decay
            if 'bias' in name or 'ln' in name or 'embedding' in name:
                no_decay.add(name)
            else:
                decay.add(name)
        param_groups = [
            {'params': [p for n, p in self.model.named_parameters() if n in decay],
             'weight_decay': self.config.weight_decay},
            {'params': [p for n, p in self.model.named_parameters() if n in no_decay],
             'weight_decay': 0.0}
        ]
        optimizer = torch.optim.AdamW(param_groups,
                                      lr=self.config.learning_rate,
                                      betas=(0.9, 0.95))
        return optimizer

The BERT family (bidirectional encoders)

1. BERT pretraining

class BERT(nn.Module):
    def __init__(self, vocab_size, hidden_size, num_layers, num_heads, max_len):
        super().__init__()
        # Embedding layers
        self.token_embedding = nn.Embedding(vocab_size, hidden_size)
        self.position_embedding = nn.Embedding(max_len, hidden_size)
        self.segment_embedding = nn.Embedding(2, hidden_size)
        # Transformer encoder
        self.encoder = nn.ModuleList([
            TransformerEncoderLayer(hidden_size, num_heads)
            for _ in range(num_layers)
        ])
        # Pretraining task heads
        self.mlm_head = nn.Linear(hidden_size, vocab_size)  # masked language modeling
        self.nsp_head = nn.Linear(hidden_size, 2)           # next sentence prediction

    def forward(self, input_ids, segment_ids, attention_mask, mlm_labels=None, nsp_labels=None):
        # Embeddings
        seq_len = input_ids.size(1)
        pos_ids = torch.arange(seq_len, device=input_ids.device)
        embeddings = (self.token_embedding(input_ids) +
                      self.position_embedding(pos_ids) +
                      self.segment_embedding(segment_ids))
        # Encode
        hidden_states = embeddings
        for encoder_layer in self.encoder:
            hidden_states = encoder_layer(hidden_states, attention_mask)
        # MLM predictions
        mlm_logits = self.mlm_head(hidden_states)
        # NSP prediction (from the CLS token)
        nsp_logits = self.nsp_head(hidden_states[:, 0])
        # Losses
        total_loss = 0
        if mlm_labels is not None:
            mlm_loss = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                                       mlm_labels.view(-1),
                                       ignore_index=-100)
            total_loss += mlm_loss
        if nsp_labels is not None:
            nsp_loss = F.cross_entropy(nsp_logits, nsp_labels)
            total_loss += nsp_loss
        return {'loss': total_loss, 'mlm_logits': mlm_logits, 'nsp_logits': nsp_logits}

2. BERT fine-tuning

class BERTForSequenceClassification(nn.Module):
    def __init__(self, bert_model, num_classes, dropout=0.1):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(bert_model.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, labels=None):
        # BERT encoding
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        # Use the CLS token representation
        pooled_output = outputs.last_hidden_state[:, 0]
        pooled_output = self.dropout(pooled_output)
        # Classify
        logits = self.classifier(pooled_output)
        # Loss
        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits, labels)
        return {'loss': loss, 'logits': logits}

T5 (Text-to-Text Transfer Transformer)

class T5Model(nn.Module):
    """A unified text-to-text framework"""
    def __init__(self, config):
        super().__init__()
        # Shared embedding
        self.shared = nn.Embedding(config.vocab_size, config.d_model)
        # Encoder
        self.encoder = T5Stack(config, embed_tokens=self.shared)
        # Decoder
        self.decoder = T5Stack(config, embed_tokens=self.shared, is_decoder=True)
        # Language modeling head
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

    def forward(self, input_ids, decoder_input_ids, labels=None):
        # Encode
        encoder_outputs = self.encoder(input_ids)
        # Decode
        decoder_outputs = self.decoder(decoder_input_ids,
                                       encoder_hidden_states=encoder_outputs.last_hidden_state)
        # Predict
        lm_logits = self.lm_head(decoder_outputs.last_hidden_state)
        # Loss
        loss = None
        if labels is not None:
            loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                                   labels.view(-1),
                                   ignore_index=-100)
        return {'loss': loss, 'logits': lm_logits}

Key techniques in modern LLMs

1. Efficient attention mechanisms

class FlashAttention(nn.Module):
    """Flash Attention: memory-efficient exact attention (simplified block-wise sketch)"""
    def forward(self, q, k, v, causal=False):
        B, H, N, D = q.shape
        BLOCK_SIZE = min(64, N)
        scale = D ** -0.5
        O = torch.zeros_like(q)
        for i in range(0, N, BLOCK_SIZE):
            q_block = q[:, :, i:i + BLOCK_SIZE] * scale
            # Running statistics for the online softmax
            o = torch.zeros_like(q_block)
            m = q_block.new_full((B, H, q_block.shape[2], 1), float('-inf'))
            l = torch.zeros((B, H, q_block.shape[2], 1), device=q.device)
            for j in range(0, N, BLOCK_SIZE):
                # Skip key blocks that lie entirely in the future
                if causal and j > i + BLOCK_SIZE - 1:
                    break
                k_block = k[:, :, j:j + BLOCK_SIZE]
                v_block = v[:, :, j:j + BLOCK_SIZE]
                scores = q_block @ k_block.transpose(-2, -1)
                if causal:
                    # Mask keys that come after their query position
                    q_idx = torch.arange(i, i + q_block.shape[2], device=q.device)
                    k_idx = torch.arange(j, j + k_block.shape[2], device=q.device)
                    scores = scores.masked_fill(k_idx[None, :] > q_idx[:, None], float('-inf'))
                # Online softmax update: rescale the previous accumulators
                m_new = torch.maximum(m, scores.amax(dim=-1, keepdim=True))
                alpha = torch.exp(m - m_new)
                p = torch.exp(scores - m_new)
                l = l * alpha + p.sum(dim=-1, keepdim=True)
                o = o * alpha + p @ v_block
                m = m_new
            O[:, :, i:i + BLOCK_SIZE] = o / l
        return O
2. Parameter-efficient fine-tuning (PEFT)

class LoRALayer(nn.Module):
    """Low-Rank Adaptation for efficient fine-tuning"""
    def __init__(self, in_features, out_features, rank=16, alpha=32):
        super().__init__()
        self.rank = rank
        self.alpha = alpha
        self.scaling = alpha / rank
        # Frozen pretrained weight
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.weight.requires_grad = False
        # LoRA parameters
        self.lora_A = nn.Parameter(torch.randn(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        # Initialization
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        # Original forward pass
        result = F.linear(x, self.weight)
        # Add the low-rank update
        result += (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return result
3. Long-context handling

class LongContextTransformer(nn.Module):
    """Techniques for handling very long contexts"""
    def __init__(self, config):
        super().__init__()
        # RoPE positional encoding
        self.rotary_embedding = RotaryEmbedding(config.hidden_size)
        # Sliding-window attention (must be set before the pattern is built)
        self.window_size = config.window_size
        # Sparse attention pattern
        self.attention_pattern = self.create_sparse_pattern(config.max_length)

    def create_sparse_pattern(self, seq_len):
        """Build a sparse attention pattern"""
        pattern = torch.zeros(seq_len, seq_len)
        # Local window around each position
        for i in range(seq_len):
            start = max(0, i - self.window_size // 2)
            end = min(seq_len, i + self.window_size // 2)
            pattern[i, start:end] = 1
        # Global tokens at regular intervals
        stride = seq_len // 8
        pattern[::stride, :] = 1
        pattern[:, ::stride] = 1
        return pattern.bool()

Practice and Optimization {#实战}

Best practices for model training

1. Mixed-precision training

from torch.cuda.amp import autocast, GradScaler

class MixedPrecisionTrainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.scaler = GradScaler()

    def train_step(self, batch):
        self.optimizer.zero_grad()
        # Automatic mixed precision
        with autocast():
            outputs = self.model(**batch)
            loss = outputs['loss']
        # Scaled backward pass
        self.scaler.scale(loss).backward()
        # Gradient clipping
        self.scaler.unscale_(self.optimizer)
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
        # Optimizer step
        self.scaler.step(self.optimizer)
        self.scaler.update()
        return loss.item()

2. Distributed training

class DistributedTrainer:
    def __init__(self, model, rank, world_size):
        # Initialize the process group
        dist.init_process_group(backend='nccl', rank=rank, world_size=world_size)
        # Wrap the model for distributed data parallelism
        self.model = nn.parallel.DistributedDataParallel(
            model.cuda(rank),
            device_ids=[rank],
            output_device=rank,
            find_unused_parameters=False)
        # Shard the data across processes
        self.train_sampler = DistributedSampler(
            train_dataset,
            num_replicas=world_size,
            rank=rank)

    def train_epoch(self):
        self.train_sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for batch in DataLoader(train_dataset, sampler=self.train_sampler):
            loss = self.train_step(batch)
        # Synchronize all processes
        dist.barrier()

Inference optimization

1. Model quantization

class QuantizedModel:
    @staticmethod
    def quantize_model(model, calibration_data):
        """Post-training static INT8 quantization"""
        model.eval()
        # Prepare for quantization
        model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
        torch.quantization.prepare(model, inplace=True)
        # Calibrate on representative data
        with torch.no_grad():
            for batch in calibration_data:
                model(batch)
        # Convert to the quantized model
        torch.quantization.convert(model, inplace=True)
        return model
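When no calibration data is available, PyTorch's dynamic quantization is a lighter-weight alternative; a minimal sketch on a toy model:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
# Linear weights are stored as INT8; activations are quantized on the fly at inference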
2. KV-cache optimization

class OptimizedDecoder:
    def __init__(self, model, max_cache_size=1024):
        self.model = model
        self.kv_cache = {}
        self.max_cache_size = max_cache_size

    def generate_with_cache(self, input_ids, max_length):
        outputs = []
        for i in range(max_length):
            if i > 0:
                # Only compute K/V for the newly generated token
                new_token_id = input_ids[:, -1:]
                key, value = self.compute_kv(new_token_id, i)
                # Update the cache
                self.kv_cache[i] = (key, value)
            else:
                # Initial pass: compute K/V for the whole prompt
                keys, values = self.compute_all_kv(input_ids)
                self.kv_cache = {j: (keys[:, :, j], values[:, :, j])
                                 for j in range(input_ids.size(1))}
            # Attend over the cached K/V
            output = self.attention_with_cache(input_ids[:, -1:])
            outputs.append(output)
            # Sample the next token
            next_token = self.sample(output)
            input_ids = torch.cat([input_ids, next_token], dim=1)
            # Cache management
            if len(self.kv_cache) > self.max_cache_size:
                self.evict_cache()
        return torch.cat(outputs, dim=1)

Evaluation metrics

1. Language model evaluation

import math
from nltk.translate.bleu_score import corpus_bleu

def calculate_perplexity(model, eval_dataloader):
    """Compute perplexity"""
    model.eval()
    total_loss = 0
    total_tokens = 0
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(**batch)
            loss = outputs['loss']
            total_loss += loss.item() * batch['labels'].numel()
            total_tokens += batch['labels'].numel()
    avg_loss = total_loss / total_tokens
    perplexity = math.exp(avg_loss)
    return perplexity

def calculate_bleu(predictions, references):
    """Compute BLEU scores"""
    # Tokenize
    pred_tokens = [pred.split() for pred in predictions]
    ref_tokens = [[ref.split()] for ref in references]
    # BLEU at several n-gram orders
    bleu_1 = corpus_bleu(ref_tokens, pred_tokens, weights=(1, 0, 0, 0))
    bleu_2 = corpus_bleu(ref_tokens, pred_tokens, weights=(0.5, 0.5, 0, 0))
    bleu_4 = corpus_bleu(ref_tokens, pred_tokens, weights=(0.25, 0.25, 0.25, 0.25))
    return {'bleu_1': bleu_1, 'bleu_2': bleu_2, 'bleu_4': bleu_4}

2. Time series evaluation

def time_series_metrics(predictions, targets):
    """Evaluation metrics for time series forecasting"""
    # MAE
    mae = torch.mean(torch.abs(predictions - targets))
    # MSE
    mse = torch.mean((predictions - targets) ** 2)
    # RMSE
    rmse = torch.sqrt(mse)
    # MAPE (undefined where the target is zero)
    mape = torch.mean(torch.abs((targets - predictions) / targets)) * 100
    # SMAPE
    smape = 200 * torch.mean(torch.abs(predictions - targets) /
                             (torch.abs(predictions) + torch.abs(targets)))
    return {
        'mae': mae.item(),
        'mse': mse.item(),
        'rmse': rmse.item(),
        'mape': mape.item(),
        'smape': smape.item()
    }

Frontier Developments {#前沿}

Recent research directions

1. Mamba: state space models

class MambaBlock(nn.Module):
    """Sequence modeling with linear complexity"""
    def __init__(self, d_model, d_state=16, d_conv=4, expand=2):
        super().__init__()
        self.d_model = d_model
        self.d_state = d_state
        self.d_conv = d_conv
        self.expand = expand
        d_inner = int(self.expand * self.d_model)
        # Input projection
        self.in_proj = nn.Linear(d_model, d_inner * 2)
        # Depthwise convolution
        self.conv1d = nn.Conv1d(d_inner, d_inner,
                                kernel_size=d_conv,
                                groups=d_inner,
                                padding=d_conv - 1)
        # SSM parameters
        self.x_proj = nn.Linear(d_inner, d_state + d_state + 1)
        self.dt_proj = nn.Linear(d_state, d_inner)
        # Output projection
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, x):
        """Selective state space model"""
        # Simplified here; the real Mamba involves an elaborate
        # selective state-space computation
        return self.ssm(x)
2. RWKV: a linear Transformer

class RWKV(nn.Module):
    """Receptance Weighted Key Value: an RNN-style Transformer with linear complexity"""
    def __init__(self, n_embd, n_layer):
        super().__init__()
        self.blocks = nn.ModuleList([RWKVBlock(n_embd) for _ in range(n_layer)])

    def forward(self, x, state=None):
        for block in self.blocks:
            x, state = block(x, state)
        return x, state
3. Mixture of Experts (MoE)

class MoELayer(nn.Module):
    """A sparsely activated mixture-of-experts layer"""
    def __init__(self, d_model, n_experts, n_experts_per_token=2):
        super().__init__()
        self.experts = nn.ModuleList([FeedForward(d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.n_experts_per_token = n_experts_per_token

    def forward(self, x):
        # x: [num_tokens, d_model]; compute routing scores
        gate_logits = self.gate(x)
        # Top-k routing
        weights, selected_experts = torch.topk(gate_logits, self.n_experts_per_token)
        weights = F.softmax(weights, dim=-1)
        # Sparse computation: each expert only processes its routed tokens
        results = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            slot_mask = selected_experts == i          # [num_tokens, k]
            token_mask = slot_mask.any(dim=-1)
            if token_mask.any():
                expert_output = expert(x[token_mask])
                # Routing weight of expert i for each selected token
                expert_weight = (weights * slot_mask).sum(dim=-1)[token_mask]
                results[token_mask] += expert_output * expert_weight.unsqueeze(-1)
        return results

Multimodal models

1. CLIP-style vision-language models

class VisionLanguageModel(nn.Module):
    """A contrastively trained multimodal model"""
    def __init__(self, vision_encoder, text_encoder, projection_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.text_encoder = text_encoder
        # Projection heads
        self.vision_projection = nn.Linear(vision_encoder.output_dim, projection_dim)
        self.text_projection = nn.Linear(text_encoder.output_dim, projection_dim)
        # Temperature parameter
        self.temperature = nn.Parameter(torch.ones(1) * 0.07)

    def forward(self, images, texts):
        # Encode each modality
        image_features = self.vision_encoder(images)
        text_features = self.text_encoder(texts)
        # Project and normalize
        image_embeds = F.normalize(self.vision_projection(image_features), dim=-1)
        text_embeds = F.normalize(self.text_projection(text_features), dim=-1)
        # Similarities
        logits_per_image = image_embeds @ text_embeds.T / self.temperature
        logits_per_text = text_embeds @ image_embeds.T / self.temperature
        return logits_per_image, logits_per_text

2. A unified multimodal Transformer

class UnifiedMultiModalTransformer(nn.Module):
    """A single architecture that handles several modalities"""
    def __init__(self, config):
        super().__init__()
        # Modality-specific encoders
        self.text_embedder = TextEmbedder(config)
        self.image_embedder = ImageEmbedder(config)
        self.audio_embedder = AudioEmbedder(config)
        # Shared Transformer
        self.transformer = nn.ModuleList([TransformerBlock(config) for _ in range(config.n_layers)])
        # Modality-specific decoders
        self.text_head = TextGenerationHead(config)
        self.image_head = ImageGenerationHead(config)

    def forward(self, inputs, modality_mask):
        # Embed each available modality
        embeddings = []
        if 'text' in inputs:
            embeddings.append(self.text_embedder(inputs['text']))
        if 'image' in inputs:
            embeddings.append(self.image_embedder(inputs['image']))
        if 'audio' in inputs:
            embeddings.append(self.audio_embedder(inputs['audio']))
        # Concatenate all modalities into one token sequence
        x = torch.cat(embeddings, dim=1)
        # Shared Transformer
        for block in self.transformer:
            x = block(x, modality_mask)
        return x

Practical tools and frameworks

1. Hugging Face Transformers

from transformers import AutoModel, AutoTokenizer

# Load a pretrained model
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Use it
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
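For common tasks, the higher-level pipeline API is even shorter; a minimal sketch (the default model is downloaded on first use):

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make sequence modeling easy!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]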
2. A custom training loop

class CustomTrainer:
    def __init__(self, model, train_dataloader, eval_dataloader, config):
        self.model = model
        self.train_dataloader = train_dataloader
        self.eval_dataloader = eval_dataloader
        self.config = config
        # Optimizer and scheduler
        self.optimizer = AdamW(model.parameters(), lr=config.learning_rate)
        self.scheduler = get_linear_schedule_with_warmup(
            self.optimizer,
            num_warmup_steps=config.warmup_steps,
            num_training_steps=config.total_steps)
        # Mixed precision
        self.scaler = GradScaler() if config.fp16 else None
        # Logging
        self.writer = SummaryWriter(config.log_dir)

    def train(self):
        global_step = 0
        for epoch in range(self.config.num_epochs):
            # Train
            self.model.train()
            for batch in tqdm(self.train_dataloader, desc=f"Epoch {epoch}"):
                loss = self.training_step(batch)
                # Log
                if global_step % self.config.log_interval == 0:
                    self.writer.add_scalar('train/loss', loss, global_step)
                # Evaluate
                if global_step % self.config.eval_interval == 0:
                    eval_metrics = self.evaluate()
                    for key, value in eval_metrics.items():
                        self.writer.add_scalar(f'eval/{key}', value, global_step)
                # Checkpoint
                if global_step % self.config.save_interval == 0:
                    self.save_checkpoint(global_step)
                global_step += 1

    def training_step(self, batch):
        self.optimizer.zero_grad()
        if self.scaler:
            with autocast():
                outputs = self.model(**batch)
                loss = outputs['loss']
            self.scaler.scale(loss).backward()
            self.scaler.unscale_(self.optimizer)
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
            self.scaler.step(self.optimizer)
            self.scaler.update()
        else:
            outputs = self.model(**batch)
            loss = outputs['loss']
            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
            self.optimizer.step()
        self.scheduler.step()
        return loss.item()

What this guide offers

1. A step-by-step structure

  • Starts from the basic concepts of sequence models
  • Works through the core principles of each architecture
  • Extends to frontier techniques and hands-on applications

2. Theory combined with practice

  • Each concept comes with formulas and an explanation of the underlying principle
  • Plenty of runnable Python/PyTorch code examples
  • Includes best practices from real projects

3. Full coverage of three application areas

  • Time series forecasting: from classical methods to the latest Autoformer
  • Computer vision: from CNN+RNN pipelines to the Vision Transformer
  • Large language models: from GPT and BERT to modern LLM techniques

4. A strong practical focus

  • Hands-on tips for model training, optimization, and deployment
  • Evaluation metrics and debugging methods
  • Usage of the mainstream frameworks and tools

5. Frontier content

  • Covers the newest architectures such as Mamba and RWKV
  • Discusses hot directions such as MoE and multimodality
  • Includes optimization techniques such as Flash Attention and LoRA

Summary

This guide has walked through deep-learning sequence models from RNN to Transformer, covering:

  1. Fundamentals: the core challenges of sequence modeling and their solutions
  2. Model architectures: detailed implementations of RNN, LSTM, and Transformer
  3. Application areas: time series forecasting, computer vision, natural language processing
  4. Practical skills: training optimization, inference acceleration, evaluation methods
  5. Frontier developments: the latest research directions and technical trends

Learning advice

For beginners:

  1. Master the basics of RNNs and backpropagation
  2. Understand the LSTM gating mechanism
  3. Study the Transformer architecture in depth
  4. Practice on simple sequence tasks

For advanced learners:

  1. Study the many attention mechanism variants
  2. Explore large-scale pretraining techniques
  3. Learn distributed training and optimization
  4. Follow the latest papers and implementations

Suggested hands-on projects:

  1. Implement a simple language model
  2. Build a time series forecasting system
  3. Develop an image-captioning application
  4. Fine-tune a pretrained model on a real problem

Recommended resources:

  • Papers: Attention Is All You Need, BERT, the GPT series
  • Courses: Stanford CS224N, Fast.ai
  • Frameworks: PyTorch, TensorFlow, JAX
  • Communities: Hugging Face, Papers with Code
