1. Nanonets-OCR

Introduction

Nanonets-OCR goes beyond plain text extraction: it intelligently parses complex structures in images, including formulas, tables, watermarks, signatures, charts, and checkboxes, and outputs cleanly formatted Markdown.

Core Features

LaTeX formula recognition: automatically converts mathematical formulas in the text to standard LaTeX

Intelligent image description: recognizes charts, QR codes, and similar content and generates structured descriptions

Signature detection and isolation: precisely locates signature content within documents

Watermark extraction: reliably detects and extracts watermark information from documents

Checkbox recognition: normalizes checkbox states into unified symbols for easier downstream processing

Complex table extraction: recognizes tables with nested structure and outputs them in Markdown/HTML

Training Process

The model was trained on 250,000 pages of image-text data spanning research, finance, healthcare, legal, invoice, receipt, and other domains, combining synthetic data with human annotation; the final model was fine-tuned from Qwen2.5-VL-3B.

Caveats:

● The fine-tuning data contains no handwritten samples, so handwriting recognition is not yet supported

● Some risk of hallucination remains (a limitation of the model's size)

2. Testing the Results

Online demo: https://huggingface.co/spaces/Souvik3333/Nanonets-ocr-s
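
If you would rather run the model locally, the sketch below shows one possible way, assuming nanonets/Nanonets-OCR-s exposes the standard Qwen2.5-VL-style image-text-to-text interface on Hugging Face; the prompt text and file name are placeholders, not an official recipe.

```python
# Minimal local-inference sketch. Assumes a recent transformers release with
# image-text-to-text support and that nanonets/Nanonets-OCR-s follows the
# usual Qwen2.5-VL chat interface; prompt and file name are placeholders.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nanonets/Nanonets-OCR-s"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("page.png")  # a scanned document page
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},  # placeholder; the actual image is passed below
        {"type": "text", "text": "Extract the text from this document as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, not the echoed prompt.
markdown = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(markdown)
```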

● Paper cover (with watermark)
[Image: paper cover with watermark]

Output Markdown:

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
Zhang Li¹, Yuliang Liu¹,†, Qiang Liu², Zhiyin Ma¹, Ziyang Zhang¹, Shuo Zhang¹, Zidun Guo¹, Jiarui Zhang², Xinyu Wang¹, Xiang Bai¹¹Huazhong University of Science and Technology, ²Kingsoft Office<img>
A bar chart comparing performance metrics across different datasets and models. The x-axis shows different document types (e.g., Formula (EN), Formula (ZH), Table (EN), Table (ZH), Exam paper, Academic Papers, Newspaper, Overall, Infer Speed) and their corresponding values. The y-axis shows the performance metric (e.g., accuracy, speed). The bars represent different models and their corresponding values.
</img>Figure 1: Performance comparison of MonkeyOCR and other SOTA models on OmniDocBench [33]. “Overall” represents the comprehensive evaluation across nine document types in OmniDocBench.Abstract
We introduce MonkeyOCR, a vision-language model for document parsing that advances the state of the art by leveraging a Structure-Recognition-Relation (SRR) triplet paradigm. This design simplifies what would otherwise be a complex multi-tool pipeline (as in MinerU’s modular approach) and avoids the inefficiencies of processing full pages with giant end-to-end models (e.g., large multimodal LLMs like Qwen-VL). In SRR, document parsing is abstracted into three fundamental questions – “Where is it?” (structure), “What is it?” (recognition), and “How is it organized?” (relation) – corresponding to layout analysis, content identification, and logical ordering. This focused decomposition balances accuracy and speed: it enables efficient, scalable processing without sacrificing precision. To train and evaluate this approach, we introduce the MonkeyDoc (the most comprehensive document parsing dataset to date), with 3.9 million instances spanning over ten document types in both Chinese and English. Experiments show that MonkeyOCR outperforms MinerU by an average of 5.1%, with particularly notable improvements on challenging content such as formulas (+15.0%) and tables (+8.6%). Remarkably, our 3B-parameter model surpasses much larger and top-performing models, including Qwen2.5-VL (72B) and Gemini 2.5 Pro, achieving state-of-the-art average performance on English document parsing tasks. In addition, MonkeyOCR processes multi-page documents significantly faster (0.84 pages per second compared to 0.65 for MinerU and 0.12 for Qwen2.5-VL-7B). The 3B model can be efficiently deployed for inference on a single NVIDIA 3090 GPU. Code and models will be released at https://github.com/Yuliang-Liu/MonkeyOCR.<watermark>arXiv:2506.05218v1 [cs.CV] 5 Jun 2025</watermark>Technical Report. †Project lead.
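
Note how the output wraps non-text content in tags such as `<img>...</img>` and `<watermark>...</watermark>`. These tags make post-processing straightforward. The following is a minimal sketch (the function name and tag handling are my own, based only on the tags visible in the sample above) of separating them from the body text:

```python
import re

def split_special_tags(markdown: str):
    """Separate <img> descriptions and <watermark> text from the body.

    Based only on the tags visible in the sample output above; actual
    Nanonets-OCR output may use additional tags (e.g., for signatures).
    """
    images = re.findall(r"<img>(.*?)</img>", markdown, flags=re.DOTALL)
    watermarks = re.findall(r"<watermark>(.*?)</watermark>", markdown, flags=re.DOTALL)
    # Strip both tag types from the body, leaving the plain Markdown text.
    body = re.sub(r"<(img|watermark)>.*?</\1>", "", markdown, flags=re.DOTALL)
    return body.strip(), images, watermarks
```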

● With tables and figures
[Image: paper page with a table and figures]

Output Markdown:

<table>
<thead>
<tr>
<th><strong>Model Type</strong></th>
<th><strong>Models</strong></th>
<th><strong>Book</strong></th>
<th><strong>Slides</strong></th>
<th><strong>Financial Report</strong></th>
<th><strong>Textbook</strong></th>
<th><strong>Exam Paper</strong></th>
<th><strong>Magazine</strong></th>
<th><strong>Academic Papers</strong></th>
<th><strong>Notes</strong></th>
<th><strong>Newspaper</strong></th>
<th><strong>Overall</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3"><strong>Pipeline Tools</strong></td>
<td>MinerU <sup>[43]</sup></td>
<td>0.055</td>
<td>0.124</td>
<td>0.033</td>
<td>0.102</td>
<td>0.159</td>
<td><strong>0.072</strong></td>
<td>0.025</td>
<td>0.984</td>
<td>0.171</td>
<td>0.206</td>
</tr>
<tr>
<td>Marker <sup>[35]</sup></td>
<td>0.074</td>
<td>0.340</td>
<td>0.089</td>
<td>0.319</td>
<td>0.452</td>
<td>0.153</td>
<td>0.059</td>
<td>0.651</td>
<td>0.192</td>
<td>0.274</td>
</tr>
<tr>
<td>Mathpix <sup>[26]</sup></td>
<td>0.131</td>
<td>0.220</td>
<td>0.202</td>
<td>0.216</td>
<td>0.278</td>
<td>0.147</td>
<td>0.091</td>
<td>0.634</td>
<td>0.690</td>
<td>0.300</td>
</tr>
<tr>
<td rowspan="3"><strong>Expert VLMs</strong></td>
<td>GOT-OCR <sup>[45]</sup></td>
<td>0.111</td>
<td>0.222</td>
<td>0.067</td>
<td>0.132</td>
<td>0.204</td>
<td>0.198</td>
<td>0.179</td>
<td>0.388</td>
<td>0.771</td>
<td>0.267</td>
</tr>
<tr>
<td>Nougat <sup>[3]</sup></td>
<td>0.734</td>
<td>0.958</td>
<td>1.000</td>
<td>0.820</td>
<td>0.930</td>
<td>0.830</td>
<td>0.214</td>
<td>0.991</td>
<td>0.871</td>
<td>0.806</td>
</tr>
<tr>
<td rowspan="3"><strong>General VLMs</strong></td>
<td>GPT4o <sup>[32]</sup></td>
<td>0.157</td>
<td>0.163</td>
<td>0.348</td>
<td>0.187</td>
<td>0.281</td>
<td>0.173</td>
<td>0.146</td>
<td>0.607</td>
<td>0.751</td>
<td>0.316</td>
</tr>
<tr>
<td>Qwen2.5-VL-7B <sup>[1]</sup></td>
<td>0.148</td>
<td><strong>0.053</strong></td>
<td>0.111</td>
<td>0.137</td>
<td>0.189</td>
<td>0.117</td>
<td>0.134</td>
<td>0.204</td>
<td>0.706</td>
<td>0.205</td>
</tr>
<tr>
<td>InternVL3-8B <sup>[5]</sup></td>
<td>0.163</td>
<td><strong>0.056</strong></td>
<td>0.107</td>
<td>0.109</td>
<td><strong>0.129</strong></td>
<td>0.100</td>
<td>0.159</td>
<td><strong>0.150</strong></td>
<td>0.681</td>
<td>0.188</td>
</tr>
<tr>
<td rowspan="2"><strong>Mix</strong></td>
<td><strong>MonkeyOCR-3B</strong></td>
<td><strong>0.046</strong></td>
<td>0.120</td>
<td><strong>0.024</strong></td>
<td><strong>0.100</strong></td>
<td><strong>0.129</strong></td>
<td><strong>0.086</strong></td>
<td><strong>0.024</strong></td>
<td>0.643</td>
<td><strong>0.131</strong></td>
<td><strong>0.155</strong></td>
</tr>
<tr>
<td><strong>MonkeyOCR-3B*</strong></td>
<td>0.054</td>
<td>0.203</td>
<td>0.038</td>
<td>0.112</td>
<td>0.138</td>
<td>0.032</td>
<td><strong>0.194</strong></td>
<td>0.136</td>
<td><strong>0.120</strong></td>
<td></td>
</tr>
</tbody>
</table>Table 3: The end-to-end text recognition performance on OmniDocBench across 9 PDF page types.
* represents the use of the layout model trained by us with improved capability for Chinese layout detection.Specifically, it attained the highest end-to-end recognition accuracy in six categories. The 3B model outperformed InternVL3-8B <sup>[5]</sup> by 5% and surpassed MinerU <sup>[43]</sup> by 3.3% in overall accuracy. Notably, on the newspaper category, MonkeyOCR outperformed the previous state-of-the-art MinerU by 4%, demonstrating its strong capability in parsing dense and complex layouts. These results highlight MonkeyOCR’s superior generalization ability and robustness across various document types. Moreover, benefiting from enhanced Chinese language capabilities, MonkeyOCR* outperforms the original version by 44.9% on the notes category, achieving state-of-the-art overall performance.5.3 Implement DetailsDuring the training process, we utilize the AdamW optimizer with a learning rate of 2e-5 and a cosine learning rate schedule. We employ a batch size of 64. Our 3B model was trained for 53 hours on 32 A800 GPUs. By integrating with LMDeploy <sup>[7]</sup>, our model can successfully run on RTX 3090 GPUs.<img>Bar chart comparing performance of MonkeyOCR-3B, Gemini2.0-flash, Gemini2.5-Pro, Qwen2-VL-72B, Qwen2.5-VL-72B, and InternVL2-76B across different document parsing tasks.</img>
**Figure 6:** **End-to-end evaluation on OmniDocBench.** Performance comparison of MonkeyOCR with closed-source and extra-large open-source VLMs across different document parsing tasks.6 DiscussionAs is well-established, increasing model scale generally leads to improved performance. To further explore the potential of MonkeyOCR, we conducted comparative evaluations against both larger open-source models and leading closed-source commercial solutions on OmniDocBench. As illustrated in Figure 6, MonkeyOCR achieves the highest overall performance on English documents, outperforming Qwen2.5-VL-72B by 7.4% and surpassing the current state-of-the-art closed-source model, Gemini 2.5-Pro, by 0.8%. However, Gemini 2.5-Pro demonstrates slightly better performance on Chinese documents, indicating there is still some margin for improvement in MonkeyOCR’s Chinese document parsing capabilities.

Even some of the bold formatting is recognized correctly.
[Image: recognition result]

● LaTeX formulas
[Image: paper page with formulas]

Output Markdown:

selecting high-scoring pages for inclusion. This results in a curated set of 951,000 high-quality samples.**Manual Annotation for Chinese Documents.** To address the limited availability of Chinese region-level reading order annotations, we manually annotate a diverse set of Chinese documents, including research reports, academic papers, user manuals, books, test papers, slides, official documents, newspapers, journals, and contracts. This effort produces an additional 154,000 high-quality samples, substantially enhancing the representation of Chinese document scenarios.**Expert Model-Based Auto-Annotation.** For datasets that provide only region-level bounding boxes without reading order information, we leverage expert models to generate region-level reading order annotations automatically. Specifically, we utilize PPOCR [17] for line-wise text recognition within each region, obtain text line positions, and then apply LayoutReader [44] to predict the reading order of these lines. The region-level order is determined by aggregating the predicted order of all text lines within each region. Through this approach, we generate 78,000 additional region-level annotations, further enriching the diversity and coverage of our dataset.## 4 MonkeyOCR<img>A diagram illustrating the overall architecture of MonkeyOCR. It shows a pipeline with four main stages: Diverse PDF Types, Structure, Recognition, and Relation. Each stage is represented by a box with an arrow pointing to the next stage. The Diverse PDF Types stage shows various types of documents like textbooks, exam papers, academic papers, books, newspapers, slides, notes, financial reports, and magazines. The Structure stage involves cropping the input image, followed by recognition and relation prediction. The Recognition stage extracts structured information from each region in parallel. The Relation stage determines the logical reading order of the detected elements. Finally, the output is serialized as HTML, JSON, or Markdown.</img>**Figure 5: The overall architecture of MonkeyOCR.** The system adopts a Structure-Recognition-Relation framework, consisting of structure detection, which locates and classifies semantic regions; block-level content recognition, which extracts structured information from each region in parallel; and relation prediction, which determines the logical reading order of the detected elements.The proposed method, **MonkeyOCR**, addresses the fundamental limitations of both pipeline-based and end-to-end document parsing approaches by introducing a modular yet globally optimized Structure-Recognition-Relation (SRR) framework. As illustrated in Figure 5, we decompose the document parsing process into three relatively independent but tightly integrated stages: *structure detection*, *block-level content recognition*, and *relation prediction*. This design aims to mitigate the cumulative error typically observed in pipeline toolchains, while also improving inference efficiency by reducing the context length compared to monolithic end-to-end models.In the first stage, a YOLO-based [49] document layout detector processes the input image $I \in \mathbb{R}^{H \times W \times 3}$, producing a set of bounding boxes $B = \{b_1, b_2, \ldots, b_n\}$ and their corresponding element types $T = \{t_1, t_2, \ldots, t_n\}$.
Each bounding box $b_i = (x_{1i}, y_{1i}, x_{2i}, y_{2i})$ represents the spatial coordinates of the $i$-th element, and the element type $t_i \in \{\text{text}, \text{table}, \text{formula}, \text{figure}, \ldots\}$ specifies the category of the detected element.For the second stage, we perform block-level content recognition in parallel. Each detected region $b_i$ is cropped and, together with a type-specific prompt $p_{t_i}$, is fed into our LMM for type-aware content extraction:$$C = \text{LMM}( \{I^1_{\text{crop}}, I^2_{\text{crop}}, \ldots, I^n_{\text{crop}}\}, \{p_{t_1}, p_{t_2}, \ldots, p_{t_n}\}),$$

Images containing LaTeX are no problem either; the formulas are preserved in fully correct LaTeX syntax.

● With watermarks
[Image: paper page with multiple watermarks]

Output Markdown:

selecting high-scoring pages for inclusion. This results in a curated set of 951,000 high-quality samples.**Manual Annotation for Chinese Documents.** To address the limited availability of Chinese region-level reading order annotations, we manually annotate a diverse set of Chinese documents, including research reports, academic papers, user manuals, books, test papers, slides, official documents, newspapers, journals, and contracts. This effort produces an additional 154,000 high-quality samples, substantially enhancing the representation of Chinese document scenarios.**Expert Model-Based Auto-Annotation.** For datasets that provide only region-level bounding boxes without reading order information, we leverage expert models to generate region-level reading order annotations automatically. Specifically, we utilize PPOCR [17] for line-wise text recognition within each region, obtain text line positions, and then apply LayoutReader [44] to predict the reading order of these lines. The region-level order is determined by aggregating the predicted order of all text lines within each region. Through this approach, we generate 78,000 additional region-level annotations, further enriching the diversity and coverage of our dataset.## 4 MonkeyOCR<img>A diagram illustrating the overall architecture of MonkeyOCR. It shows the process of converting diverse PDF types into structured data. The steps include structure detection, block-level content recognition, and relation prediction.</img>**Figure 5: The overall architecture of MonkeyOCR.** The system adopts a Structure-Recognition-Relation framework, consisting of structure detection, which locates and classifies semantic regions; block-level content recognition, which extracts structured information from each region in parallel; and relation prediction, which determines the logical reading order of the detected elements.The proposed method, **MonkeyOCR**, addresses the fundamental limitations of both pipeline-based and end-to-end document parsing approaches by introducing a modular yet globally optimized Structure-Recognition-Relation (SRR) framework. As illustrated in Figure 5, we decompose the document parsing process into three relatively independent but tightly integrated stages: *structure detection*, *block-level content recognition*, and *relation prediction*. This design aims to mitigate the cumulative error typically observed in pipeline toolchains, while also improving inference efficiency by reducing the context length compared to monolithic end-to-end models.In the first stage, a YOLO-based [49] document layout detector processes the input image $I \in \mathbb{R}^{H \times W \times 3}$, producing a set of bounding boxes $B = \{b_1, b_2, \ldots, b_n\}$ and their corresponding element types $T = \{t_1, t_2, \ldots, t_n\}$. Each bounding box $b_i = (x_{1i}, y_{1i}, x_{2i}, y_{2i})$ represents the spatial coordinates of the $i$-th element, and the element type $t_i \in \{\text{text}, \text{table}, \text{formula}, \text{figure}, \ldots\}$ specifies the category of the detected element.For the second stage, we perform block-level content recognition in parallel. Each detected region $b_i$ is cropped and, together with a type-specific prompt $p_{t_i}$, is fed into our LMM for type-aware content extraction:$$C = \text{LMM}(\{I^1_\text{crop}, I^2_\text{crop}, \ldots, I^n_\text{crop}\}, \{p_{t_1}, p_{t_2}, \ldots, p_{t_n}\}),$$

The watermarks are removed completely, while the body text, figures, and formulas all display correctly.

3. Summary

Traditional pipeline approaches can only detect that an image exists; they cannot process its content. Nanonets-OCR, by contrast, does more than see the text: it extracts concrete semantic information from images, enriching the document's content.

In advanced RAG scenarios, a VLM's multimodal capabilities can be used to summarize images; running vector retrieval over those semantic descriptions at the recall stage then surfaces the relevant images, improving the trustworthiness of the RAG system.
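
As a concrete illustration of that recall step, here is a minimal sketch assuming the sentence-transformers package is installed; the encoder model, image summaries, and query below are example choices, not part of Nanonets-OCR itself.

```python
# Minimal image-recall sketch for RAG. Assumes sentence-transformers is
# installed; the encoder, summaries, and query are illustrative choices.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# <img> descriptions previously extracted from Nanonets-OCR output.
image_summaries = [
    "A bar chart comparing performance metrics across datasets and models.",
    "A diagram illustrating the overall architecture of MonkeyOCR.",
]
corpus_emb = encoder.encode(image_summaries, convert_to_tensor=True)

query = "Which figure compares model performance?"
query_emb = encoder.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the image descriptions; top hit is the
# image whose semantic summary best matches the query.
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(image_summaries[hit["corpus_id"]], round(hit["score"], 3))
```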

