Kernel source files and paths

Path	Functions/macros and purpose
kernel\dma\mapping.c	dma_alloc_attrs
kernel\kernel\dma\coherent.c	dma_declare_coherent_memory, dma_alloc_from_dev_coherent: allocation from a per-device coherent memory pool
\kernel\kernel\dma\direct.c	dma_direct_alloc: DMA memory allocation from CMA
\kernel\include\linux\dma-mapping.h	dma_alloc_coherent
arch\arm64\mm	arch_dma_prep_coherent: after the pages are allocated, converts them to a virtual address and calls __dma_flush_area
\kernel\arch\arm64\mm\cache.S	__dma_flush_area: clean & invalidate D / U line
I:\rk3588\kernel\arch\arm64\include\asm\pgtable.h	pgprot_syscached: mark the prot value as outer cacheable and inner non-cacheable

CONFIG_DMA_DECLARE_COHERENT

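This option builds kernel/dma/coherent.c, which lets a device own a dedicated coherent memory pool, declared either through a `shared-dma-pool` reserved-memory node or through dma_declare_coherent_memory. When a device has such a pool, `dma_alloc_coherent()` is served from it first. The option does not by itself declare hardware-coherent DMA. That is signalled per device by the `dma-coherent` property (or its ACPI equivalent), which tells the kernel that the device stays coherent with the CPU caches, so no software cache maintenance is needed. The two cases compare as follows.

Scenario	Device marked `dma-coherent`	Device not marked
Memory allocation	`dma_alloc_coherent()` returns memory with a normal cacheable mapping	`dma_alloc_coherent()` returns memory with a non-cacheable mapping
Synchronization	`dma_sync_*` on streaming mappings needs no cache maintenance	`dma_sync_*` on streaming mappings must clean/invalidate the caches explicitly
Device tree/ACPI configuration	The device node must carry the `dma-coherent` property	No special requirement
Performance	Higher (no synchronization overhead)	Lower (relies on software synchronization)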

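How can you verify that the hardware is really coherent?

Check whether the SoC or device manual states that the DMA master is cache-coherent (for example, through an ARM ACP port or cache-snooping PCIe transactions). As a practical test, deliberately omit dma_sync_* in the driver and check whether data transfers still work correctly.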

Example DTS configuration

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	my_coherent_pool: coherent_pool@10000000 {
		compatible = "shared-dma-pool";
		reg = <0x10000000 0x400000>; // 4 MB region
		no-map;
	};
};

my_device: my_device@0 {
	compatible = "vendor,coherent-device";
	memory-region = <&my_coherent_pool>; // attach the reserved region
	dma-coherent;
};

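With the DTS and kernel configuration above, and provided the driver attaches the region with of_reserved_mem_device_init(), coherent allocations for this device come from the region declared in the DTS rather than from CMA. For example, we can reserve a range of high memory and hand it to our video interface this way.

Alternatively, the driver can call dma_declare_coherent_memory to associate a reserved physical memory range with the device, which also bypasses CMA.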
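The "pool first, CMA never" behaviour can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `model_*` name is made up for illustration, the allocator is a plain first-fit page scan rather than the kernel's bitmap logic, and the return convention (1 means "the device has a pool, do not fall back", 0 means "no pool, continue with the generic allocator") follows my reading of dma_alloc_from_dev_coherent and should be checked against your kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_PAGES 64u

/* Hypothetical stand-in for a per-device coherent pool. */
struct model_pool {
	uint64_t base;                        /* physical base of the reserved region */
	unsigned int pages;                   /* pool size in pages */
	unsigned char used[MODEL_MAX_PAGES];  /* 1 = page is allocated */
};

/* Hypothetical stand-in for struct device: pool is NULL when none is attached. */
struct model_device {
	struct model_pool *pool;
};

/*
 * Try to satisfy a request from the device pool.
 * Returns 0 when the device has no pool (caller falls back to CMA/buddy).
 * Returns 1 when the device has a pool; *addr is 0 if the pool is exhausted,
 * and the caller must NOT fall back in that case.
 */
int model_alloc_from_dev_pool(struct model_device *dev, size_t size, uint64_t *addr)
{
	struct model_pool *pool = dev->pool;
	unsigned int need, run = 0, i;

	*addr = 0;
	if (!pool)
		return 0;
	assert(pool->pages <= MODEL_MAX_PAGES);

	need = (unsigned int)((size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
	if (need == 0 || need > pool->pages)
		return 1;

	/* First-fit search for `need` consecutive free pages. */
	for (i = 0; i < pool->pages; i++) {
		run = pool->used[i] ? 0 : run + 1;
		if (run == need) {
			unsigned int start = i + 1 - need;

			for (unsigned int j = start; j <= i; j++)
				pool->used[j] = 1;
			*addr = pool->base + (uint64_t)start * MODEL_PAGE_SIZE;
			return 1;
		}
	}
	return 1; /* pool exists but has no room */
}
```

In this model a device without a pool returns 0 and would go on to the CMA or buddy path, while a device with a pool is always answered from the reserved region, even when the answer is a failure.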

dma_direct_alloc

Fast path for a special attribute (`DMA_ATTR_NO_KERNEL_MAPPING`)

if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) {
	page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
	return page; // returns the struct page pointer, not a virtual address
}

Use case: the kernel itself never needs to access the memory (for example, pure device-to-device DMA).

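Main flow

1. Core memory allocation

page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);

Obtains physically contiguous pages from CMA or the buddy allocator.
Explicitly masks out __GFP_ZERO as an optimization (the buffer is zeroed manually later).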

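2. Address translation

*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));

Converts the physical address into a DMA address that the device can use.
Handles any address offset between the CPU view and the device view of memory (for example, a bus offset described by `dma-ranges`).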
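The offset handling can be sketched in userspace as a single window lookup. This is an assumption-laden model rather than the kernel implementation: `model_dma_range`, `model_phys_to_dma` and `MODEL_DMA_ERROR` are hypothetical names, and a real device may have several windows instead of one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one bus window: CPU physical range -> DMA range. */
struct model_dma_range {
	uint64_t cpu_start;  /* first CPU physical address covered by the window */
	uint64_t dma_start;  /* address the device sees for cpu_start */
	uint64_t size;       /* window length in bytes */
};

/* Returned when the physical address is not reachable through the window. */
#define MODEL_DMA_ERROR UINT64_MAX

/*
 * Translate a CPU physical address to the address the device must use.
 * With no window (r == NULL) the two address spaces are identical.
 */
uint64_t model_phys_to_dma(const struct model_dma_range *r, uint64_t paddr)
{
	if (r == NULL)
		return paddr;
	if (paddr < r->cpu_start || paddr - r->cpu_start >= r->size)
		return MODEL_DMA_ERROR;
	return paddr - r->cpu_start + r->dma_start;
}
```

For instance, with a window that maps CPU address 0x80000000 to device address 0x0, the physical address 0x80001000 becomes the DMA address 0x1000.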

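3. Cache coherency handling

arch_dma_prep_coherent(page, size);

Makes the CPU caches consistent with memory. This step runs after the pages are allocated and before they are used. The implementation is architecture-specific (on ARM it flushes the cache).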

Mapping

The mapping is controlled mainly by attributes, as described in the previous post.

Cache handling for the different attrs

/*
 * Return the page attributes used for mapping dma_alloc_* memory, either in
 * kernel space if remapping is needed, or to userspace through dma_mmap_*.
 */
pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
	if (force_dma_unencrypted(dev))
		prot = pgprot_decrypted(prot);
	if (dev_is_dma_coherent(dev))
		return prot;
#ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
	if (attrs & DMA_ATTR_WRITE_COMBINE)
		return pgprot_writecombine(prot);
#endif
	if (attrs & DMA_ATTR_SYS_CACHE_ONLY ||
	    attrs & DMA_ATTR_SYS_CACHE_ONLY_NWA)
		return pgprot_syscached(prot);
	return pgprot_dmacoherent(prot);
}

#define pgprot_dmacoherent(prot)	pgprot_noncached(prot)  // disables caching
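The order of the checks in dma_pgprot matters, and a small userspace model makes it easy to see which mapping type wins for a given combination. This is a sketch only: the `MODEL_*` names and bit values are invented for illustration and are not the kernel's attribute values, and `arch_has_wc` stands in for CONFIG_ARCH_HAS_DMA_WRITE_COMBINE.

```c
#include <stdbool.h>

/* Illustrative attribute bits; these are NOT the kernel's values. */
#define MODEL_ATTR_WRITE_COMBINE       (1UL << 0)
#define MODEL_ATTR_SYS_CACHE_ONLY      (1UL << 1)
#define MODEL_ATTR_SYS_CACHE_ONLY_NWA  (1UL << 2)

/* The mapping type the model selects. */
enum model_mapping {
	MODEL_MAP_UNCHANGED,      /* coherent device: caller's prot is kept */
	MODEL_MAP_WRITE_COMBINE,
	MODEL_MAP_SYS_CACHED,     /* outer cacheable, inner non-cacheable */
	MODEL_MAP_NONCACHED,      /* default for non-coherent devices */
};

/* Mirrors the decision order of the dma_pgprot excerpt above. */
enum model_mapping model_dma_pgprot(bool dev_coherent, bool arch_has_wc,
				    unsigned long attrs)
{
	if (dev_coherent)
		return MODEL_MAP_UNCHANGED;
	if (arch_has_wc && (attrs & MODEL_ATTR_WRITE_COMBINE))
		return MODEL_MAP_WRITE_COMBINE;
	if (attrs & (MODEL_ATTR_SYS_CACHE_ONLY | MODEL_ATTR_SYS_CACHE_ONLY_NWA))
		return MODEL_MAP_SYS_CACHED;
	return MODEL_MAP_NONCACHED;
}
```

Two consequences follow from the ordering: a coherent device ignores every caching attribute, and write-combine takes precedence over the system-cache attributes when both are passed.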

Cache attribute marking (ARM64)

/*
* Mark the prot value as outer cacheable and inner non-cacheable. Non-coherent
* devices on a system with support for a system or last level cache use these
* attributes to cache allocations in the system cache.
*/


#define pgprot_syscached(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
			PTE_ATTRINDX(MT_NORMAL_iNC_oWB) | PTE_PXN | PTE_UXN)
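What the macro does to a page-table entry can be shown with plain bit arithmetic. The sketch below assumes the usual arm64 layout (AttrIndx in bits [4:2], PXN at bit 53, UXN at bit 54). The memory-type index `MODEL_MT_NORMAL_iNC_oWB` is a placeholder, because the real index depends on how the kernel tree programs MAIR_EL1, and all `model_*` names are hypothetical.

```c
#include <stdint.h>

typedef uint64_t model_pteval_t;

/* Field positions assumed from the arm64 page-table format. */
#define MODEL_PTE_ATTRINDX(t)    ((model_pteval_t)(t) << 2)
#define MODEL_PTE_ATTRINDX_MASK  ((model_pteval_t)7 << 2)
#define MODEL_PTE_PXN            ((model_pteval_t)1 << 53)
#define MODEL_PTE_UXN            ((model_pteval_t)1 << 54)

/* ASSUMPTION: placeholder index; the real value is tree-specific. */
#define MODEL_MT_NORMAL_iNC_oWB  6

/* Clear the bits selected by mask, then set the new bits. */
model_pteval_t model_pgprot_modify(model_pteval_t prot, model_pteval_t mask,
				   model_pteval_t bits)
{
	return (prot & ~mask) | bits;
}

/* Select the inner non-cacheable / outer write-back type, never executable. */
model_pteval_t model_pgprot_syscached(model_pteval_t prot)
{
	return model_pgprot_modify(prot, MODEL_PTE_ATTRINDX_MASK,
				   MODEL_PTE_ATTRINDX(MODEL_MT_NORMAL_iNC_oWB) |
				   MODEL_PTE_PXN | MODEL_PTE_UXN);
}
```

Only the memory-type index and the two execute-never bits change. Every other bit of the original protection value is preserved.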

dma_alloc_attrs

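The kernel's allocate-and-map flow has three main parts, and this post covers the two on the left. SMMU/IOMMU mapping is not covered for now.

dma_alloc_coherent deserves a brief mention because many tutorials refer to it. It is a thin wrapper around dma_alloc_attrs that passes attrs as 0 (apart from DMA_ATTR_NO_WARN when __GFP_NOWARN is set). By the analysis above, for a non-coherent device the pages are mapped into the kernel with the no-cache attribute, that is, with caching disabled.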

static inline void *dma_alloc_coherent(struct device *dev, size_t size,
		dma_addr_t *dma_handle, gfp_t gfp)
{
	return dma_alloc_attrs(dev, size, dma_handle, gfp,
			(gfp & __GFP_NOWARN) ? DMA_ATTR_NO_WARN : 0);
}