List the installed drivers
[root@localhost:~] esxcli software vib list
Name                           Version                               Vendor  Acceptance Level  Install Date  Platforms
-----------------------------  ------------------------------------  ------  ----------------  ------------  ---------
NVD-VMware_ESXi_8.0.0_Driver   525.147.01-1OEM.800.1.0.20613240      NVD     VMwareAccepted    2025-07-29    host
nvdgpumgmtdaemon               525.147.01-1OEM.700.1.0.15843807      NVD     VMwareAccepted    2025-07-29    host
atlantic                       1.0.3.0-13vmw.803.0.0.24022510        VMW     VMwareCertified   2025-06-14    host
bcm-mpi3                       8.8.1.0.0.0-1vmw.803.0.0.24022510     VMW     VMwareCertified   2025-06-14    host
bnxtnet                        226.0.21.0-31vmw.803.0.0.24022510     VMW     VMwareCertified   2025-06-14    host
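The full list is long, so it can help to keep only the NVIDIA rows before deciding what to remove. A minimal sketch, assuming the column layout shown above (vendor in the third column, NVIDIA packages tagged NVD); the function name nvidia_vibs is made up for illustration:

```shell
# Hypothetical helper: print the name and version of every VIB whose
# Vendor column is NVD. Reads `esxcli software vib list` output on stdin.
nvidia_vibs() {
    awk '$3 == "NVD" { print $1, $2 }'
}

# Usage on the host (not run here):
#   esxcli software vib list | nvidia_vibs
```

The names it prints are the values to pass to --vibname in the removal step.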
Uninstall the GRID driver
[root@localhost:~] esxcli software vib remove --vibname=nvdgpumgmtdaemon
Removal Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   VIBs Installed:
   VIBs Removed: NVD_bootbank_nvdgpumgmtdaemon_525.147.01-1OEM.700.1.0.15843807
   VIBs Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:~]
[root@localhost:~] esxcli software vib remove --vibname=NVD-VMware_ESXi_8.0.0_Driver
Removal Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   VIBs Installed:
   VIBs Removed: NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.147.01-1OEM.800.1.0.20613240
   VIBs Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:~]
Install the GRID driver
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers] esxcli software component apply -d /vmfs/volumes/datastore1/Drivers/NVD-VGPU-800_535.230.02-1OEM.800.1.0.20613240_24481118.zip
Installation Result
   Message: Operation finished successfully.
   Components Installed: NVD-VGPU-800_535.230.02-1OEM.800.1.0.20613240
   Components Removed:
   Components Skipped:
   Reboot Required: false
   DPU Results:
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers]
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers] esxcli software component apply -d /vmfs/volumes/datastore1/Drivers/nvd-gpu-mgmt-daemon_535.230.02-0.0.0000_24467933.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Components Installed: nvd-gpu-mgmt-daemon_535.230.02-0.0.0000
   Components Removed:
   Components Skipped:
   Reboot Required: true
   DPU Results:
[root@localhost:/vmfs/volumes/684d9ba7-bb44fab2-ee38-f4939ff4b132/Drivers]
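The removals and the second install above all report "Reboot Required: true", so the host needs a reboot before the new driver is checked. If these steps are scripted, the flag can be read from the result message. A rough sketch, assuming the message keeps the wording shown above; reboot_required is a hypothetical helper name:

```shell
# Hypothetical helper: exit 0 if an esxcli result message read from stdin
# says a reboot is needed, non-zero otherwise.
reboot_required() {
    grep -q 'Reboot Required: true'
}

# Usage on the host (not run here), with the result saved to a file first:
#   reboot_required < /tmp/install_result.txt && echo "reboot the host"
```

The file path in the usage comment is only an example.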
Verify the driver
[root@localhost:~] nvidia-smi
Mon Aug 18 15:56:21 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                       On  | 00000000:18:00.0 Off |                  Off |
| N/A   36C    P8              10W /  75W |     32MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A2                      On  | 00000000:3B:00.0 Off |                    0 |
|  0%   54C    P8               9W /  60W |      0MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A2                      On  | 00000000:86:00.0 Off |                    0 |
|  0%   51C    P8               8W /  60W |      0MiB / 15356MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
[root@localhost:~]
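A few big open questions
1. With a single A2 card installed, nvidia-smi does not detect the A2.
2. With two A2 cards installed, nvidia-smi detects only one A2.
3. With a P4 in the first PCIe slot (in slot order) followed by one A2, nvidia-smi detects all devices.
4. With a P4 in the first PCIe slot (in slot order) followed by two A2 cards, nvidia-smi detects all devices.
5. When cards are inserted in PCIe slot order, only the A2 cards in slots after the P4's slot are detected.
I have heard this is an issue with the NVIDIA Ampere architecture (2021.4): either the motherboard (Cascade Lake, 2019.4) is too old or the BIOS version is too old.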
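One way to narrow these cases down is to compare how many NVIDIA devices the host sees on the PCIe bus with how many the driver brings up. A minimal sketch, assuming lspci prints "NVIDIA" in each GPU's line and nvidia-smi -L prints one "GPU N: ..." line per device; both function names are hypothetical:

```shell
# Hypothetical helper: count NVIDIA devices in `lspci` output read from stdin.
count_pci_nvidia() {
    grep -ci 'nvidia' || true   # grep exits 1 on zero matches; still prints 0
}

# Hypothetical helper: count GPUs in `nvidia-smi -L` output read from stdin.
count_driver_gpus() {
    grep -c '^GPU [0-9]' || true
}

# Usage on the host (not run here):
#   pci=$(lspci | count_pci_nvidia)
#   drv=$(nvidia-smi -L | count_driver_gpus)
#   [ "$pci" = "$drv" ] || echo "driver sees $drv of $pci NVIDIA devices"
```

If a card is missing from lspci as well, the problem sits below the driver (slot, BIOS, PCIe resources). If it is on the bus but missing from nvidia-smi, searching /var/log/vmkernel.log for NVRM lines may show why the driver skipped it. NVIDIA's vGPU documentation lists BIOS prerequisites such as SR-IOV for Ampere GPUs, which is worth checking, but whether that explains the behaviour here is unconfirmed.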