

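PS: If you have read this far, please leave a comment. The last few chapters have been getting fewer and fewer views. Is something wrong with the writing, is the material hard to follow, or are the posts simply getting buried?

============ Main text below ============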

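Installing the driver

Only Ubuntu and Windows are covered here. Windows first: everyone should already be familiar with it. Download the driver package from the official site and run the one-click installer.

On Ubuntu, in most cases I recommend installing the driver as follows rather than downloading the .run file from the official site: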

Append blacklist nouveau to the end of /etc/modprobe.d/blacklist.conf, then run sudo update-initramfs -u && sudo reboot to disable nouveau.
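Before rebooting, it can be worth confirming that the blacklist line is really in place. The following is a minimal sketch, assuming the file content has already been read into a string; the helper name has_nouveau_blacklist is made up for illustration.

```python
def has_nouveau_blacklist(conf_text: str) -> bool:
    """Return True if a non-comment line is exactly `blacklist nouveau`."""
    for raw in conf_text.splitlines():
        # Drop anything after a '#', then compare the remaining tokens.
        line = raw.split("#", 1)[0].strip()
        if line.split() == ["blacklist", "nouveau"]:
            return True
    return False


# Hand-written sample standing in for /etc/modprobe.d/blacklist.conf.
sample = "# local overrides\nblacklist pcspkr\nblacklist nouveau\n"
print(has_nouveau_blacklist(sample))
```

On a real machine the text would come from open("/etc/modprobe.d/blacklist.conf").read().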

Run ubuntu-drivers devices to list the recommended drivers. For example:

(base) root@ubuntu:~# ubuntu-drivers devices
== /sys/devices/pci0000:c0/0000:c0:01.1/0000:c1:00.0 ==
modalias : pci:v000010DEd00002684sv000010DEsd000016F3bc03sc00i00
vendor   : NVIDIA Corporation
model    : AD102 [GeForce RTX 4090]
driver   : nvidia-driver-535 - distro non-free
driver   : nvidia-driver-570-server-open - distro non-free
driver   : nvidia-driver-580-open - distro non-free
driver   : nvidia-driver-580-server - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-570 - distro non-free
driver   : nvidia-driver-580 - distro non-free recommended
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-580-server-open - distro non-free
driver   : nvidia-driver-570-open - distro non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-570-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

The currently recommended driver version is 580, so running sudo apt install nvidia-driver-580 -y installs it.
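Picking the recommended line out of that listing by eye is easy to get wrong when a dozen packages are shown. As a rough sketch, the function below scans text in the format shown above and returns the package tagged recommended; the function name and the shortened sample are illustrative only.

```python
import re

# Matches lines such as: "driver   : nvidia-driver-580 - distro non-free recommended"
DRIVER_LINE = re.compile(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$")


def recommended_driver(output: str):
    """Return the package on the line tagged `recommended`, or None."""
    for line in output.splitlines():
        m = DRIVER_LINE.match(line)
        if m and "recommended" in m.group(2).split():
            return m.group(1)
    return None


# Shortened copy of the listing above.
sample = """\
driver   : nvidia-driver-570 - distro non-free
driver   : nvidia-driver-580 - distro non-free recommended
driver   : nvidia-driver-535-open - distro non-free
"""
print(recommended_driver(sample))
```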

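Note that the relationship between the driver and the CUDA version is simple: the GPU driver version must be greater than or equal to the minimum version required by the CUDA Toolkit.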
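That rule is just a version comparison. Here is a small sketch of it; the minimum driver versions in MIN_DRIVER are assumptions quoted from memory of NVIDIA's Linux compatibility notes, so check them against the release notes of the toolkit you actually install.

```python
def parse_version(text: str) -> tuple:
    """Turn '580.95.05' into (580, 95, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# ASSUMPTION: illustrative Linux minimums per CUDA major version.
# Verify against the official CUDA release notes before relying on them.
MIN_DRIVER = {"11": "450.80.02", "12": "525.60.13"}


def driver_supports(driver: str, cuda_major: str) -> bool:
    """True if the driver meets the assumed minimum for that CUDA major."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[cuda_major])


print(driver_supports("580.95.05", "12"))
print(driver_supports("470.57.02", "12"))
```

Comparing tuples of integers avoids the classic string-comparison mistake where "95" sorts after "525".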

nvidia-smi

Reading nvidia-smi output

After installing the driver, run nvidia-smi to check that it was installed correctly:

(base) root@ubuntu:~# nvidia-smi
Wed Jan 14 02:22:46 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:81:00.0 Off |                  Off |
| 67%   25C    P8             20W /  300W |       1MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  |   00000000:82:00.0 Off |                  Off |
| 30%   25C    P8             22W /  300W |       1MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        On  |   00000000:C1:00.0 Off |                  Off |
| 30%   28C    P8             25W /  300W |       1MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        On  |   00000000:C2:00.0 Off |                  Off |
| 30%   27C    P8             17W /  300W |       1MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

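Start with the first line of the output. It shows that driver version 580.95.05 is installed. So what is this CUDA Version about?

A version number appears here even if the CUDA Toolkit has never been installed. The driver ships a shared library, libcuda.so, and it determines the highest CUDA version the current driver can support.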
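For scripts that need both numbers, the header line can be parsed directly. This is a sketch that works on the banner format shown above; the regular expression is an assumption about that layout, not a documented interface.

```python
import re

HEADER_RE = re.compile(r"Driver Version:\s*(\S+)\s+CUDA Version:\s*(\S+)")


def parse_header(smi_output: str):
    """Return (driver_version, max_cuda_version) from nvidia-smi text, or None."""
    m = HEADER_RE.search(smi_output)
    return (m.group(1), m.group(2)) if m else None


# Header line copied from the output above.
sample = ("| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05"
          "      CUDA Version: 13.0     |")
print(parse_header(sample))
```

The second value is the ceiling set by the driver, not the version of any toolkit installed on the machine.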

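Now for what each column means:

GPU / Fan: GPU index (starting from 0) and fan speed (as a percentage)
Name / Temp: GPU name and current core temperature
Perf: performance state. P8 = idle; P2 = high performance; P0 = maximum performance (CUDA compute workloads generally run at P2)
Persistence-M / Pwr: persistence mode status and power monitoring. Persistence is covered below. The power column is formatted as current draw / power cap (the cap is adjustable)
Bus-Id: PCI bus address. It can be matched against the output of lspci | grep -i vga
Disp.A: Display Active, whether a display output is active. None of these cards has a monitor attached, so they all show Off
Memory-Usage: VRAM usage. The core metric everyone loves, formatted as used VRAM / total VRAM
GPU-Util: GPU core utilization
Volatile Uncorr. ECC: ECC status (with ECC enabled, this column shows the uncorrected error count). I left it off; turning it on costs 3 GB of VRAM, which hurts too much
Compute M.: compute mode, one of Default, Exclusive_Process, or Prohibited (no compute allowed)
MIG M.: Multi-Instance GPU mode. Only the premium data center cards have it; it partitions the GPU at the hardware level

The last table lists the active processes. It shows the state of every process that is using the GPU.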
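The same columns are available in a machine-readable form through nvidia-smi's --query-gpu option, which is easier to script against than the ASCII table. Below is a sketch that parses that CSV; the sample row is hand-written to mirror GPU 0 in the table above, not captured output.

```python
import csv
import io

# Query fields corresponding to the columns described above.
FIELDS = ["index", "name", "pstate", "temperature.gpu", "power.draw",
          "memory.used", "memory.total", "utilization.gpu"]

# Command whose stdout parse_query() expects.
QUERY_CMD = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
             "--format=csv,noheader,nounits"]


def parse_query(csv_text: str):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if rec:
            rows.append(dict(zip(FIELDS, (v.strip() for v in rec))))
    return rows


# ASSUMPTION: hand-written sample row, not real captured output.
sample = "0, NVIDIA GeForce RTX 4090, P8, 25, 20.00, 1, 49140, 0\n"
for gpu in parse_query(sample):
    used, total = int(gpu["memory.used"]), int(gpu["memory.total"])
    print(gpu["index"], gpu["name"], gpu["pstate"], f"{used}/{total} MiB")
```

On a machine with the driver installed, the input would come from subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True).stdout.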

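Persistence mode

On Windows, the kernel-mode driver is loaded at boot and stays loaded, so persistence is built in and usually needs no attention. On Linux (headless machines especially), no client keeps a GPU handle open, so the kernel-mode driver initializes and deinitializes the target GPU every time a program on it starts and stops. That is a plain waste of resources. Worse, in practice it frequently causes baffling bugs. You are about to run AI workloads; is the extra electricity really a concern? Turn persistence on! Barring special circumstances, every Linux user should enable persistence mode: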

nvidia-smi -pm 1

Checking the GPU topology

Once NVLink bridges are connected, use the following command to check the NVLink connection status:

(base) root@gpu-a6000:~# nvidia-smi topo -m

        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     SYS     SYS     48-63,112-127   3               N/A
GPU1    NV4      X      SYS     SYS     32-47,96-111    2               N/A
GPU2    SYS     SYS      X      NV4     16-31,80-95     1               N/A
GPU3    SYS     SYS     NV4      X      0-15,64-79      0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

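As the table shows, this system has 4 cards with NVLink in pairs: GPU0/1 form one pair and GPU2/3 the other. Traffic from 0/1 to 2/3 has to cross PCIe and the CPU interconnect (the SYS entries).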
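On larger machines the matrix gets hard to read, so extracting the NVLink pairs programmatically can help. The sketch below parses text in the layout shown above; it assumes the first line starting with a GPU name is the header row, and the function name is illustrative.

```python
import re

GPU_RE = re.compile(r"^GPU\d+$")


def nvlink_pairs(topo_text: str):
    """Return (gpu_a, gpu_b, link) for each NVLink-connected pair, once each."""
    lines = [line.split() for line in topo_text.splitlines() if line.strip()]
    gpu_lines = [tokens for tokens in lines if GPU_RE.match(tokens[0])]
    if not gpu_lines:
        return []
    # ASSUMPTION: the first GPU-prefixed line is the column header.
    header, rows = gpu_lines[0], gpu_lines[1:]
    names = [t for t in header if GPU_RE.match(t)]
    pairs = []
    for row in rows:
        i = names.index(row[0])
        # Only look right of the diagonal so each pair is reported once.
        for j, cell in enumerate(row[1:1 + len(names)]):
            if j > i and cell.startswith("NV"):
                pairs.append((names[i], names[j], cell))
    return pairs


# Matrix copied from the output above.
sample = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     SYS     SYS     48-63,112-127   3               N/A
GPU1    NV4      X      SYS     SYS     32-47,96-111    2               N/A
GPU2    SYS     SYS      X      NV4     16-31,80-95     1               N/A
GPU3    SYS     SYS     NV4      X      0-15,64-79      0               N/A
"""
print(nvlink_pairs(sample))
```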

CPU/NUMA affinity is not covered further here. If you use multi-socket Xeon or AMD EPYC, you may need to tune NPS to optimize NCCL performance.

CUDA

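If you are a Docker warrior with a huge SSD and a fast network, my advice is to install no CUDA at all on the host and leave everything to Docker.

In general, I recommend: nvidia/cuda - Docker Image

However, if you need to compile some source code or insist on old-school environment setup, you will need to download the .run file from the official site to install CUDA. A few points to note:

On a freshly installed system, fill in as many dependencies as possible before installing CUDA to avoid a failed install. CUDA is best installed in one go; cleaning up the environment afterwards is a real headache: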

sudo apt-get install zlib1g -y
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-11 g++-11 -y
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 60 --slave /usr/bin/g++ g++ /usr/bin/g++-11
sudo apt-get install build-essential libgomp1 -y

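Very important: remember to deselect the driver during installation. The CUDA toolkit installer bundles a driver; do not use it! If you already have a driver installed and put this one on top, things will break as soon as the install finishes.

After installation, remember to check the environment variables (using CUDA 12.6 as the example):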

vi ~/.bashrc
Append at the end:
export PATH=/usr/local/cuda-12.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:/usr/local/cuda-12.6/extras/CUPTI/lib64:/usr/local/cuda-12.6/targets/x86_64-linux/lib:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-12.6
Then run:
source ~/.bashrc

Run the following command to check that the CUDA environment is configured correctly:

(base) root@ubuntu:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
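A common failure is nvcc on PATH pointing at one toolkit while CUDA_HOME points at another. As a hedged sketch, the helpers below pull the release number out of the nvcc banner and compare it with a CUDA_HOME path of the /usr/local/cuda-X.Y form used above; both function names are made up for illustration.

```python
import re


def nvcc_release(version_output: str):
    """Return (major, minor) from `nvcc --version` text, or None."""
    m = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def matches_cuda_home(release, cuda_home: str) -> bool:
    """True if CUDA_HOME ends in cuda-<major>.<minor> for that release.

    ASSUMPTION: CUDA_HOME uses the versioned directory name, as in the
    ~/.bashrc example above, not the /usr/local/cuda symlink.
    """
    major, minor = release
    return cuda_home.rstrip("/").endswith(f"cuda-{major}.{minor}")


# Line copied from the nvcc output above.
release = nvcc_release("Cuda compilation tools, release 12.6, V12.6.85")
print(release, matches_cuda_home(release, "/usr/local/cuda-12.6"))
```

In a live shell, the second argument would be os.environ.get("CUDA_HOME", "").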

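PyTorch

There is a lot to get right when installing PyTorch. First, create a reliable virtual environment with uv or conda, and do everything inside it.

Visit https://pytorch.org/get-started/locally/ to see the install command for the current Stable release of PyTorch.

PyTorch depends on a number of specifics, such as the CUDA version, the Python version, and the C++ version, so the official site usually lists install commands for 3 mainstream CUDA versions. So if you are on a particular CUDA version, or need a particular PyTorch version, how do you find out whether a whl is available?

Checking available whl files

Official PyTorch builds: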

https://download.pytorch.org/whl/cuXXX/torch

XXX stands for the CUDA version written as three digits. For example, CUDA 11.8 is cu118 and CUDA 12.6 is cu126.
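The mapping from a CUDA version to the index URL is mechanical, so it can be written down as a short sketch. The function names are illustrative, and the nightly path segment follows the same download.pytorch.org layout.

```python
def cuda_tag(cuda_version: str) -> str:
    """'12.6' -> 'cu126': major and minor joined, patch level ignored."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def torch_index_url(cuda_version: str, nightly: bool = False) -> str:
    """Build the listing URL for torch wheels of a given CUDA version."""
    base = "https://download.pytorch.org/whl"
    if nightly:
        base += "/nightly"
    return f"{base}/{cuda_tag(cuda_version)}/torch/"


print(cuda_tag("11.8"))
print(torch_index_url("12.4"))
print(torch_index_url("12.4", nightly=True))
```

Opening the resulting URL in a browser lists every wheel published for that CUDA version.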

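At the time of writing, PyTorch Stable is 2.9.1, and the versions listed by default on the official site are CUDA 12.6 / 12.8 / 13.0. Suppose I have created a Python 3.11 environment and the host CUDA version is 12.4. How do I find out whether a whl exists (that is, whether it can be installed directly)?

Based on the above, we visit: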

https://download.pytorch.org/whl/cu124/torch/

Unfortunately, the newest version there is only 2.6:

torch-2.6.0+cu124-cp310-cp310-linux_x86_64.whl
torch-2.6.0+cu124-cp310-cp310-win_amd64.whl
torch-2.6.0+cu124-cp311-cp311-linux_x86_64.whl
torch-2.6.0+cu124-cp311-cp311-win_amd64.whl
torch-2.6.0+cu124-cp312-cp312-linux_x86_64.whl
torch-2.6.0+cu124-cp312-cp312-win_amd64.whl
torch-2.6.0+cu124-cp313-cp313-linux_x86_64.whl
torch-2.6.0+cu124-cp313-cp313-win_amd64.whl
torch-2.6.0+cu124-cp313-cp313t-linux_x86_64.whl
torch-2.6.0+cu124-cp39-cp39-linux_x86_64.whl
torch-2.6.0+cu124-cp39-cp39-win_amd64.whl

The stable builds are too old, so let's see whether nightly is any better. Visit:

https://download.pytorch.org/whl/nightly/cu124/torch/

Much better: there are nightly builds of 2.7. If you are just a casual user and insist on staying on CUDA 12.4, PyTorch 2.7 is the newest you can get:

torch-2.7.0.dev20250310+cu124-cp310-cp310-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp310-cp310-win_amd64.whl
torch-2.7.0.dev20250310+cu124-cp311-cp311-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp311-cp311-win_amd64.whl
torch-2.7.0.dev20250310+cu124-cp312-cp312-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp312-cp312-win_amd64.whl
torch-2.7.0.dev20250310+cu124-cp313-cp313-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp313-cp313-win_amd64.whl
torch-2.7.0.dev20250310+cu124-cp313-cp313t-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp313-cp313t-win_amd64.whl
torch-2.7.0.dev20250310+cu124-cp39-cp39-manylinux_2_28_x86_64.whl
torch-2.7.0.dev20250310+cu124-cp39-cp39-win_amd64.whl

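Specifying a mirror and installing a specific version

We now know the latest whl, but downloading straight from the official source is obviously too slow. Say I am a CUDA 12.6 user with Python 3.11 in my virtual environment, and I want to install PyTorch 2.9.1 (the current stable). The official command is: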

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126

I recommend the Nanjing University mirror here, which is updated fairly promptly:

pip3 install torch torchvision --index-url https://mirrors.nju.edu.cn/pytorch/whl/cu126

In practice, this is equivalent to installing the following whl (taking the torch package as the example):

torch-2.9.1+cu126-cp311-cp311-manylinux_2_28_x86_64.whl

This shows the whl naming convention: package name-package version+CUDA version-Python version-ABI tag-platform (OS and architecture)
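To check a filename against your environment, it helps to split it into those fields. This is a minimal sketch that assumes the five-part form seen in the listings above (no optional build tag); parse_wheel is an illustrative name, not part of pip.

```python
def parse_wheel(filename: str) -> dict:
    """Split a torch wheel filename into its naming-convention fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # ASSUMPTION: exactly five dash-separated parts, i.e. no build tag.
    name, version, python_tag, abi_tag, platform = stem.split("-")
    public, _, local = version.partition("+")
    return {
        "name": name,
        "version": public,      # e.g. 2.9.1
        "cuda": local,          # e.g. cu126 (empty if there is no +suffix)
        "python": python_tag,   # e.g. cp311
        "abi": abi_tag,         # e.g. cp311, or cp313t for free-threaded builds
        "platform": platform,   # e.g. manylinux_2_28_x86_64
    }


info = parse_wheel("torch-2.9.1+cu126-cp311-cp311-manylinux_2_28_x86_64.whl")
print(info)
```

A wheel is installable only if the python, abi, and platform fields all match the interpreter in the active virtual environment.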

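Now, what if I want to install PyTorch 2.8.0? Then pay attention to how the versions map. The torch and torchaudio version numbers match directly, while torchvision may need to be looked up.

For PyTorch 2.1.0, the package versions are:

torch==2.1.0
torchaudio==2.1.0
torchvision==0.16.0
torchtext==0.16.0

Starting with PyTorch 2.4.0, there is no torchtext package anymore. The package versions are:

torch==2.4.0
torchaudio==2.4.0
torchvision==0.19.0

As you can see, they increase in step. So the install command for PyTorch 2.8.0 is easy to work out: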

pip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://mirrors.nju.edu.cn/pytorch/whl/cu126
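The pattern in those examples is that, for torch 2.x, the torchvision minor version is the torch minor version plus 15 (2.1 gives 0.16, 2.4 gives 0.19, 2.8 gives 0.23). The sketch below encodes only that inference: it deliberately refuses patch releases, because whether patch numbers line up is not established here, and the result should still be confirmed against the index listing.

```python
def companion_versions(torch_version: str) -> dict:
    """Infer torchvision/torchaudio pins for a torch 2.x.0 release.

    ASSUMPTION: the +15 minor offset is extrapolated from the examples
    above, not taken from an official compatibility table.
    """
    major, minor, patch = (int(p) for p in torch_version.split("."))
    if major != 2:
        raise ValueError("offset was only inferred for torch 2.x")
    if patch != 0:
        raise ValueError("look up patch releases on the index instead")
    return {
        "torch": torch_version,
        "torchvision": f"0.{minor + 15}.0",
        "torchaudio": torch_version,
    }


def pip_command(torch_version: str, cuda_tag: str = "cu126",
                index_base: str = "https://mirrors.nju.edu.cn/pytorch/whl") -> str:
    """Assemble the pinned pip3 install command for the given mirror."""
    pins = companion_versions(torch_version)
    packages = " ".join(f"{name}=={ver}" for name, ver in pins.items())
    return f"pip3 install {packages} --index-url {index_base}/{cuda_tag}"


print(pip_command("2.8.0"))
```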

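Note: Conv3d in PyTorch 2.9.0 has a problem. If you see a clear performance regression when deploying or fine-tuning Qwen3VL, roll back to 2.8 or upgrade to 2.9.1 (rolling back to 2.8 is preferable).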

📌 Repost information

Source:
https://linux.do/t/topic/1446261

Original author:
flymyd

Reposted at:
2026/1/14 18:25:02