Posts tagged: nvidia

NVIDIA GPU ‘passthrough’ to lxc containers on Proxmox 6 for NVENC in Plex

By , 2020-12-06 20:19

I’ve found multiple guides on how to enable NVIDIA GPU access from lxc containers, however I had to combine the information from multiple sources to get a fully working setup. Here are the steps that worked for me.

  1. Install dkms on your Proxmox host to ensure the nvidia driver can be auto-updated with new kernel versions.
    # apt install dkms
  2. Head over to https://github.com/keylase/nvidia-patch and get the latest supported Nvidia binary driver version listed there.
  3. Download the nvidia-patch repo
    git clone https://github.com/keylase/nvidia-patch.git
  4. Install the driver from step 2 on the host.
    For example, ./NVIDIA-Linux-x86_64-455.45.01.run
  5. Run the nvidia-patch/patch.sh script on the host.
  6. Install the same driver in each container that needs access to the Nvidia GPU, but without the kernel module.
    ./NVIDIA-Linux-x86_64-455.45.01.run --no-kernel-module
  7. Run the nvidia-patch/patch.sh script on the lxc container.
  8. On the host, create a script to initialize the nvidia-uvm devices. Normally these are created on the fly when a program such as ffmpeg calls upon the GPU, but since we need to pass the device nodes through to the containers, they must exist before the containers are started.

    I saved the following script as /usr/local/bin/nvidia-uvm-init. Make sure to chmod +x !
#!/bin/bash
## Script to initialize nvidia device nodes.
## https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi
/sbin/modprobe nvidia-uvm
if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 0
else
  exit 1
fi

Next, we create the following two systemd service files to start this script, and the nvidia-persistenced:

/usr/local/lib/systemd/system/nvidia-uvm-init.service

# nvidia-uvm-init.service
# loads nvidia-uvm module and creates /dev/nvidia-uvm device nodes
[Unit]
Description=Runs /usr/local/bin/nvidia-uvm-init
[Service]
ExecStart=/usr/local/bin/nvidia-uvm-init
[Install]
WantedBy=multi-user.target

/usr/local/lib/systemd/system/nvidia-persistenced.service

# NVIDIA Persistence Daemon Init Script
#
# Copyright (c) 2013 NVIDIA Corporation
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#
# This is a sample systemd service file, designed to show how the NVIDIA
# Persistence Daemon can be started.
#
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
[Install]
WantedBy=multi-user.target

Next, symlink the two service definition files into /etc/systemd/system

# cd /etc/systemd/system
# ln -s /usr/local/lib/systemd/system/nvidia-uvm-init.service
# ln -s /usr/local/lib/systemd/system/nvidia-persistenced.service

and load the services

# systemctl daemon-reload
# systemctl start nvidia-uvm-init.service
# systemctl start nvidia-persistenced.service

Now you should see all the nvidia device nodes have been created
# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Dec 6 18:07 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 6 18:10 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Dec 6 18:07 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Dec 6 18:12 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511, 0 Dec 6 19:00 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511, 0 Dec 6 19:00 /dev/nvidia-uvm-tools


/dev/nvidia-caps:
total 0
cr-------- 1 root root 236, 1 Dec 6 18:07 nvidia-cap1
cr--r--r-- 1 root root 236, 2 Dec 6 18:07 nvidia-cap2

Check the dri devices as well
# ls -l /dev/dri*
total 0
drwxr-xr-x 2 root root 100 Dec 6 17:00 by-path
crw-rw---- 1 root video 226, 0 Dec 6 17:00 card0
crw-rw---- 1 root video 226, 1 Dec 6 17:00 card1
crw-rw---- 1 root render 226, 128 Dec 6 17:00 renderD128

Take note of the first number of each device after the group name. In the listings above I have 195, 511, 236 and 226.

Now we need to edit the lxc container configuration file to pass through the devices. Shut down your container, then edit the config file – example /etc/pve/lxc/117.conf. The relevant lines are below the swap: 8192 line

arch: amd64
cores: 12
features: mount=cifs
hostname: plex
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=4A:50:52:00:00:00,ip=192.168.1.122/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-117-disk-0,size=250G,acl=1
startup: order=99
swap: 8192
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 226:* rwm
lxc.cgroup.devices.allow: c 236:* rwm
lxc.cgroup.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Now, start your container back up. You should be able to use NVENC features. You can test by using ffmpeg:
$ ffmpeg -i dQw4w9WgXcQ.mp4 -c:v h264_nvenc -c:a copy /tmp/rickroll.mp4

You should now have working GPU transcode in your lxc container!

If you get the following error, recheck and make sure you have set the correct numeric values for lxc.cgroup.devices.allow and restart your container.

[h264_nvenc @ 0x559f2a536b40] Cannot init CUDA
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width
or height
Conversion failed!

Another way to tell the values are incorrect is having blank (———) permission lines for the nvidia device nodes. You will get this inside any containers that are started before the nvidia devices are initialized by the nvidia-uvm-init script on the host.

$ ls -l /dev/nvidia*
---------- 1 root root        0 Dec  6 18:04 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec  6 19:02 /dev/nvidiactl
---------- 1 root root        0 Dec  6 18:04 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Dec  6 19:02 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Dec  6 19:02 /dev/nvidia-uvm-tools

Sometimes, after the host has been up for a long time, the /dev/nvidia-uvm or other device nodes may disappear. In this case, simply run the nvidia-uvm-init script, perhaps schedule it to run as a cron job.

Message for WC from Bristow

By , 2010-12-17 22:56

Hello,

This is a message for the commenter “WC from Bristow” who posted a review of Mac OS X Snow Leopard on the Apple Store website on Decemer 11, 2010 (review posted below for SEO/reference).

I just wanted to let you know that your MacBook Pro is DEFECTIVE. I hope that by some strange chance you might read this before your warranty expires. There is a known issue with the GeForce 9400M graphics chips in the original unibody MacBooks with removable battery. There is random screen flickering. This is a hardware issue, and no matter what you do will not go away. If your laptop is still under warranty, and even if it’s not, take it back to Apple and demand a replacement. They will probably swap it for a new, current-generation MacBook.

Just a friendly suggestion from someone who went through the same frustration.

Not Impressed
  • Written by WC from Bristow
  • 11-Dec-2010

Snow Leopard brought some serious problems. Mail will chew up 50 to 75% of the processor just idling. Flashing video/screen on MacBook Pro 15″ when on low power video card setting causes me to run all the time in high performance mode so it does not flash… and now my battery lasts for an hour and a half if I am lucky. (in part because of the processor overloading of Mail). This adds 2 lbs+ to the laptop because I must now carry two extra batteries (and I do).

I spend a lot of time in Mail… and bugs like copying the email address and getting a bunch of extra garbage that needs to be trimmed after pasting is aggravating. Dock lockups, Browser lockups. Spinning beach ball of death with much greater frequency than normal Leopard. At points cascading chronic app crashing which is “cured” by rebooting the machine (a la Windows)

Hope that Lion brings some relief. Snow Leopard has fleas.

Custom theme by me. Based on Panorama by Themocracy