Posts tagged: proxmox

NVIDIA GPU ‘passthrough’ to lxc containers on Proxmox 6 for NVENC in Plex

2020-12-06 20:19

I’ve found multiple guides on how to enable NVIDIA GPU access from lxc containers, but I had to combine information from several sources to get a fully working setup. Here are the steps that worked for me.

  1. Install dkms on your Proxmox host so the nvidia kernel module is rebuilt automatically whenever the kernel is updated.
    # apt install dkms
  2. Head over to https://github.com/keylase/nvidia-patch and get the latest supported Nvidia binary driver version listed there.
  3. Download the nvidia-patch repo
    git clone https://github.com/keylase/nvidia-patch.git
  4. Install the driver from step 2 on the host.
    For example, ./NVIDIA-Linux-x86_64-455.45.01.run
  5. Run the nvidia-patch/patch.sh script on the host.
  6. Install the same driver in each container that needs access to the Nvidia GPU, but without the kernel module (the userspace libraries in the container must match the host’s kernel module version exactly).
    ./NVIDIA-Linux-x86_64-455.45.01.run --no-kernel-module
  7. Run the nvidia-patch/patch.sh script on the lxc container.
  8. On the host, create a script to initialize the nvidia-uvm devices. Normally these are created on the fly when a program such as ffmpeg calls upon the GPU, but since we need to pass the device nodes through to the containers, they must exist before the containers are started.

    I saved the following script as /usr/local/bin/nvidia-uvm-init. Make sure to chmod +x it!
#!/bin/bash
## Script to initialize nvidia device nodes.
## https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi
/sbin/modprobe nvidia-uvm
if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 1
else
  exit 1
fi
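
With the script in place and executable, it’s worth running it once by hand and confirming the driver responds before wiring it into systemd (nvidia-smi ships with the driver):

# /usr/local/bin/nvidia-uvm-init
# nvidia-smi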

Next, we create two systemd service files: one to run this script, and one for the nvidia-persistenced daemon:

/usr/local/lib/systemd/system/nvidia-uvm-init.service

# nvidia-uvm-init.service
# loads nvidia-uvm module and creates /dev/nvidia-uvm device nodes
[Unit]
Description=Runs /usr/local/bin/nvidia-uvm-init
[Service]
ExecStart=/usr/local/bin/nvidia-uvm-init
[Install]
WantedBy=multi-user.target

/usr/local/lib/systemd/system/nvidia-persistenced.service

# NVIDIA Persistence Daemon Init Script
#
# Copyright (c) 2013 NVIDIA Corporation (MIT-licensed; license text trimmed here)
#
# This is a sample systemd service file, designed to show how the NVIDIA
# Persistence Daemon can be started.
#
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
[Install]
WantedBy=multi-user.target
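
Note that the --user flag assumes a nvidia-persistenced system account exists; if it doesn’t on your host, create it first (one way to do it, adjust to your distro’s conventions):

# useradd -r -M -s /usr/sbin/nologin nvidia-persistenced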

Next, symlink the two service definition files into /etc/systemd/system

# cd /etc/systemd/system
# ln -s /usr/local/lib/systemd/system/nvidia-uvm-init.service
# ln -s /usr/local/lib/systemd/system/nvidia-persistenced.service

and load the services

# systemctl daemon-reload
# systemctl start nvidia-uvm-init.service
# systemctl start nvidia-persistenced.service
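
To make sure they also run automatically at boot, before any containers come up, enable them as well:

# systemctl enable nvidia-uvm-init.service
# systemctl enable nvidia-persistenced.service

(Some systemd versions refuse to enable a linked unit by name; if so, pass the full path to the unit file instead.)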

Now you should see that all the nvidia device nodes have been created:
# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Dec 6 18:07 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 6 18:10 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Dec 6 18:07 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Dec 6 18:12 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511, 0 Dec 6 19:00 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511, 1 Dec 6 19:00 /dev/nvidia-uvm-tools


/dev/nvidia-caps:
total 0
cr-------- 1 root root 236, 1 Dec 6 18:07 nvidia-cap1
cr--r--r-- 1 root root 236, 2 Dec 6 18:07 nvidia-cap2

Check the DRI devices as well:
# ls -l /dev/dri*
total 0
drwxr-xr-x 2 root root 100 Dec 6 17:00 by-path
crw-rw---- 1 root video 226, 0 Dec 6 17:00 card0
crw-rw---- 1 root video 226, 1 Dec 6 17:00 card1
crw-rw---- 1 root render 226, 128 Dec 6 17:00 renderD128

Take note of each device’s major number (the first number after the group name). In the listings above, mine are 195, 511, 236 and 226.

Now we need to edit the lxc container configuration file to pass through the devices. Shut down your container, then edit its config file, for example /etc/pve/lxc/117.conf. The relevant lines are the ones added below swap: 8192:

arch: amd64
cores: 12
features: mount=cifs
hostname: plex
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=4A:50:52:00:00:00,ip=192.168.1.122/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-117-disk-0,size=250G,acl=1
startup: order=99
swap: 8192
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 226:* rwm
lxc.cgroup.devices.allow: c 236:* rwm
lxc.cgroup.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
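
These allow rules are for Proxmox 6, which uses cgroup v1. On newer releases that use cgroup v2, the prefix changes to lxc.cgroup2.devices.allow, while the mount entries stay the same:

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 236:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm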

Now, start your container back up. You should be able to use NVENC features. You can test with ffmpeg:
$ ffmpeg -i dQw4w9WgXcQ.mp4 -c:v h264_nvenc -c:a copy /tmp/rickroll.mp4
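
While a transcode is running, you can also watch the encoder load from inside the container; the enc column should be non-zero:

$ nvidia-smi dmon -c 5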

You should now have working GPU transcode in your lxc container!

If you get the following error, double-check that you have set the correct numeric values in the lxc.cgroup.devices.allow lines, then restart your container.

[h264_nvenc @ 0x559f2a536b40] Cannot init CUDA
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width
or height
Conversion failed!

Another way to tell the values are incorrect is seeing blank (----------) permission lines for the nvidia device nodes. You will see this inside any container that was started before the nvidia devices were initialized by the nvidia-uvm-init script on the host.

$ ls -l /dev/nvidia*
---------- 1 root root        0 Dec  6 18:04 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec  6 19:02 /dev/nvidiactl
---------- 1 root root        0 Dec  6 18:04 /dev/nvidia-modeset
crw-rw-rw- 1 root root 511,   0 Dec  6 19:02 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Dec  6 19:02 /dev/nvidia-uvm-tools

Sometimes, after the host has been up for a long time, /dev/nvidia-uvm or other device nodes may disappear. In that case, simply re-run the nvidia-uvm-init script; you could even schedule it as a cron job.
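
For example, as a root crontab entry (a sketch; mknod will grumble harmlessly about nodes that already exist):

@hourly /usr/local/bin/nvidia-uvm-init >/dev/null 2>&1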

Enabling “swap” in an OpenVZ container

2012-03-23 19:28

The Oracle client for Linux for some reason requires 1 GB of swap space, and will refuse to install even if you have 9999999999 TB of RAM but 0 swap. Go figure.

Anyway, an OpenVZ container created with Proxmox will by default have 0 swap allocated, despite the Web UI allowing you to specify swap space.

In order to add swap to the container, run the following from a shell prompt on the host:

vzctl set 213 --swappages 262144 --save

Where 213 is your CTID, and 262144 is the number of swap pages you want. One page = 4096 bytes, so 262144 pages is roughly 1024 MB.
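
If you want a different amount, the arithmetic is simple: the size in KiB divided by 4. For example:

# pages for 2 GiB of swap, at 4096 bytes per page:
vzctl set 213 --swappages $((2 * 1024 * 1024 / 4)) --save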

Also, the Oracle installer detects 262144 swap pages as 1023 MB and change, so you will have to use something slightly larger, like 262200.

On a somewhat related note, here’s how to install the Oracle client on Debian: debian + php5 + oracle (oci8)

Quick bash script to restore all OpenVZ dumps

2011-10-05 22:57

This script will read the container ID from the file name, and use it to restore the tgz dump to the same ID on the new OpenVZ/Proxmox server.

Note that this only works if the default name for the vzdumps is kept, and it only works for the next 89 years, because I’m lazy.

Thanks to
http://www.cyberciti.biz/faq/bash-loop-over-file/ and http://bashcurescancer.com/10-steps-to-beautiful-shell-scripts.html

#!/bin/bash
# Restore every vzdump to its original container ID on the new server.
# Assumes the default dump names, e.g. vzdump-openvz-213-2011_10_05-22_57_43.tgz
VZDUMPS=/path/to/backups/*.tgz
for f in $VZDUMPS
do
        f2=${f#*openvz-}   # strip everything up to and including "openvz-"
        VEID=${f2%-20*}    # strip the "-20xx..." date suffix, leaving the CTID
        echo "Restoring $f to $VEID"
        vzrestore "$f" "$VEID"
done
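
Save it as something like restore-all.sh on the new server, then:

# chmod +x restore-all.sh
# ./restore-all.sh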

Migration to iWeb and Ubuntu+Proxmox how-to

2009-09-22 19:56

My faithful readers (all 0 of them) may notice that the site is considerably faster now. My blog is now hosted on a proper server over at iweb in Montréal instead of on my home server, leaving the latter free for other tasks.

As seen previously, I was attempting to set up a combination MythTV/OpenVZ server. Well, I finally got it working:

  1. Install Ubuntu Jaunty (9.04, 64-bit) and update until the update manager won’t update no more 😉
  2. Install and configure the MythTV backend. This step’s difficulty may vary depending on the tuner card. My Hauppauge HVR-1600 was supported by Ubuntu out-of-the-box.
  3. Add the Debian Lenny stable and update repos to /etc/apt/sources.list and apt-get update.
  4. Download Linux driver for Intel Pro 1000 PCIe card.
  5. Install vzctl, linux-image-2.6-openvz-amd64, and linux-headers-2.6-openvz-amd64; run update-grub if necessary (see the sketch after this list).
  6. Reboot, make sure openvz kernel is running.
  7. make && make install the Intel e1000e driver.
  8. (Optional) Install Proxmox VE by adding proxmox repo.
  9. (Optional) Install mercurial and hg clone v4l-dvb. The main branch was broken, so I used one of the devs’ personal repos. Then make && make install v4l-dvb; cx18 now works again.
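
For reference, steps 3 through 6 boil down to something like this (a sketch; the Lenny repo lines are typical, check them against an actual mirror):

# append to /etc/apt/sources.list:
#   deb http://ftp.debian.org/debian lenny main
#   deb http://security.debian.org lenny/updates main
apt-get update
apt-get install vzctl linux-image-2.6-openvz-amd64 linux-headers-2.6-openvz-amd64
update-grub
reboot
uname -r    # should now report the openvz kernel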

MythTV/Proxmox (OpenVZ) multi-role server from hell

2009-09-01 22:35

So… I just spent two hours trying to get MythTV running properly on my OpenVZ server (installed via the Proxmox VE bare-metal installer). This is starting to be a lot harder than I thought it would be…

As seen in my previous post, I installed the 2.6.26-2-openvz-amd64 kernel and headers, and compiled v4l-dvb from mercurial, and fixed a little bug with vzctl. Today, I installed the firmware files for my Hauppauge HVR-1600 (see MythTV wiki page), added the Debian-multimedia repo and installed MythTV (apt-get install mythtv). Then I realized I needed X to use mythtv-setup, so for some reason I decided to install KDE 3.5. KDE installed fine (minus some missing files for the kdm theme… wonder why these aren’t included in the kdm package or a dependency…). I then proceeded to create a password for the mythtv user (passwd mythtv as root) and then run mythtv-setup as the mythtv user. I managed to add the sources and scan for channels, but when I tried to “Watch TV”, I was told that the primary backend wasn’t running.

I tried some troubleshooting, but it’s getting kinda late and I’m lacking sleep (as you can probably tell from my grammar), so I decided to try installing a shiny new Intel Pro 1000 Desktop (82574L) PCIe x1 Ethernet card to get my server some gigabit love. Should be simple, right? Intel cards have good driver support, with the e100 and e1000 drivers, so much so that VM solutions like VMware and VirtualBox chose to emulate them as guest hardware. Well, this was not the case today. I popped the card into a free PCIe x1 slot and powered on the PC. Link lights went on and all looked fine and dandy. But once the machine fully booted up (takes a while with all those OpenVZ containers 😉 ), ifconfig showed only the eth0 interface, which is my built-in Realtek/nForce controller. Some further probing with lspci and dmesg showed that the card is alive, but that the e1000 driver didn’t even bother to start up.

At this point, I GIVE UP for tonight. I’m cold from sitting down in the basement, tired from lack of sleep, and frustrated from uncooperative Linux servers.

No streaming TV for me tonight, but I suppose I should be glad that at least the blog is still up and running.

Which brings to mind http://xkcd.com/349/

40% of OpenBSD installs lead to shark attacks. It's their only standing security issue.

Upgrading Proxmox VE kernel

2009-08-31 16:19

I’m currently running this blog from an OpenVZ server managed via Proxmox VE. One issue I had with this setup is that the Proxmox 1.3 installer by default comes with a relatively old kernel (2.6.24), and I want a newer one (>= 2.6.26, so that I can use my cx18-based TV tuner). Fortunately, Proxmox is just a customized version of Debian Lenny, so I just installed the linux-image-2.6.26-2-openvz-amd64 package from apt, then ran update-initramfs -u and update-grub.

After updating the kernel, however, I was unable to start any of my virtual machines from the Proxmox Web UI. Looking at the system log showed a message about vzctl being 32-bit; problem solved by updating vzctl via apt.
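
Condensed, the whole procedure was roughly:

apt-get install linux-image-2.6.26-2-openvz-amd64
update-initramfs -u
update-grub
reboot
# after rebooting into the new kernel:
apt-get install vzctl    # pulls in a 64-bit build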

Now I’m attempting to compile v4l-dvb…. fingers crossed!

hg clone http://linuxtv.org/hg/v4l-dvb
