rationale Link to heading
- hosting an ai chatbot in my homelab
- using docker/podman
- getting the nvidia gpu working with docker
- preventing sneaky eyes from seeing my private "sudo make-me-a-bomb" conversations with the chatbot
- pulling several open llm models
- using a fancy webui
todo Link to heading
- going full podman for rootless containers
- adding a reverse proxy/sso to host for friends
- limiting resources per user
- adding an smtp/sendmail server for user/auth management
- doing metrics with prometheus/grafana
- integrating stable diffusion (works, but the conf needs cleanup)
prerequisites Link to heading
- a gpu (cpu-only inference is painfully slow)
- a linux box with some ram/cpu
- docker and the nvidia drivers installed
- an ssd with enough space for llm weights and .safetensors files
base os Link to heading
i use fedora server (40), but any mainstream distro should do the job thanks to docker's portability.
to use podman instead, check nvidia's container device interface (cdi) support.
sudo dnf install -y docker-compose
sudo systemctl disable --now podman
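for the podman route, nvidia's cdi docs (linked in the resources chapter) boil down to roughly this, a sketch i haven't tested on my box:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi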
nvidia stuff Link to heading
i'm using an nvidia quadro M4000 with 8 GiB of GDDR5.
i needed these nvidia packages:
dnf repoquery --installed | grep nvidia
kmod-nvidia-latest-dkms-3:555.42.06-1.fc39.x86_64
libnvidia-container-tools-0:1.15.0-1.x86_64
libnvidia-container1-0:1.15.0-1.x86_64
nvidia-container-toolkit-0:1.15.0-1.x86_64
nvidia-container-toolkit-base-0:1.15.0-1.x86_64
nvidia-docker2-0:2.14.0-1.noarch
nvidia-driver-3:555.42.06-1.fc39.x86_64
nvidia-driver-NVML-3:555.42.06-1.fc39.x86_64
nvidia-driver-NvFBCOpenGL-3:555.42.06-1.fc39.x86_64
nvidia-driver-cuda-3:555.42.06-1.fc39.x86_64
nvidia-driver-cuda-libs-3:555.42.06-1.fc39.x86_64
nvidia-driver-devel-3:555.42.06-1.fc39.x86_64
nvidia-driver-libs-3:555.42.06-1.fc39.x86_64
nvidia-gpu-firmware-0:20240709-1.fc40.noarch
nvidia-kmod-common-3:555.42.06-1.fc39.noarch
nvidia-libXNVCtrl-3:555.42.06-1.fc39.x86_64
nvidia-libXNVCtrl-devel-3:555.42.06-1.fc39.x86_64
nvidia-modprobe-3:555.42.06-2.fc39.x86_64
nvidia-persistenced-3:555.42.06-1.fc39.x86_64
nvidia-settings-3:555.42.06-1.fc39.x86_64
nvidia-xconfig-3:555.42.06-2.fc39.x86_64
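before wiring up docker, it's worth confirming the driver itself works on the host:
nvidia-smi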
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf update -y && sudo dnf install -y nvidia-container-toolkit nvidia-docker2
tell the nvidia container toolkit to configure docker's runtime:
sudo nvidia-ctk runtime configure --runtime=docker
sudo sed -i '/#user = "root:video"/c\user = "root:root"' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker
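to confirm docker picked up the nvidia runtime (i'd expect nvidia to show up in the runtimes line):
sudo docker info | grep -i runtimes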
check that nvidia-smi works from inside a container:
sudo docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:12.5.0-runtime-ubuntu22.04 nvidia-smi
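for reference, the plain variant below may be enough on setups that don't hit the /dev permission issue covered in the troubleshooting chapter; on mine, privileged mode was required:
sudo docker run --rm --gpus all nvidia/cuda:12.5.0-runtime-ubuntu22.04 nvidia-smi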
compose stuff Link to heading
~/ollama_stack/docker-compose.yml
services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    expose:
      - 8080/tcp
    ports:
      - 8080:8080/tcp
    environment:
      - OLLAMA_BASE_URL=http://192.168.10.60:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama
    # privileged is needed so the container can see the nvidia devices
    # (see the troubleshooting chapter) - security tradeoff!
    privileged: true
    expose:
      - 11434/tcp
    ports:
      - 11434:11434/tcp
    healthcheck:
      test: ollama --version || exit 1
    command: serve
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]

volumes:
  ollama:
  open-webui:
replace OLLAMA_BASE_URL=http://192.168.10.60:11434 with your real ip/url, or with localhost if you need more security.
for cloudflare tunnels and the like, check the resources chapter.
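one way to do the local-only option (a minimal sketch, adjust the webui service in your compose file): bind the published port to 127.0.0.1 so only the host itself can reach it.
ports:
  - 127.0.0.1:8080:8080/tcp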
sudo docker-compose up -d
to check if it runs properly
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
597d7d3b093a ghcr.io/open-webui/open-webui:main "bash start.sh" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp ollama_webui_1
f700bfe9e83e ollama/ollama "/bin/ollama serve" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:11434->11434/tcp, :::11434->11434/tcp ollama_ollama_1
you can now pull llms into your ollama instance, easy peasy
docker exec -it ollama_ollama_1 ollama run llama3
docker exec -it ollama_ollama_1 ollama run codegemma
all compatible models are listed in the ollama library: https://ollama.com/library
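ollama's cli also has list and rm subcommands if you want to see what's pulled or reclaim some ssd space (using the container name shown by docker ps above):
docker exec -it ollama_ollama_1 ollama list
docker exec -it ollama_ollama_1 ollama rm llama3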
docker compose at startup Link to heading
/etc/systemd/system/docker-ollama.service
[Unit]
Description=Docker Compose Application Service
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/myuser/ollama_stack
ExecStart=/usr/bin/docker-compose up -d
ExecStop=/usr/bin/docker-compose down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
then
sudo systemctl daemon-reload
sudo systemctl enable --now docker-ollama
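after a reboot, check that the unit did its job:
sudo systemctl status docker-ollama
sudo docker ps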
troubleshooting Link to heading
in case you need a fresh restart and want to destroy your volumes
docker system prune -a --volumes
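if you only want to reset this stack instead of everything on the host, a narrower option is dropping just its volumes from the compose directory:
sudo docker-compose down -v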
if nvidia/cuda is not detected, only cpu/ram will be used:
time=2024-07-12T17:16:19.715Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-12T17:16:19.719Z level=INFO source=gpu.go:526 msg="no nvidia devices detected" library=/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06
time=2024-07-12T17:16:19.722Z level=INFO source=gpu.go:324 msg="no compatible GPUs were discovered"
time=2024-07-12T17:16:19.722Z level=INFO source=types.go:103 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.7 GiB" available="60.8 GiB"
then you need to run your docker image with privileged rights!
- warning: this can cause security issues
add privileged: true to your docker-compose.yml
time=2024-07-12T17:41:34.798Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-12T17:41:35.194Z level=INFO source=types.go:103 msg="inference compute" id=GPU-c64b91a4-ab17-5476-9aea-e6c70e0c50e4 library=cuda compute=5.2 driver=12.5 name="Quadro M4000" total="7.9 GiB" available="7.9 GiB"
if you don't want to access the webui via http://192.168.10.60:8080 you can
- set up docker compose to serve locally only
- bind the remote docker's open-webui port to localhost via ssh
example
ssh -i .ssh/mykey -L 1337:localhost:8080 myuser@192.168.10.60
then access it from your laptop via
http://localhost:1337/auth/
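to keep the tunnel in the background instead of holding a shell open, ssh's -f and -N flags do the trick:
ssh -fN -i .ssh/mykey -L 1337:localhost:8080 myuser@192.168.10.60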
resources Link to heading
- docker for ollama
https://github.com/OttCS/automatic1111-webui-fedora
https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
https://github.com/NVIDIA/nvidia-docker/issues/1547
https://hub.docker.com/r/nvidia/cuda/
- open-webui troubleshooting
https://github.com/open-webui/open-webui
https://github.com/open-webui/open-webui#troubleshooting
https://docs.openwebui.com/troubleshooting/
- implementing websearch/images
https://docs.openwebui.com/tutorial/web_search
https://docs.openwebui.com/tutorial/images
- podman stuff
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
https://github.com/ericcurtin/podman-ollama
https://github.com/ericcurtin/podman-llm
https://github.com/NVIDIA/nvidia-container-toolkit/issues/210
- stable diffusion
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Containers
https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/master/docker-compose.yml
- speech
https://docs.openwebui.com/tutorial/openedai-speech-integration/