rationale Link to heading
- hosting an ai chatbot in my homelab
- using docker/podman
- getting the nvidia gpu working with docker
- preventing sneaky eyes from seeing my private "sudo make-me-a-bomb" conversations with the chatbot
- pulling several open llm models
- using a fancy webui
todo Link to heading
- going full podman for rootless containers
- adding a reverse proxy/sso to host for friends
- limiting resources per user
- adding an smtp/sendmail server for user/auth management
- doing metrics with prometheus/grafana
- integrating stable diffusion (works, but the conf needs cleanup)
prerequisites Link to heading
- a gpu (cpu-only inference is painfully slow)
- a linux box with some ram/cpu
- docker and the nvidia drivers installed
- an ssd with enough space for llm weights and .safetensors files
base os Link to heading
i use fedora server (40), but any mainstream distro should do the job thanks to docker's portability.
to use podman instead, check nvidia's container device interface (cdi) support.
sudo dnf install -y docker-compose
sudo systemctl disable --now podman
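for the podman route, nvidia's cdi docs (linked in the resources chapter) boil down to roughly this, a sketch i haven't tested on my box:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi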
nvidia stuff Link to heading
i'm using an nvidia quadro M4000 with 8 GiB of GDDR5.
i needed these nvidia packages:
dnf repoquery --installed | grep nvidia
kmod-nvidia-latest-dkms-3:555.42.06-1.fc39.x86_64
libnvidia-container-tools-0:1.15.0-1.x86_64
libnvidia-container1-0:1.15.0-1.x86_64
nvidia-container-toolkit-0:1.15.0-1.x86_64
nvidia-container-toolkit-base-0:1.15.0-1.x86_64
nvidia-docker2-0:2.14.0-1.noarch
nvidia-driver-3:555.42.06-1.fc39.x86_64
nvidia-driver-NVML-3:555.42.06-1.fc39.x86_64
nvidia-driver-NvFBCOpenGL-3:555.42.06-1.fc39.x86_64
nvidia-driver-cuda-3:555.42.06-1.fc39.x86_64
nvidia-driver-cuda-libs-3:555.42.06-1.fc39.x86_64
nvidia-driver-devel-3:555.42.06-1.fc39.x86_64
nvidia-driver-libs-3:555.42.06-1.fc39.x86_64
nvidia-gpu-firmware-0:20240709-1.fc40.noarch
nvidia-kmod-common-3:555.42.06-1.fc39.noarch
nvidia-libXNVCtrl-3:555.42.06-1.fc39.x86_64
nvidia-libXNVCtrl-devel-3:555.42.06-1.fc39.x86_64
nvidia-modprobe-3:555.42.06-2.fc39.x86_64
nvidia-persistenced-3:555.42.06-1.fc39.x86_64
nvidia-settings-3:555.42.06-1.fc39.x86_64
nvidia-xconfig-3:555.42.06-2.fc39.x86_64
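before wiring up docker, it's worth confirming the driver itself works on the host:
nvidia-smi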
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf update -y && sudo dnf install -y nvidia-container-toolkit nvidia-docker2
tell the nvidia container toolkit to configure docker's runtime:
sudo nvidia-ctk runtime configure --runtime=docker
sudo sed -i '/#user = "root:video"/c\user = "root:root"' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker
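to confirm docker picked up the nvidia runtime (i'd expect nvidia to show up in the runtimes line):
sudo docker info | grep -i runtimes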
check that nvidia-smi works from inside a container:
sudo docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:12.5.0-runtime-ubuntu22.04 nvidia-smi
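for reference, the plain variant below may be enough on setups that don't hit the /dev permission issue covered in the troubleshooting chapter; on mine, privileged mode was required:
sudo docker run --rm --gpus all nvidia/cuda:12.5.0-runtime-ubuntu22.04 nvidia-smi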
compose stuff Link to heading
~/ollama_stack/docker-compose.yml
services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    expose:
      - 8080/tcp
    ports:
      - 8080:8080/tcp
    environment:
      - OLLAMA_BASE_URL=http://192.168.10.60:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama
    # privileged is needed so the container can see the nvidia devices
    # (see the troubleshooting chapter) - security tradeoff!
    privileged: true
    expose:
      - 11434/tcp
    ports:
      - 11434:11434/tcp
    healthcheck:
      test: ollama --version || exit 1
    command: serve
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['all']
              capabilities: [gpu]

volumes:
  ollama:
  open-webui:
replace OLLAMA_BASE_URL=http://192.168.10.60:11434 with your real ip/url, or with localhost if you need more security.
for cloudflare tunnels and the like, check the resources chapter.
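one way to do the local-only option (a minimal sketch, adjust the webui service in your compose file): bind the published port to 127.0.0.1 so only the host itself can reach it.
ports:
  - 127.0.0.1:8080:8080/tcp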
sudo docker-compose up -d
to check if it runs properly
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
597d7d3b093a ghcr.io/open-webui/open-webui:main "bash start.sh" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp ollama_webui_1
f700bfe9e83e ollama/ollama "/bin/ollama serve" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:11434->11434/tcp, :::11434->11434/tcp ollama_ollama_1
you can now pull llms into your ollama instance, easy peasy
docker exec -it ollama_ollama_1 ollama run llama3
docker exec -it ollama_ollama_1 ollama run codegemma
all compatible models are listed in the ollama library: https://ollama.com/library
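ollama's cli also has list and rm subcommands if you want to see what's pulled or reclaim some ssd space (using the container name shown by docker ps above):
docker exec -it ollama_ollama_1 ollama list
docker exec -it ollama_ollama_1 ollama rm llama3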
docker compose at startup Link to heading
/etc/systemd/system/docker-ollama.service
[Unit]
Description=Docker Compose Application Service
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/myuser/ollama_stack
ExecStart=/usr/bin/docker-compose up -d
ExecStop=/usr/bin/docker-compose down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
then
sudo systemctl daemon-reload
sudo systemctl enable --now docker-ollama
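after a reboot, check that the unit did its job:
sudo systemctl status docker-ollama
sudo docker ps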
troubleshooting Link to heading
in case you need a fresh restart and want to destroy your volumes
docker system prune -a --volumes
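if you only want to reset this stack instead of everything on the host, a narrower option is dropping just its volumes from the compose directory:
sudo docker-compose down -v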
if nvidia/cuda is not detected, only cpu/ram will be used:
time=2024-07-12T17:16:19.715Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-12T17:16:19.719Z level=INFO source=gpu.go:526 msg="no nvidia devices detected" library=/usr/lib/x86_64-linux-gnu/libcuda.so.555.42.06
time=2024-07-12T17:16:19.722Z level=INFO source=gpu.go:324 msg="no compatible GPUs were discovered"
time=2024-07-12T17:16:19.722Z level=INFO source=types.go:103 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.7 GiB" available="60.8 GiB"
then you need to run your docker image with privileged rights!
- warning: this can cause security issues
add privileged: true to your docker-compose.yml
time=2024-07-12T17:41:34.798Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-12T17:41:35.194Z level=INFO source=types.go:103 msg="inference compute" id=GPU-c64b91a4-ab17-5476-9aea-e6c70e0c50e4 library=cuda compute=5.2 driver=12.5 name="Quadro M4000" total="7.9 GiB" available="7.9 GiB"
if you don't want to access the webui via http://192.168.10.60:8080 you can
- set up docker compose to serve locally only
- bind the remote docker's open-webui port to localhost via ssh
example
ssh -i .ssh/mykey -L 1337:localhost:8080 myuser@192.168.10.60
then access it from your laptop via
http://localhost:1337/auth/
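to keep the tunnel in the background instead of holding a shell open, ssh's -f and -N flags do the trick:
ssh -fN -i .ssh/mykey -L 1337:localhost:8080 myuser@192.168.10.60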
resources Link to heading
- docker for ollama
https://github.com/OttCS/automatic1111-webui-fedora
https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md
https://github.com/NVIDIA/nvidia-docker/issues/1547
https://hub.docker.com/r/nvidia/cuda/
- open-webui troubleshooting
https://github.com/open-webui/open-webui
https://github.com/open-webui/open-webui#troubleshooting
https://docs.openwebui.com/troubleshooting/
- implementing websearch/images
https://docs.openwebui.com/tutorial/web_search
https://docs.openwebui.com/tutorial/images
- podman stuff
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
https://github.com/ericcurtin/podman-ollama
https://github.com/ericcurtin/podman-llm
https://github.com/NVIDIA/nvidia-container-toolkit/issues/210
- stable diffusion
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Containers
https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/master/docker-compose.yml
- speech
https://docs.openwebui.com/tutorial/openedai-speech-integration/