Helmholtz BLABLADOR

An experimental Large Language Model server

Alexandre Strube

September 26, 2023

Website

/ˈblæblæˌdɔɹ/
Bla-bla-bla 🗣️ + Labrador 🐕‍🦺
A stage for deploying and testing large language models
Models change constantly (constantly improving rank, some good, some awful)
Usually a code model and one of the top of the HF’s Open LLM Leaderboard
It is a web server and an api server. The API server is only available on the intranet.

We have no models of our own (yet) deployed
Most models for 2 gpus can be quantized for 1 gpu with GPTQ
Models based on Llama2-70 🦙 take 7 gpus (or 8 with vLLM)
VLLM: PagedAttention, batching etc. Speeds up inference at cost of gpus
No data collection at all. I don’t keep ANY data whatsoever
- We could, there’s code for ranking answers, or running models in parallel and voting
- So far, it has been simpler for GDPR (as there’s nothing to deal with)
- (I just wrote datenschutz on bing image creator)

A classic MVC web application:
- Model: large language model(s)
- View: a web server and api server (openAI-compatible)
- Controller: coordinates the models
A collaboration with LM-Sys (From Vicuña 🦙 fame)
- FastChat: https://chat.lmsys.org
Python app, Runs on bare metal 🤘 with venvs
Models run in different sc_venv_templates
- Conflicting versions of libraries
Website Authentication: Helmholtz AAI, no records, just to keep the bots out

Uses openai-python from OpenAI itself
All services which can use OpenAI’s API can use Blablador’s API (Jypyter, etc)
Only available on the intranet/vpn (yet)
The API is not yet authenticated,rate-limited, logged, monitored, documented or well-tested.
HOWTO:

export OPENAI_API_KEY = "EMPTY"
export OPENAI_API_BASE="https://haicluster1.fz-juelich.de:8000/v1"

(haicluster1 is offline this week, use haicluster2)

export OPENAI_API_BASE="https://helmholtz-blablador.fz-juelich.de:8000/v1"