Helmholtz BLABLADOR

An experimental Large Language Model server

Alexandre Strube

February 22, 2024

Website

https://helmholtz-blablador.fz-juelich.de

Blablador

  • /ˈblæblæˌdɔɹ/
  • Bla-bla-bla 🗣️ + Labrador 🐕‍🦺
  • A stage for deploying and testing large language models
  • Models change constantly (following the ever-improving leaderboard ranks; some good, some awful)
  • Usually a small/fast model and one from the top of HF’s Open LLM Leaderboard
  • It is a web server, an API server, and training code.

Why?

  • AI is becoming basic infrastructure
  • Which historically is Open Source
  • We train a lot, deploy little: here are your code and weights, tschüssi (bye)!
  • Little experience in dealing with LLMs
  • From the tools point of view, this is a FAST moving target 🎯💨
  • Acquire local experience in issues like
    • data loading,
    • quantization,
    • distribution,
    • fine-tune LLMs for specific tasks,
    • inference speed,
    • deployment
  • Projects like OpenGPT-X, TrustLLM and Laion need a place to run
  • The usual: we want to be ready when the time comes
  • TL;DR: BECAUSE WE CAN! 🤘

Some facts

  • We have no models of our own deployed (yet)
  • Models based on Llama2-70B 🦙 take 7 GPUs (or 8 with vLLM)
  • vLLM: PagedAttention, batching, etc. Speeds up inference at the cost of GPUs (no quantization); see the sketch after this list
  • SGLang: RadixAttention - even faster inference. Single-GPU for now
  • Mixtral-8x7B: takes 7 GPUs, but is faster and better than Llama2-70B
  • No data collection at all. I don’t keep ANY data whatsoever
    • You can use it AND keep your data private
    • We could: there is code for ranking answers, or for running models in parallel and voting
    • No records? Privacy! GDPR is happy
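
  • A minimal sketch of serving a big model with vLLM’s tensor parallelism, assuming 8 GPUs and the public meta-llama/Llama-2-70b-chat-hf weights; the model name and sampling settings are illustrative, this is not Blablador’s actual launch code:

        # Sketch: shard a Llama-2-70B chat model over 8 GPUs with vLLM.
        from vllm import LLM, SamplingParams

        llm = LLM(model="meta-llama/Llama-2-70b-chat-hf",  # assumed model id
                  tensor_parallel_size=8)                  # weights sharded over 8 GPUs
        params = SamplingParams(temperature=0.7, max_tokens=128)
        outputs = llm.generate(["What is the Helmholtz Association?"], params)
        print(outputs[0].outputs[0].text)

  • vLLM splits the attention heads across GPUs, so the GPU count has to divide the head count; that is presumably why it needs 8 GPUs where plain sharding gets by with 7.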

Deployment as a service

  • Scientists (currently just from FZJ) can deploy their models on their hardware and point them to Blablador
  • This solves a bunch of headaches for researchers:
    • Authentication
    • Web server
    • Firewall
    • Availability
    • Etc
  • If you have a model and want to deploy it, contact me!

What is it?

  • A classic MVC web application:
    • Model: large language model(s)
    • View: a web server and api server (openAI-compatible)
    • Controller: coordinates the models
  • A collaboration with LM-Sys (of Vicuna 🦙 fame)
  • Python app, runs on bare metal 🤘 with venvs
  • Models run in different sc_venv_templates
    • Avoids conflicting versions of libraries
  • Website Authentication: Helmholtz AAI, no records, just to keep the bots out
  • API Authentication: token with API scope from Helmholtz Codebase

OpenAI-compatible API

  • Uses openai-python from OpenAI itself
  • Any service that can use OpenAI’s API can use Blablador’s API (VSCode’s Continue.dev, etc.)
  • The API is not yet rate-limited, logged, monitored, documented or well-tested.
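
  • A minimal sketch with the openai-python v1 client, assuming the endpoint shown in the cURL demo later in these slides and the Mistral-7B-Instruct-v0.2 model from the Continue.dev config:

        # Sketch: point the standard OpenAI client at Blablador's endpoint.
        from openai import OpenAI

        client = OpenAI(
            api_key="YOUR_TOKEN_GOES_HERE",  # token with API scope
            base_url="https://helmholtz-blablador.fz-juelich.de:8000/v1",
        )

        # List the models currently being served
        for model in client.models.list():
            print(model.id)

        # Ask one of them a question
        response = client.chat.completions.create(
            model="Mistral-7B-Instruct-v0.2",
            messages=[{"role": "user", "content": "What is the Helmholtz Association?"}],
        )
        print(response.choices[0].message.content)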

Jülich Supercomputers

JSC Supercomputer Strategy

Jülich Supercomputers

JSC Machine Room

Well, no. Not really.

Haicluster

JSC CLOUD

  • Web server frontend
  • Authentication
  • API server
  • Workers’ controller

Haicluster

  • Currently hosting the workers (models)
  • 2 nodes with 4x NVIDIA 3090 (24 GB), 256 GB RAM
  • 1 node with 8x NVIDIA 3090 (24 GB), 256 GB RAM
  • GlusterFS:
    • 6 TB SSD for /home,
    • 24 TB HDD for data
  • Software: Ubuntu 20.04 LTS and EasyBuild

Demo: Website

Demo: API

  • Go to /v1/models
  • Click on Try it out
  • Click on Execute

Demo: cURL

  • curl --header "Authorization: Bearer MY_TOKEN_GOES_HERE"   https://helmholtz-blablador.fz-juelich.de:8000/v1/models

Demo: VSCode + Continue.dev

  • Yes. It DOES run with Emacs too. Ask your favorite Emacs expert.
  • And vim as well. I guess.
  • But this demo is for VSCode. Sorry.
  • Add continue.dev extension to VSCode
  • In Continue, choose to add a model, then choose "Other OpenAI-compatible API"
  • Click on "Open config.json" at the end

Demo: VSCode + Continue.dev

  • Inside config.json, add this to the "models" section:

  •     {
          "title": "Mistral helmholtz",
          "provider": "openai",
          "contextLength": 16384,
          "model": "Mistral-7B-Instruct-v0.2",
          "apiKey": "YOUR_TOKEN_GOES_HERE",
          "apiBase": "https://helmholtz-blablador.fz-juelich.de:8000"
        },
  • Try with the other models you got from the API!

Demo: VSCode + Continue.dev

  • Select some code in a python file
  • Right click, Continue: Ask Code -> Add Highlighted Code to Context
  • Ask Blablador to explain this code!
  • Can also fix, add tests, etc

What can you do with it?

It’s being used in the wild!

https://indico.desy.de/event/38849/contributions/162118/
Dmitriy Kostunin’s talk earlier today at LIPS24
cosmosage is deployed on Blablador as of today!

It’s being used in the wild!

  • Someone reverse-engineered the API and created a Python package

It’s being used in the wild!

https://git.geomar.de/everardo-gonzalez/blablador-python-bindings

It’s being used in the wild!

  • GEOMAR created a chatbot for their website
  • Scanned their material and created embeddings (RAG, sketched below)
  • Calls Blablador’s API with the embeddings and gets answers
  • Product of the NFDI hackathon DataXplorers (and beyond)
  • It’s called TL;DR (Too Long; Didn’t Read) - Source code
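
  • A minimal sketch of that RAG pattern, not GEOMAR’s actual code: the text chunks, the embedding model (sentence-transformers’ all-MiniLM-L6-v2) and the prompt wording are illustrative assumptions:

        # Sketch: embed local text chunks, retrieve the closest one for a question,
        # and ask Blablador to answer using only that context (RAG).
        import numpy as np
        from openai import OpenAI
        from sentence_transformers import SentenceTransformer

        chunks = [
            "GEOMAR is the Helmholtz Centre for Ocean Research Kiel.",
            "Blablador offers an OpenAI-compatible API to several LLMs.",
        ]

        embedder = SentenceTransformer("all-MiniLM-L6-v2")
        chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

        question = "What is Blablador?"
        q_vec = embedder.encode([question], normalize_embeddings=True)[0]
        context = chunks[int(np.argmax(chunk_vecs @ q_vec))]  # cosine similarity

        client = OpenAI(api_key="YOUR_TOKEN_GOES_HERE",
                        base_url="https://helmholtz-blablador.fz-juelich.de:8000/v1")
        answer = client.chat.completions.create(
            model="Mistral-7B-Instruct-v0.2",
            messages=[{"role": "user",
                       "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
        )
        print(answer.choices[0].message.content)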

It’s being used in the wild!

https://zenodo.org/records/10376144

It’s being used in the wild!

  • EUDAT is a collection of data management services
  • Has an instance of Nextcloud (file sharing, office, calendar, etc.)
  • It’s integrating AI deeply into its services, backed by Blablador!

It’s being used in the wild!

  • FZJ’s IEK7 (Stratosphere) is also using Blablador on their IEK7Cloud

Todo

  • Run gigantic models multi-node (e.g. Falcon-180B)
  • Open for other Helmholtz centres to point their models to Blablador (no one has asked yet)
  • Multi-GPU for SGLang (for Mixtral and other models that are too big for my GPUs)
  • Multi-modal models (text+image, text+audio, etc.)
  • Auto-RAG with privacy: don’t upload the PDF, keep the vector database in the browser’s RAM

Maybe

  • An arena to evaluate models (that would imply keeping data)
  • There is code for fine-tuning and training models, but no users
    • Everyone chooses their own training setup; this would be just one more

Questions?

https://strube1.pages.jsc.fz-juelich.de/2024-02-talk-lips-blablador/

GitLab link to the source code of the slides (requires a JUDOOR account)

Blablador with a proper paw