Helmholtz BLABLADOR

An experimental Large Language Model server

Alexandre Strube

February 22, 2024

Website

https://helmholtz-blablador.fz-juelich.de

Blablador

  • /ˈblæblæˌdɔɹ/
  • Bla-bla-bla 🗣️ + Labrador 🐕‍🦺
  • A stage for deploying and testing large language models
  • Models change constantly (rankings keep improving; some are good, some awful)
  • Usually a small/fast model and one from the top of HF’s Open LLM Leaderboard
  • It is a web server, an API server, and training code.

“I think the complexity of Python package management holds down AI application development more than is widely appreciated. AI faces multiple bottlenecks — we need more GPUs, better algorithms, cleaner data in large quantities. But when I look at the day-to-day work of application builders, there’s one additional bottleneck that I think is underappreciated: The time spent wrestling with version management is an inefficiency I hope we can reduce.”

Andrew Ng, 28.02.2024

“Building on top of open source can mean hours wrestling with package dependencies, or sometimes even juggling multiple virtual environments or using multiple versions of Python in one application. This is annoying but manageable for experienced developers, but creates a lot of friction for new AI developers entering our field without a background in computer science or software engineering.”

Andrew Ng, 28.02.2024

Why?

  • AI is becoming basic infrastructure
  • Which historically is Open Source
  • We train a lot, deploy little: here is your code/weights, bye!
  • Little experience with dealing with LLMs
  • From the tools point of view, this is a FAST moving target 🎯💨
  • Acquire local experience in issues like
    • data loading,
    • quantization,
    • distribution,
    • fine-tune LLMs for specific tasks,
    • inference speed,
    • deployment
  • Projects like OpenGPT-X, TrustLLM and Laion need a place to run
  • The usual: we want to be ready when the time comes
  • TL;DR: BECAUSE WE CAN! 🤘

Some facts

  • We have no models of our own (yet) deployed
  • Models based on Llama2-70B 🦙 take 7 GPUs (or 8 with vLLM)
  • vLLM: PagedAttention, batching etc. Speeds up inference at the cost of GPUs (no quantization)
  • SGLang: RadixAttention - even faster inference. Single-GPU for now
  • Mixtral-8x7B: takes 7 GPUs, but is faster and better than Llama2-70B
  • No data collection at all. I don’t keep ANY data whatsoever
    • You can use it AND keep your data private
    • We could collect data: there’s code for ranking answers, and for running models in parallel and voting
    • No records? Privacy! GDPR is happy
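The parallel-models-and-voting idea mentioned above could look like this minimal sketch. Nothing like this is deployed on Blablador; the function name and the example answers are purely illustrative:

```python
# Hypothetical sketch of "running models in parallel and voting":
# each model answers the same question, and the most common answer wins.
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among the models' outputs."""
    return Counter(answers).most_common(1)[0][0]
```

With three models answering the same question, `majority_vote(["Paris", "Paris", "Lyon"])` returns `"Paris"`.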

Deployment as a service

  • Scientists (currently just from FZJ) can deploy their models on their own hardware and point them to Blablador
  • This solves a bunch of headaches for researchers:
    • Authentication
    • Web server
    • Firewall
    • Availability
    • Etc
  • If you have a model and want to deploy it, contact me!

What is it?

  • A classic MVC web application:
    • Model: large language model(s)
    • View: a web server and api server (openAI-compatible)
    • Controller: coordinates the models
  • A collaboration with LM-Sys (of Vicuna 🦙 fame)
  • Python app, runs on bare metal 🤘 with venvs
  • Models run in separate sc_venv_template environments
    • Avoids conflicting versions of libraries
  • Website Authentication: Helmholtz AAI, no records, just to keep the bots out
  • API Authentication: token with API scope from Helmholtz Codebase

OpenAI-compatible API

  • Uses openai-python from OpenAI itself
  • All services which can use OpenAI’s API can use Blablador’s API (VSCode’s Continue.dev, etc)
  • The API is not yet rate-limited, logged, monitored, documented or well-tested.
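Because the API is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal stdlib sketch of building a chat request (the endpoint and model name come from later slides; the token placeholder and helper name are illustrative, not part of any official client):

```python
# Build an OpenAI-style chat-completions request for Blablador
# using only the Python standard library.
import json
import urllib.request

BASE = "https://helmholtz-blablador.fz-juelich.de:8000"

def build_chat_request(token: str, prompt: str,
                       model: str = "Mistral-7B-Instruct-v0.2") -> urllib.request.Request:
    """Return a ready-to-send request for /v1/chat/completions."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        BASE + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

# Sending it requires a real token:
# with urllib.request.urlopen(build_chat_request(TOKEN, "Hi")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

The official openai-python client works the same way once its `base_url` is pointed at the endpoint above.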

Jülich Supercomputers

JSC Supercomputer Strategy

Jülich Supercomputers

JSC Machine Room

Well, no. Not really.

Haicluster

JSC CLOUD

  • Web server frontend
  • Authentication
  • API server
  • Workers’ controller

Haicluster

  • Currently hosting the workers (models)
  • 2 nodes with 4× NVIDIA RTX 3090 (24 GB), 256 GB RAM
  • 1 node with 8× NVIDIA RTX 3090 (24 GB), 256 GB RAM
  • GlusterFS:
    • 6 TB SSD for /home,
    • 24 TB HDD for data
  • Software: Ubuntu 20.04 LTS and EasyBuild

Demo: Website

Demo: API

  • Go to /v1/models
  • Click on Try it out
  • Click on Execute

Demo: cURL

  • curl --header "Authorization: Bearer MY_TOKEN_GOES_HERE"   https://helmholtz-blablador.fz-juelich.de:8000/v1/models

Demo: VScode + Continue.dev

  • Yes. It DOES run with Emacs too. Ask your favorite Emacs expert.
  • And vim as well. I guess.
  • But this demo is for VSCode. Sorry.
  • Add continue.dev extension to VSCode
  • On Continue, choose to add model, choose Other OpenAI-compatible API
  • Click on Open config.json at the end

Demo: VScode + Continue.dev

  • Inside config.json, add at the "models" section:

        {
          "title": "Mistral helmholtz",
          "provider": "openai",
          "contextLength": 16384,
          "model": "Mistral-7B-Instruct-v0.2",
          "apiKey": "YOUR_TOKEN_GOES_HERE",
          "apiBase": "https://helmholtz-blablador.fz-juelich.de:8000"
        },
  • Try with the other models you got from the API!

Demo: VScode + Continue.dev

  • Select some code in a python file
  • Right click, Continue: Ask Code -> Add Highlighted Code to Context
  • Ask Blablador to explain this code!
  • Can also fix, add tests, etc

What can you do with it?

It’s being used in the wild!

https://indico.desy.de/event/38849/contributions/162118/
Dmitriy Kostunin’s talk earlier today at LIPS24
cosmosage is deployed on Blablador as of today!

It’s being used in the wild!

  • Someone reverse-engineered the API and created a Python package

It’s being used in the wild!

https://git.geomar.de/everardo-gonzalez/blablador-python-bindings

It’s being used in the wild!

  • GEOMAR created a chatbot for their website
  • Scanned their material and created embeddings (RAG)
  • Calls Blablador’s API with embeddings and gets answers
  • Product of the NFDI hackathon DataXplorers // and beyond
  • It’s called TL;DR (Too Long; Didn’t Read) - Source code
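The GEOMAR flow described above (embed the material, retrieve the closest passage, ask the model) can be sketched roughly as follows. A toy bag-of-words vector stands in for a real embedding model, and all function names are illustrative, not taken from their code:

```python
# Rough sketch of the RAG flow: embed documents, retrieve the closest
# one for a query, and prepend it as context for the LLM call.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a real system uses a neural model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query: str, docs: list[str]) -> str:
    """Pick the most similar document and wrap it around the question."""
    context = max(docs, key=lambda d: cosine(embed(query), embed(d)))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt would then be sent to Blablador’s chat endpoint as a normal user message.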

It’s being used in the wild!

https://zenodo.org/records/10376144

It’s being used in the wild!

  • EUDAT is a collection of data management services
  • Has an instance of NextCloud (File Share, Office, Calendar etc)
  • It’s integrating AI deeply into its services, backed by Blablador!

It’s being used in the wild!

  • FZJ’s IEK7 (Stratosphere) is also using Blablador on their IEK7Cloud

Todo

  • Run gigantic models multi-node (e.g. Falcon-180B)
  • Open for other Helmholtz centres to point their models to Blablador (no one has asked yet)
  • Multi-GPU for SGLang (For Mixtral, and other models which are too big for my GPUs)
  • Multi-modal models (text+image, text+audio, etc)
  • Auto-RAG with privacy: don’t upload PDFs; keep the vector database in the browser’s RAM

Maybe

  • ARENA to evaluate models (that would imply keeping data)
  • There’s code for fine-tuning and training models, but no users
    • Everyone chooses their own training framework; this would be just one more

Questions?

https://strube1.pages.jsc.fz-juelich.de/2024-02-talk-lips-blablador/

Gitlab link to source code of the slides (need JUDOOR account)

Blablador with a proper paw