Using OpenLLM to Quickly Build and Deploy Large Language Model Applications

This presentation explains how OpenLLM, an open‑source LLM framework, together with BentoML, addresses the challenges of deploying large language models by offering model switching, memory optimizations, multi‑GPU support, observability, and easy containerized deployment for production AI applications.

DataFunTalk

The talk introduces OpenLLM, an open‑source framework for developing and deploying large language models, and explains its background and the rapid growth of LLMs since ChatGPT.

It describes the motivations of BentoML customers for having their own controllable, secure, and cost‑effective LLMs, and outlines the challenges of operating LLMs in production such as hardware constraints, scalability, throughput and latency.

The core features of OpenLLM are presented, including support for many open‑source models, one‑line model switching, built‑in quantization, token streaming, continuous batching and paged attention, as well as multi‑GPU support and metrics monitoring.
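To make the benefit of continuous batching concrete, here is a toy simulation in plain Python. It is only a sketch of the scheduling idea, not OpenLLM's implementation: the function names are hypothetical, each decode step is assumed to produce exactly one token per active sequence, and every requested length is assumed to be positive.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch must finish before the next one starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # A static batch keeps running until its longest sequence is done,
        # so short sequences leave their slots idle in the meantime.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished sequence's slot is refilled immediately."""
    waiting = deque(lengths)
    active = []  # tokens still to generate for each in-flight sequence
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One decode step yields one token per active sequence;
        # sequences that reach zero remaining tokens leave the batch.
        active = [remaining - 1 for remaining in active if remaining - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: output lengths (in tokens) of four requests.
lengths = [2, 8, 2, 8]
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
print(static_steps, continuous_steps)
```

With this workload the static scheduler needs 16 decode steps and the continuous one needs 12, because the slot freed by each short request is reused while the long request is still generating.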

A step‑by‑step example shows how to start a Dolly‑V2 service with the command openllm start dolly-v2, switch to LLaMA2, expose an HTTP/Swagger API, and call the service from Python using the provided SDK.
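The call from Python can also be sketched with the standard library alone, which shows what travels over HTTP. This sketch uses plain HTTP rather than the SDK mentioned in the talk. The base URL (port 3000), the /v1/generate path, and the JSON field names are assumptions that may differ between OpenLLM versions, so check the service's Swagger page before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_new_tokens=128):
    """Build (but do not send) a POST request for a text-generation endpoint."""
    # Assumption: the path and JSON field names are illustrative and should
    # be verified against the Swagger page of the running service.
    payload = {
        "prompt": prompt,
        "llm_config": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, timeout=60):
    """Send the request to a running server and return the decoded JSON."""
    request = build_generate_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server, so it can be inspected offline.
request = build_generate_request("http://localhost:3000", "What is BentoML?")
print(request.get_method(), request.full_url)
```

Once a server is running locally, generate("http://localhost:3000", "What is BentoML?") would send the request. Because the client only depends on the URL, switching the served model does not require changing this code.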

Integration with BentoML is covered, describing model versioning, containerization, deployment to cloud or Kubernetes, the Runner abstraction, load‑balancing, and observability features such as metrics, traces and logs.
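The idea behind the Runner abstraction and load balancing can be illustrated with a conceptual toy. This is not BentoML's API: ToyRunner and handle_request are hypothetical names, and round-robin dispatch is an assumed policy chosen for simplicity. The point is only that the API layer validates requests and delegates inference to a separately scaled pool of workers, and that per-replica counters are the simplest form of a metric.

```python
import itertools


class ToyRunner:
    """Hypothetical stand-in for a pool of model workers behind one handle."""

    def __init__(self, name, replicas):
        self.name = name
        self.calls = [0] * replicas  # per-replica request counters (a toy metric)
        self._next_replica = itertools.cycle(range(replicas))  # round-robin order

    def run(self, prompt):
        replica = next(self._next_replica)
        self.calls[replica] += 1
        # A real worker would run model inference here.
        return f"{self.name}[{replica}] handled: {prompt}"


def handle_request(runner, prompt):
    """API-server side: validate the input, then delegate to the runner."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return runner.run(prompt)


runner = ToyRunner("llm", replicas=2)
replies = [handle_request(runner, prompt) for prompt in ["a", "b", "c"]]
print(replies)
print(runner.calls)
```

Keeping the request handler separate from the worker pool means the number of replicas can change without touching the API code.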

The presentation concludes that OpenLLM together with BentoML offers a flexible, efficient stack for building production‑grade LLM applications.

Tags: Python, Large Language Models, AI optimization, LLM deployment, BentoML, OpenLLM
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
