Deploying and Using Ollama Large Language Models Locally with Streamlit
This guide explains how to install Ollama, explore its supported open‑source LLMs, use its REST API for generation, chat, and embeddings, and build a Streamlit‑based web chat application that runs locally on your machine.
If you want to deploy and run open-source large language models on localhost, you can try Ollama. This article uses Ollama to deploy models and call them via its API.
Installation : Ollama provides developer-friendly Python and JavaScript packages. Install them with:

```shell
pip install ollama
npm install ollama
```

Application scenarios : chat interfaces, multimodal usage, and more.
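As a taste of the Python package's interface, a chat call takes a model name plus a list of role-tagged messages (conversation history followed by the new message). A minimal sketch of assembling that argument — the helper name here is mine, not part of the library:

```python
def build_chat_args(model, user_text, history=None):
    """Assemble the arguments a chat call takes: a model name plus
    a list of {'role', 'content'} messages (history + new message)."""
    messages = list(history or [])
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages}

args = build_chat_args("llama2", "Why is the sky blue?")
print(args["model"])                # llama2
print(args["messages"][0]["role"])  # user
```

Because each call carries the full message list, the client is responsible for keeping the conversation history between turns.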
Supported models : Ollama’s model list includes gemma, llama2, mistral, mixtral, and many others. For example, to use the open-source model llama2, download and run it as follows:

```shell
# Pull the model
ollama pull llama2
# Run the model
ollama run llama2
```

REST API : Similar to OpenAI, Ollama offers endpoints for text generation, chat, and embeddings.
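The generation and chat endpoints stream their output by default as newline-delimited JSON chunks; for generation, each chunk carries a `response` fragment and a `done` flag. A minimal sketch of stitching such a stream back together — the sample chunks below are made up for illustration, not real model output:

```python
import json

def stitch_generate_stream(lines):
    """Concatenate the `response` fragments from a streamed generation."""
    text = ""
    for line in lines:
        chunk = json.loads(line)
        text += chunk.get("response", "")
        if chunk.get("done"):
            break
    return text

# Illustrative chunks in the shape the endpoint emits (not real output)
sample = [
    '{"model": "llama2", "response": "The sky ", "done": false}',
    '{"model": "llama2", "response": "is blue.", "done": false}',
    '{"model": "llama2", "response": "", "done": true}',
]
print(stitch_generate_stream(sample))  # The sky is blue.
```

The client libraries do this stitching for you; pass `"stream": false` in the request body if you want a single JSON response instead.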
Generation endpoint

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

Chat endpoint
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    {"role": "user", "content": "why is the sky blue?"}
  ]
}'
```

Embeddings endpoint
```shell
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt": "Here is an article about llamas..."
}'
```

Practical example : Build a chat application with Streamlit and Ollama. The Python code below creates a Streamlit UI, lets the user select a model, sends user messages to Ollama, and streams the assistant’s response.
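Before the chat app, a quick aside on the embeddings endpoint above: the vectors it returns are typically compared with cosine similarity. A self-contained sketch — the short vectors here are made up; real model embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for `embedding` fields from the endpoint
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(round(cosine_similarity(v1, v2), 6))  # 1.0 (same direction)
print(round(cosine_similarity(v1, v3), 6))  # 0.0 (orthogonal)
```

Similar texts produce vectors with similarity near 1, which is the basis for semantic search over embedded documents.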
```python
# Import Streamlit UI library
import streamlit as st
# Import Ollama client
import ollama

# Get the list of locally available models from Ollama
model_list = ollama.list()

# Set default model name
if "model_name" not in st.session_state:
    st.session_state["model_name"] = "llama2:7b-chat"

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Sidebar for model selection
with st.sidebar:
    st.subheader("Settings")
    option = st.selectbox(
        "Select a model",
        [model["name"] for model in model_list["models"]],
    )
    st.write("You selected:", option)
    st.session_state["model_name"] = option

# Main page title
st.title(f"Chat with {st.session_state['model_name']}")

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Input box
if prompt := st.chat_input("What is up?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        # Stream the assistant's reply chunk by chunk
        for chunk in ollama.chat(
            model=st.session_state["model_name"],
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in st.session_state.messages
            ],
            stream=True,
        ):
            if "message" in chunk and "content" in chunk["message"]:
                full_response += chunk["message"]["content"] or ""
                message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
        st.session_state.messages.append(
            {"role": "assistant", "content": full_response}
        )
```

Run the application with:
```shell
streamlit run app.py
```

Summary : Ollama enables convenient local deployment of open-source LLMs, and combined with Streamlit you can quickly build an interactive web chat interface.
References :
Ollama official website
Streamlit documentation
Ollama Python client examples on GitHub