Running and Fine‑Tuning Large Language Models Locally with Ollama, Docker, and Cloud Resources
The author chronicles the challenges and solutions of running large language models locally using Ollama, experimenting with cloud GPUs on Google Colab, managing Python dependencies through Docker, and ultimately fine‑tuning a small Qwen model, providing a practical guide for AI enthusiasts.
Previously I used Ollama to run large language models locally (see the article "AI LLM Tool Ollama Architecture and Dialogue Processing Flow Analysis"). This time I wanted to try more advanced operations, such as fine‑tuning.
My idea was that, since a ready‑made large model exists, I could gather a domain‑specific dataset and "add some material" to the model, eventually obtaining a model optimized for that domain.
However, I quickly discovered that fine‑tuning is not that simple; the model must first be runnable via code. Thus this article was born, documenting my "simple" attempt to run an AI model and the many problems that followed.
Cloud Environment or Local?
It is well known that running AI models is best done with a GPU. I don't have one, so I turned to cloud resources. Both Google Colab and Kaggle Notebooks are attractive because they offer free GPU time; I chose Colab, hoping for abundant resources.
Reality hit hard: free‑user GPU slots are scarce and allocated by luck. Nevertheless, Colab remains useful despite the limitation.
Because I could not obtain better resources and the environment had restrictions, I decided to return to my modest local setup.
Python Environment: A Headache
Entering the AI field inevitably means using Python, which brings a slew of version and dependency‑management issues. I tried tools such as Conda, pipenv, pipx, and poetry, but even installing PyTorch via the modern package manager poetry failed, which was deeply frustrating.
The solution was to abandon virtual‑environment tools and use Docker . By mounting the code directory into a clean Docker container, I obtained an isolated environment that works smoothly.
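As a sketch of this setup (the image tag and paths here are illustrative choices, not the author's exact configuration), mounting the project into a plain Python image looks roughly like this:

```shell
# Start a clean, disposable container with the code directory mounted in.
# Image tag and paths are examples; any recent Python image would do.
docker run -it --rm \
  -v "$PWD":/workspace \
  -w /workspace \
  python:3.11-slim bash
```

The `--rm` flag keeps the host clean by discarding the container on exit, which is exactly why the dependency‑persistence trick in the next step becomes necessary.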
To avoid re‑downloading dependencies after a container restart, there were two options: build a large base image (which I wanted to avoid) or persist the packages in the project directory, similar to node_modules in Node.js, and then point PYTHONPATH at that location.
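The second option can be sketched as follows; the `vendor/` directory name is my illustrative choice, not from the original article:

```shell
# Install dependencies into a project-local directory (like node_modules)
# so they live on the mounted volume and survive container restarts.
pip install --target=./vendor torch transformers

# Point the interpreter at the vendored packages.
export PYTHONPATH=/workspace/vendor:$PYTHONPATH
```

Because `vendor/` sits inside the mounted project directory, it persists on the host even though the container itself is thrown away.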
With VSCode’s Remote Development extension, the development environment became stable and hassle‑free.
Model Selection: There’s One for You
After the environment was ready, I needed to pick a model (enter Hugging Face). I first tried the popular LLaMA, but access requires approval, and my request was denied, likely due to regional restrictions.
Undeterred, I switched to another model family and chose Qwen. Like LLaMA, Qwen is released in a range of parameter sizes; to avoid overloading my computer, I selected the small Qwen/Qwen2.5-0.5B model.
Running a simple "Hello, world" test worked, though even this tiny model took more than ten minutes to respond on my machine. Still, it was encouraging to get a real response, and it was faster than many HR bots.
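A minimal version of that "Hello, world" test can be sketched with the Transformers library; the prompt and generation settings below are my illustrative choices, and the first run also downloads the model weights, which adds to the wait:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the small base model from Hugging Face (downloaded on first use).
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a simple prompt and generate a continuation on the CPU.
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode the generated token IDs back into text.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Note that Qwen2.5-0.5B is a base (non-instruct) model, so it continues the prompt as plain text rather than answering it like a chat assistant.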
(Follow me for ad‑free, technology‑focused content; I welcome discussion.)
References:
https://huggingface.co/
https://huggingface.co/Qwen/Qwen2.5-0.5B