Thanos vs VictoriaMetrics: Which Prometheus Storage Solution Wins for Scale and Cost?
This article compares Thanos and VictoriaMetrics as long‑term storage solutions for Prometheus, evaluating their architecture, write and read paths, reliability, consistency, performance, scalability, high‑availability, and hosting costs to help you choose the most suitable option for your monitoring stack.
Today we compare two well‑known monitoring components in the Prometheus ecosystem: Thanos and VictoriaMetrics, both mature solutions for long‑term storage.
Both provide long‑term storage, global aggregation of data from multiple Prometheus instances, and horizontal scalability.
The comparison focuses on differences and pros/cons, mainly from write and read perspectives, covering configuration complexity, reliability, data consistency, performance, and scalability.
1. Architecture
Thanos
Core components of Thanos include:
Sidecar: runs alongside each Prometheus instance, uploads data older than 2 hours to object storage (e.g., S3, GCS), and serves recent data to Thanos Query.
Store Gateway: serves data stored in object storage to Thanos Query.
Query: implements the Prometheus query API, aggregates data from Sidecars and Store Gateways, and serves results to clients such as Grafana.
Compactor: merges uploaded data blocks into larger ones to improve query efficiency and reduce storage size.
Ruler: evaluates recording and alerting rules globally, can generate new metrics, and may upload data to object storage; it depends heavily on Query reliability.
Receiver: experimental component that implements the Prometheus remote-write API for real-time data push.
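The Sidecar, Store Gateway, and Compactor are all pointed at the same bucket through a shared object-storage configuration file. A minimal sketch for S3-compatible storage follows; the bucket name and endpoint are placeholders:

```yaml
# objstore.yml -- passed to Sidecar, Store Gateway, and Compactor
# via --objstore.config-file; bucket and endpoint are placeholders
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.us-east-1.amazonaws.com"
  # credentials may also come from the environment or an IAM role
```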
VictoriaMetrics
Core components of the VictoriaMetrics cluster version are:
vmstorage – stores data.
vminsert – receives data from Prometheus via the remote_write API and distributes it across vmstorage nodes.
vmselect – queries vmstorage nodes, aggregates results, and returns data to clients such as Grafana.
Each component can be scaled independently on suitable hardware.
2. Write Comparison
Configuration and Operational Complexity
Thanos requires disabling local TSDB compaction (by setting the minimum and maximum block durations to the same value), inserting a Sidecar into each Prometheus instance, configuring Sidecar monitoring, and setting up a Compactor for each object-storage bucket.
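Disabling local compaction comes down to two Prometheus storage flags, which the Sidecar expects to be set equal. A sketch, shown here as Kubernetes-style container args for illustration:

```yaml
# Prometheus launch args required by the Thanos Sidecar
# (equal min/max block durations disable local compaction)
args:
  - --storage.tsdb.min-block-duration=2h
  - --storage.tsdb.max-block-duration=2h
```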
VictoriaMetrics only needs a remote_write configuration in Prometheus; no Sidecar and no changes to local compaction are required.
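For comparison, the entire write-side integration with a VictoriaMetrics cluster is a single remote_write entry in prometheus.yml; the hostname and tenant (account) ID below are placeholders:

```yaml
# prometheus.yml fragment: ship samples to vminsert
remote_write:
  - url: "http://vminsert:8480/insert/0/prometheus/"
```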
Reliability and Availability
Thanos Sidecar uploads data in 2-hour blocks, so a local disk failure can lose up to two hours of data. Heavy queries served by the Sidecar can also slow its uploads.
VictoriaMetrics writes data via remote_write in near real‑time, so only a few seconds of data could be lost on disk failure.
Since Prometheus v2.8.0, remote_write replays samples from the write-ahead log (WAL), so a temporary remote-storage outage does not cause data loss.
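The WAL-backed sender can be tuned per remote_write target. A hedged sketch of the relevant queue parameters (defaults vary by Prometheus version; the URL is a placeholder):

```yaml
remote_write:
  - url: "http://vminsert:8480/insert/0/prometheus/"
    queue_config:
      capacity: 10000            # samples buffered per shard
      max_shards: 30             # maximum parallel senders
      max_samples_per_send: 2000
      batch_send_deadline: 5s    # flush even if the batch is not full
```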
Data Consistency
Thanos Compactor and Store Gateway can compete, leading to possible inconsistencies or query failures, especially with object‑storage eventual consistency.
VictoriaMetrics has no equivalent race between compaction and serving: data is written once to vmstorage and queried from the same place, so reads stay consistent with writes.
Performance
Thanos write performance is good, but heavy queries can affect Sidecar upload speed; Compactor load may impact object‑storage buckets.
VictoriaMetrics adds negligible CPU overhead to Prometheus for remote_write, and the receiving side (vminsert/vmstorage) can be provisioned with enough CPU to sustain ingestion.
Scalability
Thanos ingestion scales with the object store: Sidecar upload throughput is ultimately limited by object-storage performance.
VictoriaMetrics scales by adding more vminsert and vmstorage nodes or upgrading hardware.
3. Read Comparison
Configuration and Operational Complexity
The Thanos read path requires the Sidecar's Store API for recent data, the Store Gateway for object-storage data, and Query to aggregate across all of these components.
VictoriaMetrics provides a ready‑to‑use Prometheus query API without extra external components; Grafana can point directly to VictoriaMetrics.
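Because vmselect speaks the Prometheus query API, a Grafana provisioning file can point straight at it. A minimal sketch; the URL and datasource name are placeholders:

```yaml
# Grafana datasource provisioning, e.g.
# /etc/grafana/provisioning/datasources/vm.yml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    url: http://vmselect:8481/select/0/prometheus/
    access: proxy
```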
Reliability and Availability
Thanos Query must connect to all Sidecars and Store Gateways, which can be challenging across data centers.
VictoriaMetrics queries involve only internal vmselect and vmstorage connections, offering higher reliability and faster startup.
Data Consistency
Thanos may return partial results if some Sidecars or Store Gateways are unavailable.
VictoriaMetrics can also return partial results, but vmselect can be configured to reject partial responses outright; overall it offers higher read availability.
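Strictness is a per-deployment choice: vmselect's -search.denyPartialResponse flag turns a partial answer into a query error instead. A sketch as container args; the storage-node addresses are placeholders:

```yaml
# hypothetical vmselect container args
args:
  - -storageNode=vmstorage-0:8401
  - -storageNode=vmstorage-1:8401
  - -search.denyPartialResponse   # fail queries instead of returning partial data
```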
Performance
Thanos Query performance is limited by the slowest Sidecar or Store Gateway.
VictoriaMetrics query performance scales with the number of vmselect and vmstorage instances and can be improved by adding resources.
Scalability
Thanos Query is stateless and can be horizontally scaled; Store Gateway also supports scaling, but scaling individual Prometheus + Sidecar instances is difficult.
VictoriaMetrics allows independent scaling of vmselect and vmstorage components, with optimizations for low‑bandwidth environments.
4. High‑Availability Comparison
Thanos requires multiple Query instances across zones; if a zone fails, only partial results may be returned.
VictoriaMetrics can run multiple clusters in different zones, replicating data across them, allowing full query results even if a zone is down.
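One common way to get cross-zone copies is to have Prometheus (or vmagent) remote_write to both clusters in parallel. A sketch with placeholder zone hostnames:

```yaml
# prometheus.yml fragment: duplicate samples to two zones
remote_write:
  - url: "http://vminsert.zone-a:8480/insert/0/prometheus/"
  - url: "http://vminsert.zone-b:8480/insert/0/prometheus/"
```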
5. Hosted Cost Comparison
Thanos
Data stored in object storage (e.g., GCS, S3) incurs costs based on storage class, egress traffic, and API calls.
VictoriaMetrics
Data is stored on block storage (e.g., GCE persistent disks, EBS); VictoriaMetrics' optimized compression reduces required disk space (the project claims up to 10× less than Thanos), lowering the total cost per stored TB of metrics.
Summary
VictoriaMetrics uses standard remote_write to ingest data and stores it on block storage, while Thanos requires disabling local compaction, a Sidecar to upload data to object storage, and a Compactor to merge blocks.
VictoriaMetrics provides a built‑in global query view via the Prometheus query API, avoiding the need for external components; Thanos needs Sidecar, Store Gateway, and Query, making large deployments more complex.
VictoriaMetrics clusters deploy easily on Kubernetes with a simple architecture, whereas Thanos deployment and configuration are considerably more complex.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.