Data Platform vs Backend Architecture: Benefits of Moving Functionality to a Data Platform
The article explains why shifting batch jobs, reporting, and machine‑learning model training from traditional backend services to a dedicated data platform can simplify development, improve fault tolerance, and scale analytics, using real‑world examples from Spotify and best‑practice guidelines.
Modern tech stacks usually include at least a frontend and a backend, but they quickly evolve to require a data platform for analytics, reporting, cron jobs, dashboards, and batch data replication.
Typical data‑platform workloads have relaxed latency requirements (results can arrive up to 24 hours later) and are expressed as batch jobs over large datasets rather than per‑request operations. Examples include nightly transaction imports for accounting systems and periodic retraining of fraud‑detection models.
At Spotify, the data platform started with royalty reports and grew into a nightly pipeline that rebuilds personalized recommendations and retrains core models every few weeks.
1 Why is it complicated?
Using a data platform can simplify product building and delivery roughly tenfold: it relaxes latency concerns, gives control over data flow, enables fault‑tolerant (idempotent) batch processing, offers higher efficiency for large‑scale operations, and allows easy recovery from failures.
For instance, building a global headline service that updates hourly via a data‑platform cron job is far easier than implementing real‑time updates directly in the backend.
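As a sketch of that pattern (all names here are illustrative, not from the article), an hourly cron job could aggregate the last hour of click logs and publish a static headline list that the backend serves verbatim, with no real‑time computation at request time:

```python
import json
from collections import Counter

def build_headlines(click_events: list[dict], top_n: int = 3) -> list[str]:
    # Rank articles by click count over the last hour of events.
    counts = Counter(event["article_id"] for event in click_events)
    return [article_id for article_id, _ in counts.most_common(top_n)]

def publish(headlines: list[str], path: str) -> None:
    # Write a static artifact; the backend serves it as-is until the next run.
    with open(path, "w") as f:
        json.dump({"headlines": headlines}, f)

clicks = [
    {"article_id": "a1"}, {"article_id": "a2"},
    {"article_id": "a1"}, {"article_id": "a3"},
    {"article_id": "a1"}, {"article_id": "a2"},
]
top = build_headlines(clicks)
publish(top, "headlines.json")
print(top)  # ['a1', 'a2', 'a3']
```

If the job fails, the backend keeps serving the previous artifact and the job can simply be rerun, which is exactly the fault tolerance the article describes.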
2 Have you done it with minimal tricks?
Backend architecture best practices (e.g., avoiding shared databases, keeping queries simple, using transactions, extensive unit and integration testing, and breaking monoliths into micro‑services) often become unnecessary when moving functionality to a data platform.
3 Data side: "Wild West"
Typical data pipelines start by shipping backend logs and database dumps into storage, historically Hadoop HDFS and now often cloud data warehouses such as Amazon Redshift.
4 Data latency considerations
Data in a platform is expected to be delayed, often by 24 hours or more. Running cron jobs directly against production databases recreates the integrated‑database anti‑pattern, so delayed batch jobs should be kept strictly separate from real‑time endpoints.
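One common way to enforce that separation (a sketch; the dated‑partition path layout is an assumption, not something the article specifies) is to have batch jobs read an immutable, dated snapshot rather than the live production database:

```python
from datetime import date, timedelta

def snapshot_path(base: str, day: date) -> str:
    # Batch jobs read yesterday's immutable dump, never the live OLTP database.
    return f"{base}/dt={day.isoformat()}/users.json"

run_date = date(2024, 1, 2)            # the day the cron job runs
target = run_date - timedelta(days=1)  # data is expected to be ~24h delayed
print(snapshot_path("s3://warehouse/users", target))
# s3://warehouse/users/dt=2024-01-01/users.json
```

Because the snapshot never changes after it is written, rerunning a failed job against the same date produces the same result.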
5 Integrated databases
While traditional backend architecture discourages services from sharing databases, in the data world it can be acceptable for a single query to combine three distinct datasets: schema changes only require query updates, failed runs can be fixed and rerun, and the queries are read‑only.
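A hedged sketch of why this is tolerable (all dataset and field names are illustrative): the job below joins three datasets read‑only, so a schema change is fixed with a query edit, and a failed run is simply rerun because the inputs are immutable snapshots:

```python
def enrich_transactions(transactions: list[dict], users: dict, fx_rates: dict) -> list[dict]:
    # Read-only join across three datasets. Nothing is mutated, so the job
    # is idempotent and safe to rerun after a failure.
    enriched = []
    for tx in transactions:
        user = users.get(tx["user_id"], {})
        rate = fx_rates.get(tx["currency"], 1.0)
        enriched.append({
            **tx,
            "country": user.get("country", "unknown"),
            "amount_usd": round(tx["amount"] * rate, 2),
        })
    return enriched

txs = [{"user_id": "u1", "currency": "EUR", "amount": 10.0}]
users = {"u1": {"country": "SE"}}
fx = {"EUR": 1.1}
print(enrich_transactions(txs, users, fx))
```

The same three‑way join inside a backend request path would be considered a coupling smell; here, with delayed read‑only batch inputs, it is routine.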
6 Large queries
Backend systems serve low‑latency, low‑throughput queries scoped to a single user, and therefore optimize by avoiding joins, keeping indexes simple, and targeting specific IDs; data platforms instead run large analytical scans (OLAP) across massive tables.
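The contrast can be sketched as two access patterns over the same data (Python stand‑ins, with hypothetical field names, for a keyed OLTP lookup versus an OLAP full scan):

```python
# OLTP-style point query: one row by key, expected to return in milliseconds.
def get_order(orders_by_id: dict, order_id: str) -> dict:
    return orders_by_id[order_id]

# OLAP-style analytical query: scan every row and aggregate;
# latency measured in minutes or hours is acceptable.
def revenue_by_country(orders: list[dict]) -> dict:
    totals: dict = {}
    for order in orders:
        totals[order["country"]] = totals.get(order["country"], 0) + order["amount"]
    return totals

orders = [
    {"id": "o1", "country": "US", "amount": 5},
    {"id": "o2", "country": "SE", "amount": 7},
    {"id": "o3", "country": "US", "amount": 3},
]
print(revenue_by_country(orders))  # {'US': 8, 'SE': 7}
```

A storage engine tuned for the first pattern (row lookups by key) is a poor fit for the second (whole‑table aggregation), which is why the two belong on different systems.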
7 Testing
Testing backend functions is straightforward, but testing data pipelines is hard due to high‑dimensional inputs, nondeterministic ML models, and subjective outputs, leading to low test fidelity and high maintenance cost.
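One pragmatic response (a sketch of common practice, not a prescription from the article) is to keep the deterministic transformation steps of a pipeline pure and test those on tiny fixtures, while leaving model quality to separate offline evaluation:

```python
def normalize_events(raw: list[dict]) -> list[dict]:
    # Pure, deterministic step: no I/O, no randomness, so a tiny fixture
    # with an exact expected output is a meaningful test.
    return [
        {"user": event["user"].strip().lower(), "plays": int(event["plays"])}
        for event in raw
        if event.get("user", "").strip()
    ]

# Fixture test: deterministic input, exact expected output.
fixture = [{"user": "  Alice ", "plays": "3"}, {"user": "", "plays": "9"}]
assert normalize_events(fixture) == [{"user": "alice", "plays": 3}]
print("fixture test passed")
```

The nondeterministic parts (model training, ranking quality) are then judged with held‑out metrics rather than unit assertions, which limits the low‑fidelity tests the article warns about.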
8 Conclusion
Moving as many functions as possible—non‑transactional emails, search index generation, recommendations, reporting, data for business users, and ML model training—to a data platform run as cron jobs reduces backend code complexity by roughly an order of magnitude.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as adapting architectures with internet technologies. Idea‑driven architects who enjoy sharing are welcome to exchange and learn together.