Google's Midas Package Manager (MPM): Architecture, Build Process, and Security Features
The article explains Google’s internal Midas Package Manager (MPM), detailing its build definition files, immutable and mutable metadata stored in Bigtable, distributed replication via Colossus, client‑side pull and P2P copying, as well as its access‑control, encryption, and signing mechanisms that enable massive, conflict‑free software deployment at Google’s scale.
When people talk about package management systems, they usually think first of client tools such as yum (for RPM packages) or apt-get (for DEB packages). These tools are only the user-facing layer. They install and uninstall software and try to resolve dependency conflicts along the way.
A complete package management system also requires a package repository that stores packages, manages versions, and offers access interfaces (e.g., HTTP or FTP). Build pipelines can automatically publish built packages to the repository for production deployment.
Traditional package management systems encounter several limitations when used for large‑scale deployment:
Distribution bottleneck caused by a centralized repository.
Dependency conflicts (diamond dependencies on different versions of the same library).
Deployment conflicts (different versions cannot coexist on the same machine).
Environment version management (pre‑release vs. production).
Even top Chinese internet companies struggle with these issues, wasting considerable engineering time debugging deployments across hundreds of thousands of machines. Google, with millions of machines, avoids these problems through its own package manager.
MPM: Midas Package Manager
Google’s internal package manager, MPM, is built on its infrastructure and provides packaging, replication, automatic cleanup, client tools, and seamless integration with the task‑scheduling system.
The article introduces MPM from several aspects:
Package building
Package metadata
Package distribution and replication
Package security
MPM system characteristics
Package Building
Building starts with a package definition file that specifies the files to include, their owners and permissions, and any pre-install and post-install commands. The format is similar in spirit to an RPM spec file or Debian packaging metadata.
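To make the contents of such a definition concrete, here is a minimal Python sketch that models one as data and checks it for obvious mistakes. The names `FileEntry`, `PackageDefinition`, `pre_install`, and `post_install`, and the example package and paths, are hypothetical. MPM's actual definition syntax is not described in the source material.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    source: str          # path in the build output
    dest: str            # path inside the installed package
    owner: str = "root"  # file owner after installation
    mode: int = 0o644    # permission bits


@dataclass(frozen=True)
class PackageDefinition:
    name: str
    files: tuple[FileEntry, ...]
    pre_install: tuple[str, ...] = ()   # commands run before files are placed
    post_install: tuple[str, ...] = ()  # commands run after files are placed

    def validate(self) -> None:
        """Reject definitions that could not be installed unambiguously."""
        seen: set[str] = set()
        for entry in self.files:
            if entry.dest in seen:
                raise ValueError(f"duplicate destination: {entry.dest}")
            seen.add(entry.dest)
            if not 0 <= entry.mode <= 0o7777:
                raise ValueError(f"invalid mode for {entry.dest}: {oct(entry.mode)}")


# Hypothetical package: the name, paths, and command are illustrative only.
definition = PackageDefinition(
    name="example/web_server",
    files=(
        FileEntry("bazel-bin/server/server", "bin/server", mode=0o755),
        FileEntry("server/config.txt", "conf/config.txt"),
    ),
    post_install=("bin/server --self_test",),
)
definition.validate()
```

Keeping the definition as plain, frozen data means a build system can hash and compare it, which fits the idempotent builds described next.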
MPM packages are built by Google’s internal Blaze build system (now open‑sourced as Bazel). The build process is idempotent; unchanged content does not produce a new package, and metadata or signatures can be added during the build.
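One common way to get this idempotence is to derive the version ID from the package contents, so that rebuilding identical files yields the same ID and publishing becomes a no-op. The sketch below assumes exactly that. `content_version_id` and `Repository` are hypothetical names, and the source does not say how MPM actually computes its version IDs.

```python
from __future__ import annotations

import hashlib


def content_version_id(files: dict[str, bytes]) -> str:
    """Derive a version ID from package contents only (paths + file bytes)."""
    h = hashlib.sha256()
    for path in sorted(files):                    # independent of insertion order
        h.update(path.encode("utf-8") + b"\0")    # NUL separates path from digest
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


class Repository:
    """Toy in-memory repository: publishing identical contents is a no-op."""

    def __init__(self) -> None:
        self._versions: dict[str, set[str]] = {}  # package name -> version IDs

    def publish(self, name: str, files: dict[str, bytes]) -> tuple[str, bool]:
        version_id = content_version_id(files)
        versions = self._versions.setdefault(name, set())
        created = version_id not in versions
        versions.add(version_id)
        return version_id, created


repo = Repository()
v1, new1 = repo.publish("example/web_server", {"bin/server": b"build A"})
v2, new2 = repo.publish("example/web_server", {"bin/server": b"build A"})  # same ID, not new
v3, new3 = repo.publish("example/web_server", {"bin/server": b"build B"})  # new ID
```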
Google builds from source and links dependencies statically, so the resulting binaries carry most of their dependencies with them. As a result, multiple versions of a package can coexist on the same machine without conflicts.
Package Metadata
MPM supports two types of metadata:
Immutable: build time, builder, file list with checksums, labels (e.g., production=2016_04_09_00), and version ID.
Mutable: user-defined labels and deletion policies.
Immutable metadata guarantees uniqueness after a build, while mutable metadata can be flexibly used across different deployment environments.
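The split between the two kinds of metadata maps naturally onto a frozen record wrapped by a mutable one. This is a minimal sketch under that assumption. The class and field names (`ImmutableMetadata`, `PackageVersion`, `build_labels`, `delete_after_days`) and the sample values are hypothetical, not MPM's schema.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImmutableMetadata:
    """Fixed at build time; assigning to any field raises FrozenInstanceError."""
    version_id: str
    build_time: str
    builder: str
    file_checksums: tuple[tuple[str, str], ...]   # (path, checksum)
    build_labels: tuple[str, ...] = ()            # e.g. "production=2016_04_09_00"


@dataclass
class PackageVersion:
    immutable: ImmutableMetadata
    labels: set[str] = field(default_factory=set)  # mutable, user-defined
    delete_after_days: int | None = None           # mutable deletion policy


version = PackageVersion(
    ImmutableMetadata(
        version_id="v0001",
        build_time="2016-04-09T00:00:00Z",
        builder="build-bot",
        file_checksums=(("bin/server", "<sha256 hex>"),),
        build_labels=("production=2016_04_09_00",),
    )
)
version.labels.add("canary")       # mutable: add a user-defined label
version.delete_after_days = 30     # mutable: change the deletion policy
try:
    version.immutable.builder = "someone-else"   # immutable: rejected
except dataclasses.FrozenInstanceError:
    print("immutable metadata cannot be changed")
```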
Metadata is stored in a Bigtable cluster replicated across data centers. Clients query a local Root server, which caches metadata; if the local server fails, the client falls back to a nearby Root server.
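The client-side fallback can be sketched as trying an ordered list of Root servers, local first, and moving on only when one is unreachable. Here each server is modeled as a plain callable. `query_metadata`, `MetadataUnavailable`, and the fake servers are hypothetical stand-ins, since the source does not describe the client's RPC interface.

```python
from __future__ import annotations

from typing import Callable, Sequence

MetadataServer = Callable[[str], dict]   # hypothetical: package name -> metadata


class MetadataUnavailable(Exception):
    pass


def query_metadata(package: str, servers: Sequence[MetadataServer]) -> dict:
    """Try the local Root server first, then nearby ones, in the given order."""
    failures: list[str] = []
    for server in servers:
        try:
            return server(package)
        except (ConnectionError, TimeoutError) as exc:
            failures.append(repr(exc))
    raise MetadataUnavailable(f"{package}: all Root servers failed: {failures}")


def local_root(package: str) -> dict:
    raise ConnectionError("local Root server down")


def nearby_root(package: str) -> dict:
    return {"name": package, "labels": ["production"]}


print(query_metadata("example/web_server", [local_root, nearby_root]))
# {'name': 'example/web_server', 'labels': ['production']}
```

Only connection-level errors trigger a fallback. Any other exception propagates, so a bad request is not silently retried against every server.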
Package Distribution and Replication
MPM packages are pulled by clients on demand, avoiding unnecessary network traffic. Replication is managed by a central replication server that stores packages in the distributed file system Colossus and ensures cross‑region availability. Each data center also runs a cache server for frequently used packages.
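A pull-on-demand client backed by a data-center cache and Colossus behaves like a tiered lookup. The sketch below assumes the order local disk, then data-center cache, then Colossus, and assumes that a hit copies the bytes into every nearer tier. Both the order and the fill policy are illustrative guesses rather than documented MPM behavior. Plain dictionaries stand in for each storage tier.

```python
from __future__ import annotations


def fetch(version_id: str, tiers: list[tuple[str, dict[str, bytes]]]) -> tuple[bytes, str]:
    """Return (package bytes, name of the tier that served them).

    Tiers are ordered nearest first. On a hit, the bytes are copied into every
    nearer tier so later requests are served locally (an assumed policy).
    """
    for i, (name, store) in enumerate(tiers):
        if version_id in store:
            data = store[version_id]
            for _, nearer in tiers[:i]:
                nearer[version_id] = data
            return data, name
    raise KeyError(f"{version_id} not found in any tier")


local_disk: dict[str, bytes] = {}
dc_cache: dict[str, bytes] = {}
colossus = {"v0001": b"package bytes"}
tiers = [("local", local_disk), ("dc-cache", dc_cache), ("colossus", colossus)]

print(fetch("v0001", tiers)[1])   # colossus
print(fetch("v0001", tiers)[1])   # local
```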
Clients use a P2P protocol to copy packages locally; in 2014, millions of pulls transferred several petabytes of data daily.
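P2P copying is only safe if a client can check what peers send it. Since the immutable metadata already records checksums, one plausible approach is to split a package into chunks with known digests and accept each chunk from whichever peer supplies a matching copy. The sketch below illustrates that approach. Chunking, `chunk_digests`, and `assemble_from_peers` are assumptions for illustration, not the protocol MPM uses.

```python
from __future__ import annotations

import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list[str]:
    """SHA-256 of each fixed-size chunk, published alongside the metadata."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def assemble_from_peers(expected: list[str], peers: list[dict[int, bytes]]) -> bytes:
    """Take each chunk from the first peer whose copy matches its digest."""
    chunks: list[bytes] = []
    for index, want in enumerate(expected):
        for peer in peers:
            piece = peer.get(index)
            if piece is not None and hashlib.sha256(piece).hexdigest() == want:
                chunks.append(piece)
                break
        else:
            raise IOError(f"no peer holds a valid copy of chunk {index}")
    return b"".join(chunks)


package = b"0123456789abcdef"
digests = chunk_digests(package, chunk_size=4)
flaky_peer = {0: b"XXXX", 1: b"4567"}               # chunk 0 corrupted, 2 and 3 missing
good_peer = {i: package[i * 4:(i + 1) * 4] for i in range(4)}
assert assemble_from_peers(digests, [flaky_peer, good_peer]) == package
```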
Package Security
Security is enforced through three mechanisms:
Access Control Lists (ACLs): hierarchical namespaces with inherited permissions. Three roles (Owner, Builder, and Label Owner) control package creation, label modification, and deletion.
Encryption: individual files can be encrypted, and ACLs define who may encrypt. Encryption and decryption happen automatically on the client, so servers never see plaintext data.
Digital Signatures: packages can be signed at build time or later using a dedicated signing service. Verification requires a matching package name and signer.
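Two of these rules lend themselves to a short sketch: permissions inherited down a hierarchical namespace, and signature checks that fail unless package name and signer both match. The role-to-action mapping is an assumption, and label ownership is simplified to per-namespace. HMAC from the standard library stands in for a real asymmetric digital signature, so here the verifier must hold the signer's shared key. All function names and keys are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass

# Assumed mapping of the three roles to actions (simplified).
ALLOWED_ROLES = {
    "create": {"owner", "builder"},
    "modify_label": {"owner", "label_owner"},
    "delete": {"owner"},
}


def effective_roles(acls: dict[str, dict[str, set[str]]], package: str, user: str) -> set[str]:
    """Union of the user's roles on every ancestor namespace of `package`."""
    parts = package.split("/")
    roles: set[str] = set()
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        for role, members in acls.get(prefix, {}).items():
            if user in members:
                roles.add(role)
    return roles


def is_allowed(acls: dict[str, dict[str, set[str]]], package: str, user: str, action: str) -> bool:
    return bool(effective_roles(acls, package, user) & ALLOWED_ROLES[action])


@dataclass(frozen=True)
class Signature:
    signer: str
    package: str
    version_id: str
    tag: bytes


def _message(package: str, version_id: str) -> bytes:
    return f"{package}\0{version_id}".encode("utf-8")


def sign(signer: str, key: bytes, package: str, version_id: str) -> Signature:
    tag = hmac.new(key, _message(package, version_id), hashlib.sha256).digest()
    return Signature(signer, package, version_id, tag)


def verify(sig: Signature, keys: dict[str, bytes], package: str, version_id: str, signer: str) -> bool:
    """Accept only if name, version, and signer all match and the tag is valid."""
    if (sig.package, sig.version_id, sig.signer) != (package, version_id, signer):
        return False
    key = keys.get(sig.signer)
    if key is None:
        return False
    expected = hmac.new(key, _message(package, version_id), hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig.tag)


acls = {
    "example": {"owner": {"alice"}},
    "example/web_server": {"builder": {"build-bot"}, "label_owner": {"bob"}},
}
assert is_allowed(acls, "example/web_server", "alice", "delete")       # inherited
assert is_allowed(acls, "example/web_server", "bob", "modify_label")
assert not is_allowed(acls, "example/web_server", "build-bot", "delete")

keys = {"release-signer": b"hypothetical-secret"}
sig = sign("release-signer", keys["release-signer"], "example/web_server", "v0001")
assert verify(sig, keys, "example/web_server", "v0001", "release-signer")
assert not verify(sig, keys, "example/other", "v0001", "release-signer")
```

The sketch also binds the version ID into the signed message. Otherwise a valid signature on one build of a package could be replayed onto another build.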
MPM System Characteristics
Key advantages that drive widespread internal adoption include:
Infrastructure Integration: standardized template files let users specify the package name and labels. The resources needed to copy, unpack, and install a package are calculated automatically, and the client is integrated into the cluster environment, so these operations are transparent to users.
Labels: enable version selection, environment segregation (dev, canary, production), immutable labels, and special labels such as "latest", which always points to the newest build. Rolling back is as simple as moving a label and restarting the task.
Filegroups: subsets of a package's files can be grouped for selective copying. A file may belong to multiple groups, so, for example, stripped and unstripped binaries can live in the same package but be fetched independently.
Web Interface: users can browse all packages, view their metadata, and see how filegroup sizes have changed over time.
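The label and filegroup behavior described above can be sketched with a small in-memory table. Assumptions: labels map to version IDs, "latest" is derived from build order rather than set by hand, immutable labels refuse to move, and a filegroup is just a named set of paths. `LabelTable`, `files_for_groups`, and the sample data are hypothetical.

```python
from __future__ import annotations


class LabelTable:
    """Maps labels to version IDs for one package (toy, in-memory)."""

    def __init__(self) -> None:
        self._builds: list[str] = []          # version IDs in build order
        self._labels: dict[str, str] = {}     # label -> version ID
        self._immutable: set[str] = set()     # labels that can never move

    def add_build(self, version_id: str) -> None:
        self._builds.append(version_id)

    def set_label(self, label: str, version_id: str, immutable: bool = False) -> None:
        if label == "latest":
            raise ValueError("'latest' is maintained automatically")
        if label in self._immutable:
            raise PermissionError(f"label {label!r} is immutable")
        if version_id not in self._builds:
            raise KeyError(version_id)
        self._labels[label] = version_id
        if immutable:
            self._immutable.add(label)

    def resolve(self, label: str) -> str:
        if label == "latest":
            return self._builds[-1]
        return self._labels[label]


def files_for_groups(filegroups: dict[str, set[str]], requested: list[str]) -> list[str]:
    """Union of the requested filegroups; a file may sit in several groups."""
    selected: set[str] = set()
    for group in requested:
        selected |= filegroups[group]
    return sorted(selected)


table = LabelTable()
for vid in ("v0001", "v0002"):
    table.add_build(vid)
table.set_label("production", "v0002")
table.set_label("production", "v0001")        # rollback: just move the label
assert table.resolve("production") == "v0001"
assert table.resolve("latest") == "v0002"

groups = {
    "stripped": {"bin/server", "conf/app.cfg"},
    "unstripped": {"bin/server.unstripped", "conf/app.cfg"},
}
assert files_for_groups(groups, ["stripped"]) == ["bin/server", "conf/app.cfg"]
```

Because a task refers to a label rather than a version ID, the rollback above only changes the table. The running task picks up the old version the next time it restarts and resolves "production" again.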
Remarks
The only public presentation of Google's MPM system was given at USENIX LISA 2014; this article is compiled from that material.
Architecture Digest