How Alibaba Handles Leap Seconds: Definitions, Impacts, and Synchronization Strategies
The article explains the historical definition of the second, the concept of leap seconds, their impact on large‑scale IT systems, and details Alibaba's multi‑step approach of splitting the extra second into 86,400 parts to maintain accurate time synchronization across its infrastructure.
Most people know about leap years and leap months, but few have heard of leap seconds. Precise computing and finance now have to account for this extra second, especially at large internet companies like Alibaba.
Definition of the second
Before 1956: defined as 1/86,400 of the mean solar day.
1956‑1967: defined as 1/31,556,925.9747 of the tropical year 1900 (the ephemeris second, adopted in 1956 and ratified in 1960).
1967: defined by 9,192,631,770 periods of the cesium‑133 hyperfine transition.
1977: refined to account for gravitational time dilation, using cesium atoms at sea level.
2019: the International Committee on Weights and Measures began discussing a redefinition.
The definition shifted from astronomical to atomic standards, moving time measurement from macro to micro scales.
Time standards
UT (Universal Time) is based on Earth's rotation and suffers from its irregularities. TAI (International Atomic Time) is kept by atomic clocks and has nanosecond accuracy. UTC (Coordinated Universal Time) combines the two: it ticks at the TAI rate but is kept close to UT by leap seconds, and it serves as the de‑facto global time reference.
Why leap seconds are added
Tides, tectonic activity, and other natural phenomena make Earth's rotation irregular, so UT gradually drifts away from atomic time. To keep the difference between UTC and UT within 0.9 seconds, the International Earth Rotation and Reference Systems Service (IERS) schedules a leap second, inserting or removing one second in UTC, before that limit is exceeded.
The 27th leap second
The 27th leap second occurred at 2016‑12‑31 23:59:60 UTC, which is 2017‑01‑01 07:59:60 Beijing time. It was announced by the National Time Service Center and IERS.
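The timestamp 23:59:60 is awkward for software because many time libraries have no slot for a 61st second. The following is a minimal sketch using only Python's standard `datetime` module; the helper name `can_represent_leap_second` is made up for illustration. It converts the last ordinary second before the leap second to Beijing time and shows that the leap second itself cannot be constructed.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 all year round.
BEIJING = timezone(timedelta(hours=8))

# Last ordinary second before the 27th leap second, in UTC.
last_normal = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
beijing_str = last_normal.astimezone(BEIJING).isoformat()


def can_represent_leap_second() -> bool:
    """Return True if datetime accepts second=60 for the leap second."""
    try:
        datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    except ValueError:
        # datetime only accepts seconds in the range 0..59.
        return False
    return True


print(beijing_str)                   # 2017-01-01T07:59:59+08:00
print(can_represent_leap_second())   # False
```

Because the extra second has no representation here, systems typically either repeat a second or spread the adjustment over a longer period.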
Impact of leap seconds
While everyday life is barely affected, IT, finance, and aerospace systems can experience failures if they do not handle the extra second, as seen with incidents on major websites during the 2012 leap second.
Older Linux kernels may crash or hang when processing leap seconds, and application‑level software can also fail, especially time‑sensitive services such as databases.
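One way application code can break is by measuring durations with a wall clock that is stepped back by one second. The sketch below simulates that case; `stepped_wall_clock` and `wall_elapsed` are hypothetical helpers and the numbers are invented for illustration, not taken from any real incident. It also shows the usual alternative of timing with `time.monotonic()`, which is not affected by system clock updates.

```python
import time


def stepped_wall_clock(real_s: float, step_at: float) -> float:
    """Hypothetical wall clock that is set back 1 s at real time `step_at`."""
    return real_s if real_s < step_at else real_s - 1.0


def wall_elapsed(start_real: float, end_real: float, step_at: float) -> float:
    """Duration as computed from two wall-clock readings."""
    return (stepped_wall_clock(end_real, step_at)
            - stepped_wall_clock(start_real, step_at))


def timed(fn) -> float:
    """Measure fn() with the monotonic clock, which never goes backwards."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


# An operation that really takes 0.5 s but straddles the clock step.
negative = wall_elapsed(99.75, 100.25, step_at=100.0)
print(negative)  # -0.5
```

A negative duration like this can trip timeouts, ordering checks, or assertions in time-sensitive services.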
Common industry mitigations include spreading the extra second over a whole day or gradually slowing the clock before the event.
Alibaba's approach
Alibaba splits the extra second into 86,400 parts and adjusts the clock at a rate of 0.011574 ms per second over a 24‑hour window. The window starts 12 hours before the leap second and ends 12 hours after it, which caps the maximum deviation at 0.5 seconds. The changes are:
Synchronization rate changed from 0.5 ms/s to 0.011574 ms/s.
Synchronization window extended from 2,000 seconds to 86,400 seconds.
Adjustment period shifted to start 12 hours before and end 12 hours after the leap second.
Maximum error reduced from 1 second to 0.5 seconds.
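The arithmetic behind these numbers can be checked with a short sketch. It assumes a strictly linear adjustment and treats the leap second as an instantaneous one-second step in UTC at t = 0; the function name `smear_deviation` and the sign convention are illustrative choices, not Alibaba's implementation.

```python
SMEAR_WINDOW_S = 86_400               # 24-hour window from the article
HALF_WINDOW_S = SMEAR_WINDOW_S / 2    # starts 12 h before, ends 12 h after
RATE_MS_PER_S = 1000 / SMEAR_WINDOW_S # one second spread over the window


def smear_deviation(t: float) -> float:
    """Deviation in seconds of the adjusted clock from UTC.

    `t` is seconds relative to the leap second (negative = before).
    Assumption: linear adjustment, leap second modeled as a 1 s step at t = 0.
    """
    if t < -HALF_WINDOW_S or t > HALF_WINDOW_S:
        return 0.0
    progress = (t + HALF_WINDOW_S) / SMEAR_WINDOW_S  # 0.0 .. 1.0
    step = 1.0 if t >= 0 else 0.0                    # UTC gains its extra second
    return step - progress


print(round(RATE_MS_PER_S, 6))          # 0.011574
print(smear_deviation(-HALF_WINDOW_S))  # 0.0
print(smear_deviation(0))               # 0.5
print(smear_deviation(HALF_WINDOW_S))   # 0.0
```

Centering the window on the leap second is what halves the peak error: the clock lags by up to 0.5 s just before the event and leads by 0.5 s just after it, instead of being off by a full second. For comparison, the earlier settings give 0.5 ms/s × 2,000 s = 1 s.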
Testing and implementation
Two months before the leap second, Alibaba performed extensive testing on large clusters, monitoring offset data per minute. The table below outlines the timeline of actions, from cutting GPS signals and using internal rubidium clocks to gradually adjusting client clocks and finally restoring normal operation after the leap second.
Time Point | Action | Result
36 hours before | Cut GPS, use internal rubidium clock, clear leap‑second flag | Eliminate kernel bug risk
18 hours before | Disconnect secondary from primary source | Secondary source stabilizes
12 hours before | Aggregate secondary sources, start 0.011574 ms/s adjustment | Clients begin slow sync
Leap‑second moment | ... | Maximum error reaches 0.5 s
12 hours after | Stop adjustment, reconnect primary GPS | Error returns to near zero
13 hours after | Re‑establish primary‑secondary link | Leap‑second change completed
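The relative offsets in the timeline can be turned into absolute times with a small sketch. The hour offsets and action names come from the timeline above; the choice of 2017‑01‑01 00:00:00 UTC (the first instant after the leap second) as the reference point and the names `STEPS` and `build_schedule` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))

# (hours relative to the leap second, action) taken from the timeline.
STEPS = [
    (-36, "Cut GPS, use internal rubidium clock, clear leap-second flag"),
    (-18, "Disconnect secondary from primary source"),
    (-12, "Aggregate secondary sources, start 0.011574 ms/s adjustment"),
    (12, "Stop adjustment, reconnect primary GPS"),
    (13, "Re-establish primary-secondary link"),
]


def build_schedule(leap_instant: datetime):
    """Return (absolute time, action) pairs for each timeline step."""
    return [(leap_instant + timedelta(hours=h), action) for h, action in STEPS]


# Assumed reference: the first instant after the 27th leap second.
LEAP = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
schedule = build_schedule(LEAP)

for when, action in schedule:
    print(when.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M"), action)
```

Under that assumption, the first step falls at 20:00 Beijing time on 2016‑12‑30 and the last at 21:00 on 2017‑01‑01.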
Alibaba notes that hardware, network, and temperature can affect individual server clocks during aggregation, requiring careful monitoring.
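One simple way to monitor for such servers is to compare each reported offset with the cluster median and flag large deviations. The sketch below is only an illustration of that idea: the host names, the sample offsets, and the 5 ms tolerance are invented, and the article does not describe how Alibaba's monitoring actually works.

```python
from statistics import median
from typing import Dict, List


def find_outliers(offsets_ms: Dict[str, float],
                  tolerance_ms: float = 5.0) -> List[str]:
    """Return hosts whose offset differs from the cluster median
    by more than `tolerance_ms` (an assumed threshold)."""
    center = median(offsets_ms.values())
    return sorted(
        host for host, offset in offsets_ms.items()
        if abs(offset - center) > tolerance_ms
    )


# Hypothetical per-minute sample of clock offsets in milliseconds.
sample = {
    "host-a": 120.1,
    "host-b": 119.8,
    "host-c": 120.3,
    "host-d": 131.0,  # drifting faster than its peers
}

print(find_outliers(sample))  # ['host-d']
```

Comparing against the median rather than zero matters during the adjustment window, because every healthy server is expected to carry the same deliberate offset at that point.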
Alibaba Cloud Infrastructure
For uninterrupted computing services