
Power‑Saving Techniques for PCI Express IP in SoC Designs

This article explains three power‑saving techniques—clock gating, power gating, and protocol‑level power management—for PCI Express IP in system‑on‑chip designs, detailing their impact on dynamic and static power, implementation challenges, and how designers can achieve high energy efficiency while meeting fast recovery requirements.

Architects' Tech Alliance

For designers, integrating PCI Express IP into an SoC offers good opportunities to reduce power: beyond the protocol's built‑in power‑saving features, advanced power‑management techniques give further control. Clock gating reduces dynamic power only, whereas power gating also cuts static (leakage) power, which grows as feature sizes shrink. However, a power‑gated PCI Express IP block in deep‑sleep mode typically needs link retraining or reconfiguration on wake‑up, which lengthens recovery time and is a significant challenge for designs that must resume quickly.

This article uses PCI Express IP as an example to introduce three power‑saving techniques and shows how designers can leverage protocol and tool‑based power‑management features to create high‑efficiency SoCs for devices that require rapid recovery.

1. Clock‑gating techniques: synthesis tools

Modern synthesis tools provide several clock‑power‑management methods, such as traditional clock gating and self‑gating. Traditional clock gating derives an enable (EN) signal that stops the clock from reaching flip‑flops whose contents do not need to change. Integrated Clock Gating (ICG) cells use EN to shut off the clock to a group of flip‑flops, as shown in Figure 1. Self‑gating disables the clock when a flip‑flop's input matches its current output, so that a clock edge would not change the stored value; the enable is derived from an XOR of the flip‑flop's input and output.
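The difference between the two gating styles can be illustrated with a minimal behavioral sketch. It is not how a synthesis tool models a design: the function name `count_clock_pulses` and the sample data are hypothetical, and a single flip‑flop stands in for a gated group.

```python
def count_clock_pulses(d_inputs, enables=None, self_gating=False):
    """Count clock pulses reaching one flip-flop; return (pulses, final_q).

    d_inputs    -- data input (0 or 1) presented on each cycle
    enables     -- optional per-cycle EN values (traditional clock gating)
    self_gating -- if True, also gate the clock whenever D equals Q
    """
    q = 0          # flip-flop output, assumed to reset to 0
    pulses = 0
    for cycle, d in enumerate(d_inputs):
        en = True if enables is None else bool(enables[cycle])
        if self_gating:
            # XOR of input and output: clock only if the stored value would change
            en = en and bool(d ^ q)
        if en:
            pulses += 1
            q = d
    return pulses, q


data = [0, 0, 1, 1, 1, 0, 0, 0]
ungated = count_clock_pulses(data)
traditional = count_clock_pulses(data, enables=[1, 1, 1, 1, 0, 0, 0, 0])
self_gated = count_clock_pulses(data, self_gating=True)
```

In this toy run the ungated flip‑flop sees a pulse on every cycle, while the self‑gated one is clocked only on the two cycles where its input differs from its output. The final output is the same in both cases.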

Self‑gating yields high clock‑gating efficiency (CE) but incurs a modest area increase because of the extra comparison logic. Tools typically insert self‑gating after traditional clock gating to improve CE further, trading area against power. Power‑analysis tools can evaluate and optimize this automated approach by reporting the efficiency of existing clock gates and identifying further insertion opportunities.

Optimized traditional clock gating applied to PCI Express IP can achieve at least a 40% power reduction and about a 9% area reduction on a 28 nm node. Adding self‑gating after traditional gating can save an additional 5% power at the cost of roughly a 1% area increase. On a 16 nm FinFET node, combined traditional gating and self‑gating can reach a 25% power saving.

2. Clock‑gating techniques: PCI Express IP

Although tool‑inserted clock‑gating can significantly cut power, it only considers the design at the flip‑flop level, gating the clock at each flip‑flop’s clock input. This fine‑grained approach ignores the higher‑level clock tree, which can consume at least 25% of standby power in complex designs. Gating the clock at the root of the hierarchy reduces this consumption because the clock tree switching is eliminated.
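To see why the gating point matters, consider a rough sketch that counts how many clock‑tree buffers keep toggling for a given gate position. It assumes an idealized balanced tree; the function `toggling_buffers` and the depth and fanout values are hypothetical and are not taken from any real design.

```python
def toggling_buffers(depth, fanout, gate_level):
    """Count clock-tree buffers that still toggle when a closed gate sits at gate_level.

    Level 0 is the root buffer and level depth-1 drives the flip-flops.
    A closed gate in front of level k silences every buffer from level k downward;
    gate_level == depth models gating only at the flip-flop clock pins.
    """
    if not 0 <= gate_level <= depth:
        raise ValueError("gate_level must be between 0 and depth")
    # Buffers above the gate keep switching: fanout**0 + fanout**1 + ...
    return sum(fanout ** level for level in range(gate_level))


leaf_gated = toggling_buffers(depth=4, fanout=4, gate_level=4)  # gate at the flip-flops
root_gated = toggling_buffers(depth=4, fanout=4, gate_level=0)  # gate at the root
```

With leaf‑level gating every buffer in this toy tree still switches, whereas a gate at the root silences all of them. That is the effect the paragraph above describes.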

Consider a PCI Express IP design with an ARM® AMBA® interface and three clock domains (Figure 4). The AMBA master receives PCI Express requests, converts them to AMBA transactions, and forwards them to the application clock domain. The AMBA slave processes outgoing AMBA transactions on its own clock domain and converts them back to PCI Express requests. Remaining blocks handle the actual PCI Express functionality on the core or reference clock.

Various scenarios exist where AMBA master/slave clocks can be independently gated, regardless of the PCI Express link state:

1. For inbound requests from the link, keep the local core clock domain and AMBA master clock active while gating the AMBA slave clock.

2. For outbound requests from the application layer, keep the AMBA slave clock and core clock active while gating the AMBA master clock.

3. For traffic that does not require application‑logic intervention, only the local core clock domain is needed, allowing both AMBA master and slave clocks to be gated.
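The three scenarios above can be summarized as a small decision function. This is only a sketch: the name `clocks_to_keep` and the domain labels are hypothetical, and it assumes the core clock is controlled separately by the link power state, so it is always reported as active here.

```python
def clocks_to_keep(inbound_pending, outbound_pending):
    """Return which clock domains must keep running for the pending traffic.

    inbound_pending  -- requests from the link that need the AMBA master
    outbound_pending -- requests from the application that need the AMBA slave
    """
    return {
        "core": True,                             # PCI Express logic (assumed always on here)
        "amba_master": bool(inbound_pending),     # scenario 1
        "amba_slave": bool(outbound_pending),     # scenario 2
    }


inbound_only = clocks_to_keep(inbound_pending=True, outbound_pending=False)
core_only = clocks_to_keep(inbound_pending=False, outbound_pending=False)
```

When neither direction has traffic that needs application logic (scenario 3), only the core domain remains clocked.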

This structured, block‑level clock gating is independent of the PCI Express link power‑management states. When no requests are pending in a given direction, the idle block's clock can be turned off, saving at least 10% power and improving standby efficiency by a similar margin. Table 1 (not reproduced) compares full‑load and standby power with and without structured clock gating.

3. Clock‑gating and power‑gating techniques: PCI Express protocol

The PCI Express protocol defines power‑management states L0, L1 (and sub‑states), and L2/L3. Exiting L2/L3 requires power restoration and link retraining, which lengthens recovery time. In L0 and L1 sub‑states, clock gating is used to minimize recovery latency.

In L0, the AMBA master and slave clocks can be selectively enabled based on traffic direction. In the L1 sub‑state, the reference clock can be disabled, allowing the PLL to continue generating the core clock and avoiding PLL restart delays. This yields the lowest power consumption among clock‑gating techniques, limited to leakage in digital and analog circuits.

If a system can tolerate up to five times longer recovery, the PLL and transmitter/receiver can be turned off in the L1.1 sub‑state, achieving up to 97.5% energy savings compared to a clock‑gating‑only L1.1. Allowing up to fifteen times longer recovery in L1.2 enables disconnection of the common‑mode voltage, reducing power to 0.05%.
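The trade‑off between recovery time and power can be sketched as a lookup that picks the deepest state a latency budget allows. The relative figures below are taken from the paragraph above, under the assumption that both percentages are measured against the same clock‑gating‑only baseline; the table `STATES` and the function `deepest_state` are hypothetical names.

```python
# (state name, relative recovery time, relative power); baseline is clock-gated L1.
# Assumption: 97.5% savings -> 0.025 of baseline, 0.05% -> 0.0005 of baseline.
STATES = [
    ("L1 clock-gated", 1.0, 1.0),
    ("L1.1", 5.0, 0.025),
    ("L1.2", 15.0, 0.0005),
]


def deepest_state(latency_budget):
    """Return the lowest-power state whose recovery time fits the budget, or None."""
    candidates = [s for s in STATES if s[1] <= latency_budget]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s[2])[0]


choice = deepest_state(6.0)  # budget of six times the baseline recovery time
```

A budget between five and fifteen times the baseline recovery time selects L1.1 in this sketch; anything below the baseline leaves no low‑power option.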

PCI Express also provides message‑based power‑management assistance such as Latency Tolerance Reporting (LTR) and Optimized Buffer Flush/Fill (OBFF). LTR conveys the maximum downstream latency a device can tolerate, enabling the host to schedule link operations without violating recovery constraints. OBFF lets the host inform downstream devices of system state, allowing them to optimize transmission scheduling and extend time spent in low‑power modes.
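As an illustration of what an LTR latency value carries, the sketch below decodes a latency field into nanoseconds. It assumes the encoding used by the LTR latency registers as the author of this example understands it (a 10‑bit value in bits 9:0 and a 3‑bit scale in bits 12:10, each scale step multiplying by 32) and should be checked against the PCI Express Base Specification. The function name `decode_ltr_ns` is hypothetical.

```python
def decode_ltr_ns(field):
    """Decode an LTR latency field into nanoseconds.

    Assumed layout: bits 9:0 = latency value, bits 12:10 = latency scale,
    where scale n means a unit of 2**(5*n) ns (1 ns, 32 ns, 1024 ns, ...).
    """
    value = field & 0x3FF
    scale = (field >> 10) & 0x7
    if scale > 5:
        raise ValueError("scale encodings above 5 are treated as invalid here")
    return value * (1 << (5 * scale))


# value 3 with scale 2 (units of 1024 ns)
tolerated_ns = decode_ltr_ns((2 << 10) | 3)
```

A host‑side policy could compare such a decoded tolerance with the recovery time of each low‑power state before allowing the link to enter it.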

Conclusion

Power management is crucial for devices that require rapid recovery during intermittent communication and standby periods. Tool‑based and protocol‑based clock‑gating techniques can deliver substantial energy savings for PCI Express IP designs, especially when near‑zero recovery time is needed. Avoiding link retraining and re‑configuration further enhances both power efficiency and recovery speed.

Synopsys DesignWare PCI Express IP supports tool‑inserted clock gating, structured clock‑gating blocks that are independent of link power states, clock management in the L1 sub‑states, power‑gating solutions for L1.2, and power‑management assistance features such as LTR and OBFF.

