The advent of Peripheral Component Interconnect Express (PCIe) 5.0 and related protocols such as Compute Express Link (CXL) underscores a trend in the data center industry toward heterogeneous computing topologies and computation-intensive workloads. Industry heavyweights and aspiring startups alike are developing semiconductor integrated circuits (ICs) to process an ever-increasing quantity of data. These purpose-built ICs share two overarching requirements: high bandwidth and low latency.
The low-latency requirement is fairly intuitive: when multiple processing elements work on the same data sets, they need low-latency connections among all of them to maintain data coherency efficiently.
The high-bandwidth requirement is also obvious, but its implications for system design complexity and cost are far-reaching. By 2021, it is estimated that the number of workloads and compute instances in cloud data centers will nearly triple (a 2.7-fold increase) compared with 2016; over the same period, compute density (workloads and compute instances per physical server) will increase by 50% [1]. With more servers deployed, each with higher compute density, the challenges of moving data cost-effectively between processor, networking and storage nodes are exploding, and signal integrity (SI) will be the primary pain point for these densely packed systems. Solving the SI problem will require a careful balance between signal retimers and low-loss printed circuit board (PCB) materials.
With Higher Bandwidth Comes Heartburn
Moving data from point A to point B in a server is no simple task. Figure 1 shows an 8-gigatransfers-per-second (GT/s) PCIe 3.0 server channel topology. A channel made up of a 10-inch motherboard trace, a 1-inch riser card and a 4-inch add-in card (AIC) can be built with mainstream PCB material (such as FR4 TU-862) and still meet the 22-dB insertion loss budget across temperature without a signal retimer.
For 16-GT/s PCIe 4.0, the data rate per lane doubles, and this same topology now exceeds the Gen 4 28-dB budget by 3 dB. Closing the gap requires either a retimer or redriver (typically placed on the riser card) or a low-loss PCB material. Read the blog post “PCI-Express Retimers vs. Redrivers: An Eye-Popping Difference” to understand the difference between retimers and redrivers.
For 32-GT/s PCIe 5.0, the data rate doubles again. With mainstream PCB materials, the same topology now exceeds the Gen 5 36-dB budget by 16 dB once temperature and humidity effects are taken into account (more on this later). In fact, the insertion loss budget is exhausted before the signal even leaves the motherboard: a retimer is required on the motherboard for any card electromechanical (CEM) slot located more than 5.5 inches away, with or without a riser card (see Table 1).
Figure 1: Signal integrity challenges of a common server topology and the use of retimers.
Table 1: PCIe channel reach for standard topologies using mid-loss PCB material.
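The budget arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a channel simulator: it uses only the budgets and overruns stated in the text, plus the fact that PCIe 3.0 through 5.0 use NRZ signaling, so the Nyquist frequency (where channel loss budgets are typically evaluated) is half the data rate. The names `LOSS_BUDGET_DB`, `nyquist_ghz` and `budget_margin_db` are hypothetical helpers introduced here.

```python
# Channel insertion-loss budgets quoted in the text, in dB, keyed by PCIe generation.
LOSS_BUDGET_DB = {3: 22.0, 4: 28.0, 5: 36.0}
# Per-lane data rates in GT/s.
DATA_RATE_GTS = {3: 8.0, 4: 16.0, 5: 32.0}


def nyquist_ghz(gen: int) -> float:
    """NRZ carries one bit per unit interval, so Nyquist = data rate / 2."""
    return DATA_RATE_GTS[gen] / 2


def budget_margin_db(gen: int, channel_loss_db: float) -> float:
    """Positive result: headroom left; negative result: dB by which the budget is exceeded."""
    return LOSS_BUDGET_DB[gen] - channel_loss_db


# Totals implied by the text for the Figure 1 topology on mainstream material:
# Gen 4 is 3 dB over its 28-dB budget; Gen 5 is 16 dB over its 36-dB budget.
implied_loss_db = {4: LOSS_BUDGET_DB[4] + 3.0, 5: LOSS_BUDGET_DB[5] + 16.0}

for gen, loss in implied_loss_db.items():
    print(f"PCIe Gen {gen}: Nyquist {nyquist_ghz(gen):.0f} GHz, "
          f"loss {loss:.0f} dB, margin {budget_margin_db(gen, loss):+.0f} dB")
```

Each doubling of the data rate doubles the Nyquist frequency, and because PCB loss per inch rises with frequency, the same physical channel consumes a growing share of a budget that grows far more slowly (22, 28, then 36 dB).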
Upgrading to an ultra-low-loss PCB material (such as Megtron-6) is an option, but it can be an expensive proposition, depending on board size, layer count and volume [2,3]. Even with ultra-low-loss material, many common topologies still exceed the total channel budget: multiconnector channels, captive channels longer than 14.9 inches in total, and standard CEM slot channels with more than 12.1 inches of trace on the base board (see Table 2). In such cases, a retimer is still necessary to ensure low error rates and robust link performance.
Table 2: PCIe 5.0 channel reach for different topologies using ultra-low-loss PCB material.
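Reach figures such as those in Tables 1 and 2 can be approximated with a simple linear model: whatever budget remains after length-independent losses (connectors, vias, packages) is divided by the material's per-inch loss. The sketch below assumes that model; `max_trace_inches` and all input values are hypothetical and are not the data behind the tables.

```python
def max_trace_inches(budget_db: float, fixed_loss_db: float,
                     loss_per_inch_db: float, safety_margin: float = 0.0) -> float:
    """Longest trace that keeps total loss within the (optionally derated) budget.

    fixed_loss_db lumps together length-independent contributions such as
    connectors, vias and package loss. Returns 0.0 if they alone use up the budget.
    """
    usable_db = budget_db * (1.0 - safety_margin) - fixed_loss_db
    return max(usable_db, 0.0) / loss_per_inch_db


# Hypothetical Gen 5 inputs for illustration only.
print(max_trace_inches(budget_db=36.0, fixed_loss_db=10.0, loss_per_inch_db=2.0))   # 13.0
print(max_trace_inches(budget_db=36.0, fixed_loss_db=10.0, loss_per_inch_db=1.25))  # 20.8
```

Lowering the per-inch loss stretches the usable trace length, but it does nothing for the fixed term, which is why multiconnector channels can still run out of budget on ultra-low-loss material.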
Designing with Margin
Of course, no two server topologies are alike, and a server typically has hundreds of PCIe channels of varying lengths between the root complex (RC) and the endpoint (EP). Every server designer must verify that all channels in the server stay below the PCIe channel loss budget, and that this requirement holds across temperature and humidity variation and manufacturing tolerances.
If a given channel and PCB material meet the loss budget under nominal board conditions (e.g., 20°C), the same channel may not meet the budget under extreme conditions (e.g., 80°C), in which case a lower-loss material or a retimer will be required to achieve compliance across all conditions.
Figure 2 shows the losses of different PCB materials over temperature. The more expensive, lower-loss materials have less temperature variation, while mainstream materials can vary significantly over temperature, especially at Gen 5 data rates.
Figure 2: Loss per inch of different PCB materials over temperature.
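To see how a channel can pass at 20°C and fail at 80°C, the sketch below models per-inch loss as rising linearly with temperature. The linear model, its coefficient and every channel parameter are assumptions chosen for illustration; real values should come from measured material data such as that shown in Figure 2.

```python
def loss_per_inch_db(temp_c: float, loss_at_20c_db: float, temp_coeff_per_c: float) -> float:
    """Assumed linear model: per-inch loss grows by temp_coeff_per_c (fractional) per degree above 20 °C."""
    return loss_at_20c_db * (1.0 + temp_coeff_per_c * (temp_c - 20.0))


def channel_loss_db(trace_in: float, fixed_loss_db: float, temp_c: float,
                    loss_at_20c_db: float, temp_coeff_per_c: float) -> float:
    """Total loss: length-independent losses plus trace loss at the given temperature."""
    return fixed_loss_db + trace_in * loss_per_inch_db(temp_c, loss_at_20c_db, temp_coeff_per_c)


GEN5_BUDGET_DB = 36.0
# Hypothetical mainstream-material channel; numbers chosen only to illustrate the effect.
params = dict(trace_in=14.0, fixed_loss_db=9.0, loss_at_20c_db=1.8, temp_coeff_per_c=0.004)

for temp in (20.0, 80.0):
    loss = channel_loss_db(temp_c=temp, **params)
    verdict = "pass" if loss <= GEN5_BUDGET_DB else "fail"
    print(f"{temp:.0f} °C: {loss:.1f} dB -> {verdict}")
```

A material with a smaller temperature coefficient narrows the gap between the two rows, which is the behavior the lower-loss materials in Figure 2 exhibit.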
Another factor to consider is safety margin: a self-imposed reduction of the specification's channel loss limit to accommodate manufacturing variances, simulation-to-measurement correlation mismatches and other unforeseen degradations that affect system performance. The PCIe 4.0 and 5.0 specifications allow maximum channel losses of 28 dB and 36 dB, respectively. Designing a system to just barely meet these limits is like flying an airplane across the Pacific Ocean with just enough fuel in the tanks to complete the journey. I would not board that plane if I were you!
Just as I feel safer flying knowing there is some extra fuel onboard, you can take comfort in knowing that every channel sits some percentage below the absolute maximum loss requirement. Typical safety margins are in the 10% to 20% range; 15% is a good rule of thumb. This means that Gen 4 and Gen 5 channels should not exceed 28 dB × 0.85 ≈ 24 dB and 36 dB × 0.85 ≈ 31 dB, respectively.
To truly design a system with margin, you must account both for well-quantified variations, such as temperature, and for less well-quantified risks, which the safety margin covers.
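Combining the two ideas, a channel is designed with margin only if its worst-case loss (at the hottest, most humid corner) fits under the specification budget reduced by the safety margin. A minimal sketch, assuming the margin is applied as a simple fraction of the spec limit as in the 15% example; `derated_budget_db` and `designed_with_margin` are hypothetical helper names.

```python
def derated_budget_db(spec_budget_db: float, safety_margin: float = 0.15) -> float:
    """Self-imposed limit: the specification budget reduced by a fractional safety margin."""
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be a fraction in [0, 1)")
    return spec_budget_db * (1.0 - safety_margin)


def designed_with_margin(worst_case_loss_db: float, spec_budget_db: float,
                         safety_margin: float = 0.15) -> bool:
    """Pass only if the worst-case (e.g., hot, humid) loss fits under the derated budget."""
    return worst_case_loss_db <= derated_budget_db(spec_budget_db, safety_margin)


for gen, budget in ((4, 28.0), (5, 36.0)):
    print(f"Gen {gen}: {budget:.0f} dB spec -> {derated_budget_db(budget):.1f} dB with 15% margin")
```

The unrounded limits are 23.8 dB and 30.6 dB; rounding gives the 24-dB and 31-dB figures quoted above.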
Striking a Balance – What’s a Practical Solution?
To analyze the impact of PCB material selection and retimer use on system margin, consider the assortment of system board topologies and channel lengths in Table 3, which spans the shortest and longest channels implemented in many server designs.
Table 3: Distribution of channel topologies considered in this analysis.
*For captive channel topologies, the trace length noted includes both the system board and the AIC or mezzanine card.
From the system board length, topology type and PCB material category, you can calculate the total channel loss for nominal and worst-case temperatures. For PCIe 4.0, Figure 3 shows that a relatively low-cost upgrade to a low-loss material such as Megtron-4 will enable 66% to 100% of the channels to pass, depending on how much safety margin is applied. A more costly upgrade to an ultra-low-loss material like Megtron-6 will bring that up to 89% to 100%. In both cases, you can easily resolve the lack of margin on the remaining channels by using a retimer.
Figure 3: 16-GT/s Gen 4 total channel loss for different topologies and PCB materials.
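Pass rates like those in Figures 3 and 4 can, in principle, be obtained by counting the channels whose worst-case loss falls under the derated budget at each safety-margin level. The sketch below shows that counting step only; the channel losses are hypothetical, not the Table 3 data, and `pass_fraction` is an illustrative name.

```python
from typing import Sequence


def pass_fraction(channel_losses_db: Sequence[float], spec_budget_db: float,
                  safety_margin: float) -> float:
    """Fraction of channels whose worst-case loss fits under the derated budget."""
    if not channel_losses_db:
        raise ValueError("need at least one channel")
    limit_db = spec_budget_db * (1.0 - safety_margin)
    passing = sum(1 for loss in channel_losses_db if loss <= limit_db)
    return passing / len(channel_losses_db)


# Hypothetical worst-case losses for a handful of Gen 4 channels (dB).
losses = [14.0, 19.5, 22.0, 25.0, 27.5, 30.0]
for margin in (0.0, 0.10, 0.15, 0.20):
    print(f"{margin:.0%} margin: {pass_fraction(losses, 28.0, margin):.0%} of channels pass")
```

Sweeping the margin produces the ranges quoted in the text (e.g., "66% to 100%"): the low end corresponds to the strictest margin, the high end to the loosest.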
For PCIe 5.0, Figure 4 shows that even with an ultra-low-loss material, a rather unsettling 11% to 78% of channels will not meet the loss target (again, depending on the safety margin applied). In such cases, it may be worthwhile to save on PCB material costs, stick with lower-cost materials and use retimers to effectively cut the channels in half, bringing their losses well below the specification limits with some added safety margin for peace of mind.
Figure 4: 32-GT/s Gen 5 total channel loss for different topologies and PCB materials.
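Cutting a channel in half with a retimer can be modeled by splitting the list of channel segments at a candidate location and checking each side against the budget separately. This sketch assumes each sub-channel is allowed the full (derated) budget and ignores the retimer's own package loss; the segment losses and the `retimer_positions` helper are hypothetical.

```python
from typing import List, Tuple

# (description, insertion loss in dB) for each piece of the channel, root complex first.
Segment = Tuple[str, float]


def retimer_positions(segments: List[Segment], budget_db: float) -> List[str]:
    """List the segments after which one retimer leaves both sub-channels within budget_db.

    Assumes each sub-channel gets its own full budget and ignores the retimer's
    package/breakout loss, which a real analysis would add to both sides.
    """
    positions = []
    for i in range(1, len(segments)):
        upstream_db = sum(loss for _, loss in segments[:i])
        downstream_db = sum(loss for _, loss in segments[i:])
        if upstream_db <= budget_db and downstream_db <= budget_db:
            positions.append(segments[i - 1][0])
    return positions


# Hypothetical Gen 5 channel on lower-cost material; values are illustrative only.
channel = [
    ("motherboard trace, segment A", 18.0),
    ("motherboard trace, segment B", 14.0),
    ("CEM connector", 2.0),
    ("add-in card trace", 10.0),
]
budget = 36.0 * (1 - 0.15)  # Gen 5 budget with a 15% safety margin, about 30.6 dB
print(sum(loss for _, loss in channel) <= budget)  # False: 44 dB end to end
print(retimer_positions(channel, budget))  # ['motherboard trace, segment A']
```

The 44-dB channel fails end to end, yet a retimer placed after the first motherboard segment leaves 18 dB upstream and 26 dB downstream, both under the derated 30.6-dB limit.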
Time will tell how easy or difficult it will be to design for optimum signal integrity in PCIe 5.0 systems; however, the availability of retimers from Astera Labs and advanced PCB materials should make this upcoming challenge a lot less formidable.
- [1] Cisco Global Cloud Index: Forecast and Methodology, 2016-2021, white paper.
- [2] Cost Drivers in Manufacturing of PCBs.
- [3] PCB Cost Adders, IPC Designers Council.