Title
Design, Analysis and Application of System-Level Power Distribution Networks

Permalink
https://escholarship.org/uc/item/6zx0144n

Author
Zhang, Xiang

Publication Date
2017

Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Design, Analysis and Application of System-Level Power Distribution Networks

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Electrical Engineering (Computer Engineering)

by

Xiang Zhang

Committee in charge:

Professor Chung-Kuan Cheng, Chair
Professor Bill Lin
Professor Patrick Mercier
Professor Yuan Taur
Professor Michael Taylor

2017
The dissertation of Xiang Zhang is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego

2017
DEDICATION

To my family.
TABLE OF CONTENTS

Signature Page
Dedication
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract of the Dissertation

Chapter 1 Introduction
  1.1 Power Distribution Network in System Integration and VLSI Design
  1.2 Current Research Efforts
  1.3 Dissertation Outline

Chapter 2 Background on Power Distribution Networks
  2.1 Power Distribution Network Basics
  2.2 Power Distribution Network Noise
  2.3 Power Distribution Network Applications

Chapter 3 Ratio of the Worst-Case Noise and the Impedance of Power Distribution Network
  3.1 Background
  3.2 Problem Formulation
    3.2.1 Worst-Case PDN Voltage Noise
    3.2.2 Peak Output Impedance
  3.3 Maximum Ratio $\gamma$ in Series RL/RC Circuits and Standard LC Tanks
    3.3.1 Series RL/RC Circuit
    3.3.2 Standard LC Tank with $ESR_c$
  3.4 Case Study: A Complete Power Distribution Network Path
  3.5 Case Study: Power Distribution Network Design Optimization with On-Die Voltage Dependent Leakage Path
    3.5.1 Voltage Dependent Leakage Resistance Model
    3.5.2 RLC Tank Model with Leakage Resistance

Chapter 4 Worst-Case Noise Area Prediction of On-Chip Power Distribution Network
  4.1 Background
  4.2 Problem Formulation
  4.3 Worst Noise Area Prediction of RLC Tank: Analytical Solution
  4.4 Worst Noise Area Prediction for PDN Cases: Algorithmic Solution
  4.5 Experimental Results
    4.5.1 Circuit Delay vs Supply Noise Area
    4.5.2 Critical Path Delay under Worst-Area and Worst-Peak Supply Noises of an RLC Tank
    4.5.3 Worst-Area and Worst-Peak Noise of Multi-Stage Cascaded RLC Tanks
    4.5.4 Critical Path Delay under Worst Noise Area Fluctuation: a Test Case
  4.6 Summary

Chapter 5 Enhancing Off-Chip Communication Throughput from Power Lines
  5.1 Background
  5.2 Design Overview
    5.2.1 On-Die Implementation
    5.2.2 Package Implementation
    5.2.3 PCB Implementation
    5.2.4 PCB Model Analysis
  5.3 Signal Integrity Investigation for PCB Model
    5.3.1 Middle Notch Effect
    5.3.2 Surrounding Notch Effect
    5.3.3 Analysis of PCB Model with Industrial SOC Package Footprint
  5.4 Power Delivery Network Analysis
  5.5 PLC to PDN Noise Mitigation Analysis
  5.6 Case Study: A Complete Power Delivery and Data Communication Path
    5.6.1 Eye Diagram for Signal Mode
    5.6.2 PDN Analysis for Power Mode
  5.7 Summary

Chapter 6 Boosting Off-Chip Interconnects through Inter-Package Capacitive Proximity Communication
  6.1 Background
  6.2 Design Overview
    6.2.1 Capacitor Model Analysis
    6.2.2 Manufacturing Tolerance
  6.3 Performance Analysis
    6.3.1 The Size of the Metal Plate
    6.3.2 The Distance from Metal Plate to PCB GND Plane
    6.3.3 Transmitter Drive Strength (DS)
  6.4 Summary

Chapter 7 Conclusion
  7.1 Summary of Contributions
  7.2 Future Work

Bibliography
LIST OF FIGURES

Figure 1.1: Target impedance prediction according to ITRS. Assume $Z_{\text{target}} = \frac{V_{dd} \times 5\%}{I_{\text{load}}}$.
Figure 1.2: A main logic board for iPhone 7™. Courtesy of www.ifixit.com.
Figure 1.3: Snapdragon 600™ pin assignment.

Figure 2.1: A cross-sectional view of the power distribution network for high-performance integrated circuits [66].
Figure 2.2: A circuit diagram characterizing the impedance of a PDN.
Figure 2.3: A CPU load current profile measured on various power pins is shown in (a). The current spectrum for VDD pin 60 is shown in (b) [39].
Figure 2.4: A PDN with two-stage RLC tanks.
Figure 2.5: The impedance profile of a PDN with two-stage RLC tanks.
Figure 2.6: Off-chip bandwidth limitation. (Courtesy of Professor Yalamanchili)
Figure 2.7: The projected trend of silicon process technology advancement. (Courtesy of ITRS)
Figure 2.8: The projection for package technology advancement. (Courtesy of Steve Bezuk)

Figure 3.1: Standard LC tank with $ESR_c$.
Figure 3.2: The step response of an underdamped LC tank with $ESR_c$ ($Q > 0.5$). (a) The first local extremum is a peak. (b) The first local extremum is a valley.
Figure 3.3: Impedance magnitude sweep of an underdamped LC tank with $ESR_c$ when $y_0 > 0$. The peak occurs at $Z_{\text{max}} = |Z(y_0)|$.
Figure 3.4: Impedance magnitude sweep of an underdamped LC tank with $ESR_c$ when $y_0 < 0$. (a) $Z_{\text{max}} = Z(0)$, (b) $Z_{\text{max}} = Z(\infty)$.
Figure 3.5: The contour line $(\gamma)$ as a function of $Q$ and $Q_2$ (the shaded area is not valid due to the condition $Q_2 > Q > 0.5$).
Figure 3.6: The ratio $\gamma$ versus the quality factor $Q$ (when $R_1 = R_2$).
Figure 3.7: Standard LC tank without $ESR_c$.
Figure 3.8: The ratio $\gamma$ versus the quality factor $Q$ of an LC tank without $ESR_c$.
Figure 3.9: A complete PDN path is illustrated by a lumped cascaded LC tank model. A high-order multi-stage PDN system can be approximated by three second-order LC tanks in different frequency regions.
Figure 3.10: The output impedance of a complete PDN path: (a) magnitude, (b) phase.
Figure 3.11: (a) Worst-case peak noise of a complete PDN path, (b) worst-case load current pattern, (c) zoomed-in view of the worst peak noise on the PDN, (d) zoomed-in view of the worst-case load current pattern.
Figure 3.12: Two PDN cases to test the proposed prediction method.
Figure 3.13: (a) Leakage current vs supply voltage, (b) equivalent leakage resistance vs supply voltage.
Figure 3.14: A circuit diagram characterizing the impedance of a PDN. The on-chip load can be modeled as (a) a single current source, (b) a current source with a constant leakage resistor, (c) a current source with a voltage-dependent leakage resistor.
Figure 3.15: An RLC tank model with leakage resistance.
Figure 3.16: (a) The optimal values of $R_1$ and $R_2$ (with minimum worst-case noise) as leakage $R_3$ decreases. (b) The minimum worst-case noise of an RLC tank as leakage $R_3$ decreases.
Figure 3.17: Leakage resistance $R_3$ as the load current $i(t)$ changes.
Figure 3.18: Voltage noise of an RLC tank with different leakage resistance models.
Figure 3.19: Impedance profile of a complete PDN path with various leakage resistance values.
Figure 3.20: The peak voltage noise (droop) of a complete PDN path in the time domain.
Figure 3.21: Voltage noise of a complete PDN path with different leakage resistance models.

Figure 4.1: A typical circuit diagram characterizing the impedance of a PDN.
Figure 4.2: A datapath of an inverter chain under two supply patterns. The dashed curve induces larger delay despite smaller peak noise (period $T = T_1 - T_0$).
Figure 4.3: An example of a PDN system with (a) the impulse response $h(t)$, (b) the step response $V_s(t)$, (c) the ramp response $R_s(t)$ (integral of $V_s(t)$) and (d) the noise area function $A_s(t)$.
Figure 4.4: A standard RLC tank model.
Figure 4.5: The generation of $t_k$ and $i_w(t)$ in terms of peak-to-valley distances.
Figure 4.6: Normalized delay of a datapath under different supply voltage noise areas (the delay under constant $V_{dd} = 1V$ is normalized to 1).
Figure 4.7: Load current, voltage noise and voltage area of the worst-case peak and area of a standard RLC tank model, $T = 17ns$ (the nominal voltage of 1V is superimposed in (b) and (c)).
Figure 4.8: The delay of the datapath under the worst-area and worst-peak noise of a standard RLC tank model ($T = 17ns$).
Figure 4.9: Circuit diagram of a cascaded RLC tank PDN.
Figure 4.10: Three standard RLC tanks to model a cascaded tank in Case I of Table 4.1.
Figure 4.11: The impedance profile of a complete PDN path.
Figure 4.12: The worst-peak and worst-area current, voltage response and voltage area response ($T = 12.5 ns$) of a complete PDN path. (d-f) show the expanded view of (a-c) at the peak droop point.
Figure 4.13: The delay under worst-area and worst-peak supply noise for a complete PDN path ($T = 12.5 ns$).
Figure 4.14: Downhill region $r_{j-1}$ lies between peak $pv_{j-1}$ and valley $pv_{j}$, uphill region $r_{j}$ lies between valley $pv_{j}$ and peak $pv_{j+1}$, etc.
Figure 4.15: A set $X_{j}$ of $n'$ local sampling points $\{x_{0}',...,x_{n'-1}'\}$ within region $r_{j}$.

Figure 5.1: High-level overview of the proposed power line communication (PLC) on a PDN.
Figure 5.2: The circuit diagram of an on-die differential-signal-to-power switch for PLC.
Figure 5.3: The capacitance model for a MOSFET.
Figure 5.4: A four-layer package (a) with the original shared power plane, (b) with separate power planes for dedicated and hybrid pins for PLC.
Figure 5.5: An overview of the four-layer PCB test layout model for PLC.
Figure 5.6: An overview of a four-layer PCB test coupon layout for PLC.
Figure 5.7: The stackup of the test PCB layout.
Figure 5.8: The definition of design parameters on Layer 3 of the PCB.
Figure 5.9: Five PCB test cases with different lengths of the middle notch on Layer 3.
Figure 5.10: $Sdd21$ of the five cases in Figure 5.9.
Figure 5.11: Six PCB test coupons with different sizes of the surrounding notches.
Figure 5.12: $Sdd21$ of the six test cases in Figure 5.11.
Figure 5.13: $Sdd21$ of two channels from the original and the modified power planes.
Figure 5.14: A package power plane layout change for two hybrid pairs.
Figure 5.15: The probe points for the noise coupled from the data transmission of hybrid pins to dedicated power pins.
Figure 5.16: Schematic for data communication on a PDN.
Figure 5.17: Eye diagram of a 30GHz (with Manchester code) PLC (a) without equalizer, (b) with equalizer.
Figure 5.18: The transfer function of the receiver equalizer.
Figure 5.19: The circuit diagram of the receiver equalizer.
Figure 5.20: Receiver eye diagram after equalization with near-end and far-end noise sources from the power plane.
Figure 5.21: Receiver eye diagram with and without equalizers when both channels transmit at the same time.
Figure 5.22: Schematic for the original PDN without hybrid pins and the modified PDN with one pair.
Figure 5.23: Impedance profile for the original and the modified PDN with one pair of PLC.

Figure 6.1: High-level overview of the proposed Inter-Package Capacitive Proximity Communication (IPCPC).
Figure 6.2: High-level overview of the capacitor model for IPCPC.
Figure 6.3: (a) Plate-to-plate capacitance $C_{26}$ vs $d$. (b) Plate-to-ground capacitance $C_{sg} + C_{bg}$ vs $d$.
Figure 6.4: (a) Plate-to-plate capacitance $C_{26}$ vs $b$. (b) Plate-to-ground capacitance $C_{sg} + C_{bg}$ vs $b$.
Figure 6.5: Simulation setup for channel performance of IPCPC.
Figure 6.6: Eye diagrams for signal and crosstalk observed at the receiver and the neighboring channel. (a) Signal for $0.3 \times 0.3mm^2$ plate, (b) crosstalk for $0.3 \times 0.3mm^2$ plate, (c) signal for $0.2 \times 0.2mm^2$ plate, (d) crosstalk for $0.2 \times 0.2mm^2$ plate.
Figure 6.7: Eye diagrams for signal with different $b$. (a) $b = 0.07mm$, (b) $b = 0.57mm$.
Figure 6.8: Eye diagrams for signal with different source drive strength (DS). (a) $R_1 = 20\,\Omega$, (b) $R_1 = 50\,\Omega$.
LIST OF TABLES

Table 1.1: Full-chip leakage power (normalized to full-chip leakage power dissipation in 2011)
Table 3.1: The worst-case noise prediction of three complete PDN cases
Table 4.1: The R, L, C parameters for three cascaded RLC tank cases
Table 4.2: Comparison of the worst-case noise prediction between the RLC tank decomposition method and Alg. 3 results ($T = 10ns$ for $A_w$)
Table 4.3: Comparison of the worst-peak and the worst-area noise for a complete PDN path ($T = 12.5ns$)
Table 5.1: Ball allocation for a commercial SOC [1, 2]
Table 5.2: The length of the middle notch vs $f_{\text{valley}}$
Table 5.3: The length of the side notch vs $f_{\text{valley}}$
Table 5.4: Power pin impedance change for PLC
Table 5.5: The maximum coupling noise at each probe point
ACKNOWLEDGEMENTS

Pursuing a Ph.D. degree has been my goal since I came to the US from China nine years ago, and it never changed, even after I started working in industry six and a half years ago. I thank Professor Chung-Kuan Cheng for giving me the opportunity to work with him in the power integrity and VLSI fields over the last five and a half years. Not only has he advised my research throughout these years, but he has also taught me how to be a better person in life. Along this graduate study journey, I have learned time management to balance research work and engineering projects with my full-time job. I enjoyed the whole process of addressing challenging research problems, which has also benefited my problem solving in my daily job. I would also like to thank Professor Bill Lin, Professor Patrick Mercier, Professor Yuan Taur, and Professor Michael Taylor, who have served as my Ph.D. committee members.

Thanks to my current and previous colleagues at Apple and Qualcomm, who have provided tremendous support for my part-time study as well as insightful discussions about research directions and new ideas. In particular, I would like to thank Raghunandan Nagesh, Shaun Raman, Xiaoming Chen, Tun Li and Siming Pan, as well as my supervisors, Alex Gigglberger and Craig Birrell. I also thank my collaborators, labmates and friends with whom I have worked, including Prof. Yang Liu, Dr. Xiang Hu, Dr. Jingwei Lu, Dr. Hao Zhuang and Ryan Coutts.

Last but not least, I owe my deepest gratitude to my family for their tremendous and continuous support.

This dissertation uses material from several papers written during my Ph.D. research. They are listed as follows:

Chapter 3, in part, is a reprint of the material as it appears in "Ratio of the Worst Case Noise and the Impedance of Power Distribution Network", by Xiang

Chapter 4, in full, is a reprint of the material as it appears in "Worst-Case Noise Area Prediction of On-chip Power Distribution Network", by Xiang Zhang, Jingwei Lu, Yang Liu, and Chung-Kuan Cheng in *Proceedings of ACM/IEEE International Workshop on System Level Interconnect Prediction 2014*. The thesis author was the primary investigator and author of the paper.

author was the primary investigator and author of the papers.

Chapter 6, in full, is a reprint of the material as it appears in "Boosting Off-chip Interconnects through Inter-Package Capacitive Proximity Communication", which is in preparation for the IEEE Conference on Electrical Performance of Electronic Packaging and Systems 2017, by Xiang Zhang, Dongwon Park and Chung-Kuan Cheng. The thesis author was the primary investigator and author of the paper.
VITA

2008 B. Eng. in Electronic Engineering, Shanghai Jiaotong University

2010 M. S. in Electrical and Computer Engineering, University of Arizona

2011-2014 Senior Engineer, Qualcomm Technologies Inc

2012-2015 Ph. D. student, University of California, San Diego

2014-now iPhone Hardware Systems Design Engineer, Apple Inc

2015 C. Phil in Electrical Engineering (Computer Engineering), University of California, San Diego

2015-2017 Ph. D. candidate, University of California, San Diego

2017 Ph. D. in Electrical Engineering (Computer Engineering), University of California, San Diego

PUBLICATIONS


ABSTRACT OF THE DISSERTATION

Design, Analysis and Application of System-Level Power Distribution Networks

by

Xiang Zhang

Doctor of Philosophy in Electrical Engineering (Computer Engineering)

University of California, San Diego, 2017

Professor Chung-Kuan Cheng, Chair

The design of power distribution networks (PDNs) has become increasingly complex, with ever-smaller margins, as CMOS technology continues to scale to 10nm and below and the operating voltage of high-performance (HPm) logic keeps decreasing. As circuit density on a single chip doubles every two to three years, current density grows rapidly as well. With the emergence of machine learning and deep learning applications, more logic blocks, such as application-specific or heterogeneous integrations, are needed on future application processors (APs). All of the above require a better design and
In this dissertation, we address the design and analysis of PDNs across the whole electronic system, including board-level, package-level and die-level designs. First, we analyze the mathematical relation between the time-domain voltage response and the frequency-domain impedance of a PDN. We also propose a method to quickly estimate the worst-case PDN noise for industrial PDN models, and we extend PDN design and analysis to account for on-die leakage. Second, we discuss PDN design applications by predicting the longest delay of a datapath due to the worst-case noise area of the supply voltage. Third, we propose power line communication (PLC), which reuses part of the PDN, and package-to-package capacitive communication as data transmission channels to increase the off-chip bandwidth during SOC low-performance states.
Chapter 1

Introduction

1.1 Power Distribution Network in System Integration and VLSI Design

Power distribution network (PDN) design has become one of the most critical topics in nanoscale VLSI design. With the continuous scaling of CMOS transistor technology and recent advances in 3D-IC technology, the current density of a single chip keeps increasing while the operating voltage of high-performance processors is gradually dropping. As a result, the target impedance of a PDN in 2026 is projected to drop more than five-fold from its 2011 value (Figure 1.1), which imposes an even tighter noise margin requirement. Higher operating frequencies also lead to ever-increasing dynamic supply switching noise. The ITRS roadmap shows that the operating voltage of high-performance (HPm) logic will move to 0.73V in 2018 [3]. Consequently, minimizing IR drop and the anti-resonance peaks caused by parasitic resistance, loop inductance and decoupling capacitance has become critical to maintaining robust circuit performance.
Table I

<table>
<thead>
<tr>
<th>Year</th>
<th>Target Impedance Index (nΩ·GHz·cm²)</th>
</tr>
</thead>
<tbody>
<tr>
<td>2010</td>
<td>22 x 10⁻¹²</td>
</tr>
<tr>
<td>2012</td>
<td>18 x 10⁻¹²</td>
</tr>
<tr>
<td>2014</td>
<td>14 x 10⁻¹²</td>
</tr>
<tr>
<td>2016</td>
<td>10 x 10⁻¹²</td>
</tr>
<tr>
<td>2020</td>
<td>6 x 10⁻¹²</td>
</tr>
<tr>
<td>2024</td>
<td>4 x 10⁻¹²</td>
</tr>
<tr>
<td>2026</td>
<td>2 x 10⁻¹²</td>
</tr>
</tbody>
</table>

**Figure 1.1:** Target impedance prediction according to ITRS. Assume $Z_{target} = \frac{V_{dd} \times 5\%}{I_{load}}$.
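The target impedance in Figure 1.1 is simple arithmetic on the allowed ripple and the load current. The sketch below evaluates it; the 5% ripple follows the caption and 0.73V is the ITRS HPm voltage quoted above, while the 100 A load current is a hypothetical value chosen only for illustration.

```python
def target_impedance(vdd, ripple_fraction, i_load):
    """Z_target = Vdd * ripple_fraction / I_load, in ohms."""
    return vdd * ripple_fraction / i_load


# 0.73 V (ITRS HPm) with a 5% ripple budget; the 100 A load is hypothetical.
z = target_impedance(vdd=0.73, ripple_fraction=0.05, i_load=100.0)
print(f"Z_target = {z * 1e3:.3f} mOhm")
```

Because the supply voltage sits in the numerator and the load current in the denominator, lowering the voltage and raising the current at the same time compound each other, which is why the target impedance falls so quickly across technology generations.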

Meanwhile, full-chip leakage power in 2016 is predicted to be almost three times that of 2011, as shown in Table 1.1 [3, 35], indicating that on-die leakage is no longer negligible in PDN analysis. Therefore, minimizing the IR drop and simultaneous switching noise (SSN) of a PDN caused by leakage, parasitic resistance, loop inductance and transient currents has become extremely important.

System-level PDN design is critical for consumer electronics such as mobile devices, laptops, IoT devices and game consoles, and a large portion of the design material cost is dedicated to power delivery. Figure 1.2 shows the bottom side of the main logic board from an iPhone 7™ teardown. The chip inside the green area is a power management IC (PMIC). Many decoupling capacitors for the PDN are placed in the region on the back side of the SOC. Figure 1.3 shows the Snapdragon 600E™ SOC pin assignment [2]. Half of the SOC balls are allocated to power and ground to accommodate the highest performance state and the different voltage domains.

Table 1.1: Full-chip leakage power (normalized to full-chip leakage power dissipation in 2011).

<table>
<thead>
<tr>
<th>Yr. of Production</th>
<th>2011</th>
<th>2012</th>
<th>2013</th>
<th>2014</th>
<th>2015</th>
<th>2016</th>
</tr>
</thead>
<tbody>
<tr>
<td>Leakage Power</td>
<td>1.00</td>
<td>1.00</td>
<td>1.27</td>
<td>1.45</td>
<td>2.18</td>
<td>2.91</td>
</tr>
</tbody>
</table>

Figure 1.2: A main logic board for iPhone 7™. Courtesy of www.ifixit.com

Figure 1.3: Snapdragon 600E™ pin assignment.

These observations show that industrial design already takes system-level PDN performance seriously while delivering high-volume, high-quality products. They also raise the question of whether power and ground pins can be utilized dynamically to improve off-chip communication bandwidth.

1.2 Current Research Efforts

The power distribution network has been a critical topic for both academia and industry for many years. To meet increasingly aggressive voltage scaling and current demands in system-level design, power integrity engineers are looking for new manufacturing technologies, design methodologies and fast simulation techniques to improve the robustness of PDNs.

From the manufacturing technology perspective, deep trench capacitors [43, 31] and on-die regulators [58, 11, 65] have been proposed to lower the impedance profile of the PDN. Intel has reportedly used a fully integrated voltage regulator (FIVR) in its latest desktop chipsets [47, 42]. Although on-chip capacitors provide the best PDN noise decoupling performance, the amount of on-chip capacitance is greatly limited by die area. FIVR is also susceptible to thermal runaway in mobile and IoT applications. Advanced packaging technologies, such as flip chip [64], package decoupling capacitors [13] and package-on-package (POP) [71], have been widely applied to reduce the parasitic resistance and inductance of the PDN. At the system level, multi-phase buck regulators and remote feedback [46] have been applied to compensate for PCB and system-level DC losses. Remote feedback usually comes with a strict phase margin requirement on the feedback network and the buck output capacitors.

From the design methodology perspective, active research topics include bridging the gap between PDN measurement and simulation [38, 34] and developing application-specific PDN design methodologies. Cai [12] proposed designing the DDR memory rail PDN based on signaling timing margin. Goral et al. [27] studied PDN simulation through IC behavior models. In the modern IC design flow, early PDN analysis without netlist information is very important for floorplanning and chip area estimation. Ko et al. [40] proposed a simplified chip power model as a function of leakage current, operating frequency and measurement data from the previous-generation chip. Lalgudi et al. [45] employed a finite-difference formulation based on the latency insertion method (LIM) to simulate power-supply noise in the on-chip PDN. However, most of these early prediction works require knowledge of circuit details, which may raise confidentiality concerns for industrial intellectual property (IP), and they are insufficient as a solution for SoC design because of the tight schedules imposed by silicon delivery.
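To illustrate the leapfrog idea behind LIM (node voltages updated at half time steps, branch currents at full time steps), here is a minimal sketch for a single hypothetical R-L branch that feeds a load node with capacitance C and a constant current sink from an ideal supply. This is not the formulation of [45]: the one-branch circuit, the explicit treatment of R at the previous step, and all numeric values are assumptions made for illustration.

```python
def lim_single_branch(vdd, res, ind, cap, i_load, dt, steps):
    """Leapfrog (LIM-style) transient of one R-L branch feeding a load node.

    The node voltage v is updated at half steps and the branch current i at
    full steps. Returns the load-node voltage after each half-step update.
    """
    v = vdd    # V^{-1/2}: unloaded node voltage before the load turns on
    i = 0.0    # I^0: no branch current yet
    trace = []
    for _ in range(steps):
        # KCL at the load node: C dV/dt = I_branch - I_load
        v += dt / cap * (i - i_load)
        # Branch equation: L dI/dt = Vdd - V - R*I (R uses the old current)
        i += dt / ind * (vdd - v - res * i)
        trace.append(v)
    return trace


# Hypothetical normalized values; the step dt must stay well below sqrt(L*C).
trace = lim_single_branch(vdd=1.0, res=0.5, ind=1.0, cap=1.0,
                          i_load=0.2, dt=0.01, steps=20000)
print(f"min V = {min(trace):.3f}, final V = {trace[-1]:.4f}")
```

The node voltage first undershoots (the L-C resonance) and then settles at the IR-drop level Vdd - R * I_load, which is a fixed point of both update equations.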

In the area of PDN simulation, there are two main research directions: frequency-domain (FD) analysis and time-domain (TD) analysis. For FD analysis, Larry Smith was the first to propose the concept of "target impedance" [62], and many studies have built on this concept [37, 54, 62, 63, 56]. Kim et al. [37] proposed a design methodology for optimized power distribution networks based on frequency-domain PDN resonance information. The method applies to a high-Q (quality factor) LC tank model without equivalent series resistance (ESR). Kim et al. [34] gave a closed-form expression for the supply noise caused by IC switching current for a PDN structure. Sun and Smith [60, 61] proposed a method to systematically characterize on-chip PDN noise and generate a worst-case current pattern. However, none of these methods derives the worst-case PDN noise at the system level, because they assume that PDN noise is bounded as long as the impedance profile stays below the target impedance. In one of our works, we demonstrate that there is no limit on the ratio of worst-case noise to the target impedance [73], because the shape of the impedance profile also matters.
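A minimal numerical sketch of comparing the worst-case noise with the peak impedance for a single LC tank with $ESR_c$. Assumptions: the topology (an R1 + L supply path in parallel with an R2 + C decoupling branch, ideal VRM), the normalized component values, and the normalization $\gamma = V_{worst} / (I_{max} Z_{max})$ are chosen for illustration and may differ from the definitions used later in the dissertation. The worst-case droop uses the standard linear-system argument: with impulse response $h(t)$ and a load constrained to $0 \le i(t) \le I_{max}$, the droop is maximized by drawing $I_{max}$ exactly where $h$ is positive, which equals $I_{max}$ times the total upward variation of the step response.

```python
import math


def tank_impedance(w, r1, ind, r2, cap):
    """|Z(jw)| of (R1 + jwL) in parallel with (R2 + 1/(jwC)), for w > 0."""
    za = complex(r1, w * ind)           # supply path: resistance + inductance
    zb = complex(r2, -1.0 / (w * cap))  # decap branch: ESR_c + capacitance
    return abs(za * zb / (za + zb))


def peak_impedance(r1, ind, r2, cap, points=20000):
    """Z_max from a log sweep over 1e-4..1e4 times the LC resonance,
    together with the limits Z(0) = R1 and Z(inf) = R2."""
    w0 = 1.0 / math.sqrt(ind * cap)
    z_max = max(r1, r2)
    for k in range(points + 1):
        w = w0 * 10.0 ** (-4.0 + 8.0 * k / points)
        z_max = max(z_max, tank_impedance(w, r1, ind, r2, cap))
    return z_max


def step_response(r1, ind, r2, cap, t_end, dt):
    """Droop Vdd - V_load for a unit load-current step at t = 0 (RK4)."""
    def deriv(i_l, u):
        # i_l: inductor current, u: capacitor droop; the load current is 1.
        di = (u - r2 * (i_l - 1.0) - r1 * i_l) / ind
        du = (1.0 - i_l) / cap
        return di, du

    i_l, u, droop = 0.0, 0.0, []
    for _ in range(int(round(t_end / dt)) + 1):
        droop.append(u - r2 * (i_l - 1.0))
        k1 = deriv(i_l, u)
        k2 = deriv(i_l + 0.5 * dt * k1[0], u + 0.5 * dt * k1[1])
        k3 = deriv(i_l + 0.5 * dt * k2[0], u + 0.5 * dt * k2[1])
        k4 = deriv(i_l + dt * k3[0], u + dt * k3[1])
        i_l += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        u += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return droop


def worst_case_droop(step, i_max):
    """I_max times the total upward variation of the step response, starting
    from 0 before the step, i.e. I_max * integral of max(h(t), 0) dt."""
    rise, prev = 0.0, 0.0
    for s in step:
        rise += max(s - prev, 0.0)
        prev = s
    return i_max * rise


r1, ind, r2, cap = 0.1, 1.0, 0.1, 1.0  # hypothetical normalized values, R1 = R2
step = step_response(r1, ind, r2, cap, t_end=150.0, dt=0.01)
v_worst = worst_case_droop(step, i_max=1.0)
z_max = peak_impedance(r1, ind, r2, cap)
gamma = v_worst / (1.0 * z_max)
print(f"V_worst = {v_worst:.3f}, Z_max = {z_max:.3f}, gamma = {gamma:.3f}")
```

Sweeping the component values in such a sketch shows how $V_{worst}$ depends on the whole step response, not only on the single number $Z_{max}$, which is the point made above about the shape of the impedance profile.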

TD analysis provides a more realistic assessment of PDN noise, since the worst-case load current pattern may never occur in practice. For example, a typical CPU load current contains many fast transients, while a GPU current profile tends to have more low-frequency content because its rise and fall times are much longer. TD analysis is widely used in design verification, and research in this direction focuses on finding the worst-case noise through intensive simulation. The simulation-based verification approach requires knowing all possible current waveforms drawn by the circuits. This requirement for a complete set of current stimuli makes the approach intractable, especially for large designs. Moreover, PDN verification must be signed off at an early design stage, when full knowledge of load currents is rarely available due to PVT (process, voltage and temperature) variations. Ghani and Najm [26, 23] proposed a vectorless approach that obtains an upper bound on the worst-case noise without simulation, based on given load current constraints. Zhuang and Cheng [77] proposed a distributed framework for transient simulation of power distribution networks, which uses a matrix exponential kernel with Krylov subspace approximation to solve the differential equations of linear circuits.
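To make the vectorless idea concrete, the sketch below bounds the droop under the single constraint $0 \le i \le I_{max}$ and builds the current pattern that attains the bound. This is a deliberately simplified stand-in: the constraint sets in [26, 23] are considerably richer, and the impulse-response samples here are hypothetical.

```python
def worst_case_pattern(h, i_max):
    """Load current samples that maximize the droop at the final sample.

    h[k] is the sampled impulse response with the time step folded in, so the
    droop at sample n-1 is sum_k h[k] * i[n-1-k]. With 0 <= i <= i_max, draw
    i_max exactly where the matching h[k] is positive."""
    n = len(h)
    return [i_max if h[n - 1 - m] > 0 else 0.0 for m in range(n)]


def droop_at_end(h, i):
    """Discrete convolution of h and i evaluated at the last sample."""
    n = len(h)
    return sum(h[k] * i[n - 1 - k] for k in range(n))


def vectorless_bound(h, i_max):
    """Upper bound: i_max times the sum of the positive impulse-response samples."""
    return i_max * sum(max(x, 0.0) for x in h)


h = [1.0, -0.5, 0.25]  # hypothetical impulse-response samples
pattern = worst_case_pattern(h, i_max=2.0)
print(pattern, droop_at_end(h, pattern), vectorless_bound(h, i_max=2.0))
```

No simulation over candidate waveforms is needed: the bound follows from the impulse response and the constraint alone, and the constructed pattern shows the bound is tight for this constraint.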

1.3 Dissertation Outline

Chapter 2 introduces the background of system-level power distribution networks. The basic concepts of power delivery and the modeling and analysis of PDNs are briefly reviewed, and an overview of power line communication is given.

Chapter 3 analyzes the mathematical relation between the time-domain voltage response and the frequency-domain impedance of a PDN and discusses the closed-form expressions of the maximum ratio for the series RL/RC circuit and LC tank cases in PDN structures. A method is proposed to predict the worst-case noise of the complete PDN path through a cascaded LC tank model. The relation between on-die leakage resistance and PDN performance is also discussed.

Chapter 4 proposes a method to predict the worst-case noise area of the supply voltage on a PDN. Previous works focus on the worst-peak drop to sign off the PDN. In this chapter, we (1) compare the behavior of circuit delay under the worst-area and the worst-peak noise, (2) study different PDN models with theoretical derivations, and (3) develop an algorithm to generate the worst-case current for general PDN cases. Experimental results show that the worst-area noise induces on average 18% more delay than the worst-peak noise.

Chapter 5 demonstrates power line communication (PLC) on an industrial SOC PDN. We propose to reuse some of the power pins as dynamic power/signal pins for off-chip data transmission to increase the off-chip bandwidth during SOC low-performance states. The performance of the PLC model and its impact on the PDN are investigated, and the parasitic capacitance of the power gating switches is included in the model. We also study receiver channel equalization to improve channel performance.

Chapter 6 introduces Inter-Package Capacitive Proximity Communication, which boosts off-chip communication through metal plates on the side wall of the package. The proposed architecture can transmit 20Gbps data on each channel and provides immunity to coupling noise from adjacent channels without adding cost or reliability concerns. The trade-off between performance and design area is also discussed.

Chapter 7 concludes the dissertation by summarizing the main contributions. Future research directions are also discussed.
Chapter 2

Background on Power Distribution Networks

2.1 Power Distribution Network Basics

A power distribution network (PDN) delivers power to the circuits of a high-performance electronic system. The system supplying power to an IC can greatly affect the performance, size and cost of the overall electronic system. As shown in Figure 2.1, a PDN may consist of a voltage regulator module (VRM), the on-die load, board and package parasitics, and the on-die power grid with decoupling capacitors. A VRM can be a buck/boost converter or an LDO, depending on the tradeoff between the noise requirement of the load and the power efficiency of the system.

During the chipset design stage, architects need to take into account the power network parameters from the regulator and board through the package to the chip level. Lumped models are widely used in system-level PDN analysis [55, 62, 34, 67]. As shown in Figure 2.2, a typical PDN can be represented by multi-stage cascaded LC tanks. Since each RLC tank has one anti-resonance peak, multiple impedance peaks are observed in the impedance profile. The overall worst-case noise is a cumulative effect of these anti-resonance peaks [73]. Thus, a clear understanding of the behavior of a single LC tank is essential.

For simplicity, the VRM is represented as a DC source, which is equivalent to an AC short in the impedance profile for PDN analysis. For complex industrial designs, designers must also account for the static load/line regulation, dropout voltage, and power supply rejection ratio (PSRR) of the regulators in the PDN design margins. The power load is modeled as a time-varying current source \( i(t) \). The interconnect lines connecting the supply and the load are not ideal: they include DC resistance and loop inductance on the power and ground traces at the PCB, package, and die levels. Resistive IR voltage drops \( \Delta V_R = IR \) and inductive switching voltage drops \( \Delta V_L = L \frac{di(t)}{dt} \) develop across the parasitic interconnect impedances as the load sinks current \( i(t) \) from the PDN. Therefore, the voltage across the load terminals changes from \( V_{dd} \) at the source to \( V_{dd} - IR - L \frac{di(t)}{dt} \). Note that \( R = R_p + R_g \) and \( L = L_p + L_g \), where \( R_p, L_p \) and \( R_g, L_g \) are the resistance and inductance of the power and ground paths, respectively. To mitigate power supply noise, decoupling capacitors are added at different levels of the design to counteract the impedance increase caused by the parasitic inductances. Decoupling capacitors cannot mitigate $\Delta V_R$; however, remote feedback at the load is widely applied in industry for buck regulators to compensate for $\Delta V_R$. Figure 2.3 shows a typical GPU load current profile from one power pin [39].
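The load-voltage expression above can be evaluated directly. The sketch below is a minimal illustration; the function name `load_voltage` and all numbers (a 1.0 V rail, 1 mΩ loop resistance, 50 pH loop inductance, a 10 A load ramping at 1 A/ns) are hypothetical values chosen only to show the arithmetic.

```python
def load_voltage(vdd, r, l, i, didt):
    """Voltage at the load terminals: Vdd - I*R - L*di/dt.

    r and l are the total loop values R = R_p + R_g and L = L_p + L_g.
    """
    delta_v_r = i * r        # resistive (IR) drop
    delta_v_l = l * didt     # inductive (L di/dt) drop
    return vdd - delta_v_r - delta_v_l


# Hypothetical operating point: 1.0 V, 1 mOhm, 50 pH, 10 A ramping at 1 A/ns.
v = load_voltage(vdd=1.0, r=1e-3, l=50e-12, i=10.0, didt=1e9)
print(f"load voltage = {v:.3f} V")   # 1.0 - 0.010 - 0.050
```

In this example the inductive term dominates the IR term, which is why decoupling capacitors target the \( L \frac{di(t)}{dt} \) component.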

Figure 2.4 shows a PDN with a two-stage RLC tank. Following typical PDN parameter distributions, we assume that $L_1 \gg L_2$ and $C_1 \gg C_2$, since $L_{brd} \gg L_{pkg} \gg L_{die}$ and $C_{brd} \gg C_{pkg} \gg C_{die}$ for typical PDNs. In Eq. 2.1, we define $\omega_a$ and $\omega_b$ as the two resonant frequencies, and $Q_a$ and $Q_b$ as the quality factors of the low-frequency and high-frequency tanks, respectively. The contribution of each circuit component to the impedance profile is labeled in Figure 2.5.

\[
\begin{aligned}
\omega_a &= \frac{1}{\sqrt{L_1C_1}} \ll \omega_b = \frac{1}{\sqrt{L_2C_2}}, \\
Q_a &= \frac{1}{R_1+R_3}\sqrt{\frac{L_1}{C_1}}, \\
Q_b &= \frac{1}{R_2+R_3+R_4}\sqrt{\frac{L_2}{C_2}}.
\end{aligned}
\tag{2.1}
\]
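As a quick numerical illustration of Eq. 2.1, the sketch below evaluates both resonant frequencies and quality factors. The component values are hypothetical (chosen only so that $L_1 \gg L_2$ and $C_1 \gg C_2$), and $R_1$ to $R_4$ are simply plugged in as labeled in Figure 2.4.

```python
import math


def two_stage_tank(l1, c1, l2, c2, r1, r2, r3, r4):
    """Resonant frequencies (rad/s) and quality factors from Eq. 2.1."""
    w_a = 1.0 / math.sqrt(l1 * c1)
    w_b = 1.0 / math.sqrt(l2 * c2)
    q_a = math.sqrt(l1 / c1) / (r1 + r3)
    q_b = math.sqrt(l2 / c2) / (r2 + r3 + r4)
    return w_a, w_b, q_a, q_b


# Hypothetical board/package-like values with L1 >> L2 and C1 >> C2.
w_a, w_b, q_a, q_b = two_stage_tank(l1=1e-9, c1=100e-6, l2=10e-12, c2=100e-9,
                                    r1=1e-3, r2=0.5e-3, r3=0.2e-3, r4=0.3e-3)
print(f"f_a = {w_a / (2 * math.pi) / 1e3:.0f} kHz, Q_a = {q_a:.2f}")
print(f"f_b = {w_b / (2 * math.pi) / 1e6:.0f} MHz, Q_b = {q_b:.2f}")
```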
Figure 2.3: A CPU load current profile measured on various power pins is shown in (a). The current spectrum for VDD pin 60 is shown in (b). [39]
Figure 2.4: A PDN with two-stage RLC tanks.

Figure 2.5: The impedance profile of a PDN with two-stage RLC tank.
2.2 Power Distribution Network Noise

Power supply noise, caused by static and dynamic switching current, adversely affects the operation of an integrated circuit through several mechanisms. First, the propagation delay of on-chip signals depends on the power supply voltage, as $i_{ds}$ increases with $V_{gs}$. When the power supply voltage drops due to supply variations, $V_{gs}$ of the NMOS and PMOS transistors decreases, lowering their output current ($i_{ds}$). The signal delay therefore increases compared with the delay under the nominal supply voltage. Conversely, a higher supply voltage and a lower ground voltage shorten the propagation delay. Consequently, power supply noise limits the maximum operating frequency of an integrated circuit; we discuss this topic further in Chapter 4. Second, clock jitter increases as power supply noise increases. Power supply noise causes two types of clock jitter, i.e., cycle-to-cycle jitter and peak-to-peak jitter, which have been studied extensively [66].

2.3 Power Distribution Network Applications

The Application Processor (AP or SOC) of a typical consumer electronic device allocates half of its BGA balls and PCB planes to power delivery. As a result, off-chip communication bandwidth is limited by the number of pins and layers on which signals can be routed. Furthermore, high-speed signaling requires solid reference planes for controlled impedance: one reference plane for a microstrip and two for a stripline, which leaves fewer layers available for signal traces. At the same time, the growing use of memory-intensive applications such as web services, databases, machine/deep learning (ML/DL), and camera applications has pushed computer architects to focus on ML/DL-specific ASIC designs. As shown in Figure 2.6, the "Memory Wall", i.e., the disparity between the rate of core performance improvement and the relatively stagnant rate of off-chip memory bandwidth, keeps widening as more transistors fit onto a single chip with each advance in process node (Figure 2.7). The intuitive solution is to provide more chip pins and routing channels for off-chip data communication. However, Figure 2.8 shows that the package size of SOCs remains similar as more functions are added to the silicon die, and PCB manufacturing technology has improved only moderately; e.g., the BGA ball-to-ball pitch in industry was reduced from 0.4mm in 2012 to 0.3mm in 2016. We therefore propose using the PDN for data communication during the SOC low-performance state in Chapter 5, and capacitive communication in Chapter 6.
Figure 2.7: The projection for the trend of the silicon process technology advancement. (Courtesy of ITRS)
Figure 2.8: The projection for package technology advancement. (Courtesy of Steve Bezuk)
Chapter 3

Ratio of the Worst-Case Noise and the Impedance of Power Distribution Network

The classic method of designing power distribution networks (PDNs) is to control the target impedance across a broad frequency range. This methodology assumes that the ratio of the time-domain maximum output voltage noise to the product of the target impedance and the time-domain maximum input current has an upper bound of 1. In this chapter, we analyze the mathematical relation between the time-domain voltage response and the frequency-domain impedance of the PDN. We present closed-form expressions of the maximum ratio for the series RL/RC circuit and LC tank cases in PDN structures, and observe that the maximum ratio for the LC tank case is 1.5. Our results show that the worst-case noise is determined not only by the target impedance but also by the shape of the output impedance profile. We demonstrate a complete PDN path whose worst-case ratio is greater than 1. We further propose a method to predict the worst-case noise of the complete PDN path; its average prediction error is 7% across different PDN cases.

3.1 Background

In PDN design, the design objective is specified as the time-domain supply noise amplitude, typically 5% of the nominal voltage for digital systems. One of the most widely adopted PDN design methodologies follows the concept of target impedance: the output impedance of the PDN should be no larger than this target impedance over the whole operating frequency range [37, 54, 62, 63, 56]. The target impedance in the frequency domain is expressed in terms of the current and the voltage tolerance in the time domain as follows [37]:

$$Z_{\text{target}}(\omega) = \frac{\text{(power supply voltage)} \times \text{(allowed ripple)}}{\text{current}}, \tag{3.1}$$

where current is the average current flowing through the PDN. Let $V_{\text{max}}$, $Z_{\text{max}}$, and $I_{\text{max}}$ denote the maximum magnitude of the worst-case PDN voltage noise $v(t)$, the maximum magnitude of the PDN output impedance $Z(\omega)$, and the maximum magnitude of the time-domain input current $i(t)$, respectively, i.e.,

$$V_{\text{max}} = \max_t |v(t)|, \tag{3.2}$$

$$Z_{\text{max}} = \max_\omega |Z(\omega)|, \tag{3.3}$$

$$I_{\text{max}} = \max_t |i(t)|. \tag{3.4}$$

The assumption behind Eq. 3.1 is that $V_{\text{max}}$ does not exceed the product of $Z_{\text{max}}$ and $I_{\text{max}}$, i.e., the ratio

$$\gamma = \frac{\text{(power supply voltage)} \times (\text{allowed ripple})}{Z_{\text{target}}(\omega) \times (\text{current})} = \frac{V_{\text{max}}}{Z_{\text{max}} \times I_{\text{max}}} \tag{3.5}$$

is no more than 1.

Eq. 3.5 is based on Ohm's law. However, since $V_{\text{max}}$ and $I_{\text{max}}$ are maxima over time while $Z_{\text{max}}$ is a maximum over frequency, this assumption does not necessarily hold, and the ratio $\gamma$ may be larger than 1. Thus, the frequency-domain design approach may lead to a PDN with larger power supply noise than expected. For example, if $\gamma = 1.5$ and the allowed supply voltage ripple is 5%, the actual maximum noise of the designed PDN is 7.5%, i.e., 50% more ripple than expected.
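The arithmetic of the example above can be written out as a short sketch. It assumes that the numerator of Eq. 3.1 is the nominal supply voltage times the allowed ripple fraction; the 1.0 V rail and 10 A current are hypothetical numbers, and the helper names are ours.

```python
def target_impedance(vdd, ripple, current):
    """Target impedance in ohms: (supply voltage x allowed ripple) / current."""
    return vdd * ripple / current


def worst_case_ripple(gamma, ripple):
    """Worst-case ripple fraction if V_max = gamma * Z_max * I_max and the PDN
    just meets Z_max = Z_target."""
    return gamma * ripple


vdd, ripple, current = 1.0, 0.05, 10.0   # hypothetical 1.0 V rail, 5 % ripple, 10 A
z_t = target_impedance(vdd, ripple, current)
print(f"Z_target = {z_t * 1e3:.1f} mOhm")
for gamma in (1.0, 1.5):
    print(f"gamma = {gamma}: worst-case ripple = {worst_case_ripple(gamma, ripple):.1%}")
```

Because $\gamma$ scales the ripple linearly, any $\gamma > 1$ translates directly into a proportional violation of the ripple budget.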

Several works have studied the time-domain and frequency-domain responses of PDNs [37, 34, 36, 19, 26, 60, 24]. Kim et al. [37] proposed a design methodology for optimized power distribution networks based on frequency-domain PDN resonance information. Their method applies to the high-Q (quality factor) LC tank model and does not consider the equivalent series resistance (ESR). Kim et al. [34] gave a closed-form expression for the supply noise caused by IC switching current for a PDN structure. Drabkin et al. [19] presented a method of generating the worst-case PDN voltage noise based on the superposition of step responses. Ghani and Najm [26] proposed a vectorless approach that obtains an upper bound of the worst-case noise without any simulation, based on given load current constraints. Sun and Smith [60] proposed a method to systematically characterize on-chip PDN noise and generate a worst-case current pattern. However, none of these works provides a quantitative analysis of the relation between the worst-case peak PDN voltage noise and the peak value of its impedance magnitude.

In this chapter, we propose a method to analyze the ratio $\gamma$ of the maximum time-domain voltage noise to the peak amplitude of the frequency-domain impedance profile. We give the exact upper bound of the ratio in LC tank cases instead of the approximations given by [37, 34, 60]. We prove that for a standard LC tank in a PDN structure, $\gamma$ is no more than 1.5.

### 3.2 Problem Formulation

In this section, we formulate the problem as maximizing the ratio $\gamma$ in a general PDN system. The ratio $\gamma$ compares the worst-case peak voltage noise $V_{max}$ in the time domain with the peak impedance $Z_{max}$ in the frequency domain. Without loss of generality, the upper bound of the load current, $I_{max}$, is set to 1 throughout this chapter, so that $\gamma = V_{max}/Z_{max}$. Therefore, the problem can be formulated as

$$\max \quad \gamma = \frac{V_{max}}{Z_{max}}, \tag{3.6}$$

$$\text{s.t.} \quad 0 \leq i(t) \leq 1, \ \forall t \geq 0. \tag{3.7}$$

In the following sections, we analyze the output impedance $Z(s)$ of the system in the s-domain, since for a causal, stable system the Fourier transform equals the Laplace transform evaluated at $s = j\omega$, i.e., $Z(\omega) = Z(s)|_{s = j\omega}$. Based on the locations of its poles and zeros, $Z(s)$ falls into two categories: $Z(s)$ without passive realizability constraints and $Z(s)$ with passive realizability constraints. Unless an active voltage regulator module is included, a PDN can usually be modeled as a passive RLC network [34]. We focus on $Z(s)$ with passive realizability in this chapter.
3.2.1 Worst-Case PDN Voltage Noise

The first step in finding the maximum ratio $\gamma$ is to generate the worst-case PDN voltage noise $V_{\text{max}}$. One method to find $V_{\text{max}}$ is based on convolution with the impulse response. The PDN system $Z(s)$ is characterized by its impulse response $h(t)$ in the time domain, and the load current $i(t)$ is caused by circuit activity. Therefore, the voltage noise $v(t)$ is the convolution of $h(t)$ and $i(t)$, i.e.,

$$v(t) = \int_{0}^{\infty} h(\tau)i(t - \tau) d\tau. \tag{3.8}$$

Since $i(t)$ is bounded by Eq. 3.7, the maximum voltage noise, $\max_{t} |v(t)|$, is generated by setting $i(t - \tau) = 1$ when $h(\tau) \geq 0$ and $i(t - \tau) = 0$ when $h(\tau) < 0$. If the observation time $t = T$ is long enough, i.e., $h(t) \approx 0$ for $t > T$, the worst-case noise is

$$V_{\text{max}} = v(T) = \int_{0}^{T} \max\{h(\tau), 0\}\, d\tau. \tag{3.9}$$
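A minimal sketch of the convolution-based worst case in Eqs. 3.8-3.9, assuming the impulse response is available as uniform samples. For illustration it uses the closed-form impulse response of the three-element tank of Eq. 3.58 in its underdamped regime, $h(t) = \frac{1}{C}e^{-\alpha t}\left[\cos\omega_d t + \frac{\alpha}{\omega_d}\sin\omega_d t\right]$ with $\alpha = R/(2L)$ and $\omega_d = \sqrt{1/(LC) - \alpha^2}$; this expression, the helper names, and the normalized component values are our own assumptions, not the author's Matlab implementation.

```python
import math


def three_element_impulse(r, l, c, t):
    """h(t) of Z(s) = (sL + R)/(s^2 LC + sRC + 1), valid for the underdamped case Q > 0.5."""
    alpha = r / (2.0 * l)
    w_d = math.sqrt(1.0 / (l * c) - alpha ** 2)
    return math.exp(-alpha * t) * (math.cos(w_d * t) + alpha / w_d * math.sin(w_d * t)) / c


def worst_case_by_convolution(h, dt):
    """Eqs. 3.8-3.9: choose i(T - tau) = 1 where h(tau) >= 0 and 0 otherwise.

    h holds h(tau) sampled at tau = 0, dt, 2*dt, ..., T.
    Returns V_max and the pattern, where pattern[k] is the load current
    at time T - k*dt (the pattern is stored in reversed time).
    """
    pattern = [1.0 if hk >= 0.0 else 0.0 for hk in h]
    # Trapezoidal rule for the convolution integral evaluated at t = T.
    v_max = sum(0.5 * (h[k] * pattern[k] + h[k + 1] * pattern[k + 1]) * dt
                for k in range(len(h) - 1))
    return v_max, pattern


# Hypothetical normalized values: L = C = 1, R = 1/0.687, so Q = sqrt(L/C)/R = 0.687.
R, L, C = 1.0 / 0.687, 1.0, 1.0
dt, T = 1e-3, 40.0                      # h(t) has decayed to about 1e-13 by t = T
h = [three_element_impulse(R, L, C, k * dt) for k in range(int(T / dt) + 1)]
z_dc = sum(0.5 * (h[k] + h[k + 1]) * dt for k in range(len(h) - 1))  # integral of h = Z(0) = R
v_max, pattern = worst_case_by_convolution(h, dt)
print(f"Z(0) = {z_dc:.4f} ohm, V_max = {v_max:.4f} V for a 0-1 A load")
```

The integral of $h(t)$ recovers the DC resistance $Z(0) = R$, while $V_{\text{max}}$ exceeds it because the negative lobes of $h(t)$ are skipped by the 0/1 current pattern.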

Drabkin et al. proposed another method of creating the worst-case PDN voltage noise in [19]. This method is based on the superposition of step responses and is equivalent to the impulse-response-based generation method discussed above. Let the unit step response of the PDN be $v_{u}(t)$. The idea is to align all the local maxima $V_{Mi}$ and local minima $V_{mi}$ of the step response at the same observation time. The resulting input pattern is a superposition of many time-shifted rising and falling step inputs. The input value "1" covers the rising periods of the step response, and the value "0" covers its falling periods. It can be proved that the method proposed in [19] generates the worst-case output voltage noise.
Lemma 1  Given a linear PDN with step response of $v_u(t)$ and the input current is bounded, i.e., $0 \leq i(t) \leq 1$, the worst-case PDN voltage noise can be generated by the superposition of step responses,

$$V_{\text{max}} = \sum_{i=1}^{N} (V_{M_i} - V_{m_i}) + V(\infty), \tag{3.10}$$

where $V_{M1}, V_{M2}, \ldots, V_{MN}$ denote the local maximums of $v_u(t)$; $V_{m1}, V_{m2}, \ldots$, and $V_{mN}$ denote the local minimums of $v_u(t)$; and $V(\infty)$ denotes the stabilized IR drop when $i(t) = 1$.

Proof 1  Lemma 1 follows from two observations: the input current $i(t)$ is bounded, and the impulse response is the derivative of the step response. The local maxima and minima of the step response occur at the zero crossings of the impulse response, which separate its positive and negative areas.

Let the impulse response of the system satisfy $h(t) \geq 0$ for $t \in [0, t_1] \cup [t_2, t_3] \cup \ldots \cup [t_{2n}, t_{2n+1}] \cup \ldots$ and $h(t) \leq 0$ for $t \in [t_1, t_2] \cup [t_3, t_4] \cup \ldots \cup [t_{2n-1}, t_{2n}] \cup \ldots$

Since the step response $v_u(t)$ is the time integral of the impulse response, i.e.,

$$v_u(t) = \int_{0}^{t} h(\tau) d\tau, \tag{3.11}$$

the local maximums/minimums of the step response can be expressed as follows,

$$
\begin{aligned}
V_{M_i} &= \int_{0}^{t_{2i-1}} h(\tau) d\tau, \\
V_{m_i} &= \int_{0}^{t_{2i}} h(\tau) d\tau,
\end{aligned}
\tag{3.12}
$$

where $i = 1, 2, \ldots, n$. 

From Eq. 3.8 and 3.9, the worst-case noise can be found as,

\[
\begin{aligned}
V_{\text{max}} &= \int_{0}^{t_1} h(\tau) \times 1 \, d\tau + \int_{t_2}^{t_3} h(\tau) \times 1 \, d\tau + \int_{t_4}^{t_5} h(\tau) \times 1 \, d\tau + \cdots \\
&= \int_{0}^{t_1} h(\tau) d\tau - \int_{0}^{t_2} h(\tau) d\tau + \int_{0}^{t_3} h(\tau) d\tau - \int_{0}^{t_4} h(\tau) d\tau + \int_{0}^{t_5} h(\tau) d\tau - \cdots \\
&= V_{M1} - V_{m1} + V_{M2} - V_{m2} + V_{M3} - \cdots \\
&= \sum_{i=1}^{N} (V_{Mi} - V_{mi}) + V(\infty).
\end{aligned}
\tag{3.13}
\]

When \( N \) is sufficiently large, we have \( V(\infty) \approx V_{MN} \approx V_{mN} \). Thus, Lemma 1 is proved.
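Lemma 1 can be checked numerically: summing the rises between consecutive extrema of the sampled step response should match the convolution result of Eq. 3.9. The sketch below is our own illustration; it reuses, as an assumption, the closed-form underdamped impulse response of the three-element tank of Eq. 3.58 with hypothetical normalized values, and the helper names are hypothetical.

```python
import math


def three_element_step(r, l, c, dt, n):
    """Sampled h(t) and v_u(t) (Eq. 3.11) for Z(s) = (sL + R)/(s^2 LC + sRC + 1), Q > 0.5."""
    alpha = r / (2.0 * l)
    w_d = math.sqrt(1.0 / (l * c) - alpha ** 2)
    h = [math.exp(-alpha * k * dt) * (math.cos(w_d * k * dt)
         + alpha / w_d * math.sin(w_d * k * dt)) / c for k in range(n)]
    v = [0.0]
    for k in range(n - 1):                  # cumulative trapezoidal integral of h
        v.append(v[-1] + 0.5 * (h[k] + h[k + 1]) * dt)
    return h, v


def lemma1_worst_case(v):
    """Eq. 3.10: V_M1 - V_m1 + V_M2 - V_m2 + ... + V(inf), from the local extrema of v_u."""
    extrema = [v[k] for k in range(1, len(v) - 1)
               if v[k - 1] < v[k] >= v[k + 1] or v[k - 1] > v[k] <= v[k + 1]]
    # Start at v_u(0) = 0 and end at the settled value V(inf) ~ v[-1]; every rise
    # (valley -> next peak) adds to the worst case, every fall is skipped.
    points = [0.0] + extrema + [v[-1]]
    return sum(max(b - a, 0.0) for a, b in zip(points, points[1:]))


R, L, C = 1.0 / 0.687, 1.0, 1.0            # hypothetical normalized values, Q = 0.687
dt = 1e-3
h, v = three_element_step(R, L, C, dt, n=40001)
v_lemma = lemma1_worst_case(v)
v_conv = sum(0.5 * (max(h[k], 0.0) + max(h[k + 1], 0.0)) * dt for k in range(len(h) - 1))
print(f"Lemma 1: {v_lemma:.4f} V, impulse-response convolution: {v_conv:.4f} V")
```

The two numbers agree up to discretization error, which is the content of Eq. 3.13: each positive lobe of $h(t)$ is exactly one rise of $v_u(t)$.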

### 3.2.2 Peak Output Impedance

The peak output impedance \( Z_{\text{max}} \) of the PDN is calculated by setting the derivative of \( |Z(\omega)| \) to zero and checking the sign of the second derivative. If multiple anti-resonance peaks \( Z_{\text{peak1}}, Z_{\text{peak2}}, \ldots, Z_{\text{peakn}} \) exist in the impedance profile, then

\[ Z_{\text{max}} = \max(Z_{\text{peak1}}, Z_{\text{peak2}}, \ldots, Z_{\text{peakn}}). \tag{3.14} \]

Plugging Eq. 3.10 and 3.14 into Eq. 3.6, we can find the ratio \( \gamma \) for a given PDN.

### 3.3 Maximum Ratio \( \gamma \) in Series RL/RC Circuits and Standard LC Tanks

In this section, we discuss the maximum ratio \( \gamma \) of two basic PDN models. A transfer function is realizable as the impedance of a passive network if and only if it is a rational positive real function of $s$. A function $Z(s)$ is positive real (p.r.) if the following conditions are satisfied [44]:

- $Z(s)$ is real for real $s$ and is a ratio of polynomials in $s$.
- $Re[Z(s)] \geq 0$ for all $s$ with $Re(s) > 0$.
- All the poles and zeros of $Z(s)$ lie in the closed left half plane, with any poles on the imaginary axis being simple and having positive real residues.

This section addresses two PDN models that are critical components of PDN designs: the series RL/RC circuit and the standard LC tank.

### 3.3.1 Series RL/RC Circuit

A series RL/RC circuit can be modeled as a first-order impedance function. In this subsection, we show the upper bound of $\gamma$ for the first-order impedance function.

**Theorem 1** For a first-order system function $Z(s)$ of a passive network, $\gamma$ is always 1.

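**Proof 2** A first-order impedance function has one of the following two forms (the second up to a positive scale factor):

$$Z(s) = \frac{k}{s - p}, \tag{3.15}$$

or

$$Z(s) = \frac{s - z}{s - p}, \tag{3.16}$$

where $k$ is a constant, and $z$ and $p$ are the zero and the pole of the system, respectively.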

To satisfy the realizability constraints, $k \geq 0$, $z \leq 0$ and $p \leq 0$.

(a) For $Z(s)$ expressed by Eq. 3.15, the magnitude of $Z(s)$ as a function of
\( \omega \) can be written as

\[
|Z(\omega)| = k \sqrt{\frac{1}{\omega^2 + p^2}}. \tag{3.17}
\]

Its step response is represented as

\[
v_u(t) = -\frac{k}{p} (1 - e^{pt})u(t). \tag{3.18}
\]

From Eq. 3.17 and 3.18, \( v_u(t) \) increases monotonically with \( t \), and \( |Z(\omega)| \) decreases monotonically with \( \omega \). Therefore,

\[
V_{max} = Z_{max} = -\frac{k}{p}. \tag{3.19}
\]

(b) For \( Z(s) \) from Eq. 3.16, the magnitude of \( Z(s) \) with frequency can be expressed as follows,

\[
|Z(\omega)| = \sqrt{1 + \frac{z^2 - p^2}{\omega^2 + p^2}}. \tag{3.20}
\]

and its step response is

\[
v_u(t) = \left[ \frac{z}{p} + (1 - \frac{z}{p})e^{pt} \right]u(t), \tag{3.21}
\]

where \( u(t) \) is the unit step function. Similarly, \( |Z(\omega)| \) varies monotonically from \( \frac{z}{p} \) at \( \omega = 0 \) to 1 as \( \omega \to \infty \), and \( v_u(t) \) varies monotonically from 1 at \( t = 0^+ \) to \( \frac{z}{p} \) as \( t \to \infty \). Hence, \( V_{max} = Z_{max} = \max(1, \frac{z}{p}) \).

In summary, \( V_{max} = Z_{max} \) for both \( Z(s) \) cases. Thus, the ratio \( \gamma \) of the first-order impedance function is

\[
\gamma = 1. \tag{3.22}
\]
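Theorem 1 can be sanity-checked numerically for the form in Eq. 3.16. The sketch below is a hypothetical helper of ours: it samples the step response of Eq. 3.21, applies the worst-case rule of Lemma 1 (the jump at $t = 0^+$ plus every later rise), and sweeps $|Z(j\omega)|$ on an assumed frequency grid. Both monotonic cases, $z/p < 1$ and $z/p > 1$, should give $\gamma \approx 1$.

```python
import math


def gamma_first_order(z, p, dt=1e-3, t_end=30.0, n_freq=20001):
    """Numerically evaluate gamma = V_max / Z_max for Z(s) = (s - z)/(s - p), with z, p < 0."""
    # Sampled step response, Eq. 3.21 (index 0 corresponds to t = 0+).
    n = int(t_end / dt) + 1
    v = [z / p + (1.0 - z / p) * math.exp(p * k * dt) for k in range(n)]
    # Worst case (Lemma 1): the jump from 0 to v_u(0+) plus every later rise of v_u(t).
    v_max = v[0] + sum(max(b - a, 0.0) for a, b in zip(v, v[1:]))
    # Peak impedance: |Z(j w)| at w = 0 and on a log grid from 1e-4 to 1e4 rad/s.
    omegas = [0.0] + [10.0 ** (-4.0 + 8.0 * k / (n_freq - 1)) for k in range(n_freq)]
    z_max = max(abs((1j * w - z) / (1j * w - p)) for w in omegas)
    return v_max / z_max


print(gamma_first_order(z=-1.0, p=-2.0))   # z/p < 1: v_u(t) falls from 1 to 0.5
print(gamma_first_order(z=-3.0, p=-1.0))   # z/p > 1: v_u(t) rises from 1 to 3
```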
3.3.2 Standard LC Tank with $ESR_c$

In this subsection, we analyze the maximum ratio $\gamma$ for standard LC tanks with $ESR_c$ in realistic PDN structures, and extend the study to two special LC tank cases. Fig. 3.1 shows a standard LC tank with $ESR_c$. $R_1$ and $L$ model the parasitic resistance and inductance of the PDN interconnects, and $C$ models the decoupling capacitors. A resistor $R_2$ is placed in series with $C$ to account for $ESR_c$. The output impedance of the LC tank with $ESR_c$ can be written as

$$Z(s) = \frac{s^2LCR_2 + s(R_1R_2C + L) + R_1}{s^2LC + s(R_1 + R_2)C + 1}. \tag{3.23}$$

The quality factor $Q$ is expressed as

$$Q = \frac{1}{R_1 + R_2} \sqrt{\frac{L}{C}}. \tag{3.24}$$

The natural frequency is defined as

$$\omega_0 = \frac{1}{\sqrt{LC}}. \tag{3.25}$$
$Z(s)$ can be rewritten to

$$Z(s) = \frac{R_2(s + \frac{\omega_0}{Q_1})(s + \omega_0Q_2)}{s^2 + \frac{\omega_0}{Q}s + \omega_0^2}, \tag{3.26}$$

where $Q_1 = \frac{1}{R_1} \sqrt{\frac{L}{C}}$, $Q_2 = \frac{1}{R_2} \sqrt{\frac{L}{C}}$. The maximum ratio $\gamma$ for the standard LC tank case is given in Theorem 2.
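The parameters of Eqs. 3.24 to 3.26 are easy to compute from the component values. A small sketch, with a hypothetical helper name and hypothetical component values; it also checks the identity $1/Q = 1/Q_1 + 1/Q_2$, which follows directly from the definitions because $1/Q = (R_1 + R_2)/\sqrt{L/C}$.

```python
import math


def tank_parameters(r1, r2, l, c):
    """omega_0, Q, Q_1, Q_2 of the standard LC tank (Eqs. 3.24-3.26)."""
    z0 = math.sqrt(l / c)                    # characteristic impedance sqrt(L/C)
    w0 = 1.0 / math.sqrt(l * c)
    q = z0 / (r1 + r2)
    q1 = z0 / r1
    q2 = z0 / r2 if r2 > 0 else math.inf     # R2 = 0 gives Q2 -> infinity
    return w0, q, q1, q2


w0, q, q1, q2 = tank_parameters(r1=2e-3, r2=1e-3, l=1e-9, c=1e-6)   # hypothetical values
print(f"f0 = {w0 / (2 * math.pi) / 1e6:.2f} MHz, Q = {q:.2f}, Q1 = {q1:.2f}, Q2 = {q2:.2f}")
print(abs(1 / q - (1 / q1 + 1 / q2)) < 1e-9)   # 1/Q = 1/Q1 + 1/Q2
```

The identity also explains why $Q < Q_1$ and $Q < Q_2$ always hold.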

**Theorem 2** For the standard LC tank as shown in Fig. 3.1, the maximum ratio $\gamma$ is 1.5.

Theorem 2 follows from Lemma 2 and Lemma 3, which distinguish two cases by the sign of the discriminant $\Delta$ of the denominator of $Z(s)$ (or, equivalently, by the value of the quality factor $Q$).

**Lemma 2** For the overdamped or critically damped LC tank ($Q \leq 0.5$) as shown in Fig. 3.1, the maximum ratio $\gamma$ is 1.5.

**Lemma 3** For the underdamped LC tank ($Q > 0.5$) as shown in Fig. 3.1, the maximum ratio $\gamma$ is 1.05.

**Proof 3** According to the discriminant $\Delta$ of the denominator of $Z(s)$, we divide the problem into two cases, i.e. $\Delta \geq 0$ or $\Delta < 0$.

i) When $\Delta \geq 0$ ($Q \leq 0.5$), all zeros and poles are real numbers and the LC tank is overdamped or critically-damped. From the relative locations of zeros and poles in the left half plane, we can conclude that

$$Z_{max} = \max(R_1, R_2). \tag{3.27}$$
To calculate $V_{\text{max}}$, we first derive the step response $v_u(t)$ from the inverse Laplace transform of $\frac{Z(s)}{s}$. Using partial fraction expansion, we find $v_u(t)$,

$$v_u(t) = k_1 + k_2 e^{p_1 t} + k_3 e^{p_2 t}, \tag{3.28}$$

where

$$\begin{cases}
  k_1 = R_1, \\
  k_2 = R_2 \frac{-\frac{1}{2} + \frac{1}{2} \sqrt{1 - 4Q^2} + \frac{Q}{Q_1} + QQ_2 + \frac{R_1}{R_2} (\frac{1}{2} - \frac{1}{2} \sqrt{1 - 4Q^2})}{\sqrt{1 - 4Q^2}}, \\
  k_3 = R_2 \frac{-\frac{1}{2} - \frac{1}{2} \sqrt{1 - 4Q^2} + \frac{Q}{Q_1} + QQ_2 + \frac{R_1}{R_2} (\frac{1}{2} + \frac{1}{2} \sqrt{1 - 4Q^2})}{-\sqrt{1 - 4Q^2}},
\end{cases} \tag{3.29}$$

and

$$\begin{cases}
  p_1 + p_2 = -\frac{\omega_0}{Q}, \\
  p_1 p_2 = \omega_0^2.
\end{cases} \tag{3.30}$$

From the relative locations of zeros and poles, there is a local minimum for the step response $v_u(t)$. The local minimum monotonically decreases as $Q$ decreases. When $Q \to 0$, we observe the smallest local minimum. Since we focus on the maximum ratio $\gamma_{\text{max}}$, we simplify Eq. 3.29 by setting $Q \to 0$,

$$\begin{cases}
  k_1 = R_1, \\
  k_2 = -\frac{R_1^2}{R_1 + R_2}, \\
  k_3 = \frac{R_2^2}{R_1 + R_2}.
\end{cases} \tag{3.31}$$

We define $t_0$ as the time when local minimum $v_{\text{min}}$ of $v_u(t)$ occurs, which
can be solved by setting the derivative of \( v_u(t) \) to be zero. Thus,

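\[
t_0 = \frac{1}{p_1 - p_2} \ln \left[ \frac{(p_2 - z_1)(p_2 - z_2)}{(p_1 - z_1)(p_1 - z_2)} \right], \tag{3.32}
\]

where \( z_1 = -\frac{\omega_0}{Q_1} \) and \( z_2 = -\omega_0 Q_2 \) are the zeros of \( Z(s) \) in Eq. 3.26.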

When \( Q \to 0 \), we substitute Eq. 3.31 and 3.32 into Eq. 3.28 and have

\[
v_{\text{min},Q \to 0} = \frac{R_1 R_2}{R_1 + R_2}, \tag{3.33}
\]

and

\[
v_{\text{min},Q \to 0} < v_{\text{min},Q \neq 0}. \tag{3.34}
\]

From Eq. 3.28-3.34, we notice that the worst-case noise \( V_{\text{max}} \) occurs at \( Q \to 0 \),

\[
V_{\text{max},Q \to 0} = v_u(0) - v_{\text{min},Q \to 0} + v_u(\infty) \geq v_u(0) - v_{\text{min},Q \neq 0} + v_u(\infty) = V_{\text{max},Q \neq 0}, \tag{3.35}
\]

where \( v_u(0) = R_2 \) and \( v_u(\infty) = R_1 \). Thus, we have

\[
V_{\text{max}} = R_1 + R_2 - \frac{R_1 R_2}{R_1 + R_2}. \tag{3.36}
\]

Combining Eq. 3.27 and 3.36, the ratio \( \gamma \) is

\[
\gamma = \frac{V_{\text{max}}}{Z_{\text{max}}} = \frac{R_1 + R_2 - \frac{R_1 R_2}{R_1 + R_2}}{\max(R_1, R_2)}. \tag{3.37}
\]

Since Eq. 3.37 is symmetric in \( R_1 \) and \( R_2 \), we assume without loss of generality that \( R_1 \geq R_2 \). Thus, Eq. 3.37 can be expressed as

\[
\gamma = 1 + \frac{1}{\psi + \psi^2}, \quad \text{where } \psi = \frac{R_1}{R_2} \geq 1. \tag{3.38}
\]

Therefore, we have the maximum ratio

\[ \gamma_{\text{max}} = 1.5, \text{ when } R_1 = R_2 \text{ and } Q \to 0. \tag{3.39} \]
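The overdamped limit can be tabulated directly from Eqs. 3.27, 3.36, and 3.37. The sketch below (helper name hypothetical) evaluates $\gamma$ for several resistance ratios and compares it with the closed form of Eq. 3.38, showing that the ratio peaks at $\psi = 1$.

```python
def gamma_overdamped_limit(r1, r2):
    """Eq. 3.37: gamma in the limit Q -> 0 for resistances R1, R2 > 0."""
    v_max = r1 + r2 - r1 * r2 / (r1 + r2)    # Eq. 3.36
    z_max = max(r1, r2)                      # Eq. 3.27
    return v_max / z_max


for psi in (1.0, 1.5, 2.0, 5.0, 10.0):
    g = gamma_overdamped_limit(psi, 1.0)     # R1 = psi * R2 with R2 = 1
    print(f"psi = {psi:4.1f}: gamma = {g:.4f}, "
          f"1 + 1/(psi + psi^2) = {1 + 1 / (psi + psi ** 2):.4f}")
```

Since $1/(\psi + \psi^2)$ decreases for $\psi \geq 1$, the maximum of 1.5 at $\psi = 1$ follows immediately.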

ii) When \( \Delta < 0 \) (\( Q > 0.5 \)), the poles are complex numbers and the LC tank is underdamped. \( v_u(t) \) of an underdamped LC tank can be expressed as follows,

\[ v_u(t) = K_1 + e^{-\alpha t}(K_2 e^{j\beta t} + K_2^* e^{-j\beta t}), \tag{3.40} \]

or

\[ v_u(t) = K_1 + 2e^{-\alpha t}[A \cos \beta t - B \sin \beta t], \tag{3.41} \]

where

\[
\begin{aligned}
\alpha &= \frac{\omega_0}{2Q}, \\
\beta &= \sqrt{\omega_0^2 - \left(\frac{\omega_0}{2Q}\right)^2}, \\
K_1 &= \left.sH(s)\right|_{s=0} = \frac{R_2 \cdot \frac{\omega_0}{Q_1} \cdot \omega_0 Q_2}{\omega_0^2} = R_1, \\
K_2 &= \left.(s + \alpha - j\beta)H(s)\right|_{s=-\alpha+j\beta} = \frac{R_2(-\alpha + j\beta + \frac{\omega_0}{Q_1})(-\alpha + j\beta + \omega_0 Q_2)}{(-\alpha + j\beta) \cdot 2j\beta}, \\
K_2^* &= A - jB,
\end{aligned}
\tag{3.42}
\]

with \( H(s) = Z(s)/s \), the Laplace transform of the step response.

After simplifying the results, we have

\[ A = \frac{R_2}{2} \left(1 - \frac{R_1}{R_2}\right) = \frac{1}{2}(R_2 - R_1), \tag{3.43} \]

\[ B = \frac{R_2\left[\frac{1}{2Q}\left(1 + \frac{Q_2}{Q_1}\right) - \left(Q_2 + \frac{1}{Q_1}\right)\right]}{2\sqrt{1 - \frac{1}{4Q^2}}}. \tag{3.44} \]
Figure 3.2: The step response of an underdamped LC tank with ESRc (Q > 0.5). (a) The first local extremum is a peak. (b) The first local extremum is a valley.

By equating the derivative of Eq. 3.41 to zero, we calculate the time $t_k$ where local extrema of $v_u(t)$ occur,

$$t_k = \begin{cases} 
\frac{1}{\beta} \left(\arctan \frac{\beta + A_0}{B_0 - A_0} + k\pi\right), & k = 0, 1, \ldots, \text{ if } \frac{\beta + A_0}{B_0 - A_0} \geq 0, \\
\frac{1}{\beta} \left(\arctan \frac{\beta + A_0}{B_0 - A_0} + k\pi\right), & k = 1, 2, \ldots, \text{ if } \frac{\beta + A_0}{B_0 - A_0} < 0,
\end{cases} \tag{3.45}$$

where $\frac{\beta + A_0}{B_0 - A_0} = \frac{\left(\frac{Q_2}{Q_1} - \frac{1}{Q_1}\right)\sqrt{1 - \frac{1}{Q_1^2}}}{\frac{1}{Q_1^2} + \frac{1}{Q_2^2} - 2}$. By plugging back into Eq. 3.41, the local extrema $v_{ek}$ of $v_u(t)$ are

$$v_{e_k} = R_1 + 2e^{-\alpha t_k}\left[\frac{R_2 - R_1}{2} \cos \beta t_k - B \sin \beta t_k\right], \tag{3.46}$$

where $B$ is given by Eq. 3.44.

From Eq. 3.46, the voltage peaks and valleys of the step response can be expressed as

$$\textbf{PEAKs}: \quad V_{M_i} = v_{e_k}, \text{ when } v_{e_k} > R_1, \tag{3.47}$$

$$\textbf{VALLEYs}: \quad V_{m_i} = v_{e_k}, \text{ when } v_{e_k} < R_1. \tag{3.48}$$

Based on the sequences of voltage peaks and valleys in time-domain, $V_{max}$
is determined from the following two cases.

a. The first extremum is a local maximum as shown in Fig. 3.2(a). In this case, $V_{\text{max}}$ can be obtained from Eq. 3.10,

$$V_{\text{max}} = \sum V_{Mi} - \sum V_{mi} + v_u(\infty). \tag{3.49}$$

b. The first extremum is a local minimum as shown in Fig. 3.2(b). In this case, $V_{\text{max}}$ can be obtained from the following expressions.

$$V_{\text{max}} = v_u(0) + \sum V_{Mi} - \sum V_{mi} + v_u(\infty). \tag{3.50}$$

Eq. 3.50 is consistent with Eq. 3.10, since $v_u(0)$ can be regarded as one local maximum $V_{Mi}$.

$Z_{\text{max}}$ is calculated by solving $\frac{d|Z(\omega)|}{d\omega} = 0$. Defining $y = \left(\frac{\omega}{\omega_0}\right)^2$, Eq. 3.26 gives

$$|Z(\omega)|^2 = |Z(y)|^2 = R_2^2 \, \frac{\left(y + \frac{1}{Q_1^2}\right)\left(y + Q_2^2\right)}{(1 - y)^2 + \frac{y}{Q^2}}. \tag{3.51}$$

We set $y = y_0$ as the solution of $\frac{d|Z(y)|}{dy} = 0$, so $y_0$ can be expressed as,

$$y_0 = \frac{\sigma_1 + Q_1^4 Q_2^2 - Q_1^2 Q_2^4 - Q_1 Q_2 \sigma_1}{-Q_1^2 Q_2^4 - 2 Q_1^4 Q_2^2 + Q_1^4 + 2 Q_1^2 Q_2^4}, \tag{3.52}$$

where $\sigma_1 = \sqrt{Q_1^6 Q_2^6 + 2 Q_1^4 Q_2^4 + 2 Q_1^2 Q_2^2 + 2 Q_1^4 Q_2^2 + 2 Q_1^4 Q_2^4 + 5 Q_1^4 Q_2^4 + 2 Q_1^4 Q_2^4}$. By definition, $y$ is non-negative; however, $y_0$ in Eq. 3.52 is not always positive. Therefore, the peak impedance is analyzed in two cases according to the sign of $y_0$.

When $y_0 \geq 0$, there is an extremum in the frequency domain, where

\[
Z_{\text{max}} = |Z(y_0)|. \tag{3.53}
\]

Fig. 3.3 shows the impedance magnitude sweep when \( y_0 \geq 0 \).

Figure 3.3: Impedance magnitude sweep of an underdamped LC tank with $ESR_c$, when $y_0 > 0$. The peak occurs at $Z_{max} = |Z(y_0)|$.
Figure 3.4: Impedance magnitude sweep of an underdamped LC tank with $ESR_c$, when \( y_0 < 0 \). (a) \( Z_{\text{max}} = Z(0) \), (b) \( Z_{\text{max}} = Z(\infty) \).

When \( y_0 < 0 \), \( |Z(\omega)| \) increases or decreases monotonically over frequency. Therefore, we have

\[
Z_{\text{max}} = \max(Z(0), Z(\infty)). \tag{3.54}
\]

Fig. 3.4 shows the impedance magnitude sweep when \( y_0 < 0 \).

Combining the above analyses on different cases of \( V_{\text{max}} \) and \( Z_{\text{max}} \), the bound of \( \gamma \) for an underdamped LC tank is summarized in Fig. 3.8 as

\[
\frac{2}{\pi} < \gamma \leq 1.05. \tag{3.55}
\]

The underdamped LC tank is of particular interest, as it is commonly observed in industrial PDN designs. Fig. 3.5 shows the bound of \( \gamma \) as contour lines. From the definitions of \( Q \) and \( Q_2 \), we can infer that \( Q_2 > Q > 0.5 \). We observe that \( \gamma_{max} \approx 1.05 \) when \( Q = 0.66 \) and \( Q_2 = 0.67 \). When \( Q \to \infty \), \( \gamma \to 2/\pi \).

We further derive analytical solutions for \( V_{max} \), \( Z_{max} \), and \( \gamma \) in two special LC tank cases: (1) \( R_1 = R_2 \) and (2) \( R_2 = 0 \).

(1) When \( R_1 = R_2 = R \), the expressions of \( Z_{max} \) and \( V_{max} \) are listed below.

\[
Z_{max} = \begin{cases} 
R & : Q \leq 0.5, \\
RQ\left(\frac{1}{2Q} + 2Q\right) & : Q > 0.5,
\end{cases} \tag{3.56}
\]

and

\[
V_{max} = \begin{cases} 
R + R\left(\frac{1}{2Q} - 2Q\right)e^{-\frac{1}{\sqrt{1-4Q^2}}\ln\left(\frac{1}{2Q} + \sqrt{\frac{1}{4Q^2} - 1}\right)} & : Q < 0.5, \\
R & : Q = 0.5, \\
R + R\left(2Q - \frac{1}{2Q}\right)\frac{e^{-\frac{\arctan\sqrt{4Q^2 - 1}}{\sqrt{4Q^2 - 1}}}}{1 - e^{-\frac{\pi}{\sqrt{4Q^2 - 1}}}} & : Q > 0.5.
\end{cases} \tag{3.57}
\]

The ratio \( \gamma \) obtained from Eq. 3.56 and 3.57 is shown in Fig. 3.6. We notice that when $Q = 0.5$ and $R_1 = R_2 = \sqrt{L/C} = R$, $|Z(\omega)|$ is flat throughout the frequency domain and the voltage step response is constant. Such an LC tank is called a distortionless system, and $\gamma$ is always one regardless of the input current pattern.

Figure 3.6: The ratio $\gamma$ versus the quality factor $Q$ (when $R_1 = R_2$)

(2) When $R_2 = 0$, the LC tank simplifies to a three-element circuit, as shown in Fig. 3.7. The impedance profile of Fig. 3.7 is given by

$$Z(s) = \frac{sL + R}{s^2LC + sRC + 1}. \tag{3.58}$$

The expressions of $Z_{max}$ and $V_{max}$ for this LC tank case are listed below, where \( Q = Q_1 \) since \( Q_2 \to \infty \):

\[ Z_{\text{max}} = \begin{cases} R & : Q \leq 0.6436, \\ RQ^2 \sqrt{\frac{1}{2Q\sqrt{Q^2 + 2} - 2Q^2 - 1}} & : Q > 0.6436, \end{cases} \tag{3.59} \]

and

\[ V_{\text{max}} = \begin{cases} R & : Q \leq 0.5, \\ R\left(1 + Q \frac{e^{-\frac{\pi - \arctan \sqrt{4Q^2 - 1}}{\sqrt{4Q^2 - 1}}}}{1 - e^{-\frac{\pi}{\sqrt{4Q^2 - 1}}}}\right) & : Q > 0.5. \end{cases} \tag{3.60} \]

Therefore,

\[ \gamma = \begin{cases} 1 & : Q \leq 0.5, \\ 1 + Q \frac{e^{-\frac{\pi - \arctan \sqrt{4Q^2 - 1}}{\sqrt{4Q^2 - 1}}}}{1 - e^{-\frac{\pi}{\sqrt{4Q^2 - 1}}}} & : 0.5 < Q \leq 0.6436, \\ \left(1 + Q \frac{e^{-\frac{\pi - \arctan \sqrt{4Q^2 - 1}}{\sqrt{4Q^2 - 1}}}}{1 - e^{-\frac{\pi}{\sqrt{4Q^2 - 1}}}}\right) \frac{\sqrt{2Q\sqrt{Q^2 + 2} - 2Q^2 - 1}}{Q^2} & : Q > 0.6436. \end{cases} \tag{3.61} \]

The curve of \( \gamma \) as a function of \( Q \) is shown in Fig. 3.8. We observe that the maximum ratio \( \gamma \approx 1.041, \) when \( Q = 0.687 \) for this special case.
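The closed forms in this subsection can be cross-checked numerically. The sketch below is our own illustration (not the author's Matlab flow): `tank_gamma` is a hypothetical helper that integrates the unit-step response of the tank in Fig. 3.1 with a fixed-step RK4 scheme, applies Lemma 1 by summing every rise of $v_u(t)$ including the jump to $v_u(0^+) = R_2$, and sweeps $|Z(j\omega)|$ from Eq. 3.23 on an assumed logarithmic grid. The step size, horizon, and grid assume units normalized so that $\omega_0 \approx 1$.

```python
import math


def tank_gamma(r1, r2, l, c, dt=1e-3, t_end=60.0, n_freq=20001):
    """Estimate (gamma, V_max, Z_max) for the standard LC tank of Fig. 3.1 with I_max = 1."""
    def deriv(i_l, v_c):
        # Branch R1 + L carries i_l; branch R2 + C carries 1 - i_l (unit step input).
        v = r2 * (1.0 - i_l) + v_c                # node voltage
        return (v - r1 * i_l) / l, (1.0 - i_l) / c

    i_l, v_c = 0.0, 0.0                           # inductor open, capacitor short at t = 0+
    v_prev = r2
    v_max = r2                                    # jump from 0 to R2 at t = 0+
    for _ in range(int(t_end / dt)):
        k1 = deriv(i_l, v_c)
        k2 = deriv(i_l + 0.5 * dt * k1[0], v_c + 0.5 * dt * k1[1])
        k3 = deriv(i_l + 0.5 * dt * k2[0], v_c + 0.5 * dt * k2[1])
        k4 = deriv(i_l + dt * k3[0], v_c + dt * k3[1])
        i_l += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v_c += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        v = r2 * (1.0 - i_l) + v_c
        v_max += max(v - v_prev, 0.0)             # Lemma 1: accumulate every rise
        v_prev = v

    w0 = 1.0 / math.sqrt(l * c)

    def z_abs(w):                                 # |Z(j w)| from Eq. 3.23
        s = 1j * w
        return abs((s * s * l * c * r2 + s * (r1 * r2 * c + l) + r1)
                   / (s * s * l * c + s * (r1 + r2) * c + 1.0))

    omegas = [0.0] + [w0 * 10.0 ** (-4.0 + 8.0 * k / (n_freq - 1)) for k in range(n_freq)]
    z_max = max(z_abs(w) for w in omegas)
    return v_max / z_max, v_max, z_max


# Distortionless check: R1 = R2 = sqrt(L/C) gives Q = 0.5, a flat |Z|, and gamma = 1.
g_flat, _, _ = tank_gamma(1.0, 1.0, 1.0, 1.0)
# Hypothetical examples: R1 = R2 with Q = 1, and R2 = 0 with Q = 0.687.
g_eq, v_eq, z_eq = tank_gamma(1.0, 1.0, 2.0, 0.5)
g_r2z, v_r2z, z_r2z = tank_gamma(1.0 / 0.687, 0.0, 1.0, 1.0)
print(f"flat: {g_flat:.4f}, R1 = R2 (Q = 1): {g_eq:.4f}, R2 = 0 (Q = 0.687): {g_r2z:.4f}")
```

With these settings, the distortionless tank should return $\gamma \approx 1$, and the $R_2 = 0$, $Q = 0.687$ case should land close to the $\gamma \approx 1.041$ reported above. Explicit RK4 is used only to keep the sketch dependency-free; a production flow would more likely use a circuit simulator or an LTI toolbox.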
Figure 3.8: The ratio $\gamma$ versus the quality factor $Q$ of an LC tank without $ESR_C$.

3.4 Case Study: A Complete Power Distribution Network Path

In this section, we analyze $V_{max}$ and $Z_{max}$ of a complete PDN path, which includes the VRM, board, package, on-chip power distribution, and decoupling capacitors (Fig. 3.9) [29]. The on-chip power grid model is lumped with the package model as a single port. The circuit model is extracted from a real PDN design using Sigrity PowerSI 16.61.

Our PDN model includes the output impedance of the VRM, the impedance of the current path from the VRM to the bulk decaps (on-board), the impedance of the current path from the bulk decaps to the on-package decaps, the impedance of the current path from the on-package decaps to the die, and the on-chip power grid. We consider the $ESR_c$ and equivalent series inductance ($ESL_c$) effects for the bulk decaps and on-package decaps. For the on-die decaps and their associated $ESR_c$, we include both the intrinsic capacitance of the non-switching transistors in the circuit and the dedicated decoupling capacitance. The $ESL_c$ of the on-die decaps is negligible and is not considered, since the target frequency range of the PDN is below 10GHz. The switching of the load circuit is represented by the current source $i(t)$. The impedance between the on-die decap and the load current is ignored, assuming that the decap is placed sufficiently close to the load circuit. The PDN noise is observed at the on-die current load node.

Figure 3.9: A complete PDN path is illustrated by a lumped cascaded LC tank model. A high-order multi-stage PDN system can be approximated by three second-order LC tanks in different frequency regions.
Figure 3.10: The output impedance of a complete PDN path (a) Magnitude (b) Phase.

The output impedance of the PDN is shown in Fig. 3.10. There are three main anti-resonance peaks in the impedance profile, around 219.0kHz, 4.372MHz and 91.38MHz. The peak impedance is

$$Z_{\text{max}} = 0.215\ \Omega. \quad (3.62)$$

These anti-resonance peaks cause low-, middle- and high-frequency fluctuations in the PDN step response. By extracting the local maxima and minima of the step response and applying Eq. 3.10 in Matlab, we calculate the worst-case voltage noise as

$$V_{\text{max}} = 0.2998\ \text{V}. \quad (3.63)$$

Thus, the maximum $\gamma$ for this PDN case is

$$\gamma_{\text{max}} = \frac{V_{\text{max}}}{Z_{\text{max}}} = \frac{0.2998}{0.215} \approx 1.394. \quad (3.64)$$

Therefore, for real PDN cases the maximum $\gamma$ can exceed 1, which shows that the traditional target impedance method, by assuming that $\gamma$ is no more than 1, underestimates the worst-case noise.

Fig. 3.11 demonstrates how the worst-case voltage noise is generated in the time domain. The load current is bounded between 0 and 1A. The impulse response $h(t)$ of the system is determined from the impedance profile in Fig. 3.10. We then apply the convolution method of Section 3.2 to obtain the worst-case voltage noise and the corresponding load current pattern. The simulation time step is set to 10ps, and $T$ in Eq. 3.9 is set to 0.1ms. The worst-case voltage noise of the PDN is shown in Fig. 3.11(a), and the corresponding input current pattern is shown in Fig. 3.11(b). Fig. 3.11(c) and (d) zoom in on the peak voltage noise at $T = 0.1ms$ and on the high-frequency switching current pattern that produces it.
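To make the convolution argument concrete, the following NumPy sketch (an illustration, not the Matlab flow used for Fig. 3.11) takes a sampled impulse response, returns the worst-case droop $I_{max}\int \max(h, 0)\,d\tau$, and builds the matching binary load pattern, i.e. $i(T - \tau) = I_{max}$ wherever $h(\tau) > 0$. As a self-check it uses the $R_2 = 0$ tank of Fig. 3.7 with hypothetical values $R = 20\,m\Omega$, $L = 0.1\,nH$, $C = 0.1\,\mu F$; its step response, derived here from $V_s(0) = 0$ and $V_s'(0) = 1/C$, has a closed form, so the result can be compared with Eq. (3.60).

```python
import numpy as np

def worst_case_noise(h, dt, i_max=1.0):
    """Worst-case droop and load pattern for a sampled impulse response h (ohm/s).

    v(T) = sum_k h[k] * i[N-1-k] * dt is maximised, for 0 <= i <= i_max,
    by drawing i_max exactly where h > 0.
    """
    h = np.asarray(h, dtype=float)
    v_max = i_max * np.clip(h, 0.0, None).sum() * dt
    i_w = i_max * (h[::-1] > 0)          # i(T - tau) = i_max where h(tau) > 0
    return v_max, i_w

# Hypothetical R2 = 0 tank: Z(s) = (sL + R) / (s^2 LC + sRC + 1)
R, L, C = 0.02, 0.1e-9, 0.1e-6
w0 = 1.0 / np.sqrt(L * C)
Q = np.sqrt(L / C) / R
alpha = w0 / (2 * Q)
beta = w0 * np.sqrt(1 - 1 / (4 * Q**2))
dt = 1e-12
t = np.arange(0.0, 300e-9, dt)           # about 30 decay time constants
b = R * (2 * Q**2 - 1) / np.sqrt(4 * Q**2 - 1)   # sine coefficient of the step response
v_step = R + np.exp(-alpha * t) * (-R * np.cos(beta * t) + b * np.sin(beta * t))
h = np.diff(v_step) / dt                 # impulse response h(t) = dV_s/dt
v_max, i_w = worst_case_noise(h, dt)
v_check = np.dot(h, i_w[::-1]) * dt      # convolution evaluated at t = T
```

For these values $Q \approx 1.58$, and `v_max` should agree with Eq. (3.60) to well within 0.1%; `v_check` confirms that the constructed current pattern actually attains it.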

Another way to quickly estimate the worst-case voltage noise is to use the standard LC tank discussed in Section 3.3.2. As shown in Fig. 3.9, the three-stage PDN model is decomposed into three LC tank models in different frequency regions. Each tank contributes a portion of the worst-case noise, which can be calculated from Eq. 3.46 - 3.50. We also observe a noise cancellation effect between neighboring tanks: the sum of the voltage noises from all three tanks exceeds the actual worst-case noise from Eq. 3.10. The amount of cancellation can be estimated from the impedance valley between two peaks. For example, $Z_{valley1}$ between peak m1 (Tank A) and peak m2 (Tank B) is 15.5$m\Omega$, and $Z_{valley2}$ between peak m2 and peak m3 (Tank C) is 10.8$m\Omega$. Thus, the estimated worst-case noise can be expressed as

$$\tilde{V}_{max} = V_{tankA} + V_{tankB} + V_{tankC} - I_{max} \times (Z_{valley1} + Z_{valley2}), \quad (3.65)$$

where $V_{tankA}$, $V_{tankB}$ and $V_{tankC}$ are the peak noise of Tanks A, B and C, respectively, and $I_{max} = 1$A from Eq. 3.7.
Figure 3.11: (a) Worst-case peak noise of a complete PDN path, (b) Worst-case load current pattern, (c) The zoomed-in view for the worst peak noise on PDN, (d) The zoomed-in view for the worst-case load current pattern.
We list the contribution of each tank of Fig. 3.11(a)-(c) to the worst-case noise in the first row of Table 3.1. \( V_{\text{tankA}} \), \( V_{\text{tankB}} \) and \( V_{\text{tankC}} \) are the worst-case noise of the three standard LC tanks decomposed from the complete PDN path. \( Z_{\text{valley1}} \) and \( Z_{\text{valley2}} \) are extracted from the output impedance profile. \( \tilde{V}_{\text{max}} \) is the estimated worst-case noise upper bound from Eq. 3.65, and the last column lists its error relative to the exact result \( V_{\text{max}} \) from Eq. 3.10. Compared with the exact result in Eq. 3.63, the estimate from the three LC tank models has an error of 6.80%, which gives designers a quick guideline for optimizing the noise of each LC tank. The method is faster than the Eq. 3.10 method in Lemma 1 when a PDN contains more than two tanks with impedance peaks spread from 100Hz to 10GHz: such a profile requires a small time step over a long time series, which makes the direct calculation memory-hungry and time-consuming. By decomposing the cascaded LC tanks into several standard LC tanks in the frequency domain, the time needed to compute the worst-case noise can be greatly reduced.
Table 3.1: The worst-case noise prediction of three complete PDN cases.

<table>
<thead>
<tr>
<th>Cases</th>
<th>$V_{tankA}$ (V)</th>
<th>$V_{tankB}$ (V)</th>
<th>$V_{tankC}$ (V)</th>
<th>$Z_{valley_1}$ ($\Omega$)</th>
<th>$Z_{valley_2}$ ($\Omega$)</th>
<th>$\tilde{V}_{max}$ (V)</th>
<th>$V_{max}$ (V)</th>
<th>error</th>
</tr>
</thead>
<tbody>
<tr>
<td>I (Fig. 3.9)</td>
<td>0.0678</td>
<td>0.1350</td>
<td>0.1437</td>
<td>0.0155</td>
<td>0.0108</td>
<td>0.3202</td>
<td>0.2998</td>
<td>6.80%</td>
</tr>
<tr>
<td>II (Fig. 3.12(a))</td>
<td>0.0545</td>
<td>0.0568</td>
<td>0.1717</td>
<td>0.0093</td>
<td>0.0060</td>
<td>0.2677</td>
<td>0.2597</td>
<td>3.08%</td>
</tr>
<tr>
<td>III (Fig. 3.12(b))</td>
<td>0.1545</td>
<td>0.1447</td>
<td>0.1468</td>
<td>0.0041</td>
<td>0.0050</td>
<td>0.4369</td>
<td>0.3927</td>
<td>11.26%</td>
</tr>
</tbody>
</table>
Two extra PDN cases are analyzed to check the accuracy of the proposed prediction method. Their circuit models are shown in Fig. 3.12(a) and (b). The impedance profiles of the three test cases differ in shape in order to test the robustness of the method. We exclude cases in which one tank has a dominant anti-resonance peak in the impedance profile: such a profile resembles a standard LC tank, and the estimation error is very small. On average, the estimation error of $V_{\text{max}}$ over the three cases is about 7%.
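The $\tilde{V}_{max}$ and error columns of Table 3.1 follow directly from Eq. (3.65). The sketch below recomputes them from the tabulated tank noises and valley impedances with $I_{max} = 1$ A; it only re-derives the table's arithmetic and does not recompute the tank noises themselves. The helper name `estimate_v_max` is illustrative.

```python
# Table 3.1 entries: (V_tankA, V_tankB, V_tankC, Z_valley1, Z_valley2, exact V_max)
TABLE_3_1 = {
    "I":   (0.0678, 0.1350, 0.1437, 0.0155, 0.0108, 0.2998),
    "II":  (0.0545, 0.0568, 0.1717, 0.0093, 0.0060, 0.2597),
    "III": (0.1545, 0.1447, 0.1468, 0.0041, 0.0050, 0.3927),
}

def estimate_v_max(v_a, v_b, v_c, z_valley1, z_valley2, i_max=1.0):
    """Eq. (3.65): summed tank noise minus the valley-based cancellation term."""
    return v_a + v_b + v_c - i_max * (z_valley1 + z_valley2)

errors = {}
for case, (v_a, v_b, v_c, z1, z2, v_exact) in TABLE_3_1.items():
    v_est = estimate_v_max(v_a, v_b, v_c, z1, z2)
    errors[case] = 100.0 * (v_est - v_exact) / v_exact   # relative error in percent
    print(f"case {case}: estimate {v_est:.4f} V, exact {v_exact:.4f} V, error {errors[case]:.2f}%")
print(f"mean error: {sum(errors.values()) / len(errors):.2f}%")
```

The positive errors in every case are consistent with Eq. (3.65) acting as an upper-bound estimate.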

3.5 Case Study: Power Distribution Network Design Optimization with On-Die Voltage Dependent Leakage Path

3.5.1 Voltage Dependent Leakage Resistance Model

On-die leakage current has three main contributors: subthreshold leakage, gate leakage and band-to-band tunneling (BTBT) leakage [50]. Gate leakage has been substantially reduced since high-k dielectrics entered high-volume CMOS production, and BTBT leakage is relatively small compared with the other two. Therefore, we focus on subthreshold leakage in this section.

Subthreshold leakage is a weak-inversion current between the source and drain of a MOS transistor when the gate voltage is below the threshold voltage $V_t$. In digital design, we can analyze subthreshold leakage by setting the gate voltage to $V_g = Gnd$ for NMOS and $V_g = V_{dd}$ for PMOS. The weak-inversion current $I_{ds}$ is a function of the threshold voltage $V_t$, which is mainly determined by two factors:

- Body effect: $V_t = V_{t0} + \gamma(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}) \approx V_{t0} + k_\gamma V_{sb}$, where $\phi_s = 2v_T \ln \frac{N_A}{n_i}$, $\gamma = \frac{t_{ox}}{\varepsilon_{ox}} \sqrt{2q\varepsilon_{si}N_A} = \frac{\sqrt{2q\varepsilon_{si}N_A}}{C_{ox}}$ and $k_\gamma = \frac{\gamma}{2\sqrt{\phi_s}}$.

- Drain-induced barrier lowering (DIBL): \( V_t = V_{t0} - \eta V_{ds} \), where \( \eta \) is on the order of 0.1.

Figure 3.12: Two PDN cases to test the proposed prediction method.

Therefore, the subthreshold leakage can be expressed as,

\[
I_{ds} = I_{ds0}\,e^{\frac{V_{gs}-V_t}{n v_T}}\left(1 - e^{-\frac{V_{ds}}{v_T}}\right), \quad (3.66)
\]

where \( I_{ds0} = \beta v_T^2 e^{1.8} \), \( n = 1.3 \sim 1.7 \), \( v_T = \frac{kT}{q} \), \( V_t = V_{t0} + k_\gamma V_{sb} - \eta V_{ds} \) and \( \beta = \mu_0 C_{ox} \frac{W}{L} \). (All the parameters are explained in [68].) Setting \( V_{ds} = V_{dd} \) shows that \( I_{ds} \) increases superlinearly with the supply voltage, because the DIBL term lowers \( V_t \) as \( V_{dd} \) rises.

The leakage resistance \( R_{leak} \) then becomes a function of \( V_{dd} \),

\[
R_{leak} = \frac{V_{dd}}{I_{ds}}. \quad (3.67)
\]
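A minimal sketch of the leakage model of Eqs. (3.66) and (3.67) for an off NMOS ($V_{gs} = V_{sb} = 0$, $V_{ds} = V_{dd}$). The parameter values `i_ds0`, `v_t0`, `eta` and `n` are hypothetical placeholders, chosen only so that the single-transistor $R_{leak}$ is on the order of $10^7\,\Omega$ at 0.9V; they are not fitted to the 28nm model discussed below.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C

def subthreshold_current(v_dd, i_ds0=3e-4, v_t0=0.4, eta=0.1, n=1.5, temp_k=298.15):
    """Eq. (3.66) for an off NMOS: V_gs = 0, V_sb = 0, V_ds = v_dd.
    All default parameter values are hypothetical placeholders."""
    v_therm = K_BOLTZMANN * temp_k / Q_ELECTRON   # thermal voltage v_T = kT/q
    v_th = v_t0 - eta * v_dd                       # DIBL: V_t = V_t0 - eta * V_ds
    return i_ds0 * math.exp(-v_th / (n * v_therm)) * (1.0 - math.exp(-v_dd / v_therm))

def leakage_resistance(v_dd, **params):
    """Eq. (3.67): R_leak = V_dd / I_ds."""
    return v_dd / subthreshold_current(v_dd, **params)

sweep = [0.1 + 0.01 * k for k in range(121)]      # 0.1 V to 1.3 V
r_leak = [leakage_resistance(v) for v in sweep]
v_peak = sweep[r_leak.index(max(r_leak))]
print(f"R_leak peaks near {v_peak:.2f} V; R_leak(0.9 V) = {leakage_resistance(0.9):.2e} ohm")
```

With these placeholder values $R_{leak}$ first rises and then falls as $V_{dd}$ is swept, qualitatively like Figure 3.13(b). In this simplified model the turning point sits approximately at $n v_T / \eta$, so its location depends on the assumed $\eta$ and $n$.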
We compare the theoretical model from Eq. 3.66 with an industrial 28nm HPm NMOS SPICE model, biasing the NMOS terminals at \( V_d = V_{dd} \), \( V_s = Gnd \), \( V_g = Gnd \) and \( V_b = Gnd \). The nominal \( V_{dd} \) is 0.9V and the operating temperature is 25°C. The results are shown in Figure 3.13. As the supply voltage is swept from 0.1V to 1.3V, \( R_{leak} \) first increases when \( V_{dd} < V_t \), peaks when \( V_{dd} \approx V_t \), and then decreases when \( V_{dd} > V_t \). The theoretical model from Eq. 3.66 closely matches the industrial model when \( V_{dd} > 0.5V \).

Figure 3.13(b) shows that the leakage resistance of a single transistor is on the order of \( 10^7 \Omega \). Meanwhile, the transistor count of a single high-performance CPU topped 5 billion in 2012 [4]. If 10% of the transistors contribute to the on-die leakage, the equivalent full-chip leakage resistance is on the order of \( 100 \text{m}\Omega \). Given an \( 18 \text{m}\Omega \) target impedance for a 1GHz chip with a \( 1cm^2 \) die area in 2012 (Figure 1.1), the ratio of leakage resistance to target impedance is roughly five. To cover the leakage-to-target-impedance ratios found in various IC designs, we analyze this ratio over a wide range (from 1 to 100) in this section.

### 3.5.2 RLC Tank Model with Leakage Resistance

We discuss the impact of leakage resistance on the PDN noise of the RLC tank model in this subsection. Figure 3.14 shows a complete PDN path for system-level analysis. Previous studies show that the RLC tank model is a basic element of the PDN and that the worst-case noise is a summation of the worst-case noise of each individual tank [73]. Traditionally, the on-chip load is modelled as a current source (Figure 3.14(a)). Here we model the on-chip load as a current source in parallel with a constant leakage resistor \( R_{leak} \) (Figure 3.14(b)) or a current source in parallel with a voltage-dependent leakage resistor $R_{\text{leak}}(v(t))$ (Figure 3.14(c)).

Figure 3.13: (a) Leakage current vs. supply voltage, (b) equivalent leakage resistance vs. supply voltage.

Figure 3.14: A circuit diagram characterizing the impedance of the PDN. The on-chip load can be modelled as (a) a single current source, (b) a current source with a constant leakage resistor, (c) a current source with a voltage-dependent leakage resistor.
Figure 3.15: An RLC tank model with leakage resistance.

Figure 3.15 shows an RLC tank model with leakage resistance $R_3$. Its impedance profile $Z(s)$ in the Laplace domain can be expressed as

$$Z(s) = \frac{R_3\left[s^2LCR_2 + s(R_1R_2C + L) + R_1\right]}{s^2LCR_2 + s(R_1R_2C + L) + R_1 + R_3\left[s^2LC + s(R_1 + R_2)C + 1\right]}. \quad (3.68)$$

The PDN noise $v(t)$ is calculated from the convolution of the load current $i(t)$ and the system impulse response $h(t)$,

$$v(t) = \int_0^\infty h(\tau)\,i(t - \tau)\,d\tau, \qquad \forall t : 0 \leq i(t) \leq a, \quad (3.69)$$

where $h(t)$ is the inverse Laplace transform of $Z(s)$. Numerically, the worst-case noise (voltage droop) $V_{\text{max}}$ is obtained by setting $i(t - \tau) = a$ when $h(\tau) > 0$ and $i(t - \tau) = 0$ when $h(\tau) \leq 0$. We analyze the problem with $R_3$ either constant or voltage-dependent. The design objective is to minimize the worst-case noise. We also define the overshoot of a PDN as $\min(v(t))$.

**Constant Leakage Resistance**

If $R_3$ is constant, Eq. 3.68 reduces to a second-order system. When the leakage resistance is much greater than the impedance of the rest of the circuit (e.g., by two orders of magnitude), the leakage path can be ignored and the worst-case noise can be predicted as in [73]. Otherwise, the leakage path must be included in the worst-case noise calculation.

For example, we extract an RLC tank with $C = 0.1 \mu F$ and $L = 0.1 nH$ from a PDN and set the upper bound of $i(t)$ to 1. Over various combinations of $R_1$ and $R_2$, we search in Matlab for the minimum worst-case noise from Eq. 3.69. The minimum worst-case noise is 0.0282V, obtained with $R_1 = 0.018 \Omega$ and $R_2 = 0.022 \Omega$; the corresponding peak impedance without the leakage resistance $R_3$ is 35.1m$\Omega$. Based on this peak impedance, we sweep $R_3$ from 30$\Omega$ down to 30m$\Omega$. As $R_3$ decreases, the worst-case noise decreases monotonically. When $R_3$ becomes comparable to the original target impedance without $R_3$, the optimal $R_2$ gradually decreases and the optimal $R_1$ drops dramatically. The minimum worst-case noise and the corresponding optimal $R_1$ and $R_2$ are shown in Figure 3.16.
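The effect of $R_3$ on the impedance profile can be checked numerically. This NumPy sketch evaluates Eq. (3.68) as the RLC tank of Eq. (4.8) in parallel with $R_3$ (algebraically the same expression, but this form also accepts $R_3 = \infty$) over 100kHz to 10GHz for $C = 0.1\,\mu F$, $L = 0.1\,nH$, $R_1 = 0.018\,\Omega$, $R_2 = 0.022\,\Omega$. The $R_3$ sweep values and the function name are illustrative.

```python
import numpy as np

def impedance_mag(w, R1, R2, L, C, R3=np.inf):
    """|Z(jw)| of Eq. (3.68): the RLC tank of Eq. (4.8) in parallel with R3."""
    s = 1j * w
    z_tank = (R1 + s * L) * (1 + s * C * R2) / (s**2 * L * C + s * (R1 + R2) * C + 1)
    return np.abs(1.0 / (1.0 / z_tank + 1.0 / R3))   # 1/inf = 0 removes the leakage path

w = 2 * np.pi * np.logspace(5, 10, 20001)            # 100 kHz to 10 GHz
params = dict(R1=0.018, R2=0.022, L=0.1e-9, C=0.1e-6)
r3_values = [np.inf, 30.0, 3.0, 0.3, 0.03]           # illustrative sweep
peaks = [impedance_mag(w, R3=r3, **params).max() for r3 in r3_values]
for r3, zp in zip(r3_values, peaks):
    print(f"R3 = {r3:>6} ohm -> peak |Z| = {zp * 1e3:.1f} mOhm")
```

Without $R_3$ the peak should come out close to the 35.1m$\Omega$ quoted above, and the peak shrinks monotonically as $R_3$ decreases, because a parallel resistor only adds conductance at every frequency.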

**Voltage-Dependent Leakage Resistance**

In this subsection, $R_3$ is modelled as a function of the voltage at the load, $V_{dd} - v(t)$ (Eq. 3.67). The nominal voltage $V_{dd}$ is set to 0.9V and the supply noise tolerance is set to ±10% of $V_{dd}$. We keep the same parameters as in the previous case, $C = 0.1 \mu F$, $L = 0.1 nH$, $R_1 = 0.018 \Omega$ and $R_2 = 0.022 \Omega$, and increase the upper bound of $i(t)$ to 3.17A to scale the noise up to $V_{noise} = 0.09V$ (3.17 × 0.0282V ≈ 0.09V, i.e., 10% of $V_{dd}$).

Figure 3.16: (a) The optimal values of $R_1$ and $R_2$ (with minimum worst-case noise) as leakage $R_3$ decreases. (b) The minimum worst-case noise of an RLC tank as leakage $R_3$ decreases.

Eq. 3.8 cannot be directly applied to calculate $v(t)$ in this case, because the impulse response $h(t)$ of the system changes dynamically with the leakage resistance. Instead, we analyze this model with the Backward Euler method. Taking the inductor current $i_L(t)$ and the capacitor voltage $v_C(t)$ as state variables, we derive two equations from Figure 3.15. $R_3$ is updated at each time step according to the present supply voltage level.

$$\begin{aligned}
L \frac{di_L}{dt} + R_1 i_L &= v_C + R_2 C \frac{dv_C}{dt} \\
i_L + C \frac{dv_C}{dt} + \frac{1}{R_3(t)} \left(L \frac{di_L}{dt} + R_1 i_L\right) &= i(t)
\end{aligned} \quad (3.70)$$
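The Backward Euler update can be written out explicitly. In this Python sketch, each step solves the discretized form of Eq. (3.70) for the droop $v$ in closed form, with $R_3$ evaluated from the previous step's supply voltage $V_{dd} - v$. The voltage-dependent resistor `leak_resistance` is a hypothetical stand-in shaped like Eqs. (3.66) and (3.67) (assumed $\eta = 0.1$, $n = 1.5$) and scaled to 300m$\Omega$ at 0.9V; it is not the industrial model behind Figure 3.13.

```python
import math

def leak_resistance(v_supply, r_nom=0.3, v_nom=0.9, eta=0.1, n=1.5, v_therm=0.0257):
    """Hypothetical R3(V) shaped like Eqs. (3.66)-(3.67) with V_gs = V_sb = 0 and
    V_ds = V, scaled so that R3(v_nom) = r_nom."""
    def shape(v):
        # R_leak is proportional to V / (exp(eta*V/(n*v_T)) * (1 - exp(-V/v_T)))
        return v / (math.exp(eta * v / (n * v_therm)) * (1.0 - math.exp(-v / v_therm)))
    return r_nom * shape(v_supply) / shape(v_nom)

def simulate(i_load, h, R1, R2, L, C, r3, vdd=0.9):
    """Backward Euler integration of Eq. (3.70) with time step h.

    r3 is a constant (ohms) or a callable R3(supply voltage); a callable is
    re-evaluated every step from the previous-step droop. Returns the droop v.
    """
    i_l = v_c = v = 0.0
    a1 = 1.0 / (R1 + L / h)        # inductor branch: i_L_new = a1 * (v + (L/h) * i_L)
    g2 = C / (h + R2 * C)          # capacitor branch: i_C_new = g2 * (v - v_C)
    droop = []
    for i_now in i_load:
        r3_now = r3(vdd - v) if callable(r3) else r3
        # KCL at the load node: i_L_new + i_C_new + v / R3 = i(t), solved for v
        v = (i_now - a1 * (L / h) * i_l + g2 * v_c) / (a1 + g2 + 1.0 / r3_now)
        i_l = a1 * (v + (L / h) * i_l)
        v_c = (v + (R2 * C / h) * v_c) / (1.0 + R2 * C / h)
        droop.append(v)
    return droop

R1, R2, L, C = 0.018, 0.022, 0.1e-9, 0.1e-6   # values used in this subsection
step = [3.17] * 100_000                         # 3.17 A step for 1 us at h = 10 ps
v_const = simulate(step, 10e-12, R1, R2, L, C, 0.3)
v_dep = simulate(step, 10e-12, R1, R2, L, C, leak_resistance)
```

For a constant current step, the constant-$R_3$ run settles to $I(R_1 \parallel R_3)$, while the voltage-dependent run settles slightly higher, because in this model $R_3$ grows as the supply droops.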

Figure 3.17 shows how the leakage resistance $R_3$ changes in real time as the load current $i(t)$ changes, assuming $R_3 = 300m\Omega$ at the nominal voltage $V_{dd} = 0.9V$.

We compare the PDN noise under the same load current pattern for the constant and the voltage-dependent leakage resistance models, with $R_3$ at the nominal voltage set equal across all models.

Our results are shown in Figure 3.18. The voltage noise is divided into two categories: overshoot and droop. The constant $R_3$ model underestimates the voltage droop/overshoot by more than 16% compared with the voltage-dependent model when $R_3$ approaches the impedance of the rest of the circuit without $R_3$.

We also observe that when $R_3$ in the constant leakage model is set to its value at $V_{dd} - I_{avg} \times DCR$, where $I_{avg}$ is the average load current and $DCR$ is the DC resistance of the PDN, it gives noise values similar to those of the voltage-dependent model. It slightly underestimates the droop and overestimates the overshoot (both differences are less than 2%). This approximation can greatly reduce the simulation time, since $R_3$ in Eq. 3.70 no longer needs to be updated at each time step.

### 3.5.3 A Complete PDN Path with On-Die Leakage

A complete PDN path with on-die leakage is set up from Figure 3.14(c). Its impedance profile for different leakage resistances is shown in Figure 3.19. As the leakage resistance $R_3$ drops, the magnitudes of all the impedance peaks are reduced.

Suppose that $R_3$ is 300 $m\Omega$. Figure 3.20 compares the time-domain voltage noise of the constant and voltage-dependent leakage models. The constant leakage model at $V_{dd}$ underestimates the peak voltage noise by 5% compared with the voltage-dependent leakage resistance model. Figure 3.21 shows the voltage noise (droop and overshoot) with the different leakage resistance models from Figure 3.19. The constant leakage model at $V_{dd}$ underestimates the maximum voltage droop (overshoot) by up to 16% (25%) compared with the voltage-dependent model, while the constant leakage model at $V_{dd} - I_{avg} \times DCR$ underestimates the droop by only 2% and overestimates the overshoot by up to 3%.

3.6 Summary

In this chapter, we define the ratio of the worst-case voltage noise to the maximum impedance of PDNs. We analyze LC tank models in real PDN structures. The maximum ratio $\gamma$ for the LC tank is proved to be 1.5 when $R_1 = R_2$ and the quality factor $Q \to 0$. We analyze the worst-case noise of a complete PDN path and demonstrate that $\gamma$ can exceed 1. In addition, we propose a method to estimate the worst-case noise of a complete PDN path from the analytical solutions of several LC tanks. Our results contradict the assumption of the well-known “target impedance” design methodology, and we conclude that the shape of the output impedance must be studied in addition to the target impedance.

Figure 3.19: Impedance profile of a complete PDN path with various leakage resistance values.

Figure 3.20: The peak voltage noise (droop) of a complete PDN path in the time domain.
Figure 3.21: Voltage noise of a complete PDN path with different leakage resistance models.

Future power distribution networks require additional attention to leakage resistance as on-die leakage current keeps increasing. In the last section of this chapter, we propose to design and optimize the power distribution network with a constant or voltage-dependent leakage resistance path taken into account. We demonstrate that, when the leakage resistance is on the same scale as the target impedance, it can significantly change the optimal resistor values of the RLC tank model.
Chapter 4

Worst-Case Noise Area Prediction of On-Chip Power Distribution Network

We propose a prediction of the worst-case noise area of the supply voltage on the power distribution network (PDN). Previous works focus on the worst peak droop to sign off the PDN. In this chapter, we (1) study the behavior of circuit delay under the worst-area noise, (2) study the worst-case noise area of a lumped PDN model, (3) develop an algorithm to generate the worst-case current for general PDN cases, and (4) predict the longest delay of a datapath due to power integrity. Experimental results show that the worst-area noise induces more delay than the worst-peak noise.
4.1 Background

The aggressive advances in process technology increase the current demand and tighten the design rules. Supply voltage variation increases transistor delay [59], causes clock jitter [53] and has many other negative effects, which degrade the overall performance [32]. As a result, PDN analysis has become an important research topic [61]. PDN noise comes from the DC resistance and loop inductance of the power/ground lines, which result in \( IR \) drop and inductive noise \( (L \frac{di}{dt}) \) at the load [55].

Figure 4.1 shows a typical PDN, which consists of a voltage regulator module (VRM), PCB/package loop parasitics and an on-die power grid with decoupling capacitors. A successful PDN design requires the power/ground loops to present acceptable impedances at all frequencies of interest.

Many previous works focused on the worst voltage drop in time-domain [18, 21, 34] and frequency-domain [67, 36, 51] PDN analysis. Kouroussis et al. [41] proposed a vectorless approach for PDN integrity verification, which was later extended by Ferzli et al. [23] to a geometric approach for early estimation. Smith et al. [61] developed a method to systematically characterize PDN noise. Ketkar et al. [33] studied a microarchitecture-based framework for PDN analysis. Chiprout [17] discussed pre-silicon stimulus and post-silicon activity generation to excite the worst-case voltage drop. Abdul Ghani et al. [25] verified the PDN using node and branch dominance. Swaminathan et al. [30] used power transmission lines to reduce PDN noise.

Traditional PDN analysis concentrates on limiting the peak voltage drop. By applying the constant supply voltage minus the peak drop to slow-slow (ss) corner transistors, designers can determine the maximum drop that the critical path can tolerate while closing timing. However, this leads to over-design, because the peak drop of the supply noise may last only a very short time in real applications. Figure 4.2 shows two periodic supply voltage noise patterns applied to a datapath. The nominal delay of the circuit at $V_{dd} = 1V$ is $D_0$¹. The dashed curve has a peak voltage drop of 0.25V and a noise area of 0.025T, which induces a signal delay of 1.11$D_0$. The dotted curve has a peak voltage drop of 0.2V and a noise area of 0.066T, which induces a signal delay of 1.23$D_0$. Because of its larger noise area, the dotted curve induces 11% more delay (1.23/1.11 ≈ 1.11), despite its 20% smaller peak noise.

¹ $D_0 \approx 100$ps according to our HSPICE simulation with the 45nm PTM HP model [75].
In this chapter, we focus on predicting the worst-area noise of a PDN within a given time window and the worst-case load current profile that generates it. We then predict the maximum circuit delay under such a voltage noise profile. The importance of noise area estimation in PDN analysis has been discussed by Intel [59] and, at the device level, by Hashimoto’s group [52]. However, to the best of our knowledge, none of the prior works provides a quantitative analysis of the impact of noise area on performance, and no prior work predicts the worst-case noise area.

4.2 Problem Formulation

We formulate the problem as maximizing the voltage noise area by designing the current waveform. A general PDN system, as shown in Figure 4.1, is characterized by the impulse response at the load node, \( h(t) \) (Figure 4.3(a)). Given \( h(t) \) and a window size \( T \), we design the current stimulus such that the voltage response has the maximum noise integral (area) over all possible intervals of length \( T \) in the time domain.

Current stimuli \( i_k(t) \) at node \( k \) are caused by circuit activities. We lump all the on-die loads into a single load current \( i(t) \) for our analysis. Since only a portion of the transistors is active at any time, the magnitude of \( i(t) \) varies within a range. The range is application-dependent and can be approximated through system-level simulation or post-silicon measurement. Bounded current and zero transition time are assumed in many previous works [41, 23]. We follow the zero-transition-time assumption and bound the total current demand by \( i(t) \in [0, 1] \) in the rest of the chapter.

The voltage noise \( v(i, t) \) of the PDN system is the convolution of \( i(t) \) and \( h(t) \), as in Eq. 4.1.

\[
v(i, t) = \int_{0}^{+\infty} h(\tau)i(t - \tau)d\tau \text{ s.t. } i(t) \in [0, 1], \ t \geq 0
\] (4.1)

Note that we can scale \( v(i, t) \) accordingly once the upper bound of \( i_k(t) \) is obtained.

The window size \( T \) is a constant that corresponds to one clock cycle or another critical time period, so that the noise area correlates with overall system performance. We slide the window along the time axis of \( v(i, t) \). The noise area at each time \( t \) is defined as \( A(i, t) \), the integral of \( v(i, t) \) over \( [t - T, t] \).

\[
A(i, t) = \int_{t-T}^{t} v(i, u)\,du = \int_{t-T}^{t} \int_{0}^{+\infty} h(u - \tau)\,i(\tau)\,d\tau\,du
\] (4.2)

The maximum voltage noise area of \( A(i, t) \) under window size \( T \) is defined as \( A_w \). Current stimuli and time causing \( A_w \) are defined as \( i_w(t) \) and \( t_w \), respectively. Similarly, we define the worst-case voltage response as \( v_w(t) \), on which \( A_w \) is obtained at \( t_w \).

\[
A_w = \max_{i, t} A(i, t) = A(i_w, t_w) = \int_{t_w-T}^{t_w} v_w(t)dt
\] (4.3)

We can develop an algorithm that solves the above problem in linear time, based on the following simplifications.

- **Binary-Valued Worst Current:** We set \( i_w(t) \) as a binary-valued function \((0 \lor 1)\).

- **Current Decomposition:** For each load current, \( i_w(t) \) can be decomposed into a series of step inputs \( s(t - t_k) \) with constant amplitude \((\pm1)\) and monotonically increasing phase delay. Here \( s(t) \) is a step input and \( t_k \) is the phase delay of the \( k^{th} \) step input. Without loss of generality, suppose that \( \{t_0, t_1, \ldots \} \) is in ascending order.

\[
i_w(t) = \sum_{k=0} (-1)^k s(t - t_k) = \sum_{k=0} (-1)^k s_k(t) \quad (4.4)
\]

To generate \( i_w(t) \), we need to calculate the phase delay \( t_k \) of every step input \( s_k \).

Figure 4.3: An example of PDN system with (a) the impulse response $h(t)$, (b) the step response $V_s(t)$, (c) the ramp response $R_s(t)$ (integral of $V_s(t)$) and (d) the noise area function $A_s(t)$.

- **Voltage Area Responses of Single Step Input** \( A_s(t) \): Figure 4.3(b) shows an example of the voltage response \( V_s(t) \) to a single step input \( s_k(t) \). The integral of the step response within a window of size \( T \) can be formulated through the ramp response \( R_s(t) = \int_0^t V_s(u) \, du \), shown in Figure 4.3(c). We substitute Eq. 4.4 into Eq. 4.2 and define \( A_{s_k}(t) = A(s_k(t), t) \) as follows.

\[
\begin{aligned}
A_{s_k}(t) &= \int_{t-T}^t \int_0^{+\infty} h(u - \tau)(-1)^k s(\tau - t_k) \, d\tau \, du \\
&= \int_{t-T}^t (-1)^k V_s(u - t_k) \, du \\
&= (-1)^k \left(R_s(t - t_k) - R_s(t - T - t_k)\right) \quad (4.5)
\end{aligned}
\]

From Eq. 4.5, we obtain \( A_s(t) \) by setting \( k = 0 \) and \( t_k = 0 \), so that \( A_{s_k}(t) = (-1)^k A_s(t - t_k) \), as illustrated in Figure 4.3(d). \( A_s(t) \) is the definite integral of \( V_s(t) \) over \([t - T, t]\), shown by the shaded area in Figure 4.3(b). Given \( A_s(t) \), the optimum phase delay sequence \( \{t_0, t_1, \ldots \} \), and the optimum window location \( t_w \), the worst-case noise area \( A_w \) is

\[
A_w = \sum_{k=0} A_{s_k}(t_w) = \sum_{k=0} (-1)^k A_s(t_w - t_k) \quad (4.6)
\]

Based on all the above definitions and simplifications, we formulate our problem as a linearly constrained linear optimization, concisely defined below.

- **Input:** $h(t)$ and window size $T$.
- **Output:** \{ $t_0, t_1, \ldots$ \} and $t_w$, calculate $i_w(t)$ by Eq. 4.4.
- **Objective:** $A(i_w, t_w) = A_w$.
- **Constraint:** $i_w(t) \in [0, 1]$, $\forall t \in [0, +\infty)$.

### 4.3 Worst Noise Area Prediction of RLC Tank: Analytical Solution

A typical PDN is a complex circuit that can be approximated by cascaded RLC tank models [67, 73]. We study the worst-case voltage noise area of an RLC tank model and derive closed-form expressions of the noise area from the ramp response of the model. The relations among the noise area, the quality factor, the decap $C$ and its ESR $R_2$ are studied.

Let $A(s)$, $H(s)$ and $I(s)$ denote the Laplace transforms of $A(i, t)$, $h(t)$ and $i(t)$, respectively. Since $A(i, t)$ integrates $v(i, t)$ over the window $[t - T, t]$, Eq. 4.2 can be written as

$$A(i, t) = \int_{t-T}^{t} v(i, u)\,du \xrightarrow{\text{Laplace}} A(s) = \frac{H(s)I(s)\left(1 - e^{-sT}\right)}{s} \quad (4.7)$$

Figure 4.4 shows a standard RLC tank. $R_1$ and $L$ model the parasitic resistance and inductance of the PDN interconnects, and $C$ and $R_2$ represent a decap and its ESR ($ESR_c$).

The impedance profile of Figure 4.4 can be written as

$$Z(s) = \frac{s^2LCR_2 + s(R_1R_2C + L) + R_1}{s^2LC + s(R_1 + R_2)C + 1} \quad (4.8)$$
The quality factor, $Q$, and the resonant frequency, $\omega_0$, are

$$Q = \frac{1}{R_1 + R_2} \sqrt{\frac{L}{C}}, \quad \omega_0 = \frac{1}{\sqrt{LC}} \quad (4.9)$$

For a typical PDN design with a limited cost budget, $Q > 0.5$ and the RLC tank is underdamped. When $Q < 0.5$, the PDN is over-designed with excessive decoupling capacitors, which is outside the scope of this chapter.

To derive the expressions for the worst-case noise area, we first study the step and ramp response of the model.

**Lemma 4** The step response of an underdamped RLC tank is

$$V_s(t) = R_1 + 2e^{-\alpha t}[A\cos(\beta t) - B\sin(\beta t)] \quad (4.10)$$

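where $\alpha = \frac{\omega_0}{2Q}$, $\beta = \sqrt{\omega_0^2 - (\frac{\omega_0}{2Q})^2}$, $A = \frac{1}{2}(R_2 - R_1)$, $B = \frac{1}{2\beta}\left(\frac{R_2^2}{L} - \frac{1}{C} - \alpha(R_2 - R_1)\right)$, $Q_1 = \frac{1}{R_1} \sqrt{\frac{L}{C}}$, $Q_2 = \frac{1}{R_2} \sqrt{\frac{L}{C}}$. The constants $A$ and $B$ follow from the initial conditions $V_s(0) = R_2$ and $V_s'(0) = \frac{1}{C} - \frac{R_2^2}{L}$.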

**Lemma 5** The ramp response of an underdamped RLC tank is,

$$R_s(t) = \int_0^t V_s(t)dt = R_1 t + \frac{1}{\beta} [K_1\cos(\beta t) + K_2\sin(\beta t)]e^{-\alpha t} \quad (4.11)$$
where \( K_1 = \frac{R_1(Q^2Q_2^2-Q^2+2QQ_2-Q_2^2)}{QQ_2(Q-Q_2)} \sqrt{1 - \frac{1}{4Q^2}} \), \( K_2 = -\frac{R_1(4Q^3Q_2^2-Q^2+2QQ_2+Q_2^2)}{2QQ_2(Q-Q_2)} \).

The ramp response \( R_s \) is derived from the integral of \( V_s \). Based on \( R_s \), the results lead to the following theorem.

**Theorem 3** Given a window size \( T \), the worst-case voltage noise area \( A_w \) of an underdamped RLC tank is

\[
A_w = \sum_{k=0}^{n} A_{s_k}(t_w) = \sum_{k=0}^{n} (-1)^k A_s(t_w - t_k) \quad (4.12)
\]

where \( t_w \) is set to a relatively large value where \( h(t) \approx 0 \) and \( t_k \) is the time (phase delay) where local peaks/valleys of \( A_s \) occur, solved by equating the derivatives of \( A_s \) to zero. \( A_s \) can be expressed as follows

\[
A_s(t) = \begin{cases} 
R_s(t) - R_s(t - T) & : t > T, \\
R_s(t) & : t \leq T.
\end{cases} \quad (4.13)
\]

Since \( A_s(t) \) is a piecewise-defined function upon the region of \( t \) (Eq. (4.13)), we can derive the results of \( t_k \) from the following two cases, (1) \( t > T \) and (2) \( t \leq T \).

1. For \( t > T \), the local peaks/valleys \( t_k \) are

\[
t_k = \begin{cases} 
\frac{1}{\beta}\left(\arctan\left(\frac{A-X}{B-Y}\right) + k\pi\right) & : \frac{A-X}{B-Y} \geq 0, \\
\frac{1}{\beta}\left(\arctan\left(\frac{A-X}{B-Y}\right) + (k+1)\pi\right) & : \frac{A-X}{B-Y} < 0,
\end{cases} \quad (4.14)
\]

where \( k = 0, 1, \ldots, n \) with \( t_k > T \), \( X = e^{\alpha T}(A\cos(\beta T) + B\sin(\beta T)) \) and \( Y = e^{\alpha T}(B\cos(\beta T) - A\sin(\beta T)) \).

2. For \( t \leq T \), local peaks and valleys \( t_k \) occur at \( R_s'(t) = V_s(t) = 0 \), which are the solutions of the transcendental equation

\[ R_1 + 2e^{-\alpha t}[A\cos(\beta t) - B\sin(\beta t)] = 0. \quad (4.15) \]

Because \( \alpha > 0 \), only a limited number of such \( t_k \) exist for \( t \leq T \). Plugging the results of Eq. (4.14) and (4.15) back into Eq. (4.12) yields \( A_w \).
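The closed forms of Lemma 4 and Eq. (4.14) are easy to sanity-check in code. The sketch below takes $B = \frac{1}{2\beta}\left(\frac{R_2^2}{L} - \frac{1}{C} - \alpha(R_2 - R_1)\right)$, which follows from $V_s(0) = R_2$ and $V_s'(0) = \frac{1}{C} - \frac{R_2^2}{L}$, and $Y = e^{\alpha T}(B\cos\beta T - A\sin\beta T)$, which is what expanding $V_s(t) - V_s(t - T) = 0$ gives. The component values are the illustrative ones from Section 3.5.2, and the function names are hypothetical.

```python
import math

def tank_constants(R1, R2, L, C):
    """alpha, beta, A, B of Lemma 4 for an underdamped tank (Q > 0.5)."""
    w0 = 1.0 / math.sqrt(L * C)
    Q = math.sqrt(L / C) / (R1 + R2)
    if Q <= 0.5:
        raise ValueError("Lemma 4 assumes an underdamped tank (Q > 0.5)")
    alpha = w0 / (2.0 * Q)
    beta = math.sqrt(w0**2 - alpha**2)
    A = 0.5 * (R2 - R1)
    # From V_s(0) = R2 and V_s'(0) = 1/C - R2^2/L
    B = (R2**2 / L - 1.0 / C - alpha * (R2 - R1)) / (2.0 * beta)
    return alpha, beta, A, B

def step_response(t, R1, R2, L, C):
    """Eq. (4.10): V_s(t) = R1 + 2 e^{-alpha t} [A cos(beta t) - B sin(beta t)]."""
    alpha, beta, A, B = tank_constants(R1, R2, L, C)
    return R1 + 2.0 * math.exp(-alpha * t) * (A * math.cos(beta * t) - B * math.sin(beta * t))

def stationary_times(R1, R2, L, C, T, n):
    """Candidate t_k > T where A_s'(t) = V_s(t) - V_s(t - T) = 0, as in Eq. (4.14)."""
    alpha, beta, A, B = tank_constants(R1, R2, L, C)
    X = math.exp(alpha * T) * (A * math.cos(beta * T) + B * math.sin(beta * T))
    Y = math.exp(alpha * T) * (B * math.cos(beta * T) - A * math.sin(beta * T))
    ratio = (A - X) / (B - Y)
    base = math.atan(ratio) + (0.0 if ratio >= 0 else math.pi)
    times = [(base + k * math.pi) / beta for k in range(n + 1)]
    return [t for t in times if t > T]

R1, R2, L, C = 0.018, 0.022, 0.1e-9, 0.1e-6   # illustrative values (Q about 0.79)
t_k = stationary_times(R1, R2, L, C, T=5e-9, n=8)
```

Each returned $t_k$ should satisfy $V_s(t_k) = V_s(t_k - T)$, i.e. $A_s'(t_k) = 0$; whether it is a peak or a valley follows from the sign of $A_s''$ at that point.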

### 4.4 Worst Noise Area Prediction for PDN Cases: Algorithmic Solution

We propose an algorithm to find the worst-case noise area for a general PDN profile extracted from the commercial tools. The pseudo-code of our method is presented in Algorithm 1. We use Figure 4.3 to illustrate each intermediate signal during the optimization. From the load current assumption in Section 4.2, we can decompose \( i(t) \) into \( n \) step inputs with constant amplitude \( \pm 1.0 \). To calculate \( i_w(t) \) we only need to determine the phase delay of each step input. Given arbitrary impulse response \( h(t) \) and window size \( T \), our algorithm is able to output \( t_w \) and all \( t_k \) such that \( A_w \) is achieved.

**Design of Algorithm:** The algorithm works as follows. First, we convolve \( h(t) \) (Figure 4.3(a)) with the step input \( s(t) \) to obtain the step response \( V_s(t) \) (Figure 4.3(b)), and then calculate the noise area function \( A_s(t) \) (Figure 4.3(d)). To approach \( i_w(t) \), we need to maximize (minimize) the contributions of all positive (negative) step inputs, which are no larger (smaller) than the sum of all peaks (valleys) of \( A_s(t) \). Second, we extract all the peaks and valleys of \( A_s(t) \) into \( A_s(t_{pv}) \). The leftmost and rightmost elements of \( A_s \) are also added to \( A_s(t_{pv}) \) if they are peaks. As every negative step input is sandwiched by two positive step inputs, each valley in \(A_s(t_{pv})\) is sandwiched by two peaks. Suppose there are \(m\) peaks and thus \(m - 1\) valleys; then \(|t_{pv}| = 2m - 1\).

Algorithm 1 \([i_w, t_w, A_w] = GetWorstCase(h, T)\)

1: **INPUT:** Impulse response \(h\) (length \(n\)), window size \(T\)
2: **OUTPUT:** Worst-case current wave \(i_w\), window coordinate \(t_w\), noise area \(A_w\)
3: Set \(V_s\) as the step response of \(h\), \(A_s[k]\) as the definite integral of \(V_s\) in \([k, k+T)\)
4: Set \(A_s(t_{pv})\) as peaks and valleys of \(A_s\), \(|t_{pv}| = 2m - 1\)
5: Set \(A_w = \sum_{i=0}^{m-1} A_s(t_{pv2i}) - \sum_{i=0}^{m-2} A_s(t_{pv2i+1})\)
6: Set \(t_{cur} = 0\) and \(t_w = x_0 = t_{pv2m-2}\)
7: for all \(x \in t_{pv}\) in reverse order do
8: \(t_{new} = t_{cur} + (x_0 - x)\)
9: if \(x_0\) is a peak then
10: \(i_w[t_{cur} : t_{new}] = 1\)
11: else
12: \(i_w[t_{cur} : t_{new}] = 0\)
13: end if
14: Set \(x_0 = x\) and \(t_{cur} = t_{new}\).
15: end for
16: \(i_w[t_{cur} :] = 1\) (the last step input is positive)
17: return \([i_w, t_w, A_w]\)

Using \(t_{pvj}\) to denote the \(j^{th}\) element of \(t_{pv}\), \(A_w\) is calculated at line 5 as

\[
A_w = \sum_{i=0}^{m-1} A_s(t_{pv2i}) - \sum_{i=0}^{m-2} A_s(t_{pv2i+1}) \tag{4.16}
\]

Third, \(t_w\) is set to the time of the last peak, \(t_{pv_{2m-2}}\), to leave enough room for all step inputs to be shifted correctly. We then calculate the phase delay \(t_k\) of each step input \(s_k(t)\) and construct \(i_w(t)\) as their superposition. Specifically, \(t_k\) is determined by the parity of \(k\) as follows.

- **k is even:** Let \(x = m - 1 - \frac{k}{2}\). Shift the \(k^{th}\) step input \(s_k(t)\) so that the \(x^{th}\) peak, located at \(t_{pv_{2x}} = t_{pv_{2m-2-k}}\), is aligned to \(t_w\), i.e., \(t_k = t_w - t_{pv_{2m-2-k}}\).

- **k is odd:** Let \(x = m - 1 - \frac{k+1}{2}\). Shift the \(k^{th}\) step input \(s_k(t)\) so that the \(x^{th}\) valley, located at \(t_{pv_{2x+1}} = t_{pv_{2m-2-k}}\), is aligned to \(t_w\), i.e., \(t_k = t_w - t_{pv_{2m-2-k}}\).

Figure 4.5(a) illustrates how we determine the phase delay of each step input; note that \( s_k(t) \) is aligned to the \( t_w \) axis at \( t_{pv_{2m-2-k}} \). Figure 4.5(b) shows how we construct \( i_w(t) \).
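
The construction can be prototyped in a few lines. Below is a minimal NumPy sketch of Algorithm 1 in discrete time, written from our reading of the text rather than taken from the author's Matlab implementation. It assumes \( h \) is sampled at the simulation step, \( T \) is an integer number of samples, \( A_s[k] \) sums \( V_s \) over \([k, k+T)\), the window starts at the last peak, and step \( k \) is placed at \( t_k = t_w - t_{pv_{2m-2-k}} \) (so \( x_k = t_w - t_k \), as in Lemma 6). The function names and the plateau handling in `alternating_extrema` are hypothetical choices.

```python
import numpy as np


def alternating_extrema(a):
    """Indices of alternating peaks/valleys of `a`, starting and ending with a peak.

    Boundary samples are kept only if they are peaks. On a plateau the first
    sample of the flat run is kept (an assumption of this sketch).
    """
    idx, direction = [0], 0
    for k in range(1, len(a)):
        step = a[k] - a[idx[-1]]
        if step == 0:
            continue                      # flat: keep the earlier candidate
        sign = 1 if step > 0 else -1
        if sign == direction:
            idx[-1] = k                   # same run continues: move the candidate forward
        else:
            idx.append(k)                 # run reverses: previous candidate is an extremum
            direction = sign
    if len(idx) > 1 and a[idx[1]] > a[idx[0]]:
        idx = idx[1:]                     # leftmost sample is a valley: drop it
    if len(idx) > 1 and a[idx[-1]] < a[idx[-2]]:
        idx = idx[:-1]                    # rightmost sample is a valley: drop it
    return idx


def get_worst_case(h, T):
    """Sketch of Algorithm 1; returns (i_w, t_w, a_w) with i_w taking values 0/1."""
    h = np.asarray(h, dtype=float)
    v_s = np.cumsum(h)                                    # step response V_s
    c = np.concatenate(([0.0], np.cumsum(v_s)))
    a_s = c[T:] - c[: len(c) - T]                         # A_s[k] = sum of V_s over [k, k+T)
    pv = alternating_extrema(a_s)                         # t_pv, |t_pv| = 2m - 1
    a_w = sum(a_s[i] for i in pv[0::2]) - sum(a_s[i] for i in pv[1::2])   # Eq. (4.16)
    t_w = pv[-1]                                          # window starts at the last peak
    i_w = np.zeros(t_w + T)
    for k, x in enumerate(reversed(pv)):                  # step k sampled at x_k = t_w - t_k
        i_w[t_w - x:] += (1.0 if k % 2 == 0 else -1.0)    # +1 at peaks, -1 at valleys
    return i_w, t_w, a_w


# Hypothetical damped-oscillation impulse response, window of 12 samples
t = np.arange(400)
i_w, t_w, a_w = get_worst_case(np.exp(-0.03 * t) * np.sin(0.25 * t), T=12)
```

Because the voltage is a superposition of shifted step responses, convolving \( h \) with the returned `i_w` and summing the response over \([t_w, t_w + T)\) should reproduce `a_w`, which makes a convenient self-check. Optimality itself is the subject of Theorem 4.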

**Proof of Optimality:** Given an arbitrary \((h(t), T)\), our algorithm always outputs \( i_w(t) \) and \( t_w \) that attain the maximum noise area \( A_w \).

**Theorem 4** *Our algorithm is optimal in maximizing \( A_w \).*

The proof of Theorem 4 can be found in Section S1.

**Analysis of Complexity:** The overall complexity of our method is \( O(n) \), as Algorithm 1 consists of a fixed number of steps, none of which is more than linear in \( n \). Here \( n \) is the length of the discretized PDN impulse response \( h(t) \), and its value trades off the accuracy of the optimization against its efficiency.

The proposed worst-case current prediction can determine both the worst-case peak noise and the worst-case noise area for general PDN cases.

### 4.5 Experimental Results

We implement our algorithm in Matlab R2013a and simulate the circuit performance with HSPICE D-2013.03-SP1. Our test datapath is extracted from an ISCAS85 benchmark circuit using a 0.13um standard SPICE model. All experiments, including both the optimization and the simulation, are executed on a Windows 7 machine with an Intel i7 3.4GHz quad-core CPU and 16GB of memory. We design our experiments as follows.

- We study the relation of the circuit delay and the supply voltage noise area.
- We analyze the delay of a datapath under the worst-peak and the worst-area noise for a standard RLC tank model.
- We compare the results of the worst-peak and worst-area noise prediction between RLC tank analytical solutions and algorithmic solutions for complete PDN paths with cascaded RLC tanks.
- We measure the delay of a datapath under the worst-area noise of a complete PDN path extracted from commercial software tools.

### 4.5.1 Circuit Delay vs. Supply Noise Area

This subsection investigates the relation between the delay of a datapath and the supply noise area. The test datapath is a customized circuit modified from C432 of the ISCAS85 benchmark. The delay between an input port and an output port is measured under various supply noise areas, as shown in Fig. 4.6, with the supply voltage fluctuating between 0.76V and 1.2V. A negative voltage area indicates that the noise is dominated by droop, while a positive area indicates that it is dominated by overshoot. The end-to-end delay under a constant 1V supply is normalized to 1. Results show that the delay increases quadratically as the voltage droop area increases.


**Figure 4.6**: Normalized delay of a datapath under different supply voltage noise area. (The delay under constant $V_{dd} = 1V$ is normalized to 1.)

### 4.5.2 Critical Path Delay under Worst-Area and Worst-Peak Supply Noises of an RLC Tank

We create an RLC tank model as shown in Figure 4.4, where $R_1 = 10m\Omega$, $l = 0.25nH$, $C = 33nF$ and $R_2 = 12m\Omega$. The nominal voltage and window size $T$ are set to 1V and 17ns, respectively, and the simulation time step is 0.5ns. Using Algorithm 1, we generate the worst-area and worst-peak load currents, the corresponding voltage responses and the voltage noise area responses shown in Fig. 4.7. The worst peak noise is obtained by setting the window size to the minimum time step, i.e., $T = 0.5ns$. The worst-case time $t_w$ of both the worst-area and the worst-peak case is aligned to 500\(\mu s\) in Fig. 4.7, and the load current beyond 500\(\mu s\) is set to 1. Fig. 4.7(a) confirms that the worst-peak load current is a constant square waveform with a frequency of \(\beta\), while the worst-area load current is a piecewise-defined function: the segment before 499.983\(\mu s\) is a constant square waveform with a frequency of \(\beta\), and the segment between 499.983\(\mu s\) and 500\(\mu s\) is determined by the solution of Eq. 4.15. Fig. 4.7(b) shows the voltage response waveforms for the worst-peak and worst-area noise, and Fig. 4.7(c) compares their voltage noise areas under the same target window size \(T = 17\text{ns}\).

**Figure 4.7**: Load current, voltage noise and voltage area of the worst-case peak and area of a standard RLC tank model, \(T = 17\text{ns}\) (the nominal voltage of 1V is superimposed in (b) and (c)).
We apply the waveforms between 499.9\(\mu s\) and 500.1\(\mu s\) from Fig. 4.7(b) as the supply voltages of the datapath used in the previous subsection. The delay of the datapath under a constant 1V supply is 16.2ns. For the delay measurement, we send an input pulse every 100ps and record the delay at the output port, as shown in Fig. 4.8 (Exp. 1 means that the input pulse starts at 499.9\(\mu s\); Exp. 1000 means that it starts at 500\(\mu s\)). Simulation results show that the maximum delay under the worst-area supply noise is 17ns, while that under the worst-peak supply noise is 16.9ns. These results confirm that the worst-area noise causes a larger circuit delay than the worst-peak noise.

**Figure 4.8**: The delay of the datapath under the worst-area and worst-peak noise of a standard RLC tank model (\(T = 17\text{ns}\))

### 4.5.3 Worst-Area and Worst-Peak Noise of Multi-Stage Cascaded RLC Tanks

We use multi-stage cascaded RLC tanks to model a complete PDN path, and study three such PDN cases to compare the results from Theorem 3 and Algorithm 1. The circuit diagrams of the three cases are shown in Fig. 4.9 and their parameters are listed in Table 4.1.

A multi-stage cascaded RLC tank can be decomposed into multiple single RLC tank circuits in different frequency regions. For example, Fig. 4.10 shows how Case I in Table 4.1 is decomposed into three RLC tanks.

Each tank contributes a portion of the worst-peak and worst-area noise. By applying Theorem 3 and Claim 5 in [73], we calculate the noise contribution of each tank and estimate the global noise peak and area, as shown in Table 4.2. The RLC tank decomposition method provides a quick prediction of the worst-area and worst-peak noise directly from the impedance profile. However, it tends to overestimate the voltage peak noise and the voltage noise area due to the cancellation effect between neighbouring tanks. The estimation error is relatively large for Case II because the impedance peaks of its first two tanks are close to each other. On average, the prediction error of the RLC tank decomposition method is 7.75% for the worst-peak noise and 12.3% for the worst-area noise.
Figure 4.11: The impedance profile of a complete PDN path
Table 4.2: Comparison of the worst-case noise prediction between the RLC tank decomposition method and Alg. 3 results. $T = 10\text{ns}$ for $A_w$.

<table>
<thead>
<tr>
<th>Cases</th>
<th>Tank 1</th>
<th>Tank 2</th>
<th>Tank 3</th>
<th>Tank 1,2 Valley</th>
<th>Tank 2,3 Valley</th>
<th>Total Est.</th>
<th>Alg. 3</th>
<th>err(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Case I $V_{\text{peak}}(V)$</td>
<td>0.1592</td>
<td>0.1263</td>
<td>0.1742</td>
<td>-0.008</td>
<td>-0.005</td>
<td>0.4467</td>
<td>0.4151</td>
<td>7.23%</td>
</tr>
<tr>
<td>Case I $A_w(V \times \text{ns})$</td>
<td>1.592</td>
<td>1.263</td>
<td>0.1366</td>
<td>-0.08</td>
<td>-0.05</td>
<td>2.8616</td>
<td>2.571</td>
<td>11.3%</td>
</tr>
<tr>
<td>Case II $V_{\text{peak}}(V)$</td>
<td>0.1614</td>
<td>0.0838</td>
<td>0.2406</td>
<td>-0.023</td>
<td>-0.012</td>
<td>0.4508</td>
<td>0.4050</td>
<td>11.31%</td>
</tr>
<tr>
<td>Case II $A_w(V \times \text{ns})$</td>
<td>1.614</td>
<td>0.838</td>
<td>0.7206</td>
<td>-0.23</td>
<td>-0.12</td>
<td>2.8226</td>
<td>2.363</td>
<td>19.45%</td>
</tr>
<tr>
<td>Case III $V_{\text{peak}}(V)$</td>
<td>0.0678</td>
<td>0.1047</td>
<td>0.1397</td>
<td>-0.016</td>
<td>-0.011</td>
<td>0.2852</td>
<td>0.2724</td>
<td>4.70%</td>
</tr>
<tr>
<td>Case III $A_w(V \times \text{ns})$</td>
<td>0.678</td>
<td>1.047</td>
<td>0.300</td>
<td>-0.16</td>
<td>-0.11</td>
<td>1.755</td>
<td>1.653</td>
<td>6.17%</td>
</tr>
</tbody>
</table>
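
Reading the columns of Table 4.2, Total Est. appears to be the sum of the three tank contributions and the two (negative) valley corrections, and err(%) appears to be measured relative to the Alg. 3 result. A short sketch reproducing the Case II rows under that reading; the helper names are hypothetical.

```python
def total_estimate(tanks, valleys):
    """Total Est.: per-tank contributions plus the (negative) valley corrections."""
    return sum(tanks) + sum(valleys)


def relative_error(estimate, reference):
    """err(%): deviation of the estimate from the reference result, in percent."""
    return (estimate - reference) / reference * 100.0


# Case II rows of Table 4.2
peak_est = total_estimate([0.1614, 0.0838, 0.2406], [-0.023, -0.012])
area_est = total_estimate([1.614, 0.838, 0.7206], [-0.23, -0.12])
print(f"V_peak: {peak_est:.4f} V,    err = {relative_error(peak_est, 0.4050):.2f}%")
print(f"A_w:    {area_est:.4f} V*ns, err = {relative_error(area_est, 2.363):.2f}%")
```

The same two helpers apply to the other rows of the table.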

### 4.5.4 Critical Path Delay under Worst Noise Area Fluctuation: a Test Case

We study the worst-area noise ($T = 12.5\text{ns}$) of a complete PDN path from an industrial design, and the maximum datapath delay under this noise. The board model is extracted from Cadence Allegro Sigrity Power SI 16.6 and the package model from Ansoft Q3D 12.0. A fine on-die power grid model is used to simulate the die. The impedance profile of the complete PDN is shown in Fig. 4.11.

Plugging the impedance profile and $T$ into Algorithm 1, we obtain the worst-peak and worst-area voltage responses shown in Fig. 4.12. Because the voltage droop of the complete PDN path is slightly high under our maximum current assumption (1 A), we increase the nominal voltage to $1.15(V)$. Simulation results show that the worst-peak noise is $1.15 - 0.7779 = 0.3721(V)$ and the worst noise area $A_w$ is $1.15(V) \times 12.5(\text{ns}) - 12.21(V \times \text{ns}) = 2.165(V \times \text{ns})$.
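
The worst noise area above is the nominal voltage times \( T \) minus the integral of the supply voltage over the window, so droop gives a positive area. For a sampled waveform, the worst window can be located with a prefix sum. The sketch below assumes uniform sampling with step `dt` and a window of \( T/dt \) samples; the function name and the waveform are hypothetical, not the data of Fig. 4.12.

```python
import numpy as np


def worst_window_area(v, v_nom, dt, window):
    """Largest noise area sum((v_nom - v) * dt) over `window` consecutive samples.

    Returns (area, start_index); a positive area means droop below v_nom dominates.
    """
    droop = (v_nom - np.asarray(v, dtype=float)) * dt
    c = np.concatenate(([0.0], np.cumsum(droop)))
    areas = c[window:] - c[: len(c) - window]     # area of every window position
    k = int(np.argmax(areas))
    return float(areas[k]), k


# Hypothetical waveform: 1.15 V nominal with a 0.1 V dip over samples 40-49, dt = 0.5 ns
v = np.full(100, 1.15)
v[40:50] = 1.05
area, start = worst_window_area(v, v_nom=1.15, dt=0.5, window=5)   # T = 2.5 ns
```

With the window fully inside the dip, the area is 5 samples x 0.1 V x 0.5 ns = 0.25 V·ns, computed the same way as \( 1.15 \times 12.5 - 12.21 \) above.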
Figure 4.12: The worst-peak and worst-area current, voltage response and voltage area response ($T = 12.5\, \text{ns}$) of a complete PDN path. (d-f) shows the expanded view of (a-c) at the peak droop point.
The datapath extracted from C432 of ISCAS85 is slightly modified for the new window size by removing some circuitry. The delay measurements are shown in Fig. 4.13. We observe $0.22\text{ns}$ (1.8%) of extra delay under the worst-area noise for this complete PDN path case. The comparison of the worst-area and worst-peak noise for this case is listed in Table 4.3.


**Figure 4.13**: The delay under worst-area and worst-peak supply noise for a complete PDN path ($T = 12.5\text{ns}$)

<table>
<thead>
<tr>
<th></th>
<th>Worst-Peak</th>
<th>Worst-Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>Max Voltage Area ($V \times \text{ns}$)</td>
<td>1.695</td>
<td>2.165</td>
</tr>
<tr>
<td>Delay of Datapath (ns)</td>
<td>12.33</td>
<td>12.55</td>
</tr>
</tbody>
</table>

**Table 4.3**: Comparison of the worst-peak and the worst-area noise for a complete PDN path ($T = 12.5\text{ns}$)

### 4.6 Summary

In this chapter, we predict the worst-case voltage noise area and measure its impact on circuit performance. We propose an analytical solution for RLC tank cases and an algorithm that generates the worst-case current for general PDN cases. Our study shows that circuit delay is better correlated with the worst-area noise than with the worst-peak noise. In our empirical validation on a complete PDN path, the former introduces 1.8% more propagation delay than the latter.

Chapter 4, in full, is a reprint of the material as it appears in "Worst-Case Noise Area Prediction of On-chip Power Distribution Network", by Xiang Zhang, Jingwei Lu, Yang Liu, and Chung-Kuan Cheng, in Proceedings of the ACM/IEEE International Workshop on System Level Interconnect Prediction 2014. The thesis author was the primary investigator and author of the paper.

S1. Proof of Optimality on the Phase Delay of the Worst-Case Current

The worst-case current $i_w(t)$ is a binary-valued function switching between 0 and 1. Based on this assumption, we prove that our algorithm generates the optimal phase delay $t_k$ for every step input $s_k(t)$ whose superposition equals $i_w(t)$, as stated in Theorem 4. As Fig. 4.5 shows, our algorithm determines $t_k$ from the peak-to-valley distances in $A_s(t)$. Thus our goal is to prove Eq. (4.17), which is equivalent to the optimality of our algorithm stated in Theorem 4.

$$A_w = \sum_{i=0}^{m-1} A_s(t_{p_i}) - \sum_{i=0}^{m-2} A_s(t_{v_i}) \tag{4.17}$$

where $t_{p_i}$ ($t_{v_i}$) denotes the $i^{th}$ peak (valley). We prove optimality through the following sequence of lemmas. In the rest of this section, we assume that $i_w(t)$ is decomposed into $N$ step inputs.
Lemma 6 \( \exists \{x_0, x_1, \ldots\} \), s.t. \( A_w = \sum_{k=0}^{N-1} (-1)^k A_s(x_k) \)

Proof 4 By Eq. 4.4, \( i_w(t) \) can be decomposed into \( N \) step inputs with constant amplitude \( \pm 1 \), with positive step inputs alternating with negative ones. Without loss of generality, suppose that the first step input is positive; then \( i_w(t) = \sum_{k=0}^{N-1} (-1)^k s(t - t_k) \). Letting \( x_k = t_w - t_k \) proves the lemma.

Lemma 6 shows that the worst-case noise \( A_w \) equals the sum of a set of values sampled on \( A_s(t) \) with alternating signs \( \pm 1 \). Let \( X = \{x_0, x_1, \ldots, x_{N-1}\} \). As \( A_w = \max_{i,t} A(i,t) \), we need to maximize the positive components \( A_s(x_k) \) while minimizing the negative ones, which leads to Lemma 7.

Lemma 7 \( A_s(x_0) \) and \( A_s(x_{N-1}) \) must be positive.

Proof 5 We prove this by contradiction. Suppose that \( A_s(x_0) \) is negative. We can simply remove \( x_0 \) from \( X \), thus reducing \( |X| \) to \( N - 1 \). Meanwhile, \( A_w \) increases by \( |A_s(x_0)| \), which contradicts the fact that \( A_w \) is maximum. Therefore, \( A_s(x_0) \) is positive. The proof that \( A_s(x_{N-1}) \) is positive is analogous.

Lemma 7 shows the boundary conditions for \( A_s \) on \( X \). We divide \( A_s(t) \) into a series of uphill and downhill regions.

Definition 1 An uphill region (downhill region) corresponds to an interval on \( A_s(t) \) with monotonically increasing (decreasing) functional values.

As Figure 4.14 shows, each uphill region is sandwiched by two downhill regions, and vice versa. Suppose that there are \( m_p \) peaks and \( m_v \) valleys in \( A_s(t) \), so there are \( m = m_p + m_v \) local extreme points in total. The two end points of an uphill (downhill) region are a valley and a peak (a peak and a valley), respectively. As a result, there are $m - 1$ regions on $A_s(t)$. For the $j^{th}$ region $r_j$, we have $r_j = [t_{pv_j}, t_{pv_{j+1}}].$

**Figure 4.14:** Downhill region $r_{j-1}$ is sandwiched by peak $pv_{j-1}$ and valley $pv_j$. Uphill region $r_j$ is sandwiched by valley $pv_j$ and peak $pv_{j+1}$, etc.

**Lemma 8** $\forall j \in [0, m - 1], \exists k \in [0, N - 1], s.t. t_{pv_j} = x_k.$

**Proof 6** We prove this by contradiction. Suppose that there is no $x_k$ in $X$ which equals the index of the $j^{th}$ extreme point $pv_j$. Without loss of generality, let us make the following assumptions.

- Suppose that $pv_j$ is a valley, which is sandwiched by two regions $r_{j-1}$ and $r_j$, as Figure 4.14 shows.

- Suppose that $x_k$ is the sampling point which is the closest to $t_{pv_j}$, and $x_k > t_{pv_j}$. Thus we have $t_{pv_j} \in (x_{k-1}, x_k)$.

- Suppose that $x_{k-1}$ corresponds to a negative step input $s_{k-1}(t)$, while $x_k$ corresponds to a positive step input $s_k(t)$.

We divide all possible local sampling cases in the two neighboring regions of $pv_j$, $r_{j-1}$ and $r_j$, into two categories.
• If \( x_{k-1} \in r_{j-1} \), we can shift \( x_{k-1} \) rightwards to \( t_{pv_j} \), thus increasing \( A_w \) by \( A_s(x_{k-1}) - A_s(t_{pv_j}) \), which contradicts the fact that \( A_w \) is maximum.

• If \( x_{k-1} \notin r_{j-1} \), there must be no sampling point at \( pv_{j-1} \). We can then increase \( A_w \) by adding one positive point at \( pv_{j-1} \) and one negative point at \( pv_j \), without changing the sign of any existing sampling point. This also contradicts the fact that \( A_w \) is maximum.

This completes the proof under the above assumptions. As the assumptions are made without loss of generality, the remaining cases (e.g., \( pv_j \) is a peak, or \( x_k \) corresponds to a negative step input) can be proved in a similar way and are omitted here.

We define \( X_j \) as the cluster of sampling points located in \( r_j \), including the two boundary points \( t_{pv_j} \) and \( t_{pv_{j+1}} \). We define the noise area contribution of \( r_j \) to \( A_w \) as

\[
A_{jw} = \sum_{x_k \in X_j} (-1)^k A_s(x_k).
\]

**Lemma 9** \( A_w \) is maximum only if \( A_{jw} \) is maximum, \( \forall j \in [0, m - 2] \).

**Proof 7** The proof is straightforward. According to Lemma 8, both \( t_{pv_j} \) and \( t_{pv_{j+1}} \) are included in \( X_j \), so we can only select or deselect the internal sampling points of \( r_j \), which is independent of the other regions. As a result, \( X_j \) is an optimal substructure of \( X \), which proves Lemma 9.

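Based on Lemma 9, we only need to maximize each \( A_{jw} \) locally on \( X_j \) to achieve the global maximum of \( A_w \). Each interior extreme point \( t_{pv_j} \) (\( 1 \leq j \leq m-2 \)) belongs to both \( X_{j-1} \) and \( X_j \) and is therefore counted twice, so

\[
A_w = \sum_{j=0}^{m-2} A_{jw} - \sum_{j=1}^{m-2} (-1)^j A_s(t_{pv_j}) \tag{4.18}
\]

where \( (-1)^j \) is the sign of \( t_{pv_j} \), since the extreme points alternate starting from a peak.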
Lemma 10 $A_{jw}$ is maximum when $X_j = \{t_{pv_j}, t_{pv_{j+1}}\}$.

Proof 8 We illustrate our proof in Fig. 4.15. Suppose that $r_j$ is an uphill region and that $X_j = \{x'_0, x'_1, \ldots, x'_{n'-1}\}$ contains $n'$ sampling points in ascending order. From Lemma 8 we know that $x'_0 = t_{pv_j}$ and $x'_{n'-1} = t_{pv_{j+1}}$. Since $X_j$ starts from a negative sampling point and ends at a positive one, $n' = |X_j|$ is an even number, and

\[
\begin{aligned}
A_{jw} &= \sum_{k=0}^{n'-1} (-1)^{k+1} A_s(x'_k) \\
&= \sum_{k=1}^{(n'-2)/2} \left( A_s(x'_{2k-1}) - A_s(x'_{2k}) \right) + A_s(t_{pv_{j+1}}) - A_s(t_{pv_j}) \\
&\leq A_s(t_{pv_{j+1}}) - A_s(t_{pv_j})
\end{aligned} \tag{4.19}
\]

The last step of Eq. (4.19) holds because $r_j$ is an uphill region with monotonically increasing functional values, so $A_s(x'_{k_1}) \leq A_s(x'_{k_2})$ for all $0 \leq k_1 < k_2 \leq n' - 1$ and every term of the sum is non-positive. The bound is attained when $X_j = \{t_{pv_j}, t_{pv_{j+1}}\}$, which proves the lemma; a downhill region is handled symmetrically.

Based on all the lemmas proved above, we finally obtain the following equation, which proves Eq. (4.17) and thus Theorem 4, showing that our algorithm is optimal.

$$A_{w} = \sum_{j=0}^{m_{p}-1} A_{s}(t_{p_{j}}) - \sum_{j=0}^{m_{v}-1} A_{s}(t_{v_{j}}) = \sum_{k=0}^{N-1} (-1)^{k} A_{s}(x_{k}) \tag{4.20}$$
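
Theorem 4 can also be spot-checked numerically. Lemma 6 reduces the problem to choosing sampling points \( x_k \) on \( A_s(t) \) with alternating signs, and Lemma 7 requires the first and last signs to be positive, so \( |X| \) is odd. The sketch below compares the closed form of Eq. (4.20) with an exhaustive search over all such sets for short random integer sequences standing in for a sampled \( A_s(t) \). It is a sanity check of the combinatorial argument under these assumptions, not a circuit simulation; the helper names are hypothetical, and `alternating_extrema` keeps the first sample of any plateau (an assumption).

```python
import random
from itertools import combinations


def alternating_extrema(a):
    """Indices of alternating peaks/valleys of `a`, starting and ending with a peak."""
    idx, direction = [0], 0
    for k in range(1, len(a)):
        step = a[k] - a[idx[-1]]
        if step == 0:
            continue                      # flat: keep the earlier candidate
        sign = 1 if step > 0 else -1
        if sign == direction:
            idx[-1] = k                   # same run continues
        else:
            idx.append(k)                 # run reverses
            direction = sign
    if len(idx) > 1 and a[idx[1]] > a[idx[0]]:
        idx = idx[1:]                     # drop a leading valley
    if len(idx) > 1 and a[idx[-1]] < a[idx[-2]]:
        idx = idx[:-1]                    # drop a trailing valley
    return idx


def closed_form(a):
    """Eq. (4.20): sum of the peaks minus sum of the valleys of the sampled A_s."""
    pv = alternating_extrema(a)
    return sum(a[i] for i in pv[0::2]) - sum(a[i] for i in pv[1::2])


def exhaustive(a):
    """Best alternating-sign sum over every odd-size sampling set X (Lemmas 6 and 7)."""
    best = None
    for size in range(1, len(a) + 1, 2):
        for xs in combinations(range(len(a)), size):
            total = sum(a[x] if k % 2 == 0 else -a[x] for k, x in enumerate(xs))
            best = total if best is None else max(best, total)
    return best


random.seed(0)
for _ in range(200):
    a_s = [random.randint(-5, 5) for _ in range(random.randint(1, 9))]
    assert closed_form(a_s) == exhaustive(a_s)
```

The exhaustive search grows exponentially with the sequence length, which is why it is only usable as a cross-check on tiny inputs; the closed form is what makes Algorithm 1 linear.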
Chapter 5

Enhancing Off-Chip Communication Throughput from Power Lines

This chapter presents power line communication (PLC), which reuses some of the power pins as dynamic power/signal pins for data transmission to increase the off-chip bandwidth while the SOC is in a low-performance state. The number of available pins in the ball grid array (BGA) of modern systems-on-chip (SOCs) is one of the major bottlenecks to processor performance, for example in many-core Internet of Things (IoT) devices, where the package size and PCB floorplan are tightly constrained. A commercial SOC package allocates more than half of its pins to power delivery, leaving fewer IO pins for signaling. We observe that the required number of power and ground (P/G) pins is driven by the highest performance state and the worst design corners, whereas SOCs spend most of their time in lower performance states for battery life and thermal reasons. The proposed method provides 15Gbps of additional bandwidth per hybrid pin pair with minimal impact on the original power delivery network (PDN) design.

### 5.1 Background

As silicon technology continues to scale deep into the sub-micron region, the requirements for performance and bandwidth increase. Meanwhile, the package size of SOCs remains similar, since more functions are added to the silicon die and PCB manufacturing technology has improved only moderately; e.g., industrial BGA ball-to-ball pitches were reduced from 0.4mm in 2012 to 0.3mm in 2016 [57], while the on-die technology node shrank from 28nm planar silicon to 10nm FinFET. Thus, the gap between off-chip and on-chip bandwidth becomes even larger.

Many researchers in industry and academia have been working to address the limitations of off-chip communication. 3D die-stacking technology [10, 14] and Package on Package (POP) [71] methods have been proposed to expand the communication bandwidth along the Z-axis. The concerns with these methods are 1) the increased die and package manufacturing cost due to the additional complexity of 3D integration, 2) heat accumulation (thermal issues) [48], and 3) Z-height constraints on 3D integration, as state-of-the-art mobile devices and laptops are very strict on PCB thickness for user experience. Proximity communication [76, 20] has been investigated to improve off-chip communication through capacitive coupling between packages in close proximity, which requires advanced DFM (design for manufacturing) PCB rules for precise pick-and-place to control the variance of the capacitance. Engin and Swaminathan proposed the power transmission line [22], which eliminates the signal line and uses the power net as a transmission line
Table 5.1: Ball allocation for a commercial SOC [1, 2]

<table>
<thead>
<tr>
<th>CPU</th>
<th>GPU</th>
<th>Core</th>
<th>MEM</th>
<th>PAD</th>
<th>PLLs</th>
<th>MISC Pwrs</th>
<th>GND</th>
<th>etc.</th>
</tr>
</thead>
<tbody>
<tr>
<td>28</td>
<td>36</td>
<td>40</td>
<td>53</td>
<td>38</td>
<td>35</td>
<td>56</td>
<td>283</td>
<td>425</td>
</tr>
</tbody>
</table>

for the signal. However, the idea requires strict rules on power net layout and is limited to point-to-point communication. The shape of an actual PDN plane is layout-dependent, which makes controlled impedance on the power planes hard to achieve. Meanwhile, the IR drop of a power transmission line increases linearly as the current scales up. Chen et al. proposed increasing the off-chip bandwidth for DRAM access by using switchable pins [15, 16]. Their architecture dynamically exploits surplus power-delivery pins in memory-intensive phases to provide extra bandwidth for memory/IO access. The implementation requires four external switches per switchable pin and greatly increases the layout complexity on the PCB. Zhang et al. demonstrated the feasibility of implementing single-channel data communication on PDNs [74].

State-of-the-art SoCs provide multiple performance modes across multiple voltage domains to balance power savings and performance. The difference in voltage margins among performance states can be as large as 500mV for mobile application processors (APs), so the PDN requirements can differ significantly. Traditionally, all system-level PDN design and analysis is based on the highest performance mode (the worst case), resulting in more power/ground pins than the normal mode needs. Table 5.1 shows the package ball allocation of a commercial AP, where 57.24% of the 784 BGA balls are used for power delivery, leaving less than 40% of the pins for off-chip communication.

In this chapter, we propose hybrid power/signal pins that serve either power delivery or signal communication depending on the performance state, extending the previous work in [72] to support simultaneous multi-channel data transmission on a PDN. The proposed architecture increases the off-chip communication bandwidth without additional cost to the system-level design. Our study shows that the communication bandwidth can be greatly improved by adding notches on the PCB power planes and using separate package bump/ball connections for the hybrid pins.

### 5.2 Design Overview

The proposed PLC reuses some power pins of the core voltage rails for data communication in the low-performance state, leaving only a few dedicated power pins connected to the on-die power grid to meet the PDN requirement. The design goals of PLC are to minimize modifications to the existing layout, to minimize both the coupling noise from data communication and the power delivery noise, and to maximize the eye opening and bit rate of data transmission. Since the PDN specification usually covers DC to 2GHz, we set the target data communication frequency range to 2GHz to 50GHz.

Two types of SOC power pins are considered for PLC. The first type supplies low-current voltage domains for dedicated macro blocks, such as IO physical layers (PHYs), e.g., MIPI [5] and USB/HSIC [6], or noise-sensitive rails, such as the analog supplies for cameras or PLLs. These rails are powered only on request and usually consume one ball/bump per rail; they are categorized as PLL rails in Table 5.1. Therefore, these pins can be repurposed for data transmission in certain states. The benefit is that dedicated traces on the board and package are already allocated and the current requirement is small (in the mA range), so only a small head switch needs to be added on the die to enable this function.

The other type supplies high-current voltage domains, e.g., CPU, GPU, core logic and memory rails, which consume multiple power and ground pins per rail that are tied together on the PCB/PKG through planes. These rails usually support multiple performance states, and their number of P/G pins is sized for the highest performance state. Nevertheless, SOCs stay in lower performance states most of the time to save power. As a result, we propose to reuse some of these power pins (balls and bumps) for data communication in the low-performance state, leaving only a few dedicated power pins connected to the on-die power grid to meet the PDN requirement.

In this chapter, we focus on the design and analysis of PLC for high-current rails, as they consume the majority of power pins and can be a major resource for improving off-chip communication bandwidth. Figure 5.1 illustrates the high-level diagram of the proposed PLC. There are four main components: the voltage regulator module (VRM), the SOC, the off-chip driver/receiver and the decoupling capacitors. The VRM provides voltage for the SOC rails. Off-chip drivers/receivers communicate with the SOC die through differential hybrid ball/bump pairs. As data communication and power delivery share the same conductors (copper) on the PCB, we propose differential signaling (two pins per channel) to minimize the noise on the dedicated power pins. The common-mode voltage of the differential signals is set to the nominal voltage of the corresponding power plane. A layout modification is also required in the package design to support PLC, where separate package connections are made for the dedicated power bumps/balls and the hybrid ones. The hybrid pins have two operational modes, namely signal mode and power mode. In signal mode, the on-die power switches are turned off and the hybrid pins are used for off-chip communication. In power mode, the switches are turned on and the hybrid pins are connected to the main power grid. Because differential data communication and power delivery share common conductors, part of the PDN margin is traded for a better eye diagram.

Figure 5.1: High-level overview of the proposed power line communication (PLC) on PDN.

### 5.2.1 On-Die Implementation

Figure 5.2 depicts the schematic of the proposed on-die circuitry for a differential hybrid power/signal pair, modified from the circuit in [15]. Two power switches are needed for each pair. The $R_{dson}$ of the switch can be as low as $1.8\, \text{m}\Omega$ at an area overhead of $2601\, \mu m^2$ [16].

In high-performance (power) mode, both power switches are turned on and the hybrid pins are connected to the main power rails. In signal mode, the power switches are turned off and the signal buffers are enabled in one direction according to the read/write operation. A multi-stage buffer can be placed on the signal lines to amplify the I/O signals and compensate for the parasitic capacitance of the switch. A tunable on-die termination (ODT) resistor is provided for better signaling, and a Continuous-Time Linear Equalizer (CTLE) is added to improve the eye diagram at the receiver. The design and performance of the CTLE are discussed in the following sections.

The parasitic capacitances \(C_{gs}, C_{gd}, C_{bs}\) and \(C_{bd}\) of the switch are considered the main factor limiting the eye height in signal mode, because the drain of the switch is shorted to the power grid. Although the total parasitic \(C\) can be reduced by decreasing the transistor size, \(R_{dson}\) then increases, which weakens the hybrid pins in power mode. The capacitance breakdown is shown in Figure 5.3.

In signal mode, the switch is in the cut-off region with \(V_{gs} = 0\), so \(C_{gs} = C_{gd} = 0\) and \(C_{gb}\) is

\[
C_{gb} = W L \cdot C_{ox} \tag{5.1}
\]

where \(C_{ox}\) is the capacitance per unit area of the gate oxide, \(L\) and \(W\) are the channel length and width, respectively.

The diffusion capacitance between the source (drain) and the body contributes parasitic capacitance across the depletion region:

\[
\begin{aligned}
C_{sb} &= AS \cdot C_{jsb} + PS \cdot C_{jbssw} \\
C_{jsb} &= C_J \left(1 + \frac{V_{sb}}{\Psi_0}\right)^{-M_J} \\
C_{jbssw} &= C_{JSW} \left(1 + \frac{V_{sb}}{\Psi_{SW}}\right)^{-M_{JSW}}
\end{aligned} \tag{5.2}
\]

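where \(AS\) and \(PS\) are the area and perimeter of the source diffusion, \(C_{jsb}\) (\(C_{jbssw}\)) is the capacitance per unit area (per unit perimeter) of the junction between the body and the bottom (side walls) of the source, \(C_J\) and \(C_{JSW}\) are the corresponding junction capacitances at zero bias, \(M_J\) and \(M_{JSW}\) are the junction grading coefficients, and \(\Psi_0\) and \(\Psi_{SW}\) are the built-in potentials, which depend on the doping levels.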

Similar parasitic capacitances apply to the drain, depending on \(AD\), \(PD\) and \(V_{db}\). Equivalent relationships hold for both PMOS and NMOS transistors, with different doping levels. Note also that these capacitances are voltage-dependent.
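
Eqs. (5.1) and (5.2) translate directly into code. The following is a minimal sketch for the source side, assuming SI units and the parameter names used in the text (the drain side is identical with \(AD\), \(PD\) and \(V_{db}\) substituted). The function names are hypothetical, and the numbers below are illustrative placeholders, not parameters of any real process. Because the exponents are negative, \(C_{sb}\) falls as the reverse bias \(V_{sb}\) grows, which is the voltage dependence noted above.

```python
def gate_body_cap(W, L, C_ox):
    """Eq. (5.1): gate-to-body capacitance of a switch in cut-off, C_gb = W * L * C_ox."""
    return W * L * C_ox


def source_body_cap(V_sb, AS, PS, CJ, CJSW, MJ, MJSW, psi_0, psi_sw):
    """Eq. (5.2): source-to-body junction capacitance at reverse bias V_sb (>= 0)."""
    C_jsb = CJ * (1.0 + V_sb / psi_0) ** (-MJ)          # bottom junction, per unit area
    C_jbssw = CJSW * (1.0 + V_sb / psi_sw) ** (-MJSW)   # side-wall junction, per unit perimeter
    return AS * C_jsb + PS * C_jbssw


# Illustrative (hypothetical) values in SI units
params = dict(AS=1e-12, PS=4e-6, CJ=1e-3, CJSW=1e-10, MJ=0.5, MJSW=0.33, psi_0=0.8, psi_sw=0.8)
for v_sb in (0.0, 0.5, 1.0):
    print(f"V_sb = {v_sb:.1f} V -> C_sb = {source_body_cap(v_sb, **params) * 1e15:.3f} fF")
```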

Our studies show that adding series resistors at the gates of the switches minimizes the impact of the gate capacitance. However, the capacitance between the source and drain of the switch cannot be compensated by series resistors, since a series resistor in that path would translate to DC resistance in power mode.

### 5.2.2 Package Implementation

The proposed PLC requires a change to the package power-delivery layout to improve the SI of the hybrid pins in signal mode. Figure 5.4(a) shows the bump and ball connections of the power rails in an original package layout (the Z-axis is enlarged for better illustration). A solid power fill on Layer 3 connects all the PWR bumps and balls through vias. In the modified package for PLC, a dedicated trace/plane is cut from the original power plane to connect each hybrid bump/ball pair, which creates void areas on Layer 3 (1x minimum trace width spacing), as shown in Figure 5.4(b). The dedicated PWR bumps and balls are connected through a smaller plane on Layer 3. With a dedicated trace for each hybrid bump/ball pair, the differential signals can pass through the package with less attenuation. However, the additional void area in the power plane increases the parasitic inductance and resistance of the PDN.

### 5.2.3 PCB Implementation

Figure 5.5 shows a four-layer PCB layout for PLC. The top and bottom layers are solid ground planes. The off-chip driver/receiver (P1-P4) with two channels and a 14x14mm SOC are located on Layer 1. In the center of the top-left region, 7x7 P/G balls in a checkerboard pattern are allocated to a single power domain to minimize the loop inductance. Among them, the four leftmost power pins form two pairs of hybrid power pins for PLC, which connect to the SOC package balls, while the remaining balls are defined as dedicated power pins for noise observation. The port definition for the S-parameter model is highlighted on Layer 1, and all the following simulations follow this port definition. Layer 3 is allocated for signal transmission and Layer 2 is defined as the power plane.

Figure 5.4: A four-layer package (a) with the original shared power plane (b) with separate power planes for dedicated and hybrid pins for PLC.

Figure 5.5: An overview of the four-layer PCB test layout model for PLC.

When the off-chip driver sends the differential signal from P1 and P2 for CH1, the signal first travels through a Layer 1 to Layer 3 via to the trace on Layer 3. The differential traces are loosely coupled on Layer 3, and the trace width is chosen to give a differential impedance \(Z_{\text{diff}}\) of 100 Ω. The main power plane is on Layer 2. The differential signal traces and the power plane are connected through micro-vias from Layer 2 to 3. P5, P6, P7, P8 and the dedicated power pins are connected to the power plane through micro-vias from Layer 1 to 2. Therefore, P1 and P2, and P5 and P6, are the PCB-to-package interfaces of the PLC between the off-chip driver and the SOC for CH1. The VRM (P9) is not shown in the figure. To improve SI for data communication, layout modifications are needed on Layer 2 to isolate the current loops of the dedicated and hybrid power pins, as discussed in the next section.

### 5.2.4 PCB Model Analysis

The PCB model provides the most flexibility for PLC design. In this section, we use a theoretical four-layer PCB model (Figure 5.6) to analyze the PCB design methodology for PLC and better illustrate the idea. The top and bottom layers are solid ground planes (yellow). The off-chip driver/receiver and the VRM are located on the left and right sides of Layer 1, and the SOC is located at the center of the board. The port definition for the S-parameter model is highlighted on Layer 1, and all the following simulations follow this port definition. The two leftmost power pads (P1 and P2) represent the off-chip driver/receiver. The two leftmost SOC power pads (P3 and P4) are the differential pins of the hybrid pair, which connect to the SOC package balls, while the remaining pads are defined as dedicated power pins for noise observation. For every power and signal pin, a companion ground pad provides the return path. The differential signal traces are loosely coupled on Layer 2, and the differential impedance ($Z_{diff}$) is set to 100ohms. The main power plane (red) is on Layer 3. The differential signal traces and the power plane are connected through micro-vias from Layer 2 to 3. P3, P4, ... and P10 are connected to the power plane through micro-vias from Layer 1 to 3. Therefore, P1 and P2, and P3 and P4, are the PCB-to-package interfaces of the PLC between the off-chip driver and the SOC. Three notches are placed on Layer 3 to help improve the SI of data transmission. We study the SI/PI impact of the location and size of these notches in the following sections. The PCB stackup is shown in Figure 5.7.

### 5.3 Signal Integrity Investigation for PCB Model

In this section, the PCB layout model of Figure 5.6 is optimized for PLC. The goal is to maximize the magnitude and bandwidth of the differential forward voltage gain \(S_{dd21}\) from P1 and P2 to P3 and P4 on the PCB. Several layout parameters can be tuned. Figure 5.8 shows an expanded view of the power plane on Layer 3. Our layout studies are based on Mentor Expedition, Ansys SIwave and HFSS 2014, Sigrity 16.61 and Advanced Design System 2013.12. A computer with an Intel Xeon W3550 processor and 20GB of memory is used for layout extraction and simulation.

Figure 5.7: The stackup of the test PCB layout.

The parameters marked in Figure 5.8 are the major factors for SI and PI. The width of the power fill \((w)\) is set to 5.2mm to mimic a typical power plane in a mobile device layout. The parameters are defined as follows: \(a\) is the length of the two side notches, \(b\) is the length of the middle notch, \(d\) is the distance from the edge of the side notch to the center of the middle notch, and \(e\) is the distance between the two differential (hybrid) pins. The notch width \((w_n)\) and the via length \((l_{via})\) have been studied as well. Since the wavelength \((\lambda)\) of a 50GHz signal on a microstrip is about 3mm, signal discontinuities caused by any layout feature smaller than \(\lambda/20\) (0.15mm) can be neglected. The minimum \(w_n\) is set by PCB vendors and is 50um with current technology. The maximum signal via length \(l_{via}\) is 0.14mm according to the stackup, so both features lie below \(\lambda/20\). The length of the differential signal traces \((l_{trace})\) on PCB Layer 2 has also been examined. Simulation results show that, as long as \(Z_{diff}\) is impedance-controlled, no SI difference is observed for different \(l_{trace}\) (assuming a typical PCB size for mobile devices).
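The λ/20 rule above can be checked numerically. This sketch assumes the effective dielectric constant equals the bulk ε_r = 4.4 quoted for this PCB; a real microstrip has a lower effective value, which makes λ somewhat longer and the criterion easier to meet.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_mm(freq_hz: float, eps_eff: float) -> float:
    """Guided wavelength in mm for a line with effective dielectric constant eps_eff."""
    return C0 / (freq_hz * math.sqrt(eps_eff)) * 1e3


lam = wavelength_mm(50e9, 4.4)  # assumption: eps_eff taken equal to eps_r = 4.4
limit = lam / 20
print(f"lambda = {lam:.2f} mm, lambda/20 = {limit:.3f} mm")

# Feature sizes quoted in the text, in mm.
features_mm = {"min notch width w_n": 0.05, "max via length l_via": 0.14}
for name, size in features_mm.items():
    verdict = "negligible" if size < limit else "must be modeled"
    print(f"{name}: {size} mm -> {verdict}")
```

With ε_eff = 4.4 the via length sits just under the threshold, so a lower (more realistic) ε_eff only adds margin.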
### 5.3.1 Middle Notch Effect

In this section, we keep the middle notch between the two hybrid differential power/signal pins and remove all the other notches, as Figure 5.9 shows. All other PCB layers remain unchanged. From Figure 5.9(a) to 5.9(d), we monotonically decrease the length of the middle notch (parameter $b$); in Figure 5.9(e), the middle notch is removed entirely. $S_{dd21}$ is measured from the off-chip driver (P1 and P2) to the SOC hybrid pins (P3 and P4) at the PCB level for the five test cases, and Figure 5.10 shows the results.

We observe that the first valley of $S_{dd21}$ moves towards higher frequency as the middle notch length decreases. The wavelength ($\lambda$) corresponding to the first valley equals the average electrical length from one hybrid pin to the other. Eq. 5.3 gives the relationship between the frequency of the first valley ($f_{valley}$) and $b$.

$$f_{valley} = \frac{c}{\sqrt{\varepsilon_r}} \frac{1}{2b + \alpha \cdot w} \tag{5.3}$$

where $c$ is the speed of light, $\varepsilon_r = 4.4$ is the dielectric constant of the PCB, $w = 5.2$mm is the width of the power fill, and $\alpha$ is a coefficient equal to 1.662 for the four notch cases. Table 5.2 compares \( b \) with \( f_{\text{valley}} \) calculated from Eq. 5.3 and obtained from HFSS for the four notch cases. The mismatch between calculation and simulation is within 2%, except for Figure 5.9(c), where it is about 3.3%.

<table>
<thead>
<tr>
<th>Case</th>
<th>\( b \) (mm)</th>
<th>\( f_{\text{valley}} \) from Eq. 5.3</th>
<th>\( f_{\text{valley}} \) from HFSS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 5.9(a)</td>
<td>13.08</td>
<td>4.110GHz</td>
<td>4.104GHz</td>
</tr>
<tr>
<td>Figure 5.9(b)</td>
<td>6.64</td>
<td>6.525GHz</td>
<td>6.506GHz</td>
</tr>
<tr>
<td>Figure 5.9(c)</td>
<td>3.14</td>
<td>9.586GHz</td>
<td>9.910GHz</td>
</tr>
<tr>
<td>Figure 5.9(d)</td>
<td>1.51</td>
<td>12.27GHz</td>
<td>12.51GHz</td>
</tr>
</tbody>
</table>
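Eq. 5.3 can be evaluated directly against Table 5.2. A minimal sketch, assuming c = 3×10^8 m/s and the values stated in the text (ε_r = 4.4, w = 5.2mm, α = 1.662); the printed mismatch column makes the per-case agreement with HFSS visible.

```python
import math

C0 = 3.0e8      # speed of light, m/s (rounded)
EPS_R = 4.4     # PCB dielectric constant
W_MM = 5.2      # power-fill width w, mm
ALPHA = 1.662   # coefficient for the four middle-notch cases


def f_valley_ghz(b_mm: float, w_mm: float = W_MM, alpha: float = ALPHA) -> float:
    """Eq. 5.3: f_valley = c / sqrt(eps_r) / (2b + alpha * w), returned in GHz."""
    path_m = (2.0 * b_mm + alpha * w_mm) * 1e-3
    return C0 / math.sqrt(EPS_R) / path_m / 1e9


# Table 5.2: middle-notch length b (mm) -> HFSS first-valley frequency (GHz)
hfss_middle = {13.08: 4.104, 6.64: 6.506, 3.14: 9.910, 1.51: 12.51}

for b, f_sim in hfss_middle.items():
    f_calc = f_valley_ghz(b)
    mismatch = abs(f_calc - f_sim) / f_sim * 100.0
    print(f"b={b:5.2f} mm  Eq.5.3={f_calc:6.3f} GHz  HFSS={f_sim:6.3f} GHz  mismatch={mismatch:.1f}%")
```

Because the fixed term α·w is about 8.6mm, shortening b below a few millimetres raises f_valley only slowly, which matches the diminishing shift between Figure 5.9(c) and 5.9(d).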

Two conclusions can be drawn from this experiment. 1) \( b \) determines the cut-off frequency of PLC. The longer \( b \) is, the lower \( f_{\text{valley}} \) is. 2) The magnitude of \( S_{dd21} \) gets reduced if the middle notch is too short or removed.

### 5.3.2 Surrounding Notch Effect

In this section, six test cases with the same \( b \) and different surrounding notches are studied, as shown in Figure 5.11. The first four test cases vary the length of the side notches (parameter \( a \)). Cases 4 and 5 reduce parameters \( d \) and \( e \) relative to Case 3.

Figure 5.12 shows \( S_{dd21} \) for the six cases. All six cases have a much wider bandwidth and higher gain than the five cases in Figure 5.10. Simulation results show that, as long as \(a > b\), tuning parameter \( a \) has no significant impact on \( S_{dd21} \): the \( f_{\text{valley}} \) of Layouts (0)-(3) almost overlap. We also notice that Case 5, which has the smallest parameter \( d \) among the six cases, has the largest bandwidth and the highest gain. This can be explained as follows. 1) The surrounding notches reduce the average electrical length, so the signal is transmitted in a more concentrated manner. 2) The surrounding notches increase the characteristic impedance of the power plane, reducing the reflection caused by the impedance mismatch from the trace on Layer 2 to the plane on Layer 3. 3) With side notches added, Eq. 5.3 can be modified as

$$f_{valley} = \frac{c}{\sqrt{\varepsilon_r}} \frac{1}{2b + \alpha \cdot 2d} \tag{5.4}$$

where \(w\) is replaced by \(2d\), \(\alpha \approx \sqrt{2}\), and \(b = 1.13\)mm. The other parameters are the same as in Eq. 5.3. The difference in \( f_{\text{valley}} \) between Eq. 5.4 and simulation is less than 2% (Table 5.3).

Figure 5.9: Five PCB test cases with different length of the middle notch on Layer 3.

Table 5.3: The length of the side notch vs $f_{valley}$

<table>
<thead>
<tr>
<th>Case</th>
<th>d (mm)</th>
<th>$f_{valley}$ from Eq. 5.4</th>
<th>$f_{valley}$ from HFSS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 5.9(a)</td>
<td>0.677</td>
<td>34.204GHz</td>
<td>34.13GHz</td>
</tr>
<tr>
<td>Figure 5.9(b)</td>
<td>0.526</td>
<td>38.125GHz</td>
<td>38.75GHz</td>
</tr>
<tr>
<td>Figure 5.9(c)</td>
<td>0.400</td>
<td>42.124GHz</td>
<td>41.74GHz</td>
</tr>
</tbody>
</table>
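Eq. 5.4 can be checked against Table 5.3 in the same way. This sketch uses c = 3×10^8 m/s, α = √2 and b = 1.13mm as stated; with the exact √2 the results land within about 0.2% of the table's Eq. 5.4 column, and the script reports the deviation from HFSS.

```python
import math

C0 = 3.0e8              # speed of light, m/s (rounded)
EPS_R = 4.4             # PCB dielectric constant
B_MM = 1.13             # middle-notch length b, mm
ALPHA = math.sqrt(2.0)  # alpha ~ sqrt(2) for the side-notch cases


def f_valley_side_ghz(d_mm: float) -> float:
    """Eq. 5.4: f_valley = c / sqrt(eps_r) / (2b + alpha * 2d), returned in GHz."""
    path_m = (2.0 * B_MM + ALPHA * 2.0 * d_mm) * 1e-3
    return C0 / math.sqrt(EPS_R) / path_m / 1e9


# Table 5.3: distance d (mm) -> HFSS first-valley frequency (GHz)
hfss_side = {0.677: 34.13, 0.526: 38.75, 0.400: 41.74}

for d, f_sim in hfss_side.items():
    f_calc = f_valley_side_ghz(d)
    print(f"d={d:.3f} mm  Eq.5.4={f_calc:6.2f} GHz  HFSS={f_sim:6.2f} GHz  "
          f"diff={abs(f_calc - f_sim) / f_sim * 100:.1f}%")
```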
Figure 5.11: Six PCB test coupons with different size of the surrounding notches.
Figure 5.12: $S_{dd21}$ of the six test cases in Figure 5.11.
### 5.3.3 Analysis of PCB Model with industrial SOC Package Footprint

In an industrial SOC package footprint, creating an artificial notch on the PCB can greatly degrade PDN performance. However, Figure 5.5 shows that the checkerboard pattern of the P/G balls naturally creates the notches needed for PLC. By reducing the via connections from the hybrid pins to the power plane, the bandwidth and peak of $S_{dd21}$ can be greatly increased, as shown in Figure 5.13. Eq. 5.5 gives the frequency of the $S_{dd21}$ peak. The peak magnitude of $S_{dd21}$ is proportional to $1/C$, where $C$ is the parasitic capacitance of the two pins.

Figure 5.13: $S_{dd21}$ of two channels from the original and the modified power plane.
Table 5.4: Power pin impedance change for PLC

<table>
<thead>
<tr>
<th>Items</th>
<th>CH1 two pins</th>
<th>CH2 two pins</th>
</tr>
</thead>
<tbody>
<tr>
<td>Original Pin Resistance (mΩ)</td>
<td>3.01, 3.16</td>
<td>3.11, 3.23</td>
</tr>
<tr>
<td>Modified Pin Resistance (mΩ)</td>
<td>6.24, 6.37</td>
<td>6.17, 6.27</td>
</tr>
<tr>
<td>Original Pin Inductance (fH)</td>
<td>327.45, 340.37</td>
<td>347.59, 340.05</td>
</tr>
<tr>
<td>Modified Pin Inductance (fH)</td>
<td>1211.0, 1268.8</td>
<td>1101.2, 1155.9</td>
</tr>
</tbody>
</table>

\[
f_{peak} = \frac{c}{\sqrt{\varepsilon_r}} \frac{1}{2D} \tag{5.5}\]

where \(c\) is the speed of light, \(\varepsilon_r = 4.4\) is the dielectric constant of the PCB, and \(D\) is the electrical distance between the two hybrid pins.
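The sketch below evaluates Eq. 5.5 and its inverse. The pin distances D used here are hypothetical, since the text does not list D for the layouts in Figure 5.13; the inverse can be used to back out an effective D from a simulated peak frequency.

```python
import math

C0 = 3.0e8    # speed of light, m/s (rounded)
EPS_R = 4.4   # PCB dielectric constant


def f_peak_ghz(d_mm: float) -> float:
    """Eq. 5.5: f_peak = c / sqrt(eps_r) / (2 D), with D in mm; result in GHz."""
    return C0 / math.sqrt(EPS_R) / (2.0 * d_mm * 1e-3) / 1e9


def distance_from_peak_mm(f_ghz: float) -> float:
    """Invert Eq. 5.5 to estimate D (mm) from an observed S_dd21 peak frequency."""
    return C0 / math.sqrt(EPS_R) / (2.0 * f_ghz * 1e9) * 1e3


for d in (1.0, 2.0, 3.0):  # hypothetical pin separations, mm
    print(f"D={d:.1f} mm -> f_peak={f_peak_ghz(d):.2f} GHz")
```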

### 5.4 Power Delivery Network Analysis

The PDN overhead of PLC comes from the PCB, package and die levels. Table 5.4 shows the change in pin resistance and inductance between Figure 5.13(a) and Figure 5.13(b) at the PCB level. With the pins changed to support PLC, the effective DC resistance and inductance of each pin increase by roughly 3mΩ and more than 700fH, respectively.
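The per-pin increases behind these summary numbers can be computed from Table 5.4. This short script only re-derives differences from the tabulated values; the pin labels A and B are ours.

```python
# Table 5.4 values per pin: (original, modified). Pin labels A/B are ours.
resistance_mohm = {
    "CH1 pin A": (3.01, 6.24), "CH1 pin B": (3.16, 6.37),
    "CH2 pin A": (3.11, 6.17), "CH2 pin B": (3.23, 6.27),
}
inductance_fh = {
    "CH1 pin A": (327.45, 1211.0), "CH1 pin B": (340.37, 1268.8),
    "CH2 pin A": (347.59, 1101.2), "CH2 pin B": (340.05, 1155.9),
}

# Increase caused by the PLC modification, per pin.
delta_r = {pin: new - old for pin, (old, new) in resistance_mohm.items()}
delta_l = {pin: new - old for pin, (old, new) in inductance_fh.items()}

for pin in resistance_mohm:
    print(f"{pin}: dR = {delta_r[pin]:.2f} mOhm, dL = {delta_l[pin]:.1f} fH")
print(f"dR range {min(delta_r.values()):.2f}-{max(delta_r.values()):.2f} mOhm")
print(f"dL range {min(delta_l.values()):.0f}-{max(delta_l.values()):.0f} fH")
```

Across the four pins, the resistance rises by 3.04 to 3.23 mΩ and the inductance by about 754 to 928 fH.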

The PDN overhead from the package is caused by the separate power planes for each hybrid ball and bump pair. The increase of the PDN impedance peak varies with the design and with the location chosen for the hybrid pair. The minimum pitch in state-of-the-art package design is 10um. Since the power plane is 350um wide between two neighboring BGA assignments, a 10um cut removes about 10/350 ≈ 3% of the plane width, so the resistance and inductance increase by about 3%. Figure 5.14 shows a package layout in which the original power plane is cut into five separate pieces to accommodate two hybrid ball and bump pairs. Light (green) is ground and dark (red) is power. Four separate planes are cut from the main power plane, one for each hybrid pin. The upper-side vias are drilled down to the balls and the bottom-side vias connect up to the bumps. During the design, we intentionally select balls and bumps on the outer ring of the power plane to minimize the side effects on the PDN.

The on-die PDN overhead comes from the extra power switches for each bump of the hybrid pin. A larger power switch increases the silicon area overhead and the input capacitance seen by PLC, degrading the performance of PLC. Under the PTM 22nm HP model [7], we select an $R_{dson}$ of $120m\Omega$ for each switch, with an area overhead of $781\, \mu m^2$.

### 5.5 PLC to PDN Noise Mitigation Analysis

Since PLC shares the same conductors with the PDN, we need to consider the noise coupled from the hybrid pins to the dedicated power pins in signal mode. Conversely, PDN noise is less of a concern for data communication, as differential signaling is designed to reject common-mode noise. Layout 1 in Figure 5.12 is used for this study. As the PCB layout is symmetrical with respect to the middle notch, only the noise on the upper half of the power plane is studied. The noise probe points are displayed in Figure 5.15. The maximum differential peak-to-peak voltage from the aggressors (hybrid pins) is set to 1V.

Two coupling-noise cases are studied: 1) the off-chip driver is the transmitter and the SOC hybrid pair is the receiver; 2) the off-chip driver is the receiver and the SOC hybrid pair is the transmitter. The maximum absolute noise observed at each probe point is listed in Table 5.5. We compare the noise without on-board decoupling capacitors (decaps) and with four 0.01uF decaps connected at P9, P14, P18 and P22. Without decaps, P9 observes the worst noise because it has the largest difference between its distances to the positive and negative pins (P3 and P4). The middle probe points (P6, P11, P15 and P19) have the lowest noise because they are nearly equidistant from the two differential hybrid pins. Adding a small 0.01uF decap at a probe point reduces the coupling noise there to the µV level, while the noise at the middle probe points rises slightly (e.g., from 5mV to 17mV at P6 when the SoC receives). We observe higher noise when the SOC hybrid pins are transmitters. This analysis shows that decaps on the dedicated power pins can substantially reduce the coupling noise from PLC.

### 5.6 Case Study: A Complete Power Delivery and Data Communication Path

This section discusses the effect of the on-die circuit on the performance of the hybrid pins in signal mode. First, we demonstrate PLC on a complete PDN from PCB and package to die by examining the eye diagram for pseudo-random binary sequence (PRBS) bit streams during signal mode, and we analyze the noise immunity of PLC. Second, we show the impedance profile of the complete PDN path with PLC in power mode. The PCB model is that of Figure 5.13(b). The package model is an industrial model modified for the proposed PLC. The die model is
Figure 5.15: The probe points for the noise coupled from the data transmission of hybrid pins to dedicated power pins.

Table 5.5: The maximum coupling noise at each probe point

<table>
<thead>
<tr>
<th rowspan="2">Probe Point</th>
<th colspan="2">SoC RX</th>
<th colspan="2">SoC TX</th>
</tr>
<tr>
<th>No decap</th>
<th>4 decaps</th>
<th>No decap</th>
<th>4 decaps</th>
</tr>
</thead>
<tbody>
<tr>
<td>P6</td>
<td>5mV</td>
<td>17mV</td>
<td>8mV</td>
<td>20mV</td>
</tr>
<tr>
<td>P7</td>
<td>44mV</td>
<td>40mV</td>
<td>74mV</td>
<td>48mV</td>
</tr>
<tr>
<td>P8</td>
<td>53mV</td>
<td>30mV</td>
<td>96mV</td>
<td>43mV</td>
</tr>
<tr>
<td>P9</td>
<td>64mV</td>
<td>34uV</td>
<td>111mV</td>
<td>34uV</td>
</tr>
<tr>
<td>P11</td>
<td>3mV</td>
<td>13mV</td>
<td>5mV</td>
<td>17mV</td>
</tr>
<tr>
<td>P12</td>
<td>24mV</td>
<td>18mV</td>
<td>36mV</td>
<td>24mV</td>
</tr>
<tr>
<td>P13</td>
<td>34mV</td>
<td>18mV</td>
<td>61mV</td>
<td>26mV</td>
</tr>
<tr>
<td>P14</td>
<td>43mV</td>
<td>8.4uV</td>
<td>79mV</td>
<td>9.5uV</td>
</tr>
<tr>
<td>P15</td>
<td>4mV</td>
<td>11mV</td>
<td>6mV</td>
<td>14mV</td>
</tr>
<tr>
<td>P16</td>
<td>14mV</td>
<td>12mV</td>
<td>25mV</td>
<td>15mV</td>
</tr>
<tr>
<td>P17</td>
<td>23mV</td>
<td>15mV</td>
<td>40mV</td>
<td>16mV</td>
</tr>
<tr>
<td>P18</td>
<td>30mV</td>
<td>15uV</td>
<td>53mV</td>
<td>14uV</td>
</tr>
<tr>
<td>P19</td>
<td>6mV</td>
<td>9mV</td>
<td>7mV</td>
<td>13mV</td>
</tr>
<tr>
<td>P20</td>
<td>11mV</td>
<td>11mV</td>
<td>17mV</td>
<td>13mV</td>
</tr>
<tr>
<td>P21</td>
<td>16mV</td>
<td>10mV</td>
<td>27mV</td>
<td>14mV</td>
</tr>
<tr>
<td>P22</td>
<td>21mV</td>
<td>26uV</td>
<td>35mV</td>
<td>22uV</td>
</tr>
</tbody>
</table>
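The decap effect discussed in Section 5.5 can be quantified from Table 5.5. The following sketch only re-expresses the tabulated values as dB changes; converting the µV entries to mV is the only manipulation applied.

```python
import math

# Table 5.5 in mV: probe -> (RX no decap, RX 4 decaps, TX no decap, TX 4 decaps)
noise_mv = {
    "P6": (5, 17, 8, 20),      "P7": (44, 40, 74, 48),
    "P8": (53, 30, 96, 43),    "P9": (64, 0.034, 111, 0.034),
    "P11": (3, 13, 5, 17),     "P12": (24, 18, 36, 24),
    "P13": (34, 18, 61, 26),   "P14": (43, 0.0084, 79, 0.0095),
    "P15": (4, 11, 6, 14),     "P16": (14, 12, 25, 15),
    "P17": (23, 15, 40, 16),   "P18": (30, 0.015, 53, 0.014),
    "P19": (6, 9, 7, 13),      "P20": (11, 11, 17, 13),
    "P21": (16, 10, 27, 14),   "P22": (21, 0.026, 35, 0.022),
}
DECAP_SITES = {"P9", "P14", "P18", "P22"}


def change_db(before_mv: float, after_mv: float) -> float:
    """Noise change in dB when decaps are added (negative means a reduction)."""
    return 20.0 * math.log10(after_mv / before_mv)


for probe, (rx0, rx4, tx0, tx4) in noise_mv.items():
    tag = "decap" if probe in DECAP_SITES else ""
    print(f"{probe:>4} {tag:5} RX {change_db(rx0, rx4):+6.1f} dB  TX {change_db(tx0, tx4):+6.1f} dB")

worst = max(noise_mv, key=lambda p: noise_mv[p][0])
print("worst probe without decaps (SoC RX):", worst)
```

The four decap sites drop by roughly 58 to 78 dB, while the middle probe points rise by about 3 to 13 dB; every probe sees more noise without decaps when the SoC transmits.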
### 5.6.1 Eye Diagram for Signal Mode

Figure 5.16 illustrates the schematic of the data path for one hybrid pair. The PRBS is generated by the off-chip driver and connected to the PCB through Ports 1 and 2. Manchester code is used for self-clocking (clock recovery) and to avoid long runs of '1's or '0's that would cause DC offset. Ports 5 and 6 are defined as the PCB interface of the hybrid pins to the package model. Ports 10 to 13 are the dedicated power pins. Port 9 is connected to a local 2.2μF PCB decap representing a typical VRM output capacitor for CPU/GPU rails. No decap is connected to the hybrid pin path used for signal communication. Based on the $S_{dd21}$ (from off-chip driver to SOC) of the combined PCB/PKG/die model, we choose 15Gbps as the bit rate for PLC, using 30GHz signals with Manchester code.
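Manchester coding of a PRBS stream can be sketched as follows. The choice of PRBS7 (x^7 + x^6 + 1), the seed and the IEEE 802.3 polarity convention are our assumptions; the text does not specify the PRBS order or polarity. Because each data bit becomes two chips, a 15Gbps payload needs a 30G chip/s line rate, which is how we read the "30GHz signals" above.

```python
from itertools import groupby


def prbs7(n_bits: int, seed: int = 0x7F) -> list[int]:
    """PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR; seed must be nonzero."""
    state, out = seed & 0x7F, []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out


def manchester_encode(bits: list[int]) -> list[int]:
    """IEEE 802.3 convention: 0 -> (1, 0) falling edge, 1 -> (0, 1) rising edge."""
    chips = []
    for b in bits:
        chips.extend((0, 1) if b else (1, 0))
    return chips


def manchester_decode(chips: list[int]) -> list[int]:
    """Recover data bits from chip pairs; a pair without a mid-bit transition is invalid."""
    bits = []
    for first, second in zip(chips[0::2], chips[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(second)
    return bits


data = prbs7(127)  # one full PRBS7 period
chips = manchester_encode(data)
# Every bit carries a transition, so runs never exceed 2 chips and the line is DC balanced.
longest_run = max(len(list(g)) for _, g in groupby(chips))
print(f"{len(data)} bits -> {len(chips)} chips, longest run {longest_run}, ones {sum(chips)}")

bit_rate = 15e9
print(f"chip rate for {bit_rate / 1e9:.0f} Gb/s data: {2 * bit_rate / 1e9:.0f} Gchip/s")
```

The cost of the bounded run length and DC balance is the doubled line rate, which is why the channel must support roughly twice the data rate.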

Figure 5.16: Schematic for data communication on a PDN.
Figure 5.17: Eye diagram of a 30GHz (with Manchester code) PLC (a) without equalizer, (b) with equalizer.

Figure 5.18: The transfer function of the receiver equalizer.
Figure 5.17(a) shows the eye diagram of the signal received at the die level without the equalizer. The eye height is limited by the source-to-drain parasitic capacitance of the power switches, because the drains of the switches are shorted to the main power grid at the die level.

After investigating the transfer function of the channel, we designed a passive CTLE, whose transfer function is shown in Figure 5.18, to improve the eye diagram. The CTLE can be expressed as follows.

\[ H(s) = \frac{k(s + z)}{(s + p_1)(s + p_2)} \tag{5.6} \]

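where \( k = 4.43982 \times 10^{11} \), \( z = 6.2832 \times 10^{7} \) rad/s (\(2\pi \times 10\) MHz), \( p_1 = 6.2832 \times 10^{10} \) rad/s (\(2\pi \times 10\) GHz) and \( p_2 = 3.7699 \times 10^{11} \) rad/s (\(2\pi \times 60\) GHz). This transfer function is realized by the circuit shown in Figure 5.19.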

**Figure 5.19:** The circuit diagram of the receiver equalizer.
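To see what Eq. 5.6 does, its magnitude response can be swept numerically. In this sketch we place the zero and poles at -z, -p_1 and -p_2 (left half-plane), which is our reading for a passive, stable network; the magnitude |H(jω)| is the same for either sign convention of a real root, since |jω - z| = |jω + z|.

```python
import math

K = 4.43982e11   # gain constant from Eq. 5.6
Z = 6.2832e7     # zero, rad/s (2*pi*10 MHz)
P1 = 6.2832e10   # first pole, rad/s (2*pi*10 GHz)
P2 = 3.7699e11   # second pole, rad/s (2*pi*60 GHz)


def ctle_gain_db(f_hz: float) -> float:
    """|H(j*2*pi*f)| in dB, with the zero and poles placed in the left half-plane."""
    s = 1j * 2.0 * math.pi * f_hz
    h = K * (s + Z) / ((s + P1) * (s + P2))
    return 20.0 * math.log10(abs(h))


for f in (1e6, 100e6, 1e9, 5e9, 15e9, 24.5e9, 60e9):
    print(f"{f / 1e9:8.3f} GHz : {ctle_gain_db(f):7.2f} dB")

# Well above the zero, |H| peaks at omega = sqrt(P1 * P2) with gain K / (P1 + P2).
f_pk = math.sqrt(P1 * P2) / (2.0 * math.pi)
print(f"peak near {f_pk / 1e9:.1f} GHz, {20 * math.log10(K / (P1 + P2)):.2f} dB")
```

The response attenuates low frequencies by about 58 dB relative to its peak near 24.5 GHz, where the gain is close to 0 dB because k is approximately p_1 + p_2.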

Figure 5.17(b) shows that the equalizer increases the eye height by a factor of three.
Figure 5.20: Receiver eye diagram after equalization with near-end and far-end noise source from power plane.

The receiver can use simple peak detectors and a latch to regenerate the original waveform. A positive-going pulse is detected by the positive peak detector; when it crosses the positive voltage threshold (+Vth), it sets the latch output to logic high. The output remains high until a negative pulse crosses the negative threshold (-Vth) of the negative peak detector and resets the latch to logic low.
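The peak-detector-and-latch regeneration described above behaves like a set/reset state machine. The sketch below models only that logic on sampled voltages; the threshold and waveform values are hypothetical, and real peak-detector dynamics, hysteresis and timing are not modeled.

```python
def latch_receiver(samples: list[float], vth: float) -> list[int]:
    """Set/reset latch driven by threshold crossings of AC-coupled pulses.

    The peak detectors are abstracted as comparisons against +vth and -vth:
    a sample above +vth sets the output to 1, a sample below -vth resets it
    to 0, and anything in between holds the previous state (initially 0).
    """
    state, out = 0, []
    for v in samples:
        if v > vth:
            state = 1
        elif v < -vth:
            state = 0
        out.append(state)
    return out


# Hypothetical received waveform: a positive pulse at a rising data edge,
# a negative pulse at the falling edge, plus small ripple below threshold.
rx = [0.0, 0.05, 0.30, 0.12, -0.04, 0.03, -0.28, -0.10, 0.02, 0.0]
print(latch_receiver(rx, vth=0.2))
```

Ripple that stays inside the ±Vth window never toggles the output, which is the source of this receiver's tolerance to small coupled noise.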

The noise immunity of PLC is also investigated by injecting noise individually at P3 (far-end) and P7 (near-end). The noise source is a 30GHz sine wave with an amplitude of 100mV or 500mV. Figure 5.20 shows that PLC is immune to most of the noise sources on the power plane; only the near-end noise at 500mV amplitude is barely observable, due to P/N phase skew.

We also study the case in which both channels transmit simultaneously at 30GHz with PRBS Manchester code. The receiver eye diagrams of CH1 and CH2 with and without equalizers are shown in Figure 5.21. We conclude that multiple data channels can run on a single PDN simultaneously with negligible inter-channel noise under the proposed architecture.

### 5.6.2 PDN Analysis for Power Mode

We study the PDN in high-performance mode, when all hybrid pins turn on their power switches to connect to the dedicated power pins. Figure 5.22 shows the schematic of the original PDN and of the modified PDN with one pair of hybrid pins. The original PDN has no notch on the PCB or package.
Figure 5.22: Schematic for the original PDN without hybrid pins and the modified PDN with one pair.
The modified PDN uses the same package and PCB models as Figure 5.16. We assume the same PCB/PKG/die decap values in both cases. Figure 5.23 shows the impedance profiles of the original and modified PDNs for PLC. The degradation comes from the PCB and package: the impedance peak at the lowest-frequency resonance is $5m\Omega$ higher.

It can be inferred that adding more hybrid pairs to the PDN could further increase the impedance peak. Simulation results also show that increasing the capacitance on the package and die can compensate for the impedance-peak increase caused by the hybrid pins, but this adds cost to the system design. Designers should therefore weigh the PDN design target against the off-chip bandwidth requirement when deciding how many hybrid pairs to add to a voltage rail.
### 5.7 Summary

In this chapter, we propose power line communication on an industrial PDN for system-level SOC design. Each PLC channel can carry up to 15Gbps, and multiple power and ground pairs can be supported. We identify the layout features and techniques that optimize the eye diagram of power line communication while maintaining the PDN function. By re-purposing hybrid pins for signal transmission during IO-intensive benchmarks, the proposed PLC can substantially increase the off-chip bandwidth.

Chapter 5, in part, is a reprint of the material as it appears in "Enhancing Off-Chip Communication Throughput from Power Lines", by Xiang Zhang, Yang Liu, Ryan Coutts, and Chung-Kuan Cheng, which has been submitted to IEEE Transactions on Components, Packaging and Manufacturing Technology and is currently under review. This chapter also contains content from "Boosting Off-Chip Interconnects through Power Line Communication", by Xiang Zhang, Ryan Coutts, and Chung-Kuan Cheng in Proceedings of IEEE Conference on Electrical Performance Of Electronic Packaging and Systems EPEPS 2016, and content from "Power Line Communication for Hybrid Power/Signal Pin SOC Design", by Xiang Zhang, Yang Liu, Ryan Coutts, and Chung-Kuan Cheng in Proceedings of ACM/IEEE International Workshop on System Level Interconnect Prediction 2015. The thesis author was the primary investigator and author of the papers.
Chapter 6

Boosting Off-Chip Interconnects through Inter-Package Capacitive Proximity Communication

The chip-to-chip spacing in state-of-the-art electronic designs has been reduced by advances in design for manufacturing (DFM) technologies. In this chapter, we propose Inter-Package Capacitive Proximity Communication (IPCPC) to increase off-chip communication through metal plates on the side walls of the chip package. We demonstrate that IPCPC can transmit 20Gbps of data on each channel and provides immunity to coupling noise from adjacent channels.

### 6.1 Background

"Memory Wall", *a.k.a.* the disparity between the rate of core performance improvement and the relatively stagnant rate of off-chip memory bandwidth, keeps growing. More and more transistors can be integrated onto a single chip thanks to process advances from 28nm planar silicon to 7nm FinFET. Meanwhile, the package size of SOCs remains similar as more functions are added to the silicon die, and PCB manufacturing technology has improved only moderately; e.g., BGA ball-to-ball pitches were reduced from 0.4mm in 2012 to 0.3mm in 2016 in industry [64, 57]. Future consumer electronic designs, including internet of things (IoT) devices, robotics, self-driving and mobile devices, require low-latency, high-bandwidth off-chip communication for memory access and sensor data analysis.

Researchers have been exploring new system-level technologies to increase bandwidth, such as silicon photonics [28], wireless [70], 3D integration and System-in-Package (SiP) [64]. However, none of these methods comes without additional design for manufacturing (DFM) cost and risk, e.g., in thermal behavior, process variation and reliability. Switchable pins [16] for SOCs have been proposed to dynamically allocate power pins for off-chip memory access, at the cost of additional on-board external switches that add to the bill of materials and occupy a large PCB area. Power line communication (PLC) [72] has been proposed to transmit signals over power delivery networks (PDN).

In this chapter, we propose Inter-Package Capacitive Proximity Communication (IPCPC) to boost off-chip communication through metal plates on the side walls of the package. Previous work, i.e., Proximity [20] and Capacitive [49] Communication, has been proposed to enable chip-to-chip capacitive communication from the top or bottom side of the chip. Such architectures weaken the mechanical structure of the chip, which is a major reliability concern in drop and torsion tests. Our proposed method originated from a smartphone teardown [8], where we observed that the DFM rule for package-to-package separation is only 0.1mm. This enables off-chip communication through the side walls of the package without changing its mechanical structure. Simulation results show that IPCPC with $0.04mm^2$ parallel plates can support a bandwidth of 20Gbps per channel.

### 6.2 Design Overview
Figure 6.1: High-level overview of the proposed Inter-Package Capacitive Proximity Communication (IPCPC)
Figure 6.1 shows the high-level diagram of IPCPC. A chip manufactured for IPCPC resembles a traditional BGA flip chip, with the addition of an array of metal plates exposed at the side walls of the package, e.g., four metal plates in this example, forming four data channels to the adjacent chip. To increase the capacitance, underfill (UF) is applied to raise the relative dielectric constant. Conventional UF is made of bisphenol A or bisphenol F epoxy resin and enhances the reliability of a flip chip on a PCB by redistributing the thermomechanical stress between the silicon chip and the PCB substrate. A typical epoxy resin structure used in UF can be found in [64], with a dielectric constant in the range of 3.8 to 4.2. The on-die transmitter of one chip sends the signal to the metal plate through bumps, package buried vias, an on-package trace, package buried vias and wire bonding to the surface of the metal plate, which is AC-coupled to the receiver chip. The same structure and channel connection are manufactured on the receiver side.

### 6.2.1 Capacitor Model Analysis

The capacitance model for IPCPC is shown in Figure 6.2. For the middle channel, $C_{26}$ is the parallel-plate capacitance between the two middle plates. $C_{12}$, $C_{23}$, $C_{56}$ and $C_{67}$ are the capacitive crosstalk to adjacent channels, with $C_{12} = C_{23} = C_{56} = C_{67}$ assuming the plates on the two chips have the same size. $C_{bg}$ is the capacitance from the bottom side of the plate to the PCB ground plane, and $C_{sg}$ is the total capacitance from the side walls of the plate to the PCB ground plane. $C_{26}$ can be calculated with the classic parallel-plate formula in Eq. 6.1.

$$C_{\text{plate}} = \frac{\varepsilon_0 \varepsilon_k A}{d} \tag{6.1}$$
Figure 6.2: High-level overview of the capacitor model for IPCPC

where \( \varepsilon_0 \) is the permittivity of free space, \( \varepsilon_k \) is the dielectric constant of the material between the plates, \( A \) is the area of the plate and \( d \) is the spacing between the plates of the two chips. Similarly, \( C_{12} \) and \( C_{bg} \) can be estimated with Eq. 6.1. \( C_{sg} \) is an inclined-plate capacitor, studied in [69], which can be extracted with Ansys Electronics Desktop Q3D/HFSS 2017. Following state-of-the-art DFM rules, \( d = 0.1mm \), and \( \varepsilon_k \approx 4 \). The plate thickness is \( t = 10um \), the spacing between two neighboring plates on the same chip is \( p \geq 50um \), and \( b \) is the distance from the bottom edge of the plate to the PCB ground plane. With these design parameters, we estimate that \( C_{26} > 10 \times C_{12} \).
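The C_26 versus C_12 estimate can be reproduced from Eq. 6.1. In this sketch we model C_12 as a parallel-plate capacitor formed by the t = 10 µm plate edges facing each other across p, with the same ε_k; that geometry, and ignoring fringing, are our assumptions. The ratio then reduces to (plate side × p)/(t × d), which is 10 for a 0.2mm plate at the minimum spacing p = 50 µm and grows with p or plate size.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def plate_cap(eps_k: float, area_m2: float, gap_m: float) -> float:
    """Eq. 6.1: ideal parallel-plate capacitance, fringing fields ignored."""
    return EPS0 * eps_k * area_m2 / gap_m


eps_k, d, t, p = 4.0, 0.1e-3, 10e-6, 50e-6  # values quoted in the text, SI units
for side in (0.2e-3, 0.3e-3):               # 0.04 mm^2 and 0.09 mm^2 plates
    c26 = plate_cap(eps_k, side * side, d)  # facing plates on the two chips
    c12 = plate_cap(eps_k, side * t, p)     # neighbouring plates, edge to edge (assumed)
    print(f"{side * 1e3:.1f} mm plate: C26 = {c26 * 1e15:.1f} fF, "
          f"C12 = {c12 * 1e15:.2f} fF, ratio = {c26 / c12:.1f}")
```

As noted below for Fig. 6.3, the extracted capacitance is somewhat larger than this ideal estimate because of fringing fields.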

Fig. 6.3 shows \( C_{26} \) and \( C_{sg} + C_{bg} \) as functions of \( d \). The capacitance is extracted at 10GHz, with \( b \) assumed to be 0.57mm. The extracted trend correlates well with \( C_{26} \) calculated from Eq. 6.1; the extracted capacitance is slightly larger because of the fringing E-field at the plate edges, which Eq. 6.1 does not consider. Figure 6.3(b) shows that the plate-to-ground-plane capacitance increases slightly as \( d \) increases.

Figure 6.3: (a) Plate to plate capacitance $C_{26}$ vs $d$. (b) Plate to ground capacitance $C_{sg} + C_{bg}$ vs $d$.

Fig. 6.4 shows $C_{26}$ and $C_{sg} + C_{bg}$ as functions of $b$. It can be concluded that $C_{26}$ is proportional to the area of the plate. $C_{26}$ also increases slightly as $b$ increases. $C_{sg} + C_{bg}$ initially drops quickly while $C_{bg}$, which is inversely proportional to $b$, is dominant.

Figure 6.4: (a) Plate to plate capacitance $C_{26}$ vs $b$. (b) Plate to ground capacitance $C_{sg} + C_{bg}$ vs $b$. 
### 6.2.2 Manufacturing Tolerance

$C_{26}$ depends on manufacturing tolerance, as the two chips (parallel plates) can be inclined by up to $\pm5^\circ$ or $d$ can vary by $\pm10\%$. From Fig. 6.3 we observe that the variation of $C_{26}$ is less than $\pm10\%$ for $d = 0.1\text{mm} \pm 10\%$, while the change of $C_{sg} + C_{bg}$ is less than $\pm5\%$.

### 6.3 Performance Analysis

This section analyzes the channel performance of the data communication. Assuming four metal plates as shown in Fig. 6.1, we build the model in ANSYS HFSS 2017 and simulate the eye diagram of 20GHz data communication in Advanced Design System (ADS) 2013. The IO parasitics are extracted from the IBIS model of the Xilinx Virtex 7 [9]. We use $d = 0.1\text{mm}$, $p = 50\text{um}$ and $\varepsilon_k = 4.2$. The simulation setup is shown in Fig. 6.5. $S_{15}$, $S_{26}$, $S_{37}$ and $S_{48}$ denote CH1, 2, 3 and 4, respectively.

### 6.3.1 The Size of the Metal Plate

This section studies the relation between channel signal integrity and plate size. The receiver eye diagram of the signal and the crosstalk observed at the neighboring channel are shown in Fig. 6.6 for two plate sizes, with $b = 0.67\text{mm}$ in the model. Increasing the plate area from $0.04\text{mm}^2$ to $0.09\text{mm}^2$ increases the eye height by 56%, while the signal-to-crosstalk ratio decreases from 6.4 to 4.9. Since a larger plate area reduces the total number of parallel plates available for IPCPC, the choice is at the designer's discretion, based on the bandwidth, transmitter and receiver requirements.
### 6.3.2 The Distance from Metal Plate to PCB GND Plane

This section studies the relation between channel signal integrity and the distance from the plate to the PCB ground plane. The plate size is set to $0.3 \times 0.3 mm^2$. Fig. 6.7 shows that as $b$ increases, $C_{bg} + C_{sg}$ decreases and channel signal integrity improves.

### 6.3.3 Transmitter Drive Strength (DS)

We further change the drive strength (DS) $R_1$ from 33Ω (Fig. 6.6(c)) to 20Ω (Fig. 6.8(a)) and 50Ω (Fig. 6.8(b)), with the plate size set to $0.2 \times 0.2 mm^2$. We observe a large reflection at the receiver with strong DS and a reduced eye height margin with weak DS, so DS can be used as an effective knob to improve the signal integrity of IPCPC. Channel equalization, crosstalk compensation, coding and negative impedance control (NIC) can also be used to improve the channel performance of IPCPC.

Figure 6.5: Simulation setup for channel performance for IPCPC.
Figure 6.6: Eye diagrams for signal and crosstalk observed at receiver and neighboring channel. (a) Signal for $0.3 \times 0.3\text{mm}^2$ plate, (b) Crosstalk for $0.3 \times 0.3\text{mm}^2$ plate, (c) Signal for $0.2 \times 0.2\text{mm}^2$ plate, (d) Crosstalk for $0.2 \times 0.2\text{mm}^2$ plate.

6.4 Summary

We propose IPCPC, a novel off-chip data communication method through inter-package capacitive coupling at a bandwidth of 20GHz per channel. Future work will focus on improving channel performance and on increasing the communication density of IPCPC by considering multiple rows of plate-to-plate communication.

Figure 6.7: Eye diagrams for signal with different $b$. (a) $b = 0.07\, \text{mm}$, (b) $b = 0.57\, \text{mm}$.

Figure 6.8: Eye diagrams for signal with different source drive strength (DS). (a) $R_1 = 20\,\Omega$, (b) $R_1 = 50\,\Omega$.

Chapter 6, in full, is a reprint of the material as it appears in “Boosting Off-chip Interconnects through Inter-Package Capacitive Proximity Communication”, which is in preparation for IEEE Conference on Electrical Performance of Electronic Packaging and Systems 2017, by Xiang Zhang, Dongwon Park and Chung-Kuan Cheng. The thesis author was the primary investigator and author of the paper.
Chapter 7

Conclusion

7.1 Summary of Contributions

In this dissertation, we study system-level PDN design and analysis. This includes prediction of the worst-case PDN noise of single and cascaded RLC tanks, as well as PDN applications in timing analysis, leakage analysis, power line communication and capacitive communication. The contributions of this study are as follows.

Chapter 3 defines the ratio $\gamma$ of the worst-case voltage noise to the maximum impedance of PDNs. The RLC tank models in real PDN structures are analyzed, and a general method to calculate the worst-case noise in an LC tank is discussed. Closed-form expressions of the worst-case noise in standard LC tanks are given, and a theoretical upper bound is proved. We demonstrate that $\gamma$ in a complete PDN path can be greater than 1. We propose methods to predict the worst-case noise of the complete PDN path through a cascaded LC tank model. We also propose a method to calculate the PDN noise of an RLC tank model that accounts for a voltage-dependent leakage resistance $R_{\text{leak}}(v(t))$. Finally, we demonstrate the relation between the optimal resistor value of the RLC tank and the leakage resistance $R_{\text{leak}}$.
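To make the ratio $\gamma$ concrete, here is a minimal numerical sketch. It is not the Chapter 3 derivation and rests on several assumptions. The tank is an idealized parallel RLC, $Z(s) = 1/(1/R + 1/(sL) + sC)$, whose impedance peaks at $Z_{\max} = R$ at $\omega_0 = 1/\sqrt{LC}$. The load current is bounded in $[0, I_{\max}]$. Under these assumptions, a classic construction maximizes the final-sample voltage by switching the current on wherever the time-reversed impulse response is positive, which takes $O(n)$ time. All function names and parameter values are hypothetical and normalized.

```python
import math


def parallel_rlc_impulse(r, l, c, dt, n):
    """Sampled impulse response of Z(s) = 1 / (1/R + 1/(sL) + sC), underdamped only."""
    alpha = 1.0 / (2.0 * r * c)          # damping rate
    w0 = 1.0 / math.sqrt(l * c)          # resonance, where |Z| peaks at R
    if alpha >= w0:
        raise ValueError("sketch only handles an underdamped tank (alpha < w0)")
    wd = math.sqrt(w0 * w0 - alpha * alpha)
    return [
        math.exp(-alpha * k * dt)
        * (math.cos(wd * k * dt) - (alpha / wd) * math.sin(wd * k * dt)) / c
        for k in range(n)
    ]


def worst_case_current(h, i_max):
    """Current bounded in [0, i_max] that maximizes v at the last sample, O(n)."""
    n = len(h)
    # v[n-1] = dt * sum_k i[k] * h[n-1-k]: switch on exactly where h[n-1-k] > 0.
    return [i_max if h[n - 1 - k] > 0.0 else 0.0 for k in range(n)]


def voltage_at_last_sample(h, i, dt):
    """Discrete convolution evaluated at the final sample only, O(n)."""
    n = len(h)
    return dt * sum(i[k] * h[n - 1 - k] for k in range(n))


# Hypothetical normalized tank: R = 10, L = 1, C = 1 (Q = 10, Z_max = R).
R, L, C, I_MAX = 10.0, 1.0, 1.0, 1.0
DT, N = 0.01, 20000                      # 200 time units, about 10 decay constants
h = parallel_rlc_impulse(R, L, C, DT, N)
i_wc = worst_case_current(h, I_MAX)
v_wc = voltage_at_last_sample(h, i_wc, DT)
gamma = v_wc / (I_MAX * R)               # worst-case noise over I_max * Z_max
print(f"worst-case noise = {v_wc:.3f}, gamma = {gamma:.3f}")
```

For this idealized single parallel tank, $\gamma$ approaches $2/\pi$ in the high-$Q$ limit. The $\gamma > 1$ behavior described above arises for complete PDN paths, which this toy model does not capture.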

Chapter 4 proposes a method to predict the worst-case noise area of the supply voltage on the power distribution network (PDN). First, we discuss the impact of the voltage noise area on circuit performance and compare it with that of the peak voltage noise. Second, we study the closed-form expression of the worst-case noise area of an RLC tank. Third, we develop an algorithm that generates the worst-case current stimulus for general PDN systems in $O(n)$ time$^1$. Last, we investigate the circuit delay under a complete PDN path and design experiments to validate our methods.
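The Chapter 4 algorithm itself is not reproduced here. This sketch illustrates only the step behind the $O(n \log n)$ bound in the footnote: convolving a sampled impulse response $h$ with a current waveform $i$ through zero-padded FFTs to obtain the full voltage waveform. The helper names are hypothetical. A pure-Python recursive radix-2 FFT is used so that the block is self-contained; in practice a library FFT such as `numpy.fft` would replace it.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(x, y):
    """Linear convolution of real sequences via zero-padded FFTs, O(m log m)."""
    m = len(x) + len(y) - 1
    size = 1
    while size < m:
        size *= 2
    fx = fft(list(x) + [0.0] * (size - len(x)))
    fy = fft(list(y) + [0.0] * (size - len(y)))
    inv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    return [v.real / size for v in inv[:m]]     # scale the inverse transform


def voltage_waveform(h, i, dt):
    """v[k] = dt * sum_j h[j] * i[k - j], kept for k = 0..len(i)-1."""
    return [dt * v for v in fft_convolve(h, i)[: len(i)]]


print(voltage_waveform([1.0, 0.5, 0.25, 0.125], [1.0, 1.0, 0.0, 1.0], dt=1.0))
```

Truncating to the input length keeps only the samples that a causal simulation window of the same duration would produce.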

Chapter 5 presents a differential power line communication (PLC) model. It reuses some of the power pins as dynamic power/signal pins for data transmission, increasing the off-chip bandwidth while the SOC is in a low-performance state. The proposed architecture increases off-chip communication bandwidth at no additional cost to the system-level design. Key design parameters are identified to optimize PLC performance, and the impact of the parasitic capacitance of the power gating switches on data communication is studied. A theoretical model of receiver channel equalization is used to improve signal integrity. The noise immunity of PLC is investigated under simultaneous multi-channel data transmission. Finally, the change in the PDN peak impedance caused by implementing hybrid pins is investigated.

Chapter 6 demonstrates Inter-Package Capacitive Proximity Communication (IPCPC), which boosts off-chip communication through metal plates on the side wall of the package at a bandwidth of 20GHz per channel. First, the details of the modeling and 3D extraction for the proposed architecture are presented. Second, the performance and the tunable design trade-offs are discussed. Third, the signal integrity and the noise immunity to adjacent channels are studied.

---

$^1$Here $n$ refers to the vector length of the discretized impulse response of the PDN system. The full worst-case voltage waveform additionally requires convolving the system impulse response with the worst-case current, for which the total time complexity is $O(n\log(n))$.

7.2 Future Work

In state-of-the-art circuit and system design, the PDN is usually over-designed, with substantial redundancy (multiple capacitors, balls and bumps) and guard bands reserved for the worst-case voltage noise. One potential future direction is to define better metrics for practical PDN design and to improve our proposed prediction method, so that the PDN design overhead can be saved for other functions in system integration. We have also been working on time-variant PDN components that dynamically mitigate PDN noise using an adjoint network. More design parameters can also be added to the model. Examples include voltage derating of discrete capacitors, the impact of SOC thermal throttling on the PDN, a temperature-dependent on-die leakage resistance model, and global optimization of PDN design across multiple voltage domains.

Another research topic is to explore advanced technologies that improve the performance of PLC and balance its trade-offs with the PDN. The focus here is on further optimizing the on-die circuitry to minimize the impact of the parasitic capacitance of the power switches. A further direction is to use multiple rows of metal plates for high-density chip-to-chip capacitive coupling communication.
Bibliography


