## UC Santa Cruz Previously Published Works

### Title

Multi-Frequency Resonant Clocks

### Permalink

https://escholarship.org/uc/item/1fh3466x

### **Authors**

Guthaus, Matthew; LaCara, Benjamin; Lin, Ping-Yao

### **Publication Date**

2015-05-01

Peer reviewed

# Multi-Frequency Resonant Clocks

Benjamin M. LaCara, Ping-Yao Lin, Matthew R. Guthaus

Department of Computer Engineering, University of California, Santa Cruz

{blacara, plin11, mrg}@ucsc.edu

*Abstract*—Clock distribution networks consume a significant portion of total chip power in high-performance designs. Resonant clocks are one proposed method to lower this power in modern designs while also requiring fewer clock buffers. Recent resonant solutions perform optimally at only one particular frequency, which is problematic since dynamic frequency scaling is often used to lower overall system power. This paper introduces the first scheme to produce a clock distribution network with a tunable resonant frequency. Experimental results show that the resonant frequency ranges from 1.2GHz to 2.6GHz while saving up to 41% of the power on the clock distribution network when compared to the non-resonant distribution.

#### I. INTRODUCTION

Power consumption continues to be a major concern for ASIC designs. Recent trends towards mobile applications have underscored the need for more power-efficient devices. It has been shown that an on-chip clock distribution network (CDN) can draw up to 70% of total chip power [1]. The majority of this is due to dynamic switching of sequential elements across the whole chip. Strategies such as clock gating, power gating, dynamic voltage scaling, dynamic frequency scaling and multiple threshold voltages have been used to lower static and dynamic power. Along with these, resonant approaches have been explored to reduce power costs by recycling energy on the CDN.

Previous resonant clock approaches include standing wave oscillators [2], rotary/salphasic clocks [3], [4] and resonant inductor-capacitor (LC) tanks [5]–[10]. Standing wave resonant clocks result in a constant phase, but their amplitude varies depending on placement in the CDN. Conversely, rotary resonant clocks provide consistent amplitude, but their phase varies depending on where the clock is tapped. LC tanks ideally provide constant magnitude and phase, which is similar to non-resonant clocks and allows for similar design methodologies.

This paper is the first to:

- Explore the automated design of a dynamically (i.e., run-time) tunable, global, resonant clock mesh in an ASIC design methodology.
- Analyze the results using high-quality passive inductor models on representative industrial clock distributions.

Section II provides a background on resonant theory, passive inductor models and mesh automation. Section III discusses our design methodology including design choices and tradeoffs of our models. Section IV showcases our experimental results and Section V concludes the paper.



Fig. 1. (a) Basic LC tank circuit contains a series component with L and  $C_s$ , and a parallel component with L and  $C_d$ . (b) Inductor metal trace measurements are square with a length no greater than  $100\mu m$ .

#### II. BACKGROUND

#### A. Resonant Theory

An LC tank is an inductor and capacitor connected in series or parallel. When an oscillating signal is applied to this circuit, an exchange of energy occurs between the magnetic field of the inductor and the electric field of the capacitor. A resonant frequency is a frequency at which the inductor and capacitor have reactances of equal magnitude, so that the pair presents either zero impedance (in series) or infinite impedance (in parallel). This frequency can be computed by equating the magnitudes of the inductor and capacitor impedances and solving for frequency as

$$f = \frac{1}{2\pi\sqrt{LC_s}} \tag{1}$$

where L is the inductance of the inductor and  $C_s$  is the clock load capacitance.
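As a quick numerical illustration of Equation 1, the following Python sketch evaluates the resonant frequency for a given inductance and load capacitance. The function name `resonant_frequency` and the component values are illustrative assumptions, not values taken from this paper.

```python
import math


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Equation 1: f = 1 / (2*pi*sqrt(L*C_s)), returned in hertz."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical component values, chosen only to land in the GHz range.
L_EXAMPLE_H = 1e-9     # 1 nH
CS_EXAMPLE_F = 10e-12  # 10 pF

f_example = resonant_frequency(L_EXAMPLE_H, CS_EXAMPLE_F)
print(f"{f_example / 1e9:.3f} GHz")  # prints "1.592 GHz"
```

Because the frequency depends on the product of L and $C_s$, halving either one raises the frequency by a factor of $\sqrt{2}$.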

An LC oscillator will swing between positive and negative voltages which is undesirable given CMOS logic levels of 0 and  $V_{dd}$  volts. This issue is addressed by biasing the circuit with a decoupling capacitor ( $C_d$ ) at the grounded end of the inductor as shown in Figure 1(a) [5]. This capacitance must be sufficiently large to cleanly separate the series and parallel resonant frequencies by satisfying

$$\frac{1}{2\pi\sqrt{LC_d}} \ll \frac{1}{2\pi\sqrt{LC_s}}. \tag{2}$$
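Since both sides of Equation 2 share the same inductor, L cancels and the condition reduces to $C_d \gg C_s$. The sketch below checks the separation between the two resonances for a candidate $C_d$. The paper does not give a numeric threshold for "much less than", so the `min_separation` parameter (and its default of 10) is purely an assumption for illustration, as are the capacitor values.

```python
import math


def resonance_separation(c_s: float, c_d: float) -> float:
    """Return f_s / f_d for a shared inductor L.

    f_s = 1/(2*pi*sqrt(L*C_s)) and f_d = 1/(2*pi*sqrt(L*C_d)),
    so L cancels and f_s / f_d = sqrt(C_d / C_s).
    """
    if c_s <= 0 or c_d <= 0:
        raise ValueError("capacitances must be positive")
    return math.sqrt(c_d / c_s)


def satisfies_equation_2(c_s: float, c_d: float, min_separation: float = 10.0) -> bool:
    """Check Equation 2 against an assumed (hypothetical) separation threshold."""
    return resonance_separation(c_s, c_d) >= min_separation


C_S = 10e-12  # hypothetical clock load: 10 pF
print(satisfies_equation_2(C_S, 4e-9))    # C_d = 4 nF, separation of about 20x
print(satisfies_equation_2(C_S, 40e-12))  # C_d = 40 pF, separation of about 2x
```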

In reality, LC tank circuits are RLC circuits due to parasitic resistances in the inductors, the capacitors and the clock distribution wires. At resonance, the capacitive and inductive components cancel out and leave only the resistive component. Minimizing the parasitics of the tank and of the clock network is therefore essential for power efficiency.
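The cancellation at resonance can be seen in a simple lumped model. The sketch below uses a series RLC circuit with a single resistor standing in for all parasitic resistance, which is a simplifying assumption (the inductor models in this paper distribute their parasitics). The component values are hypothetical.

```python
import math


def series_rlc_impedance(r_ohm: float, l_h: float, c_f: float, freq_hz: float) -> complex:
    """Impedance of a series RLC branch: Z = R + j*(w*L - 1/(w*C))."""
    omega = 2.0 * math.pi * freq_hz
    return complex(r_ohm, omega * l_h - 1.0 / (omega * c_f))


# Hypothetical values: 1 ohm of lumped parasitic resistance, 1 nH, 10 pF.
R, L, C = 1.0, 1e-9, 10e-12
f0 = 1.0 / (2.0 * math.pi * math.sqrt(L * C))

z_at_resonance = series_rlc_impedance(R, L, C, f0)
z_below = series_rlc_impedance(R, L, C, 0.5 * f0)  # capacitive: negative reactance
z_above = series_rlc_impedance(R, L, C, 2.0 * f0)  # inductive: positive reactance

print(z_at_resonance, z_below, z_above)
```

At `f0` the reactive terms cancel and only `R` remains, so any resistance that is not removed is dissipated on every cycle.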



Fig. 2. (a) Inductor models include series resistances and parallel capacitances. (b) Parasitic effects of a passive inductor model in an LC tank circuit when compared to an ideal inductor in the same circuit.

#### III. DESIGN METHODOLOGY

#### A. Inductor Design

On-chip inductors are best implemented on top-level metal layers due to their low parasitic resistance. They may be single or multi-layered, square or octagonal, and may include a grounded metal layer beneath the spiral or not [11]. Each of these options affects the total amount of inductance and the quality factor (Q) of the inductor. For simplicity, we are using single-layered square inductor models with no ground plane, much like the one shown in Figure 1(b). Here, n represents the number of full turns, w is the width of the metal traces, s is the spacing between turns, and l is the length of the side.

The Q of an inductor measures the energy it stores relative to the energy it dissipates per cycle and is given by  $Q = \omega L/R$ . In resonant clocks, a large Q stores energy more efficiently, but its narrow bandwidth makes the design less tolerant of process variation and frequency mismatch. Conversely, a low Q tolerates process variation at the cost of energy efficiency.
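The trade-off between Q and bandwidth can be made concrete with a small calculation. The sketch below evaluates $Q = \omega L/R$ and uses the standard resonator approximation that bandwidth is roughly $f_0/Q$. Treating the inductor Q as the Q of the whole tank ignores every other loss, so this is an assumption, and the inductance and resistance values are hypothetical.

```python
import math


def inductor_q(freq_hz: float, inductance_h: float, series_resistance_ohm: float) -> float:
    """Quality factor Q = w*L/R at the given frequency."""
    if series_resistance_ohm <= 0:
        raise ValueError("series resistance must be positive")
    return 2.0 * math.pi * freq_hz * inductance_h / series_resistance_ohm


def approx_bandwidth_hz(center_hz: float, q: float) -> float:
    """Approximate 3 dB bandwidth of a resonator, assuming its Q equals q."""
    return center_hz / q


# Hypothetical inductor: 1 nH with 1.2 ohm of series resistance at 1.5 GHz.
q = inductor_q(1.5e9, 1e-9, 1.2)
bw = approx_bandwidth_hz(1.5e9, q)
print(f"Q = {q:.2f}, bandwidth = {bw / 1e6:.0f} MHz")
```

Doubling the series resistance in this model halves Q and doubles the approximate bandwidth, which is the trade-off described above.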

Previous works have had Q values around 3.5–3.8 at 4GHz with areas within  $100\mu m^2$  [12], [13]. Our designs are created using Sonnet EM Suite [14] and modeled in HSPICE as shown in Figure 2(a). We used simulated annealing to optimize the *l*, *w*, *s* and *n* values of our models for area and Q. The models include parasitic capacitances and resistances. All models have a Q of roughly 8 at 1.5GHz within  $100\mu m^2$ .

Figure 2(b) shows magnitude and phase plots of the characteristic impedance of two individual LC tank circuits. One has an ideal inductor while the other uses a passive inductor model of the same value. The general behavior is clearly the same with trends towards positive and negative infinity in the magnitude and the  $180^{\circ}$  shift in the phase showing the change from capacitive to inductive characteristics. The bandwidth is increased with the passive model due to the parasitic resistance and capacitances it presents.

#### B. LC Tank Design

Our LC tank circuit design is shown in Figure 3(a) which includes a PMOS and NMOS in parallel as a pass gate to enable/disable the tank circuit. We use transistor sizes of  $0.95\mu m$  for PMOS and  $0.63\mu m$  for NMOS. This sizing ratio and circuit are influenced by AMD and Cyclos [12], [13]. Both transistors are sized as small as possible while maintaining full signal swing in transient analysis.



(b) The characteristic impedance with varying LC tank passgate sizes.

Fig. 3. (a) A passgate, pg, allows the LC tank circuit to be connected to or disconnected from the CDN. (b) With too small a pg, the signal attenuates and savings are lost. With too large a pg, the frequency range is smaller and less flexible.

Figure 3(b) shows the characteristic impedance of the same CDN with four different global passgate sizes. Each line represents a CDN in which all LC tank passgates have the same size. Sizing the passgate too small leads to significant signal attenuation, which negates any benefits the LC tanks could provide. Sizing the passgate too large leads to smaller power savings and eventually less frequency shifting flexibility.

The clock ends of the LC tanks are connected to the clock mesh so that the LC tanks are connected in parallel to ground. Inductors in parallel reduce the total inductance according to  $L_{total} = L/n$ , where L is the inductance of each tank and n is the number of LC tanks. Substituting  $L_{total}$  for L in Equation 1 shows that the mesh's resonant frequency increases as more LC tanks are attached to the CDN.
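Combining $L_{total} = L/n$ with Equation 1 gives a frequency that grows with $\sqrt{n}$. The following sketch shows that scaling for an idealized mesh. It ignores passgate resistance and all parasitics, and the per-tank inductance and clock capacitance are hypothetical values.

```python
import math


def mesh_resonant_frequency(n_tanks: int, tank_inductance_h: float, clock_cap_f: float) -> float:
    """Ideal resonant frequency with n identical tank inductors in parallel."""
    if n_tanks < 1:
        raise ValueError("at least one tank must be connected")
    l_total = tank_inductance_h / n_tanks  # L_total = L / n
    return 1.0 / (2.0 * math.pi * math.sqrt(l_total * clock_cap_f))


# Hypothetical values: 4 nH per tank, 50 pF of clock load.
TANK_L_H = 4e-9
CLOCK_C_F = 50e-12

sweep = {n: mesh_resonant_frequency(n, TANK_L_H, CLOCK_C_F) for n in (1, 4, 9, 16, 36)}
for n, f in sweep.items():
    print(f"{n:2d} tanks -> {f / 1e9:.2f} GHz")
```

In this ideal model, quadrupling the number of connected tanks doubles the resonant frequency.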

To reach a target frequency during design, we can change the value of the inductor, the number of included LC tanks, or both. Each of our LC tanks fits within the same area, so they can be swapped at no cost while homing in on one particular frequency. The number of tanks is a parameter set by the designer, for example, by dedicating a certain percentage of chip area to inductors or by only having space on specific portions of the chip [7], [15].
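Both design knobs follow from inverting Equation 1 with $L_{total} = L/n$, which gives $n = (2\pi f)^2 L C_s$. The sketch below solves for either the number of tanks or the per-tank inductance. It is an idealized calculation that ignores parasitics, it is not the synthesis procedure used in this paper, and the function names and numeric values are hypothetical.

```python
import math


def tanks_for_target(target_hz: float, tank_inductance_h: float, clock_cap_f: float) -> int:
    """Nearest whole number of tanks for a target frequency: n = (2*pi*f)^2 * L * C_s."""
    omega = 2.0 * math.pi * target_hz
    n_exact = omega ** 2 * tank_inductance_h * clock_cap_f
    return max(1, round(n_exact))


def inductance_for_target(target_hz: float, n_tanks: int, clock_cap_f: float) -> float:
    """Per-tank inductance for a fixed tank count: L = n / ((2*pi*f)^2 * C_s)."""
    omega = 2.0 * math.pi * target_hz
    return n_tanks / (omega ** 2 * clock_cap_f)


def mesh_frequency(n_tanks: int, tank_inductance_h: float, clock_cap_f: float) -> float:
    """Ideal mesh resonant frequency with n tanks in parallel."""
    return math.sqrt(n_tanks) / (2.0 * math.pi * math.sqrt(tank_inductance_h * clock_cap_f))


# Hypothetical design point: 2 GHz target, 4 nH tanks, 50 pF clock load.
n_needed = tanks_for_target(2e9, 4e-9, 50e-12)
f_achieved = mesh_frequency(n_needed, 4e-9, 50e-12)

# Alternatively, fix the tank count at 30 and solve for the inductance.
l_fit = inductance_for_target(2e9, 30, 50e-12)
print(n_needed, f"{f_achieved / 1e9:.3f} GHz", f"{l_fit * 1e9:.2f} nH")
```

Because the tank count must be a whole number, changing it alone only reaches the target approximately, while adjusting the inductance can then close the remaining gap.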

Once the LC tanks are placed, the decoupling capacitance does not need to be changed regardless of how many LC tanks are attached or detached. This is because all of our decoupling capacitors are in parallel, and capacitors in parallel add up as  $C_{total} = C_d \cdot n$ . If we substitute the total inductance and the total decoupling capacitance of n tanks into Equation 1, we see that the number of tanks cancels according to

$$f_{decap} = \frac{1}{2\pi \sqrt{\frac{L}{n} \cdot (C_d \cdot n)}}. \tag{3}$$

This means that the total number of tanks does not influence the decoupling resonant frequency.

(a) Mesh buffers are driven by identical transient sources.

(b) LC tanks are evenly distributed across the mesh.

Fig. 4. Mesh buffers and LC tanks are independently placed throughout the mesh to form a resonant clock mesh.
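The cancellation in Equation 3 can be checked numerically. The sketch below evaluates the decoupling resonant frequency for several tank counts using hypothetical per-tank values and shows that the result does not depend on n.

```python
import math


def decap_resonant_frequency(n_tanks: int, tank_inductance_h: float, decap_f: float) -> float:
    """Equation 3: n parallel inductors (L/n) with n parallel decaps (C_d*n)."""
    l_total = tank_inductance_h / n_tanks
    c_total = decap_f * n_tanks
    return 1.0 / (2.0 * math.pi * math.sqrt(l_total * c_total))


# Hypothetical per-tank values: 4 nH inductor, 100 pF decoupling capacitor.
TANK_L_H = 4e-9
DECAP_F = 100e-12

decap_freqs = [decap_resonant_frequency(n, TANK_L_H, DECAP_F) for n in (1, 7, 14, 21, 28, 36)]
print([f"{f / 1e6:.1f} MHz" for f in decap_freqs])
```

Every entry matches the single-tank value up to floating-point rounding, so only the clock resonance moves when tanks are switched in or out.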

#### C. Automated Clock Mesh Design

Our overall clock design is a clock mesh because meshes are resilient to both process and environmental variations and are common in high-performance designs. Mesh buffers are distributed in parallel on the mesh while a buffered clock tree drives the mesh buffers from a single source. For our experiments, we drive the mesh buffers with an ideal transient clock as illustrated in Figure 4(a).

Our experiments are done on a uniform mesh, so the mesh steps between all vertical and horizontal wires are equal [16]. The mesh step is the single largest contributor to the power consumption of the clock mesh as it controls the density of the mesh. To allow the easiest distribution, our LC tanks are only attached to mesh nodes, referred to as *LC nodes*. Our LC nodes are evenly distributed across the CDN regardless of how many tanks are implemented in the design, for example, at every even column by even row or every third column by third row. This is done for simplicity and is illustrated in Figure 4(b), which shows a mesh with six LC tanks attached.
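An even distribution of LC nodes on a uniform mesh can be sketched as a strided selection of mesh nodes. The function below is a hypothetical illustration rather than the placement code of the synthesis tool: it assumes mesh nodes are indexed by zero-based (row, column) pairs, so an offset of 1 with a stride of 2 picks the second, fourth, and so on. The mesh dimensions are also assumed and are not those of Figure 4(b).

```python
def lc_nodes(n_rows: int, n_cols: int, stride: int, offset: int = 0) -> list[tuple[int, int]]:
    """Select every `stride`-th row and column of a uniform mesh as LC nodes."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [
        (row, col)
        for row in range(offset, n_rows, stride)
        for col in range(offset, n_cols, stride)
    ]


# A hypothetical 4 x 6 mesh with LC tanks on every even row and column
# (counting from one), which yields six LC nodes.
every_even = lc_nodes(4, 6, stride=2, offset=1)
print(every_even)

# The same mesh with a stride of three gives a sparser distribution.
every_third = lc_nodes(4, 6, stride=3, offset=2)
print(every_third)
```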

The LC tanks are turned on and off in groups to maintain an even distribution, for example, by turning off every other even LC tank in an every-even distribution. Depending on the number of tanks enabled, we attain a different resonant frequency since the inductance changes while the clock capacitance is constant. The control signals for these groups are driven by the same logic as the dynamic frequency controller, which simultaneously changes the system PLL.
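One way to keep the enabled tanks evenly spread is to split the LC nodes into interleaved groups that share a control signal. The sketch below uses a checkerboard split, which is only one hypothetical grouping (the paper does not specify the exact grouping), and estimates the frequency change with the ideal $\sqrt{n}$ scaling that follows from Equation 1 with $L_{total} = L/n$.

```python
import math


def lc_grid(n_rows: int, n_cols: int, stride: int, offset: int = 0) -> list[tuple[int, int]]:
    """Evenly spaced LC nodes on a uniform mesh (zero-based indices)."""
    return [
        (row, col)
        for row in range(offset, n_rows, stride)
        for col in range(offset, n_cols, stride)
    ]


def checkerboard_groups(nodes, stride: int, offset: int = 0) -> dict[int, list[tuple[int, int]]]:
    """Assign LC nodes to two interleaved groups, like squares on a checkerboard."""
    groups = {0: [], 1: []}
    for row, col in nodes:
        i = (row - offset) // stride  # index of this node among the LC rows
        j = (col - offset) // stride  # index of this node among the LC columns
        groups[(i + j) % 2].append((row, col))
    return groups


# Hypothetical 8 x 8 mesh with 16 LC nodes on every even row and column.
nodes = lc_grid(8, 8, stride=2, offset=1)
groups = checkerboard_groups(nodes, stride=2, offset=1)

# Disabling group 1 leaves half of the tanks connected, still evenly spread.
enabled = groups[0]
relative_frequency = math.sqrt(len(enabled) / len(nodes))
print(len(nodes), len(enabled), f"{relative_frequency:.3f}")
```

In this ideal model, switching off one of the two groups lowers the resonant frequency to about 71% of its all-on value.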

#### IV. EXPERIMENTAL RESULTS

#### A. Experimental Setup

Our design methodology and circuits are evaluated on the 2010 ISPD clock synthesis benchmarks [17]. We perform three case studies on these benchmarks since they are representative of real industrial designs from Intel and IBM. The benchmark sizes range from  $91mm^2$  with 2249 sinks to  $1.7mm^2$  with 981 sinks. The clock mesh is synthesized using a high-performance synthesis technique [16] in a custom synthesis tool written in C++ and using 45nm technology parameters.

#### B. Connecting and Disconnecting All LC Tanks

In Figure 5(a), we first compare the magnitude and phase of the clock mesh at a representative node when connecting and disconnecting 16 LC tanks in a benchmark. An interesting observation is that the disconnected LC tanks are still coupled through the disabled passgate, which produces a parasitic resonance. This effect is more pronounced when there are large numbers of disconnected LC tanks. This is not a problem, however, because the disconnected mesh resonant frequency is more than a gigahertz above the connected mesh resonant frequency.

(a) The magnitude and phase of the characteristic impedance change when the tanks are connected or disconnected from the mesh.

(b) The magnitude and phase of the characteristic impedance change slightly at distances further away from a clock sink.

Fig. 5. Simulation using a clock mesh with 16 LC tanks confirms correct operation.

#### C. Effects Across the CDN in a Single State

Given the parasitics of the CDN and LC tanks, we expect the resonant frequency and the resonant magnitude to differ from mesh node to mesh node. Figure 5(b) shows the characteristic impedance at a series of mesh nodes increasingly farther away from a reference clock sink and illustrates that neither difference is an issue. We see a small difference in both the magnitude and the resonant frequency. The frequency drift is from 2.64GHz to 2.61GHz (1.27%) and the magnitude drift is from 52.89dB to 55.08dB (3.96%). The decoupling resonant frequency also varies as the distance grows, but this is not an issue as the resonant frequencies are still sufficiently far apart.

#### D. Wide-Range Dynamic Frequency and Savings

Figure 6 shows a wider range frequency shift using more configurations of LC tanks. All measurements are taken from the same representative mesh node. The result shows a frequency range of 1.39GHz between the state with all 36 LC tanks on and with only 7 LC tanks on.  $C_d$  remains constant across all states and is sufficiently far from our lowest state of resonance. Meanwhile, the parasitic resonance of disconnected LC tanks remains over one gigahertz away from the all-on state.
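The reported range can be reproduced from the benchmark 5 entries of Table I. The sketch below also computes the ratio that the ideal model (Equation 1 with $L_{total} = L/n$) would predict for the same tank counts. That comparison is an added illustration rather than an analysis from the paper, and the ideal model ignores the parasitics included in the simulations.

```python
import math

# Operating frequencies from Table I, benchmark 5.
F_ALL_ON_GHZ = 2.65  # all 36 LC tanks connected
F_MIN_GHZ = 1.26     # only 7 LC tanks connected

tuning_range_ghz = F_ALL_ON_GHZ - F_MIN_GHZ
measured_ratio = F_ALL_ON_GHZ / F_MIN_GHZ

# Ideal scaling: frequency proportional to sqrt(number of connected tanks).
ideal_ratio = math.sqrt(36 / 7)

print(f"range = {tuning_range_ghz:.2f} GHz")
print(f"measured ratio = {measured_ratio:.2f}, ideal ratio = {ideal_ratio:.2f}")
```

The measured ratio is above two but somewhat below the ideal value, which is plausible given the parasitics that the ideal model leaves out.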

Each configuration saves power over a buffered clock. The power savings are measured by comparing the average power draw of a CDN with LC tanks run at the optimal resonant frequency, referred to as "resonant power," to that of the same CDN at the same frequency with no LC tanks built into the design, referred to as "baseline power." We observed that the total power saved declines as the operating frequency decreases due to the increased parasitics between the fewer remaining connected LC tanks. This effect is seen in the declining magnitude in Figure 6 and the corresponding decrease in power savings for benchmark 5 in Table I. However, the resonant clock still attains a 39.8% reduction in power at a chip area cost of only 6.1% on this particular benchmark.
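The savings figure described above is the relative difference between baseline and resonant power. As a minimal sketch, the following Python code applies that definition to the benchmark 5 rows of Table I in which tanks are connected. The function name is an assumption, and small differences from the tabulated percentages are expected because the table lists rounded power values.

```python
def power_savings_percent(baseline_mw: float, resonant_mw: float) -> float:
    """Savings of the resonant clock relative to the baseline clock, in percent."""
    if baseline_mw <= 0:
        raise ValueError("baseline power must be positive")
    return 100.0 * (baseline_mw - resonant_mw) / baseline_mw


# (tanks, baseline power in mW, resonant power in mW, reported savings in %)
# taken from Table I, benchmark 5.
BENCHMARK_5 = [
    (36, 202.8, 122.0, 39.8),
    (28, 162.0, 103.7, 35.9),
    (21, 137.9, 96.6, 30.0),
    (14, 135.3, 109.0, 19.4),
    (7, 137.3, 119.2, 13.2),
]

computed = [power_savings_percent(base, res) for _, base, res, _ in BENCHMARK_5]
for (tanks, _, _, reported), value in zip(BENCHMARK_5, computed):
    print(f"{tanks:2d} tanks: computed {value:.1f}%, reported {reported:.1f}%")
```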

TABLE I. POWER IS SAVED BY RESONANT CLOCKS COMPARED TO BASELINE NON-RESONANT CLOCKS AT EVERY FREQUENCY.

| Benchmark             | Tanks (#) | Operating Freq. (GHz) | Baseline Pow. (mW) | Resonant Pow. (mW) | Savings (%) |
|-----------------------|-----------|-----------------------|--------------------|--------------------|-------------|
| ISPD 2010 Benchmark 5 | 36        | 2.65                  | 202.8              | 122.0              | 39.8        |
| ISPD 2010 Benchmark 5 | 28        | 2.18                  | 162.0              | 103.7              | 35.9        |
| ISPD 2010 Benchmark 5 | 21        | 1.91                  | 137.9              | 96.6               | 30.0        |
| ISPD 2010 Benchmark 5 | 14        | 1.63                  | 135.3              | 109.0              | 19.4        |
| ISPD 2010 Benchmark 5 | 7         | 1.26                  | 137.3              | 119.2              | 13.2        |
| ISPD 2010 Benchmark 5 | 0         | 3.72                  | 233.0              | 151.9              | 34.8        |
| ISPD 2010 Benchmark 3 | 30        | 2.2                   | 252.7              | 148.8              | 41.1        |
| ISPD 2010 Benchmark 3 | 24        | 1.85                  | 253.5              | 162.2              | 36.0        |
| ISPD 2010 Benchmark 3 | 18        | 1.48                  | 213.0              | 169.1              | 21.0        |
| ISPD 2010 Benchmark 3 | 12        | 1.14                  | 159.0              | 137.3              | 13.7        |
| ISPD 2010 Benchmark 3 | 6         | 0.87                  | 156.6              | 143.6              | 8.3         |
| ISPD 2010 Benchmark 3 | 0         | 3.58                  | 405.3              | 236.8              | 41.6        |
| ISPD 2010 Benchmark 1 | 168       | 2.34                  | 1144.0             | 668.7              | 41.5        |
| ISPD 2010 Benchmark 1 | 134       | 2.04                  | 969.7              | 681.3              | 29.7        |
| ISPD 2010 Benchmark 1 | 100       | 1.75                  | 953.6              | 719.2              | 24.6        |
| ISPD 2010 Benchmark 1 | 66        | 1.43                  | 788.9              | 687.4              | 12.9        |
| ISPD 2010 Benchmark 1 | 33        | 1.09                  | 613.0              | 579.4              | 5.5         |
| ISPD 2010 Benchmark 1 | 0         | 3.61                  | 1339.0             | 907.9              | 32.2        |



Fig. 6. ISPD benchmark 5 shows a representative behavior of the resonant frequency adjusting as LC tanks are disconnected.

Benchmark 3 of Table I is approximately a fourth of the size of benchmark 5. We found that the number of necessary tanks did not scale linearly with chip area. This led us to choose a dense distribution, which resulted in thirty LC tanks, to safely shift through a gigahertz. In this distribution the LC tanks take up 19.8% of the chip area, but we still save 41.1% of the CDN power at 2.2GHz and 13.7% at 1.14GHz. This leaves designers to prioritize between increasing the number of LC tanks for frequency scaling flexibility and limiting the number due to design constraints such as chip resources.

Benchmark 1 of Table I is nearly 11 times larger than benchmark 5. While it has the greatest number of LC tanks on a CDN, its LC node to mesh node ratio was the smallest and the total inductor area on the chip was only 2.61%. The total number of LC tanks was chosen to supply a frequency range similar to that of the previous benchmarks. While this benchmark achieved the largest power savings of 41.5%, its savings at lower frequencies fell off when compared to the other tests. It appears that larger benchmarks can supply the necessary area for significant power savings at higher frequencies while sacrificing power savings in states with few inductors connected.

#### V. CONCLUSION

This work presents the first CDN with a dynamically tunable resonant frequency. This is achieved with LC tanks connected to the CDN by passgates, which enable the removal of tanks to alter the total inductance on the CDN and thus move the resonant frequency. Power savings as great as 41.5% and dynamically tunable frequency windows of greater than twice the minimum frequency are observed. The run-time configuration of LC tank removal can be changed to detach each tank individually to achieve finer tuning.

#### **ACKNOWLEDGMENTS**

This work was supported in part by the National Science Foundation under grant CCF-1053838.

#### REFERENCES

- [1] C. Anderson, J. Petrovick, J. Keaty, et al., "Physical design of a fourth-generation POWER GHz microprocessor," in *ISSCC*, Feb 2001, pp. 232–233.
- [2] F. O'Mahony, C. Yue, M. Horowitz, and S. Wong, "Design of a 10GHz clock distribution network using coupled standing-wave oscillators," in *DAC*, June 2003, pp. 682–687.
- [3] V. Chi, "Salphasic distribution of clock signals for synchronous systems," *IEEE Trans. on Comp.*, vol. 43, no. 5, pp. 597–602, May 1994.
- [4] B. Taskin, J. Demaio, O. Farell, M. Hazeltine, and R. Ketner, "Custom topology rotary clock router with tree subnetworks," *TODAES*, vol. 14, no. 3, May 2009.
- [5] S. Chan, P. Restle, K. Shepard, N. James, and R. Franch, "A 4.6GHz resonant global clock distribution network," in *ISSCC*, Feb 2004, pp. 342–343.
- [6] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, "Resonant clocking using distributed parasitic capacitance," *JSSC*, vol. 39, no. 9, pp. 1520–1528, Sept 2004.
- [7] M. Guthaus, "Distributed LC resonant clock tree synthesis," in *ISCAS*, May 2011, pp. 1215–1218.
- [8] Z. Yu and X. Liu, "Implementing multiphase resonant clocking on a finite-impulse response filter," *IEEE Trans. VLSI syst.*, vol. 17, no. 11, pp. 1593–1601, Nov 2009.
- [9] C. Ziesler, S. Kim, and M. Papaefthymiou, "Resonant clock generator for single-phase adiabatic systems," in *ISLPED*, Aug 2001, pp. 159–164.
- [10] M. R. Guthaus, G. Wilke, and R. Reis, "Revisiting automated physical synthesis of high-performance clock networks," *TODAES*, vol. 18, no. 2, pp. 31:1–31:27, Apr. 2013.
- [11] C. Yue and S. Wong, "On-chip spiral inductors with patterned ground shields for Si-based RF ICs," *JSSC*, vol. 33, no. 5, pp. 743–752, May 1998.
- [12] V. Sathe, S. Arekapudi, A. Ishii, C. Ouyang, M. Papaefthymiou, and S. Naffziger, "Resonant-clock design for a power-efficient, high-volume x86-64 microprocessor," *JSSC*, vol. 48, no. 1, pp. 140–149, Jan 2013.
- [13] V. Sathe, J. Kao, and M. Papaefthymiou, "Resonant-clock latch-based design," *JSSC*, vol. 43, no. 4, pp. 864–873, April 2008.
- [14] Sonnet website. [Online]. Available: http://www.sonnetsoftware.com
- [15] X. Hu, W. Condley, and M. Guthaus, "Library-aware resonant clock synthesis (LARCS)," in *DAC*, June 2012, pp. 145–150.
- [16] M. R. Guthaus, X. Hu, G. Wilke, G. Flach, and R. Reis, "High-performance clock mesh optimization," *TODAES*, vol. 17, no. 3, pp. 1–17, Jul 2012.
- [17] C. N. Sze, "ISPD 2010 high performance clock network synthesis contest," in *ISPD*, 2010.