Title
A 25-Gb/s 5-mW CDR/Deserializer in 65-nm Technology

Permalink
https://escholarship.org/uc/item/4vc123z8

Author
Jung, Jun Won

Publication Date
2012

Peer reviewed|Thesis/dissertation
A 25-Gb/s 5-mW CDR/Deserializer in 65-nm Technology

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering

by

Jun Won Jung

2012
© Copyright by
Jun Won Jung
2012
Recent studies indicate that the input/output (I/O) bandwidth of serial links must increase by 2 to 3 times every two years so as to keep up with the demand for higher data rates. In order to manage such bandwidths with reasonable power consumption, an efficiency of around 1 mW/Gb/s for the overall transceiver is targeted, necessitating a much smaller value for each building block.

The latches, demultiplexers and frequency dividers comprising a broadband receiver consume the lion’s share of the power. Current-steering circuits run at high speed but draw considerable static power, whereas rail-to-rail CMOS circuits can avoid static bias but at the cost of speed.

This work describes the development of a 25-Gb/s clock and data recovery (CDR) circuit and a deserializer that, through the use of “charge steering” and other innovations, achieve a twenty-fold reduction in power dissipation with respect to the prior art. Realized in 65-nm CMOS technology, an experimental prototype draws 5 mW from a 1-V supply, exhibiting an integrated clock jitter of 1.52 ps,rms and a jitter tolerance of 0.5 unit interval (UI) at a jitter frequency of 5 MHz.
The dissertation of Jun Won Jung is approved.

Mario Gerla

Sudhakar Pamarti

Alan Laub

Behzad Razavi, Committee Chair

University of California, Los Angeles

2012
To my parents ...
# Table of Contents

1 Introduction
   1.1 Motivation
   1.2 Organization

2 Charge-Steering Circuits
   2.1 Concept
   2.2 Charge-Steering Latch
      2.2.1 Operation
      2.2.2 Small-signal Gain and Swing Calculation
      2.2.3 Design Consideration
   2.3 Charge-Steering Flipflops
      2.3.1 Design Issues
      2.3.2 NRZ Charge-Steering Latch
      2.3.3 Cascading NRZ Charge-Steering Latches
      2.3.4 The Proposed Charge-Steering FF
   2.4 Comparison with Other Circuit Topologies
      2.4.1 Current-Steering Circuits
      2.4.2 Rail-to-Rail Circuits

3 Design of a 25-Gb/s 5-mW CDR/Deserializer in 65-nm CMOS Technology
   3.1 CDR
LIST OF FIGURES

1.1 Power consumption of data center [1]
1.2 Trends for Aggregate microprocessor I/O Bandwidth [2]
1.3 Publications in I/O transceivers
1.4 Generic broadband receiver
1.5 Current-steering latch
1.6 Power consumption of the state-of-the-art CDRs
2.1 Elements in current steering and charge steering
2.2 Transformation of current steering to charge steering
2.3 Operation of charge steering
2.4 Example of charge-steering latch
2.5 Simulated waveform of charge-steering latch
2.6 Power scaling of charge-steering latch
2.7 Input/output characteristics of RZ charge-steering latch
2.8 Design parameters of charge-steering latch
2.9 Power consumption with different load capacitance
2.10 $V_B$ and $V_{out}$ when $C_T = 40$ fF
2.11 Circuit model for output swing
2.12 $\Delta V_{out}$ as a function of $C_T$
2.13 Power consumption due to the variation of $C_T$
2.14 Eye diagram of $V_{out}$
2.15 Effects of $W_5$
3.12 Conventional XOR gate
3.13 Error XOR gate
3.14 Reference XOR gates
3.15 V/I converter and loop filter
3.16 The second DMUX level
3.17 1:2 DMUX with charge-steering latches
3.18 Simulated waveforms of clocks and 1:2 DMUX
3.19 (a) Rail-to-rail latch, (b) operation of the latch, (c) new latch, (d) simulated speed of the dividers, (e) simulated power consumption of the dividers
3.20 RZ-to-NRZ conversion with RS latch
3.21 (a) RZ-to-NRZ conversion, (b) proposed comparator
3.22 (a) 1:2 DMUX with RZ-to-NRZ conversion, (b) circuits in one arm
3.23 Simulated waveforms of RZ-to-NRZ conversion
3.24 Complete architecture
3.25 Locked phase noise profile for jitter calculations
3.26 (a) P-N LC VCO, (b) N-only LC VCO
3.27 Two scenarios of driving the load by the VCO
3.28 Simulated VCO characteristics
3.29 Simulated recovered clocks from VCO

4.1 Die photograph of the prototype
4.2 Die photograph of the core
4.3 Picture for test setup
4.4 The measured 25-Gb/s input data
4.5 Basic test setup for CDR
4.6 Test setup for jitter transfer
4.7 Conversion of one sideband to AM and FM, respectively
4.8 Test setup for jitter tolerance
4.9 VCO characteristics
4.10 The recovered clock spectrum
4.11 The recovered data
4.12 Jitter transfer
4.13 Jitter tolerance
4.14 Summary of the CDR performance
List of Tables

3.1 Published demultiplexers
3.2 Simulated power dissipation
4.1 Performance summary
Acknowledgments

Recalling my Ph.D. years, I have to admit that I was fortunate to have all the opportunities, support, and people at UCLA. These enabled me to get through the numerous challenges that I encountered.

First of all, I would like to express my gratitude to my advisor, Professor Razavi. It was a great honor to study for the Ph.D. degree under his guidance. His enthusiasm and intuition inspired me to set high standards and to move towards my goals. While he patiently advised me to move in the right direction, I also gained valuable knowledge from his lectures. His lectures on circuits not only teach in-depth knowledge, but also stimulate students’ curiosity to delve into the topic. I also would like to thank Professors Laub, Pamarti, and Gerla for serving on my committee.

It was a great experience to work with the members of our research group: Joung Won, Ali, Sedigheh, Woods, Bibhue, Marco, Hegong, Hyuk, Chuangkang, and Joseph. I would like to thank all of the group members. I shared knowledge and had valuable discussions with them. In particular, Joung Won spent a lot of time with me discussing many topics, and we could share the measurement setup without any trouble. Also, Ali helped me design the VCO in the CDR. Picnics and gatherings with group members are another pleasant memory of my Ph.D. years. I also thank TI and Realtek technology for supporting my research and TSMC for chip fabrication.

I was also fortunate to meet many friends who helped me and shared memories at UCLA. I would like to thank all these friends: Dong-U, Brandon, Jintae, Kyungsu, Minjae, Jongsun, Brian, Henry, Wonho, and Albert. I also appreciate my old friends: Younghun, Sangjoon, Seungkeun, Junyeol, Sangbum, and Yonghak. They encouraged me to work hard and sometimes helped me refresh.

I appreciate my younger sister, Seungyeon, who completed the Ph.D. program at Harvard University during the same period. She tried to encourage me and occasionally emailed me good quotes from books.

Lastly, I would like to thank my parents for their sacrifices and trust. Even when my father had an accident at a mountain, they wanted me to focus on the research. I hope for them to always be healthy so that I can return their love and support.
Vita

1979 Born, Seoul, South Korea.

2004 B.S. (Electrical Engineering), Seoul National University, Seoul.

2008 Intern, Sabio Lab, Mountain View, California.

2008 M.S. (Electrical Engineering), UCLA, Los Angeles, California.

2008 Intern, Magma Design Automation, San Jose, California.

2009 Intern, Samsung Electronics, Korea.

2008–2010 Teaching Assistant, Electrical Engineering Department, UCLA.

2008–present Research Assistant, Electrical Engineering Department, UCLA.

Publications

CHAPTER 1

Introduction

1.1 Motivation

With the rapid proliferation of wireless devices such as cell phones, GPS, and tablets, the power consumption of chips in these devices has become an inevitable issue. This is obvious because they are operated with a battery. As more functions are being added in one mobile device, the problem is getting worse. A single device must now support multiple RF communications, multi-core processors, memories, and sensors, while the capacity of the battery remains relatively constant due to the small form factor of mobile devices. Multi-level low-power optimization from the physical layer to the top layer has been extensively developed to address the issue in these applications.

However, this issue is not as straightforward in high-performance chips such as those in server applications, since they have dedicated supplies. As the technologies evolve, power efficiency becomes more critical in those areas. The power density also scales up, generating heat that requires serious cooling. The heat generation must be kept below the level that a cost-effective cooling system can handle. Another good example is data centers for the internet, the most active place where wireline communication occurs. According to [1], data centers consumed 61 billion kWh of electricity in 2006, 1.5% of all U.S. electricity consumption. As shown in Fig. 1.1, the power consumption of data centers has rapidly increased, and this trend will continue as the demand explodes. Recent increases in cloud computing services will also accelerate the demand. The power consumption of high-performance chips thus must be minimized as well.

In both cases, microprocessors require high-bandwidth interfaces to communicate with memory, co-processors, and peripheral components. As we add more computing power and functions with technology scaling, system I/O bandwidth scales accordingly. In wireline communication and particularly serial links, the aggregate bandwidth continues to increase at a rate of 2 to 3X every two years [2], as shown in Fig. 1.2. In order to manage such bandwidth with reasonable power consumption, designers are targeting an efficiency of 1 mW/Gb/s for entire transceivers. Moreover, per-pin data rates have also increased rapidly, as the pin counts have increased at a relatively moderate rate. As the data rate increases, more stringent equalization and clocking are necessary, making it more difficult to achieve this efficiency at higher data rates.

One way to investigate people’s actual interests is to check what they have published. Figure 1.3 shows recent publication trends in I/O transceivers. Designers are pursuing low-power solutions at data rates of 10 Gb/s to 30 Gb/s rather than higher speeds. Recently, the IEEE 802.3 community has been trying to define a 4-lane 100-Gb/s backplane PHY [3]. 25-Gb/s receivers are therefore a good research area for next-generation high-speed I/O. A receiver must not only support a BER of $10^{-12}$ or better, but also achieve the target power efficiency.

Figure 1.4 shows a generic broadband receiver consisting of an analog front end (possibly including an equalizer), a CDR loop, and a demultiplexer (DMUX). The CDR circuit comprises a phase detector (PD), a low-pass filter (LPF), and a voltage-controlled oscillator (VCO). We observe that the PD, the DMUX, and the frequency dividers incorporate nearly a dozen latches, potentially consuming a large amount of power. It is therefore desirable to develop high-speed low-power latches and minimize their number in a receiver.

Conventional high-speed latches are designed in current steering, as shown in Fig. 1.5. The trade-off between power and speed limits this topology. For a given capacitance C, the load resistor R sets both the output bandwidth, 1/(RC), and the output swing, IR; reducing R for speed therefore demands a larger tail current to maintain the swing, and hence the power consumption rises. It is necessary to devise a new circuit topology that allows low-power operation at a speed similar to that of current steering.

Figure 1.5: Current-steering latch

The choice of the latch topology is governed not only by its intrinsic speed and power drain but also by its environment: (1) The received data typically does not have rail-to-rail swings and may impose a severe power or intersymbol interference (ISI) penalty if it is amplified to such levels; the latches must thus operate with moderate data amplitudes (e.g., \( \sim 400 \, \text{mV}_{pp} \) single-ended). The important implication here is that the data cannot easily sample the clock, dictating PD topologies in which the clock samples the data. (2) The clock can provide nearly rail-to-rail swings if the CDR circuit employs an LC oscillator, but the power consumed by clock buffers (\( \approx fCV_{DD}^2 \)) may become prohibitively large.
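As a rough illustration of the clock-buffer estimate in item (2), the short sketch below evaluates the dynamic power of a rail-to-rail clock buffer. The clock frequency and load capacitance are hypothetical values chosen only for illustration; they are not taken from this design.

```python
def clock_buffer_power(freq_hz: float, cap_farads: float, vdd_volts: float) -> float:
    """Dynamic power (in watts) of a buffer driving a rail-to-rail clock: f * C * VDD^2."""
    return freq_hz * cap_farads * vdd_volts ** 2

# Hypothetical operating point (assumed, not from the dissertation):
# 12.5-GHz clock, 100 fF of total clock load, 1-V supply.
p_watts = clock_buffer_power(12.5e9, 100e-15, 1.0)
print(f"{p_watts * 1e3:.2f} mW")  # prints "1.25 mW"
```

Under these assumed numbers, a single buffered clock phase already costs about a quarter of a 5-mW budget, which suggests why the clock loading must be kept small.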

Figure 1.6 plots the power consumption of state-of-the-art CDRs in various technology nodes and data rates. They all lie above the 1-mW/Gb/s efficiency envelope. In particular, at 25 Gb/s, the power consumption of the prior art needs to be reduced by more than an order of magnitude to achieve an efficiency of 1 mW/Gb/s for entire transceivers.
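The efficiency figures quoted so far can be checked with simple arithmetic. The sketch below uses only numbers stated in this dissertation (the 1-mW/Gb/s transceiver target, and the 25-Gb/s, 5-mW prototype); the function name is introduced here for illustration.

```python
def efficiency_mw_per_gbps(power_mw: float, rate_gbps: float) -> float:
    """Power efficiency in mW/Gb/s."""
    return power_mw / rate_gbps

TARGET_MW_PER_GBPS = 1.0   # target for the entire transceiver
RATE_GBPS = 25.0

# Total transceiver budget implied by the target at 25 Gb/s.
transceiver_budget_mw = TARGET_MW_PER_GBPS * RATE_GBPS

# Efficiency of the 5-mW CDR/deserializer reported in this work.
this_work = efficiency_mw_per_gbps(5.0, RATE_GBPS)

print(f"transceiver budget: {transceiver_budget_mw:.0f} mW")
print(f"CDR/deserializer:   {this_work:.1f} mW/Gb/s")
```

The CDR/deserializer thus uses one fifth of the 25-mW transceiver budget, leaving the remainder for the other building blocks.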
1.2 Organization

This dissertation describes the design of a 25-Gb/s 5-mW CDR/deserializer in 65-nm CMOS technology. To achieve the power efficiency required for next-generation I/O, this work exploits new circuit techniques and topologies.

Chapter 2 introduces the charge-steering concept, which allows low-power operation at high speeds. It discusses the basic operation and compares a charge-steering flipflop with other circuit techniques. Chapter 3 discusses the CDR architecture, showing the trade-offs among the architectures and their operation, and describes the design of the CDR/deserializer in detail. It tailors the CDR so that it can operate with charge-steering circuits. The deserializer also includes RZ-to-NRZ conversion and uses charge-steering latches in addition to low-power solutions at both the circuit and architecture levels. Chapter 4 presents experimental results. Chapter 5 summarizes the dissertation and offers some ideas for future work.
CHAPTER 2

Charge-Steering Circuits

2.1 Concept

The use of charge steering can be traced back to regenerative BiCMOS comparators introduced in the early 1990s [4, 5]. In this work, we extend the idea to non-regenerative and flipflop (FF) circuits, exploit charge steering to realize high-speed phase detectors and demultiplexers, and architect the CDR and the deserializer so as to circumvent this technique’s drawbacks.

As discussed in the introduction, high-speed circuits are conventionally designed in current steering. Current-steering circuits, or current-mode logic (CML), normally have a current source that provides a constant flow of charge. An input differential pair steers this flow to load resistors, which set the output level. The current and the load resistors are the key elements that define current steering. This topology can operate at relatively high speed with moderate swings. However, due to the constant current, this topology consumes more power than dynamic logic such as rail-to-rail circuits, which do not draw current most of the time.

On the other hand, the basic idea of charge steering is to steer charge rather than current. In other words, we want to steer a constant ‘amount’ of charge rather than a constant ‘flow’ of charge. To define the constant amount of charge, we need to modify the current source so that it provides a well-defined charge to a differential pair. One way to accomplish this is to set a fixed voltage swing across a capacitor. The new source can no longer produce the proper output level with load resistors. Instead, the load capacitance, defined by parasitics from the devices and loading from the next stages, can replace the resistors. The ratio of the charge provided to the load capacitance then defines the output swing.

In summary, we can transform the conventional current-steering differential pair into a charge-steering circuit, as shown in Fig. 2.2. Current is replaced with charge, the tail current source is replaced with a capacitor, and the load resistors are replaced with capacitors. We also need a few reset switches. The detailed operation of the charge-steering circuit is described in the next section.

2.2 Charge-Steering Latch

2.2.1 Operation

The operation of a charge-steering circuit is divided into two modes in order to realize the charge steering described in the previous section. As illustrated in Fig. 2.3, when the clock is low, the circuit enters the reset mode, and the switches are configured so that $C_T$ is discharged to ground, the $C_D$’s are precharged to $V_{DD}$, and the differential pair is off. During this mode, the components reach the ready state for charge-steering operation. When the clock is high, the circuit enters the evaluation mode, and the switches are configured so that the output nodes X and Y are released and $C_T$ is switched into the tail of the differential pair, drawing current from the $C_D$’s until node B rises to about one threshold below the higher input level. The input differential voltage is amplified during this period. $C_T$ and the peak value of $V_B$ define the amount of charge for evaluation, and the output swings at nodes X and Y settle to a level specified by the load capacitance and this charge, much as the outputs of a current-steering circuit settle. The circuit therefore operates with moderate signal swings similar to those in CML circuits. These modes are repeated every clock cycle.

Operating with moderate signal swings and consuming power for only a fraction of the clock cycle, charge steering affords a design style faster than rail-to-rail logic and less power-hungry than current steering. However, charge-steering logic (CSL) must deal with two issues: (1) the switches need a rail-to-rail clock, and (2) it requires a reset phase, unlike CML, generating return-to-zero (RZ) outputs. These issues will be addressed in the next chapter.

Figure 2.3: Operation of charge steering

Figure 2.4: Example of charge-steering latch

Figure 2.4 shows one design example of the charge-steering latch. 2-Gb/s PRBS data with a 300-mV swing and an 850-mV common-mode level is applied as the input, and the clock is at full rate. At a 2-Gb/s data rate, the full-rate clock generally looks square rather than sinusoidal in 65 nm, but as the data rate increases, the clock approaches a sinusoidal waveform, which may degrade the speed. To cover the worst-case scenario, a sinusoidal clock is therefore used for this simulation.

Figure 2.5: Simulated waveform of charge-steering latch

The simulated waveforms are shown in Fig. 2.5. When the clock is low, the tail capacitor is discharged to ground and both output nodes are precharged to $V_{DD}$, entering the reset mode. When the clock is high, the tail capacitor is connected to the input differential pair and charged up to about one threshold below the higher input level. The output is latched during this period, resulting in a swing of about 400 mV. At this data rate, the circuit draws about 23 $\mu$W.

It is interesting to see how the power consumption scales, since the circuit mainly consumes dynamic power. Using the design in Fig. 2.4, we vary the input data rate and clock frequency at the same time. Figure 2.6 reveals another nice feature of this topology: its dynamic power simply scales with frequency. This means that we can reuse the design for lower frequencies as well. Similarly, we can scale all of the components as the load capacitance changes.
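The linear scaling can be sketched numerically. The example below takes the single operating point quoted above (about 23 µW with a 2-Gb/s input, i.e., a 2-GHz full-rate clock) and assumes that the energy drawn per clock cycle stays constant with frequency. This is an idealization of the behavior in Fig. 2.6, not a reproduction of its data, and the function names are introduced here for illustration.

```python
# Reference point quoted in the text: about 23 uW at a 2-GHz full-rate clock.
REF_POWER_W = 23e-6
REF_CLOCK_HZ = 2e9

def energy_per_cycle(power_w: float, clock_hz: float) -> float:
    """Energy drawn from the supply in one clock cycle, in joules."""
    return power_w / clock_hz

def scaled_power(clock_hz: float) -> float:
    """Power at another clock frequency, assuming constant energy per cycle."""
    return energy_per_cycle(REF_POWER_W, REF_CLOCK_HZ) * clock_hz

print(f"energy per cycle: {energy_per_cycle(REF_POWER_W, REF_CLOCK_HZ) * 1e15:.1f} fJ")
for f_ghz in (1, 2, 4):
    print(f"{f_ghz} GHz -> {scaled_power(f_ghz * 1e9) * 1e6:.1f} uW")
```

The assumption holds only while the circuit still settles fully in each phase; at frequencies where reset or evaluation becomes incomplete, the charge per cycle changes and the linear model no longer applies.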

Figure 2.6: Power scaling of charge-steering latch

2.2.2 Small-signal Gain and Swing Calculation

The design of CSL circuits demands simple, intuitive expressions quantifying the performance. To estimate the differential output voltage swing of the CSL latch shown in Fig. 2.2(b), let us assume simple square-law MOS devices and, neglecting subthreshold conduction, note that $V_P$ takes infinite time to reach $V_{CM} - V_{TH}$, where $V_{CM}$ denotes the input common-mode (CM) level and $V_{TH}$ the threshold voltage of $M_1$ and $M_2$. We wish to determine the time, $\Delta T$, necessary for $V_P$ to rise to $V_{CM} - V_{TH} - \Delta V$, where $\Delta V$ is somewhat small and arbitrary and, as seen below, eventually unimportant. Merging $M_1$ and $M_2$ and viewing the composite device as a source follower, one can prove that $\Delta T$ is given by [6]:

$$\Delta T \approx \frac{2 C_T}{\mu_n C_{OX} \frac{W}{L}} \cdot \frac{V_{CM} - V_{TH} - \Delta V}{(V_{CM} - V_{TH})\,\Delta V}.$$

(2.1)

The average current drawn by $C_T$ during this time is equal to

$$I_{av} = \frac{(V_{CM} - V_{TH} - \Delta V) C_T}{\Delta T} = \frac{1}{2} \mu_n C_{OX} \frac{W}{L} (V_{CM} - V_{TH}) \Delta V.$$  

(2.2)

Also, the overdrive voltage of $M_1$ and $M_2$ varies from $V_{CM} - V_{TH}$ to $\Delta V$, yielding an average roughly given by $(V_{CM} - V_{TH} + \Delta V)/2$. The average transconductance of the input transistors thus emerges as

$$g_{m,avg} \approx \frac{2I_D}{V_{GS} - V_{TH}} \approx \frac{\mu_n C_{OX} \frac{W}{L} (V_{CM} - V_{TH}) \Delta V}{V_{CM} - V_{TH} + \Delta V}.$$  

(2.3)

For a small differential input, this transconductance produces a proportional differential current for $\Delta T$ seconds, generating a differential output voltage equal to

$$V_{out} \approx \frac{g_{m,avg} V_{in} \Delta T}{C_D} \approx 2\,\frac{V_{CM} - V_{TH} - \Delta V}{V_{CM} - V_{TH} + \Delta V}\,\frac{C_T}{C_D}\,V_{in}.$$

(2.4)

The small-signal voltage gain is therefore given by

$$A_V \approx \frac{2C_T}{C_D}$$

(2.5)

if the circuit is allowed infinite time for charge steering (i.e., $\Delta V \to 0$).
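To see how quickly the gain approaches its limit, the following sketch evaluates the expression for $V_{out}/V_{in}$ above for several values of $\Delta V$. The 850-mV common-mode level is the one used in the design example; the 0.4-V threshold and the 5-fF capacitor values are assumptions made only for illustration.

```python
def csl_gain(v_ov: float, delta_v: float, c_t: float, c_d: float) -> float:
    """Small-signal gain V_out/V_in of the charge-steering latch.

    v_ov    : V_CM - V_TH, the initial overdrive of the input pair (V)
    delta_v : residual overdrive when the evaluation is stopped (V)
    """
    return 2.0 * (v_ov - delta_v) / (v_ov + delta_v) * c_t / c_d

V_CM = 0.85          # input common-mode level from the design example (V)
V_TH = 0.40          # assumed threshold voltage (V)
C_T = C_D = 5e-15    # assumed tail and load capacitances (F)

for delta_v in (0.1, 0.05, 0.01, 0.0):
    gain = csl_gain(V_CM - V_TH, delta_v, C_T, C_D)
    print(f"delta_V = {delta_v:.2f} V -> gain = {gain:.2f}")
```

With $\Delta V = 0$ the function returns exactly $2C_T/C_D$; a finite $\Delta V$, corresponding to a finite evaluation time, lowers the gain.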

The upper bound on the output swing occurs when the input differential voltage is large enough to keep one transistor off for most of the charging period, a desirable condition in latch design. In this case, \( C_T \) draws most of its charge from one of the load capacitors, yielding in the limit a differential output voltage of approximately

$$V_{out} = \frac{C_T}{C_D}\left(V_{CM} + \frac{V_{in}}{2} - V_{TH}\right) - 0.4\,\frac{C_T}{C_D}\,(V_{CM} - V_{TH}).$$

(2.6)

The foregoing derivations are verified by circuit simulations. Figure 2.7 plots the output voltage as a function of the input voltage along with the prediction made by Eqs. (2.4) and (2.6). Despite the oversimplified square-law model and the use of averages, we observe reasonable agreement.
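As a numerical illustration of the large-signal limit, the sketch below evaluates the upper-bound swing expression given above. The 850-mV common-mode level and the 300-mV input are taken from the design example; the 0.4-V threshold and the capacitor values are assumptions, so the resulting numbers are indicative only.

```python
def csl_max_swing(v_cm: float, v_in: float, v_th: float, c_t: float, c_d: float) -> float:
    """Upper bound on the differential output swing (V) when one input
    transistor stays off for most of the charging period."""
    ratio = c_t / c_d
    return ratio * (v_cm + v_in / 2.0 - v_th) - 0.4 * ratio * (v_cm - v_th)

V_CM, V_IN = 0.85, 0.30   # from the design example (V)
V_TH = 0.40               # assumed threshold voltage (V)
C_T = 5e-15               # assumed tail capacitance (F)

for c_d in (5e-15, 10e-15):   # assumed load capacitances (F)
    swing = csl_max_swing(V_CM, V_IN, V_TH, C_T, c_d)
    print(f"C_D = {c_d * 1e15:.0f} fF -> swing = {swing * 1e3:.0f} mV")
```

Doubling $C_D$ halves the predicted swing, since both terms of the expression scale with $C_T/C_D$.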

2.2.3 Design Consideration

In the previous section, we discussed the general operation of the charge-steering latch. This section focuses on how to design the circuit in detail. There are six design parameters, as shown in Fig. 2.8: the input differential pair, \( M_{1,2} \); the load capacitance, \( C_D \); the tail capacitance, \( C_T \); the two sets of reset switches, \( M_{3,4} \) and \( M_5 \); and the evaluation switch, \( M_6 \). Each design element must be carefully chosen to make the circuit operate properly.

Figure 2.7: Input/output characteristics of RZ charge-steering latch.

Figure 2.8: Design parameters of charge-steering latch

Let us assume in the rest of this section that most of the charge is steered from one of the output nodes. The input differential pair for current steering must be sized to steer most of the current, i.e., for a $V_{in}$ of $\sqrt{2}V_{ov1,2}$ [6]. The same principle applies to charge steering. In addition, for high-speed operation, we may increase the width of the transistors to reduce the resistance in the signal path, because $M_{1,2}$ work as switches in the evaluation phase. However, we cannot increase the size indefinitely, for two reasons: (1) it also increases the load capacitance, reducing the output swing, and (2) the preceding stage suffers from the increased input capacitance.

The load capacitance is normally set by the subsequent building block or the specification. It is nevertheless important to understand the effect of \( C_D \), because we can optimize circuits at the system level and manage the input capacitance of the following stages. As shown in Eqs. 2.5 and 2.6, the output swing is inversely proportional to \( C_D \). The power consumption, on the other hand, is relatively constant, because the charge drawn in one period depends on \( C_T \) and \( V_B \). \( \Delta V_B = V_{CM} + V_{in}/2 - V_{TH} \) is mainly defined by the higher input level and the threshold of the input transistors. Figure 2.9 shows that the power consumption changes by only about 0.9% when \( C_D \) increases by a factor of two. In a similar manner, current-steering circuits consume almost constant power while the output swing decreases as the load resistance decreases.

Figure 2.9: Power consumption with different load capacitance

Note that Eq. 2.6 is valid for moderate output swings, where \( V_B \) remains below the lower output level, as shown in Fig. 2.5. As \( C_T \) increases, \( \Delta V_{out} \) increases and the circuit enters the region where \( V_B = V_{DD} - \Delta V_{out} \). In this region, \( V_B \) rises only to a level below \( \Delta V_B \), as shown in Fig. 2.10, and hence the swing at node B is no longer constant, making Eq. 2.6 invalid. To find an equation for this region, we can model the settling of the output node with a switch resistance and two capacitors, as described in Fig. 2.11.

Figure 2.10: $V_B$ and $V_{out}$ when $C_T = 40$ fF

Figure 2.11: Circuit model for output swing

In the reset mode, $C_T$ is discharged to ground and $C_D$ is precharged to $V_{DD}$. In the evaluation mode, these two capacitors are connected through a switch resistance. With a moderate output swing, after node B rises to $\Delta V_B$, $R_{SW}$ becomes so large that the capacitors are disconnected and node X or Y stays above $\Delta V_B$. In the other region, $R_{SW}$ stays low until $V_B = V_{X,Y}$, meaning that $V_B \leq \Delta V_B$. Charge sharing occurs between $C_T$ and $C_D$, leading to another expression for the output swing, given in Eq. 2.7. The output swing is no longer linearly proportional to $C_T$.

$$
\Delta V_{out} = \frac{C_T}{C_D + C_T} V_{DD}, \quad \Delta V_{B,NEW} = \frac{C_D}{C_D + C_T} V_{DD}
$$

(2.7)
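The charge-sharing result can be checked with a few lines of arithmetic. The sketch below computes the final node voltages from charge conservation for a load capacitor precharged to $V_{DD}$ and a tail capacitor reset to ground. The 1-V supply matches the prototype; the capacitor values are assumptions chosen only to show the sub-linear dependence on $C_T$.

```python
def charge_sharing(c_t: float, c_d: float, vdd: float) -> tuple[float, float]:
    """Return (delta_v_out, v_b) after C_D (precharged to VDD) and C_T
    (reset to ground) are connected and settle to a common voltage."""
    v_common = c_d * vdd / (c_d + c_t)   # charge conservation: C_D*VDD = (C_D + C_T)*V
    delta_v_out = vdd - v_common         # drop at the output node
    return delta_v_out, v_common

VDD = 1.0      # supply voltage (V)
C_D = 20e-15   # assumed load capacitance (F)

for c_t in (10e-15, 20e-15, 40e-15):   # assumed tail capacitances (F)
    dv_out, v_b = charge_sharing(c_t, C_D, VDD)
    print(f"C_T = {c_t * 1e15:.0f} fF -> dV_out = {dv_out:.3f} V, V_B = {v_b:.3f} V")
```

In this model, quadrupling $C_T$ from 10 fF to 40 fF only doubles the swing (from about 0.33 V to about 0.67 V), illustrating why the gain falls off outside the linear region.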

Figure 2.12 shows the simulated $\Delta V_{out}$ as a function of $C_T$. In the linear region, up to a $C_T$ of about 10 fF, the output swing is linearly proportional to $C_T$. Beyond this point, the gain starts to decrease and the output swing follows Eq. 2.7.

Figure 2.12: $\Delta V_{out}$ as a function of $C_T$

Since the amount of charge drawn from the supply scales with $C_T$, the power consumption also increases linearly up to 10 fF, as shown in Fig. 2.13. Beyond that, the power consumption follows a pattern different from that of Fig. 2.12 because, at some point, $V_B$ rises only to such a low level that both input transistors remain on, drawing additional current. It is better to avoid this region because it is less power efficient.

The reset switches, $M_{3,4}$, precharge the output nodes during the reset mode. They must be wide enough to settle the output nodes to $V_{DD}$ within half of the clock period. Otherwise, the circuit would suffer from inter-symbol interference (ISI). Figure 2.14 compares the eye diagrams of $V_{out}$ in two different cases, showing the ISI with smaller switches.

Figure 2.13: Power consumption due to the variation of $C_T$

Figure 2.14: Eye diagram of $V_{out}$

Another reset switch, $M_5$, discharges $C_T$ to ground during the reset mode. It must also be wide enough to reset node B to ground within the given period. Figure 2.15 shows the simulated waveforms, comparing two different settling cases. When $C_T$ cannot be fully discharged to ground, the effective $\Delta V_B$ decreases and hence the output swing also decreases. In order to restore the output swing, we could increase $C_T$. However, this is not recommended, since $\Delta V_B$ would become more prone to PVT variations.

Figure 2.15: Effects of $W_5$

$M_6$ is the only switch that is on during the evaluation phase. In this period, this switch acts as part of $R_{SW}$ in Fig. 2.11. As with the input differential pair, this switch must be wide enough for high-speed operation, so that node B rises to $\Delta V_B$, as the $V_B$ waveform for $W_6 = 8\,\mu\text{m}$ does in Fig. 2.16. Otherwise, node B reaches a level lower than $\Delta V_B$, reducing the output amplitude, as also shown in Fig. 2.16. As with $M_5$, this is not desirable because of PVT variations.

In summary, for a given load capacitance, speed, and output swing, the design procedure for a charge-steering latch is as follows: (1) choose $W_{3,4}$ so that it can precharge the given $C_D$, (2) find the $C_T$ that provides the required output swing, (3) choose $W_{1,2}$ for complete steering, (4) set $W_6$ for the speed, (5) choose $W_5$ for discharging $C_T$, and lastly (6) adjust $C_T$ and $W_{3,4}$, considering the parasitics added to the output nodes and node B.

2.3 Charge-Steering Flipflops

2.3.1 Design Issues

While saving considerable power, charge steering does face a number of issues that make the design challenging. First, to drive the tail and output switches in Fig. 2.2(b), a rail-to-rail clock is necessary, demanding that clock generation and latch design be co-optimized. Second, a CSL stage spends about one-half of the clock period, $T_{CK}$, in the reset mode, producing a return-to-zero (RZ) output. This attribute may be considered an advantage or a disadvantage. The reset operation actively removes ISI, a point of contrast to the “passive” continuous-time decay in CML circuits. However, it also demands a dedicated fraction of the clock cycle, tightening the timing budget for amplification and latching. Moreover, the RZ output must be converted to non-return-to-zero (NRZ) format at some point.

Figure 2.17: Design example of charge-steering FF

The RZ output issue manifests itself when two CSL stages must be cascaded. Consider, for example, the master-slave flipflop shown in Fig. 2.17(a). If $CK_1$ and $CK_2$ are simply complementary, then the slave stage begins to sense when the master outputs begin to reset. Thus, if the reset operation happens to be faster than the sense operation (e.g., in the slow-NMOS, fast-PMOS corner of the process), then the slave may produce a small differential output.

The above difficulty can be remedied by more complex clocking. Depicted in Fig. 2.17(b) is an example where $CK_1$ and $CK_2$ are offset by about one-quarter of the clock period so that the master provides unreset outputs to the slave for $T_{CK}/4$ seconds. However, the generation and buffering of such clock phases at high frequencies consume substantial power.

### 2.3.2 NRZ Charge-Steering Latch

It is possible to avoid the reset mode by merging it with the sense mode. This requires that the input and output nodes be the same! Depicted in Fig. 2.18(a), such a topology provides an NRZ output. In the sense mode, switches $S_1$ and $S_2$ are on, allowing $X$ and $Y$ to track the input, and $S_3$ is on, discharging $C_T$. When
$S_1$-$S_3$ turn off and $S_4$ turns on, the circuit begins to regenerate, thus amplifying $V_X - V_Y$ and holding the result.

We wish to estimate the small-signal voltage gain of this latch in the regeneration mode. Consider the simplified circuit shown in Fig. 2.18(b), where $R_{on}$ represents the on-resistance of $S_4$. To determine the upper bound on the gain, let us assume that (1) the latch begins with a small imbalance, $V_{XY_0}$, and (2) $M_1$ and $M_2$ are so wide that their gate-source voltage varies negligibly while $C_T$ charges. The soundness of these assumptions is checked below.

We now write the tail current as

$$I_T(t) = \frac{V_{CM} - V_{GS}}{R_{on}} \exp \left( -\frac{t}{R_{on}C_T} \right),$$

(2.8)

where $V_{CM}$ denotes the input CM level. In the design used here, the transistors mostly operate in the subthreshold region, exhibiting a transconductance of $g_m \approx I_D/(\zeta V_T)$, where $\zeta$ is related to the subthreshold slope and given by $1+C_d/C_{ox}$ ($C_d$ is the depletion region capacitance under the channel). Since $I_{D1} \approx I_{D2} \approx I_T/2$, the time variant transconductance of each transistor is estimated as

$$g_m(t) = \frac{1}{2\zeta V_T} \frac{V_{CM} - V_{GS}}{R_{on}} \exp \left( -\frac{t}{R_{on}C_T} \right).$$

(2.9)

We also express the regeneration action by the following equations:

$$-C_D \frac{dV_X}{dt} = g_{m1}V_Y$$

(2.10)

$$-C_D \frac{dV_Y}{dt} = g_{m2}V_X,$$

(2.11)

and hence

$$C_D \frac{dV_{XY}}{dt} = g_m V_{XY},$$

(2.12)

where $g_{m1} = g_{m2} = g_m$. It follows from (2.9) and (2.12) that

$$C_D \frac{dV_{XY}}{V_{XY}} = \frac{V_{CM} - V_{GS}}{2\zeta V_T R_{on}} \exp \left( -\frac{t}{R_{on}C_T} \right) dt.$$  

(2.13)
Integration of both sides for $t = 0$ to $t = \infty$ yields

$$C_D \ln \frac{V_{XY\infty}}{V_{XY0}} = \frac{V_{CM} - V_{GS}}{2\zeta V_T} C_T,$$

(2.14)

and, therefore,

$$\frac{V_{XY\infty}}{V_{XY0}} = \exp \left( \frac{C_T}{C_D} \frac{V_{CM} - V_{GS}}{2\zeta V_T} \right).$$

(2.15)

The maximum output swing occurs if $V_{XY0}$ is large enough to keep one transistor off. In this case, no regeneration takes place and the output swing is given by Eq. (2.6).
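As a rough numerical check of Eqs. (2.9), (2.12), and (2.15), the following sketch integrates the regeneration equation with the decaying transconductance and compares the result with the closed-form upper bound. All component values are hypothetical placeholders chosen only for illustration; they are not taken from the prototype.

```python
import math

def gain_closed_form(c_t, c_d, v_cm, v_gs, zeta, v_t):
    """Upper-bound regeneration gain V_XYinf / V_XY0 from Eq. (2.15)."""
    return math.exp((c_t / c_d) * (v_cm - v_gs) / (2.0 * zeta * v_t))

def gain_numeric(c_t, c_d, v_cm, v_gs, zeta, v_t, r_on, n_tau=40.0, steps=20000):
    """Integrate Eq. (2.12) using g_m(t) of Eq. (2.9) with the midpoint rule."""
    tau = r_on * c_t                      # time constant of the tail current decay
    dt = n_tau * tau / steps
    ln_gain = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        g_m = (v_cm - v_gs) / (2.0 * zeta * v_t * r_on) * math.exp(-t / tau)
        ln_gain += g_m * dt / c_d         # d(ln V_XY) = g_m(t) dt / C_D
    return math.exp(ln_gain)

# Hypothetical values (assumptions, not design data).
params = dict(c_t=4e-15, c_d=4e-15, v_cm=0.6, v_gs=0.35, zeta=1.5, v_t=0.0259)

print("closed form:", gain_closed_form(**params))
print("numeric    :", gain_numeric(r_on=500.0, **params))
```

Under these assumptions the two values should agree closely, and the numerical result should not depend on $R_{on}$, consistent with the absence of $R_{on}$ in Eq. (2.15).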

![Figure 2.19: Input/output characteristics of NRZ charge-steering latch.](image)

Figure 2.19 plots the simulated output voltage of the circuit as a function of the initial imbalance. The result predicted by Eq. (2.15) is also plotted with the assumption that the voltage drop across $R_{on}$ varies linearly from $V_{CM} - V_{GS}$ to zero. We note reasonable agreement.
2.3.3 Cascading NRZ Charge-Steering Latches

In view of the cascading issues illustrated in Fig. 2.17, we may contemplate a flipflop employing the above NRZ latch instead. As shown in Fig. 2.20(a), such a master-slave topology could, in principle, operate with only complementary clocks because it does not require a dedicated reset time. Unfortunately, this approach suffers from severe charge sharing between the master and slave nodes, introducing substantial ISI in random data. We recognize that for a random input sequence, the previous state at $X_2$ may be the opposite of the present state at $X_1$, causing a twofold reduction in the signal amplitude if the capacitances at these nodes are equal. Figure 2.20(b) shows the simulated waveforms at the four nodes, revealing severe corruption.

One may wonder if the master in Fig. 2.20(a) can be chosen 5 to 10 times larger than the slave so as to make the charge sharing negligible. However, the remaining ISI could be a problem, and the common-mode level degrades after cascading. Moreover, this scaling introduces large device sizes in the master latch, causing other issues for the circuits that drive it.

### 2.3.4 The Proposed Charge-Steering FF

The foregoing studies lead to the proposed charge-steering FF shown in Fig. 2.21 as a viable candidate. Here, the master is realized as the NRZ latch, thus avoiding the reset phase, and the slave as the original RZ latch, thus avoiding charge sharing. The circuit can therefore operate with complementary clocks. When the clock is low, as shown in Fig. 2.21(a), switches $S_1$ and $S_2$ are on, the input is sampled on nodes X and Y, and the slave latch enters the reset mode. Next, when the clock goes high, as shown in Fig. 2.21(b), $S_1$ and $S_2$ are turned off, and the cross-coupled pair and the slave latch are clocked. This technique provides two simultaneous amplifications in the two latches and large, robust swings at the output. This master latch cannot be used in a current-steering topology because its operation is based on the capacitance at nodes X and Y instead of resistors. While the output of the master latch is in NRZ form, the output of the FF remains in RZ form.

Figure 2.22(a) also shows the transistor widths and capacitance values as a design example for an input data rate of 25 Gb/s and a clock frequency of 12.5 GHz. Every transistor has a minimum channel length of 60 nm. A sinusoidal
clock is used for the simulation. Figure 2.22(b) plots the circuit’s simulated waveforms. With a single-ended input swing of 300 mV$_{pp}$, the master produces a swing of about 340 mV$_{pp}$ and the slave, about 500 mV$_{pp}$. The FF consumes 158 $\mu$W from a 1-V supply at this rate. It is possible to reduce the power by “linear” scaling of all of the devices [7], but at the cost of a higher offset. According to simulations, the above design exhibits an input-referred offset of about 6 mV, a comfortable value for input swings of a few hundred millivolts.

The proposed FF topology proves useful in the design of phase detectors and (de)multiplexers. However, it still produces RZ data, requiring additional techniques at the architecture level.

Figure 2.22: Design example of charge-steering FF
2.4 Comparison with Other Circuit Topologies

2.4.1 Current-Steering Circuits

A simple analysis can quantify the advantages of charge steering over current steering. Output swing, $\Delta V$, data rate, $r_b$, and load capacitance are given for this analysis as shown in Fig. 2.23. Assuming that charge or current is steered completely to one side, we can find equations for current drawn from the supply.

For current steering circuits, the output bandwidth must be about 0.7 times the bit rate, $r_b$, and the tail current, $I_{SS}$ in Eq. 2.16 is chosen to provide an output of $\Delta V$. For charge-steering circuits, the average current drawn from the supply is equal to the charge provided to each capacitance divided by the bit period and hence equal to the bit rate times $C\Delta V$ as shown in Eq. 2.17. This means that, for a given condition, charge steering saves power by a factor of $1.4\pi$, about 4.4, with respect to current steering. In practice, circuits do not steer 100% charge or current, and the actual power consumption for the given condition would increase a little for both cases.

\[
\frac{1}{2\pi RC} = 0.7r_b, \quad I_{SS} = \frac{\Delta V}{R} \implies I_{SS} = 2\pi(0.7r_b)C\Delta V \quad (2.16)
\]

\[
I_{avg} = \frac{C\Delta V}{T_b} = r_bC\Delta V \quad (2.17)
\]
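The comparison expressed by Eqs. (2.16) and (2.17) can be evaluated numerically as a quick sanity check. In the sketch below, the bit rate follows the text, while the load capacitance and output swing are hypothetical values assumed only for illustration; the ratio of the two currents is independent of them.

```python
import math

def current_steering_tail_current(bit_rate, c_load, swing):
    """I_SS from Eq. (2.16): 1/(2*pi*R*C) = 0.7*r_b and I_SS = swing / R."""
    return 2.0 * math.pi * 0.7 * bit_rate * c_load * swing

def charge_steering_avg_current(bit_rate, c_load, swing):
    """I_avg from Eq. (2.17): a charge of C*swing is delivered once per bit period."""
    return bit_rate * c_load * swing

r_b = 25e9        # bit rate, b/s (from the text)
c_load = 10e-15   # load capacitance, F (hypothetical)
swing = 0.4       # output swing, V (hypothetical)

i_ss = current_steering_tail_current(r_b, c_load, swing)
i_avg = charge_steering_avg_current(r_b, c_load, swing)
print(f"I_SS = {i_ss * 1e6:.1f} uA, I_avg = {i_avg * 1e6:.1f} uA, "
      f"ratio = {i_ss / i_avg:.2f}")
```

Both expressions assume complete steering, as stated above, so the ratio reduces to $1.4\pi$ regardless of the chosen $C$ and $\Delta V$.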
Let us repeat the CSL design of Fig. 2.22 in CML with the same power consumption (160 $\mu$W), supply voltage (1 V), and output swing ($\approx$400 mV$_{pp}$ single-ended). Each latch therefore has a current budget of 80 $\mu$A, requiring a load resistor of 5 k$\Omega$. Choosing a width of 0.8 $\mu$m for the transistors in the signal path and 0.23 $\mu$m for the clocked devices, we obtain the eye diagram shown in Fig. 2.24(b) if the FF has a fanout of two. The eye height decreases to 200 mV because of the limited bandwidth, and the small device dimensions lead to a large input-referred offset.
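The current budget and load resistor quoted above follow from simple arithmetic, sketched here. The sketch assumes that the FF power is split equally between its two latches and that the single-ended swing equals the tail current times the load resistor (complete steering); the function name is hypothetical.

```python
def cml_latch_budget(ff_power, v_dd, swing, latches_per_ff=2):
    """Return (tail current per latch, load resistor) for a CML flipflop.

    Assumes equal power per latch and swing = I_tail * R_load.
    """
    i_total = ff_power / v_dd
    i_latch = i_total / latches_per_ff
    r_load = swing / i_latch
    return i_latch, r_load

i_latch, r_load = cml_latch_budget(ff_power=160e-6, v_dd=1.0, swing=0.4)
print(f"{i_latch * 1e6:.0f} uA per latch, {r_load / 1e3:.1f} kOhm load")
```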

2.4.2 Rail-to-Rail Circuits

Rail-to-rail circuits could be another candidate for low-power operation, since they also do not consume static power. However, their expected power consumption is higher than that of charge steering because the output swing is rail-to-rail. Moreover, repeating the design in rail-to-rail logic is more difficult: the data swings at these rates are typically a few hundred millivolts, whereas rail-to-rail logic requires rail-to-rail inputs as well, so level-conversion circuits are needed at the front. Figure 2.25(a) shows an example of a rail-to-rail FF in the receiver. The rail-to-rail FF consumes 645 $\mu$W, while each input swing conversion circuit already consumes 331 $\mu$W. The design consumes more power than charge steering, and the simulated eye diagram in Fig. 2.25(b) shows that it suffers from severe ISI.
Figure 2.24: A design example of current-steering FF
Figure 2.25: A design example of rail-to-rail FF
CHAPTER 3

Design of a 25-Gb/s 5-mW CDR/Deserializer in 65-nm CMOS Technology

3.1 CDR

A clock and data recovery circuit is an essential building block in various high-speed wire-line communication receivers such as the backplane serial link, the chip-to-chip interconnect, and the optical link. Its core functions are to extract a clock that has a certain phase relationship with respect to the data and to retime the data with the extracted clock, removing jitter.

The operation modes of a CDR can be categorized into burst mode and continuous mode. A burst-mode CDR is used in point-to-multipoint applications, in which multiple senders transmit bursts of data with a silent time slot between bursts. Burst-mode data transmission often requires a very fast acquisition time, within a short preamble, in order to meet the low network latency requirement. A continuous-mode CDR, in contrast, is used in point-to-point systems, in which a long and continuous stream of bits is transmitted. This transmission does not require a fast acquisition time, but may require stringent jitter characteristics.

Many types of CDR have been employed for different applications. There are PLL-based [8], DLL-based [9], phase-interpolator-based [10], injection-lock-based [11], oversampling-based [12] topologies, etc. The choice of topology depends on
the application and the specification. Among these choices, this work focuses on the PLL-based topology shown in Fig. 1.4, because the objective is to find low-power solutions for high-speed applications such as the 100-Gb/s Ethernet mentioned in Chapter 1.

Low-power optimization also requires exploration at the architecture level. Each architecture has its own advantages and trade-offs. In the full-rate architecture, the VCO recovers a clock at the same frequency as the incoming data rate, and the phase detector (PD) retimes the data with the recovered clock. Two common full-rate PDs are the Hogge phase detector [13] and the Alexander phase detector [14], shown in Fig. 3.1.

Figure 3.1: Full-rate phase detectors: (a) Hogge PD and (b) Alexander PD

A Hogge PD samples with a full-rate clock and generates two pulses, a proportional pulse $X$ and a reference pulse $Y$, from two XORs. The width of the proportional pulses varies linearly with the input phase difference, indicating that the circuit operates as a linear PD. In contrast, the reference pulses exhibit a constant width equal to clock period at every data transition. The difference of these pulses is the output of the PD, which drives the loop filter and the VCO. Under the locked condition, $X$ and $Y$ have equal pulsewidths.

Alexander PD exhibits a bang-bang characteristic. It requires four FFs to
sample the data at multiple points in the vicinity of expected transitions, thus providing ‘early’ and ‘late’ information of the clock phase. Outputs of $FF_1$, $FF_2$, and $FF_3$ are three consecutive samples spaced by half of the clock period. XOR-ing shown in Fig. 3.1 produces ‘early’ pulses $Y$ and ‘late’ pulses $X$ exclusively, depending on whether the clock leads or lags. Both pulsewidths are equal to the clock period and vanish under locked condition.
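The early/late decision of a bang-bang PD can be illustrated at the logic level. The sketch below follows the common textbook convention in which $s_1$ and $s_3$ are two consecutive data samples and $s_2$ is the edge sample between them; how these XOR outputs map onto the $X$ and $Y$ nodes of Fig. 3.1 is not reproduced here, and the function name is hypothetical.

```python
def alexander_decision(s1, s2, s3):
    """Classify the clock phase from three consecutive samples (0 or 1).

    s1, s3: data samples; s2: edge sample taken between them.
    Early: the edge sample still equals the old bit (s1 == s2 != s3).
    Late:  the edge sample already equals the new bit (s1 != s2 == s3).
    """
    early = s2 ^ s3
    late = s1 ^ s2
    if early and not late:
        return "early"
    if late and not early:
        return "late"
    return "none"  # no data transition, or an inconsistent edge sample

for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]:
    print(samples, alexander_decision(*samples))
```

When no data transition occurs, both XOR outputs are zero and the PD provides no phase information, which is why the loop relies on a sufficient transition density.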

As the speed approaches limits of technology, half-rate operation becomes more attractive. In the full-rate architecture, a master-slave FF must regenerate data for only a half of the clock period, while the regeneration time of the half-rate system is twice as long as that of the full-rate system, as depicted in Fig. 3.2, thereby relatively relaxing speed constraints in the half-rate architecture.
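For the 25-Gb/s data rate considered in this work, the relaxation can be quantified with a short calculation. The sketch assumes that the regeneration time equals one half of the clock period, as stated above; the function name is hypothetical.

```python
def regeneration_time(data_rate, clock_fraction):
    """Half of the clock period for a clock running at data_rate * clock_fraction."""
    f_clk = data_rate * clock_fraction
    return 0.5 / f_clk

full = regeneration_time(25e9, 1.0)   # full-rate: 25-GHz clock
half = regeneration_time(25e9, 0.5)   # half-rate: 12.5-GHz clock
print(f"full rate: {full * 1e12:.0f} ps, half rate: {half * 1e12:.0f} ps")
```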

![Timing diagram for full rate and half rate](image)

**Figure 3.2** Timing diagram for full rate and half rate

As discussed in the previous section, a half-rate PD can also be implemented in different architectures. Figure 3.3 shows a half-rate bang-bang phase detector. The FFs sample three consecutive points near the data transition, as described for the Alexander PD. The difference is that the clock runs at half rate, and the quadrature phase is used to sample the data at the same interval as in the full-rate counterpart. Under the locked condition, $CK_Q$ samples the data at the zero crossing.

Figure 3.3: Half-rate detector with quadrature clock phase

However, this PD requires a quadrature VCO, as depicted in Fig. 3.4. Quadrature VCOs are inferior because their phase noise is typically 3 to 5 dB higher than that of a single oscillator, for two reasons [15]: (1) the Q is degraded, since the oscillation departs from resonance, and (2) the flicker noise of the coupling transistors increases the phase noise at low frequency offsets. Moreover, quadrature VCOs require two oscillators that occupy a large area. This problem becomes more prominent in LC VCOs: as we will see in Chapter 5, most of the die area is occupied by the inductor of a single VCO.

Figure 3.4: Block diagram of quadrature VCO

Another example of the half-rate PD is shown in Fig. 3.5 [16]. This example
is a linear PD and produces reference and error (proportional) pulses like the
Hogge PD, but it requires only complementary clocks and hence is attractive
for our purpose. The details of its operation are described in Fig. 3.6. \( L_1 \) and
\( L_2 \) are the master latches of the two FFs, and XORing their outputs, \( X_1 \) and \( X_2 \),
generates error pulses at every data transition. The error pulse width varies linearly
with input phase difference and becomes a quarter of the clock period under the
locked condition. Two outputs of FFs, $D_{out1}$ and $D_{out2}$, are used to generate reference pulses whose width is equal to a half of the clock period at every data transition. The ratio of the pulsewidth under the locked condition is 1:2, and the ratio of output swing therefore needs to be 2:1. Note that $D_{out1}$ and $D_{out2}$ are demultiplexed, which is an inherent advantage of the half-rate operation.
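The 1:2 pulsewidth ratio and the 2:1 swing ratio can be checked with a simple pulse-area balance. The sketch below models the error pulse width as $T_{CK}/4$ plus a timing error and the reference pulse width as $T_{CK}/2$; the sign convention of the timing error and the normalized swing values are assumptions made only for illustration.

```python
def pd_net_pulse_area(t_ck, timing_error, err_swing=2.0, ref_swing=1.0):
    """Net area (swing x width) of one error pulse minus one reference pulse.

    err_swing and ref_swing are normalized amplitudes in the 2:1 ratio
    mentioned in the text; timing_error is the deviation from lock.
    """
    err_width = t_ck / 4.0 + timing_error   # linear in the phase error
    ref_width = t_ck / 2.0                  # constant at every transition
    return err_swing * err_width - ref_swing * ref_width

t_ck = 80e-12  # 12.5-GHz half-rate clock
print(pd_net_pulse_area(t_ck, 0.0))      # balanced under lock
print(pd_net_pulse_area(t_ck, 5e-12))    # positive for a 5-ps timing error
```

With equal swings the areas would not cancel under lock, which is why the swing ratio must compensate for the pulsewidth ratio.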

![Figure 3.5: Linear half-rate phase detector](image)

A natural question is why the rate is not decreased further, for example with a quarter-rate CDR [17]. This may relax the speed constraint more, but it causes other problems: (1) it introduces more building blocks, (2) it requires at least quadrature phases, making the VCO design more difficult, and (3) these together create complex clock routing.

![Figure 3.6: Operation of half-rate phase detector](image)

3.1.1 Phase Detector

We would like to implement the phase detector of Fig. 3.5 using the charge-steering latches and the charge-steering FF of Fig. 2.21. However, due to the RZ nature of charge-steering outputs, one issue arises with this architecture: XORing A and B does not produce correct reference pulses because the reset phase of $L_3$ and the evaluation phase of $L_4$ occur at the same time, as depicted in Fig. 3.7. Generating error pulses does not have this problem because the master latches of the FFs, $L_1$ and $L_2$, produce NRZ outputs.

A simple solution is to modify the PD as shown in Fig. 3.7. By adding latches $L_5$ and $L_6$, $Z_1$ and $Z_2$ are delayed by half a clock period from $Y_1$ and $Y_2$, respectively. Instead of XORing $Y_1$ and $Y_2$, we can now XOR $Y_1$ with $Z_2$, the delayed version of $Y_2$, and $Y_2$ with $Z_1$, the delayed version of $Y_1$, generating two reference pulses, REF1 and REF2. REF1 and REF2 each detect data transitions every two bits (80 ps) and generate corresponding pulses. To generate complete reference pulses, REF1 and REF2 need to be added together. The addition of the two pulses is described in Section 3.1.2 on XOR gates.

One might wonder if using only one XOR for reference could work and simplify the design. For example, we could remove $L_5$ and REF2. This simplification also extracts phase information from the data, but loses half of that information. In the scenario described in Fig. 3.8, the circuit could miss many data transitions by removing the REF2, and consider the data as a long sequence of zeros, failing to lock.
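The loss of information with a single reference XOR can be seen in a bit-level abstraction that ignores timing and the RZ waveform shape. The sketch assumes that $Y_1$ carries the even-indexed bits and $Y_2$ the odd-indexed bits of the input stream; this assignment and the helper names are hypothetical.

```python
def reference_pulses(bits):
    """Bit-level view of the two reference XORs (timing and RZ effects ignored).

    Assumption: Y1 holds even-indexed bits, Y2 holds odd-indexed bits.
    """
    y1 = bits[0::2]
    y2 = bits[1::2]
    # REF1 = Y1 xor Z2: each odd bit compared with the even bit that follows it.
    ref1 = [a ^ b for a, b in zip(y1[1:], y2)]
    # REF2 = Y2 xor Z1: each even bit compared with the odd bit that follows it.
    ref2 = [a ^ b for a, b in zip(y1, y2)]
    return ref1, ref2

def count_transitions(bits):
    """Number of transitions between adjacent bits of the serial stream."""
    return sum(a ^ b for a, b in zip(bits, bits[1:]))

pattern = [0, 1, 1, 0] * 4
ref1, ref2 = reference_pulses(pattern)
print("transitions:", count_transitions(pattern))
print("seen by REF1:", sum(ref1), "seen by REF2:", sum(ref2))
```

For this particular pattern every transition falls on the pairs observed by REF2, so a circuit retaining only REF1 would see no transitions at all, while the sum of the two references accounts for every transition.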

Charge-steering latches can be used for $L_5$ and $L_6$. However, there exists the issue described in Section 2.3. When $L_5$ or $L_6$ enters the evaluation phase, $L_3$ or $L_4$ enters the reset phase and begins to lose its output. To avoid a complex clocking scheme, we can insert a delay in the clock path or the data path, as depicted in Fig. 3.9. If the delay is inserted in the clock path, the timing of $D_{out1}$ and $D_{out2}$ differs from that of A and B, and thus the pulsewidth varies with the delay and is sensitive to its variation. A delay in the data path avoids this issue and can be implemented passively, thus obviating additional power consumption.

Figure 3.7: Modified PD due to RZ outputs

Figure 3.8: Reference pulse generation

Figure 3.9: Inserted delays in the phase detector

Figure 3.10: Passive delay inserted in the data path

The passive delay can be implemented with a simple resistor. The resistance is about 4 k$\Omega$ in the actual prototype, and this 4-k$\Omega$ resistor is realized as a PMOS transistor, as depicted in Fig. 3.10, because a transistor introduces less parasitic capacitance and occupies a smaller area. Figure 3.11 shows the simulated waveforms of the recovered and delayed data. The passive delay decreases the output swing but provides enough sampling time for $L_5$ or $L_6$ to operate robustly. With these techniques, the modified PD generates the pulses needed to extract phase information from the data. However, $D_{out1}$ and $D_{out2}$ are still in RZ form and need to be converted to NRZ data. This conversion can be accomplished at 12.5 Gb/s, but to save more power, we prefer to demux the data by another factor of two before the conversion.

![Simulated waveforms of $V_A$, $V_{A_{\text{delayed}}}$ and $D_{out1}$](image)

**Figure 3.11:** Simulated waveforms of $V_A$, $V_{A_{\text{delayed}}}$ and $D_{out1}$

In the charge-steering circuits, output capacitance is mainly the load capacitance and there is no explicit capacitor at the output nodes. On the other hand, we need to place a tail capacitor to provide the required charge. This capacitor can be implemented in two ways: (1) a metal capacitor and (2) a MOS capacitor. Metal capacitor can use layers from poly to metal 9 to make it denser, because bottom plate capacitance is not a concern for the tail capacitor. For MOS capacitors, the operating range of gate voltage is from 0 to about half of $V_{DD}$ and hence effective capacitance is low compared to the situation when it is used in the full $V_{DD}$ range. These result in similar area density for both capacitors in 65-nm technology. Finally, the metal capacitor is used here because it is more
linear and has less variation than the MOS capacitor.

3.1.2 XOR Gates

In order to complete the PD, XOR gates need to be designed for high-speed operation. The conventional CML XOR in Fig. 3.12 suffers from a headroom issue in advanced technologies with low supply voltages. In addition, its inputs A and B are asymmetric, resulting in different propagation delays and a systematic phase offset. In contrast, the XOR in Fig. 3.13 [18] has perfect symmetry between the two inputs and relaxes the headroom issue by avoiding stacked stages. The output is single-ended, which is in fact preferable for the connection to a V/I converter.

![Figure 3.12: Conventional XOR gate](image)

$V_B$ is set to the input common-mode level. When both A and B are high or low, one of the source nodes of the input pairs remains at a low level, thus turning on transistor $M_1$ or $M_2$ and delivering current to the output node. On the other hand, when only one of A and B is high, both source nodes of the input pairs rise, thus turning off $M_1$ and $M_2$ so that no signal current flows to the output node. This is an XNOR operation; it functions as an XOR by simply swapping the input connections in the V/I converter. This XOR serves as the error XOR.
Reference XORs are configured in order to add two outputs of XORs as shown in Fig. 3.14. Two XORs can share one current mirror so that output currents can be added through the mirror transistor, instead of using two separate XORs in Fig. 3.13 and adding their voltage outputs with an additional circuit. Thus, additional power consumption is minimized compared to one reference XOR of the original architecture in [16]. Note that the XOR output bandwidth is unimportant because the subsequent voltage-to-current (V/I) converter only senses the dc content of this output.

XOR operation could be implemented in charge steering. However, error XOR must generate NRZ output, and its output is not synchronous with the clock,
and hence charge-steering XOR cannot be used as error XOR. Reference XOR is also implemented in current steering because it is desirable to have symmetric topology between error and reference XOR for robust operation even though the output of reference XOR is synchronous with the clock.

3.1.3 V/I Converter

![Figure 3.15: V/I converter and loop filter](image)

The V/I converter produces current proportional to the difference between two averages of error and reference and drives the loop filter. Figure 3.15 shows a simple OTA that works as a V/I converter and provides rail-to-rail swings for the oscillator control line. V/I converters are free from the dead zone issue because it is not necessary to switch after every phase comparison [19].

The loop filter controls loop characteristics such as the loop bandwidth and stability. In this design, only a part of $C_2$, about 2 pF, is implemented on chip, and the rest of the loop filter, $R_P$ and $C_P$, is realized with off-chip components on the PCB.
3.2 Deserializer

While the half-rate PD performs one level of demultiplexing, it is typically necessary to further deserialize the data for ease of use by the subsequent processor. Moreover, the data retimed by the PD must be converted to the NRZ format at some point. These two functions are explained in this section. The deserializer is an important building block in high-speed receivers and consumes lots of power. Table 3.1 shows the power consumption of the published demultiplexers at various data rates. This indicates that demultiplexing also requires significant power. Fortunately, charge-steering circuits can save power here as well.

The recovered data at nodes $Y_1$ and $Y_2$ are already demultiplexed by a factor of two, necessitating two 1:2 DMUXes as depicted in Fig. 3.16. The two 1:2 DMUXes require quadrature phases at 6.25 GHz because $V_{Y_1}$ and $V_{Y_2}$ are shifted from each other by $90^\circ$ at 6.25 GHz. Fortunately, a divide-by-two circuit can generate quadrature phases.
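The $90^\circ$ figure can be verified with a one-line conversion from delay to phase. The sketch assumes that $Y_1$ and $Y_2$ are offset by half a period of the 12.5-GHz recovered clock, as implied by the complementary clocking of the half-rate PD; the function name is hypothetical.

```python
def phase_shift_deg(delay, frequency):
    """Phase shift in degrees that a time delay represents at a given frequency."""
    return 360.0 * delay * frequency

half_period = 0.5 / 12.5e9  # offset between Y1 and Y2 (assumed): 40 ps
print(phase_shift_deg(half_period, 6.25e9))
```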

3.2.1 1:2 Charge-Steering DMUX

The 1:2 DMUX needs to process RZ input at 12.5 Gb/s, while the demultiplexers in [21], [22], and [20] deal with NRZ data. The resettable amplifiers in [23] demultiplex RZ data; these amplifiers have a hold phase to handle the reset phase of RZ data. Fortunately, charge-steering latches can replace them without any modification or additional circuits because they have a reset phase by nature and hold the evaluated output when the input is reset in the middle of the evaluation phase. Figure 3.17 depicts how two charge-steering latches with divided complementary clocks perform demultiplexing, thus saving power compared to the implementation in [21]. Note that the outputs of the 1:2 DMUX are still in RZ form.

Table 3.1: Published demultiplexers

<table>
<thead>
<tr>
<th>Reference</th>
<th>[21]</th>
<th>[22]</th>
<th>[20]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data Rate (Gb/s)</td>
<td>10</td>
<td>11</td>
<td>40</td>
</tr>
<tr>
<td>Demux Ratio</td>
<td>1:4</td>
<td>1:8</td>
<td>1:2</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>38</td>
<td>69</td>
<td>108</td>
</tr>
</tbody>
</table>

\(^1\)Note that the power consumption of [20] includes the power of the output buffer.

We wish to demultiplex the 12.5-Gb/s data at $Y_1$ and $Y_2$ in Fig. 3.7(a) by means of charge-steering latches driven by a quarter-rate clock. We also prefer to avoid the cascading issue described in Fig. 2.17(a) to maintain the integrity of the data. Fortunately, it is possible to realize the timing relationship of Fig. 2.17(b) at this interface. Illustrated in Fig. 3.17(a), the idea is to exploit the quadrature outputs of a divide-by-two circuit to drive the latches. Figure 3.16(b) shows the timing relationship between the clocks applied to the latches. We observe that, when $CK$ and $CK_{1/2,I}$ go high, $L_3$ and $L_7$ enter the evaluation mode, behaving like the master-slave configuration of Fig. ??(a), even though each is realized as the RZ latch of Fig. ??(a). Similarly, when $CK$ goes low and $CK_{1/2,Q}$ goes high, $L_4$ and $L_9$ begin to evaluate.

Figure 3.17: 1:2 DMUX with charge-steering latches

Figure 3.18 shows the simulation results of a 1:2 demultiplexing operation. Red waveforms show the recovered 12.5-GHz clock and the recovered 12.5-Gb/s RZ data, and blue waveforms the divided 6.25-GHz clock and demultiplexed 6.25-Gb/s RZ data, indicating output swings of about 400 mV.
3.2.2 Frequency divider

The divide-by-two circuit must operate with an input frequency of 12.5 GHz and drive four inverter buffers, each having an NMOS width of 1.2 μm and a PMOS width of 2.4 μm. To generate quadrature outputs, the circuit must incorporate two identical stages in a feedback loop, e.g., two latches of the form shown in Fig. 3.19(a). However, according to simulations, such a divider fails around 12 GHz.\(^2\)

\(^2\)The charge-steering latch cannot be used here as it needs a reset phase.

Figure 3.19: (a) Rail-to-rail latch, (b) operation of the latch, (c) new latch, (d) simulated speed of the dividers, (e) simulated power consumption of the dividers.
To examine the above latch’s failure mechanism, consider the state depicted in Fig. 3.19(b), where $V_X = 0$, $V_Y = V_{DD}$, $D_{in} = 0$, $\bar{D}_{in} = V_{DD}$, and $CK$ goes high. Two transitions must occur: $V_Y$ must fall to zero and, as a result, $V_X$ must rise to $V_{DD}$. Note that the rise in $V_X$ is critical as it provides the overdrive for the input transistor of the other latch in the loop. The fall in $V_Y$ is less important because it simply turns off one input transistor of the next latch. We observe from Fig. 3.19(b) that during this operation, (1) $M_4$ fights the series combination of $M_{CK}$ and $M_2$, and (2) $V_X$ rises little before $V_Y$ reaches zero. Thus, $M_3$ must be, on the one hand, strong enough to rapidly charge the capacitance at $X$, and, on the other hand, weak enough not to vehemently fight the series combination of $M_{CK}$ and $M_1$ (in the next half cycle). This trade-off limits the maximum toggling speed of the divider, causing failure if $V_X$ does not rise enough in $T_{CK}/2$ seconds.

The foregoing study suggests that the speed can be improved if the rise in $V_X$ (or $V_Y$) is somehow augmented. This can be accomplished by means of NMOS source followers [Fig. 3.19(c)]. While increasing the latch input capacitance to some extent, each follower actively pulls up the corresponding output node, relaxing the above trade-off. In addition, the source followers provide an unclocked feedforward path, impressing the next state at $X$ (or $Y$) before the clock rises and the main path is activated. This feedforward action further improves the maximum speed, but at the cost of a lower bound on the toggle rate. Figure 3.19(d) plots the simulated output frequency as a function of the input frequency for the conventional and the proposed divide-by-two circuits. We note that the source followers raise the maximum speed to 14.5 GHz (while limiting the lower end to 0.4 GHz).

Another remarkable attribute of the proposed divider is that it consumes less
power than the conventional topology does [Fig. 3.19(e)]. Since the source followers reduce the rise and fall times at the output nodes, the crowbar current flowing from $V_{DD}$ to ground during transitions decreases, thereby lowering the power consumption by about 20% at 12 GHz. The power drawn by the proposed latch is now close to the minimum value of $fCV_{DD}^2$, where $C$ denotes the capacitance of the divider transistors and, more significantly, the input capacitance of the demultiplexer [e.g., $L_7$ and $L_8$ in Fig. 3.17(a)].

3.2.3 RZ-to-NRZ Conversion

With the data rate brought down by the deserializer to 6.25 Gb/s, the task of RZ/NRZ conversion becomes simpler. The conversion can be performed by applying the RZ data to a simple RS latch as shown in Fig. 3.20: when both inputs are zero, the latch maintains the previous state, and when one input goes high, the state can change. However, a rail-to-rail latch requires that the moderate output swings of $L_7$–$L_{10}$ in Fig. 3.17(a) be amplified.

A simple amplifier at 6.25 GHz would consume 1 or 2 mW, but a clocked comparator can be a much more efficient amplifier and can be placed before the RS latch, as shown in Fig. 3.21.

More efficient amplification can be realized by means of (clocked) comparators. Illustrated in Fig. 3.21(a), the idea is to utilize the quadrature phases of the 6.25-GHz clock to drive $L_7$ and a comparator in a master-slave fashion. When $CK_{1/2,I}$ rises, $L_7$ enters the evaluation mode; 80 ps later, $CK_{1/2,Q}$ rises, allowing the comparator to regenerate to the rails.

Figure 3.21: (a) RZ-to-NRZ conversion, (b) proposed comparator.

Owing to its low power consumption, the StrongARM comparator [24] or its modified version [25]\(^3\) is attractive here, but in 65-nm technology it does not robustly operate at 6.25 GHz. Figure 3.21(b) shows the modified, faster design: the cross-coupled PMOS devices are removed, thus reducing the capacitance at the output node and improving the speed by about 8%. In the absence of these devices, the high level at the output degrades if the input differential voltage is not large enough to keep $M_1$ or $M_2$ off. This issue is not problematic here because $L_7$ in Fig. 3.17(a) produces a swing of more than 400 mV. According to simulations, the comparator, the inverters, and the RS latch in Fig. 3.21(a) drain a total of 148 $\mu$W at 6.25 Gb/s.

\(^3\)The modified version adds reset switches to the drains of the input transistor, suppressing dynamic offsets.

Figure 3.22 illustrates the entire path of 1:2 DMUX and RZ-to-NRZ conversion. Each half-rate recovered data stream produced by the CDR flows through one charge-steering latch, one comparator, and one RS latch. The other DMUX path uses quadrature phase for charge-steering latches and in phase for clocked comparators.

![Figure 3.22: (a) 1:2 DMUX with RZ-to-NRZ conversion (b) Circuits in one arm](image)

Figure 3.23 shows the simulated RZ-to-NRZ conversion. The red clock triggers the DMUX, producing a large enough output swing. The comparator is clocked by the blue quadrature phase and amplifies the data to rail-to-rail levels. Finally, the RS latch produces 6.25-Gb/s NRZ data.

Figure 3.23: Simulated waveforms of RZ-to-NRZ conversion

### 3.3 Overall System

Figure 3.24(a) shows the overall CDR/deserializer architecture. The CDR loop consists of the PD described in Section 3.1.1, a V/I converter, a loop filter, and an LC VCO. For \( R_1 = 500 \, \Omega \), \( C_1 = 80 \, \text{pF} \), \( C_2 = 8 \, \text{pF} \), and \( K_{\text{VCO}} = 1 \, \text{GHz/V} \), the loop exhibits the simulated transient behavior shown in Fig. 3.24(b), locking in about 50 ns. The retimed half-rate differential data at \( Y_1 \) and \( Y_2 \) is plotted in Fig. 3.24(c). The loop bandwidth is approximately 6 MHz.
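As a rough sanity check on the quoted loop-filter values, the sketch below computes the zero and the second pole they would produce. It assumes the common second-order passive topology in which $R_1$ is in series with $C_1$ and $C_2$ shunts that branch; the figure is not reproduced here, so this arrangement is an assumption, and the function name is purely illustrative.

```python
import math

def loop_filter_corners(r1_ohm, c1_f, c2_f):
    """Zero and second-pole frequencies (Hz) of an assumed R1-C1 || C2 filter."""
    f_zero = 1.0 / (2.0 * math.pi * r1_ohm * c1_f)
    # Assumption: the second pole is set by R1 and the series combination of C1 and C2
    c_series = c1_f * c2_f / (c1_f + c2_f)
    f_pole = 1.0 / (2.0 * math.pi * r1_ohm * c_series)
    return f_zero, f_pole

# Component values quoted in the text: R1 = 500 ohm, C1 = 80 pF, C2 = 8 pF
f_zero, f_pole = loop_filter_corners(500.0, 80e-12, 8e-12)
print(f"zero = {f_zero / 1e6:.2f} MHz, second pole = {f_pole / 1e6:.1f} MHz")
```

Under this assumed topology the zero falls near 4 MHz and the second pole near 44 MHz, that is, on either side of the roughly 6-MHz loop bandwidth.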

The overall system draws 5 mW, and the simulated power breakdown is shown in Table 3.2. The phase detector consumes 1.3 mW, the divider 1.24 mW, the two DMUX paths 0.7 mW, the V/I converter 0.43 mW, and the VCO 1.37 mW. As the measurement results in the next chapter show, the actual prototype operates with less than 5 mW.

<table>
<thead>
<tr>
<th>PD</th>
<th>Divider</th>
<th>DMUX</th>
<th>V/I</th>
<th>VCO</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.3 mW</td>
<td>1.24 mW</td>
<td>0.7 mW</td>
<td>0.43 mW</td>
<td>1.37 mW</td>
<td>5.04 mW</td>
</tr>
</tbody>
</table>
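The breakdown above can be checked with a few lines of arithmetic. The sketch below sums the simulated block powers and divides by the 25-Gb/s data rate to express the result in mW/(Gb/s); the dictionary keys simply mirror the table headers.

```python
# Simulated block powers in mW, copied from the table above
power_mw = {"PD": 1.3, "Divider": 1.24, "DMUX": 0.7, "V/I": 0.43, "VCO": 1.37}
DATA_RATE_GBPS = 25.0

total_mw = sum(power_mw.values())
efficiency_mw_per_gbps = total_mw / DATA_RATE_GBPS

print(f"total = {total_mw:.2f} mW")
print(f"efficiency = {efficiency_mw_per_gbps:.3f} mW/(Gb/s)")
for block, p in sorted(power_mw.items(), key=lambda kv: -kv[1]):
    print(f"{block:8s} {100.0 * p / total_mw:5.1f} % of total")
```

The total reproduces the 5.04 mW listed in the table, or about 0.2 mW/(Gb/s) for the CDR/deserializer.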

### 3.3.1 The VCO and its Interface

The VCO can draw considerable power and must therefore be designed with three considerations in mind: (1) the amount of random jitter that it introduces in the locked state, (2) the amount of load capacitance that it must drive, and (3) whether it must drive the load capacitance directly (with rail-to-rail swings) or through buffers. The relative severity of these issues depends on the frequency of operation, the jitter target, the PD clock swing and drive requirements, and the routing capacitances.

Figure 3.24: Complete architecture

![Diagram of phase noise profile](image)

Figure 3.25: Locked phase noise profile for jitter calculations.

Let us begin with the first issue. Suppose the locked VCO exhibits the phase
noise profile shown in Fig. 3.25, where $f_{BW}$ denotes the CDR loop bandwidth.
To obtain the rms jitter, $\Delta T_j$, we integrate the area under this plot and normalize
the result to the VCO period, $T_{CK}$. If the declining phase noise beyond an offset
of $\pm f_{BW}$ can be approximated by $1/f^2$, then the total area is equal to $4f_{BW}S_0$.
Thus,

$$\Delta T_j = \frac{\sqrt{4f_{BW}S_0}}{2\pi}\, T_{CK}.$$  \hspace{1cm} (3.1)

For example, to target $\Delta T_j = 1$ ps, rms with $f_{BW} = 6$ MHz, we require $S_0$ to
be less than $-96$ dBc/Hz. That is, the free-running VCO must provide a phase
noise of less than $-96$ dBc/Hz at 6-MHz offset.
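The $-96$-dBc/Hz figure can be reproduced numerically. The sketch below treats the rms jitter as the rms phase error $\sqrt{4f_{BW}S_0}$ (in radians) scaled by $T_{CK}/2\pi$ and solves for $S_0$. It assumes $T_{CK}$ is the period of the 12.5-GHz half-rate clock; the function name is illustrative only.

```python
import math

def required_s0_dbc_hz(jitter_rms_s, f_bw_hz, t_ck_s):
    """In-band phase noise S0 (dBc/Hz) that yields the given rms jitter."""
    # rms phase error, in radians, equivalent to the target jitter
    phase_rms_rad = 2.0 * math.pi * jitter_rms_s / t_ck_s
    # Total area under the locked profile is 4 * f_BW * S0 = phase_rms^2
    s0_linear = phase_rms_rad ** 2 / (4.0 * f_bw_hz)
    return 10.0 * math.log10(s0_linear)

T_CK = 1.0 / 12.5e9  # assumed: 12.5-GHz clock period (80 ps)
s0 = required_s0_dbc_hz(jitter_rms_s=1e-12, f_bw_hz=6e6, t_ck_s=T_CK)
print(f"S0 must be below about {s0:.1f} dBc/Hz")
```

Halving the jitter target tightens the requirement by 6 dB, whereas halving the loop bandwidth relaxes it by 3 dB.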

It is instructive to estimate the minimum VCO supply current, $I_{SS}$, that yields
the requisite phase noise. From [26, 27], we express the free-running phase noise
of an LC VCO with one (NMOS or PMOS) cross-coupled pair as

$$S(\Delta \omega) = \frac{\pi^2}{R_P} \frac{kT}{I_{SS}^2} \left(\frac{3}{8}\gamma + 1\right) \frac{\omega_0^2}{4Q^2 \Delta \omega^2},$$  \hspace{1cm} (3.2)

where $R_P$ denotes the equivalent parallel resistance of the differential tank at
resonance, and $\gamma$ the noise coefficient of MOSFETs. For a peak-to-peak single-
ended swing, $2R_P I_{SS}/\pi$, of 1 V, $\gamma=1$, and $Q=8$, we obtain $I_{SS} \approx 6.3 \mu A$, $R_P \approx 250 \text{k}\Omega$, and hence a tank inductance of nearly 400 nH! In other words, the phase noise specification is much more relaxed than the other two issues mentioned above.
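The tank values quoted above follow from the swing constraint and $R_P = QL\omega$. The sketch below takes the stated $I_{SS} \approx 6.3\ \mu$A as given (it does not re-derive it from Eq. 3.2) and assumes a 12.5-GHz oscillation frequency; the function name is illustrative.

```python
import math

def tank_from_bias(i_ss_a, swing_pp_v, q, f0_hz):
    """Return (R_P, L) implied by a single-ended swing of 2*R_P*I_SS/pi."""
    r_p = math.pi * swing_pp_v / (2.0 * i_ss_a)       # from 2*R_P*I_SS/pi = swing
    inductance = r_p / (q * 2.0 * math.pi * f0_hz)    # from R_P = Q*L*omega
    return r_p, inductance

r_p, l_tank = tank_from_bias(i_ss_a=6.3e-6, swing_pp_v=1.0, q=8, f0_hz=12.5e9)
print(f"R_P = {r_p / 1e3:.0f} kOhm, L = {l_tank * 1e9:.0f} nH")
```

An inductance of this size is impractical on chip at 12.5 GHz, which is why the drive requirements rather than the phase noise set the bias current.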

![Figure 3.26: (a) P-N LC VCO (b) N-only LC VCO](image)

To address the second and the third issues, we note that the clock in Fig. 3.24(a) must drive eight latches, the divider, and about 45 $\mu$m of interconnects in the layout, a total of approximately 270 fF. We consider the two scenarios depicted in Fig. 3.27. The minimum power that two buffers (for $CK$ and $\overline{CK}$) would consume to drive the 270-fF capacitance is equal to $2f_{CK}C V_{DD}^2 = 6.75 \text{ mW}$. It is therefore highly desirable to avoid these buffers and absorb the capacitance into the VCO tank. Allowing another 50 fF for the VCO and inductor capacitances, we choose a differential inductance of 1 nH, obtaining $R_P = QL\omega \approx 630 \, \Omega$ and hence $I_{SS}=2.5 \text{ mA}$ for a 1-V$_{pp}$ single-ended swing in Fig. 3.26(b). As shown in Fig. 3.26(a), the use of both PMOS and NMOS cross-coupled pairs permits a twofold reduction in this current, leading to a power consumption of 1.25 mW. The actual design draws 1.4 mA and employs MOS varactors along with a two-bit capacitor bank for tuning.
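The trade-off in this paragraph reduces to a few lines of arithmetic. The sketch below reproduces the buffer power, the tank resistance, and the bias currents from the values quoted in the text; the variable names are illustrative.

```python
import math

F_CK = 12.5e9      # clock frequency (Hz)
C_LOAD = 270e-15   # load capacitance per clock phase (F)
V_DD = 1.0         # supply voltage (V)
Q = 8              # tank quality factor
L_DIFF = 1e-9      # differential tank inductance (H)
SWING_PP = 1.0     # single-ended peak-to-peak swing (V)

# Scenario 1: two rail-to-rail buffers (CK and its complement) drive the load
buffer_power = 2 * F_CK * C_LOAD * V_DD ** 2

# Scenario 2: the load is absorbed into the tank and the VCO drives it directly
r_p = Q * L_DIFF * 2 * math.pi * F_CK           # R_P = Q * L * omega
i_ss_n_only = math.pi * SWING_PP / (2 * r_p)    # from swing = 2 * R_P * I_SS / pi
i_ss_pn = i_ss_n_only / 2                       # P-N pairs halve the required current
vco_power_pn = i_ss_pn * V_DD

print(f"buffer power  = {buffer_power * 1e3:.2f} mW")
print(f"R_P           = {r_p:.0f} ohm")
print(f"I_SS (N-only) = {i_ss_n_only * 1e3:.2f} mA")
print(f"P-N VCO power = {vco_power_pn * 1e3:.2f} mW")
```

With these numbers the buffers alone would dissipate more than the entire 5-mW budget, whereas the directly loaded P-N VCO needs about 1.25 mW.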

Both VCOs in Fig. 3.26 have the same maximum FOM for the same $V_{DD}$ and LC tank [28]. The N-only VCO in Fig. 3.26(b) is optimal when the lowest phase noise is required, whereas the P-N VCO in Fig. 3.26(a) is suitable when a higher phase noise is acceptable, which allows the VCO to consume less power [29].

![Diagram](image)

**Figure 3.27:** Two scenarios of driving the load by the VCO.

The key idea proposed here is that it is generally advantageous to omit the buffers and utilize their power consumption in the VCO itself. However, the absence of buffers after the VCO raises two concerns: (1) The VCO may experience coupling from the input data through the PD latches [30]. Fortunately, the large capacitance seen at each output node of the VCO suppresses this effect, yielding a (simulated) peak-to-peak jitter of 300 fs due to this coupling. (2) The interconnect resistance and the MOS gate resistance may degrade the tank Q. According to simulations, this effect raises the VCO phase noise by 0.07 dB, pointing to the direct VCO/PD interface as the preferable approach in CDR design.

Figures 3.28 and 3.29 show the simulation results of the VCO. The $K_{VCO}$ is about 1 GHz/V, and the phase noise at 1-MHz offset is $-104$ dBc/Hz, providing enough margin. The simulated VCO output swings reach rail-to-rail levels.

The inductor in the VCO must be designed carefully so that the VCO oscillates at 12.5 GHz and achieves low phase noise with low power consumption. As shown in Eq. 3.2 and Eq. 3.3 [15], the phase noise is inversely proportional to $Q^2$ and the output amplitude is proportional to $Q$ of the inductor. To maximize the $Q$ of the inductor, the metal-8 and metal-9 layers are used in parallel. A 2-turn octagonal inductor offers a compromise between area and quality factor: a 1-turn inductor has a higher $Q$ but occupies a large area, whereas more than 2 turns degrade $Q$ excessively. The detailed dimensions of the 2-turn octagonal inductor are a metal width of 10 μm, a spacing of 2 μm between turns, and a center-to-edge distance of 72 μm, resulting in an inductance of 1 nH. The quality factor of the NMOS varactors is also important because it degrades the $Q$ of the tank. The minimum channel length is therefore used for the varactors.

$$\Delta V_{out} = \frac{4}{\pi} I_{SS} R_P, \quad R_P = \omega L Q \quad (3.3)$$
CHAPTER 4

Experimental Results

The CDR/deserializer prototype has been fabricated in TSMC’s 65-nm digital CMOS technology and characterized with a 1-V supply. Figure 4.1 and Figure 4.2 show die photographs of the circuit and its core, respectively. The chip area is about 1.1 mm $\times$ 0.75 mm, while the core area is about 230 $\mu$m $\times$ 170 $\mu$m. Most of the core area is occupied by an inductor for the VCO.

Figure 4.1: Die photograph of the prototype
Figure 4.2: Die photograph of the core

Figure 4.3: Picture for test setup

### 4.1 Test Setup

The chip has been directly mounted on a printed-circuit board, with the input and output connections provided by high-speed probes. A differential 100-Ω resistor is placed on the chip for input termination, and the common-mode level of the input data is set by an external bias tee. Open-drain PMOS devices serve as the output buffers for the clock and data.

A 4:1 MUX drives the circuit with a single-ended swing of 300 mV$_{pp}$, and a Centellax bit error rate (BER) tester captures its outputs. Figure 4.4 shows that the measured input data exhibits a peak-to-peak jitter of 7.1 ps and an rms jitter of 1.37 ps.

![Figure 4.4: The measured 25-Gb/s input data](image)

### 4.1.1 Setup for BER Test

Figure 4.5: Basic test setup for CDR

As depicted in Fig. 4.5, the 12.5-GHz clock provided by the Agilent E8257D drives the 4:1 MUX to generate the 25-Gb/s data stream, while divided 6.25-GHz clocks drive four PRBS generators that produce 6.25-Gb/s PRBS data. The BER tester captures the recovered quarter-rate output from the prototype and measures the BER. The recovered data and clock are also captured by the oscilloscope and the spectrum analyzer.

### 4.1.2 Setup for Jitter Transfer Test

Figure 4.6: Test setup for jitter transfer

Generating high-frequency jitter is a challenge in the test setup, especially when a specialized jitter measurement instrument, such as the Agilent N4903B JBERT or the Centellax SSB32J, is not available. We can create sinusoidal jitter by combining two tones whose separation is the desired jitter frequency, as depicted in Fig. 4.6.

Figure 4.7 and Eq. 4.1 show how jitter is generated in the clock [15]. The single sideband is decomposed into amplitude modulation (AM) and frequency modulation (FM) components. The limiting action of the 4:1 MUX and the frequency divider eliminates the AM component. The jitter frequency is simply the separation between the two tones, $\omega_m$, and thus we can generate an arbitrarily high jitter frequency with a low jitter amplitude.

$$\begin{aligned} A \cos \omega_c t + a \cos(\omega_c + \omega_m)t = {} & \left[\frac{A}{2} \cos \omega_c t + \frac{a}{2} \cos(\omega_c + \omega_m)t + \frac{a}{2} \cos(\omega_c - \omega_m)t\right] \\ & + \left[\frac{A}{2} \cos \omega_c t + \frac{a}{2} \cos(\omega_c + \omega_m)t - \frac{a}{2} \cos(\omega_c - \omega_m)t\right] \end{aligned} \quad (4.1)$$
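The decomposition in Eq. 4.1 can be verified numerically. The sketch below checks that the AM-like and FM-like halves sum to the original two-tone signal, and it also evaluates the phase of the complex envelope $A + a e^{j\omega_m t}$, whose peak approaches $a/A$ rad for $a \ll A$. The amplitudes and frequencies are normalized, hypothetical values chosen only for the check.

```python
import math

def single_sideband(t, A, a, wc, wm):
    """Left-hand side of Eq. 4.1: carrier plus one upper sideband."""
    return A * math.cos(wc * t) + a * math.cos((wc + wm) * t)

def am_component(t, A, a, wc, wm):
    """Half carrier with sidebands of equal sign (amplitude modulation)."""
    return (A / 2) * math.cos(wc * t) + (a / 2) * (
        math.cos((wc + wm) * t) + math.cos((wc - wm) * t))

def fm_component(t, A, a, wc, wm):
    """Half carrier with sidebands of opposite sign (phase/frequency modulation)."""
    return (A / 2) * math.cos(wc * t) + (a / 2) * (
        math.cos((wc + wm) * t) - math.cos((wc - wm) * t))

def envelope_phase(t, A, a, wm):
    """Phase (rad) of the complex envelope A + a*exp(j*wm*t)."""
    return math.atan2(a * math.sin(wm * t), A + a * math.cos(wm * t))

# Hypothetical normalized values, not taken from the measurement setup
A, a = 1.0, 0.05
wc, wm = 2 * math.pi * 100.0, 2 * math.pi * 3.0
times = [k / 5000.0 for k in range(5000)]

max_error = max(abs(single_sideband(t, A, a, wc, wm)
                    - am_component(t, A, a, wc, wm)
                    - fm_component(t, A, a, wc, wm)) for t in times)
peak_phase = max(envelope_phase(t, A, a, wm) for t in times)
print(f"identity error = {max_error:.1e}")
print(f"peak phase = {peak_phase:.4f} rad, a/A = {a / A:.4f}")
```

Because the peak phase deviation scales with $a/A$, the jitter amplitude is set by the relative level of the second tone, while the tone spacing sets the jitter frequency.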

### 4.1.3 Setup for Jitter Tolerance Test

The jitter tolerance test requires a large jitter amplitude (i.e., $> 10 \text{ UI}_{PP}$), which the jitter transfer setup described above cannot generate. Instead, the phase modulation (PM) or frequency modulation (FM) function of the Agilent E8257D can generate a jitter amplitude of up to 100 UI$_{PP}$ at low jitter frequencies. Figure 4.8 shows the setup for the jitter tolerance test.

It is instructive to derive the jitter amplitude produced by frequency modulation. The signal generator specifies the FM rate, $\omega_m$, and the FM deviation, $\omega_{\text{max}}$, from which the jitter amplitude can be derived. The FM signal is expressed in Eq. 4.2, where $\omega_c$ denotes the carrier frequency. The maximum phase deviation is therefore equal to $\frac{\omega_{\text{max}}}{\omega_m}$, which translates into the jitter amplitude given by Eq. 4.3 when $\omega_c$ is equal to the data rate.

$$x(t) = \cos(\omega_c t + \frac{\omega_{\text{max}}}{\omega_m} \cos \omega_m t) \quad (4.2)$$

$$J_A = \frac{\omega_{\text{max}}}{\omega_m} \frac{1}{\pi} \text{(UI}_{\text{PP}}) \quad (4.3)$$
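Eq. 4.3 can be turned into a small helper for choosing generator settings. The sketch below computes the peak-to-peak jitter for a given FM deviation and FM rate, and inverts the relation to find the deviation needed for a target amplitude. The function names and the example settings are hypothetical and serve only to illustrate the scaling.

```python
import math

def fm_jitter_ui_pp(f_dev_hz, f_mod_hz):
    """Eq. 4.3: peak-to-peak jitter (UI) for FM deviation f_dev and FM rate f_mod."""
    # omega_max / omega_m equals f_dev / f_mod, the peak phase deviation in radians
    peak_phase_rad = f_dev_hz / f_mod_hz
    return peak_phase_rad / math.pi

def fm_deviation_for(jitter_ui_pp, f_mod_hz):
    """FM deviation (Hz) needed to reach a target peak-to-peak jitter at f_mod."""
    return math.pi * jitter_ui_pp * f_mod_hz

# Hypothetical settings: 100-kHz jitter frequency, 10-UI_pp target
f_dev = fm_deviation_for(jitter_ui_pp=10.0, f_mod_hz=100e3)
print(f"required FM deviation = {f_dev / 1e6:.2f} MHz")
print(f"check: {fm_jitter_ui_pp(f_dev, 100e3):.2f} UI_pp")
```

For a fixed FM deviation the achievable jitter amplitude falls inversely with the jitter frequency, which is consistent with large amplitudes being available only at low jitter frequencies.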

### 4.2 Measurement Results

The prototype (excluding the 50-Ω output buffers) draws 4.97 mW, of which 1.4 mW is consumed by the VCO, 1.3 mW by the PD, 1.24 mW by the divider, and 0.43 mW by the V/I converter. The measured bit error rate is less than $10^{-12}$ with a PRBS of length $2^{15} - 1$. Figure 4.9 shows the measured VCO characteristics, indicating a $K_{VCO}$ of 1 GHz/V.

Figure 4.10 shows the recovered half-rate clock spectrum. The locked phase noise within the loop bandwidth is around $-104 \text{ dBc/Hz}$, and the area under the plot from 100-Hz to 1-GHz offset yields an rms jitter of 1.52 ps. Figure 4.11 shows the measured quarter-rate recovered data, exhibiting an rms jitter of 2.56 ps.

Figure 4.9: VCO characteristics

Figure 4.10: The recovered clock spectrum

Figure 4.11: The recovered data

Figure 4.12: Jitter transfer

The jitter transfer and tolerance of the prototype have also been measured and plotted in Fig. 4.12 and Fig. 4.13. The former indicates a loop bandwidth of about 6 MHz and the latter a tolerance of 0.5 UI$_{PP}$ at jitter frequencies as high as 5 MHz. To study the robustness of the circuit, the jitter tolerance is also measured with a 1.1-V supply, yielding similar results.

Figure 4.13: Jitter tolerance

Table 4.1 summarizes the performance of this work and compares it with that of two other CMOS examples of prior art. The prototype retimes and demultiplexes the 25-Gb/s data, drawing 4.97 mW from a 1-V supply and showing about a factor of 20 reduction in power consumption compared with the prior art. Figure 4.14 places this work in the state-of-the-art CDR landscape.

Table 4.1: Performance summary.

<table>
<thead>
<tr>
<th>Reference</th>
<th>This Work</th>
<th>[31]</th>
<th>[32]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data Rate (Gb/s)</td>
<td>25</td>
<td>25</td>
<td>25</td>
</tr>
<tr>
<td>Architecture</td>
<td>Linear</td>
<td>Linear</td>
<td>Bang-bang</td>
</tr>
<tr>
<td>Clocking</td>
<td>Half Rate</td>
<td>Half Rate</td>
<td>Full Rate</td>
</tr>
<tr>
<td>Demux Ratio</td>
<td>1:4</td>
<td>1:2</td>
<td>1:1</td>
</tr>
<tr>
<td>Supply (V)</td>
<td>1</td>
<td>1.1</td>
<td>1.2</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>4.97</td>
<td>98</td>
<td>99</td>
</tr>
<tr>
<td>Technology</td>
<td>65-nm</td>
<td>90-nm</td>
<td>65-nm</td>
</tr>
</tbody>
</table>
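The factor-of-20 claim can be read directly off Table 4.1. The sketch below converts each power figure to energy per bit (mW divided by Gb/s equals pJ/bit) and reports the power ratio relative to this work; the dictionary layout and function name are illustrative.

```python
# (data rate in Gb/s, power in mW), copied from Table 4.1
designs = {
    "This work": (25.0, 4.97),
    "[31]": (25.0, 98.0),
    "[32]": (25.0, 99.0),
}

def energy_per_bit_pj(rate_gbps, power_mw):
    """Energy per bit in pJ; mW divided by Gb/s gives pJ/bit."""
    return power_mw / rate_gbps

this_power_mw = designs["This work"][1]
ratios = {}
for name, (rate, power) in designs.items():
    ratios[name] = power / this_power_mw
    print(f"{name:10s} {energy_per_bit_pj(rate, power):.2f} pJ/bit, "
          f"{ratios[name]:.1f}x the power of this work")
```

This simple ratio does not account for the differing demultiplexing ratios in the table: this work also performs 1:4 demultiplexing within its power budget.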

Figure 4.14: Summary of the CDR performance
CHAPTER 5

Conclusion

This work describes a half-rate clock and data recovery circuit and a deserializer that incorporate charge steering in phase detection and demultiplexing along with a new frequency divider and comparator. Realized in 65-nm technology, the overall circuit draws 5 mW from a 1-V supply, producing a clock with an rms jitter of 1.5 ps and a jitter tolerance of 0.5 UI_{pp} at 5 MHz. The circuit and architecture techniques culminate in a prototype that consumes about 20 times less power than prior art.

We also formulate the gain of both the RZ charge-steering latch and the NRZ charge-steering latch, showing reasonable agreement with simulations. These expressions provide intuition and guidelines for designing charge-steering circuits.

Four innovations that enable a power reduction by more than one order of magnitude are summarized as follows:

1. The use of charge steering can dramatically reduce the power consumption of high-speed circuits, affording a design style faster than rail-to-rail logic and less power-hungry than current steering.

2. A divider using new latches not only improves the speed of operation but also reduces the power consumption at high frequencies. At high frequencies, the power consumption of the divider is close to the minimum value of \( fCV_{DD}^2 \).
3. The new comparator removes the PMOS cross-coupled pair from the StrongARM comparator, thus reducing the capacitance at the output node and improving the speed. A moderate input swing is required for this circuit to operate robustly.

4. The new VCO interface drives the entire load capacitance, keeping loss and coupling from the input data negligible. As a result, we can remove clock buffers that consume significant power.

The techniques introduced in this work can be used in other parts of wireline transceivers. For example, serializers in the transmitter also employ high-speed dividers, for which the new divider can reduce the power consumption. Charge-steering latches can be used to drive the MUX in the transmitter and can also be used in the analog front end of the receiver, for example in the equalizer.
References


