Title
Design Techniques for High-Frequency CMOS Integrated Circuits: From 10 GHz To 100 GHz

Permalink
https://escholarship.org/uc/item/5rc8t3j5

Author
Deng, Zhiming

Publication Date
2010

Peer reviewed|Thesis/dissertation
Design Techniques for High-Frequency CMOS Integrated Circuits: From 10 GHz To 100 GHz

by

Zhiming Deng

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor Ali M. Niknejad, Chair
Professor Robert G. Meyer
Professor Ming Gu

Fall 2010
Design Techniques for High-Frequency CMOS Integrated Circuits: From 10 GHz To 100 GHz

Copyright 2010
by
Zhiming Deng
Abstract

Design Techniques for High-Frequency CMOS Integrated Circuits: From 10 GHz To 100 GHz

by

Zhiming Deng
Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Ali M. Niknejad, Chair

Technology developments have made CMOS a strong candidate in high-frequency applications because of its low power, low cost and higher-level integration. However, as an essential element in an RF building block, a CMOS device is not as good as a BJT device in terms of speed and a HEMT device in terms of noise performance. Therefore, conventional low-frequency design techniques for CMOS circuits may not satisfy the requirements for high-frequency applications wherein the operating frequencies get close to the cut-off frequency of a CMOS device. This research work explores design techniques for various high-frequency circuits at 10 GHz, 60 GHz and up to 110 GHz. Individual building blocks including low-noise amplifiers, voltage-controlled oscillators, high-frequency true-single-phase-clock frequency dividers, and mm-wave amplifiers are studied thoroughly using both theoretical analysis and practical circuit designs. Related fundamental techniques, such as MOS device modeling and de-embedding techniques, are also explored. Furthermore, as a prototype of system-level integration, a Ku-band LNB front-end is implemented for the application of a satellite receiver.
To my parents
# Contents

List of Figures

List of Tables

I Introduction

II High-Frequency RFIC Designs

1 On the Noise Optimization of CMOS Common-Source Low-Noise Amplifiers
   1.1 Introduction
   1.2 General Design Considerations for CS LNAs
       1.2.1 Descriptions of Modeling
       1.2.2 The Selection of Design Variables
       1.2.3 Components for Power Matching
       1.2.4 Design Constraints
   1.3 The Noise Optimization Techniques
       1.3.1 Noise Factors of CS LNAs
       1.3.2 Internal Noise Sources
       1.3.3 Optimizations of Noise Factors
   1.4 Conclusion
   1.5 Definitions of Variables
   1.6 Physical Constants
   1.7 Theorems about NF Optimizations
   1.8 Expressions in Noise Factor Formulation

2 A CMOS Ku-Band Single-Conversion Low-Noise Block Front-End for Satellite Receivers
   2.1 Introduction
   2.2 Front-End Architecture

3 A 4-Port-Inductor-Based VCO Coupling Method for Phase Noise Reduction
   3.1 Introduction
   3.2 VCO Circuit Designs
       3.2.1 4-Port Inductor
       3.2.2 Interlocked-Ring Structure
       3.2.3 LC Tank
       3.2.4 VCO Topologies
   3.3 Experimental Results
       3.3.1 Passive Structures
       3.3.2 Prototype VCOs
   3.4 Conclusion

4 Design of CMOS True-Single-Phase-Clock Dividers Based on the Speed-Power Trade-Off
   4.1 Introduction
   4.2 Basic TSPC Logic Family
   4.3 TSPC Dividers and Prescalers
       4.3.1 Ratioless Divide-by-2 Divider
       4.3.2 Ratioed Divide-by-2 Divider
       4.3.3 Divide-by-2/3 Prescaler
   4.4 Experimental Results
   4.5 Conclusion
   4.6 TSPC Logic Truth Table

III CMOS mm-Wave Techniques

5 A Layout-Based Optimal Neutralization Technique for mm-Wave Differential Amplifiers
   5.1 Introduction
   5.2 Neutralization Technique
   5.3 Design Approach
   5.4 Experimental Results
   5.5 Conclusion

6 The “Load-Thru” (LT) De-embedding Technique for the Measurements of mm-Wave Balanced 4-Port Devices

   6.1 Introduction
   6.2 De-embedding Theory
       6.2.1 DM and CM Separation
       6.2.2 De-embedding Formula
       6.2.3 Characterization of the Balun
       6.2.4 Characterization of $Z_T$
   6.3 Design Consideration
   6.4 Measurement Verification
   6.5 Conclusion

IV Conclusion

Bibliography
List of Figures

1.1 The schematics of an inductively degenerated common-source LNA with an external gate-source capacitor. Both the source impedance and the load impedance are real.

1.2 The small signal model of a CS LNA including noise sources.

1.3 The dependence of the MOSFET model parameters (a) normalized $\tilde{\omega_T}$, (b) $\alpha$ and (c) $b$ on the bias voltage. The transistor channel lengths are $L = 0.18 \, \mu m$, $L = 0.24 \, \mu m$ and $L = 0.5 \, \mu m$.

1.4 The RODs and optimization results of the simple cases ($Q_{Lg} = Q_s = Q_{Ld} = \infty$, $b = 0$) with the assumptions (a) $A_{\text{max}} = \infty, T_{\text{max}} = \infty$, (b) $A_{\text{max}} = \infty, T_{\text{max}} = 0$, (c) $A_{\text{max}} > 2r^1, T_{\text{max}} = \infty$, (d) $A_{\text{max}} < 2r^1, T_{\text{max}} = \infty$. In each case, two sub-plots are generated corresponding to the condition $\frac{\tilde{\omega_T}}{2\chi_{\omega_0}} > r^1$ and $\frac{\tilde{\omega_T}}{2\chi_{\omega_0}} < r^1$ respectively. A solid line “—” with high weight is the trace of the conditional optimum $A_{\text{opt}}^T$ for each $T_0$. A “•” denotes the optimal design $\{A_{\text{opt}}^T, T_{\text{opt}}^T\}$ in an ROD.

1.5 An unconstrained 10 GHz CS LNA design uses ideal inductors ($Q_{Lg} = Q_s = Q_{Ld} = \infty$) and ignores $g_{mb}$ ($b = 0$). (a) The optimal noise figure varies with the capacitance ratio $\kappa$ ($V_{GS} - V_{TH} = 0.6 \, V$). (b) The optimal noise figure varies with the bias voltage $V_{GS}$ ($\kappa = 1$). (c) The drain current at the optimal design varies with the bias voltage $V_{GS}$ ($\kappa = 1$). The channel lengths are $L = 0.18 \, \mu m$, $L = 0.24 \, \mu m$ and $L = 0.5 \, \mu m$ respectively.

1.6 An $I_{D}$-Constrained 10 GHz CS LNA design uses ideal inductors ($Q_{Lg} = Q_s = Q_{Ld} = \infty$) and ignores $g_{mb}$ ($b = 0$). The optimal noise figure varies with $I_{D,\text{max}}$ for (a) different $\kappa$ and (b) different $V_{GS}$. The device channel length is $L = 0.18 \, \mu m$.

1.7 The optimal noise figure varies with different combinations of $Q_C$, $Q_{Lg}$ and $Q_s$ in an unconstrained 10 GHz LNA design. The results are also compared to the approximate value calculated from (1.105). The capacitance ratio is fixed at $\kappa = 1$. We assume ideal inductor $L_d$ and ignore $g_{mb}$. The bias voltage is set $V_{GS} - V_{TH} = 0.6$ V. The device channel length is $L = 0.18 \mu$m.

1.8 The optimal noise figure varies with finite quality factors ($Q_C = Q_{Lg} = Q_s = Q$) in a 10 GHz $I_D$-constrained design ($I_{D,\text{max}} = 100$ $\mu$A). The capacitance ratio $\kappa$ is set to 1 and 2. We assume an ideal inductor $L_d$ and ignore $g_{mb}$. The bias voltage is set $V_{GS} - V_{TH} = 0.6$ V. The device channel length is $L = 0.18 \mu$m.

1.9 The disturbance of $F_{\text{opt}}^{(a)}$ from $F_{\text{II}(a)}^{\text{opt}}$ in an unconstrained design. The design setups are exactly the same as those of the design shown in Fig. 1.7. The perturbation is computed using (1.122).

1.10 The disturbance of $F_{\text{opt}}^{(c)}$ from $F_{\text{II}(c)}^{\text{opt}}$ in a low $I_D$-constrained design. The design setups are exactly the same as those of the design shown in Fig. 1.8. The perturbation is computed using (1.126).

2.1 The architecture of a Ku-band LNB down-converter.
2.2 The two-stage LNA.
2.3 The noise model of a CS amplifier with inductive degeneration.
2.4 The single-balanced mixer with the LO buffer.
2.5 The three-stage IF amplifier.
2.6 The chip microphotograph of the front-end.
2.7 Measured S-parameters of the stand-alone LNA.
2.8 Measured noise parameters of the stand-alone LNA.
2.9 Front-end conversion gain for the RF band and the image band (LB and HB respectively) versus IF frequencies.
2.10 Front-end SSB NF, OIP3 and output P1dB (LB and HB respectively) versus IF frequencies.

3.1 (a) A stand-alone VCO and its phase noise plot. (b) $N$ equal VCOs are coupled by connecting their outputs together. The output voltage waveform is the same as that of a single VCO but the phase noise is reduced by $10 \cdot \lg N$ (dB).
3.2 (a) A 4-port inductor. The distributed model for (b) 2-port DM operations and (c) 4-port DM operations with port 1 and 4 in phase and port 2 and 3 in phase.
3.3 A 5-bit (31-unit) coarse tuning capacitor array is connected in the interlocked-ring structure. The current directions are shown by the arrows.
3.4 (a) A 2-port $LC$ tank. (b) A cross-coupled 4-port $LC$ tank.
3.5 The schematics of three VCO topologies: (a) NVCO, (b) NCVCO and (c) CCVCO. In these plots, the interlocked-ring structures of capacitor arrays have been simplified.

3.6 The measured DM 2-port and 4-port characteristics of a 4-port inductor: (a) the inductances and (b) the quality factors.

3.7 The measured DM impedance of a cross-coupled 4-port LC tank.

3.8 The chip micrograph of the CCVCO design with the output buffer included.

3.9 The measured frequency tuning range of the CCVCO design with 6-bit coarse tuning control.

3.10 The measured phase noise of the three VCO designs: (a) NVCO, (b) NCVCO and (c) CCVCO. Their coarse tuning control codes are all set to 0x1F.

4.1 The family of CMOS TSPC logic gates. Ratioless types: (a) CC, (b) CN, (c) CP, (d) NC and (e) PC. Ratioed types: (f) NP and (g) PN. A general symbol (h).

4.2 (a) A three-stage edge-triggered TSPC DFF and (b) its symbol.

4.3 (a) A TSPC divide-by-2 divider. (b) A TSPC divide-by-2/3 prescaler. The division ratio is 2 when MC = 1 and is 3 when MC = 0.

4.4 The 4-phase divide-by-2 operation of two types of ratioless TSPC dividers: (a) RE-0 (PC-CC-NC) and (b) RE-1 (PC-CN-NC). The turned-off transistors are depicted in gray.

4.5 Comparisons of simulated nodal waveforms of different types of dividers. (a) Input clock signal (four phases). (b) RE-2 versus RE-1. (c) RE-3 versus RE-2. (d) RE-4 versus RE-3.

4.6 The schematics of different types of divide-by-2/3 prescalers. The ratioless type: (a) RE-1 and the ratioed types: (b) RE-2, (c) RE-3 and (d) RE-4.

4.7 (a) The common on-chip configuration for the testing of various DUTs. (b) The micrograph of a test chip. Pads for the supply and the mode control “MC” are not shown.

4.8 The measurement results of various types of divide-by-2 dividers. (a) Input sensitivity curves. (b) Power consumption with input power of 0 dBm.

4.9 The measurement results of various types of divide-by-2/3 prescalers. (a) Input sensitivity curves and (b) power consumption for the divide-by-2 operation (MC=1). (c) Input sensitivity curves and (d) power consumption for the divide-by-3 operation (MC=0). The input power is set to 0 dBm for power consumption measurements.

4.10 The measured output waveforms of the RE-4 divide-by-2/3 prescaler, further divided by 8, with an input frequency of 18 GHz in (a) the divide-by-2 mode (MC=1, $f_{out}=18/16=1.125$ GHz) and (b) the divide-by-3 mode (MC=0, $f_{out}=18/24=0.75$ GHz).

5.1 The single-ended configuration and its small-signal model.
5.2 A simplified diagram of the interdigital layout of a differential pair.
5.3 The differential-mode small-signal model of the proposed layout structure in Fig. 5.2.
5.4 $MSG$, $G_{\text{max}}$ and the stability factor $K$ vary with neutralization capacitor $C_n$ (Mutual inductance $M$ ignored).
5.5 The comparisons of $G_{\text{max}}$ between different layout configurations. (a) The partial neutralization designs ($wf = 1\mu m$, $nf = 16$ and single-layer coupling). (b) The over neutralization designs ($wf = 0.75\mu m$, $nf = 16$ and multi-layer coupling). (c) Designs use the same unit cell but different numbers of cells ($wf = 0.75\mu m$, $s = 0.47\mu m$ and multi-layer coupling). For the neutralized designs ($n > 0$) in (a) and (b), $s = s_{\text{min}} + \Delta s$ and $s_{\text{min}}$ is the minimal metal spacing defined by the process.
5.6 The chip micrographs of (a) a 60 GHz single-stage amplifier, (b) a 60 GHz two-stage amplifier and (c) a 110 GHz single-stage amplifier with on-chip baluns for de-embedding.
5.7 The $S$-parameters of (a) a 60 GHz single-stage amplifier obtained from direct measurements, (b) a 60 GHz two-stage amplifier obtained from direct measurements and (c) a 110 GHz single-stage amplifier obtained from de-embedding.

6.1 The mixed-mode models of (a) an ideal balun and (b) a balanced 4-port device.
6.2 (a) The measurement setup of a symmetric 4-port device using ideal baluns. (b) The equivalent mixed-mode model of the setup.
6.3 (a) The measurement setup of two back-to-back baluns with termination loads $Z_T$ in the middle. (b) The equivalent mixed-mode model of the setup.
6.4 The setup for the characterization of $Z_T$ using 1-port open-short de-embedding. The lumped-element model for the de-embedding parasitics are depicted inside the dashed-line box.
6.5 The 3-D balun structure.
6.6 The simulated mode-conversion level $S_{\text{d1}}$ and the insertion gain $S_{\text{d1}}$ of baluns with different diameters (fixed $W = 0.5\mu m$) and different widths (fixed $D = 20\mu m$).
6.7 The micrograph of (a) the 60 GHz differential amplifier with input and output baluns, (b) the “load” de-embedding structure and (c) the “thru” de-embedding structure.

6.8 The de-embedded differential-mode $S$-parameters of a differential amplifier using the LT method (solid line) are compared with the directly measured data by using a balun probe (dashed line). (a) Magnitudes and (b) phases.
List of Tables

1.1 The $H$−NTFs of the Equivalent Circuit of a CS LNA
1.2 The Formula to Compute $\left. \frac{dF_{out}}{db} \right|_{b=0}$
1.3 The Major Variables
1.4 The Physical Constants
1.5 The Virtual CMOS Process Constants
1.6 The Intrinsic Noise Parameters
1.7 The Expressions about $|U|^2$, $|E|^2$ and $UE^*$
1.8 The Expressions for the Calculation of (1.122), (1.123) and (1.126)
2.1 Comparison of integrated LNB front-end performance
3.1 VCO Performance Summary and Comparison
4.1 Four Types of 3-Stage Ratioless DFFs
4.2 The Signal Transition in the 4-Phase Divide-by-2 Operation of a Rising-Edge-Triggered Divider
4.3 Three Types of 3-Stage Ratioed DFFs
4.4 The Truth Table of the TSPC Logics
Acknowledgments

I would like to express my deep and sincere gratitude to my research advisor, Professor Ali M. Niknejad of the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley. His wide knowledge and his logical way of thinking have been of great value to me. His understanding, encouragement and personal guidance have provided a good basis for the present dissertation.

I would also like to acknowledge Professor Robert G. Meyer, Professor Elad Alon, Professor Seth Sanders, Professors of the Department of Electrical Engineering and Computer Sciences, and Professor Ming Gu, Professor of the Department of Math, all from University of California at Berkeley, for their beneficial feedback on the research and dissertation writing.
Part I

Introduction
Technology developments have made CMOS a strong candidate in high-frequency applications because of its low power, low cost and higher-level integration. However, as an essential element in an RF building block, a CMOS device is not as good as a BJT device in terms of speed and a HEMT device in terms of noise performance. Therefore, conventional low-frequency design techniques for CMOS circuits may not be satisfactory in high-frequency applications wherein the operating frequencies get close to the cut-off frequency of a CMOS device. The goal of this research work is to explore design techniques for various high-frequency circuits.

This dissertation is organized in two main parts. In Part II, the proposed circuit design techniques focus on 10-GHz applications. Chapter 1 gives a thorough discussion of the noise optimization of CMOS common-source low-noise amplifiers. The optimization method is formulated for different design constraints. The analysis is based on an accurate device model and can, in general, be applied at any frequency. In Chapter 2, a Ku-band integrated CMOS low-noise block receiver front-end is demonstrated for the application of satellite receivers. A very low noise figure is achieved by applying the noise optimization technique. In Chapter 3, our discussion moves to the design of nonlinear circuits. A VCO coupling technique using a 4-port inductor is proposed for VCO phase noise reduction. Various VCO topologies are demonstrated and compared in terms of phase noise performance. This technique opens the opportunity for low phase noise performance as the supply voltage scales down with the device sizes. In Chapter 4, design strategies for true-single-phase-clock (TSPC) dividers are discussed. These dividers are important building blocks of high-frequency local-oscillator generation circuitry. A TSPC synthesis approach is proposed, and different TSPC divider structures are compared in terms of speed and power consumption.

In Part III of the dissertation, amplifier designs at mm-wave frequencies are discussed. In Chapter 5, a layout-based neutralization method is proposed for power gain enhancement of differential amplifiers. Different from conventional neutralization methods, the new technique utilizes the coupling capacitance between signal wires and requires no additional external capacitors. Neutralization theory is also provided as a theoretical basis for the proposed method. In Chapter 6, an LT de-embedding technique is proposed to measure the differential-mode characteristics of a balanced 4-port device using only 2-port measurements. This technique generally extends the measurable frequency range of 4-port devices to frequencies where differential probes or 4-port vector network analyzers are not available. This de-embedding technique supports the work presented in Chapter 5.
Part II

High-Frequency RFIC Designs
Chapter 1

On the Noise Optimization of CMOS Common-Source Low-Noise Amplifiers

In this work, we propose a general noise optimization technique for CMOS common-source (CS) low-noise amplifiers (LNAs). By directly employing the short-channel MOSFET $I$-$V$ characteristic and van der Ziel’s noise model, we derive design equations for the selection of the circuit design parameters, such as transistor sizes, passive component values and bias voltages, subject to various gain and current consumption constraints. We also include several side effects, including finite quality factors and the back-gate transconductance, in the optimization process and analyze their impact on the optimization results. Design examples based on virtual but realistic process parameters are given to verify our analysis and to give design intuition.

1.1 Introduction

CMOS technology is one of the most competitive options for radio-frequency integrated circuit designs, with the advantages of low cost and the possibility of system integration. The common-source (CS) amplifier architecture, including its derivatives, gives high-gain and low-noise performance. Therefore, research on CMOS CS low-noise amplifiers (LNAs) has been an active topic. CMOS LNA design and operation at 10 GHz and 60 GHz have been demonstrated [1] [2]. As the operating frequency gets higher, there are two major concerns. One is that the power gain of each stage becomes very limited. The other is that the losses of passive devices can severely affect the performance.

Much research on CMOS CS LNA noise optimization has been done. Based on the traditional noise analysis of a linear two-port network [3], a noise optimization technique for inductively degenerated CS LNAs with constant-bias or constant-power constraints is developed in [4]. Further, [5] includes the series resistances of the inductors in the noise factor formula. In [6], an approach of adding an external capacitor between the gate and the source is proposed for current-limited designs. The selection of the degeneration inductor and the sizing of transistors in cascode LNAs are discussed in [7] based on exhaustive sweeps. [8] analyzes and compares various design strategies by deriving the full sets of noise parameters.

Nevertheless, there are still some interesting unsolved problems. First, the gain constraint has not been considered. Second, no analysis has been given for the selection of the degeneration inductance under different design constraints. Third, design formulas including the finite quality factors of the inductors have not been derived, and the effects of the finite quality factors on the noise figure degradation have not been analyzed either. To resolve these issues, we propose a general noise optimization technique for the CS LNA shown in Fig. 1.1. The chapter is organized as follows. Section 1.2 formulates the optimization problem for CS LNA designs. It describes the models for both active and passive devices, defines the design variables and introduces several optimization constraints. Section 1.3 gives a thorough discussion of the proposed optimization technique. Analytical solutions of the optimization problem are derived first for the simple case with lossless inductors and then for the more complicated case with finite-$Q$ inductors. Finally, the complete case including the body effect is analyzed.
1.2 General Design Considerations for CS LNAs

Fig. 1.2 depicts the schematic of an inductively degenerated CS LNA. Before the noise optimization methods are shown, we need to discuss some general issues about the CS LNA design. These include modeling descriptions for both active and passive components, definitions of design variables and various practical design constraints. Many of the assumptions and notations we make in this section will be used throughout this chapter.

1.2.1 Descriptions of Modeling

The drain current of an N-type MOSFET that operates in the saturation region can be described by the formula

$$I_{DSAT} = WC_{ox}\nu_{sat} \frac{(V_{GS} - V_{TH})^2}{V_{GS} - V_{TH} + mE_{sat}L}. \quad (1.1)$$

$W$ is the channel width. $C_{ox}$ is the gate oxide capacitance per unit area and it is related to the oxide thickness $T_{ox}$ by $C_{ox} = \frac{\varepsilon_{ox}}{T_{ox}}$. $\nu_{sat}$ is the saturation velocity and can be considered a constant. $m = 1 + \frac{3T_{ox}}{W_{dep,max}}$, where $W_{dep,max}$ is the maximal depth of the depletion region at the channel surface. $E_{sat}$ is the saturation electric field strength, which is connected to the bias condition $V_{GS}$ through the effective mobility $\mu_{eff}$,

$$E_{sat} = \frac{2\nu_{sat}}{\mu_{eff}} = \frac{2\nu_{sat}}{\mu_o} \left[ 1 + \left( \frac{V_{GS} + V_{TH} + 0.2V}{6T_{ox}E_{o}} \right)^{\eta} \right]. \quad (1.2)$$

We have used a curve-fitting equation for $\mu_{eff}$ where $\mu_o$, $\eta$ and $E_{o}$ are empirical constants [10].
Eq. (1.2) can be linearized at $V_{GS} + V_{TH} + 0.2V \approx 6T_{ox}\tilde{E}_o,$

$$E_{sat} \approx \frac{2\nu_{sat}}{\mu_o} \left[ 1 + \left( \frac{\tilde{E}_o}{E_o} \right)^\eta \right]$$

$$\times \left( 1 - \eta \frac{\tilde{E}_o}{E_o} + \eta \frac{V_{GS} + V_{TH} + 0.2V}{6T_{ox}E_o} \right). \quad (1.3)$$

$\tilde{E}_o$ is selected according to

$$\tilde{E}_o = \frac{\tilde{V}_{GS} + V_{TH} + 0.2V}{6T_{ox}} \quad (1.4)$$

wherein $\tilde{V}_{GS}$ can take any normal value that biases the MOSFET in the strong inversion region.

Substituting the linearized formula (1.3) for $E_{sat}$ in (1.1), we have

$$I_{DSAT} \approx \frac{WC_{ox}\nu_{sat}(V_{GS} - V_{TH})^2}{(1 + m\frac{L}{L_o})(V_{GS} - V_{TH}) + m\tilde{E}_{sat}L}$$

$$= WC_{ox}\nu_{sat}m\tilde{E}_{sat}L \frac{\rho^2}{(1 + m\frac{L}{L_o})\rho + 1} \quad (1.5)$$

where

$$L_o \triangleq \frac{3\mu_oE_o}{\eta\nu_{sat}} \left( \frac{E_o}{\tilde{E}_o} \right)^\eta T_{ox}, \quad (1.6)$$

$$\tilde{E}_{sat} \triangleq \frac{2\nu_{sat}}{\mu_o} \left[ 1 + \left( \frac{\tilde{E}_o}{E_o} \right)^\eta \right]$$

$$\times \left( 1 - \eta \frac{\tilde{E}_o}{E_o} + \eta \frac{2V_{TH} + 0.2V}{6T_{ox}E_o} \right), \quad (1.7)$$

$$\rho \triangleq \frac{V_{GS} - V_{TH}}{m\tilde{E}_{sat}L}. \quad (1.8)$$

Now $L_o$ and $\tilde{E}_{sat}$ do not depend on bias.

Now we can derive the model parameters according to (1.5). The transconductance $g_m \triangleq \frac{dI_{DSAT}}{dV_{GS}}$ is given by

$$g_m = \frac{WC_{ox}\nu_{sat}}{1 + m\frac{L}{L_o}} \left\{ 1 - \frac{1}{[(1 + m\frac{L}{L_o})\rho + 1]^2} \right\}. \quad (1.9)$$
Eq. (1.9) defines a saturation level, \( \frac{WC_{ox} \nu_{sat}}{1 + m \frac{L}{L_o}} \), for \( g_m \) as the bias voltage \( V_{GS} \) increases.
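
As a sanity check on (1.9), the following sketch differentiates the normalized current of (1.5) numerically and compares the result with the closed form. It is only an illustration under stated assumptions: `k` stands for the coefficient of $\rho$ in the denominator of (1.5), and the value `K_ASSUMED = 1.5` is an arbitrary choice, not a process parameter of this work.

```python
def normalized_current(rho: float, k: float) -> float:
    """rho^2 / (k*rho + 1): I_DSAT of (1.5) divided by W*C_ox*nu_sat*m*E_sat~*L."""
    return rho * rho / (k * rho + 1.0)


def normalized_gm(rho: float, k: float) -> float:
    """Closed form of (1.9) divided by W*C_ox*nu_sat."""
    return (1.0 - 1.0 / (k * rho + 1.0) ** 2) / k


def numeric_gm(rho: float, k: float, h: float = 1e-6) -> float:
    """Central-difference derivative of the normalized current w.r.t. rho."""
    return (normalized_current(rho + h, k) - normalized_current(rho - h, k)) / (2.0 * h)


K_ASSUMED = 1.5  # hypothetical coefficient of rho, chosen only for illustration
for rho in (0.1, 1.0, 10.0):
    # The last two columns should agree to within the finite-difference error.
    print(rho, normalized_gm(rho, K_ASSUMED), numeric_gm(rho, K_ASSUMED))
```

Because $\rho$ is the overdrive voltage divided by $m\tilde{E}_{sat}L$, the derivative with respect to $\rho$ of the normalized current equals $g_m$ divided by $WC_{ox}\nu_{sat}$, which is why the two functions can be compared directly.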

The gate and drain capacitances are considered bias-independent. They have the forms

\[
C_{gs} = x \cdot W LC_{ox}, \tag{1.10}
\]

\[
C_d = y C_{gs} = (xy) \cdot W LC_{ox} \tag{1.11}
\]

with \( x \) and \( y \) being constants.

Then we can derive the expression of the device cut-off frequency \( \omega_T \triangleq \frac{g_m}{C_{gs}} \),

\[
\omega_T = \omega_{sat} \left\{ 1 - \frac{1}{\left[ (1 + m \frac{L}{L_o}) \rho + 1 \right]^2} \right\} \tag{1.12}
\]

where

\[
\omega_{sat} = \frac{\nu_{sat}}{x L \left( 1 + m \frac{L}{L_o} \right)}. \tag{1.13}
\]

\( \omega_T \approx \omega_{sat} \) if \( \rho \gg 1 \). This is called the high-bias condition, which is satisfied for either very high bias voltages or very short channel lengths. Fig. 1.3(a) shows how \( \omega_T \) approaches \( \omega_{sat} \) as the bias voltage increases for MOSFETs with different channel lengths.
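
To make the high-bias condition more concrete, the sketch below inverts the bracket of (1.12) and reports the value of $\rho$ at which $\omega_T$ reaches a given fraction of $\omega_{sat}$. The symbol `k` denotes the coefficient of $\rho$ inside the bracket, and `K_ASSUMED = 1.5` is a hypothetical value used only for illustration.

```python
import math


def omega_t_ratio(rho: float, k: float) -> float:
    """omega_T / omega_sat from the bracket of (1.12)."""
    return 1.0 - 1.0 / (k * rho + 1.0) ** 2


def rho_for_ratio(fraction: float, k: float) -> float:
    """Invert omega_t_ratio: the rho at which omega_T reaches `fraction` of omega_sat."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return (1.0 / math.sqrt(1.0 - fraction) - 1.0) / k


K_ASSUMED = 1.5  # hypothetical coefficient of rho, not a value from this work
for fraction in (0.5, 0.9, 0.99):
    print(fraction, rho_for_ratio(fraction, K_ASSUMED))
```

Since the bracket depends on the bias only through the product $k\rho$, reaching 90% of $\omega_{sat}$ requires $k\rho = \sqrt{10} - 1 \approx 2.16$ regardless of the individual values.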

\( \alpha \) is defined as the ratio between \( g_m \) and \( g_{ds} |_{V_{DS}=0} \). The drain current of a MOSFET working in the linear region is given by

\[
I_{DLIN} = 2WC_{ox} \nu_{sat} \frac{(V_{GS} - V_{TH} - \frac{m}{2} V_{DS}) V_{DS}}{V_{DS} + E_{sat} L}. \tag{1.14}
\]

Then we can derive \( g_{ds} |_{V_{DS}=0} \),

\[
g_{ds} |_{V_{DS}=0} = 2WC_{ox} \nu_{sat} \frac{\rho}{\frac{L}{L_o} \rho + \frac{1}{m}}. \tag{1.15}
\]

Therefore,

\[
\alpha = \frac{\left[ (1 + m \frac{L}{L_o}) \rho + 2 \right] \left( \frac{L}{L_o} \rho + \frac{1}{m} \right)}{2 \left[ (1 + m \frac{L}{L_o}) \rho + 1 \right]^2}. \tag{1.16}
\]

For low bias voltages, \( \rho \ll 1 \), and \( \alpha \) equals the long-channel limit \( \frac{1}{m} \). On the other hand, if the high-bias condition is satisfied, \( \alpha \) converges to \( \alpha_{sat} \),

\[
\alpha_{sat} = \frac{L}{2(L_o + mL)}. \tag{1.17}
\]
The convergence is shown in Fig. 1.3(b).
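
The two limits of $\alpha$ can be checked numerically with a short sketch. Here `r` is assumed to be the ratio $L/L_o$, which is the reading under which the closed form of (1.16) reproduces the high-bias limit (1.17). The values `M_ASSUMED` and `R_ASSUMED` are hypothetical and are not taken from the process tables of this chapter.

```python
def alpha(rho: float, m: float, r: float) -> float:
    """alpha = g_m / g_ds(V_DS = 0) following (1.16), with r = L / L_o (assumed)."""
    k = 1.0 + m * r
    return (k * rho + 2.0) * (r * rho + 1.0 / m) / (2.0 * (k * rho + 1.0) ** 2)


def alpha_sat(m: float, r: float) -> float:
    """High-bias limit (1.17), L / (2 (L_o + m L)), rewritten with r = L / L_o."""
    return r / (2.0 * (1.0 + m * r))


M_ASSUMED = 1.2  # hypothetical body-effect coefficient m
R_ASSUMED = 0.5  # hypothetical ratio L / L_o
for rho in (1e-3, 0.1, 1.0, 10.0, 1e3):
    print(rho, alpha(rho, M_ASSUMED, R_ASSUMED))
print("limits:", 1.0 / M_ASSUMED, alpha_sat(M_ASSUMED, R_ASSUMED))
```

The printed values move from the long-channel limit $\frac{1}{m}$ at small $\rho$ toward $\alpha_{sat}$ at large $\rho$.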

The body effect is also considered in the model due to the inductive degeneration. For a modern retrograde body doping profile, the threshold voltage $V_{TH}$ is linearly dependent on the bulk-source voltage $V_{BS}$,

$$V_{TH} = V_{THO} - (m - 1)V_{BS}. \quad (1.18)$$

Then the back-gate transconductance $g_{mb} \triangleq \frac{dI_{DSAT}}{dV_{BS}}$ can be derived,

$$g_{mb} = (m - 1) WC_{ox}\nu_{sat} \frac{\rho \left[ (1 + 3m\frac{L}{L_o}) \rho + 2 \right]}{\left[ (1 + m\frac{L}{L_o}) \rho + 1 \right]^2}. \quad (1.19)$$

We denote $b$ as the ratio between $g_{mb}$ and $g_m$. So

$$b = (m - 1) \frac{(1 + 3m\frac{L}{L_o}) \rho + 2}{(1 + m\frac{L}{L_o}) \rho + 2}. \quad (1.20)$$

Similar to $\alpha$, for $\rho \ll 1$, $b$ equals $m - 1$. For $\rho \gg 1$, it converges to $b_{sat}$ (see Fig. 1.3(c)),

$$b_{sat} = (m - 1) \left( \frac{L_o + 3mL}{L_o + mL} \right). \quad (1.21)$$

From the above analysis and the corresponding plots, we conclude an important property of short-channel MOSFETs for which $\rho \gg 1$ is easily satisfied. Their model parameters $\omega_T$, $\alpha$ and $b$ are not very sensitive to the gate bias condition given that the device is biased in the strong inversion region and the drain voltage is high enough.
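
The behavior of $b$ between its two limits can be illustrated with the sketch below. The closed form used for `b_ratio` is an assumption: it is the rational function of $\rho$ that reproduces both limits quoted in the text, $m - 1$ for $\rho \ll 1$ and $b_{sat}$ of (1.21) for $\rho \gg 1$, with `r` standing for $L/L_o$. The numerical values are hypothetical.

```python
def b_ratio(rho: float, m: float, r: float) -> float:
    """b = g_mb / g_m as a function of rho, with r = L / L_o (assumed form)."""
    return (m - 1.0) * ((1.0 + 3.0 * m * r) * rho + 2.0) / ((1.0 + m * r) * rho + 2.0)


def b_sat(m: float, r: float) -> float:
    """High-bias limit (1.21), (m - 1)(L_o + 3 m L) / (L_o + m L), with r = L / L_o."""
    return (m - 1.0) * (1.0 + 3.0 * m * r) / (1.0 + m * r)


M_ASSUMED = 1.2  # hypothetical body-effect coefficient m
R_ASSUMED = 0.5  # hypothetical ratio L / L_o
for rho in (1e-3, 0.1, 1.0, 10.0, 1e3):
    print(rho, b_ratio(rho, M_ASSUMED, R_ASSUMED))
print("limits:", M_ASSUMED - 1.0, b_sat(M_ASSUMED, R_ASSUMED))
```

Once $\rho$ exceeds a few units, further increases in bias change `b_ratio` only slightly, which is the insensitivity property noted above.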

Finally we describe the passive components in the model. $C_e$ is an external capacitor that helps reduce power consumption [6]. A resistor in series with $C_{gs}$ and $C_e$ is used to model the distributed resistor network with a quality factor of $Q_C$. This resistor is included in $R_g$. All the inductors in the model are on-chip spiral inductors, so they have very limited quality factors. The finite quality factors of $L_g$ and $L_d$ are also modeled as series resistors, which are included in $R_g$ and $R_d$ respectively. The finite quality factor of $L_s$, together with the source junction contact resistance and the source-substrate resistance, is modeled as a parallel resistor so that the DC bias condition is not affected as $Q_s$ changes.

In this work, we will give design examples to demonstrate the proposed optimization technique using a virtual but realistic CMOS process. The process parameters, together with some process-independent constants, are summarized in Section 1.6. Unless otherwise specified, these values will be assumed throughout the chapter.

Figure 1.3: The dependence of the MOSFET model parameters (a) normalized $\omega_T$, (b) $\alpha$ and (c) $b$ on the bias voltage. The transistor channel lengths are $L = 0.18 \, \mu m$, $L = 0.24 \, \mu m$ and $L = 0.5 \, \mu m$.

1.2.2 The Selection of Design Variables

In the CS LNA shown in Fig. 1.2, there are four explicit independent variables: the bias voltage $V_{GS}$ (or $\rho$), the external gate-source capacitor $C_e$, the transistor size $W$ and the degeneration inductance $L_s$. We keep $\rho$ as a design variable, but we do not use $C_e$, $W$ and $L_s$ directly. Instead, we define three derived variables $\kappa$, $A$ and $T$:

$$\kappa \triangleq 1 + \frac{C_e}{C_{gs}} \quad (1.22)$$

$$A \triangleq 2R_G\omega(C_{gs} + C_e) \quad (1.23)$$

$$T \triangleq g_m\omega L_s \quad (1.24)$$

Both $A$ and $T$ have physical meanings. $\frac{1}{A}$ is the quality factor of the input matching network, or equivalently $A$ is the normalized input matching bandwidth. $T$ is the loop gain of the series-series feedback configuration provided by the degeneration inductor.
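As a quick illustration of (1.22)-(1.24), the sketch below maps physical component values to the derived variables. The function name `derived_variables` and the component values are hypothetical, chosen only to give plausible magnitudes at 10 GHz; they are not taken from the process parameters of Section 1.6.

```python
import math

def derived_variables(c_gs, c_e, g_m, l_s, r_source, freq_hz):
    """Map physical values to (kappa, A, T) following (1.22)-(1.24)."""
    omega = 2.0 * math.pi * freq_hz
    kappa = 1.0 + c_e / c_gs                     # (1.22) capacitance ratio
    a = 2.0 * r_source * omega * (c_gs + c_e)    # (1.23) normalized matching bandwidth
    t = g_m * omega * l_s                        # (1.24) loop gain of the degeneration feedback
    return kappa, a, t

# Hypothetical values: 100 fF device, 50 fF external capacitor,
# 40 mS transconductance, 0.3 nH degeneration, 50 ohm source, 10 GHz.
kappa, a, t = derived_variables(100e-15, 50e-15, 40e-3, 0.3e-9, 50.0, 10e9)
print(f"kappa={kappa:.3f}  A={a:.3f}  T={t:.3f}  input-network Q = 1/A = {1.0 / a:.3f}")
```

Working with $\{\kappa, A, T\}$ rather than $\{C_e, W, L_s\}$ keeps every quantity dimensionless, which is what lets the later optimization be written in terms of a two-dimensional region in the $(A_0, T_0)$ plane.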

The benefits of selecting $\kappa$, $A$ and $T$ as design variables will be evident in the following sections, where the optimization is carried out in steps and each step reduces to a conditional optimization over a single variable.

1.2.3 Components for Power Matching

In this part, we discuss the selection of matching components $L_g$, $R_g$, $L_d$ and $R_d$ according to the requirements of power matching at both ports. The hybrid ($H$-) parameters of the CS LNA model are

$$H_{11} = R_g + j\omega L_g + \frac{2R_G}{A} \left[ \frac{1}{j} + \frac{\left(1 + j\frac{\omega\kappa}{\omega_T}\right)T}{1 + j\left(bT + \frac{1}{Q_s}\right)} \right], \quad (1.25)$$

$$H_{21} = \frac{\omega_T}{j\omega\kappa} \cdot \frac{\left(1 + b\frac{\omega\kappa}{\omega_T}T\right) + j\frac{1}{Q_s}}{1 + j\left(bT + \frac{1}{Q_s}\right)}, \quad (1.26)$$

$$H_{12} = 0, \quad (1.27)$$

$$H_{22} = \frac{jyA}{2R_G\kappa} + \frac{1}{R_d + j\omega L_d}. \quad (1.28)$$

For simplicity, we have employed a unilateral model. The input impedance and the output admittance are $Z_{in} = H_{11}$ and $Y_{out} = H_{22}$. To satisfy the power matching requirements, they need to be conjugately matched to a real source impedance $R_G$ and a real load impedance $R_L$ respectively. That is, $H_{11} = R_G$ and $H_{22} = \frac{1}{R_L}$. Therefore, the design equations for the matching components can be solved.

\[
L_g = \frac{2R_G}{\omega_0 A_0} \left[ 1 - \frac{T_0 \left( \frac{\omega_0 \kappa}{\omega_T} - \frac{1}{Q_s} - bT_0 \right)}{1 + \left( bT_0 + \frac{1}{Q_s} \right)^2} \right], \tag{1.29}
\]

\[
R_g = R_G - \frac{2R_G}{A_0} \cdot \frac{T_0 \left( 1 + \frac{\omega_0 \kappa}{Q_s \omega_T} + b\frac{\omega_0 \kappa}{\omega_T} T_0 \right)}{1 + \left( bT_0 + \frac{1}{Q_s} \right)^2}, \tag{1.30}
\]

\[
L_d = \frac{1}{\omega_0} \cdot \frac{1}{\frac{yA_0}{2R_G \kappa} + \frac{2R_G \kappa}{yA_0 R_L^2}}, \tag{1.31}
\]

\[
R_d = \frac{1}{\frac{1}{R_L} + \frac{y^2 A_0^2 R_L}{4R_G^2 \kappa^2}}. \tag{1.32}
\]

\(\omega_0 = 2\pi f_0\) is the desired operating frequency. \(A_0\) and \(T_0\) are the values of \(A\) and \(T\) at \(\omega_0\) respectively. The same subscript convention applies to other variables. For each given variable set \(\{\rho, \kappa, A_0, T_0\}\), the matching components are uniquely determined.
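The matching equations can be checked numerically. The sketch below assumes that the bracketed term of $H_{11}$ is $\frac{1}{j} + \frac{(1 + j\omega_0\kappa/\omega_T)T_0}{1 + j(bT_0 + 1/Q_s)}$ and that $H_{22}$ has the form of (1.28); it solves $H_{11} = R_G$ and $H_{22} = 1/R_L$ for the four components and then substitutes them back. All function names and numerical values are hypothetical.

```python
import math

def feedback_term(t0, kappa, omega0, omega_t, b, q_s):
    """Assumed feedback term inside the bracket of H11."""
    e = omega0 * kappa / omega_t
    d = b * t0 + 1.0 / q_s
    return t0 * (1 + 1j * e) / (1 + 1j * d)

def matching_components(a0, t0, kappa, omega0, omega_t, b, q_s, y, r_source, r_load):
    """Solve H11 = R_G and H22 = 1/R_L for (L_g, R_g, L_d, R_d)."""
    x = feedback_term(t0, kappa, omega0, omega_t, b, q_s)
    l_g = 2.0 * r_source / (omega0 * a0) * (1.0 - x.imag)
    r_g = r_source - 2.0 * r_source / a0 * x.real
    g_load = 1.0 / r_load
    b_out = y * a0 / (2.0 * r_source * kappa)      # susceptance term of H22
    l_d = b_out / (g_load**2 + b_out**2) / omega0
    r_d = g_load / (g_load**2 + b_out**2)
    return l_g, r_g, l_d, r_d

def h11(l_g, r_g, a0, t0, kappa, omega0, omega_t, b, q_s, r_source):
    x = feedback_term(t0, kappa, omega0, omega_t, b, q_s)
    return r_g + 1j * omega0 * l_g + 2.0 * r_source / a0 * (1.0 / 1j + x)

def h22(l_d, r_d, a0, kappa, omega0, y, r_source):
    return 1j * y * a0 / (2.0 * r_source * kappa) + 1.0 / (r_d + 1j * omega0 * l_d)

# Hypothetical design point at 10 GHz with a 60 GHz device.
P = dict(a0=1.2, t0=0.5, kappa=1.0, omega0=2 * math.pi * 10e9,
         omega_t=2 * math.pi * 60e9, b=0.2, q_s=10.0, r_source=50.0)
Y_RATIO, R_LOAD = 0.5, 50.0
l_g, r_g, l_d, r_d = matching_components(y=Y_RATIO, r_load=R_LOAD, **P)
z_in = h11(l_g, r_g, **P)
y_out = h22(l_d, r_d, P["a0"], P["kappa"], P["omega0"], Y_RATIO, P["r_source"])
print(f"L_g={l_g * 1e9:.3f} nH  R_g={r_g:.2f} ohm  L_d={l_d * 1e9:.3f} nH  R_d={r_d:.2f} ohm")
print(f"Z_in={z_in:.6f}  Y_out={y_out:.6f}")
```

A negative $R_g$ from this calculation would indicate that the chosen $\{A_0, T_0\}$ pair cannot be matched with passive components, which is the situation the constraints of the next subsection are designed to exclude.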

1.2.4 Design Constraints

In practice, the ranges of the design variables are subject to various constraints. With the bias \(\rho\) and the capacitance ratio \(\kappa\) fixed, the variable pairs \(\{A_0, T_0\}\) that satisfy all constraints form the region of design (ROD) in a 2-dimensional plane. We will introduce several common constraints for CS LNA designs and find the ROD.

The \(Q_{L_g}, Q_C\) Constraint

A physical capacitor \(C_{gs}\) has a finite quality factor and so does \(L_g\). Hence, \(R_g\), which includes the series resistance of \(C_{gs}\) and \(L_g\), cannot be arbitrarily small. Suppose the quality factors \(Q_C\) and \(Q_{L_g}\) are given. Then \(R_g\) needs to satisfy \(R_g \geq \frac{1}{Q_C \omega_0 \kappa C_{gs}} + \frac{\omega_0 L_g}{Q_{L_g}}\). According to (1.23), (1.29) and (1.30), one can derive a lower bound \(M\) for \(A_0\).

\[A_0 \geq M \tag{1.33}\]

where

\[
M \triangleq \frac{2T_0 \left( 1 + \frac{\omega_0 \kappa}{Q_s \omega_T} + b\frac{\omega_0 \kappa}{\omega_T} T_0 \right)}{1 + \left( bT_0 + \frac{1}{Q_s} \right)^2} + \frac{2}{Q_C} + \frac{2}{Q_{L_g}} \left[ 1 - \frac{T_0 \left( \frac{\omega_0 \kappa}{\omega_T} - \frac{1}{Q_s} - bT_0 \right)}{1 + \left( bT_0 + \frac{1}{Q_s} \right)^2} \right]. \tag{1.34}
\]

Though the exact values of $Q_C$, $Q_{L_g}$ and $Q_s$ are not known until the circuit is finalized, an estimate based on knowledge of the process and rough ranges of the component values is usually a good enough starting point.

The $Q_{L_d}$-Constraint

Similarly, a quality factor constraint also applies to $L_d$. For a given $Q_{L_d}$, $R_d$ satisfies $R_d \geq \frac{\omega_0 L_d}{Q_{L_d}}$. From (1.31) and (1.32), we can derive an upper bound

$$A_0 \leq \frac{2Q_{L_d}R_G\kappa}{yR_L}. \tag{1.35}$$

The $I_D$-Constraint

According to (1.5), (1.10) and (1.23), the drain current $I_D$ can be related to $A_0$ by

$$A_0 = \frac{\kappa I_D}{I_0} \frac{(1 + m\frac{L}{L_o})\rho + 1}{\rho^2} \tag{1.36}$$

where $I_0$ is a bias-independent constant

$$I_0 \triangleq \frac{v_{sat}m\tilde{E}_{sat}}{2x\omega_0R_G}. \tag{1.37}$$

The $I_D$-constraint defines an upper bound $I_{D,\text{max}}$ for the drain current, so $A_0$ must satisfy

$$A_0 \leq \frac{\kappa I_{D,\text{max}}}{I_0} \frac{(1 + m\frac{L}{L_o})\rho + 1}{\rho^2}. \tag{1.38}$$

The Gain-Constraint

Gain refers to the power gain in this chapter. The power gain $G_p$ of a CS LNA satisfying conjugate matching at both ports equals the maximal power gain $G_{\text{max}} = \frac{|H_{21}|^2}{4\Re{\{H_{11}\}}\Re{\{H_{22}\}}}$.

$$G_p = \frac{R_L\omega_T^2}{4R_G\omega_0^2\kappa^2} \cdot \frac{\left(1 + \frac{\omega_0\kappa}{\omega_T}bT_0\right)^2 + \frac{1}{Q_s^2}}{1 + \left(bT_0 + \frac{1}{Q_s}\right)^2}. \tag{1.39}$$

The effect of $T_0$ can be examined in two extreme cases:

$$G_p|_{T_0=0} = \frac{R_L\omega_T^2}{4R_G\omega_0^2\kappa^2}, \quad (1.40)$$

$$G_p|_{T_0\to\infty} = \frac{R_L}{4R_G}. \quad (1.41)$$

In practice, $\omega_0\kappa < \omega_T$ is always true. This means a large value of $T_0$ results in a power gain degradation. For any power gain $G_p$ satisfying $G_p|_{T_0\to\infty} < G_p < G_p|_{T_0=0}$, we can find a unique corresponding $T_0$.

$$T_0 = \frac{1}{b\left(G_p - \frac{R_L}{4R_G}\right)} \left\{ \left[ \left( \frac{G_p}{Q_s} - \frac{R_L\omega_T}{4R_G\omega_0\kappa} \right)^2 + \left(1 + \frac{1}{Q_s^2}\right) \left( G_p - \frac{R_L}{4R_G} \right) \left( \frac{R_L\omega_T^2}{4R_G\omega_0^2\kappa^2} - G_p \right) \right]^{\frac{1}{2}} - \left( \frac{G_p}{Q_s} - \frac{R_L\omega_T}{4R_G\omega_0\kappa} \right) \right\}. \quad (1.42)$$

Given a lower bound $G_{p,\text{min}}$ on the power gain, (1.42) defines an upper bound for $T_0$.
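The relation between $G_p$ and $T_0$ can be sketched numerically. The code below evaluates the power gain with the numerator $(1 + \frac{\omega_0\kappa}{\omega_T}bT_0)^2 + \frac{1}{Q_s^2}$, which reproduces the limits (1.40) and (1.41), and inverts it by solving the resulting quadratic in $bT_0$ for its positive root. Solving that quadratic gives the factor $1 + \frac{1}{Q_s^2}$ in the radicand. The symbol `e` for $\omega_0\kappa/\omega_T$, the function names and the numbers are assumptions made for illustration.

```python
import math

def power_gain(t0, e, b, q_s, r_source, r_load):
    """Matched power gain as a function of T0; e stands for omega0*kappa/omega_T."""
    g_inf = r_load / (4.0 * r_source)               # limit for T0 -> infinity, (1.41)
    num = (1.0 + e * b * t0) ** 2 + 1.0 / q_s**2
    den = 1.0 + (b * t0 + 1.0 / q_s) ** 2
    return g_inf / e**2 * num / den

def t0_for_gain(g_p, e, b, q_s, r_source, r_load):
    """Positive root of the quadratic in b*T0 obtained from the gain expression."""
    g_inf = r_load / (4.0 * r_source)
    g_zero = g_inf / e**2                           # value at T0 = 0, (1.40)
    if not g_inf < g_p < g_zero:
        raise ValueError("G_p must lie strictly between the two limiting gains")
    lin = g_p / q_s - g_inf / e
    disc = lin**2 + (1.0 + 1.0 / q_s**2) * (g_p - g_inf) * (g_zero - g_p)
    return (math.sqrt(disc) - lin) / (b * (g_p - g_inf))

# Hypothetical numbers: omega0*kappa/omega_T = 1/6, b = 0.2, Q_s = 10, R_G = R_L = 50 ohm.
E_RATIO, B, Q_S = 1.0 / 6.0, 0.2, 10.0
g_p = power_gain(0.8, E_RATIO, B, Q_S, 50.0, 50.0)
t_back = t0_for_gain(g_p, E_RATIO, B, Q_S, 50.0, 50.0)
print(f"G_p(T0=0.8) = {10 * math.log10(g_p):.2f} dB, recovered T0 = {t_back:.6f}")
```

Feeding a minimum gain specification into `t0_for_gain` returns the corresponding $T_{\text{max}}$ for the ROD.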

In summary, the ROD of a CS LNA design with all the above constraints can be described as

$$\text{ROD} = \{ M \leq A_0 \leq A_{\text{max}}, \ 0 \leq T_0 \leq T_{\text{max}} \} \quad (1.43)$$

$A_{\text{max}}$ and $T_{\text{max}}$ are the smallest of the respective upper bounds defined above. Designers can tighten them if additional constraints arise from specific application requirements.
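A minimal sketch of how the ROD bounds might be assembled is shown below. It evaluates the lower bound $M$ of (1.34), with the term $b\frac{\omega_0\kappa}{\omega_T}T_0$ written in dimensionless form, and takes $A_{\text{max}}$ as the smaller of the bounds (1.35) and (1.38). The function names, the symbol `e` for $\omega_0\kappa/\omega_T$, and every numerical value (including $I_0$) are hypothetical.

```python
import math

def lower_bound_m(t0, e, b, q_s, q_c, q_lg):
    """Lower bound M on A0 from the Q_C and Q_Lg constraint, following (1.34)."""
    d = b * t0 + 1.0 / q_s
    den = 1.0 + d * d
    re_x = t0 * (1.0 + e / q_s + b * e * t0) / den
    im_x = t0 * (e - d) / den
    return 2.0 * re_x + 2.0 / q_c + 2.0 / q_lg * (1.0 - im_x)

def upper_bound_a(kappa, r_source, r_load, y, q_ld, i_d_max, i_0, rho, m, l_over_lo):
    """A_max as the smaller of the Q_Ld bound (1.35) and the I_D bound (1.38)."""
    a_q = 2.0 * q_ld * r_source * kappa / (y * r_load)
    a_i = kappa * i_d_max / i_0 * ((1.0 + m * l_over_lo) * rho + 1.0) / rho**2
    return min(a_q, a_i)

def in_rod(a0, t0, m_low, a_max, t_max):
    """Membership test for the ROD of (1.43)."""
    return m_low <= a0 <= a_max and 0.0 <= t0 <= t_max

# With ideal passives and b = 0 the lower bound collapses to 2*T0.
m_ideal = lower_bound_m(0.3, 0.2, 0.0, math.inf, math.inf, math.inf)
m_lossy = lower_bound_m(0.3, 0.2, 0.0, 10.0, 20.0, 10.0)
a_max = upper_bound_a(kappa=1.0, r_source=50.0, r_load=50.0, y=0.5, q_ld=10.0,
                      i_d_max=5e-3, i_0=2e-3, rho=2.0, m=1.2, l_over_lo=1.0)
print(m_ideal, m_lossy, a_max, in_rod(1.0, 0.3, m_lossy, a_max, 1.0))
```

Finite quality factors raise $M$, so the feasible region shrinks from below as the passives become lossier.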

1.3 The Noise Optimization Techniques

1.3.1 Noise Factors of CS LNAs

We adopt the concept of the noise transfer function (NTF) in our noise analysis. The NTFs are the transfer functions from the different internal noise sources, $n_k$, to the outputs of the 2-port network. The type of NTF corresponds to that of the 2-port parameters. Since we use $H$-parameters to describe a CS LNA, we also use $H$-NTFs. We now derive the connection between the $H$-NTFs and the equivalent input noise voltage $v_{n,eq}$ and the equivalent input noise current $i_{n,eq}$. In general, the 2-port equations with NTFs are

\[
\begin{aligned}
V_1 &= H_{11}I_1 + H_{12}V_2 + \sum_k H_{n1k}n_k \\
I_2 &= H_{21}I_1 + H_{22}V_2 + \sum_k H_{n2k}n_k
\end{aligned}
\tag{1.44}
\]

On the other hand, the 2-port equations with the equivalent noise voltage and current are

\[
\begin{aligned}
V_1 + v_{n,eq} &= H_{11} (I_1 + i_{n,eq}) + H_{12}V_2 \\
I_2 &= H_{21} (I_1 + i_{n,eq}) + H_{22}V_2
\end{aligned}
\tag{1.45}
\]

By comparing (1.44) and (1.45), we conclude that

\[
i_{n,eq} = U, \tag{1.46}
\]

\[
v_{n,eq} = H_{11}U - E \tag{1.47}
\]

with

\[
U \triangleq \sum_k \frac{H_{n2k}}{H_{21}}n_k, \tag{1.48}
\]

\[
E \triangleq \sum_k H_{n1k}n_k. \tag{1.49}
\]

With all the above information, we can represent the noise factor of the CS LNA in terms of $U$ and $E$ [9],
\[
F = 1 + \frac{4R_G^2|U|^2 - 4R_G\Re\{UE^*\} + |E|^2}{4k_BT R_G \Delta f}. \quad (1.50)
\]
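A short worked step, assuming the matched condition $H_{11} = R_G$ used throughout this chapter, shows where the three terms in the numerator of (1.50) come from. With a source of resistance $R_G$, the total noise voltage referred to the source is $v_{n,eq} + R_G i_{n,eq}$, and the source itself contributes $4k_BTR_G\Delta f$:

```latex
\begin{aligned}
v_{n,eq} + R_G\, i_{n,eq} &= (H_{11}U - E) + R_G U = 2R_G U - E, \\
|2R_G U - E|^2 &= 4R_G^2 |U|^2 - 4R_G\,\Re\{UE^*\} + |E|^2, \\
F &= 1 + \frac{|v_{n,eq} + R_G\, i_{n,eq}|^2}{4k_BT R_G \Delta f}.
\end{aligned}
```

The first line uses (1.46) and (1.47). The second line is the expansion of a squared magnitude, which produces the cross term $\Re\{UE^*\}$.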

We also derive the expression of another noise parameter, the minimum noise factor $F_{\text{min}}$. Though only $F_{\text{opt}}$ affects the real performance of an LNA, we would like to see the difference between $F$ and $F_{\text{min}}$ at the optimal design.
\[
\begin{aligned}
F_{\text{min}} = 1 &+ \frac{|U|^2R_G - \Re\{UE^*\}}{2k_BT \Delta f} \\
&+ \frac{1}{2k_BT \Delta f} \left[ \left( |U|^2R_G - \Re\{UE^*\} + \frac{|E|^2}{2R_G} \right)^2 - \left( \Re\{UE^*\} - \frac{|E|^2}{2R_G} \right)^2 - \Im^2\{UE^*\} \right]^{\frac{1}{2}}.
\end{aligned}
\quad (1.51)
\]

$F$ is never lower than $F_{\text{min}}$, and they are equal if and only if both $\Re\{UE^*\} = \frac{|E|^2}{2R_G}$ and $\Im\{UE^*\} = 0$ are satisfied.
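The relation between $F$ and $F_{\text{min}}$ can be checked with a small numerical sketch. It assumes the form of (1.50) for $F$ and writes both terms of $F_{\text{min}}$ over $2k_BT\Delta f$, which keeps every term dimensionless. The helper `from_single_source`, which builds $|U|^2$, $|E|^2$ and $UE^*$ from a single fully correlated pair, and the numbers (in arbitrary units with $k_BT\Delta f = 1$) are hypothetical.

```python
import math

def noise_factor(u2, e2, ue, r_source, kt_df):
    """F in terms of |U|^2, |E|^2 and the complex correlation U E*, following (1.50)."""
    return 1.0 + (4.0 * r_source**2 * u2 - 4.0 * r_source * ue.real + e2) / (4.0 * kt_df * r_source)

def min_noise_factor(u2, e2, ue, r_source, kt_df):
    """F_min with both terms normalized by 2*k_B*T*df."""
    c = ue.real - e2 / (2.0 * r_source)
    radicand = (u2 * r_source - c) ** 2 - c**2 - ue.imag**2
    return 1.0 + (u2 * r_source - ue.real) / (2.0 * kt_df) + math.sqrt(radicand) / (2.0 * kt_df)

def from_single_source(u, e):
    """Hypothetical helper: second-order quantities for one fully correlated source."""
    return abs(u) ** 2, abs(e) ** 2, u * e.conjugate()

R_SOURCE, KT_DF = 50.0, 1.0

# Generic case: F exceeds F_min.
stats = from_single_source(2 + 1j, 30 - 40j)
f_generic = noise_factor(*stats, R_SOURCE, KT_DF)
fmin_generic = min_noise_factor(*stats, R_SOURCE, KT_DF)

# Equality case: Re{UE*} = |E|^2 / (2 R_G) and Im{UE*} = 0.
stats_eq = from_single_source(1 + 0j, 100 + 0j)
f_equal = noise_factor(*stats_eq, R_SOURCE, KT_DF)
fmin_equal = min_noise_factor(*stats_eq, R_SOURCE, KT_DF)
print(f_generic, fmin_generic, f_equal, fmin_equal)
```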

### 1.3.2 Internal Noise Sources

The equivalent circuit in Fig. 1.2 employs a MOSFET noise model from [11]. It includes the channel thermal noise $i_{ds}$ and the induced gate noise $i_g$.

The channel thermal noise constant $\gamma$ and the induced gate current noise constant $\delta$ are verified to be independent of operating frequencies, and they are not sensitive to bias conditions for high bias voltages. However, they vary with channel lengths. Short channel effects can result in larger values than their theoretical long channel values [12].

$i_{ds}$ and $i_g$ are not independent and their correlation coefficient $c$ is studied under different conditions [13]. It is purely imaginary,

$$c = \frac{i_g \cdot \bar{i}_{ds}}{\sqrt{|i_g|^2 \cdot |i_{ds}|^2}} = j c_i. \quad (1.52)$$

$c_i$ is independent of operating frequencies. Also, when biased in the saturation region, it is not very sensitive to bias voltages. Moreover, it decreases when the channel length $L$ decreases.

In our analysis, we consider $\gamma$, $\delta$ and $c_i$ as constants. To make our future expressions more compact, we define three variables $\psi$, $\chi$ and $\xi$.

$$\psi \triangleq \frac{\alpha \delta}{5\kappa^2}, \quad (1.53)$$

$$\chi \triangleq \frac{\alpha \delta}{5\kappa^2} + \frac{\gamma}{\alpha} - 2c_i\sqrt{\frac{\gamma \delta}{5\kappa^2}}, \quad (1.54)$$

$$\xi \triangleq \frac{\alpha \delta}{5\kappa^2} - c_i\sqrt{\frac{\gamma \delta}{5\kappa^2}}. \quad (1.55)$$

The thermal noises of the resistors in the model are also considered. As a summary, all internal noise sources in a CS LNA with their power spectra and NTFs are listed in Table 1.1. So the explicit expressions of $|U|^2$, $|E|^2$ and $UE^*$ can be derived according to (1.48) and (1.49). They are summarized in Section 1.8. And $F$ is now an explicit function of $A_0$ and $T_0$. There are two terms in $F$ that depend on $A_0$. One is proportional to $\frac{1}{A_0}$ and the other is proportional to $A_0$. $N$ is defined as the ratio between the coefficients of the two terms. A rigorous definition is

$$N \triangleq \frac{\lim_{A_0 \to 0} A_0 F}{\lim_{A_0 \to \infty} \frac{F}{A_0}}. \quad (1.56)$$

### 1.3.3 Optimizations of Noise Factors

Having obtained the formula of the noise factor, we will show the proposed noise optimization technique. The general optimization procedure can be described as follows. First, we solve for the optimal pair \( \{A_{\text{opt}}, T_{\text{opt}}\} \) in the ROD that minimizes the noise factor for a fixed \( \rho \) and \( \kappa \). Then we discuss the selections of \( \rho \) and \( \kappa \).

Three cases under different assumptions will be discussed. The first is a simple case wherein we assume all passives are ideal and the back-gate transconductance is removed. In the second case, we include the finite quality factors of the passives into the optimization. In the third case, we consider the effect of \( g_{mb} \) on the optimization results.

\[
\sigma^I \triangleq \{ Q_C = Q_{L_g} = Q_s = Q_{L_d} = \infty, \ b = 0 \}
\]

Any variable evaluated under the assumption \( \sigma^I \) is annotated with a superscript Roman numeral I. So the noise factor becomes \( F^I \):

\[
F^I = 2 + \frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2} - \frac{4\chi\omega_0\kappa}{\omega_T}T_0 + \frac{2\chi\omega_0\kappa}{\omega_T} \left( A_0 + \frac{N^I}{A_0} \right) \tag{1.57}
\]

with

\[
N^I = T_0^2 - \frac{\omega_T}{\chi\omega_0\kappa}T_0 + \frac{\psi}{\chi}. \tag{1.58}
\]

\( A_0 \) satisfies \( M^I \leq A_0 \leq A_{\text{max}} \) where

\[
M^I = 2T_0. \tag{1.59}
\]

To find the conditional optimum $\tilde{A}_0^{I}$ for a fixed $T_0 \leq T_{\text{max}}$, we solve

$$\frac{\partial F^{I}}{\partial A_0} = \frac{2\chi\omega_0\kappa}{\omega_T} \left( 1 - \frac{N^{I}}{A_0^2} \right) = 0. \quad (1.60)$$

The root of (1.60), $\sqrt{N^{I}}$, is not necessarily within the ROD, so $\tilde{A}_0^{I}$ satisfies

$$\tilde{A}_0^{I} = \begin{cases}
M^{I} & \text{if } T_0 \in \theta_1^{I} \\
\sqrt{N^{I}} & \text{if } T_0 \in \theta_2^{I} \\
A_{\text{max}} & \text{if } T_0 \in \theta_3^{I}
\end{cases} \quad (1.61)$$

$\theta_1^{I}$, $\theta_2^{I}$ and $\theta_3^{I}$ used in (1.61) are given by

$$\begin{aligned}
\theta_1^{I} & \triangleq \{ T_0 \mid N^{I} < (M^{I})^2 \leq A_{\text{max}}^2, \ T_0 \leq T_{\text{max}} \} \\
& = \{ T_0 \mid T_0 > p^{I}, \ T_0 \leq \min\{q^{I}, T_{\text{max}}\} \},
\end{aligned} \quad (1.62)$$

$$\begin{aligned}
\theta_2^{I} & \triangleq \{ T_0 \mid (M^{I})^2 \leq N^{I} \leq A_{\text{max}}^2, \ T_0 \leq T_{\text{max}} \} \\
& = \{ T_0 \mid N^{I} \leq A_{\text{max}}^2, \ T_0 \leq \min\{p^{I}, q^{I}, T_{\text{max}}\} \},
\end{aligned} \quad (1.63)$$

$$\begin{aligned}
\theta_3^{I} & \triangleq \{ T_0 \mid (M^{I})^2 \leq A_{\text{max}}^2 < N^{I}, \ T_0 \leq T_{\text{max}} \} \\
& = \{ T_0 \mid A_{\text{max}}^2 < N^{I}, \ T_0 \leq \min\{p^{I}, q^{I}, T_{\text{max}}\} \}.
\end{aligned} \quad (1.64)$$

The expressions of $p^{I}$ and $q^{I}$ are

$$p^{I} = \frac{1}{6} \left[ \sqrt{\frac{\omega_T^2}{\chi^2\omega_0^2\kappa^2} + \frac{12\psi}{\chi}} - \frac{\omega_T}{\chi\omega_0\kappa} \right], \quad (1.65)$$

$$q^{I} = \frac{A_{\text{max}}}{2}. \quad (1.66)$$

$p^{I}$ is the unique positive root that satisfies $(M^{I})^2 = N^{I}$, and $T_0 < p^{I}$ is equivalent to $(M^{I})^2 < N^{I}$. $q^{I}$ satisfies $M^{I} = A_{\text{max}}$, and $T_0 < q^{I}$ is equivalent to $M^{I} < A_{\text{max}}$. $\theta_1^{I}$, if it is not empty, defines a continuous domain, and $\theta_2^{I} \cup \theta_3^{I}$ defines another continuous domain that lies on the left side of $\theta_1^{I}$.

The conditional optimal noise factor $\hat{F}^{I} = F^{I}|_{A_0 = \tilde{A}_0^{I}}$ is a multi-piece continuous function of $T_0$. The next step is to find the global optimum. We calculate $\frac{\partial \hat{F}^{I}}{\partial T_0}$ in a piecewise way. For $T_0 \in \theta_2^{I}$ and $T_0 \in \theta_3^{I}$,

$$\frac{\partial \hat{F}^{I}}{\partial T_0} \bigg|_{T_0 \in \theta_2^{I}} = -\frac{4\chi\omega_0\kappa}{\omega_T} + \frac{2\chi\omega_0\kappa}{\omega_T \sqrt{N^{I}}} \frac{\partial N^{I}}{\partial T_0}, \quad (1.67)$$

$$\frac{\partial \hat{F}^{I}}{\partial T_0} \bigg|_{T_0 \in \theta_3^{I}} = -\frac{4\chi\omega_0\kappa}{\omega_T} + \frac{2\chi\omega_0\kappa}{\omega_T A_{\text{max}}} \frac{\partial N^{I}}{\partial T_0}. \quad (1.68)$$

According to Corollary 1 in Section 1.7, (1.67) and (1.68) are both negative in their restricted domains. Hence \( \hat{F}^{I} \) is monotonically decreasing in \( \theta_2^{I} \cup \theta_3^{I} \). For \( T_0 \in \theta_1^{I} \),

\[
\left. \frac{\partial \hat{F}^{I}}{\partial T_0} \right|_{T_0 \in \theta_1^{I}} = \frac{\chi \omega_0 \kappa}{\omega_T} \left( 1 - \frac{\psi}{\chi} \cdot \frac{1}{T_0^2} \right). \tag{1.69}
\]

There exists a unique positive solution \( r^{I} \) that makes (1.69) vanish,

\[
r^{I} = \sqrt{\frac{\psi}{\chi}}. \tag{1.70}
\]

For \( T_0 \in \theta_1^{I} \), \( \hat{F}^{I} \) is decreasing when \( T_0 < r^{I} \) and increasing when \( T_0 > r^{I} \). So the minimal \( \hat{F}^{I} \) is achieved at \( r^{I} \) if \( r^{I} \in \theta_1^{I} \); otherwise it is achieved at the right boundary, \( \min\{q^{I}, T_{\text{max}}\} \). We show that \( r^{I} > p^{I} \) in Corollary 2 of Section 1.7. Hence, \( r^{I} \in \theta_1^{I} \) is equivalent to \( r^{I} \leq \min\{q^{I}, T_{\text{max}}\} \). In summary,

\[
T_{\text{opt}}^{I} = \min\{r^{I}, q^{I}, T_{\text{max}}\}. \tag{1.71}
\]

Then the global optimum \( A_{\text{opt}}^{I} \) and \( F_{\text{opt}}^{I} \) can be calculated according to (1.61) and (1.57) respectively. Depending on which term determines \( T_{\text{opt}}^{I} \), the optimization result is said to have a different critical constraint: unconstrained, \( A_{\text{max}} \)-constrained or \( T_{\text{max}} \)-constrained. The corresponding \( A_{\text{opt}}^{I} \) satisfies

\[
A_{\text{opt}}^{I} = \begin{cases}
2r^{I} & \text{if } T_{\text{opt}}^{I} = r^{I} \\
A_{\text{max}} & \text{if } T_{\text{opt}}^{I} = q^{I} \\
\min\{\sqrt{N^{I}}, A_{\text{max}}\} & \text{if } T_{\text{opt}}^{I} = T_{\text{max}}
\end{cases}. \tag{1.72}
\]

In both the unconstrained and the \( A_{\text{max}} \)-constrained cases, the global optimal designs are always located on the curve defined by \( A_0 = M^{I} \). With (1.23) and (1.24), this curve corresponds to the well-known criterion for the selection of the degeneration inductor, \( L_s = \frac{R_G \kappa}{\omega_T} \). It is consistent with previously published works wherein no constraints are applied to \( T_0 \). However, in the \( T_{\text{max}} \)-constrained case, this conclusion is no longer valid according to (1.72).
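The two-step procedure can be sketched in a few lines. The code below implements the ideal case $\sigma^I$: the conditional optimum is obtained by clamping $\sqrt{N^I}$ to $[M^I, A_{\text{max}}]$, which covers the three branches of (1.61), and $T_{\text{opt}}^I$ follows (1.71). The symbol `e` for $\omega_0\kappa/\omega_T$, `g_out` for the output-network term $\frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2}$, the function names and the numbers are all assumptions made for illustration.

```python
import math

def make_case1(psi, chi, e, g_out):
    """Build N^I(T0) and F^I(A0, T0); e stands for omega0*kappa/omega_T."""
    def n1(t0):
        return t0 * t0 - t0 / (chi * e) + psi / chi
    def f1(a0, t0):
        return 2.0 + g_out - 4.0 * chi * e * t0 + 2.0 * chi * e * (a0 + n1(t0) / a0)
    return n1, f1

def conditional_a(n1, t0, a_max):
    """Clamp sqrt(N^I) to [M^I, A_max] with M^I = 2*T0 (a negative N^I gives M^I)."""
    root = math.sqrt(max(n1(t0), 0.0))
    return min(max(root, 2.0 * t0), a_max)

def optimize_case1(psi, chi, e, g_out, a_max=math.inf, t_max=math.inf):
    n1, f1 = make_case1(psi, chi, e, g_out)
    r = math.sqrt(psi / chi)        # stationary point on the curve A0 = M^I
    q = a_max / 2.0                 # where M^I reaches A_max
    t_opt = min(r, q, t_max)
    a_opt = conditional_a(n1, t_opt, a_max)
    return t_opt, a_opt, f1(a_opt, t_opt)

# Hypothetical dimensionless inputs.
PSI, CHI, E, G_OUT = 0.1, 1.5, 0.2, 0.16
unconstrained = optimize_case1(PSI, CHI, E, G_OUT)
current_limited = optimize_case1(PSI, CHI, E, G_OUT, a_max=0.3)
max_gain = optimize_case1(PSI, CHI, E, G_OUT, t_max=0.0)
for name, (t_opt, a_opt, f_opt) in [("unconstrained", unconstrained),
                                    ("A_max-constrained", current_limited),
                                    ("T_max = 0", max_gain)]:
    print(f"{name:18s} T_opt={t_opt:.4f}  A_opt={a_opt:.4f}  NF={10 * math.log10(f_opt):.3f} dB")
```

A brute-force search over a grid of $\{A_0, T_0\}$ pairs with $A_0 \geq 2T_0$ is a simple way to cross-check the closed-form optimum for a given parameter set.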

Fig. 1.4 shows the RODs and the corresponding optimization results of several design examples. Each example emphasizes a different design constraint.

The first example is the unconstrained design, which means \( T_{\text{max}} = \infty \) and \( A_{\text{max}} = \infty \) (see Fig. 1.4(a)). The only objective is to make the noise factor as low as possible. The optimal design variables and the optimal noise factor are

\[
T_{\text{opt}}^{I(a)} = r^{I}, \tag{1.73}
\]

\[
A_{\text{opt}}^{I(a)} = 2r^{I}, \tag{1.74}
\]

\[
F_{\text{opt}}^{I(a)} = 1 + \frac{4R_G \omega_0^2 \kappa^2}{R_L \omega_T^2} + \frac{2 \omega_0 \kappa}{\omega_T} \sqrt{\psi \chi}. \tag{1.75}
\]

Figure 1.4: The RODs and optimization results of the simple cases \((Q_{L_g} = Q_s = Q_{L_d} = \infty, \ b = 0)\) with the assumptions (a) \(A_{\text{max}} = \infty, T_{\text{max}} = \infty\), (b) \(A_{\text{max}} = \infty, T_{\text{max}} = 0\), (c) \(A_{\text{max}} > 2r^{I}, T_{\text{max}} = \infty\), (d) \(A_{\text{max}} < 2r^{I}, T_{\text{max}} = \infty\). In each case, two sub-plots are generated, corresponding to the conditions \(\frac{\omega_T}{2\chi\omega_0} > r^{I}\) and \(\frac{\omega_T}{2\chi\omega_0} < r^{I}\) respectively. The heavy solid line is the trace of the conditional optimum \(\tilde{A}_0^{I}\) for each \(T_0\). The marker denotes the optimal design \(\{A_{\text{opt}}^{I}, T_{\text{opt}}^{I}\}\) in an ROD.

The optimal noise factor in an unconstrained design is contributed by two kinds of sources. The second term in (1.75) is the contribution from the output matching network, and the third term represents the contribution from the MOSFET.

The effect of $\kappa$ can be studied separately. The third term in $F_{\text{opt}}^{I(a)}$ is at best proportional to $\frac{1}{\kappa}$, in the extreme condition of $c_i = 0$ and $\gamma = 0$. However, the second term is proportional to $\kappa^2$ and dominates quickly as $\kappa$ increases, especially in high-frequency designs. In other words, $\kappa$ has a minor effect on the noise coming from the MOSFET but can seriously increase the noise from the output matching network because of the power gain degradation. Additionally, adding an external capacitor $C_e$ requires extra layout effort. Therefore, a rule of thumb is to make $\kappa = 1$ in the unconstrained designs. Furthermore, the dependence of $F_{\text{opt}}^{I(a)}$ on $\rho$ should not be evident if the high-bias condition is satisfied. On the other hand, if $\rho$ drops, $\alpha$ will approach the long-channel limit $\frac{1}{m}$ but $\omega_T$ will keep decreasing, which results in an increase of $F_{\text{opt}}^{I(a)}$.

10 GHz LNA designs using devices of different channel lengths are given to demonstrate our analysis. Fig. 1.5(a) shows the negative effect of $\kappa$ on the optimal noise figures. Fig. 1.5(b) and Fig. 1.5(c) show the effect of the bias voltage on the optimal noise figures and the drain currents respectively. Beyond a certain range, increasing the bias voltage gives very limited improvement to the optimal noise figure, especially for short-channel devices, but it carries a large penalty of much higher drain current.

To see how $F_{\text{min}}$ is different from $F$ at the optimal design, we calculate

\[
\Re\{UE^*\} - \frac{|E|^2}{2R_G}\Big|_{\{\sigma^I, A^I_{\text{opt}}, T^I_{\text{opt}}\}} = 0, \quad (1.76)
\]

\[
\Im\{UE^*\}\Big|_{\{\sigma^I, A^I_{\text{opt}}, T^I_{\text{opt}}\}} = 4k_B\omega_T\frac{\omega_T\kappa\xi}{\omega_T}. \quad (1.77)
\]

Therefore, the only way to make $F$ and $F_{\text{min}}$ equal is to select an appropriate $\kappa$ so that $\xi$ vanishes. The appropriate value $\kappa = \frac{\alpha}{c_i} \sqrt{\frac{\delta}{5\gamma}}$ can be derived from (1.55), and it is achievable only if it is greater than 1. In general, therefore, the optimal noise factor does not equal the corresponding $F_{\text{min}}$.
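The condition $\xi = 0$ is easy to check numerically. The sketch below evaluates $\psi$, $\chi$ and $\xi$ with $\kappa$ the capacitance ratio of (1.22) and computes the value of $\kappa$ that makes $\xi$ vanish. The noise constants used are the classical long-channel textbook values ($\gamma = 2/3$, $\delta = 4/3$, $c_i = 0.395$) together with $\alpha = 1$; they are assumptions for illustration only and are not the short-channel values of the process in Section 1.6.

```python
import math

def noise_constants(alpha, gamma, delta, c_i, kappa):
    """Return (psi, chi, xi) as defined in (1.53)-(1.55)."""
    psi = alpha * delta / (5.0 * kappa**2)
    cross = c_i * math.sqrt(gamma * delta / (5.0 * kappa**2))
    chi = psi + gamma / alpha - 2.0 * cross
    xi = psi - cross
    return psi, chi, xi

def kappa_for_zero_xi(alpha, gamma, delta, c_i):
    """Capacitance ratio at which xi vanishes, obtained by setting (1.55) to zero."""
    return alpha / c_i * math.sqrt(delta / (5.0 * gamma))

# Assumed long-channel values, for illustration only.
ALPHA, GAMMA, DELTA, C_I = 1.0, 2.0 / 3.0, 4.0 / 3.0, 0.395
kappa_star = kappa_for_zero_xi(ALPHA, GAMMA, DELTA, C_I)
psi, chi, xi = noise_constants(ALPHA, GAMMA, DELTA, C_I, kappa_star)
print(f"kappa* = {kappa_star:.4f}, xi(kappa*) = {xi:.2e}, achievable: {kappa_star > 1.0}")
```

With these assumed constants the required ratio is above 1, so an external capacitor could in principle equalize $F$ and $F_{\text{min}}$, at the cost of the gain degradation discussed above.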

Figure 1.5: An unconstrained 10 GHz CS LNA design that uses ideal inductors ($Q_{L_g} = Q_s = Q_{L_d} = \infty$) and ignores $g_{mb}$ ($b = 0$). (a) The optimal noise figure varies with the capacitance ratio $\kappa$ ($V_{GS} - V_{TH} = 0.6$ V). (b) The optimal noise figure varies with the bias voltage $V_{GS}$ ($\kappa = 1$). (c) The drain current at the optimal design varies with the bias voltage $V_{GS}$ ($\kappa = 1$). The channel lengths are $L = 0.18$ $\mu$m, $L = 0.24$ $\mu$m and $L = 0.5$ $\mu$m respectively.

In the second example, the noise figure is optimized under the constraint that the power gain should be maximized. Hence, there is no degeneration inductor. This is a special case of the $T_{\text{max}}$-constrained design with $T_{\text{max}} = 0$ and $A_{\text{max}} = \infty$. The ROD degenerates to a line (see Fig. 1.4(b)). The optimal design variables and the optimal noise factor are

\[ T_{\text{opt}}^{I(b)} = 0, \tag{1.78} \]

\[ A_{\text{opt}}^{I(b)} = r^{I}, \tag{1.79} \]

\[ F_{\text{opt}}^{I(b)} = 2 + \frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2} + \frac{4\omega_0\kappa}{\omega_T}\sqrt{\psi\chi}. \tag{1.80} \]

Comparing \( F_{\text{opt}}^{I(b)} \) and \( A_{\text{opt}}^{I(b)} \) with \( F_{\text{opt}}^{I(a)} \) and \( A_{\text{opt}}^{I(a)} \), one can conclude that inductive degeneration can reduce the noise factor by half, if the noise from the output matching network is ignored, at the price of doubling the current consumption. This is due to the well-known fact that the degeneration inductor contributes to the real part of the input impedance without adding noise. But as the operating frequency increases, the noise contribution from the output matching network gets larger and the benefit of inductive degeneration becomes smaller. The selection criteria of \( \kappa \) and \( \rho \) are the same as those in the unconstrained example because \( F_{\text{opt}}^{I(b)} \) and \( F_{\text{opt}}^{I(a)} \) have similar forms.

In the third example, only the \( I_D \)-constraint, \( I_{D,\text{max}} \), is applied. So \( A_{\text{max}} = \frac{\kappa I_{D,\text{max}}}{I_0} \frac{(1+m\frac{L}{L_o})\rho+1}{\rho^2} \) and \( T_{\text{max}} = \infty \). Depending on the relation between \( q^{I} \) and \( r^{I} \), there are two possibilities for the optimal solution (see Fig. 1.4(c) and (d)). The design is unconstrained if \( I_{D,\text{max}} \) is very large. It becomes \( A_{\text{max}} \)-constrained when \( q^{I} < r^{I} \).

\[ T_{\text{opt}}^{I(c)} \bigg|_{q^{I} < r^{I}} = q^{I}, \tag{1.81} \]

\[ A_{\text{opt}}^{I(c)} \bigg|_{q^{I} < r^{I}} = 2q^{I}, \tag{1.82} \]

\[ F_{\text{opt}}^{I(c)} \bigg|_{q^{I} < r^{I}} = 1 + \frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2} + \frac{2\chi\omega_0\kappa}{\omega_T}\left(\frac{A_{\text{max}}}{4} + \frac{\psi}{\chi A_{\text{max}}}\right). \tag{1.83} \]

For stricter current constraints, the noise contribution from the MOSFET increases since the transistor size moves further away from the optimal value. For very low \( I_{D,\text{max}} \), we can obtain an approximate formula for \( F_{\text{opt}}^{I(c)} \) by retaining only the last term in (1.83).

\[ F_{\text{opt}}^{I(c)} \bigg|_{q^{I} \ll r^{I}} \approx \frac{2\omega_0I_0\psi\rho^2}{\omega_T I_{D,\text{max}} \left[\left(1+m\frac{L}{L_o}\right)\rho+1\right]}. \tag{1.84} \]

Now the noise is dominated by the induced gate noise. Moreover, under the high-bias condition \( \rho \gg 1 \), \( F_{\text{opt}}^{I(c)} \bigg|_{q^{I} \ll r^{I}} \propto \frac{\rho}{\kappa^2 I_{D,\text{max}}} \). Then we see the benefit of using an external gate-source capacitor: doubling the capacitance ratio \( \kappa \) can improve the optimal noise figure by 6 dB. Moreover, lowering the bias voltage can also improve the optimal noise figure. These approaches can be considered effective until \( q^{I} = \frac{A_{\text{max}}}{2} \) reaches \( r^{I} \) which is not affected by \( \kappa \) and \( \rho \). The analysis is verified by the plots in Fig. 1.6.
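The scaling argument can be reproduced with a short calculation. The sketch below evaluates only the dominant induced-gate-noise term of (1.83), namely $\frac{2\omega_0\kappa}{\omega_T}\frac{\psi}{A_{\text{max}}}$, with $A_{\text{max}}$ taken from (1.38). It assumes that $\omega_T$ and $\alpha$ are held fixed while $\kappa$ and $\rho$ change, which is reasonable only under the high-bias condition. The function name and all numerical values are hypothetical.

```python
import math

def gate_noise_term(kappa, rho, i_d_max, i_0, m, l_over_lo, alpha, delta, w0_over_wt):
    """Dominant term of F_opt for a tight current budget (q << r)."""
    psi = alpha * delta / (5.0 * kappa**2)                                    # (1.53)
    a_max = kappa * i_d_max / i_0 * ((1.0 + m * l_over_lo) * rho + 1.0) / rho**2  # (1.38)
    return 2.0 * (w0_over_wt * kappa) * psi / a_max

# Hypothetical bias and process numbers.
COMMON = dict(i_d_max=2e-3, i_0=1e-3, m=1.2, l_over_lo=1.0,
              alpha=0.8, delta=4.0 / 3.0, w0_over_wt=1.0 / 6.0)
term_k1 = gate_noise_term(kappa=1.0, rho=3.0, **COMMON)
term_k2 = gate_noise_term(kappa=2.0, rho=3.0, **COMMON)
improvement_db = 10.0 * math.log10(term_k1 / term_k2)
term_low_bias = gate_noise_term(kappa=1.0, rho=2.0, **COMMON)
print(f"doubling kappa lowers the gate-noise term by {improvement_db:.2f} dB")
print(f"rho 3 -> 2 changes the term from {term_k1:.4f} to {term_low_bias:.4f}")
```

The factor of four comes from $\psi \propto \kappa^{-2}$ combined with the cancellation of $\kappa$ between $\frac{\omega_0\kappa}{\omega_T}$ and $A_{\text{max}}$.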

\[ \sigma^\text{II} \triangleq \{ Q_{L_d} = \infty, \ b = 0 \} \]

In this part, we make a more complicated assumption \( \sigma^\text{II} \) that \( Q_C, Q_{L_g} \) and \( Q_s \) are finite while we still keep \( Q_{L_d} = \infty \) and \( b = 0 \). The noise factor under assumption \( \sigma^\text{II} \) becomes

\[
F^{II} = 2 + \frac{4 R_G \omega_0^2 \kappa^2}{R_L \omega_T^2} - \frac{4 \chi \omega_0 \kappa}{\left(1 + \frac{1}{Q_s^2}\right) \omega_T} T_0 + \frac{2 \chi \omega_0 \kappa}{\omega_T} \left( A_0 + \frac{N^{II}}{A_0} \right) \tag{1.85}
\]

where

\[
N^{II} = \frac{T_0^2 - \left( \frac{\omega_T}{\chi \omega_0 \kappa} + \epsilon_1 \right) T_0 + \left( 1 + \frac{1}{Q_s^2} \right) \frac{\psi}{\chi}}{1 + \frac{1}{Q_s^2}}, \tag{1.86}
\]

\[
\epsilon_1 = \frac{1 - 2 \xi}{Q_s \chi}. \tag{1.87}
\]

The optimization constraint \( M \) under this assumption is

\[
M^{II} = 2 (1 + \epsilon_2) \left( T_0 + \epsilon_3 \right) \tag{1.88}
\]

with

\[
\epsilon_2 = \frac{\omega_0 \kappa}{Q_s \omega_T} - \frac{\omega_0 \kappa}{Q_{L_g} \omega_T} + \frac{1}{Q_{L_g} Q_s} - \frac{1}{Q_s^2}, \tag{1.89}
\]

\[
\epsilon_3 = \frac{1}{1 + \epsilon_2} \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right). \tag{1.90}
\]

Comparing the expressions of \( F^{II}, N^{II} \) and \( M^{II} \) with those of \( F^{I}, N^{I} \) and \( M^{I} \), we find that they are polynomials of \( A_0 \) and \( T_0 \) with the same order and only a few coefficients are changed. Therefore, we can use the same approach to find the optimal design.

The conditional optimum \( \hat{A}_0^{II} \) for a fixed \( T_0 \) satisfies

\[
\hat{A}_0^{II} = \begin{cases}
M^{II} & \text{if } T_0 \in \theta_1^{II} \\
\sqrt{N^{II}} & \text{if } T_0 \in \theta_2^{II} \\
A_{\text{max}} & \text{if } T_0 \in \theta_3^{II}.
\end{cases} \tag{1.91}
\]

Figure 1.6: An \( I_D \)-constrained 10 GHz CS LNA design that uses ideal inductors (\( Q_{L_g} = Q_s = Q_{L_d} = \infty \)) and ignores \( g_{mb} \) (\( b = 0 \)). The optimal noise figure varies with \( I_{D,\text{max}} \) for (a) different \( \kappa \) and (b) different \( V_{GS} \). The device channel length is \( L = 0.18 \, \mu \text{m} \).

The definitions of \( \theta_1^{II} \), \( \theta_2^{II} \) and \( \theta_3^{II} \) have the same form as (1.62), (1.63) and (1.64) with all the superscripts “I” replaced by “II”. In the definitions, \( p^{II} \) satisfies \( (M^{II})^2 = N^{II} \) and \( q^{II} \) satisfies \( M^{II} = A_{\text{max}} \). Their expressions are

\[
\begin{aligned}
p^{II} = \frac{1}{8 (1 + \epsilon_2)^2 - \frac{2}{1 + \frac{1}{Q_s^2}}} \Bigg[ & \Bigg( \left( \frac{\frac{\omega_T}{\chi \omega_0 \kappa} + \epsilon_1}{1 + \frac{1}{Q_s^2}} + 8 \epsilon_3 (1 + \epsilon_2)^2 \right)^2 \\
& + 4 \left( 4(1 + \epsilon_2)^2 - \frac{1}{1 + \frac{1}{Q_s^2}} \right) \left( \frac{\psi}{\chi} - 4 \epsilon_3^2 (1 + \epsilon_2)^2 \right) \Bigg)^{\frac{1}{2}} \\
& - \left( \frac{\frac{\omega_T}{\chi \omega_0 \kappa} + \epsilon_1}{1 + \frac{1}{Q_s^2}} + 8 \epsilon_3 (1 + \epsilon_2)^2 \right) \Bigg],
\end{aligned}
\tag{1.92}
\]

\[
q^{II} = \frac{A_{\text{max}}}{2(1 + \epsilon_2)} - \epsilon_3. \tag{1.93}
\]

We calculate \( \frac{\partial \hat{F}^{II}}{\partial T_0} \) piecewise. For \( T_0 \in \theta_2^{II} \) and \( T_0 \in \theta_3^{II} \),

\[
\frac{\partial \hat{F}^{II}}{\partial T_0} \bigg|_{T_0 \in \theta_2^{II}} = -\frac{4 \chi \omega_0 \kappa}{\left(1 + \frac{1}{Q_s^2}\right) \omega_T} + \frac{2 \chi \omega_0 \kappa}{\omega_T \sqrt{N^{II}}} \frac{\partial N^{II}}{\partial T_0}, \tag{1.94}
\]

\[
\frac{\partial \hat{F}^{II}}{\partial T_0} \bigg|_{T_0 \in \theta_3^{II}} = -\frac{4 \chi \omega_0 \kappa}{\left(1 + \frac{1}{Q_s^2}\right) \omega_T} + \frac{2 \chi \omega_0 \kappa}{\omega_T A_{\text{max}}} \frac{\partial N^{II}}{\partial T_0}. \tag{1.95}
\]

When \( Q_C \), \( Q_{L_g} \) and \( Q_s \) are sufficiently high, (1.94) and (1.95) are always negative according to Theorem 1 in Section 1.7. Therefore, \( \hat{F}^{II} \) is monotonically decreasing in \( \theta_2^{II} \cup \theta_3^{II} \). For \( T_0 \in \theta_1^{II} \),

\[
\frac{\partial \hat{F}^{II}}{\partial T_0} \bigg|_{T_0 \in \theta_1^{II}} = \frac{\chi \omega_0 \kappa}{\left(1 + \frac{1}{Q_s^2}\right) \omega_T (1 + \epsilon_2)(T_0 + \epsilon_3)^2} \left[ (1 + \epsilon_4)(T_0 + \epsilon_3)^2 - \left( 1 + \frac{1}{Q_s^2} \right) \frac{\psi}{\chi} - \epsilon_3 \left( \frac{\omega_T}{\chi \omega_0 \kappa} + \epsilon_3 + \epsilon_1 \right) \right] \tag{1.96}
\]

where
\begin{equation}
\epsilon_4 = 4(1 + \epsilon_2) \left[ \left( 1 + \frac{1}{Q_s^2} \right) (1 + \epsilon_2) - 1 \right]. \tag{1.97}
\end{equation}

With the high-\(Q\) assumption, there exists a unique positive solution \(r^{II}\) that makes (1.96) zero.
\begin{equation}
r^{II} = \sqrt{\frac{\left(1 + \frac{1}{Q_s^2}\right) \frac{\psi}{\chi} + \epsilon_3 \left( \frac{\omega_T}{\chi \omega_0 \kappa} + \epsilon_3 + \epsilon_1 \right)}{1 + \epsilon_4}} - \epsilon_3. \tag{1.98}
\end{equation}

\(r^{II} > p^{II}\) according to Theorem 2 in Section 1.7 and \(r^{II} \in \theta_1^{II}\) if and only if \(r^{II} \leq \min\{q^{II}, T_{\text{max}}\}\). The global optimum \(T_{\text{opt}}^{II}\) satisfies
\begin{equation}
T_{\text{opt}}^{II} = \min\{r^{II}, q^{II}, T_{\text{max}}\}. \tag{1.99}
\end{equation}

The statements that we make on the critical constraints in the simple case can also be applied here. And the conclusion that the optimal design is located on the curve \(A_0 = M^{II}\) only if the design is unconstrained or \(A_{\text{max}}\)-constrained is still true.

We will revisit the design examples which have been studied in the simple case and examine the effects of the finite quality factors on the optimization results. The plots in Fig. 1.4 can still be used to depict the RODs and optimum searching. The curves and points in the plots are simply replaced by their counterparts with superscript “II”. The changes are minor because the orders of the equations remain the same.

For the unconstrained example, the optimal design satisfies
\begin{align}
T_{\text{opt}}^{II(a)} &= r^{II}, \tag{1.100} \\
A_{\text{opt}}^{II(a)} &= 2(1 + \epsilon_2) \left( r^{II} + \epsilon_3 \right), \tag{1.101} \\
F_{\text{opt}}^{II(a)} &= 1 + \frac{4R_G \omega_0^2 \kappa^2}{R_L \omega_T^2} + \left\{ 1 - \frac{1 + \frac{\chi \omega_0 \kappa}{\omega_T} \epsilon_1}{(1 + \epsilon_2)\left(1 + \frac{1}{Q_s^2}\right)} + \frac{\frac{\chi \omega_0 \kappa}{\omega_T}}{1 + \frac{1}{Q_s^2}} \cdot \frac{2 + 4 \epsilon_2}{1 + \epsilon_2} \, \epsilon_3 \right\} \nonumber \\
&\quad + \frac{2 \chi \omega_0 \kappa}{\omega_T} \cdot \frac{(1 + \epsilon_4) \left( r^{II} + \epsilon_3 \right)}{(1 + \epsilon_2)\left(1 + \frac{1}{Q_s^2}\right)}. \tag{1.102}
\end{align}

If all the quality factors become infinite, \(\epsilon_1\) through \(\epsilon_4\) vanish and (1.100) - (1.102) degenerate to (1.73) - (1.75). The effects of the finite quality factors are shown in Fig. 1.7. According to the plot, the noise degradation caused by \(Q_C\) and \(Q_{L_g}\) is not negligible even under the high-\(Q\) condition (\(Q > 10\)). This can be explained mathematically as follows. All the terms associated with \(Q_s\) and most of the terms associated with $Q_C$ and $Q_{L_g}$ are small enough to be neglected in (1.100) - (1.102), except $\frac{\omega_T}{\omega_0 \kappa} (\frac{1}{Q_C} + \frac{1}{Q_{L_g}})$, which is included in the expression of $r^{\text{II}}$. Because $\omega_0 \kappa < \omega_T$, the effects of $Q_C$ and $Q_{L_g}$ are magnified. An equivalent physical explanation can also be provided by looking at the NTFs in Table 1.1. The $H_{n1}$ of $i_{ds}$ and $i_s$ both include a less-than-1 factor $\frac{\omega_0 \kappa}{\omega_T}$ while that of $v_{R_g}$ does not. Therefore, the noise contribution from the gate resistance can be comparable to the channel thermal noise even though it has lower source power. This analysis shows the importance of $Q_C$ and $Q_{L_g}$ in the optimization process. Completely ignoring them can introduce a dramatic deviation from the real optimal design. However, $Q_s$, resulting from the degeneration inductor resistance, the source contact and the substrate resistance, is of less importance. Eq. (1.100) - (1.102) can be simplified as

$$T_{\text{opt}}^{\text{II}(a)} = \sqrt{\frac{\psi + \frac{\omega_T}{\omega_0 \kappa} \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right)}{\chi}}, \tag{1.103}$$

$$A_{\text{opt}}^{\text{II}(a)} = 2 \sqrt{\frac{\psi + \frac{\omega_T}{\omega_0 \kappa} \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right)}{\chi}}, \tag{1.104}$$

$$F_{\text{opt}}^{\text{II}(a)} = 1 + \frac{4 R_G \omega_0^2 \kappa^2}{R_L \omega_T^2} + \frac{2 \omega_0 \kappa}{\omega_T} \sqrt{\chi \left[ \psi + \frac{\omega_T}{\omega_0 \kappa} \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right) \right]}. \tag{1.105}$$

The validity of this simplification is also verified in Fig. 1.7. Due to the similarity of the formulas, $\rho$ and $\kappa$ can be selected according to the same strategy used in the simple case.
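To make the magnification by \(\omega_T/(\omega_0\kappa)\) concrete, the following Python sketch evaluates the simplified optimum, read here as \(T_{\text{opt}} = \sqrt{X/\chi}\) and \(A_{\text{opt}} = 2T_{\text{opt}}\) with \(X = \psi + \frac{\omega_T}{\omega_0\kappa}(\frac{1}{Q_C} + \frac{1}{Q_{L_g}})\). The function name and all numerical values (\(\psi = 0.2\), \(\chi = 1\), \(\omega_T/\omega_0 = 5\)) are hypothetical placeholders, not data from the design examples.

```python
import math


def simplified_unconstrained_optimum(psi, chi, w0, w_t, kappa, q_c, q_lg):
    """Evaluate the simplified T_opt and A_opt of the unconstrained case.

    x is the bracketed quantity psi + (w_t/(w0*kappa)) * (1/Q_C + 1/Q_Lg).
    """
    x = psi + (w_t / (w0 * kappa)) * (1.0 / q_c + 1.0 / q_lg)
    t_opt = math.sqrt(x / chi)
    a_opt = 2.0 * t_opt
    return t_opt, a_opt


# Placeholder numbers: only the ratio w_t/w0 matters here.
ideal = simplified_unconstrained_optimum(0.2, 1.0, 1.0, 5.0, 1.0, math.inf, math.inf)
lossy = simplified_unconstrained_optimum(0.2, 1.0, 1.0, 5.0, 1.0, 20.0, 20.0)
print("infinite Q:", ideal)
print("Q_C = Q_Lg = 20:", lossy)
```

With these placeholder values, the quality-factor term adds 0.5 to a \(\psi\) of 0.2 even though both quality factors are 20, which illustrates why \(Q_C\) and \(Q_{L_g}\) cannot be ignored.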

For the example wherein the power gain needs to be maximized, there is no degeneration inductor and the optimal design satisfies

$$T_{\text{opt}}^{\text{II}(b)} = 0, \tag{1.106}$$

$$A_{\text{opt}}^{\text{II}(b)} = \sqrt{\frac{\psi}{\chi}} = A_{\text{opt}}^{\text{I}(b)}, \tag{1.107}$$

$$F_{\text{opt}}^{\text{II}(b)} = 2 + \frac{4 R_G \omega_0^2 \kappa^2}{R_L \omega_T^2} + \frac{4 \omega_0 \kappa}{\omega_T} \sqrt{\psi \chi}. \tag{1.108}$$

It is not surprising to see that including the finite quality factors makes no difference from the simple case because a series resistor at the gate node is required for impedance matching anyway. The difference between $F_{\text{opt}}^{\text{II}(b)}$ and $F_{\text{opt}}^{\text{II}(a)}$ is smaller than that between $F_{\text{opt}}^{\text{I}(b)}$ and $F_{\text{opt}}^{\text{I}(a)}$, which means the benefit of using a degeneration inductor is further reduced when finite quality factors are considered.

Figure 1.7: The optimal noise figure varies with different combinations of $Q_C$, $Q_{L_g}$, and $Q_s$ in an unconstrained 10 GHz LNA design. The results are also compared to the approximate value calculated from (1.105). The capacitance ratio is fixed at $\kappa = 1$. We assume an ideal inductor $L_d$ and ignore $g_{mb}$. The bias voltage is set to $V_{GS} - V_{TH} = 0.6$ V. The device channel length is $L = 0.18 \ \mu m$.

In the $I_D$-constrained example, if $q^{\text{II}} \geq r^{\text{II}}$, the optimization results will be exactly the same as those of the unconstrained situation in (1.100) - (1.102). If \( q^{\text{II}} < r^{\text{II}} \), the optimization results are

\[
\begin{aligned}
T_{opt}^{\text{II}(c)} \bigg|_{q^{\text{II}} < r^{\text{II}}} &= q^{\text{II}} && (1.109) \\
&\approx \frac{A_{max}}{2}, && (1.110) \\
A_{opt}^{\text{II}(c)} \bigg|_{q^{\text{II}} < r^{\text{II}}} &= 2(1 + \epsilon(2))(q^{\text{II}} + \epsilon(3)) = A_{max}, && (1.111)
\end{aligned}
\]

\[
\begin{aligned}
F_{opt}^{\text{II}(c)} \bigg|_{q^{\text{II}} < r^{\text{II}}} ={}& 2 + \frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2} + \frac{\chi\omega_0\kappa}{\omega_T} \left[ (2 + 4\epsilon(2))\epsilon(3) - \epsilon(1) \right] - 1 \\
&+ \frac{A_{max}}{4} \times \left[ 1 + \frac{4\epsilon(2) + 3}{4(1 + \epsilon(2))^2(1 + \frac{1}{Q_s^2})} \right] \\
&+ \frac{2\chi\omega_0\kappa}{(1 + \frac{1}{Q_s^2})\omega_T A_{max}} \left[ \epsilon(3)\epsilon(1) + \epsilon(2)^2 \right] \\
&+ \frac{\omega_T}{\chi\omega_0\kappa} \epsilon(3) + \left( 1 + \frac{1}{Q_s^2} \right) \frac{\psi}{\chi} \quad (1.112)
\end{aligned}
\]

\[
{} \approx 1 + \frac{4R_G\omega_0^2\kappa^2}{R_L\omega_T^2} + \frac{2\chi\omega_0\kappa}{\omega_T} \left[ \frac{A_{max}}{4} + \frac{\psi + \frac{\omega_T}{\omega_0\kappa} \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right)}{\chi A_{max}} \right]. \quad (1.113)
\]

(1.110) and (1.113) approximate the original formulas by making the high-Q assumption. For very low \( I_{D,max} \), the last term in (1.113) dominates.

\[
\begin{aligned}
F_{opt}^{\text{II}(c)} \bigg|_{q^{\text{II}} < r^{\text{II}}} \approx{}& \frac{2\omega_0I_0\psi\rho^2}{\omega_T I_{D,max}} \left[ (1 + m\frac{1}{L_C})\rho + 1 \right] \\
&+ \frac{2I_0 \left( \frac{1}{Q_C} + \frac{1}{Q_{L_g}} \right) \rho^2}{\kappa I_{D,max}} \left[ (1 + m\frac{1}{L_C})\rho + 1 \right]. \quad (1.114)
\end{aligned}
\]

Comparing (1.114) with (1.84), the finite quality factors not only degrade the noise factor but also change the relation between \( F_{opt}^{\text{II}(c)} \bigg|_{q^{\text{II}} < r^{\text{II}}} \) and \( \kappa \) because the degradation term is inversely proportional to \( \kappa \). In the extreme case of very low \( Q_C \) or \( Q_{L_g} \), doubling \( \kappa \) can only improve the optimal noise figure by 3 dB instead of 6 dB. A design example with \( \kappa = 1 \) and 2 at \( I_{D,\text{max}} = 100 \ \mu A \) is plotted in Fig. 1.8 to verify this trend.

Figure 1.8: The optimal noise figure varies with finite quality factors \( Q_C = Q_L = Q_s = Q \) in a 10 GHz \( I_D \)-constrained design \( (I_{D,\text{max}} = 100 \ \mu A) \). The capacitance ratio \( \kappa \) is set to 1 and 2. We assume an ideal inductor \( L_d \) and ignore \( g_{mb} \). The bias voltage is set to \( V_{GS} - V_{TH} = 0.6 \ \text{V} \). The device channel length is \( L = 0.18 \ \mu \text{m} \).
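The 3 dB and 6 dB figures follow from how each term of (1.114) scales with \( \kappa \). The short Python sketch below reproduces the arithmetic, assuming that the noise factor is dominated by a single term, that \( \psi \propto 1/\kappa^2 \) as in Table 1.3, and that the remaining factors do not depend on \( \kappa \). The helper name `improvement_db` is hypothetical.

```python
import math


def improvement_db(kappa_ratio, exponent):
    """dB reduction of a term proportional to kappa**(-exponent)
    when kappa is multiplied by kappa_ratio."""
    return 10.0 * math.log10(kappa_ratio ** exponent)


# Term with psi (assumed ~ 1/kappa^2): dominant when the quality factors are high.
print(round(improvement_db(2, 2), 2), "dB")
# Term with 1/Q_C + 1/Q_Lg (~ 1/kappa): dominant when the quality factors are very low.
print(round(improvement_db(2, 1), 2), "dB")
```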

From the above analysis, the major degradation in the optimal noise factor is caused by \( \epsilon(3) \), and \( \epsilon(3) \) is only related to the curve \( A_0 = M^{\text{II}} \). In other words, the optimal noise factor is susceptible to the finite quality factors only if the optimal design lies on the curve \( A_0 = M^{\text{II}} \), as is the case for \( F_{\text{opt}}^{\text{II(a)}} \) and \( F_{\text{opt}}^{\text{II(c)}} \). In a \( T_{\text{max}} \)-constrained case where the optimal design does not lie on the curve, as for \( F_{\text{opt}}^{\text{II(b)}} \), the degradation is negligible.

\( Q_{L_d} \) has not been included in the above discussions. In fact, a \( Q_{L_d} \)-constraint problem can be converted to an equivalent \( I_D \)-constraint problem by defining an equivalent \( I_{D,\text{max}} \). From (1.35) and (1.38), the equivalent current constraint is

\[
I_{D,\text{max}(Q_{L_d})} = \frac{2Q_{L_d}R_GI_0\rho^2}{yR_L[(1 + m\frac{L_d}{L_c})\rho + 1]}.
\]  

The only difference between a \( Q_{L_d} \)-constraint problem and an \( I_D \)-constraint problem is that lowering the bias \( \rho \) does not help the noise factor.

\( \sigma^{\text{III}} \triangleq \{ Q_{L_d} = \infty, \; b \neq 0 \} \)

Based on the analysis in the previous part, we further include the back-gate transconductance into the optimization process. We cannot follow the same routine proposed in the previous cases since even a small \( b \) can make a fundamental change. For example, \( M \) becomes a second order function of \( T_0 \). We will apply the linear perturbation method to obtain an approximate expression of the noise factor perturbation from \( F_{\text{opt}}^{\text{II}} \).

The optimal design variables in this complete case are denoted by \( T_{\text{opt}} \) and \( A_{\text{opt}} \). They are functions of \( b \), and they equal \( T_{\text{opt}}^{\text{II}} \) and \( A_{\text{opt}}^{\text{II}} \) respectively when \( b = 0 \). Moreover, we assume the value of \( b \) is so small that the critical constraint does not change.

The global optimal noise factor \( F_{\text{opt}} \) is also a function of \( b \) and it equals \( F_{\text{opt}}^{\text{II}} \) when \( b = 0 \). So \( F_{\text{opt}} \) can be calculated approximately from \( F_{\text{opt}}^{\text{II}} \) by adding a linear perturbation term.

\[
F_{\text{opt}} \approx F_{\text{opt}}^{\text{II}} + \left. \frac{dF_{\text{opt}}}{db} \right|_{b=0} \cdot b. \tag{1.116}
\]

\( \frac{dF_{\text{opt}}}{db} \bigg|_{b=0} \) can be expanded according to the chain rule:

\[
\left. \frac{dF_{\text{opt}}}{db} \right|_{b=0} = \left( \frac{\partial F}{\partial b} + \frac{\partial F}{\partial A_0} \frac{\partial \hat{A}}{\partial b} \right)_{\text{O}^{\text{II}}} + \left( \frac{\partial F}{\partial A_0} \frac{\partial \hat{A}}{\partial T_0} + \frac{\partial F}{\partial T_0} \right)_{\text{O}^{\text{II}}} \left. \frac{dT_{\text{opt}}}{db} \right|_{b=0}. \tag{1.117}
\]

\( \hat{A} \) is conditional optimum for a given \( T_0 \) so \( A_{\text{opt}} = \hat{A}|_{T_0=T_{\text{opt}}} \). And \( O^{\text{II}} \) is a condition defined as

\[
O^{\text{II}} \triangleq \{ A_0 = A_{\text{opt}}^{\text{II}}, \; T_0 = T_{\text{opt}}^{\text{II}}, \; b = 0 \}. \tag{1.118}
\]

(1.117) should be studied separately for different cases of critical constraints.

If the unperturbed optimization results are unconstrained, \( \hat{A} = M \) in the vicinity of the optimum and \( T_{\text{opt}} \) satisfies

\[
\left. \frac{\partial \hat{F}}{\partial T_0} \right|_{T_0=T_{\text{opt}}} = 0. \tag{1.119}
\]

\( \hat{F} \) is the conditional optimal noise factor for a given \( T_0 \). The left-hand side of (1.119) can be expanded as a power series of $b$:

$$
\frac{\partial \hat{F}}{\partial T_0} \bigg|_{T_0=T_{\text{opt}}} = \frac{\partial \hat{F}}{\partial T_0} \bigg|_{O^{\text{II}}} + o(1). \tag{1.120}
$$

Comparing the coefficients of (1.119) and (1.120) implies that

$$
\frac{\partial \hat{F}}{\partial T_0} \bigg|_{O^{\text{II}}} = \left( \frac{\partial F}{\partial A_0} \frac{\partial \hat{A}}{\partial T_0} + \frac{\partial F}{\partial T_0} \right) \bigg|_{O^{\text{II}}} = 0, \tag{1.121}
$$

and (1.117) becomes

$$
\frac{dF_{\text{opt}}}{db} \bigg|_{b=0} = \left( \frac{\partial F}{\partial b} + \frac{\partial F}{\partial A_0} \frac{\partial M}{\partial b} \right) \bigg|_{O^{\text{II}}}. \tag{1.122}
$$
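As a sanity check of the argument for the unconstrained case, where \( \hat{A} = M \) and the \( T_0 \)-derivative term of (1.117) vanishes, the following Python sketch applies it to a toy problem. The functions `M`, `F` and the ternary search are hypothetical stand-ins chosen only for illustration; they are not the LNA expressions of this chapter.

```python
def M(T, b):
    # Toy conditional optimum A_hat = M(T, b); hypothetical, not the LNA M.
    return 2.0 * T + b


def F(A, T, b):
    # Toy objective standing in for the noise factor; hypothetical.
    return (A - 1.0) ** 2 + (T - 1.0) ** 2 + b * T


def F_hat(T, b):
    # Conditional optimum of F for a given T.
    return F(M(T, b), T, b)


def argmin_1d(f, lo, hi, iters=200):
    # Ternary search; valid here because the toy F_hat is convex in T.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def F_opt(b):
    T = argmin_1d(lambda t: F_hat(t, b), 0.0, 2.0)
    return F_hat(T, b)


# Unperturbed optimum (b = 0).
T0 = argmin_1d(lambda t: F_hat(t, 0.0), 0.0, 2.0)
A0 = M(T0, 0.0)

# First-order prediction: dF/db + dF/dA0 * dM/db, all evaluated at b = 0.
dF_db = T0
dF_dA = 2.0 * (A0 - 1.0)
dM_db = 1.0
predicted = dF_db + dF_dA * dM_db

# Central finite difference of the re-optimized objective.
h = 1e-4
numeric = (F_opt(h) - F_opt(-h)) / (2.0 * h)
print(predicted, numeric)
```

For this toy problem the two values agree, which reflects the fact that the shift of \( T_{\text{opt}} \) with \( b \) does not contribute to first order.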

If the unperturbed optimization results are only $T_{\text{max}}$-constrained, $\hat{A} = \sqrt{N}$ in the vicinity of the optimum and $T_{\text{opt}} \equiv T_{\text{max}}$. So (1.117) becomes

$$
\frac{dF_{\text{opt}}}{db} \bigg|_{b=0} = \left( \frac{\partial F}{\partial b} + \frac{\partial F}{\partial A_0} \frac{\partial \sqrt{N}}{\partial b} \right) \bigg|_{O^{\text{II}}}. \tag{1.123}
$$

If the unperturbed optimization results are $A_{\text{max}}$-constrained, $A_{\text{opt}} \equiv A_{\text{max}} = \hat{A} |_{T_0=T_{\text{opt}}} = M |_{T_0=T_{\text{opt}}}$. Therefore

$$
0 = \frac{dA_{\text{opt}}}{db} \bigg|_{b=0} = \frac{\partial M}{\partial b} \bigg|_{O^{\text{II}}} + \frac{\partial M}{\partial T_0} \bigg|_{O^{\text{II}}} \frac{dT_{\text{opt}}}{db} \bigg|_{b=0}, \tag{1.124}
$$

and, after solving (1.124) for $\frac{dT_{\text{opt}}}{db} \big|_{b=0}$, (1.117) becomes

$$
\frac{dF_{\text{opt}}}{db} \bigg|_{b=0} = \left( \frac{\partial F}{\partial b} - \frac{\partial F}{\partial T_0} \frac{\partial M / \partial b}{\partial M / \partial T_0} \right) \bigg|_{O^{\text{II}}}. \tag{1.126}
$$
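The \( A_{\text{max}} \)-constrained case can be checked in the same spirit. The Python sketch below uses a toy constraint \( M(T_0, b) = A_{\text{max}} \) and a toy objective with \( A_0 \) held at \( A_{\text{max}} \). It compares the prediction obtained by solving (1.124) for \( dT_{\text{opt}}/db \), namely \( \frac{\partial F}{\partial b} - \frac{\partial F}{\partial T_0} \frac{\partial M/\partial b}{\partial M/\partial T_0} \), against a finite difference. All functions and the value of `A_MAX` are hypothetical.

```python
import math

A_MAX = 1.0  # hypothetical constraint value


def M(T, b):
    # Toy constraint curve; second order in T once b != 0.
    return 2.0 * T + b * T * T


def F(T, b):
    # Toy objective with A0 held at A_MAX; hypothetical.
    return (T - 2.0) ** 2 + b * T


def T_opt(b):
    # Positive root of b*T^2 + 2*T - A_MAX = 0, written in a form
    # that stays well behaved as b -> 0.
    return A_MAX / (1.0 + math.sqrt(1.0 + b * A_MAX))


T0 = T_opt(0.0)

# Partial derivatives at the unperturbed optimum (b = 0).
dF_db = T0
dF_dT = 2.0 * (T0 - 2.0)
dM_db = T0 * T0
dM_dT = 2.0

predicted = dF_db - dF_dT * dM_db / dM_dT

# Central finite difference along the constraint M(T_opt(b), b) = A_MAX.
h = 1e-4
numeric = (F(T_opt(h), h) - F(T_opt(-h), -h)) / (2.0 * h)
print(predicted, numeric)
```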

In the special case that the unperturbed optimization results are both $T_{\text{max}}$-constrained and $A_{\text{max}}$-constrained, the critical constraint of the perturbed results cannot be decided immediately. In that case we calculate $\frac{dF_{\text{opt}}}{db} \bigg|_{b=0}$ according to both (1.123) and (1.126), and select the formula that gives the smaller increment. In summary, the formulas that should be used for different unperturbed optimization results are listed in Table 1.2.

Table 1.2: The Formula to Compute $\frac{dF_{\text{opt}}}{db}|_{b=0}$

<table>
<thead>
<tr>
<th>$A_{\text{opt}}^{\text{II}}$</th>
<th>$T_{\text{opt}}^{\text{II}}$</th>
<th>Critical Constraint</th>
<th>Formula</th>
</tr>
</thead>
<tbody>
<tr>
<td>$&lt; A_{\text{max}}$</td>
<td>$&lt; T_{\text{max}}$</td>
<td>unconstrained</td>
<td>(1.122)</td>
</tr>
<tr>
<td>$&lt; A_{\text{max}}$</td>
<td>$= T_{\text{max}}$</td>
<td>$T_{\text{max}}$-constrained</td>
<td>(1.123)</td>
</tr>
<tr>
<td>$= A_{\text{max}}$</td>
<td>$&lt; T_{\text{max}}$</td>
<td>$A_{\text{max}}$-constrained</td>
<td>(1.126)</td>
</tr>
<tr>
<td>$= A_{\text{max}}$</td>
<td>$= T_{\text{max}}$</td>
<td>undetermined</td>
<td>(1.123) or (1.126)</td>
</tr>
</tbody>
</table>
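The case selection of Table 1.2 can be written as a small lookup. In this Python sketch the function name and the tolerance used to decide whether a bound is active are hypothetical choices, not part of the analysis.

```python
import math


def perturbation_formula(a_opt, t_opt, a_max, t_max, rel_tol=1e-9):
    """Return (critical constraint, candidate formulas) per Table 1.2.

    a_opt and t_opt are the unperturbed (b = 0) optimization results.
    A bound counts as active when the optimum equals it within rel_tol,
    which is an assumption of this sketch.
    """
    at_a_max = math.isclose(a_opt, a_max, rel_tol=rel_tol)
    at_t_max = math.isclose(t_opt, t_max, rel_tol=rel_tol)
    if at_a_max and at_t_max:
        # Evaluate both and keep the one giving the smaller increment.
        return ("undetermined", ("1.123", "1.126"))
    if at_a_max:
        return ("A_max-constrained", ("1.126",))
    if at_t_max:
        return ("T_max-constrained", ("1.123",))
    return ("unconstrained", ("1.122",))


print(perturbation_formula(a_opt=1.2, t_opt=0.6, a_max=3.0, t_max=2.0))
print(perturbation_formula(a_opt=3.0, t_opt=2.0, a_max=3.0, t_max=2.0))
```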

Figure 1.9: The disturbance of $F_{\text{opt}}^{(a)}$ from $F_{\text{opt}}^{\text{II}(a)}$ in an unconstrained design. The design setups are exactly the same as those of the design shown in Fig. 1.7. The perturbation is computed using (1.122).

In Fig. 1.9 and Fig. 1.10, we calculate the optimal noise figure perturbation for an unconstrained design and a low $I_D$-constrained design respectively. In both cases, we observe that including $g_{\text{mb}}$ in the model changes the optimization results very little.

1.4 Conclusion

General optimization techniques have been proposed for the constrained optimization of CMOS CS LNAs. A realistic but simple short-channel model for the MOS device is used to perform the noise optimization. The optimization result is subject to one of three possible critical constraints: no constraint, $A_{\text{max}}$-constraint and $T_{\text{max}}$-constraint. The series resistance at the gate, captured by $Q_C$ and $Q_{L_g}$, has non-negligible effects on the unconstrained and the $A_{\text{max}}$-constrained designs. We also include the back-gate transconductance in the noise optimization process using a linear perturbation method.

Figure 1.10: The disturbance of $F_{\text{opt}}^{(c)}$ from $F_{\text{opt}}^{\text{II}(c)}$ in a low $I_D$-constrained design. The design setups are exactly the same as those of the design shown in Fig. 1.8. The perturbation is computed using (1.126).

1.5 Definitions of Variables

Here we list the definitions of the major variables in Table 1.3.

1.6 Physical Constants

The first part, Table 1.4, includes the process-independent constants. These physical constants are related to material properties or empirical constants for improved fitting accuracy.

The second part, Table 1.5, includes the data that describe a virtual CMOS process.

In the third part, Table 1.6, the values of the intrinsic noise parameters for different channel lengths are listed. The data are taken from [12] and [13].

Table 1.3: The Major Variables

<table>
<thead>
<tr>
<th>Variables</th>
<th>Definitions</th>
</tr>
</thead>
<tbody>
<tr>
<td>ρ</td>
<td>$\frac{V_{GS} - V_{TH}}{mE_{sat}L}$</td>
</tr>
<tr>
<td>κ</td>
<td>$1 + C_e/C_{gs}$</td>
</tr>
<tr>
<td>A</td>
<td>$2R_G \omega (C_e + C_{gs})$</td>
</tr>
<tr>
<td>T</td>
<td>$\frac{g_m\omega L_s}{C_{gs}}$</td>
</tr>
<tr>
<td>x</td>
<td>$\frac{WLC_{ox}}{C_{gs}}$</td>
</tr>
<tr>
<td>y</td>
<td>$C_d/C_{gs}$</td>
</tr>
<tr>
<td>$\omega_T$</td>
<td>$\frac{g_m}{C_{gs}}$</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>$\frac{g_{m}}{g_{m\mid V_{DS}=0}}$</td>
</tr>
<tr>
<td>b</td>
<td>$\frac{g_{mb}}{g_m}$</td>
</tr>
<tr>
<td>$\psi$</td>
<td>$\frac{g_{d}}{5\kappa^2}$</td>
</tr>
<tr>
<td>$\chi$</td>
<td>$\frac{\alpha\delta}{5\kappa^2} + \frac{2}{\alpha} - 2c_1\sqrt{\frac{\gamma\delta}{5\kappa^2}}$</td>
</tr>
<tr>
<td>$\xi$</td>
<td>$\frac{\alpha\delta}{5\kappa^2} - c_1\sqrt{\frac{\gamma\delta}{5\kappa^2}}$</td>
</tr>
</tbody>
</table>

Table 1.4: The Physical Constants

<table>
<thead>
<tr>
<th>Constants</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\varepsilon_{ox}$</td>
<td>3.9</td>
</tr>
<tr>
<td>$\nu_{sat}$</td>
<td>$8 \times 10^6$ cm/s [10]</td>
</tr>
<tr>
<td>$\mu_o$</td>
<td>540 cm$^2$/Vs [10]</td>
</tr>
<tr>
<td>$\eta$</td>
<td>1.85 [10]</td>
</tr>
<tr>
<td>$E_o$</td>
<td>$0.9 \times 10^6$ V/cm [10]</td>
</tr>
<tr>
<td>$R_G$</td>
<td>50 $\Omega$</td>
</tr>
<tr>
<td>$R_L$</td>
<td>50 $\Omega$</td>
</tr>
</tbody>
</table>

Table 1.5: The Virtual CMOS Process Constants

<table>
<thead>
<tr>
<th>Process Parameters</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>min. $L$</td>
<td>0.18 $\mu$m</td>
</tr>
<tr>
<td>$T_{ox}$</td>
<td>8 nm</td>
</tr>
<tr>
<td>$V_{THO}$</td>
<td>0.5 V</td>
</tr>
<tr>
<td>$m$</td>
<td>1.2</td>
</tr>
<tr>
<td>$\tilde{E}_o$</td>
<td>$0.35 \times 10^6$ V/cm ($\tilde{V}_{GS} = 1.0$ V)</td>
</tr>
<tr>
<td>$x$</td>
<td>0.7</td>
</tr>
<tr>
<td>$y$</td>
<td>1.0</td>
</tr>
</tbody>
</table>
Table 1.6: The Intrinsic Noise Parameters

<table>
<thead>
<tr>
<th>Channel Length $L$</th>
<th>$\gamma$</th>
<th>$\delta$</th>
<th>$c_i$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.18 $\mu$m</td>
<td>1.0</td>
<td>3.8</td>
<td>0.08</td>
</tr>
<tr>
<td>0.24 $\mu$m</td>
<td>0.87</td>
<td>2.1</td>
<td>0.16</td>
</tr>
<tr>
<td>0.5 $\mu$m</td>
<td>0.73</td>
<td>1.4</td>
<td>0.28</td>
</tr>
</tbody>
</table>

1.7 Theorems about NF Optimizations

Lemma 1. For $T_0 \geq 0$, the inequality

$$M^{\text{II}} > \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0} \tag{1.127}$$

is always true if $Q_s$ and $Q_{L_g}$ are sufficiently large. In particular, when both $Q_s$ and $Q_{L_g}$ approach infinity, the following inequality is satisfied:

$$M^{\text{I}} > \frac{1}{2} \frac{\partial N^{\text{I}}}{\partial T_0}. \tag{1.128}$$

Proof. Firstly, at $T_0 = 0$,

$$\left. \left( M^{\text{II}} - \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0} \right) \right|_{T_0=0} = \frac{\omega_T}{2\chi\omega_0\kappa} + \left[ 2k\epsilon(0) + \frac{1}{2}\epsilon(1) \right]. \tag{1.129}$$

Then, for $T_0 > 0$,

$$\frac{\partial}{\partial T_0} \left( M^{\text{II}} - \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0} \right) = 1 + 2(k - 1). \tag{1.130}$$

From the definitions of $k$, $\epsilon(0)$ and $\epsilon(1)$, both (1.129) and (1.130) are positive if $Q_s$ and $Q_{L_g}$ are sufficiently high. Therefore, (1.127) is true for $T_0 \geq 0$.

When $Q_s$ and $Q_{L_g}$ become infinite, $M^{\text{II}}$ and $N^{\text{II}}$ converge to $M^{\text{I}}$ and $N^{\text{I}}$ respectively. So (1.128) is always true. \qed

Theorem 1. Eq. (1.94) is negative for $T_0 \in \theta^{\text{II}}_2$ and (1.95) is negative for $T_0 \in \theta^{\text{II}}_3$ when $Q_s$ and $Q_{L_g}$ are sufficiently high.

Proof. If $T_0 \in \theta^{\text{II}}_2 \cup \theta^{\text{II}}_3$,

$$\min\{\sqrt{N^{\text{II}}}, A_{\text{max}}\} \geq M^{\text{II}}. \quad (1.131)$$

Using (1.127) in Lemma 1, we have

$$\min\{\sqrt{N^{\text{II}}}, A_{\max}\} > \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0}. \quad (1.132)$$

This implies

$$\sqrt{N^{\text{II}}} > \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0}, \quad \text{if } T_0 \in \theta^{\text{II}}_2, \quad (1.133)$$

$$A_{\max} > \frac{1 + \frac{1}{Q_s^2}}{2} \frac{\partial N^{\text{II}}}{\partial T_0}, \quad \text{if } T_0 \in \theta^{\text{II}}_3. \quad (1.134)$$

Then both (1.94) and (1.95) are negative. \hfill \Box

**Corollary 1.** Eq. (1.67) is negative for $T_0 \in \theta^I_2$ and (1.68) is negative for $T_0 \in \theta^I_3$.

**Proof.** These are the direct results of Theorem 1 when $Q_s$ and $Q_{Lg}$ approach infinity. \hfill \Box

**Theorem 2.** The inequality

$$r^{\text{II}} > p^{\text{II}} \quad (1.135)$$

is satisfied if both $Q_s$ and $Q_{L_g}$ are sufficiently high.

**Proof.** Since $r^{\text{II}}$ is the unique positive solution that makes (1.96) vanish, (1.135) is equivalent to saying that (1.96) is negative as $T_0$ approaches $p^{\text{II}}$ from the right. In fact, as $T_0 \rightarrow {p^{\text{II}}}^{+}$, $\sqrt{N^{\text{II}}} \rightarrow M^{\text{II}}$, so (1.96) becomes

$$\frac{\partial F^{\text{II}}}{\partial T_0} \rightarrow -\frac{4\chi \omega_0}{\left(1 + \frac{1}{Q_s^2}\right) \omega_T} + \frac{2\chi \omega_0}{\omega_T M^{\text{II}}} \frac{\partial N^{\text{II}}}{\partial T_0} \bigg|_{T_0=p^{\text{II}}}. \quad (1.136)$$

Using (1.127) in Lemma 1, (1.136) is negative and the proof is complete. \hfill \Box

**Corollary 2.** The inequality

$$r^I > p^I. \quad (1.137)$$

is always satisfied.

**Proof.** This is a direct result of Theorem 2 when $Q_s$ and $Q_{Lg}$ approach infinity. \hfill \Box

### 1.8 Expressions in Noise Factor Formulation

In Table 1.7, we give the explicit expressions of $|U|^2$, $|E|^2$, and $UE^*$ used in the noise analysis of a CS LNA. The derivations are based on Table 1.1 and the power matching at both ports.
Table 1.7: The Expressions about $|U|^2$, $|E|^2$ and $UE^*$

<table>
<thead>
<tr>
<th>Variables</th>
<th>Expressions</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\frac{|U|^2}{4k_B T \Delta f}$</td>
<td></td>
</tr>
<tr>
<td>&amp; $\left( 1 + \frac{\omega_c}{\omega_p} bT_0 \right)^2 + \frac{1}{Q_s^2}$</td>
<td></td>
</tr>
<tr>
<td>&amp; $\frac{\omega_c}{\omega_p} \left( 1 + \frac{1}{Q_s^2} \right) + \frac{1}{Q_s^2}$</td>
<td></td>
</tr>
<tr>
<td>$\frac{|E|^2}{4k_B T \Delta f}$</td>
<td></td>
</tr>
<tr>
<td>&amp; $\frac{2R_G \omega_c}{4k_B T} \left[ (1 - \frac{1}{2Q_s} bT_0) \left( 1 + \frac{\omega_c}{\omega_p} bT_0 \right)^2 + \frac{1}{Q_s^2} \right]$</td>
<td></td>
</tr>
<tr>
<td>$\frac{\Re(UE^*)}{4k_B T \Delta f}$</td>
<td>$\frac{-\omega_c}{\omega_p} T_0 \left[ 1 + \frac{1}{Q_s^2} + \frac{\omega_c}{\omega_p} bT_0 \left( 1 - \frac{1}{Q_s} bT_0 \right) \right] \chi$</td>
</tr>
<tr>
<td>&amp; $\left( 1 + \left( \frac{1}{Q_s} + bT_0 \right)^2 \right) \left( 1 + \frac{\omega_c}{\omega_p} bT_0 \right)^2 + \frac{1}{Q_s^2}$</td>
<td></td>
</tr>
<tr>
<td>$\frac{\Im(UE^*)}{4k_B T \Delta f}$</td>
<td>$\frac{-\omega_c}{\omega_p} T_0 \left[ \frac{\omega_c}{\omega_p} bT_0 + \left( \frac{1}{Q_s} + bT_0 \right) \left( 1 + \frac{1}{Q_s} + \frac{\omega_c}{\omega_p} bT_0 \right) \right] \xi$</td>
</tr>
<tr>
<td>&amp; $\left( 1 + \left( \frac{1}{Q_s} + bT_0 \right)^2 \right) \left( 1 + \frac{\omega_c}{\omega_p} bT_0 \right)^2 + \frac{1}{Q_s^2}$</td>
<td></td>
</tr>
</tbody>
</table>
Furthermore, we list the explicit expressions which are required in the calculation of (1.122), (1.123) and (1.126) in Table 1.8.

<table>
<thead>
<tr>
<th>Variables</th>
<th>Expressions</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\frac{\partial F}{\partial b}$</td>
<td>$\left.\frac{8R_G\omega_T^2}{R_L\omega_T^2}\left(\frac{1}{Q_s} - \frac{\omega_T}{\omega_T^2}\right)T_0\right.\frac{4\omega_T^2\epsilon\eta_0}{Q_s} + \frac{4\omega_T^2\epsilon\eta_0}{Q_s}$</td>
</tr>
<tr>
<td>$\frac{\partial F}{\partial A_0}$</td>
<td>$\left.\frac{2\chi\omega_0\kappa}{\omega_T}\left(1 - \frac{A_0^{11}}{A_0^2}\right)\right.$</td>
</tr>
<tr>
<td>$\frac{\partial F}{\partial T_0}$</td>
<td>$\left.\frac{4\chi_0\epsilon\eta_0}{\omega_T}\left[T_0 - \left(\frac{\omega_T}{A_0\chi_0\omega_0\kappa} + \frac{\epsilon\eta_0}{A_0^2}\right)\right]\right.$</td>
</tr>
<tr>
<td>$\frac{\partial M}{\partial b}$</td>
<td>$\left.\frac{\omega_T\kappa}{\omega_T} + \frac{1}{Q_Lg} - \frac{2(1+\epsilon(2))}{Q_s}\right.\left[\frac{2T_0^2}{1+\frac{\epsilon(2)}{Q_s}}\right]$</td>
</tr>
<tr>
<td>$\frac{\partial M}{\partial T_0}$</td>
<td>$\left.2(1+\epsilon(2))\right.$</td>
</tr>
<tr>
<td>$\frac{\partial N}{\partial b}$</td>
<td>$\left.\frac{\omega_T\kappa T_0}{Q_s}\sqrt{N}\right.\left[\frac{T_0}{Q_s} - \left(\frac{\omega_T}{Q_s}\chi_0\omega_0\kappa + \frac{1}{Q_s}\epsilon\eta_0\right)\right]\left[\frac{T_0^2}{1+\frac{\epsilon(2)}{Q_s}}\right]^{1/2}\sqrt{N}$</td>
</tr>
</tbody>
</table>
Chapter 2

A CMOS Ku-Band Single-Conversion Low-Noise Block Front-End for Satellite Receivers

This work presents a Ku-band single-conversion low-noise block (LNB) front-end in a 0.18 μm CMOS technology. The front-end down-converts the input signal from the Ku-band (10.5 - 13 GHz) to the L-band (0.75 - 2.25 GHz). The in-band noise figure is between 2.8 and 4.2 dB. It achieves a gain of 50 dB with ±2 dB variation. The in-band OIP3 is above 17 dBm and the output 1-dB compression point is above 9 dBm. The front-end consumes a total of 75 mA from a 1.8 V supply. The die area is 0.8×1.8 mm².

2.1 Introduction

A low-noise block (LNB) down-converter is a critical block in a digital broadcast satellite (DBS) receiver. It is usually installed outdoors with a dish antenna. Signals transmitted from space to earth are picked up by a dish antenna, then an LNB down-converts the received signals to lower frequencies. The output signals are sent to an indoor TV set-top box for finer channel selection. Input signals to an LNB are linearly polarized (vertical or horizontal) and contain useful information within a certain band (the C-band, the Ku-band or the Ka-band). A universal Ku-band LNB is able to detect both polarizations and covers the whole RF band in the Ku-band. The down-converted IF signal is in the L-band.

This work demonstrates an implementation of a fully integrated down-converter front-end in an ultra-low-cost 0.18 μm CMOS technology. In addition to the coverage of the Ku-band frequency range, three major challenges remain. The first of them is to achieve a very low noise figure (NF). The NF of a modern LNB down-converter can be reduced to 0.5 dB or even lower. Therefore, discrete HEMT amplifiers are usually placed in front of a CMOS down-converter to achieve such a low NF. In the proposed design, the maximum in-band NF of the CMOS front-end is 4.2 dB. This helps to relax the power gain requirement of the discrete components and reduces system cost. The second challenge is to obtain a small gain variation over the whole frequency range, since the received signals always contain information from all channels simultaneously. The third is to achieve high OIP3 and P1dB in order to meet linearity requirements.

Figure 2.1: The architecture of a Ku-band LNB down-converter.

The chapter is organized as follows. The down-converter front-end architecture is described in Section 2.2. Designs of circuit blocks are discussed in Section 2.3 and Section 2.4. Experimental results are summarized in Section 2.5.

2.2 Front-End Architecture

The block diagram of the proposed universal Ku-band LNB is depicted in Fig. 2.1. A complete down-converter front-end consists of amplifiers built with discrete HEMT devices, an image rejection filter and an integrated CMOS front-end. The LO generation circuit can be a dielectric resonator oscillator (DRO) or an on-chip PLL. The down-converter uses a single-conversion architecture. The frequency down-conversion is based on a variable-LO, fixed-IF scheme. Input RF signals in the Ku-band are split into two sub-bands: low-band (LB) from 10.50 to 12.00 GHz and high-band (HB) from 11.50 to 13.00 GHz. The corresponding LO frequencies for LB and HB are 9.75 GHz and 10.75 GHz respectively. Therefore, the IF frequencies for both LB and HB have the same range from 0.75 to 2.25 GHz.
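The frequency plan can be verified with a few lines of Python. The sketch assumes low-side LO injection (IF = RF - LO), which is consistent with the numbers above; the dictionary layout and helper names are hypothetical.

```python
# Frequencies in GHz, taken from the text.
BANDS = {
    "LB": {"rf": (10.50, 12.00), "lo": 9.75},
    "HB": {"rf": (11.50, 13.00), "lo": 10.75},
}


def if_range(rf, lo):
    """IF band for low-side LO injection: IF = RF - LO."""
    return (rf[0] - lo, rf[1] - lo)


def image_range(rf, lo):
    """Image band mirrored around the LO: f_image = 2*LO - RF."""
    return (2 * lo - rf[1], 2 * lo - rf[0])


for name, band in BANDS.items():
    print(name, "IF:", if_range(band["rf"], band["lo"]),
          "image:", image_range(band["rf"], band["lo"]))
```

Both sub-bands map onto the same 0.75 to 2.25 GHz IF range, and the two image bands together span 7.50 to 10.00 GHz.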

To ensure a design with low NF, an amplifier with very low noise as well as high power gain is necessary. A two-stage low noise amplifier (LNA) design can accomplish this goal. The first stage LNA is optimized for noise performance with a moderate power gain and the second stage is designed for high power gain. We assume that an external filter has rejected signals in the image band (7.50 - 10.00 GHz) to a desired level, so a single-balanced mixer is used for low power. This also circumvents the need for an on-chip balun to convert the single-ended RF signal to differential. The mixer is followed by a three-stage IF amplifier. The purpose of the IF amplifier is to provide more gain and guarantee high enough output power to drive the next stage.

In our design, both input and output are matched to 50 Ω impedance for convenience of measurements, though the LNB output will drive a 75 Ω cable in practice.
2.3 Low-Noise Amplifier

The two-stage LNA has a single-ended input and a single-ended output. The first stage adopts a common source (CS) design since it has superior noise performance over other topologies, while the second stage is based on a cascode configuration in order to provide higher power gain. The complete schematic of the LNA with the detailed matching networks is shown in Fig. 2.2.

The key to good noise performance is simultaneous input matching and noise matching. A traditional analysis shows that a CS topology with inductive degeneration can achieve simultaneous matching by adjusting the degeneration inductance [15]. However, this method is based on three major assumptions. Firstly, the amplifier is unilateral, which can be guaranteed by good layout. Secondly, the source degeneration inductor $L_s$ is lossless. This is usually valid when $L_s$ is not too large and has a high quality factor. In fact, if the real part of the input impedance is adjusted to 50 $\Omega$ with an $f_T$ of 50 GHz, $L_s$ has an approximate value of $\frac{R_{in}}{2\pi f_T} \approx 160$ pH and can have a quality factor of 15 in a typical process. The third assumption is that the back-gate transconductance $g_{mb}$ is ignored. In fact, $g_{mb}$ plays a role in matching. To see this, consider the simplified model of a CS stage with inductive degeneration shown in Fig. 2.3, which includes $g_{mb}$. The input impedance of the amplifier is given by

$$R_{in} = R_g + \frac{\omega_T L_s + g_{mb}(\omega L_s)^2}{1 + (g_{mb}\omega L_s)^2}$$

$$X_{in} = -\frac{1}{\omega C_{gs}} + \frac{\omega L_s(1 - g_{mb}\omega_T L_s)}{1 + (g_{mb}\omega L_s)^2}$$
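The two expressions can be cross-checked numerically. The Python sketch below compares them with a single complex impedance, $Z_{in} = R_g + \frac{1}{j\omega C_{gs}} + \frac{\omega_T L_s + j\omega L_s}{1 + j g_{mb}\omega L_s}$, which is what a nodal analysis of the simplified model is assumed to give here. The values of $g_m$, $g_{mb}$ and the 12 GHz operating frequency are hypothetical; only $f_T = 50$ GHz and the 50 $\Omega$ target come from the text.

```python
import math


def z_in_closed_form(w, w_t, g_mb, c_gs, l_s, r_g):
    """R_in and X_in from the two closed-form expressions."""
    den = 1.0 + (g_mb * w * l_s) ** 2
    r_in = r_g + (w_t * l_s + g_mb * (w * l_s) ** 2) / den
    x_in = -1.0 / (w * c_gs) + w * l_s * (1.0 - g_mb * w_t * l_s) / den
    return r_in, x_in


def z_in_nodal(w, w_t, g_mb, c_gs, l_s, r_g):
    """Same impedance as one complex expression (assumed nodal result)."""
    return (r_g + 1.0 / (1j * w * c_gs)
            + (w_t * l_s + 1j * w * l_s) / (1.0 + 1j * g_mb * w * l_s))


w_t = 2.0 * math.pi * 50e9      # f_T = 50 GHz, from the text
l_s = 50.0 / w_t                # L_s = R_in / (2*pi*f_T) for R_in = 50 ohm
g_m = 40e-3                     # hypothetical transconductance, in S
c_gs = g_m / w_t                # from w_T = g_m / C_gs
g_mb = 0.2 * g_m                # hypothetical back-gate transconductance
w = 2.0 * math.pi * 12e9        # hypothetical in-band frequency

print("L_s = %.1f pH" % (l_s * 1e12))
r_in, x_in = z_in_closed_form(w, w_t, g_mb, c_gs, l_s, 0.0)
z = z_in_nodal(w, w_t, g_mb, c_gs, l_s, 0.0)
print("closed form:", r_in, x_in)
print("nodal:      ", z.real, z.imag)
```

Setting `g_mb` to zero in this sketch recovers the familiar result $R_{in} = R_g + \omega_T L_s$.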

The model includes two internal independent noise sources $\overline{v_g^2}$ and $\overline{i_d^2}$, and the noise factor of the amplifier can be calculated as

$$F = 1 + \frac{R_g}{R_s} + \left(\frac{\gamma}{\alpha}\right) \frac{g_m}{R_s} \frac{(R_s + R_g)^2 + \left(X_s - \frac{1}{\omega C_{gs}} + \omega L_s\right)^2}{\left(\frac{\omega_T}{\omega} + g_{mb}\omega L_s\right)^2}$$

The optimal source impedance is found by solving $\frac{\partial F}{\partial R_s} = 0$ and $\frac{\partial F}{\partial X_s} = 0$. Assuming $R_s \gg R_g$, we have

$$R_{s, opt} \approx \left(\frac{\omega_T}{\omega} + g_{mb}\omega L_s\right) \sqrt{\frac{R_g}{\left(\frac{\gamma}{\alpha}\right) g_m}}$$

$$X_{s, opt} = \frac{1}{\omega C_{gs}} - \omega L_s$$
Therefore, the optimum noise factor is

\[ F_{\text{min}} \approx 1 + \frac{2\sqrt{\left(\frac{\gamma}{\alpha}\right) g_m R_g}}{\frac{\omega_T}{\omega} + g_{\text{mb}} \omega L_s} \quad (2.6) \]
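The optimum source resistance and (2.6) can be compared with a brute-force minimization of the noise factor. The Python sketch below assumes the noise factor has the form $F = 1 + \frac{R_g}{R_s} + \left(\frac{\gamma}{\alpha}\right)\frac{g_m}{R_s}\frac{(R_s+R_g)^2 + X^2}{(\omega_T/\omega + g_{mb}\omega L_s)^2}$ with $X = X_s - \frac{1}{\omega C_{gs}} + \omega L_s$, which is the form consistent with the expression for $R_{s,opt}$. All component values, including $R_g = 2$ $\Omega$ and $\gamma/\alpha = 1.2$, are hypothetical.

```python
import math


def noise_factor(r_s, x_s, w, w_t, g_m, g_mb, c_gs, l_s, r_g, goa):
    """Noise factor for source impedance r_s + j*x_s (assumed form)."""
    d = w_t / w + g_mb * w * l_s
    x_total = x_s - 1.0 / (w * c_gs) + w * l_s
    return (1.0 + r_g / r_s
            + goa * g_m / r_s * ((r_s + r_g) ** 2 + x_total ** 2) / d ** 2)


# Hypothetical operating point.
w_t = 2.0 * math.pi * 50e9
w = 2.0 * math.pi * 12e9
g_m, g_mb, r_g, goa = 40e-3, 8e-3, 2.0, 1.2
c_gs = g_m / w_t
l_s = 50.0 / w_t

# Closed-form results, valid for R_s >> R_g.
d = w_t / w + g_mb * w * l_s
rs_closed = d * math.sqrt(r_g / (goa * g_m))
xs_closed = 1.0 / (w * c_gs) - w * l_s
f_min_closed = 1.0 + 2.0 * math.sqrt(goa * g_m * r_g) / d

# Brute-force scan over R_s with X_s held at its optimum.
grid = [1.0 + 0.01 * i for i in range(19901)]
rs_numeric = min(
    grid,
    key=lambda r: noise_factor(r, xs_closed, w, w_t, g_m, g_mb, c_gs, l_s, r_g, goa),
)
f_numeric = noise_factor(rs_numeric, xs_closed, w, w_t, g_m, g_mb, c_gs, l_s, r_g, goa)

print("R_s,opt closed form / scan:", rs_closed, rs_numeric)
print("F_min closed form / scan:  ", f_min_closed, f_numeric)
```

The small gap between the closed-form and scanned values comes from the $R_s \gg R_g$ approximation.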

Simultaneous input matching and noise matching requires \( R_s = R_{\text{in}} = R_{s,\text{opt}} \) and \( X_s = -X_{\text{in}} = X_{s,\text{opt}} \). For non-vanishing values of \( g_{\text{mb}} \), it is impossible to satisfy the two equations with only one variable \( L_s \). But when \( g_{\text{mb}} = 0 \), \( X_{\text{in}} = -X_{s,\text{opt}} \) is automatically satisfied, leaving only \( L_s \) to satisfy \( R_{\text{in}} = R_{s,\text{opt}} \). This analysis motivates the use of deep N-well NMOS devices in the LNA design, where the isolated P-well bulk is always connected to the source and the effect of \( g_{\text{mb}} \) is eliminated. From (2.6), it can be observed that \( g_{\text{mb}} \) can help to reduce \( F_{\text{min}} \). In practice, however, \( \frac{\omega_T}{\omega} \gg g_{\text{mb}} \omega L_s \), so the improvement is negligible.

The design procedures for the two stages are similar. The gate bias voltages are set to 1.0 V to get an optimal \( f_T \) of about 50 GHz. Transistors are sized according to their \( R_{s,\text{opt}} \). The total width of \( M_1 \) is 80 \( \mu \)m which corresponds to \( R_{s,\text{opt}} = 50 \) \( \Omega \). Since the input impedance will be set to 50 \( \Omega \), NF degradation due to the finite quality factor of \( L_{g1} \) can be minimized for a unity resistance transition. The widths of \( M_2 \) and \( M_3 \) are 50 \( \mu \)m. The \( R_{s,\text{opt}} \) of the second stage is 80 \( \Omega \). Non-50 \( \Omega \) matching helps to reduce bias current as well as power loss in the interstage matching network consisting of \( L_{d1} \) and \( L_{g2} \). In the cascode stage, an inductor \( L_{d2} \) is placed at the interconnection point to resonate out the capacitance of the node. This can improve the noise performance of a cascode amplifier [16].

The LNA is an inductor-intensive design. Inductors are placed close to each other,
so the coupling between these inductors should be modeled. Moreover, small inductors (the minimum is 200 pH) are sensitive to subtle parasitics in the layout. Therefore, the full layout (excluding active devices) is simulated in a 3D EM simulator.

In simulation, the first stage of the LNA achieves a power gain of 8 dB and consumes 15 mA. The second stage of the LNA has a power gain of 13 dB and consumes 8 mA. The two-stage LNA achieves a minimum NF of 2.1 dB.

2.4 Mixer and IF-Amplifier

This section describes the work of Jiashu Chen, a research collaborator in the project, who designed the mixer and IF stages. For completeness, we give a brief description of his work. The single-balanced mixer is shown in Fig. 2.4. The RF input transistor $M_4$ acts as the input stage of a cascode LNA. $L_{s4}$ and $L_{g4}$ are selected for simultaneous input matching and noise matching to 50 Ω. The gate bias voltage is set to 0.75 V with a corresponding $f_T$ of 34 GHz. Though this is shifted from the optimal value, a large amount of bias current can be saved. Current bleeding circuits with a resonating inductor $L_{d4}$ are used to improve conversion gain as well as NF [17]. The input LO signal is single-ended. An on-chip balun converts it to differential form and introduces about 3 dB of loss. The LO buffer has a tuned load peaking at about 10 GHz, so any noise in the LO signal located in the IF band is attenuated.
The bias currents are 5 mA for the mixer, 11 mA for the first IF amplifier, 10 mA for the second IF amplifier and 22 mA for the last stage. For an input LO power level of 0 dBm, the mixer together with the buffers achieves a power conversion gain of 33 dB.
2.5 Experimental Results

A complete front-end as well as a stand-alone two-stage LNA were fabricated in a 0.18 $\mu$m CMOS technology. The chip microphotograph of the front-end is shown in Fig. 2.6. The stand-alone LNA consumes 23 mA from a 1.8 V supply and the complete front-end consumes 75 mA from a 1.8 V supply. All measurements are performed by on-wafer probing.

The performance of the stand-alone LNA is measured over the RF frequency range from 10.5 GHz to 13 GHz. The measured S-parameters are shown in Fig. 2.7. The LNA has a peak gain of 19.3 dB at 12 GHz and the RF bands of interest lie almost entirely within the 3 dB bandwidth. We are also able to measure the noise parameters of the LNA, which are shown in Fig. 2.8. For frequencies above 11 GHz, NF almost overlaps the optimal value $\text{NF}_\text{min}$, which means both impedance matching and noise matching are well satisfied. The minimum NF over the RF band is 2.4 dB, a 0.3 dB degradation from simulation. For frequencies below 11 GHz, NF deviates from $\text{NF}_\text{min}$. To explain this, a plot of $\Gamma_{\text{opt}}$ is given. $\Gamma_{\text{opt}}$ is low in the entire RF band. This indicates that the input impedance $Z_{\text{in}}$ is matched to the optimal noise impedance $Z_{s,\text{opt}}$. Therefore, the deviation is caused by the mismatch between $Z_{\text{in}}$ and $Z_{s}$. This is consistent with the observation in Fig. 2.7 that input matching ($S_{11}$) is slightly off for frequencies below 11 GHz.

The complete front-end is tested under two LO frequencies, 9.75 GHz (LB) and 10.75 GHz (HB). The input LO power levels in both cases are 0 dBm. Fig. 2.9 shows the conversion gain from the RF band to the IF band as well as the image gain from the image band to the IF band. Combining the two cases of LB and HB, the front-end
achieves a center gain of 50 dB and a gain variation of ±2 dB. The image rejection is 8 dB in the worst case. Fig. 2.10 shows the single side band (SSB) NF, OIP3 and output P1dB. Without using an external image rejection filter, the double side band (DSB) NF is measured. The SSB NF can be calculated according to (2.7):

\[ F_{SSB} = \frac{\text{Gain} + \text{Image Gain}}{\text{Gain}} F_{DSB} \]  

(2.7)

where \( F_{SSB} \) and \( F_{DSB} \) are noise factors in linear scale, and the gains are likewise linear power gains. The in-band SSB NF is between 2.8 and 4.2 dB. A two-tone test is performed to measure intermodulation distortion. The two tones are separated by 10 MHz. The in-band OIP3 is above 17 dBm, and the output 1-dB compression point is above 9 dBm over the whole band. The key performance is summarized in Table 2.1 and compared with other work.
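The conversion in (2.7) can be sketched numerically as follows. This is only an illustration: the function names and the 3.0 dB DSB NF input are hypothetical, while the 50 dB gain and the 8 dB worst-case image rejection are the values quoted in this section.

```python
import math


def db_to_linear(value_db):
    """Convert a power quantity from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def ssb_nf_db(dsb_nf_db, gain_db, image_gain_db):
    """Apply (2.7): F_SSB = (Gain + Image Gain) / Gain * F_DSB, all in linear scale."""
    gain = db_to_linear(gain_db)
    image_gain = db_to_linear(image_gain_db)
    f_dsb = db_to_linear(dsb_nf_db)
    f_ssb = (gain + image_gain) / gain * f_dsb
    return 10.0 * math.log10(f_ssb)


# Hypothetical DSB NF of 3.0 dB, 50 dB gain, image gain 8 dB below the wanted gain.
example_ssb_nf = ssb_nf_db(dsb_nf_db=3.0, gain_db=50.0, image_gain_db=42.0)
print(f"SSB NF = {example_ssb_nf:.2f} dB")
```

With no image rejection at all (equal gains) the correction is the familiar 3 dB; with 8 dB of rejection it shrinks to roughly 0.6 dB.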
Figure 2.8: Measured noise parameters of the stand-alone LNA.

Figure 2.9: Front-end conversion gain for the RF band and the image band (LB and HB respectively) versus IF frequencies.
2.6 Conclusion

In this work, we presented a CMOS front-end for the application of a universal Ku-band LNB down-converter. It achieves a gain of 50 dB with ±2 dB variation. The in-band NF is between 2.8 and 4.2 dB. The OIP3 is above 17 dBm and the output 1-dB compression point is above 9 dBm. The front-end consumes a total of 75 mA from a 1.8 V supply.
Chapter 3

A 4-Port-Inductor-Based VCO Coupling Method for Phase Noise Reduction

A 4-port-inductor-based VCO coupling technique is introduced to improve VCO phase noise performance. Complete design steps including resonant network design and circuit topology selection are discussed and prototype designs have been demonstrated to verify the analysis. The proposed 12.8 GHz CCVCO design achieves phase noise of -116 dBc/Hz at 1 MHz offset, a tuning range of 31.4%, FOM and FOMT of 184 and 194 respectively.

3.1 Introduction

Achieving low phase noise performance has become the most challenging problem for CMOS VCO designs when applications move to higher frequencies and supply voltages scale down. Explicitly, a simple but intuitive phase noise formula can be derived for a VCO working in the voltage-limited regime wherein the best phase noise is usually achieved [20]:

\[
\mathcal{L}_{\text{min}} \propto (1 + \alpha \eta) \cdot \frac{L}{Q_T \cdot V_{o,\text{max}}^2} \cdot \frac{\omega_0^3}{(\Delta \omega)^2}. \tag{3.1}
\]

\(Q_T\) can be considered as process dependent only, so the only way to compensate for the degradation caused by \(\omega_0\) and \(V_{o,\text{max}}\) is to scale down the inductance \(L\). However, the layout parasitics can limit the use of this method in practice when \(L\) is already very small. An alternative way is to have \(N\) equal VCOs that are coupled by connecting their outputs together, Fig. 3.1 (b). These coupled oscillators are locked in phase, so the total current is \(N\) times that of a single VCO and the impedance is reduced by a factor of \(N\). Therefore, the phase noise \(\mathcal{L}\) is effectively improved. Notice that this method improves only phase noise but not the figure of merit (FOM), because the power consumption grows by the same factor \(N\).
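The trade-off described above can be illustrated with a short calculation. It is a sketch under two assumptions that are not spelled out in this section: the phase noise scales linearly with the effective inductance as in (3.1), and the FOM penalizes dc power through a $10\log_{10}(P_{dc})$ term, as in the commonly used definition. The function name is hypothetical.

```python
import math


def coupled_vco_deltas_db(n_vcos):
    """Return (phase-noise improvement, power penalty, net FOM change) in dB
    for n_vcos identical, in-phase coupled oscillators."""
    # Effective impedance (inductance) drops by N, so phase noise improves by 10*log10(N).
    phase_noise_improvement = 10.0 * math.log10(n_vcos)
    # Total bias current, and hence dc power, grows by N.
    power_penalty = 10.0 * math.log10(n_vcos)
    return phase_noise_improvement, power_penalty, phase_noise_improvement - power_penalty


for n in (1, 2, 4):
    improvement, penalty, net = coupled_vco_deltas_db(n)
    print(f"N={n}: phase noise -{improvement:.2f} dB, power +{penalty:.2f} dB, FOM change {net:.2f} dB")
```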

Motivated by this conceptual description of coupled oscillators, we have developed a practical oscillator coupling technique by using a 4-port inductor. In Section 3.2, this technique is explained by discussing the circuit design details. Prototype designs have been fabricated in a 65nm CMOS process and the experimental results are shown in Section 3.3.

3.2 VCO Circuit Designs

3.2.1 4-Port Inductor

A 4-port inductor is shown in Fig. 3.2 (a). This is a balanced structure whose differential mode (DM) and common mode (CM) are separable [21]. Therefore, we need to consider the DM characteristic only. We study two scenarios. In the first, ports 3 and 4 are shorted and differential signals are applied to ports 1 and 2. This is equivalent to a single-turn 2-port inductor. Its DM inductance is denoted by $L_{2p}$ and its quality factor by $Q_{2p}$. Fig. 3.2 (b) shows a distributed model for this case. In the second scenario, equal differential signals are applied to both sides. In addition, ports 1 and 4 are in phase and so are ports 2 and 3. Now the DM inductance at each DM port is $L_{4p}$ and the quality factor is $Q_{4p}$. A similar distributed model is shown in Fig. 3.2 (c). A conclusion about the two models can be drawn:

$$L_{4p} \approx \frac{1}{2} \cdot L_{2p}, \text{ at low frequencies;} \quad (3.2)$$

$$Q_{4p} > Q_{2p}, \text{ at high frequencies.} \quad (3.3)$$

(3.2) is straightforward when all parasitic capacitance becomes negligible at low frequencies. This means that 4-port DM operations can reduce the network impedance by half. And (3.3) can be viewed as an extra performance bonus which can be understood by observing the fact that the quarter taps of the loop are virtual grounds in 4-port DM operations.

### 3.2.2 Interlocked-Ring Structure

Some capacitance is needed to form a complete resonant network. To simultaneously satisfy two contradictory design specifications of the frequency tuning range (TR) and the frequency sensitivity ($K_{VCO}$), one must use a large digitally controlled coarse tuning capacitor array, and then the intra-connection inside the array may become problematic. An interlocked-ring structure is proposed for the intra-connection of a large capacitor array in Fig. 3.3. With the finite inductance of the metal traces, the structure should be modeled as a transmission line and the signal delay along these traces can be significant especially at high frequencies. In the proposed structure, the current directions of any two adjacent traces must be opposite so that the mutual inductance always partially cancels the self inductance and signal delay is reduced. Therefore, we can assume that signal has the same phase at any point of a ring.

### 3.2.3 LC Tank

Two kinds of resonant networks can be constructed using 4-port inductors and the interlocked-ring capacitor arrays. Fig. 3.4 (a) shows a conventional parallel $LC$ network. Assuming the total DM capacitance of an array is $C_t$, the resonance frequency is

$$f_{2p} = \frac{1}{2\pi (L_{2p} C_t)^{\frac{1}{2}}}. \quad (3.4)$$

Figure 3.2: (a) A 4-port inductor. The distributed model for (b) 2-port DM operations and (c) 4-port DM operations with port 1 and 4 in phase and port 2 and 3 in phase.

Figure 3.3: A 5-bit (31-unit) coarse tuning capacitor array is connected in the interlocked-ring structure. The current directions are shown by the arrows.

In Fig. 3.4 (b), a cross-coupled 4-port $LC$ tank is shown. At each DM port, there are two capacitor arrays and the ports are forced in phase by the cross traces at the center. Its resonance frequency is calculated by

\[ f_{4p} = \frac{1}{2\pi (L_{4p} \cdot 2C_t)^{\frac{1}{2}}} \approx f_{2p}. \]  

(3.5)

Though the two LC tanks have close resonance frequencies, the network impedance of the cross-coupled 4-port LC tank is only a quarter of that of a 2-port LC tank, which can lead to about 6 dB improvement in phase noise.
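A quick numerical check of the two tank resonances and of the expected 6 dB figure is sketched below. The component values (400 pH, 350 fF) are hypothetical and chosen only to land in the 10 to 15 GHz range; the relation $L_{4p} = L_{2p}/2$ is the low-frequency approximation (3.2), and the quarter-impedance factor is taken directly from the statement above.

```python
import math


def resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical element values, not taken from the prototype.
L_2P = 400e-12   # 2-port DM inductance [H]
C_T = 350e-15    # total DM capacitance of one array [F]

f_2p = resonance_hz(L_2P, C_T)              # conventional tank, as in (3.4)
f_4p = resonance_hz(L_2P / 2.0, 2.0 * C_T)  # cross-coupled tank, as in (3.5) with L_4p = L_2p / 2

# A fourfold impedance reduction corresponds to 10*log10(4) dB in phase noise.
expected_improvement_db = 10.0 * math.log10(4.0)

print(f"f_2p = {f_2p / 1e9:.2f} GHz, f_4p = {f_4p / 1e9:.2f} GHz")
print(f"expected phase-noise improvement = {expected_improvement_db:.2f} dB")
```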

3.2.4 VCO Topologies

Now that we have two kinds of LC tanks (2-port and 4-port), we are ready to construct the corresponding VCO topologies. Fig. 3.5 (a) shows a conventional NMOS cross-coupled VCO with a PMOS tail current fed from the center tap of the loop inductor (NVCO). Fig. 3.5 (b) and (c) are two types of coupled VCOs (CVCO) based on cross-coupled 4-port LC tanks. The former uses NMOS cross-coupled pairs for both VCOs (NCVCO) and the bias current has to be fed from the quarter taps. The latter topology uses complementary NMOS and PMOS cross-coupled pairs respectively (CCVCO). They share the same bias current, so the total bias current is half that of the NCVCO topology.
3.3 Experimental Results

To verify our analysis and discussions in Section 3.2, both passive structures and prototype VCO designs are fabricated in a 65 nm LP CMOS process. All the measurements are accomplished by on-wafer probing.

3.3.1 Passive Structures

Passive structures include a 4-port inductor as shown in Fig. 3.2 (a) and a cross-coupled 4-port LC tank as shown in Fig. 3.4 (b). Both of them are fully characterized by a 4-port network analyzer, and the DM characteristics are then derived from their 4-port parameters.

Fig. 3.6 (a) and (b) compare $L_{4p}$ and $Q_{4p}$ of the 4-port inductor to $L_{2p}$ and $Q_{2p}$ respectively. Both plots have shown good agreement with the conditions (3.2) and (3.3). The superiority of $Q_{4p}$ over $Q_{2p}$ at low frequencies is not obvious because the capacitive coupling to the substrate is a minor issue, while at high frequencies the difference is remarkable. In our case, the frequency of interest is about 10 to 15 GHz, where $Q_{4p}$ is between 25 and 20, or about 25% to 33% higher than $Q_{2p}$.

Fig. 3.7 plots the 4-port DM impedance $|Z_{4p}|$ of the cross-coupled 4-port LC tank. This curve indicates a 4-port resonance frequency of 15 GHz.

3.3.2 Prototype VCOs

Figure 3.5: The schematics of three VCO topologies: (a) NVCO, (b) NCVCO and (c) CCVCO. In these plots, the interlocked-ring structures of capacitor arrays have been simplified.

Figure 3.6: The measured DM 2-port and 4-port characteristics of a 4-port inductor: (a) the inductances and (b) the quality factors.

Figure 3.7: The measured DM impedance of a cross-coupled 4-port LC tank.

Using the same 4-port inductor and cross-coupled 4-port LC tank, we built the three types of VCOs in Fig. 3.5. The chip micrograph of the CCVCO design is shown in Fig. 3.8. A differential-input differential-output buffer is placed at one side of the VCO. The other two designs have similar layout floorplans. Though the output is in differential mode, only one side is connected to the spectrum analyzer and the other side is terminated with a broadband 50 Ω load. This causes an inherent loss of 3 dB in the measured signal power. The measured frequency tuning range of the CCVCO design is plotted in Fig. 3.9. By using a 6-bit coarse tuning plus analog tuning scheme, more than 30% of tuning range is achieved with $K_{VCO}$ kept relatively low.

It is most important to compare the phase noise performance of these three designs to verify the proposed VCO coupling method. Results are shown in Fig. 3.10. Using the NVCO design as a reference, the phase noise improvement of the NCVCO design is only 4 dB, less than the expected 6 dB. The extra noise is contributed by the tail current sources at the quarter taps, which is CM noise in an ideal case. In fact, any unbalanced load, in our case the one-side buffer, can cause CM to DM conversion and degrade phase noise. On the other hand, the improvement of the CCVCO design is 8 dB. The additional 2 dB comes from the superior flicker noise performance of the complementary topology.

Finally, the performance of the three VCO designs is summarized in Table 3.1 and compared with two previously published designs. [22] uses a similar technology but a higher supply voltage, and [23] uses a better technology but the same supply voltage as our prototypes. Both proposed CVCO topologies have achieved lower phase noise than the reference design (NVCO). In particular, the CCVCO topology has achieved state-of-the-art performance at this frequency.
Figure 3.8: The chip micrograph of the CCVCO design with the output buffer included.

Figure 3.9: The measured frequency tuning range of the CCVCO design with 6-bit coarse tuning control.
Figure 3.10: The measured phase noise of the three VCO designs: (a) NVCO, (b) NCVCO and (c) CCVCO. Their coarse tuning control codes are all set to 0x1F.
Table 3.1: VCO Performance Summary and Comparison

<table>
<thead>
<tr>
<th>Designs</th>
<th>NVCO</th>
<th>NCVCO</th>
<th>CCVCO</th>
<th>[22]</th>
<th>[23]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Tech.</td>
<td>65nm</td>
<td>65nm</td>
<td>65nm</td>
<td>65nm</td>
<td>0.13μm</td>
</tr>
<tr>
<td>Supply [V]</td>
<td>1.2</td>
<td>1.2</td>
<td>1.2</td>
<td>2.0</td>
<td>1.2</td>
</tr>
<tr>
<td>Center $f_0$ [GHz]</td>
<td>12.3</td>
<td>13.0</td>
<td>12.8</td>
<td>11.2</td>
<td>10.3</td>
</tr>
<tr>
<td>TR [%]</td>
<td>30.0</td>
<td>31.4</td>
<td>31.3</td>
<td>9.9</td>
<td>11.7</td>
</tr>
<tr>
<td>$K_{VCO}$ [GHz/V]</td>
<td>0.13</td>
<td>0.16</td>
<td>0.13</td>
<td>1.83</td>
<td>N/A</td>
</tr>
<tr>
<td>$\mathcal{L}$ (1MHz) [dBc/Hz]</td>
<td>$-108$</td>
<td>$-112$</td>
<td>$-116$</td>
<td>$-116$</td>
<td>$-118$</td>
</tr>
<tr>
<td>$P_{dc}$ [mW]</td>
<td>7.4</td>
<td>31.0</td>
<td>22.5</td>
<td>72</td>
<td>7.8</td>
</tr>
<tr>
<td>FOM [dB/Hz]</td>
<td>180</td>
<td>178</td>
<td>184</td>
<td>178</td>
<td>189</td>
</tr>
<tr>
<td>$FOM_T$ [dB/Hz]</td>
<td>190</td>
<td>188</td>
<td>194</td>
<td>178</td>
<td>191</td>
</tr>
</tbody>
</table>
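The FOM and $FOM_T$ rows can be cross-checked with a short script. The definitions used here are the commonly used ones, $\mathrm{FOM} = -\mathcal{L} + 20\log_{10}(f_0/\Delta f) - 10\log_{10}(P_{dc}/1\,\text{mW})$ and $\mathrm{FOM}_T = \mathrm{FOM} + 20\log_{10}(\mathrm{TR}/10\%)$; that the table was computed with exactly these definitions is an assumption of this sketch, and the function names are hypothetical. The CCVCO inputs are the center frequency, dc power and tuning range from Table 3.1 and the phase noise of -116 dBc/Hz at 1 MHz offset quoted at the start of this chapter.

```python
import math


def vco_fom_db(phase_noise_dbc_hz, f0_hz, offset_hz, pdc_mw):
    """Commonly used VCO figure of merit (assumed definition)."""
    return (-phase_noise_dbc_hz
            + 20.0 * math.log10(f0_hz / offset_hz)
            - 10.0 * math.log10(pdc_mw))


def vco_fom_t_db(fom_db, tuning_range_percent):
    """Tuning-range-normalized figure of merit (assumed definition)."""
    return fom_db + 20.0 * math.log10(tuning_range_percent / 10.0)


ccvco_fom = vco_fom_db(-116.0, 12.8e9, 1e6, 22.5)
ccvco_fom_t = vco_fom_t_db(ccvco_fom, 31.3)
print(f"CCVCO: FOM = {ccvco_fom:.1f}, FOM_T = {ccvco_fom_t:.1f}")
```

Under these assumptions the script gives values within about half a dB of the tabulated 184 and 194 for the CCVCO design.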
3.4 Conclusion

In this work, we demonstrate a VCO coupling technique based on 4-port inductors that can effectively improve the phase noise performance. A prototype CCVCO design has achieved state-of-the-art performance in both the phase noise and the tuning range figures of merit.
Chapter 4

Design of CMOS True-Single-Phase-Clock Dividers Based on the Speed-Power Trade-Off

In this work, we introduce a true-single-phase-clock (TSPC) divider synthesis technique that is based on a general TSPC logic family. The design of five types of TSPC dividers and prescalers (RE-0∼4) is discussed, and their performance is compared in terms of the speed-power trade-off. The proposed RE-2 type shows a better balance between speed and power than the other types. The measurement results show that the maximal input frequencies can be 19 GHz and 16 GHz for a divide-by-2 divider and a divide-by-2/3 prescaler respectively, and the power consumption is less than 0.5 mW.

4.1 Introduction

High speed and low power are the two major challenges for modern communication circuit designs. A frequency divider is a good example that requires a balance between the two. True-single-phase-clock (TSPC) dividers are well known for their low power consumption compared to the current mode logic (CML) implementation, but their application is limited to relatively low frequencies. With the development of CMOS technologies, the improvement of the intrinsic speed of a device makes it possible for TSPC logic to replace CML even in high frequency (> 10 GHz) applications.

The standard TSPC logic was introduced in [24]. Later in [25]-[26], modified topologies were proposed to improve speed performance. Moreover, based on the Extended-TSPC (E-TSPC) technique which was discussed in [26], more recent research has been published on the design of both fixed-modulus dividers and dual-modulus prescalers [27]-[30]. In this work, we will extend the design basis to a more general TSPC logic family. According to the proposed synthesis rules, various types of TSPC dividers and prescalers including both the known types and new types can be constructed. The speed-power trade-off is the most critical concern for TSPC circuits. Design strategies such as topology selection and transistor sizing are thoroughly discussed. Finally, both simulation and measurement results of prototype designs are demonstrated to verify our analysis.

The chapter is organized as follows. In Section 4.2, we investigate the general single-stage TSPC building blocks based on which we can build different types of TSPC D flip-flops (DFF). In Section 4.3, we discuss the design strategy of TSPC divide-by-2 dividers for the speed-power trade-off. Further, we extend the analysis to divide-by-2/3 prescalers. In Section 4.4, experimental results are demonstrated to validate the analysis.

### 4.2 Basic TSPC Logic Family

TSPC logic gates refer to those that have series single-phase-clock-controlled latches in their NMOS logic, PMOS logic or both. Assuming the single-input single-output cases, there are a total of seven types of basic TSPC logic gates and their schematics are shown in Fig. 4.1. The clocked transistors whose gates are connected to the clock signal “$\phi$” are always placed close to the rail for higher operating speed [31]. Each basic logic gate is named according to its schematic following two principles. Firstly, each name is comprised of two characters. The first character is for the clock and the second is for the input “A”. Secondly, each character indicates the connection status of the corresponding signal and is selected from “C”, “N” and “P”. “C” means the corresponding signal is present in both the NMOS logic and the PMOS logic, while “N” (“P”) means the signal is present only in the NMOS (PMOS) logic. Moreover, these basic logic gates can be categorized into two subgroups. Fig. 4.1 (a) - (e) belong to a group called the ratioless logic family and Fig. 4.1 (f) - (g) belong to the ratioed logic family. In a ratioless logic gate, the NMOS branch and the PMOS branch cannot be turned on at the same time. On the other hand, for a ratioed logic gate both may be on simultaneously, so the output level may depend on the size ratio of the NMOS and the PMOS transistors for certain input combinations. Therefore, in addition to the standard high and low levels, a ratioed logic also includes a ratioed logic level (high or low). In Section 4.6, we summarize the truth table of all these basic TSPC logic gates where “z” stands for the dynamically holding state and “x” stands for the ratioed logic level.

Figure 4.1: The family of CMOS TSPC logic gates. Ratioless types: (a) CC, (b) CN, (c) CP, (d) NC and (e) PC. Ratioed types: (f) NP and (g) PN. A general symbol (h).

By cascading multiple stages of basic TSPC logic gates which can be regarded as clocked inverters (Fig. 4.1 (h)), one can construct an edge-triggered TSPC DFF. At this initial step, only ratioless logic gates are used for better robustness, and the number of stages should be minimized to enhance operating speed and reduce power consumption. An exhaustive search shows that at least three stages are required and that the three stages can be combined in four possible ways, corresponding to four types of DFFs which are listed in Table 4.1. RE-0 and FE-0 are dual structures in terms of signal phases, and so are RE-1 and FE-1. Dual structures have the same timing behavior except that the signal phases are inverted, so in the following we only discuss the topologies that are rising-edge-triggered. In Fig. 4.2, we show a general schematic that can be applied to all kinds of three-stage DFFs. “D” is the data input and “Q” is the inverted output. “M” and “N” are two internal nodes between stages.

### 4.3 TSPC Dividers and Prescalers

#### 4.3.1 Ratioless Divide-by-2 Divider

Given a specific type of DFF, RE-0 or RE-1, a divide-by-2 divider can be constructed by feeding the inverted output “Q” back to the data input “D”, as shown in Fig. 4.3 (a). We will first compare the performance of the two types of ratioless dividers. The dividers are given the same names as the basic DFFs. Notice that the RE-1 type is actually the standard TSPC topology.

According to the clock phase and the output signal transition, a complete divide-by-2 cycle is divided into four non-overlapping and consecutive phases. The definitions of the phases are summarized in Table 4.2. Each phase consists of one or more logic transitions, which are depicted in Fig. 4.4 for an RE-0 divider and an RE-1 divider respectively. Though the two types of dividers differ by one transistor in the second stage, the voltages at all nodes are exactly the same at all times. The only difference happens in Phase I wherein “N” is dynamically held high by node capacitance in an RE-0 divider but it is statically charged to the high level in an RE-1 divider.

Figure 4.3: (a) A TSPC divide-by-2 divider. (b) A TSPC divide-by-2/3 prescaler. The division ratio is 2 when MC = 1 and is 3 when MC = 0.

Table 4.2: The Signal Transition in the 4-Phase Divide-by-2 Operation of a Rising-Edge-Triggered Divider

<table>
<thead>
<tr>
<th>Phase</th>
<th>φ</th>
<th>D (Q)</th>
<th>Propagation Delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>I</td>
<td>0</td>
<td>0 (Hold)</td>
<td>$t_I = t_{1,1}$</td>
</tr>
<tr>
<td>II</td>
<td>1</td>
<td>0 → 1</td>
<td>$t_{II} = t_{2,1} + t_{2,2} + t_{2,3}$</td>
</tr>
<tr>
<td>III</td>
<td>0</td>
<td>1 (Hold)</td>
<td>$t_{III} = t_{3,1}$</td>
</tr>
<tr>
<td>IV</td>
<td>1</td>
<td>1 → 0</td>
<td>$t_{IV} = t_{4,1}$</td>
</tr>
</tbody>
</table>

The propagation delays of the phases are denoted by \( t_I, t_{II}, t_{III} \) and \( t_{IV} \) respectively. Each of them equals the total transition delay within the phase (Table 4.2), and the maximal input frequency is limited by the phase which has the maximal propagation delay:

\[
f_{in} \leq \frac{1}{2 \times \max\{t_I, t_{II}, t_{III}, t_{IV}\}}.
\] (4.1)
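The bound in (4.1) can be evaluated directly once the per-phase delays are known, for example from transient simulation. The sketch below uses hypothetical delay values in picoseconds purely for illustration; the function name is also hypothetical.

```python
def max_input_frequency_hz(phase_delays_s):
    """Upper bound on f_in from (4.1): each clock half-period must cover the slowest phase."""
    return 1.0 / (2.0 * max(phase_delays_s))


# Hypothetical phase delays (t_I, t_II, t_III, t_IV), not measured or simulated values.
delays = {"I": 10e-12, "II": 26e-12, "III": 12e-12, "IV": 15e-12}

f_max = max_input_frequency_hz(delays.values())
slowest_phase = max(delays, key=delays.get)
print(f"Phase {slowest_phase} limits f_in to about {f_max / 1e9:.1f} GHz")
```

In this made-up example Phase II dominates because it contains three cascaded transitions, which mirrors the structure of Table 4.2.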

Because an RE-0 divider has one more transistor in stack than an RE-1 divider, the transition speed will be slower due to higher loading capacitance as well as higher charging path resistance. Therefore, an RE-1 divider usually has a higher maximal input frequency than an RE-0 divider.

Figure 4.4: The 4-phase divide-by-2 operation of two types of ratioless TSPC dividers: (a) RE-0 (PC-CC-NC) and (b) RE-1 (PC-CN-NC). The turned-off transistors are depicted in gray.

Ideally, there is no direct path from the supply to the ground at any time, so the power consumption of the dividers is dominated by the dynamic switching power, which is calculated by

\[ P_{sw} = C_L V_{dd}^2 \frac{f_{in}}{2}. \] (4.2)

Operating at the same frequency, an RE-0 divider will have higher power consumption than an RE-1 divider because of the larger \( C_L \) resulting from the extra transistor.

Therefore, it can be concluded that an RE-1 divider has superior performance in terms of both propagation delay and power consumption over an RE-0 divider, with the same robustness using ratioless logics.

### 4.3.2 Ratioed Divide-by-2 Divider

For higher operating speed, the ratioed logic gates can be used to further reduce the number of transistors. According to the truth table, NP can be an alternative logic gate for either CP or NC if the ratioed level is properly designed by sizing, and the PN gate can be an alternative for either CN or PC gates. Besides the reduction in the number of transistors, the ratioed logics can also shorten the logic transition times by taking advantage of the ratioed logic level. For instance, if the output of a ratioed logic at the end of a phase is designed to hold a ratioed high, which is lower than the standard supply level, and it will be discharged to the ground level in the next phase, the propagation delay is less than that of using a ratioless logic because the initial voltage level has been made closer to the final level. The lower the ratioed high level, the shorter the transition time becomes.

Along with the benefits for faster transition, the ratioed logic gates also introduce practical design challenges. The most important is the sensitivity to process variation especially the mismatch between NMOS and PMOS transistors. Thus, a ratioed level, high or low, can vary over a wide range, and it is easier to cause incorrect response in the following stages. Moreover, from a power consumption perspective, a ratioed logic can cause direct current from the supply to the ground. Even though the dynamic power is reduced because of the reduction in capacitive loading, this short-circuit power makes a significant contribution to the total power consumption. The static short-circuit power is calculated by

\[ P_{sc} = \bar{I}_{sc} V_{dd} \] (4.3)

where \( \bar{I}_{sc} \) is the average direct current in one cycle. Unlike \( P_{sw} \), \( P_{sc} \) can be considered constant for all frequencies.
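The two power terms can be combined into a simple estimate of total divider power versus input frequency. This sketch takes the switching power as $C_L V_{dd}^2 f_{in}/2$, i.e. it assumes the internal nodes toggle at the divided output rate, and treats the short-circuit term as a frequency-independent average current times the supply. All numeric values and function names are hypothetical.

```python
def switching_power_w(c_load_f, vdd_v, f_in_hz):
    """Dynamic power, assuming the load capacitance switches at f_in / 2."""
    return c_load_f * vdd_v ** 2 * f_in_hz / 2.0


def short_circuit_power_w(i_sc_avg_a, vdd_v):
    """Static short-circuit power from the average direct current, as in (4.3)."""
    return i_sc_avg_a * vdd_v


def total_power_w(c_load_f, vdd_v, f_in_hz, i_sc_avg_a=0.0):
    """Total power; i_sc_avg_a = 0 models an ideal ratioless divider."""
    return switching_power_w(c_load_f, vdd_v, f_in_hz) + short_circuit_power_w(i_sc_avg_a, vdd_v)


# Hypothetical values: 20 fF total load, 1.2 V supply, 16 GHz input, 100 uA average direct current.
p_ratioless = total_power_w(20e-15, 1.2, 16e9)
p_ratioed = total_power_w(20e-15, 1.2, 16e9, i_sc_avg_a=100e-6)
print(f"ratioless: {p_ratioless * 1e3:.3f} mW, ratioed: {p_ratioed * 1e3:.3f} mW")
```

The sketch deliberately keeps the same load capacitance for both cases; in practice a ratioed gate also has fewer transistors and therefore a somewhat smaller load, which partly offsets the short-circuit term.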

With different levels of short-circuit power \( P_{sc} \), we study three types of ratioed logic divider structures, RE-2, RE-3 and RE-4, with one, two and three stages of ratioed logic respectively. All of them are based on the fundamental ratioless RE-1 divider given that it is better than the RE-0 type. In an RE-2 divider, which has only one stage of ratioed logic, the CN gate of the second stage is replaced by a PN gate.
Table 4.3: Three Types of 3-Stage Ratioed DFFs

<table>
<thead>
<tr>
<th>DFF Name</th>
<th>Trigger Edge</th>
<th>1st Stage</th>
<th>2nd Stage</th>
<th>3rd Stage</th>
</tr>
</thead>
<tbody>
<tr>
<td>RE-2</td>
<td>Rising Edge</td>
<td>PC</td>
<td><em>PN</em></td>
<td>NC</td>
</tr>
<tr>
<td>RE-3</td>
<td>Rising Edge</td>
<td>PC</td>
<td><em>PN</em></td>
<td><em>NP</em></td>
</tr>
<tr>
<td>RE-4</td>
<td>Rising Edge</td>
<td><em>PN</em></td>
<td><em>PN</em></td>
<td><em>NP</em></td>
</tr>
</table>

To see the difference, a four-phase analysis can be pursued in a way similar to that in Fig. 4.4 (b). The output of the second stage, node “N”, is at ratioed high in Phase I and it is discharged to ground in Phase II (transition 2.1). Hence the corresponding transition time $t_{2,1}$ is shorter than that of an RE-1 divider, and the total propagation delay of Phase II, $t_{II}$, is also reduced. This improvement is the most effective in the sense that $t_{II}$ is usually the dominant delay term in (4.1) which limits the maximal input frequency. An RE-2 divider works reliably if the ratioed high level can turn off the pull-up network of the third stage in Phase I. An RE-3 divider is derived from an RE-2 divider by using one more stage of ratioed logic. The replacement can happen at the first stage or the third. But in neither case can the ratioed level help shorten the transition time as it does for an RE-2 divider. The selection is therefore mainly based on robustness. If the first stage PC is replaced by a PN logic gate, the ratioed low level must be low enough to turn off the pull-down network of the second stage in Phase III. If the third stage NC is replaced by an NP logic gate, the ratioed high level needs to turn on the pull-down network of the first stage in Phase II. Practically, the latter condition is easier to satisfy, which means the latter structure is more robust. In fact, this is similar to a structure used in [25]. Finally, an RE-4 divider uses ratioed logic for all three stages. It is also known as the E-TSPC divider according to [26]. Obviously, it has the least capacitive loading but consumes the most short-circuit power. As a summary, the structures of the three types of ratioed DFFs, based on which the ratioed dividers are built, are listed in Table 4.3. In the table, the ratioed stages are in italic fonts.

To intuitively investigate the divide-by-2 operations of different dividers, the periodic waveforms of the internal nodes are compared in Fig. 4.5. The data are generated from transient simulations. All simulations use the same sinusoidal signal as the clock signal, shown in Fig. 4.5 (a). Fig. 4.5 (b) compares the operations of an RE-2 divider and an RE-1 divider. As expected, the effective transition 2.1 of the RE-2 divider happens earlier, which results in a reduced propagation delay $t_{II}$. This improvement is caused by the ratioed high level. Fig. 4.5 (c) compares the operations of an RE-3 divider and an RE-2 divider. In fact, the two sets of curves almost overlap, which means the RE-3 type does not show any salient advantages over the RE-2 type in terms of transition speed. This also matches our analysis. Fig. 4.5 (d) compares an RE-4 divider and an RE-3 divider. The RE-4 divider has faster transitions 1.1 and 2.3. However, the reason here is that a two-transistor stack is removed in the PMOS network of the first stage and the capacitive load is significantly reduced.

Figure 4.5: Comparisons of simulated nodal waveforms of different types of dividers. (a) Input clock signal (four phases). (b) RE-2 versus RE-1. (c) RE-3 versus RE-2. (d) RE-4 versus RE-3.

Generally, the ratioed dividers have faster transition with the penalty of higher power consumption and worse robustness. Among the ratioed dividers, the RE-2 type should be the most power-efficient and the RE-4 type has a slight speed superiority over the RE-2 type.

4.3.3 Divide-by-2/3 Prescaler

Besides the fixed-modulus dividers, dual-modulus prescalers can also be synthesized in a similar way by using more than one TSPC DFF and additional feedback logic gates. In this work, we discuss the design of divide-by-2/3 prescalers based on the different types of TSPC DFFs which have been introduced in the previous two parts. A general diagram of the proposed prescaler is shown in Fig. 4.3 (b) wherein the DFFs can be of any type. The two operation modes are switched by the external control signal “MC”. When “MC” is high, the feedback signal from DFF2 is blocked and the prescaler operates in the same way as a divide-by-2 divider. When “MC” is low, the prescaler operates in the divide-by-3 mode. The schematics of the four types of prescalers are shown in Fig. 4.6 (a)-(d).
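The mode switching can be illustrated with a purely behavioral model of two ideal rising-edge DFFs. The feedback function used here, $D_1 = \overline{Q_1 \cdot (Q_2 + MC)}$ with $D_2 = Q_1$, is one textbook-style choice that reproduces the described behavior; it is an assumption of this sketch and is not claimed to be the gate-level feedback network of Fig. 4.3 (b), which works with the inverted DFF outputs. The function names are hypothetical.

```python
def prescaler_output(mc, n_clocks):
    """Return the DFF1 output after each rising clock edge for a given MC level (0 or 1)."""
    q1, q2 = 0, 0
    outputs = []
    for _ in range(n_clocks):
        # MC = 1 forces (q2 | mc) to 1, so DFF2's feedback is blocked and d1 = NOT q1.
        d1 = 1 - (q1 & (q2 | mc))
        d2 = q1
        q1, q2 = d1, d2  # both flip-flops update on the same clock edge
        outputs.append(q1)
    return outputs


def steady_state_period(sequence):
    """Smallest repetition period, in clock cycles, of the second half of the sequence."""
    tail = sequence[len(sequence) // 2:]
    for period in range(1, len(tail) // 2 + 1):
        if all(tail[i] == tail[i + period] for i in range(len(tail) - period)):
            return period
    return None


for mc in (1, 0):
    seq = prescaler_output(mc, 24)
    print(f"MC={mc}: output {seq[:6]}..., divide ratio = {steady_state_period(seq)}")
```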

An advantage of this implementation is that the feedback logic gates, enclosed by the dash-lined box, can be absorbed by the first stage of DFF1 and the logic depth is reduced. In fact, this logic absorption is based on the fact that the single-input single-output TSPC logic family in Fig. 4.1 can be generalized to the multiple-input single-output case. For any CMOS combinational logic, a CC type logic gate can be obtained by placing clocked transistors in series with both the NMOS and the PMOS network, and the other types can be obtained accordingly.

The transistors are sized for the highest operating speed, but the optimal sizes are generally different for the two operation modes. Our strategy is to balance the performance of the two modes with equal maximal input frequencies. The speed-power trade-off is also embodied in the design of prescalers.

4.4 Experimental Results

Prototype designs including both divide-by-2 dividers and divide-by-2/3 prescalers are fabricated in a 65 nm LP CMOS technology to verify our analysis of the speed-power trade-off. All designs are embedded inside a common on-chip fixture. Fig. 4.7 (a) shows the diagram of the on-chip testing configuration with an arbitrary device-under-test (DUT) and the fixture. The two 100 Ω poly-resistors set the dc level of the input signal, and they set the input impedance to roughly 50 Ω over a wide band so that a minimal amount of power is reflected when the input clock signal is fed by an external signal generator whose source impedance is also 50 Ω. The DUT is followed by three stages of RE-1 type divide-by-2 circuits instead of being connected to the output directly. Therefore, the DUT output frequency is scaled down by a factor of 8. This setup makes the output signal less vulnerable to parasitics such as pad capacitance and cable loss. At the same time, the DUT can be tested in a close-to-reality scenario. In the micrograph of a test chip, Fig. 4.7 (b), we show that the core area of the DUT and the test fixture is very compact, so most of the layout area is occupied by the pad structures.

Figure 4.6: The schematics of different types of divide-by-2/3 prescalers. The ratioless type: (a) RE-1 and the ratioed types: (b) RE-2, (c) RE-3 and (d) RE-4.

Four types of fixed-modulus divide-by-2 dividers, RE-1, RE-2, RE-3 and RE-4, are implemented. All of them are tested under a constant 1.2 V supply voltage and 293 K room temperature. For the input sensitivity measurements, the input power is swept to find the minimal required level for correct operation. Given that the input impedance of the complete circuit is 50 Ω, the amplitude of the input sinusoidal signal $V_m$ is related to the input power $P_{in}$ by the formula

$$P_{in}[\text{dBm}] = 10 \log_{10} \left( \frac{(V_m[\text{V}])^2}{2 \cdot 50\,\Omega \cdot 1\,\text{mW}} \right). \quad (4.4)$$
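As a quick numerical illustration of (4.4), the following minimal sketch converts between the peak amplitude $V_m$ and the input power in dBm. The function names and the default 50 Ω argument are illustrative choices, not part of the original measurement flow.

```python
import math

R_IN_OHM = 50.0  # input impedance assumed by (4.4)


def vm_to_dbm(v_m, r_in=R_IN_OHM):
    """Input power in dBm for a sinusoid of peak amplitude v_m (volts)."""
    p_watt = v_m ** 2 / (2.0 * r_in)  # average power of a sinusoid into r_in
    return 10.0 * math.log10(p_watt / 1e-3)


def dbm_to_vm(p_dbm, r_in=R_IN_OHM):
    """Peak amplitude in volts for a given input power in dBm (inverse of (4.4))."""
    p_watt = 1e-3 * 10.0 ** (p_dbm / 10.0)
    return math.sqrt(2.0 * r_in * p_watt)


if __name__ == "__main__":
    for p_dbm in (0.0, -10.0, -20.0):
        print(f"{p_dbm:6.1f} dBm -> V_m = {dbm_to_vm(p_dbm) * 1e3:.1f} mV")
```

Under these assumptions, the 0 dBm upper bound used for the sensitivity curves corresponds to a peak amplitude of about 316 mV across 50 Ω.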

The intrinsic input impedance of a TSPC divider is capacitive and can be much higher than 50 $\Omega$ (ten times or more) due to the small transistor sizes. Hence, $V_m$ is a better metric for the strength of the input signal than $P_{in}$. However, to make our measurement results comparable to previously published works, $P_{in}$ is still used for the input sensitivity curves. The measurement results for the four types of dividers are plotted together in Fig. 4.8 (a). An upper bound of 0 dBm on $P_{in,\text{min}}$ defines the range of the input frequency. From the curves, the ratioed dividers, RE-2, RE-3 and RE-4, demonstrate higher maximal input frequencies than the ratioless divider, RE-1. The RE-4 divider operates up to 21 GHz, which is the best speed performance among the four types. The RE-2 divider ranks second with a maximal input frequency of 19 GHz, which is only slightly lower than that of the RE-4 divider. The RE-3 divider is slower still. These results are consistent with the waveform comparisons in Section 4.3.

The power consumption of these dividers is also compared in Fig. 4.8 (b). In all cases the input power is set to 0 dBm at every frequency, and the portion from the bias resistors and the divide-by-8 divider is excluded. Again, the results support our analysis. The total power consumption of each divider is approximately a linearly increasing function of the input frequency, and the slope is proportional to the loading capacitance $C_L$ according to (4.2). The slope of the RE-1 divider is the largest and that of the RE-4 divider is the smallest. By extrapolating these lines to dc, we obtain the frequency-independent short-circuit power, which increases from RE-1 to RE-4. Overall, for very high input frequencies, the RE-2 divider gives the best power efficiency and the RE-4 divider the worst. In our measurements, the power consumption of the RE-2 divider is only 65% of that of the RE-4 divider when operating at 15 GHz and 80% when operating at 18 GHz.
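The extrapolation to dc described above amounts to a straight-line fit of power versus input frequency. The sketch below shows one way to do it with an ordinary least-squares fit; the data points are synthetic placeholders generated from an assumed line, not values read from Fig. 4.8 (b), and `fit_line` is a hypothetical helper.

```python
def fit_line(freqs_ghz, powers_mw):
    """Least-squares fit P(f) = p_sc + slope * f.

    Returns (slope in mW/GHz, p_sc in mW). The intercept p_sc plays the role
    of the frequency-independent short-circuit power; the slope tracks the
    dynamic power, which grows with the loading capacitance.
    """
    n = len(freqs_ghz)
    mean_f = sum(freqs_ghz) / n
    mean_p = sum(powers_mw) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_ghz)
    sxy = sum((f - mean_f) * (p - mean_p) for f, p in zip(freqs_ghz, powers_mw))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_f


# Synthetic example only: an assumed 0.30 mW intercept and 0.05 mW/GHz slope.
FREQS_GHZ = [4.0, 8.0, 12.0, 16.0]
POWERS_MW = [0.30 + 0.05 * f for f in FREQS_GHZ]

if __name__ == "__main__":
    slope, p_sc = fit_line(FREQS_GHZ, POWERS_MW)
    print(f"slope = {slope:.3f} mW/GHz, extrapolated dc power = {p_sc:.3f} mW")
```

With measured data, comparing the fitted slopes and intercepts across RE-1 to RE-4 would reproduce the ordering discussed in the text.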

Figure 4.8: The measurement results of various types of divide-by-2 dividers. (a) Input sensitivity curves. (b) Power consumption with input power of 0 dBm.

Corresponding to the divide-by-2 dividers, the four types of dual-modulus divide-by-2/3 prescalers are also fabricated and tested under the same environment. Two sets of measurements are performed, one for each operating mode. The input sensitivity and the power consumption of the divide-by-2 mode (MC=1) are shown in Fig. 4.9 (a) and (b). Those of the divide-by-3 mode (MC=0) are shown in Fig. 4.9 (c) and (d). The comparison among different types of prescalers in each operating mode leads to conclusions similar to those for the divide-by-2 dividers. The RE-4 prescaler can operate up to 18 GHz in both modes. The RE-2 prescaler follows with a maximal input frequency of 16 GHz in both modes. The RE-4 prescaler is still the most power-consuming, though the degradation is less of a concern since there is a further reduction in the dynamic power consumption. Operating at 15 GHz, the power consumption of the RE-2 prescaler is 80% and 75% of that of the RE-4 prescaler for the divide-by-2 and the divide-by-3 mode respectively. In Fig. 4.10, we also show the output waveforms of the RE-4 prescaler (at the output of the divide-by-8 divider) in the two modes, captured by an oscilloscope. The input signals are set to the maximal input frequency of 18 GHz.
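The frequency observed on the oscilloscope follows directly from the prescaler modulus and the divide-by-8 fixture. A minimal sketch of that arithmetic, with hypothetical helper names, is:

```python
from fractions import Fraction

FIXTURE_DIVIDE_RATIO = 8  # three RE-1 divide-by-2 stages after the DUT


def prescaler_modulus(mc):
    """Divide-by-2 when MC = 1, divide-by-3 when MC = 0."""
    return 2 if mc == 1 else 3


def fixture_output_ghz(f_in_ghz, mc):
    """Frequency at the fixture output for a given input frequency and MC."""
    return Fraction(f_in_ghz) / (prescaler_modulus(mc) * FIXTURE_DIVIDE_RATIO)


if __name__ == "__main__":
    for mc in (1, 0):
        f_out = fixture_output_ghz(18, mc)
        print(f"MC={mc}: f_out = {float(f_out)} GHz")
```

For an 18 GHz input this gives 1.125 GHz with MC=1 and 0.75 GHz with MC=0, the values quoted in the caption of Fig. 4.10.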

4.5 Conclusion

We have presented a thorough analysis of the design of TSPC dividers based on a general TSPC logic family. Five different types of topologies, RE-0∼4, are synthesized, and their performance is compared in terms of the speed-power trade-off. Prototype designs of the latter four types are fabricated to verify the analysis. The RE-4 (E-TSPC) type gives the best speed performance but consumes the most power. The RE-1 (standard TSPC) type is a robust structure that consumes the least power but is relatively slow compared to the ratioed types. The RE-2 type, which is proposed in this work, is a good compromise between speed and power: its maximal input frequency is close to that of the RE-4 type and its power consumption does not increase much over that of the RE-1 type. To the best of our knowledge, we have demonstrated both the TSPC divide-by-2 dividers and the divide-by-2/3 prescalers with the highest input frequency. The proposed TSPC synthesis technique can find wide use in frequency synthesizer designs for modern wireless applications where both high speed and low power are critical.

4.6 TSPC Logic Truth Table

The truth tables of the TSPC logic types are listed in Table 4.4.
Figure 4.9: The measurement results of various types of divide-by-2/3 prescalers. (a) Input sensitivity curves and (b) power consumption for the divide-by-2 operation (MC=1). (c) Input sensitivity curves and (d) power consumption for the divide-by-3 operation (MC=0). The input power is set to 0 dBm for power consumption measurements.
Figure 4.10: The measured output waveforms of the RE-4 divide-by-2/3 prescaler, further divided by 8, with an input frequency of 18 GHz in (a) the divide-by-2 mode ($MC=1$, $f_{out}=18/16=1.125$ GHz) and (b) the divide-by-3 mode ($MC=0$, $f_{out}=18/24=0.75$ GHz).
Table 4.4: The Truth Table of the TSPC Logics

<table>
<thead>
<tr>
<th>Logic</th>
<th>Type</th>
<th>Input: A</th>
<th>Output: Y</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>$\phi = 1$</td>
<td>$\phi = 0$</td>
</tr>
<tr>
<td>CC</td>
<td>ratioless</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>CN</td>
<td>ratioless</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>CP</td>
<td>ratioless</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>NC</td>
<td>ratioless</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>PC</td>
<td>ratioless</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>NP</td>
<td>ratioed</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>PN</td>
<td>ratioed</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
Part III

CMOS mm-Wave Techniques
Chapter 5

A Layout-Based Optimal Neutralization Technique for mm-Wave Differential Amplifiers

A layout-based optimal neutralization technique is proposed for the design of mm-wave differential amplifiers. Based on a new layout style that exploits the capacitive coupling between signal routing wires, the need for physical neutralization capacitors is obviated, which results in a compact and robust layout. Experimental prototype amplifiers at 60 GHz and 110 GHz demonstrate the utility of the idea by direct comparison with unneutralized designs.

5.1 Introduction

CMOS amplifiers are fundamental building blocks for mm-wave IC designs in silicon technologies. As the operating frequency gets close to the $f_{\text{max}}$ of a transistor [32], special care must be taken in topology selection and layout optimization to maximize the available gain. For the simple common-source (CS) configuration whose model is shown in Fig. 5.1, we can derive the maximal stable gain ($MSG$) as

$$\begin{aligned}
MSG_{\text{se}} \approx \frac{g_m}{\omega C_{gd}} \Bigg\{ & \left[ 1 - \omega^2 L_s \left( C_{gs} + C_{ds} + \frac{C_{gs} C_{ds}}{C_{gd}} \right) + \frac{L_s g_g g_{ds}}{C_{ds}} \right]^2 \\
& + \omega^2 L_s^2 \left[ g_m + g_g \left( 1 + \frac{C_{ds}}{C_{gd}} \right) + g_{ds} \left( 1 + \frac{C_{gs}}{C_{gd}} \right) \right]^2 \Bigg\}^{-\frac{1}{2}}. \quad (5.1)
\end{aligned}$$

Reducing $C_{gd}$ is the bottleneck in enhancing $MSG$. Additionally, $L_s$ adds non-dominant poles and accelerates the roll-off of $MSG$. A differential configuration consisting of two equal single-ended CS transistors uses cross-coupled capacitors between gates and drains to neutralize $C_{gd}$ and improve $MSG$. Usually the neutralization capacitors are explicitly implemented as MIM or MOM capacitors [33]. This method requires two extra capacitors in the layout, and their accuracy is hard to control, especially because of the unwanted parasitic inductance associated with the neutralization current path. Moreover, the capacitors occupy extra space and make routing difficult.

In this work, we propose a layout method for differential CS transistors by which neutralization capacitance is intrinsically embedded in the coupling of metal signal wires. Therefore, no extra capacitors are required and the neutralization current path is minimized. The chapter is organized as follows. In section 5.2, we provide a description of this method and show critical design equations. In section 5.3, we discuss practical design issues. Finally in section 5.4, we demonstrate the experimental results of several prototype designs.

## 5.2 Neutralization Technique

In Fig. 5.2, the proposed layout method is shown by a simplified diagram. Two multi-finger transistors are laid out in an interdigitated style. The complete structure can be regarded as a 1-D array of equal cells and the cells are lined up in the same direction as that of signal transmission. Each cell consists of a unit differential pair whose inputs and outputs are connected to four parallel signal buses accordingly. At the top level, the buses for the input and output of the same transistor are placed apart to avoid coupling, such as “$g_1$” and “$d_1$”. But those of different transistors are placed close to each other, such as “$g_1$” and “$d_2$”, and therefore the coupling capacitors can be used as neutralization capacitors.

A model to analyze this layout configuration is shown in Fig. 5.3. Besides the coupling capacitors, this model also considers the self inductance and the mutual inductance of the signal buses. Moreover, because the sources are shared within each unit cell, the model has no parasitic source-degeneration inductance in differential-mode operation. According to this model, the complete $MSG$ expression can be derived as

\[
MSG = \left| \frac{-g_m}{j\omega(C_{gd} - C_n + \Delta \cdot M)} + 1 \right| \quad (5.2)
\]

where

\[
\begin{aligned}
\Delta = {} & -\omega^2(C_{gs} + 2C_{gd})(C_{gs} + 2C_n) + g_g g_{ds} \\
& + j\omega g_m(C_{gd} - C_n) \\
& + j\omega(g_g + g_{ds})(C_{gs} + C_{gd} + C_n).
\end{aligned} \quad (5.3)
\]

For physically small structures, \(\frac{1}{\sqrt{MC_n}} \gg 2\pi f_{\text{max}}\) is satisfied, which implies the effect of the mutual inductance is negligible. So (5.2) has a simpler form

\[
MSG = \sqrt{\frac{g_m^2}{\omega^2(C_{gd} - C_n)^2} + 1}. \quad (5.4)
\]

Furthermore, the stability factor \(K\) can also be derived:

\[
K = \left[ 1 + \frac{2g_g g_{ds}}{\omega^2(C_{gd} - C_n)^2} \right] \cdot MSG^{-1}. \quad (5.5)
\]

Both \(MSG\) and \(K\) increase monotonically to infinity as \(C_n\) approaches \(C_{gd}\) from either side. The maximal power gain \(G_{\text{max}}\) behaves differently. It simply equals \(MSG\) for \(K < 1\), but once \(K > 1\) it starts to decrease and reaches a local minimum at \(C_n = C_{gd}\), which is actually the invariant \(U\) function [34]:

\[
U = G_{\text{max}}|_{C_n = C_{gd}} = \frac{g_m^2}{4g_g g_{ds}}. \quad (5.6)
\]

Therefore, the peak \(G_{\text{max}}\) occurs when \(K = 1\). This includes two possibilities. If \(C_n < C_{gd}\), or in other words, \(C_{gd}\) is partially neutralized, the ratio between \(C_n\) and \(C_{gd}\) is

\[
n_1 = 1 - \frac{1}{\omega C_{gd}} \sqrt{\frac{g_g g_{ds}}{U - 1}}. \quad (5.7)
\]

If \(C_n > C_{gd}\), or \(C_{gd}\) is over-neutralized, the ratio is

\[
n_2 = 1 + \frac{1}{\omega C_{gd}} \sqrt{\frac{g_g g_{ds}}{U - 1}}. \quad (5.8)
\]

In both cases, the same peak $G_{\text{max}}$ value is achieved:

$$G_{\text{max}1} = G_{\text{max}2} = 2U - 1. \quad (5.9)$$

Fig. 5.4 shows how $MSG$, $G_{\text{max}}$ and $K$ vary with the capacitor ratio.
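To make the behavior in Fig. 5.4 concrete, the following minimal sketch evaluates (5.4) through (5.9) with the mutual inductance ignored. The small-signal values are hypothetical round numbers chosen only for illustration; they are not extracted from the 65 nm devices of this work. For $K > 1$ the sketch uses the standard relation $G_{\text{max}} = MSG \cdot (K - \sqrt{K^2 - 1})$, written in the numerically safer form $MSG / (K + \sqrt{K^2 - 1})$.

```python
import math


def neutralized_gain(g_m, g_g, g_ds, c_gd, c_n, freq_hz):
    """Return (MSG, K, G_max) as linear power ratios, from (5.4) and (5.5).

    Assumes c_n != c_gd and negligible mutual inductance.
    """
    omega = 2.0 * math.pi * freq_hz
    x = 1.0 / (omega * (c_gd - c_n)) ** 2
    msg = math.sqrt(1.0 + g_m ** 2 * x)           # (5.4)
    k = (1.0 + 2.0 * g_g * g_ds * x) / msg        # (5.5)
    if k <= 1.0:
        g_max = msg                               # conditionally stable: use MSG
    else:
        g_max = msg / (k + math.sqrt(k * k - 1.0))
    return msg, k, g_max


def optimal_ratios(g_m, g_g, g_ds, c_gd, freq_hz):
    """Return (U, n1, n2): the U function and the two K = 1 capacitor ratios."""
    omega = 2.0 * math.pi * freq_hz
    u = g_m ** 2 / (4.0 * g_g * g_ds)             # (5.6)
    offset = math.sqrt(g_g * g_ds / (u - 1.0)) / (omega * c_gd)
    return u, 1.0 - offset, 1.0 + offset          # (5.7), (5.8)


# Hypothetical device values (assumptions for illustration only).
G_M, G_G, G_DS = 40e-3, 2e-3, 5e-3   # siemens
C_GD = 15e-15                        # farads
FREQ = 60e9                          # hertz

if __name__ == "__main__":
    u, n1, n2 = optimal_ratios(G_M, G_G, G_DS, C_GD, FREQ)
    print(f"U = {u:.1f}, n1 = {n1:.3f}, n2 = {n2:.3f}, peak G_max = {2 * u - 1:.1f}")
    for n in (0.0, n1, 0.999, n2, 1.4):
        msg, k, g_max = neutralized_gain(G_M, G_G, G_DS, C_GD, n * C_GD, FREQ)
        print(f"n = {n:.3f}: MSG = {10 * math.log10(msg):5.1f} dB, "
              f"K = {k:6.2f}, G_max = {10 * math.log10(g_max):5.1f} dB")
```

With these placeholder values, $K$ evaluates to 1 at both $n_1$ and $n_2$, where $MSG$ equals $2U - 1$, and $G_{\text{max}}$ approaches $U$ as $n$ approaches 1, in line with (5.6) and (5.9).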

In our model, $g_g$ is converted from a physical resistor in series with $C_{gs}$, so $g_g \propto \omega^2$ as long as the series resistance is much smaller than $1/(\omega C_{gs})$. For $U \gg 1$, this implies

$$G_{\text{max}i} = 2U - 1 \propto g_g^{-1} \propto \frac{1}{\omega^2}, \quad i = 1, 2, \quad (5.10)$$

$$|n_i - 1| \propto \omega, \quad i = 1, 2. \quad (5.11)$$

As the operating frequency becomes higher, the optimal neutralization scheme moves further away from the unilateral design ($C_n = C_{gd}$). In particular, if $n_1$ calculated from (5.7) is negative, the optimum can be achieved only by over-neutralization.

In practice, the selection of $C_n$ is shifted from the theoretical optimum ($n_1C_{gd}$ or $n_2C_{gd}$) in the direction that moves the resulting design even further from the unilateral case. This is because $G_{\text{max}}$ drops quickly once $K$ exceeds 1, which can be seen from the slope of $G_{\text{max}}$.

$$\frac{dG_{\text{max}}}{dC_n} = \frac{dMSG}{dC_n} - \frac{MSG}{\sqrt{K^2 - 1}} \frac{dK}{dC_n} = \begin{cases} -\infty, & n \to n_1^{+} \\ +\infty, & n \to n_2^{-}. \end{cases} \quad (5.12)$$

## 5.3 Design Approach

From the analysis in section 5.2, it is clear that the design objective is to properly design the layout structure so that $U$ is not lower than a certain level and the capacitor ratio between $C_n$ and $C_{gd}$ is within a desired range. Starting from a simple assumption that the number of cells in line does not affect the performance, we only study the design of a unit cell.

Figure 5.2: A simplified diagram of the interdigital layout of a differential pair.

Figure 5.3: The differential-mode small-signal model of the proposed layout structure in Fig. 5.2.

Figure 5.4: $MSG$, $G_{\text{max}}$ and the stability factor $K$ versus the neutralization capacitor $C_n$ (mutual inductance $M$ ignored). The axes are power gain [dB] and stability factor $K$; $K = 1$ occurs at $n_1 < 1$ and at $n_2 > 1$.

The two critical design variables in a unit cell are the poly-gate finger width, $w_f$, and the spacing between parallel signal buses, $s$, which are labeled in Fig. 5.2. $U$ only depends on $w_f$ by

$$U = \frac{U_0}{1 + \alpha \cdot w_f^2}. \quad (5.13)$$

The quadratic term in (5.13) reflects the degradation of $U$ caused by the series poly-gate resistors. For the capacitor ratio, we can use the simple relations $C_n \propto s^{-1}$ and $C_{gd} \propto w_f$, so

$$n \triangleq \frac{C_n}{C_{gd}} = \frac{\beta}{s \cdot w_f}. \quad (5.14)$$

Eq. (5.14) ignores fringing effects. Once $w_f$ has been chosen according to (5.13), $s$ follows from (5.14). The selection of $s$ is subject to the process design rules, so the range of $n$ is limited. $\beta$ is another parameter that can be adjusted by changing the layout style. For example, the signal buses can be laid out using multiple metal layers in parallel, which yields a larger $\beta$ value.
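The use of (5.14) can be sketched numerically. In the example below, $\beta$ is back-calculated from the 60 GHz design point reported in Section 5.4 ($w_f = 0.75\,\mu m$, $s = 0.47\,\mu m$, $n = 1.4$, multi-layer coupling). Treating that single point as a calibration for $\beta$ is an assumption made here for illustration, and fringing effects are ignored as in (5.14). The helper names are hypothetical.

```python
def coupling_constant(n, s_um, w_f_um):
    """Back-calculate beta (in um^2) from one known design point via (5.14)."""
    return n * s_um * w_f_um


def capacitor_ratio(beta_um2, s_um, w_f_um):
    """n = C_n / C_gd from (5.14)."""
    return beta_um2 / (s_um * w_f_um)


def bus_spacing(n_target, beta_um2, w_f_um):
    """Bus spacing s (in um) that gives the target capacitor ratio."""
    return beta_um2 / (n_target * w_f_um)


# Calibration point taken from the 60 GHz single-stage design in Section 5.4.
BETA_UM2 = coupling_constant(n=1.4, s_um=0.47, w_f_um=0.75)

if __name__ == "__main__":
    print(f"beta = {BETA_UM2:.4f} um^2")
    print(f"n at s = 0.45 um: {capacitor_ratio(BETA_UM2, 0.45, 0.75):.2f}")
    print(f"s for n = 1.2:    {bus_spacing(1.2, BETA_UM2, 0.75):.3f} um")
```

Under this simple model, reducing $s$ from 0.47 µm to 0.45 µm at the same $w_f$ raises $n$ only from 1.4 to roughly 1.46, and any spacing computed this way still has to be checked against the process design rules.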

In Fig. 5.5 (a) and (b), we show the comparisons of $G_{\text{max}}$ between different layout configurations for the partial neutralization designs ($w_f = 1 \mu m$) and the over neutralization designs ($w_f = 0.75 \mu m$) respectively. These figures are generated from post-layout simulation results with extracted $RC$ parasitics. The kinks in the curves result from the fact that the optimum value of $s$ is frequency dependent.

Finally, we make a qualitative study of the effect of the number of cells. In Fig. 5.5 (c), the layouts that use the same unit cell but have different numbers of cells, or equivalently, different numbers of fingers, $n_f$, are compared. At low frequencies, these curves almost overlap. But the structures with higher $n_f$ become unconditionally stable at lower frequencies. This can be explained by the increased series resistance of the signal buses in the direction of signal transmission as more cells are cascaded.

None of the comparisons in Fig. 5.5 considers the inductive coupling of the signal buses because the design kit does not support inductance extraction. However, $RC$ extraction alone can be good enough as long as the physical size of a structure is not too large.

## 5.4 Experimental Results

To verify the proposed neutralization technique, several prototype mm-wave differential amplifiers are fabricated in a 65 nm LP CMOS technology.

Figure 5.5: The comparisons of $G_{\text{max}}$ between different layout configurations. (a) The partial neutralization designs ($w_f = 1\mu m$, $n_f = 16$ and single-layer coupling). (b) The over neutralization designs ($w_f = 0.75\mu m$, $n_f = 16$ and multi-layer coupling). (c) Designs that use the same unit cell but different numbers of cells ($w_f = 0.75\mu m$, $s = 0.47\mu m$ and multi-layer coupling). For the neutralized designs ($n > 0$) in (a) and (b), $s = s_{\text{min}} + \Delta s$ and $s_{\text{min}}$ is the minimal metal spacing defined by the process.

The first design is a 60 GHz single-stage amplifier. The unit cell parameters are selected as $w_f = 0.75 \mu m$ and $s = 0.47 \mu m$. The signal buses use multi-layer coupling. These correspond to a capacitor ratio of $n = 1.4$. Therefore, this is an over-neutralized design. The number of fingers is 32 (16 cells). The input and output matching networks adopt the simple single-stub configuration using 75 $\Omega$ conventional CPW transmission lines. The chip micrograph of this design is shown in Fig. 5.6 (a). This amplifier is designed for direct measurements and the GSSG pads are modeled and included in the matching networks. The differential-mode characteristic is obtained through a 2-port measurement using balun probes. The measured $S$-parameters are summarized in Fig. 5.7 (a). The maximal gain is 10.9 dB at 62.2 GHz and the $MSG$ at that frequency is 13.8 dB. The $S_{11}$ and $S_{22}$ at the peak-gain frequency are $-11.6$ dB and $-20.3$ dB respectively. The difference between $MSG$ and $S_{21}$ is due to the chosen stability factor, $K = 1.15$. The gate and drain bias voltages are 0.65 V and 1.2 V. The total power consumption is 13 mW. As a reference, we also fabricate a single-ended CS transistor with the same size ($w_f = 0.75\mu m$ and $n_f = 32$) and measure it under the same bias conditions. The measured $MSG$ of this reference, also plotted in the same figure, is 9.4 dB, which is 4.4 dB lower than that of the neutralized design. These data verify the effectiveness of the proposed neutralization technique because this differential amplifier has superior power gain over all of the unconditionally stable ($K > 1$) single-ended counterparts.

The second design is a 60 GHz two-stage amplifier in which each stage is the same as the first design. The chip micrograph is shown in Fig. 5.6 (b). The two stages are directly connected through AC coupling capacitors. The measurement setup is the same as that of the first design and the measurement results are plotted in Fig. 5.7 (b). The maximal gain is 18.5 dB at 61.6 GHz and the $MSG$ at that frequency is 30.1 dB. $S_{11}$ and $S_{22}$ at the peak-gain frequency are $-11.5$ dB and $-15.7$ dB respectively. The stability factor $K$ is 6.6. The bias conditions are the same as those of the first design and the total power consumption doubles.

The third design is a 110 GHz single-stage amplifier. The unit cell parameters are $w_f = 0.75\mu m$ and $s = 0.45\mu m$. The signal buses use multi-layer coupling. The capacitor ratio is then about $n = 1.4$, so this is also an over-neutralized design. The number of fingers is 16 (8 cells). The chip micrograph is shown in Fig. 5.6 (c). A de-embedding technique using on-chip baluns is applied to obtain the differential-mode 2-port parameters [35]. The de-embedded $S$-parameters are summarized in Fig. 5.7 (c). Limited by the frequency range of the VNA, we can only obtain data up to 110 GHz. At 110 GHz, the gain is 7.8 dB with an $MSG$ of 11.6 dB. $S_{11}$ and $S_{22}$ are $-15.1$ dB and $-14.4$ dB. $K$ is 1.24. The gate and drain bias voltages are 0.8 V and 1.2 V. The total power consumption is 11 mW. A reference single-ended transistor with the same size ($w_f = 0.75\mu m$ and $n_f = 16$) and bias conditions has an $MSG$ of 7.2 dB at 110 GHz, 4.4 dB lower than that of the neutralized design. The measured $MSG$ enhancement from neutralization agrees with the value predicted by (5.4).
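The statement that the gap between $MSG$ and $S_{21}$ comes from the chosen stability factor can be checked with the standard relation $G_{\text{max}} = MSG \cdot (K - \sqrt{K^2 - 1})$ for $K > 1$. The sketch below applies it to the measured $MSG$ and $K$ values quoted in this section; the helper name `gmax_db` is hypothetical, and the comparison is only a rough consistency check, not part of the original analysis.

```python
import math


def gmax_db(msg_db, k):
    """Maximal available gain in dB from MSG (dB) and stability factor K.

    For K <= 1 the device is only conditionally stable and MSG is returned.
    """
    if k <= 1.0:
        return msg_db
    # K - sqrt(K^2 - 1) equals 1 / (K + sqrt(K^2 - 1)); the latter is better conditioned.
    return msg_db - 10.0 * math.log10(k + math.sqrt(k * k - 1.0))


# (design, measured MSG in dB, measured K, measured peak gain in dB), from Section 5.4.
MEASURED = [
    ("60 GHz single-stage", 13.8, 1.15, 10.9),
    ("60 GHz two-stage", 30.1, 6.6, 18.5),
    ("110 GHz single-stage", 11.6, 1.24, 7.8),
]

if __name__ == "__main__":
    for name, msg_db, k, gain_db in MEASURED:
        print(f"{name}: G_max = {gmax_db(msg_db, k):.2f} dB, measured gain = {gain_db} dB")
```

The computed $G_{\text{max}}$ values (about 11.4 dB, 18.9 dB and 8.6 dB) lie within 1 dB above the measured gains in all three cases.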

## 5.5 Conclusion

A layout-based neutralization technique has been proposed for the designs of mm-wave differential amplifiers. The neutralization capacitors are realized directly from extrinsic transistor signal line coupling, obviating the need for extra capacitors. Several prototype designs demonstrate the proposed technique and measurement results of the amplifiers confirm the theory and effectiveness of the approach. An improvement of 4.4 dB in $MSG$ is observed in the measurements, corresponding exactly with the theoretical value based on the amount of over neutralization.
Figure 5.6: The chip micrographs of (a) a 60 GHz single-stage amplifier, (b) a 60 GHz two-stage amplifier and (c) a 110 GHz single-stage amplifier with on-chip baluns for de-embedding.
Figure 5.7: The $S$-parameters of (a) a 60 GHz single-stage amplifier obtained from direct measurements, (b) a 60 GHz two-stage amplifier obtained from direct measurements and (c) a 110 GHz single-stage amplifier obtained from de-embedding.
Chapter 6

The “Load-Thru” (LT) De-embedding Technique for the Measurements of mm-Wave Balanced 4-Port Devices

The differential-mode behavior of a balanced 4-port device can be characterized by simple 2-port measurements if baluns are placed at both the input and the output. But the traditional insertion loss technique is not able to fully de-embed the baluns. Therefore, we propose the “load-thru” de-embedding technique which uses the differential-mode characteristics of a balun to fully extract the complete differential-mode behavior of the DUT. Theoretical analysis and mm-wave measurement verifications are provided.

6.1 Introduction

In SOC designs, differential signaling has well-known advantages such as superior noise immunity. Unfortunately, characterization of a 4-port device, especially at mm-wave frequencies, is not simple. An accurate and complete characterization requires a dual-source VNA that can generate pure-mode drives [36], and this is very costly. However, if only the differential-mode (DM) behavior is of interest, on-chip baluns can be placed at both the input and the output of a device-under-test (DUT) as common-mode (CM) blockers and the differential-mode behavior can be investigated by 2-port measurements. The traditional “insertion loss” (IL) technique is usually used to de-embed the baluns, or more precisely, to compensate for the insertion loss introduced by the two baluns. The IL technique needs only one extra de-embedding structure wherein two baluns are connected back-to-back. Though simple, the IL technique is strictly restricted to the scenario where all connection ports are power matched.

In this work, we propose the “load-thru” (LT) de-embedding technique that can fully characterize a balun so that the complete differential-mode 2-port parameters of a symmetric 4-port DUT can be extracted. The chapter is organized as follows. In section 6.2, we introduce the procedure of the LT de-embedding technique and give a theoretical analysis. In section 6.3, we discuss practical design considerations that can make the de-embedded results more accurate. Finally in section 6.4, measurement results of a mm-wave differential amplifier are used to verify the theory.

6.2 De-embedding Theory

6.2.1 DM and CM Separation

The signaling in all of our measurement setups is represented in mixed-mode parameters [37]. The proposed LT de-embedding technique relies on the condition that the DM and the CM operations are completely separated.

The most critical element, a balun, is a 3-port device. Let port “1” denote the unbalanced port connected to the pad and port “2” and “3” denote the balanced ports connected to the DUT. Then the signals at port “2” and “3” can be separated to DM and CM signals by introducing two linear transformation matrices, $K_B^V$ and $K_B^I$:

$$
\begin{bmatrix} V_1 \\ V_d \\ V_c \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix}
\begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix}, \quad (6.1)
$$

$$
\begin{bmatrix} I_1 \\ I_d \\ I_c \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & -\frac{1}{2} \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix}. \quad (6.2)
$$

Correspondingly, the balun can be characterized by the mixed-mode parameters. If $Y$-parameters are used, one can derive $Y^{B(m)} = K_B^I Y^{B} (K_B^V)^{-1}$:

\[
Y^{B(m)} = \begin{bmatrix}
Y^{B}_{11} & Y^{B}_{1d} & \boxed{Y^{B}_{1c}} \\
Y^{B}_{d1} & Y^{B}_{dd} & \boxed{Y^{B}_{dc}} \\
\boxed{Y^{B}_{c1}} & \boxed{Y^{B}_{cd}} & Y^{B}_{cc}
\end{bmatrix}
\;\xrightarrow{\text{ideal balun}}\;
\begin{bmatrix}
Y^{B(d)} & 0 \\
0 & Y^{B}_{cc}
\end{bmatrix}. \quad (6.3)
\]

For an ideal balun, the 4 boxed terms of $Y^{B(m)}$ vanish and $Y^{B(m)}$ becomes block-diagonal, where $Y^{B(d)}$ is the $2 \times 2$ block relating port “1” and the DM port. The CM port is then independent of the other ports. As shown in Fig. 6.1 (a), a mixed-mode model of an ideal balun consists of a 2-port network and an isolated 1-port network.

Figure 6.1: The mixed-mode models of (a) an ideal balun and (b) a balanced 4-port device.

A similar transformation can also be applied to a 4-port device. Both the input signals at port “1” and “2”, and the output signals at port “3” and “4”, are separated into DM and CM signals. The voltage and current transformation matrices are $K_D^V$ and $K_D^I$ respectively.

\[
\begin{bmatrix} V_{d1} \\ V_{d2} \\ V_{c1} \\ V_{c2} \end{bmatrix} =
\begin{bmatrix}
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 \\
0 & 0 & \frac{1}{2} & \frac{1}{2}
\end{bmatrix}
\begin{bmatrix} V_{1} \\ V_{2} \\ V_{3} \\ V_{4} \end{bmatrix}, \quad (6.4)
\]

\[
\begin{bmatrix} I_{d1} \\ I_{d2} \\ I_{c1} \\ I_{c2} \end{bmatrix} =
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} & 0 & 0 \\
0 & 0 & \frac{1}{2} & -\frac{1}{2} \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix}. \quad (6.5)
\]

The original 4-port parameters, \(Y^D\), can be converted to the mixed-mode parameters by

\[
Y^{D(m)} = K_D^I Y^D (K_D^V)^{-1}
\]

which gives

\[
Y^{D(m)} = \begin{bmatrix}
Y^D_{d1d1} & Y^D_{d1d2} & \boxed{Y^D_{d1c1}} & \boxed{Y^D_{d1c2}} \\
Y^D_{d2d1} & Y^D_{d2d2} & \boxed{Y^D_{d2c1}} & \boxed{Y^D_{d2c2}} \\
\boxed{Y^D_{c1d1}} & \boxed{Y^D_{c1d2}} & Y^D_{c1c1} & Y^D_{c1c2} \\
\boxed{Y^D_{c2d1}} & \boxed{Y^D_{c2d2}} & Y^D_{c2c1} & Y^D_{c2c2}
\end{bmatrix}
\;\xrightarrow{\text{symmetric DUT}}\;
\begin{bmatrix}
Y^{D(d)} & 0 \\
0 & Y^{D(c)}
\end{bmatrix}. \quad (6.6)
\]

For a perfectly symmetric 4-port device, port “1” is symmetric with port “2” and port “3” is symmetric with port “4”. The 8 boxed terms of \(Y^{D(m)}\) equal zero, which means the DM and the CM signals are separable; \(Y^{D(d)}\) and \(Y^{D(c)}\) are the remaining $2 \times 2$ DM and CM blocks. In Fig. 6.1 (b), we show the mixed-mode model of a balanced 4-port device comprised of one DM 2-port network and one CM 2-port network.
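The mode separation can be verified numerically. The following minimal sketch builds the $Y$-matrix of a hypothetical symmetric 4-port (invariant under swapping ports 1↔2 and 3↔4) and converts it to mixed-mode form with the voltage and current transformations given above, using the fact that currents transform with the current matrix and voltages with the inverse of the voltage matrix. The element values are arbitrary placeholders, and plain Python lists are used instead of a linear-algebra library to keep the example self-contained.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Current transformation: [I_d1, I_d2, I_c1, I_c2] = K_I [I_1, I_2, I_3, I_4].
K_I = [[0.5, -0.5, 0, 0], [0, 0, 0.5, -0.5], [1, 1, 0, 0], [0, 0, 1, 1]]
# Voltage transformation: [V_d1, V_d2, V_c1, V_c2] = K_V [V_1, V_2, V_3, V_4].
K_V = [[1, -1, 0, 0], [0, 0, 1, -1], [0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]]
# Inverse of K_V, written out by hand (V_1 = V_c1 + V_d1 / 2, and so on).
K_V_INV = [[0.5, 0, 1, 0], [-0.5, 0, 1, 0], [0, 0.5, 0, 1], [0, -0.5, 0, 1]]


def symmetric_four_port(a, b, c, d, e, f, g, h):
    """Y-matrix that is unchanged when ports 1<->2 and 3<->4 are swapped."""
    return [[a, b, e, f],
            [b, a, f, e],
            [g, h, c, d],
            [h, g, d, c]]


def to_mixed_mode(y):
    """Mixed-mode Y-matrix, port order (d1, d2, c1, c2)."""
    return matmul(matmul(K_I, y), K_V_INV)


# Arbitrary placeholder admittances in siemens.
Y_D = symmetric_four_port(0.02 + 0.01j, 0.002j, 0.015 + 0.02j, 0.001j,
                          -0.001j, 0.0005j, 0.04 - 0.01j, -0.003j)
Y_MIXED = to_mixed_mode(Y_D)

if __name__ == "__main__":
    for row in Y_MIXED:
        print("  ".join(f"{v:.4f}" for v in row))
```

In the printed matrix, the off-diagonal $2 \times 2$ blocks (the 8 cross-mode terms) are zero, leaving one DM block and one CM block.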

6.2.2 De-embedding Formula

Under the condition that the two modes are separable for both the balun and the DUT, we consider the 2-port measurement setup in Fig. 6.2 (a) and depict its mixed-mode model in Fig. 6.2 (b). From the model, one can see that the CM operation of the DUT is completely suppressed since all external stimuli are converted to DM signals only. The setup is simplified to three cascaded 2-port networks. Once the DM 2-port parameters of a balun are known, the DM \(ABCD\)-parameters \(A^{D(d)}\) of the DUT can be de-embedded from the measured data \(A^M\) using

\[
A^{D(d)} = (A^{B(d)})^{-1} A^M \begin{bmatrix}
1 & 0 \\
0 & -1
\end{bmatrix} A^{B(d)} \begin{bmatrix}
1 & 0 \\
0 & -1
\end{bmatrix}.
\] (6.7)

\(A^{B(d)}\) is the DM \(ABCD\)-parameters of a balun and we have used the fact that the second balun is mirror symmetric with the first one.
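A round-trip check of (6.7) can be sketched as follows. The balun and DUT matrices below are synthetic placeholders (a reciprocal series-shunt network and an arbitrary non-reciprocal 2-port), not measured data. The second balun is modeled as the first one driven from its other port, whose $ABCD$ matrix is $J (A^{B(d)})^{-1} J$ with $J = \mathrm{diag}(1, -1)$; that identity holds for any 2-port with an invertible $ABCD$ matrix.

```python
def mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def inv(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


J = [[1, 0], [0, -1]]


def mirrored(a_balun):
    """ABCD matrix of the same network seen from its other port."""
    return mul(mul(J, inv(a_balun)), J)


def deembed(a_meas, a_balun):
    """DM ABCD matrix of the DUT according to (6.7)."""
    return mul(mul(mul(mul(inv(a_balun), a_meas), J), a_balun), J)


# Synthetic balun DM model: series impedance Z followed by shunt admittance Y.
Z_SER, Y_SH = 3.0 + 12.0j, 0.002 + 0.01j
A_BALUN = [[1 + Z_SER * Y_SH, Z_SER], [Y_SH, 1]]
# Synthetic DUT (arbitrary, non-reciprocal, amplifier-like placeholder).
A_DUT = [[0.2 + 0.1j, 5.0 - 3.0j], [0.01j, 0.3 + 0.0j]]
# "Measured" cascade: balun, DUT, mirrored balun.
A_MEAS = mul(mul(A_BALUN, A_DUT), mirrored(A_BALUN))

if __name__ == "__main__":
    recovered = deembed(A_MEAS, A_BALUN)
    err = max(abs(recovered[i][j] - A_DUT[i][j]) for i in range(2) for j in range(2))
    print(f"largest element error after de-embedding: {err:.2e}")
```

The recovered matrix matches the synthetic DUT to within floating-point rounding, which confirms that right-multiplying by $J A^{B(d)} J$ removes the mirrored output balun.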

6.2.3 Characterization of the Balun

The de-embedding equation (6.7) requires that $A^{B(d)}$ is known. We propose the measurement setup in Fig. 6.3 to characterize the DM behavior of a balun, wherein $Z_T$ is assumed to be a known impedance. $A^{B(d)}$ includes three unknowns, but one measurement provides only two independent equations because the two ports are symmetric. Therefore, we need at least two measurements with different values of $Z_T$.

The first measurement uses a finite $Z_T$ and it is called the “load” structure. The measured data are

$$Y^L = \begin{bmatrix} Y_{11}^L & Y_{12}^L \\ Y_{21}^L & Y_{22}^L \end{bmatrix}. \quad (6.8)$$

The second measurement uses a trivial setup, $Z_T = \infty$. In this case the structure degenerates to two back-to-back connected baluns and it is called the “thru” structure. The measured data are

$$Y^T = \begin{bmatrix} Y_{11}^T & Y_{12}^T \\ Y_{21}^T & Y_{22}^T \end{bmatrix}. \quad (6.9)$$

Figure 6.3: (a) The measurement setup of two back-to-back baluns with termination loads $Z_T$ in the middle. (b) The equivalent mixed-mode model of the setup.

With (6.8) and (6.9), we are able to derive analytical expressions for the elements of the DM $Y$-parameters of a balun, $Y^{B(d)}$.

$$
Y^{B}_{11} = Y^{L}_{11} - Y^{L}_{12}, \quad (6.10)
$$

$$
Y^{B}_{dd} = \frac{Y^{L}_{12}}{4Z_T(Y^{T}_{12} - Y^{L}_{12})}, \quad (6.11)
$$

$$
Y^{B}_{d1} = Y^{B}_{1d} = -\sqrt{\frac{Y^{L}_{12}Y^{T}_{12}}{2Z_T(Y^{L}_{12} - Y^{T}_{12})}}. \quad (6.12)
$$

The sign of the denominator in (6.11) is the one consistent with (6.10) and (6.12): the thru structure gives $(Y^{B}_{1d})^2 = -2Y^{B}_{dd}Y^{T}_{12}$, which links the two expressions. In (6.12), the square root operation selects the root that has a non-negative real part. Finally, $A^{B(d)}$ can be obtained by converting the calculated $Y$-parameters to $ABCD$-parameters.
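The extraction can be exercised on synthetic data. The sketch below assumes that each balanced line is terminated by $Z_T$ to ground, so that with $V_d = V_2 - V_3$ and $I_d = (I_2 - I_3)/2$ the load appears as a DM admittance $1/(2Z_T)$ at the node shared by the two DM ports. Under that assumption, node analysis gives $Y_{12} = -(Y^B_{1d})^2 / (2Y^B_{dd} + G)$ with $G = 1/(2Z_T)$ for the load and $G = 0$ for the thru, so $Y^B_{dd}$ follows from the ratio $Y^L_{12}/Y^T_{12} = 2Y^B_{dd}/(2Y^B_{dd} + G)$. The balun values are arbitrary placeholders and the function names are hypothetical.

```python
import cmath


def back_to_back(y11, y1d, ydd, g_mid):
    """(Y_11, Y_12) of two mirrored baluns joined at their DM ports.

    g_mid is the shunt DM admittance at the shared node: 1 / (2 Z_T) for the
    "load" structure (assumed topology) and 0 for the "thru" structure.
    """
    denom = 2.0 * ydd + g_mid
    return y11 - y1d * y1d / denom, -y1d * y1d / denom


def extract_balun(yl11, yl12, yt12, z_t):
    """Recover (Y_11, Y_dd, Y_1d) of the balun from load and thru data."""
    y11 = yl11 - yl12
    ydd = yl12 / (4.0 * z_t * (yt12 - yl12))
    # cmath.sqrt returns the root with non-negative real part.
    y1d = -cmath.sqrt(yl12 * yt12 / (2.0 * z_t * (yl12 - yt12)))
    return y11, ydd, y1d


# Synthetic balun DM Y-parameters (placeholders, in siemens) and load (ohms).
Y11_TRUE, Y1D_TRUE, YDD_TRUE = 0.01 + 0.02j, -0.008 + 0.004j, 0.005 + 0.015j
Z_T = 48.0 + 3.0j

YL11, YL12 = back_to_back(Y11_TRUE, Y1D_TRUE, YDD_TRUE, 1.0 / (2.0 * Z_T))
_, YT12 = back_to_back(Y11_TRUE, Y1D_TRUE, YDD_TRUE, 0.0)

if __name__ == "__main__":
    y11, ydd, y1d = extract_balun(YL11, YL12, YT12, Z_T)
    print("Y_11:", y11, " Y_dd:", ydd, " Y_1d:", y1d)
```

The synthetic $Y^B_{1d}$ was chosen with a negative real part so that the root selection described in the text returns it directly; for a balun with the opposite sign, only $(Y^B_{1d})^2$ is determined by these two measurements.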

6.2.4 Characterization of $Z_T$

The last question is how to characterize $Z_T$. This can be accomplished by using the 1-port version of the “open-short” de-embedding technique [38]. The three required structures are shown in Fig. 6.4 and the 1-port measurement results for the “open”, “short” and “termination” structures are denoted by $Y^O$, $Y^S$ and $Y^{Z_T}$ respectively.

Figure 6.4: The setup for the characterization of $Z_T$ using 1-port open-short de-embedding. The lumped-element model for the de-embedding parasitics is depicted inside the dashed-line box.

The formula to calculate $Z_T$ is

$$Z_T = \frac{1}{Y^{Z_T} - Y^O} - \frac{1}{Y^S - Y^O}. \quad (6.13)$$
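A short sketch of (6.13) on synthetic data follows. The parasitic model used to generate the three 1-port admittances (a shunt pad admittance followed by a series lead impedance) is an assumption consistent with conventional open-short de-embedding, and the element values are made up for illustration.

```python
import math

def extract_z_t(y_open, y_short, y_term):
    """Apply (6.13): remove the shunt parasitic (open), then the series one (short)."""
    return 1 / (y_term - y_open) - 1 / (y_short - y_open)

# Hypothetical parasitics at 60 GHz (illustrative values, not measured data).
OMEGA = 2 * math.pi * 60e9
Y_PAD = 1j * OMEGA * 20e-15         # assumed 20 fF shunt pad capacitance
Z_LEAD = 0.5 + 1j * OMEGA * 30e-12  # assumed 0.5 ohm + 30 pH series lead
Z_T_TRUE = 80 + 3j                  # assumed termination impedance

# 1-port admittances that the three structures would present under this model.
Y_OPEN = Y_PAD
Y_SHORT = Y_PAD + 1 / Z_LEAD
Y_TERM = Y_PAD + 1 / (Z_LEAD + Z_T_TRUE)

Z_T_REC = extract_z_t(Y_OPEN, Y_SHORT, Y_TERM)  # should reproduce Z_T_TRUE
```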

### 6.3 Design Consideration

The above analysis has assumed an ideal balun whose CM port is isolated. For a real balun, signals at the unbalanced port and the DM port can leak to the CM port. This is called mode conversion. However, we can optimize the balun design so that mode conversion is limited to a satisfactory level. The requirement on mode conversion depends on the CM gain of the DUT. To investigate the mode-conversion level of a balun, we need to study its mixed-mode $S$-parameters $S^{B(m)}$:

$$S^{B(m)} = \begin{bmatrix} S^B_{11} & S^B_{1d} & S^B_{1c} \\ S^B_{d1} & S^B_{dd} & S^B_{dc} \\ S^B_{c1} & S^B_{cd} & S^B_{cc} \end{bmatrix}. \quad (6.14)$$

The mode-conversion level is evaluated by calculating $\left|\frac{S^B_{d1}}{S^B_{dd}}\right|$ and $\left|\frac{S^B_{d1}}{S^B_{cc}}\right|$. Notice that the computed $S$-parameters depend on the choice of port impedances [39]. Therefore, the port impedances must reflect the actual port terminations, which can differ from those of a standard 50 $\Omega$ system.
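The sketch below shows one way to obtain a mixed-mode matrix of the form (6.14) from single-ended 3-port $S$-parameters. It assumes the port order (unbalanced, "+", "−"), equal real reference impedances $Z_0$ at the three single-ended ports, and the standard orthogonal mixed-mode transformation, for which the DM and CM reference impedances become $2Z_0$ and $Z_0/2$. The ratio $|S_{c1}/S_{d1}|$ computed by `leakage_db` is used here only as an illustrative leakage figure, and the example matrices are hypothetical.

```python
import math

R = 1 / math.sqrt(2)
# Rows map single-ended waves (port 1, +, -) to mixed-mode waves (port 1, d, c).
M = ((1.0, 0.0, 0.0),
     (0.0, R, -R),
     (0.0, R, R))

def matmul3(x, y):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def transpose3(x):
    return tuple(tuple(x[j][i] for j in range(3)) for i in range(3))

def to_mixed_mode(s):
    """S_mm = M S M^T; M is orthogonal, so its inverse is its transpose."""
    return matmul3(matmul3(M, s), transpose3(M))

def leakage_db(s_mm):
    """20*log10 |S_c1 / S_d1|: CM leakage relative to the wanted DM signal."""
    return 20 * math.log10(abs(s_mm[2][0]) / abs(s_mm[1][0]))

# Ideal lossless balun: all power from port 1 goes to the DM port.
S_IDEAL = ((0.0, R, -R), (R, 0.5, 0.5), (-R, 0.5, 0.5))
S_MM_IDEAL = to_mixed_mode(S_IDEAL)

# Hypothetical balun with a small amplitude imbalance between its two outputs.
S_IMBAL = ((0.0, 0.72, -0.68), (0.72, 0.5, 0.5), (-0.68, 0.5, 0.5))
LEAK_DB = leakage_db(to_mixed_mode(S_IMBAL))
```

For other port terminations, the single-ended $S$-parameters would first need to be renormalized to the actual port impedances before this transformation is applied.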

Fig. 6.5 shows the 3-D structure of the balun used in our prototype design. The two coils use different thick metal layers and overlap completely to maximize magnetic coupling. The metal overlap also results in undesirable capacitive coupling, a source of mode conversion. The balun has two adjustable geometric parameters: the coil diameter and the trace width. To select the best design, various baluns are simulated using a 3-D EM simulation tool, and their mode-conversion levels are plotted and compared in Fig. 6.6. In general, a balun with a smaller diameter and a narrower width introduces less mode conversion, but it suffers from higher insertion loss or, equivalently, lower $|S_{d1}^B|$. High insertion loss can make the whole structure sensitive to noise from either the measurement operations or the instruments. Hence, a good balun design must balance these two requirements.

Figure 6.5: The 3-D balun structure.

Figure 6.6: The simulated mode-conversion level $|S_{cd1}/S_{d1}|$ and the insertion gain $S_{d1}$ of baluns with different diameters (fixed $W = 0.5 \mu m$) and different widths (fixed $D = 20 \mu m$).

6.4 Measurement Verification

A 60 GHz differential amplifier has been fabricated in a 65 nm LP CMOS technology to demonstrate the LT de-embedding technique. The chip micrograph is shown in Fig. 6.7 (a). The input and output matching networks apply the single-stub matching method using conventional 75 $\Omega$ CPW transmission lines. Two baluns in a mirror-symmetric layout are placed at the input and output of the amplifier. The geometric parameters of the baluns are $D = 20 \mu m$ and $W = 0.5 \mu m$. According to the simulation data in Fig. 6.6, $|S_{Bc1}^B|$ and $|S_{Bd1}^B|$ are $-20$ dB and $-35$ dB respectively, and $|S_{d1}^B|$ is $-12$ dB. The chip micrographs of the “load” and the “thru” de-embedding structures are shown in Fig. 6.7 (b) and (c). The connections from a balun to the DUT also use the same kind of CPW lines to avoid discontinuities. The load impedance $Z_T$ is implemented using a poly resistor with a value of about 80 $\Omega$.

To verify the validity of the LT de-embedding technique, the same amplifier with its input and output ports connected to GSSG pads is also fabricated on the same wafer. The pads are modeled and embedded into the matching networks. This amplifier is measured with a balun probe to obtain its DM behavior directly. These direct measurement data are compared with the de-embedded data. In Fig. 6.8, the $S$-parameters obtained from both methods are plotted together. Both magnitudes and phases show good agreement over a wide frequency range.

6.5 Conclusion

We have proposed the LT de-embedding technique for the characterization of the DM performance of balanced 4-port devices. It requires a total of five de-embedding structures in addition to the DUT structure. The de-embedding procedure can be summarized in three steps:

1. Measure the 1-port de-embedding structures and extract $Z_T$ using (6.13).

2. Measure the “load” and the “thru” structures. Compute $Y^{B(d)}$ using (6.10)-(6.12) and convert it to $A^{B(d)}$.

3. Measure the DUT structure to obtain $A^M$, then compute $A^{D(d)}$ according to (6.7).

Figure 6.7: The micrograph of (a) the 60 GHz differential amplifier with input and output baluns, (b) the “load” de-embedding structure and (c) the “thru” de-embedding structure.

Figure 6.8: The de-embedded differential-mode $S$-parameters of a differential amplifier using the LT method (solid line) are compared with the directly measured data by using a balun probe (dashed line). (a) Magnitudes and (b) phases.

Part IV

Conclusion
This research work has covered CMOS integrated circuit design techniques at 10 GHz, 60 GHz and up to 110 GHz. Individual building blocks including low-noise amplifiers, voltage-controlled oscillators, high-frequency true-single-phase-clock frequency dividers, and mm-wave amplifiers are studied thoroughly using both theoretical analysis and practical circuit designs. Related fundamental techniques, such as MOS device modeling and de-embedding techniques, are also explored. Furthermore, as a prototype of system-level integration, a Ku-band LNB front-end is implemented for the application of satellite receivers.

We have shown various techniques for different circuits whose operating frequencies approach the cut-off frequencies of the MOS devices. In general, these techniques offer trade-offs among speed, noise, power and circuit complexity (area). Therefore, careful optimization, which is also discussed in this work, is required to obtain balanced performance in every aspect. Technology scaling alone will not solve these problems, since not every device characteristic improves with scaling, so these topics will remain meaningful and relevant in the future.
Bibliography


