Distributed Circuits for Ultra-Wideband Communications Systems

A dissertation submitted in partial satisfaction of the requirements for the degree
Doctor of Philosophy

in

Electrical Engineering (Electronic Circuits and Systems)

by

Kelvin Caiwen Fang

Committee in charge:

Professor James Buckwalter, Chair
Professor Peter Asbeck, Co-Chair
Professor Gert Cauwenberghs
Professor William Hodgkiss
Professor Patrick Mercier

2018
The Dissertation of Kelvin Caiwen Fang is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

________________________________________

________________________________________

________________________________________

________________________________________

Co-Chair

________________________________________

Chair

University of California San Diego

2018
DEDICATION

To my loving family, friends, and all those who have supported, inspired, and pushed me to succeed. I am stronger because of you.

EPIGRAPH

It is important to draw wisdom from different places.
If you take it from only one place it becomes rigid and stale.
—Uncle Iroh

The wireless telegraph is not difficult to understand. The ordinary telegraph is like a very long cat. You pull the tail in New York, and it meows in Los Angeles. The wireless is the same, only without the cat.
—Albert Einstein
TABLE OF CONTENTS

Signature Page .......................................................................................... iii
Dedication ................................................................................................... iv
Epigraph ...................................................................................................... v
Table of Contents ....................................................................................... vi
List of Figures ............................................................................................ viii
List of Tables .............................................................................................. xii
Acknowledgements ...................................................................................... xiii
Vita ............................................................................................................. xvi
Abstract ...................................................................................................... xviii

Chapter 1
Introduction: Ultra-Wideband Integrated Circuits for High-Speed Communications
1.1 Distributed Amplifier History and Architectures .................................. 4
   1.1.1 Evolution of Traveling-Wave Tube Amplifiers .......................... 4
   1.1.2 Silicon State-of-the-Art in Distributed Amplification .......... 6
1.2 Constant-$k$ Artificial Transmission Lines ........................................... 8
1.3 Distributed Power Amplifiers .............................................................. 9
1.4 Wideband Active Quasi-Circulators ................................................. 10
1.5 Millimeter-Wave Distributed Transceivers ....................................... 11
1.6 Dissertation Organization ................................................................ 12

Chapter 2
Supply-Scaling for Efficiency Enhancement in Distributed Power Amplifiers
2.1 Distributed Amplifier Efficiency Limitations ......................................... 14
   2.1.1 Uniform Distributed Amplifier ............................................. 17
   2.1.2 Optimal Loadline Matching ............................................... 18
2.2 Efficiency Enhancement through Supply Scaling ............................... 21
2.3 Design Methodology of Band-Pass DA ............................................. 23
   2.3.1 Passive Element Design .................................................. 24
LIST OF FIGURES

Figure 1.1: Frequency content of a sub-millimeter Gaussian pulse. .......... 2
Figure 1.2: Communications applications at mm-wave. .......... 3
Figure 1.3: Schematic of the first traveling-wave tube amplifier with parasitic absorption and frequency-modulated vacuum tube switching [1]. .......... 4
Figure 1.4: Reported gain-bandwidth performance of integrated DAs in various technologies over the past two decades [2]. .......... 7
Figure 1.5: Schematic of an a) $L$-network half-section and b) low-pass constant-$k$ $T$-section for DA parasitic absorption. .......... 8
Figure 2.1: Schematic of a conventional low-pass DA. .......... 14
Figure 2.2: Frequency variation in real and imaginary collector impedances in a uniform DA for $\theta = 0.36 - 2.79$. .......... 17
Figure 2.3: Successive collector voltage magnitudes versus transmission line electrical length in a uniform DA. .......... 19
Figure 2.4: Simulated collector voltage magnitudes of a stage-scaled DA with 1.1 impedance scaling and 0.96 current scaling. .......... 21
Figure 2.5: Loadlines of subsequent stages in a DA under supply-scaling and impedance tapering schemes. Whereas successive stages maintain constant voltage swing under impedance tapering, a supply-scaled architecture matches the optimum loadlines through independent voltage biasing. .......... 22
Figure 2.6: Peak collector efficiency of an ideal supply-scaled DA as a function of number of supplies. .......... 24
Figure 2.7: BEOL stackup and cross-section of microstrip $T$-line in 90-nm SiGe BiCMOS process. .......... 25
Figure 2.8: Synthesized band-pass transmission line including inductor parasitics. .......... 26
Figure 2.9: Simulated shunt inductor Q factor after absorption of capacitive elements into constant-$k$ section. .......... 27
Figure 2.10: Input capacitance and sensitivity to degeneration resistance $R_E$ of a 6 $\mu$m/0.09 $\mu$m HBT. .......... 29
Figure 2.11: Bandwidth-efficiency tradeoff of the DA versus gain stage degeneration resistance $R_E$. .......... 30
Figure 2.12: Simulated gain as a function of number of band-pass DA stages. .......... 30
Figure 2.13: Simulated optimal PAE versus number of independent scaled supplies for an 8-stage DA. .......... 31
Figure 2.14: Simulated collector voltage magnitudes in the 8-stage supply-scaled DA. .......................................................... 33
Figure 2.15: Schematic of the fabricated 8-stage supply-scaled DA. ...... 33
Figure 2.16: Chip microphotograph of the supply-scaled DA. ............... 34
Figure 2.17: Measured and simulated S-parameters and $\mu$ stability factor of the supply-scaled DA. ............................................. 35
Figure 2.18: Measured and simulated power gain, collector efficiency, and power-added efficiency at 50 GHz. ................................. 36
Figure 2.19: Measured and simulated output power, 1 dB gain-compressed power, and peak CE and PAE over the operating bandwidth. ... 38
Figure 2.20: Measured power gain, CE, and PAE at 50 GHz, and peak CE and PAE across the bandwidth, of the DA with uniform supply biasing. 39

Figure 3.1: Schematic of an FDD/FD radio front-end. Simultaneous transmit and receive is enabled by an isolating duplexer/circulator. ......... 44
Figure 3.2: Schematic of the tunable distributed active quasi-circulator. ... 46
Figure 3.3: Layout of the a) series and b) shunt spiral inductors for the distributed band-pass $T$-section. ................................. 48
Figure 3.4: Schematics of the (a) varactor stack and (b) stacked-FET switched capacitor. $V_{ctrl}$ ranges from -1.5 to 1.5V. $V_{sw,k}$ turns the $k^{th}$-bit capacitor on/off. .......................................................... 50
Figure 3.5: Simulated large-signal TX-RX isolation over increasing shunt varactor capacitance range. ........................................ 51
Figure 3.6: Chip microphotograph of the distributed active QC. .......... 53
Figure 3.7: Measured TX-RX isolation over swept switched capacitor codes (solid) and varactor control voltage (dashed). ..................... 54
Figure 3.8: Measured (solid) and simulated (dashed) worst-case S-parameters and noise figure of the QC. ................................. 55
Figure 3.9: Measured and simulated power, efficiency, and TX-RX suppression. 55

Figure 4.1: BEOL stackup of the 45-nm RF CMOS SOI process. ............ 59
Figure 4.2: Simulated $Z_0$ and Q factor of the microstrip transmission line. 59
Figure 4.3: Simulated $H_{21}$ and $G_{umx}$ (maximum unilateral power gain) at $|V_C| = 550$ mV and $NF_{min}$ at $|V_C| = 450$ mV of a $32\times1.0$ $\mu$m (a) NMOS and (b) PMOS FET for $|V_D| = 1.0$ V. ................................. 61
Figure 4.4: Schematics of the designed 6-stage (a) NMOS and (b) hybrid SSDAs. .......................................................... 62
Figure 4.5: Simulated Q factor and shunt capacitance of the designed 400 pH spiral inductor on HR and previous LR substrates. ............ 64
Figure 4.6: Simulated Q factors and series capacitances of the 160 and 320 fF HQ-MIM capacitors. .......................................................... 64
Figure 4.7: Simulated $S_{21}$ of the designed band-pass constant-$k$ T-section with real (solid) and ideal (dashed) high-pass filter components. .... 65
Figure 4.8: Simulated input and output capacitances of the NMOS and PMOS cascode gain stages. ................................................. 66
Figure 4.9: Simulated NMOS DA gain versus number of stages $N$. .......... 66
Figure 4.10: Simulated NMOS DA gain sweeping uniform $V_{DD}$ from 1.5 to 2.4 V, with cascode gate voltage $V_{Cas} = V_{DD}/2 + 0.55$ V. ............ 68
Figure 4.11: Small-signal and equivalent noise model of a CMOS cascode gain stage with interstage inductance and input/output capacitances absorbed (gray) into artificial transmission lines. ......................... 68
Figure 4.12: Simulated NMOS DA gain and noise figure sweeping ideal $L_i$ from 0 to 24 pH. ...................................................... 70
Figure 4.13: Simulated gain, $P_{sat}$, PAE, and $OIP_3$ at 50 GHz versus number of PMOS stages in a 6-stage hybrid SSDA. ....................... 71
Figure 4.14: Simulated input (solid) and output (dashed) capacitances versus gate voltage swing for the NMOS (blue) and PMOS (red) gain stages. .......................................................... 72
Figure 4.15: Simulated $OIP_3$ of the hybrid SSDA (solid) and $g_{m3}$ (dotted) versus gate voltage bias for the NMOS (blue) and PMOS (red) cascode gain cells. ................................................. 73
Figure 4.16: Schematic of the fabricated distributed transceiver front-end. The PA path amplifies signals from TX to ANT, and the LNA from ANT to RX ports. ...................................................... 74
Figure 4.17: Simulated PA/LNA gain in TX/RX mode ($V_G = 550/450$ mV, $V_D = 2.4/1.8$ V), sweeping combinations of TX CG and RX CS FET widths. ...................................................... 75
Figure 4.18: Simulated settling times of the DTFE LNA gain stages with zero (red) and -5 dBm (blue) input in response to switched $V_{dd}$. .... 76
Figure 4.19: Chip microphotographs of the (a) NMOS and (b) hybrid SSDAs (2.80 mm x 0.52 mm) and (c) distributed transceiver front-end (2.80 mm x 0.65 mm). ...................................................... 78
Figure 4.20: Measured (red) and simulated (black) $S$-Parameters and noise figure (blue) of the (a) NMOS and (b) hybrid SSDAs and DTFE (c) PA and (d) LNA paths. ...................................................... 79
Figure 4.21: Measured group delay of the SSDAs and DTFE PA/LNA. .... 81
Figure 4.22: Measured gain (dashed) and power-added efficiency (marker) of the SSDAs and DTFE PA/LNA modes at 50 GHz. 82
Figure 4.23: Measured saturated output power (solid), 1 dB gain-compressed output power (dotted), and peak PAE (dashed) over the operating bandwidth. 83
Figure 4.24: Two-tone linearity measurement of the SSDAs and DTFE LNA at 50 GHz. 84
Figure 4.25: Measured $OIP_3$ and 3 dB back-off phase distortion over the operating bandwidth. 85
Figure 4.26: AM-PM linearity measurement of the SSDAs and DTFE PA at 50 GHz. 85
Figure 4.27: Measurement setup for 5 GHz wideband 16-QAM modulation. 86
Figure 4.28: Measured EVM with respect to constellation peaks for 5 GHz 16-QAM signals centered at 42.5 (solid) and 47.5 (dashed) GHz. 86
Figure 4.29: Measured constellations for 5 GHz 16-QAM signals centered at (a) 42.5 and (b) 47.5 GHz for the (i) NMOS and (ii) hybrid SSDAs and (iii) DTFE PA. 88
LIST OF TABLES

Table 2.1: Comparison To Other Published DAs .......................... 41
Table 2.2: Comparison To Other Published Wideband mm-Wave PAs  .... 42
Table 3.1: Relative RX Noise Power Magnitudes .......................... 52
Table 3.2: Comparison To Other Published Circulators ...................... 57
Table 4.1: Comparison To Published Silicon Distributed and Wideband Amplifiers .......................... 91
ACKNOWLEDGEMENTS

From the beginning of Ph. D. recruitment, Prof. Buckwalter has played a prominent role in my development as a professional circuits designer. His advice on subjects ranging from amplifiers and mixers to manuscripts and interviews has always garnered my utmost appreciation. Even following his move to Santa Barbara, I have felt that I could reach out at any time with questions about work and life. I am grateful for the years of guidance.

I would like to thank Prof. Asbeck for “adopting” me into his group in the second year of my Ph. D. Through our meetings and discussions, many new ideas were formed, and much candy was consumed. His genuine excitement for device engineering and PA design is contagious, and his constant cheerfulness made the journey of graduate school a considerably happier one.

I would also like to thank the members of my thesis committee, Profs. Cauwenberghs, Hodgkiss, and Mercier, for their time and support. Their feedback throughout the preparation and preliminary reviews of my Ph. D. research is greatly appreciated.

I am grateful to my friends and colleagues in the UCSD HSIC and HSDG research groups, especially Drs. Cooper Levy, Bagher Rabet, Narek Rostomyan, Po-Han Wang, Voravit Vorapipat, Mustafa Özen, Jefy Jayamon, Varish Diddi, Cheng-Kai Luo, Najme Ebrahimi, Tissana Kijsanayotin, Po-Yi Wu, Jun Li, Saeid Daneshgar, Mohammad Mehrjoo, Wei Wang, and Vincent Leung. Their expertise and influence on my graduate and post-graduate career cannot be overstated. It has been my honor to work alongside them these past years.

I would like to acknowledge the support of the NSF through the GRFP for allowing me flexibility and autonomy in my Ph. D. research. Additionally, support from the UCSD Powell fellowship program is appreciated.

There are a myriad of people outside the laboratory setting who are responsible for guiding me up to this point and who I foresee will be alongside me for years to come. First and foremost, my thanks do not begin to pay Mom and Dad back for raising and supporting me through the last 27 years and providing me every opportunity a child could ask for. Their wisdom and empathy have always produced sound advice (“Mom is always right”), and I strive to emulate their approach to navigating the rest of life’s trials. I am also indebted to my sister Shirley, who has been my cheerleader and confidante for as long as I can remember. Though we may now live on separate coasts, our sibling bond is one I continuously fall back on in both the best and worst times.

I could not dream of a better group of friends than the one I found in Lloyd House, Caltech Class of 2013. In particular, Joel Xu, Steven Okai, Shir Aharon, Reggie Wilcox, Melissa Xu, Yifei Huang, Mario Zubia, Seorim Song, and Ahalya Prabakhar have been like family to me for the past decade. Their intellect and diverse classes of thought have continuously encouraged and challenged me, and I have become the man I want to be due in no small part to our plethora of philosophical (and not-so-philosophical) discussions. At the same time, my friends have been a constant source of support and entertainment, and I hope our lives continue to be intertwined. I am also thankful for my “SD Frandz” (now “Bay Area, soon to be LA, Frandz”) Ali Ebrahim, Stephany Lai, and Dan Pipe-Mazo, with honorary members Jennifer Fang and Ernest Lee. They made me feel instantly welcome upon my arrival at UCSD, and I will always hold dear the memories of SD Zoo and Safari Park trips, Ernest Specials, and cheering for Korrasami. My roommates at “Brolympus”, especially Julian Warchall, Paolo Gabriel, Brandon Robinson, Robert Kaspar, and honorary member John Louie have my gratitude as well. Our board and video game nights, followed typically by Rigo’s runs, were a welcome break from graduate study, and it was a comfort to also work alongside many of them in the ECE department. I would like to give a special acknowledgement to my peers in electrical engineering Angad Rekhi and Angie Wang for offering unique perspectives on circuits design from Stanford and UC Berkeley, respectively, and sharing in the Ph. D. journey.
Additionally, I would like to thank David Pereira, Andrew Shu, and PJ Loury for their many years of friendship and for being my foundation during our childhoods.

Finally, I would like to give my heartfelt gratitude to my girlfriend, Esha Wang. Since we began dating, she has been my first responder in times of elation and melancholy. Her support, encouragement, and respect have been a driving force in the completion of this thesis and its contents and the procurement of internship and job offers. Her proofreading of countless manuscript and email drafts and expertise in \LaTeX{} and Adobe Photoshop are also greatly appreciated. Outside of work, her drive and sense of discovery constantly pull me to new experiences, and I look forward to all our time and challenges ahead. “It’s dangerous to go alone; take this.”

**Reproduced Chapters**

The material in this dissertation is based on the following papers which are either published or submitted for publication.

Chapter 2 is mostly a reprint of the material as it appears in IEEE Bipolar/BiCMOS Circuits and Technology Meeting 2015 and IEEE Journal of Solid-State Circuits 2016, K. Fang; C. S. Levy; J. F. Buckwalter. The dissertation author was the primary or co-primary author of these materials, and co-authors have approved the use of the material for this dissertation.

Chapter 3 is mostly a reprint of the material as it appears in IEEE Microwave and Wireless Components Letters 2017, K. Fang; J. F. Buckwalter. The dissertation author was the primary author of these materials, and co-authors have approved the use of the material for this dissertation.

Chapter 4 is mostly a reprint of the material submitted for publication in IEEE Transactions on Microwave Theory and Techniques 2018, K. Fang; J. F. Buckwalter. The dissertation author was the primary author of these materials, and co-authors have approved the use of the material for this dissertation.
VITA

2011 Chung Ip Wing-Wah Memorial SURF Fellow
2011-2012 President, Caltech IEEE Student Chapter
2012 Undergraduate Technical Intern, Intel Corporation, Santa Clara, CA
2013 B. S. with Honors in Electrical Engineering, California Institute of Technology
2013 NSF GRFP Fellow
2013 Powell Fellow, University of California San Diego
2013-2018 Graduate Research Assistant, University of California San Diego
2015 M. S. in Electrical Engineering (Electronic Circuits and Systems), University of California San Diego
2017 Ph. D. Intern, Apple Inc., Cupertino, CA
2018 Ph. D. in Electrical Engineering (Electronic Circuits and Systems), University of California San Diego
2018 Senior Designer, 5G RF Front-End IC, Nokia, Sunnyvale, CA

PUBLICATIONS


ABSTRACT OF THE DISSERTATION

Distributed Circuits for Ultra-Wideband Communications Systems

by

Kelvin Caiwen Fang

Doctor of Philosophy in Electrical Engineering (Electronic Circuits and Systems)

University of California San Diego, 2018

Professor James Buckwalter, Chair
Professor Peter Asbeck, Co-Chair

Emergent millimeter-wave (mm-wave) integrated technologies will find applications where the ample available bandwidth offers a significant performance advantage. Ultra-wideband (UWB) signals that cover multiple octaves improve resolution in imaging systems, high-frequency instrumentation, and radar. Additionally, high data rate communications systems will emphasize amplification across several frequency bands in a single amplifier. Conventional tuned amplifiers have difficulties satisfying such large bandwidth requirements due to their inherent gain-bandwidth tradeoff. On the other hand, distributed amplifiers (DAs) provide an effective solution with their large fractional bandwidths (FBW) and low gain variation and sensitivity to mismatch.

In this dissertation, several distributed circuit design techniques are presented to improve the performance of wideband transceiver front-end system blocks. First, a novel supply-scaling technique is proposed to improve the efficiency of distributed power amplifiers. A single-ended, eight-stage DA is designed in a 90-nm SiGe BiCMOS process, and the fabricated amplifier exhibits measured 12 dB gain over 3 dB bandwidth from 14-105 GHz. The peak saturated output power ($P_{sat}$) is 17 dBm with peak power-added efficiency (PAE) of 12.6% at 50 GHz and 3 dB power bandwidth greater than 70 GHz. This is the largest single-ended output power, efficiency, and power bandwidth reported in the literature for a SiGe BiCMOS DA.

Based on the supply-scaling technique, a tunable distributed active quasi-circulator (QC) with integrated power amplifier (PA) is proposed. The fabricated QC in a 45-nm CMOS SOI process provides more than 40 dB suppression between transmit (TX) output signal at the antenna (ANT) and TX leakage into the receive (RX) port over a tuning range of 5.3-7.3 GHz in small-signal operation. The TX output achieves peak power of 18 dBm and 12% PAE at 6.3 GHz, and large-signal TX-RX suppression is optimized across power level. Compared to the literature, the distributed active QC has the largest output power and fractional operating bandwidth among circulators with >30 dB isolation.

Finally, a hybrid CMOS supply-scaled distributed amplifier (SSDA) taking advantage of higher operating voltage and distortion cancellation of scaled PMOS devices and an integrated distributed transceiver front-end (DTFE) are presented. The hybrid SSDA is designed in a 45-nm RF CMOS SOI process and achieves peak $P_{sat}$ of 17.5 dBm and PAE of 20.2% with low third-order intermodulation (IM3) and amplitude-phase (AM-PM) nonlinearities over a 3 dB bandwidth of 10-82 GHz. The DTFE utilizes time-division duplexing (TDD) to drive a shared antenna port for TX and RX modes. It achieves TX gain of 11.7 dB from 12-76 GHz with peak output power of 17 dBm and PAE of 14.2% and RX gain of 9 dB from 11-77 GHz with minimum noise figure (NF) of 6.2 dB. 5 GHz wideband 16-QAM modulation is demonstrated in the RF CMOS SOI circuits for data rates exceeding 20 Gb/s. The hybrid SSDA leads all reported silicon DAs in peak power efficiency and output third-order intercept point ($OIP_3$), and the transceiver front-end circuits achieve over 3× greater data rates than other published wideband silicon PAs.
Chapter 1

Introduction: Ultra-Wideband Integrated Circuits for High-Speed Communications

A growing interest in communication, imaging, and sensing applications at extremely high frequencies has motivated the research and development of ultra-wideband (UWB) amplifiers at millimeter-wave (mm-wave) frequencies. The ample bandwidth available in the mm-wave regime enables high data rate wireless communications and improved resolution in instrumentation, imaging systems, and radar. For example, in order to achieve sub-millimeter spatial resolution in human body imaging, narrow pulses containing a wide range of frequency content must be transmitted, as seen in Fig. 1.1. For such narrow pulsewidths, a bandwidth of greater than 50 GHz is required, and taking into account group delay variation near the band edge, greater than 70 GHz may be desired of the system power amplifier (PA).

Figure 1.1: Frequency content of a sub-millimeter Gaussian pulse.

As more users engage in wireless communications and transmit (TX) and receive (RX) more data per user, radio communications systems must also support increasingly higher data rates to meet market demand. Since cellular bands have been saturated, researchers and designers now look towards the wide bandwidth offered by the mm-wave spectrum. Fig. 1.2 illustrates mm-wave communications applications in X (7-12 GHz), $K_u$ (12-18 GHz), K (18-27 GHz), $K_a$ (27-40 GHz), V (40-75 GHz), and W (75-110 GHz) bands. Due to higher atmospheric propagation loss at these frequencies, TX output power and efficiency and RX noise figure (NF) are critical design parameters for radio systems. Additionally, while silicon technology scaling has improved transistor cutoff frequency ($f_t$) to the hundreds of GHz, the higher reliability and yield of silicon must be weighed against the lower intrinsic gain and breakdown of CMOS/BiCMOS processes when compared to III-V technologies.

Figure 1.2: Communications applications at mm-wave.

1.1 Distributed Amplifier History and Architectures

1.1.1 Evolution of Traveling-Wave Tube Amplifiers

Figure 1.3: Schematic of the first traveling-wave tube amplifier with parasitic absorption and frequency-modulated vacuum tube switching [1].

Broadband amplification of signals has been desirable in many applications since the onset of communications systems design. The first traveling-wave tube amplifier (TWTA) was invented in 1936 for frequency-modulated (FM) radio (Fig. 1.3), but the design methodology did not become well-known until the distributed amplifier (DA) architecture was published in 1948 for television radio applications [3]. Early DAs were implemented exclusively using vacuum tube technology and achieved bandwidths of up to 300 MHz with typical output powers of 15 W [4].

With the invention of monolithic microwave integrated circuits (MMICs), the first integrated DAs were demonstrated using GaAs MESFETs in 1982 with bandwidths of 12 GHz and output powers up to 25 dBm [5, 6]. At this time, the theory of traveling-wave transistors was also formulated, but the idea did not take hold due to the complicated device engineering involved and an inability to achieve gain flatness [7]. On the other hand, discrete FET DAs were analyzed to be capable of giving flat response nearly up to the cutoff frequency of the transmission lines (t-lines), and well-defined design tradeoffs between gain-bandwidth, flatness, line impedance, and device number and size could be described [8]. InP DAs followed soon after, and the first multi-octave mm-wave amplifier was demonstrated in 1990 with a bandwidth of 5-100 GHz [9].

The high electron mobility, breakdown voltage, and substrate resistivity of III-V processes made them the predominant technologies for DA fabrication, but market demand for low cost, power, and chip size drove integrated circuits and devices research towards silicon processes near the turn of the century. In 1998, the first monolithic CMOS DA was demonstrated on silicon-on-sapphire (SOS) with 5 dB gain and 10 GHz bandwidth [10]. A DA fabricated in a 0.6-μm digital CMOS process achieved 6.5 dB gain over a bandwidth of 0.5-5.5 GHz without the use of thick metals or bondwire inductors in 2000 [11]. Around that time, AlGaN/GaN technologies also saw a surge of advancements, and a DA exhibiting 5 W output power and 8 GHz bandwidth on SiC substrate was reported [12]. Nevertheless, distributed amplifiers and circuits in silicon have remained at the forefront of microelectronics research for their integrability with modern digital and analog circuits. Digital CMOS, SiGe BiCMOS, and CMOS SOI processes have garnered an overwhelming amount of attention and research study, and their continuous gate-length scaling promises ever-increasing bandwidths for integrated DAs.

1.1.2 Silicon State-of-the-Art in Distributed Amplification

In the past twenty years, there has been remarkable ingenuity in the development of distributed circuit topologies in silicon. While DAs can be cascaded similarly to other amplifier architectures for higher gain, their unique four-port characteristic enables more complex chip designs. In [13], internal feedback is used between same-side ports to amplify the input signal twice and achieve an impressive 660 GHz gain-bandwidth (GBW) product. Both individual gain stage topologies (common-gate, cascode, stacked-FET, among others) and full DA configurations have been extensively explored. Inductive coupling between gate and drain lines in combination with cascode gain stages in [14] extends the bandwidth of a single DA to 61.3 GHz in 90-nm CMOS. In [15], a novel cascaded constructive wave amplifier (CCWA) is presented utilizing active feedback stages to achieve 26 dB gain at 99 GHz. Finally, non-uniform DA architectures open up a plethora of design space for improving GBW, power efficiency, noise figure, etc. A multi-port 2-D lattice is designed in [16] to “funnel” transistor output power to the load for 20 dBm saturated output power at 85 GHz with 24 GHz power bandwidth. In [17], tapering of both the drain line impedance and gain stage size is adopted for large-signal load modulation and bandwidth improvement, leading to 348 GHz GBW, 17.5 dBm pseudo-differential $P_{sat}$, and 13.2% peak PAE in a 130-nm SiGe BiCMOS process.

While gain and bandwidth have historically been the primary performance metrics for DAs, more rigorous modern systems specifications emphasize output power, efficiency, noise figure, and linearity. Fig. 1.4 shows the GBW performance of published state-of-the-art DAs in both silicon and III-V processes. Though DAs with GBW as high as 1.5 THz have been reported, these circuits require multiple cascaded amplifiers with large die footprints and poor power efficiencies. The increasing importance of these metrics is reflected in the shift of DA figures of merit (FOM) to include 1 dB gain-compressed output power ($P_{1dB}$) and $NF$ [14]. The design methodologies presented in this dissertation aim to improve upon these characteristics. To enable fair comparison with classical tuned amplifiers as DAs approach them in performance, comprehensive FOMs are further developed.

Figure 1.4: Reported gain-bandwidth performance of integrated DAs in various technologies over the past two decades [2].

1.2 Constant-$k$ Artificial Transmission Lines

Figure 1.5: Schematic of an a) $L$-network half-section and b) low-pass constant-$k$ $T$-section for DA parasitic absorption.

As the frequency characteristic of a DA is determined by the cascading filter sections which make up its input and output lines, the choice of filter type is an important design consideration. Constant-$k$ sections have been historically used in distributed gain cells for their simplicity, robustness, and asymptotic behavior. First invented in 1922, the constant-$k$ filter is a network based on the image parameter method [18]. Fig. 1.5(a) shows a half-section $L$-network composed of series impedance $Z$ and shunt admittance $Y$. For a symmetric network, image impedance $Z_i$ is defined as the input impedance looking into a port with all other ports terminated with $Z_i$ [19]. By terminating the $L$-network with its mirror, one can calculate the image impedance for the resulting $T$-section:

$$Z_i = \sqrt{Z^2 + \frac{Z}{Y}} \tag{1.1}$$

For infinitesimally small $T$-sections, the image impedance approaches $k = \sqrt{Z/Y}$, the characteristic impedance $Z_0$ of the transmission line they could describe. Thus, a cascade of constant-$k$ filters can be construed as an artificial t-line, a construct particularly suited for the cascading gain stages in a DA. A low-pass $T$-section, as shown in Fig. 1.5(b), is capable of absorbing the parasitic capacitance $C_{par}$ of a distributed gain cell into its shunt admittance, enabling wideband frequency characteristic with series transmission lines of inductance and capacitance-per-length $L_{tl}$ and $C_{tl}$.
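
As a rough numerical illustration of Eq. (1.1), the sketch below evaluates the image impedance of a lossless low-pass $T$-section. It assumes a half-section with $Z = j\omega L/2$ and $Y = j\omega C/2$, where `L` and `C` are the total series inductance and shunt capacitance (including any absorbed $C_{par}$) of one full section. The function names and the 250 pH / 100 fF values are hypothetical and chosen only to give $k = 50\ \Omega$; they are not taken from the designs in this dissertation.

```python
import cmath
import math


def lowpass_tsection(L, C):
    """Return (k, f_c) for a lossless low-pass constant-k T-section.

    Assumption: series arms of L/2 on each side and a shunt capacitor C,
    so the half-section has Z = j*w*L/2 and Y = j*w*C/2.
    """
    k = math.sqrt(L / C)                      # nominal impedance sqrt(Z/Y)
    f_c = 1.0 / (math.pi * math.sqrt(L * C))  # frequency where Z_i reaches zero
    return k, f_c


def image_impedance(f, L, C):
    """Evaluate Eq. (1.1), Z_i = sqrt(Z^2 + Z/Y), at frequency f in Hz."""
    w = 2.0 * math.pi * f
    Z = 1j * w * L / 2.0
    Y = 1j * w * C / 2.0
    return cmath.sqrt(Z * Z + Z / Y)


if __name__ == "__main__":
    L, C = 250e-12, 100e-15  # hypothetical section values
    k, f_c = lowpass_tsection(L, C)
    print(f"k = {k:.1f} ohm, f_c = {f_c / 1e9:.2f} GHz")
    for frac in (0.01, 0.5, 0.9, 1.5):
        Zi = image_impedance(frac * f_c, L, C)
        print(f"f = {frac:4.2f} f_c: Z_i = {Zi:.2f} ohm")
```

Under these assumptions, $Z_i$ stays close to $k$ well below cutoff, falls toward zero as $f$ approaches $f_c = 1/(\pi\sqrt{LC})$, and becomes purely imaginary above it, which is why the usable DA bandwidth sits below the artificial line cutoff.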

### 1.3 Distributed Power Amplifiers

Efficient utilization of mm-wave bands will emphasize amplification across several frequency bands in a single amplifier. Over the past decade, a number of DAs with bandwidths in excess of 80 GHz have been demonstrated in silicon [20–25]. However, conventional DAs suffer from poor power efficiency, making these designs unattractive for broadband power amplification. To address the efficiency issues, previous attempts at DA scaling have been realized by impedance tapering of the loaded collector-line and scaling of the gain stage device sizes [17, 26]. Unfortunately, this incurs greater resistive line losses and high-frequency reflections due to impedance mismatch, degrading the gain as well as limiting the number of stages that can be implemented. Therefore, the design of distributed amplifiers for high output power and efficiency over wide bandwidth remains an open challenge.

This dissertation presents a supply-scaled distributed amplifier that offers improved collector efficiency (CE) and power-added efficiency (PAE). The analysis investigates load modulation at each stage within the distributed amplifier and indicates how the supply-scaling technique performs load pulling analogous to impedance tapering but does not incur the same passive losses or frequency dependency. By feeding separate dc supply voltages through high-pass constant-$k$ filter sections, improved power efficiency is achieved while maintaining a constant 50-$\Omega$ line impedance within the amplifier bandwidth.

### 1.4 Wideband Active Quasi-Circulators

Non-reciprocal components such as isolators and circulators play an important role in radio frequency (RF) systems by directing signal flow and preventing interference between circuit blocks. Circulators and quasi-circulators (QCs) have applications in reflection phase shifters and amplifiers and in RF front-ends for simultaneous transmit and receive (STAR) and full duplex (FD) systems. Ferromagnetic circulators exhibit high isolation and power handling capability but are costly and, to date, have remained incompatible with silicon integrated circuit processing.

Recently, a new category of low-loss passive circulators has been demonstrated, using the modulation of capacitive elements. However, these require additional signal generators and are limited in either frequency range [27, 28] or power handling [29].

As a consequence, there has been increased interest in active QCs in silicon [30–34]. While active circulators have been shown to achieve high isolation, previous designs have suffered from relatively low bandwidth [30] and output power [32–34], making them unattractive for high data-rate radio communications. In this dissertation, an active QC with integrated power amplification and tunable phase offset lines is presented to realize high power and isolation over a wide frequency range.

### 1.5 Millimeter-Wave Distributed Transceivers

UWB mm-wave applications generally require narrow pulse width or high data rate while maintaining low noise and signal distortion over large fractional bandwidth (FBW). Digital beamforming at mm-wave bands places additional requirements on multi-channel, multi-function communications systems, producing new interest in broadband circuit blocks with bands of interest at 28, 39, 45, 57-71, 71-76, and 81-86 GHz. In the last few years, antennae with FBW exceeding 100% [35–38] have been shown and continue to be an important area for innovation, while wideband band-pass filters have been developed for mm-wave signal chains [39–41]. Active oscillators/frequency synthesizers [42–44], mixers [45], and circulators [31, 46, 47] are also being pushed to larger operating frequency ranges for signal generation across various bands. Recently, an active power divider/combiner circuit using distributed amplifiers was demonstrated in a 130-nm SiGe BiCMOS technology to realize a wideband control circuit for bidirectional transceivers with low amplitude and phase imbalance and good isolation from 2-22 GHz [48]. However, the design of a compact, broadband TX/RX front-end for multi-channel, millimeter-wave arrays remains an open challenge [49].

DAs feature large FBW in mm-wave applications by incorporating transmission line theory into traditional amplifier design, with the additional benefit of low gain variation and low sensitivity to mismatch [3]. Owing to their relatively low high-frequency noise and group delay variation over bandwidth, they offer an appealing solution for multi-octave and multi-mode chipsets. As CMOS silicon-on-insulator (SOI) process scaling has improved transistor cutoff frequency $f_t$ to speeds comparable to those of SiGe HBTs, it has become feasible to integrate distributed circuits into wideband communications systems in a cost-effective manner. Lower junction capacitance and noise, together with the harmonic suppression afforded by the isolation of the buried oxide layer, have led to advantages in both TX and RX. Additionally, thicker back-end-of-line (BEOL) metal layers and high-resistivity (HR) substrates have tailored CMOS SOI for mm-wave applications. Another consequence of deep scaling in advanced CMOS SOI technology nodes has been the improvement in PMOS FET mm-wave characteristics relative to NMOS [50]. As transistor lengths diminish and the mobility gap between electrons and holes becomes less relevant, the higher voltage reliability of PMOS devices offers a promising avenue for power and efficiency enhancement.

This dissertation presents several circuit design approaches to supply-scaled DAs (SSDAs) and a distributed transceiver front-end (DTFE) in RF CMOS SOI. An NMOS DA characterizes the RF-optimized features of the process and achieves a large GBW product relative to output power, efficiency, and third-order intercept point (IP3). A hybrid CMOS DA leverages high-pass filter sections introduced by the supply-scaling technique to integrate PMOS gain stages for further enhanced linearity and large-signal operation at mm-wave frequencies. Finally, a shared antenna line topology is used to demonstrate the first distributed transceiver front-end with TX/RX switching for time domain duplexing (TDD) operation.

### 1.6 Dissertation Organization

Background material, previous works and motivations, and circuit and system design considerations have been introduced in Chapter 1.

Chapter 2 presents an overview of the limitations of conventional DA designs and tapered lines. It introduces the concept of supply-scaling and discusses its advantages over impedance tapering techniques, detailing the analysis of interstage load modulation due to traveling waves. The design and analysis of a band-pass DA
to enable independent supply biasing is presented. Measurements of the fabricated 8-stage supply-scaled DA in a 90-nm SiGe BiCMOS process and comparison with previous works are included. Measurements of the DA with uniform biasing are also shown to verify the supply-scaling theory.

Chapter 3 presents the design of a tunable distributed active QC in a 45-nm CMOS SOI process. It describes the use of the band-pass DA topology to allow independent phase tuning while simultaneously improving the efficiency of the integrated PA. An analysis of CMOS SOI tuning elements is also given. Measurements of the fabricated QC and comparison with previous works are shown.

Chapter 4 presents several UWB circuit designs that take advantage of mm-wave properties of a 45-nm RF CMOS SOI process to achieve superior power efficiency and linearity. It describes the design and optimization of 6-stage NMOS and hybrid CMOS DAs and DTFE. An analysis of third-order intermodulation (IM3) and amplitude-phase (AM-PM) nonlinearity cancellation with scaled PMOS devices is given. Complex measurement setups and results of the fabricated circuits and comparison with previous works are shown.

Conclusions drawn from this dissertation are discussed in Chapter 5.

Chapter 2

Supply-Scaling for Efficiency Enhancement in Distributed Power Amplifiers

### 2.1 Distributed Amplifier Efficiency Limitations

Figure 2.1: Schematic of a conventional low-pass DA.

Fig. 2.1 shows the schematic of a conventional uniform DA. Distributed amplifiers constructively add the output current from each gain stage in the collector transmission line as the RF input signal travels along the base line. Neglecting losses, the DA exhibits gain that increases linearly with the number of stages while maintaining bandwidth, in contrast to a cascade of amplifier stages.

In a DA, transistor parasitic capacitances are absorbed into the input and output lines to create lumped-element $T$-section constant-$k$ filters. The cascade of $T$-sections forms an artificial transmission line whose cutoff frequency determines the bandwidth of the DA [51]. For transmission line segments of length $l_{seg}/2$, with inductance-per-length $L_{tl}$ and capacitance-per-length $C_{tl}$, to each side of the gain stage, loaded by parasitic capacitance $C_{par}$ as shown in Fig. 1.5, the $T$-section characteristic impedance $Z_{0,l}$ and low-pass cutoff frequency $f_{c,l}$ are given by

$$Z_{0,l} = \sqrt{\frac{L_{tl} \times l_{seg}}{C_{tl} \times l_{seg} + C_{par}}} \quad (2.1)$$

$$f_{c,l} = \frac{1}{\pi \sqrt{(L_{tl} \times l_{seg})(C_{tl} \times l_{seg} + C_{par})}}. \quad (2.2)$$

While DAs achieve large gain-bandwidth product, conventional topologies exhibit poor power efficiency due to a number of factors. Since the collector of each transistor sees the same impedance in both directions, half of the collector current from each stage travels towards the reverse termination (these reverse currents generally do not cancel, and the power is lost). Secondly, the wideband nature of the amplifier prohibits harmonic tuning of transistor outputs, preventing waveform engineering for higher-efficiency classes of amplifier operation. Finally, the voltage swing at each stage is not uniform, with later stages having a larger swing due to voltage summing along the collector transmission line. Since the dc collector bias is shared amongst all stages, this results in a large amount of wasted headroom. The inefficiency is evident from Fig. 2.1, where the impedance seen at the collector of the $n^{th}$
stage in an $N$-stage DA with amplitude of $v_n$ and current swing of $i_{C,n}$ is

$$Z_{C,n} (\omega) = \frac{v_n (\omega)}{i_{C,n}} \prod_{m=1}^{n} e^{-j \theta_{B,m}(\omega)}$$

$$= \frac{v_{F,n} (\omega) + v_{R,n} (\omega) + v_{C,n} (\omega)}{i_{C,n}} \prod_{m=1}^{n} e^{-j \theta_{B,m}(\omega)}, \quad (2.3)$$

where

$$v_{F,n} (\omega) = \sum_{k=1}^{n-1} \left[ v_{C,k} (\omega) \sqrt{\frac{Z_{F,n} (\omega)}{Z_{F,k} (\omega)}} \prod_{m=k+1}^{n} e^{-j \theta_{C,m}(\omega)} \right] \quad (2.4a)$$

$$v_{R,n} (\omega) = \sum_{k=n+1}^{N} \left[ v_{C,k} (\omega) \sqrt{\frac{Z_{R,n} (\omega)}{Z_{R,k} (\omega)}} \prod_{m=n}^{k} e^{-j \theta_{C,m}(\omega)} \right] \quad (2.4b)$$

$$v_{C,n} (\omega) = i_{C,n} \left( Z_{F,n} (\omega) \parallel Z_{R,n} (\omega) \right) \prod_{m=1}^{n} e^{-j \theta_{B,m}(\omega)}. \quad (2.4c)$$

The forward traveling voltage wave $v_{F,n}$ is due to the current of the preceding stages (zero for the first stage), while $v_{R,n}$ is the reverse wave from subsequent stages (zero for the last stage), and $v_{C,n}$ is the voltage induced by the transistor. The latter is determined from the small-signal impedance seen by collector $n$ looking toward the reverse termination ($R_R$),

$$Z_{R,n} (\omega) = Z_{0,n} \frac{Z_{R,n-1} (\omega) + j Z_{0,n} \tan \theta_{C,n} (\omega)}{Z_{0,n} + j Z_{R,n-1} (\omega) \tan \theta_{C,n} (\omega)}; \quad (2.5)$$

and toward the load ($R_L$),

$$Z_{F,n} (\omega) = Z_{0,n+1} \frac{Z_{F,n+1} (\omega) + j Z_{0,n+1} \tan \theta_{C,n+1} (\omega)}{Z_{0,n+1} + j Z_{F,n+1} (\omega) \tan \theta_{C,n+1} (\omega)}; \quad (2.6)$$

where $Z_{0,n}$, $\theta_{n}$ describe the characteristic impedance and electrical length of the transmission line section to the left of stage $n$, and $Z_{R,0}$, $Z_{F,N+1}$ equal $R_R$ and
$R_L$, respectively. In the case of a single stage amplifier, the impedance seen at the collector is not affected by traveling waves and $Z_{C,n} = v_{C,n}/i_{C,n}$, which would be recognized as the impedance to optimally match the transistor for the transfer of power into a load. For the general case of an $N$-stage DA, however, this is not true.

### 2.1.1 Uniform Distributed Amplifier

![Graph](image-url)  

Figure 2.2: Frequency variation in real and imaginary collector impedances in a uniform DA for $\theta = 0.36 - 2.79$.

For a uniform DA, $Z_{F,n} = Z_{R,n} = Z_0$ and $\theta_{B,n} = \theta_{C,n} = \theta$. Therefore, the impedance seen at the collector simplifies to

$$Z_{C,n}(\omega) = \frac{Z_0}{2} \left( \sum_{k=1}^{n-1} \frac{i_{C,k}}{i_{C,n}} + 1 + \sum_{k=n+1}^{N} \frac{i_{C,k}}{i_{C,n}} e^{-2j(k-n)\theta(\omega)} \right). \quad (2.7)$$
When each device contributes the same current, i.e., $i_{C,k} = i_C$, this impedance is further simplified to

$$Z_{C,n}(\omega) = \frac{Z_0}{2} \left( n + e^{-j(N+1-n)\theta(\omega)} \frac{\sin (N - n) \theta (\omega)}{\sin \theta (\omega)} \right). \quad (2.8)$$

In this case, the impedance seen by the collector has a linearly increasing real component, as well as a complex component that leads to frequency dependent amplitude and phase variation. Fig. 2.2 shows the variation in \( Z_{C,n} \) with respect to electrical length \( \theta \), and Fig. 2.3 shows collector voltages \( v_n \). It can be seen that all but the final gain stage in a uniform DA have output voltages and impedances that change periodically with frequency. The preceding theory has been corroborated by electro-optic measurements on signal propagation internal to distributed amplifiers [52]. More generally, when each stage of the DA is not uniform, the \( n^{th} \) transistor not only sees frequency dependent load modulation due to \( v_{F,n} \) and \( v_{R,n} \) from (2.3), but also frequency-varying \( Z_{F,n} \) and \( Z_{R,n} \), which impact \( v_{C,n} \) as described in (2.4c). The ability to control the impedance at each collector forms the basis of loadline modulation.
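As a quick consistency check, the following sketch evaluates the summation in (2.7) with equal stage currents and compares it against the closed form (2.8) for an 8-stage line. The function names and the default $Z_0 = 50\ \Omega$ are illustrative choices, not part of the original analysis.

```python
import cmath
import math


def zc_sum(n, N, theta, Z0=50.0):
    """Collector impedance from (2.7) with i_{C,k} = i_C for all stages."""
    # The n-1 forward terms each contribute 1; the "+1" is the stage itself.
    reverse = sum(cmath.exp(-2j * (k - n) * theta) for k in range(n + 1, N + 1))
    return Z0 / 2 * ((n - 1) + 1 + reverse)


def zc_closed(n, N, theta, Z0=50.0):
    """Closed-form collector impedance from (2.8)."""
    phase = cmath.exp(-1j * (N + 1 - n) * theta)
    return Z0 / 2 * (n + phase * math.sin((N - n) * theta) / math.sin(theta))


N = 8
for theta in (0.36, 1.0, 2.0, 2.79):
    for n in range(1, N + 1):
        # Both forms should agree to numerical precision.
        assert abs(zc_sum(n, N, theta) - zc_closed(n, N, theta)) < 1e-9
    print(f"theta = {theta:4.2f} rad:",
          [f"{zc_closed(n, N, theta):.1f}" for n in (1, 4, 8)])
```

For the final stage ($n = N$) the reverse sum is empty, so the impedance reduces to the purely real value $N Z_0 / 2$ at every electrical length, consistent with the observation that only the last stage sees a frequency-independent load.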

### 2.1.2 Optimal Loadline Matching

In a conventional uniform DA, the collector bias of each transistor is fixed to $V_{C,n} = V_{CC}$ and the loadline impedance is

$$Z_{OPT,n} = \frac{V_{CC} - V_K}{I_C}, \quad (2.9)$$

where $V_K$ is the knee voltage of the technology and $I_C$ is the dc bias current (constant across stages). For a uniform DA, this loadline impedance is always larger than or equal to the optimum impedance seen in (2.3), since the reverse traveling voltages do not always add constructively. For class-A operation, the maximum amplitudes for the voltage and current are $v_n = V_{CC} - V_K$ and $i_{C,n} = I_C$. We desire to set the collector impedance according to the loadline impedance:

$$Z_{OPT,n} = \frac{\max(v_n)}{\max(i_{C,n})} = Z_{C,n}. \quad (2.10)$$

However, the voltage swing at each transistor is different even as the current through each stage is fixed. This leads to non-optimal loadline matching for uniform DAs.

Figure 2.3: Successive collector voltage magnitudes versus transmission line electrical length in a uniform DA.

One solution to the DA efficiency problem is impedance tapering along the output transmission line, as proposed in [3]. Using this approach, collector line impedances and lengths are set such that $v_{R,n}$ (or $i_{R,n}$) = 0 and all the generated power travels to the RF output, circumventing the need for a reverse termination. The tapered line load pulls each transistor $n$ to see $Z_{OPT,n} = (V_{CC} - V_K) / I_{C,n}$ with a constant voltage swing. A number of impedance-tapered DAs have been demonstrated in silicon and III-V processes with efficiency gains [53, 54]. However, due to the frequency dependency of the load pulling mechanism, these designs can only achieve narrowband efficiency enhancement through careful optimization of unequal-length sections. Additionally, the high load impedances at early stages require narrow-width transmission lines that are lossy and difficult to synthesize, even with III-V BEOL processes [26].

Recently, efforts have been made in [17] to design an enhanced-efficiency DA with over 100 GHz bandwidth, utilizing simultaneous scaling of device size with output line impedance. The low degree of tapering, however, dictates the need for an explicit reverse termination ($Z_{R,1}(\omega) \neq \infty$ as in ideal impedance tapering) and sacrifices the perfect cancellation of reflected waves. The resulting mismatches along the output line, combined with losses from the high-impedance early stages, limit the output power and overall PAE, especially at higher frequencies. It is evident that the inability to synthesize a large range of transmission line impedances with low loss is a major detriment to attempts at efficiency improvement using these techniques.
### 2.2 Efficiency Enhancement through Supply Scaling

Figure 2.4: Simulated collector voltage magnitudes of a stage-scaled DA with 1.1 impedance scaling and 0.96 current scaling.

To avoid tapered transmission lines, we propose a supply-scaling technique for enhancing DA efficiency while maintaining a constant 50-$\Omega$ characteristic impedance along the synthesized collector line. From Fig. 2.3, it can be seen that the voltage at successive collectors increases along the output line monotonically, and more accurately, the average voltage increases linearly inside the amplifier pass band. This feature contrasts with a stage-scaled DA (shown in Fig. 2.4 with transmission line scaling of 1.1 and current scaling of 0.96), which exhibits larger variation in voltage and impedance with respect to frequency. By independently setting the dc collector voltages $V_{C,n}$ to match the maximum $v_n$ in each section of a standard DA, we
eliminate the wasted headroom present at each but the last stage. This approach effectively moves the loadline of each transistor to an optimal point for dc power consumption without requiring any change in the passive component parameters from stage to stage:

$$Z_{OPT,n} = \frac{V_{C,n} - V_K}{I_{C,n}} = \frac{v_{F,n} + v_{R,n}}{i_{C,n}} + \frac{Z_0}{2}. \quad (2.11)$$

While supply-scaling performs load modulation analogous to a tapered line (Fig. 2.5), it offers a number of advantages for wideband operation. Not only are high-impedance transmission lines avoided, but the sensitivity of efficiency-enhanced operation to frequency and process variation is lower compared to that of impedance tapering as well.

Figure 2.5: Loadlines of subsequent stages in a DA under supply-scaling and impedance tapering schemes. Whereas successive stages maintain constant voltage swing under impedance tapering, a supply-scaled architecture matches the optimum loadlines through independent voltage biasing.

Looking at the $N^{th}$ collector in Fig. 2.3, the peak output power for an ideal lossless non-tapered DA operating under class-A bias is constant across all frequencies and given by

$$P_{\text{out}} = \frac{N i_{C,n}}{2\sqrt{2}} \times \frac{v_{C,N}}{\sqrt{2}} = \frac{1}{8}N^2I_C^2Z_0. \quad (2.12)$$

The dc power consumed per stage is
\[ P_{\text{DC},n} = V_{C,n}I_C. \] (2.13)

In a uniform DA (i.e. $V_{C,n} = V_{C,N}$), the theoretical collector efficiency is therefore
\[ CE = \frac{P_{\text{out}}}{NP_{\text{DC},n}} = 25\%, \] half that of a conventional class-A amplifier.

On the other hand, if the supply voltages are scaled such that $V_{C,n} = n(V_{C,N} - V_K)/N$, the maximum voltage swing at each stage within the pass band, the collector efficiency becomes
\[ CE = \frac{\frac{1}{4}N^2I_CV_{C,1}}{I_CV_{C,1}\sum_{n=1}^{N} n} = \frac{N}{2(N + 1)}. \] (2.14)

As shown in Fig. 2.6, the theoretical efficiency of a supply-scaled DA approaches 50% as $N$ becomes large. In reality, a number of factors prevent maximum efficiency operation, including collector line losses and nonzero knee voltage. Providing individual dc supplies to each gain stage may prove to be impractical for real systems as well.
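The trend in (2.14) can be tabulated with a few lines of Python. This is a minimal sketch of the ideal case only (lossless lines, zero knee voltage, equal stage currents); the function names are hypothetical, and the second function simply re-derives the same ratio from the sums in (2.14) as a cross-check.

```python
def ideal_collector_efficiency(n_supplies):
    """Closed form of (2.14): CE = N / (2 (N + 1))."""
    return n_supplies / (2 * (n_supplies + 1))


def collector_efficiency_from_sums(n_supplies):
    """Same quantity from the ratio in (2.14), normalized to I_C * V_{C,1}."""
    p_out = n_supplies ** 2 / 4               # (1/4) N^2
    p_dc = sum(range(1, n_supplies + 1))      # sum of n for n = 1..N
    return p_out / p_dc


for n in (1, 2, 4, 8, 16, 64):
    ce = ideal_collector_efficiency(n)
    assert abs(ce - collector_efficiency_from_sums(n)) < 1e-12
    print(f"N = {n:2d}: CE = {100 * ce:.1f}%")
```

With a single supply the expression returns the 25% of the uniform DA, and it climbs toward the 50% class-A limit as the number of independent supplies grows.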

### 2.3 Design Methodology of Band-Pass DA

Conventional DA designs feature base and collector transmission lines with low-pass characteristics and a shared dc bias across all gain stages. To avoid $I^2R$ loss, efficient DAs must supply the collector bias through an off-chip bias tee or choke, whose low-frequency cutoff prevents true dc performance, rather than through the reverse termination. In some applications, such as odd-derivative Gaussian pulse
generation and wideband RF communications, it is not necessary to provide amplification down to dc. Thus, the bias voltage levels can be isolated between DA sections. To realize independent biasing of the supply voltages along the DA and eliminate the need for a bulky bias-tee, a band-pass topology is chosen, which introduces dc-blocking capacitors and dc-feed inductors in between transmission line segments as parts of a high-pass $T$-section filter.

![Graph showing collector efficiency as a function of number of supplies.](image)

Figure 2.6: Peak collector efficiency of an ideal supply-scaled DA as a function of number of supplies.

### 2.3.1 Passive Element Design

To achieve $Z_{0,l} = 50 \, \Omega$ in (2.1), the transmission line $Z_{0,T} = \sqrt{L_{tl}/C_{tl}}$ must be greater than 50 $\Omega$ since the device parasitic capacitance lowers the final characteristic impedance. In addition, losses in the transmission line, expressed per-length as $\alpha_{tl}$ in the propagation constant $\gamma_{tl} = \alpha_{tl} + j\beta_{tl}$, limit the marginal gain of each additional stage [55]. We seek to minimize the total attenuation factor $\alpha_{tl} l_{seg}$ while maximizing $Z_{0,T}$ to allow for the largest parasitic capacitance loading, and thus, gain per stage.

Fig. 2.7 shows the BEOL stackup for this process. Since the dielectric stack height is sufficiently large, a microstrip line is used as the transmission line element to ease access to the device. Optimizing the shunt capacitance loading budget with respect to line resistance results in a 2 $\mu$m-wide line on M9 layer, with $Z_{0,T}$ of 78.6 $\Omega$ and less than 0.7 dB/mm loss up to 110 GHz. Keeping $Z_{0,l} = 50 \Omega$ and setting our target bandwidth $f_{c,l} = 110$ GHz, the total series inductance and shunt capacitance per stage are 145 pH and 58 fF, respectively. For comparison with Fig. 2.3, this results in a transmission line $\theta$ of $\pi/2$ at 55 GHz.
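The per-stage totals quoted above follow from inverting (2.1) and (2.2). The sketch below reproduces them and, as an additional illustration, estimates how much of the shunt capacitance remains for device and other parasitics. That last step assumes the 78.6 $\Omega$ microstrip supplies all of the series inductance and nothing else loads the line, which is a simplification introduced here rather than a figure from the design. The function name is hypothetical.

```python
import math


def lowpass_section_lc(z0, f_c):
    """Invert (2.1) and (2.2): total series L and shunt C per section."""
    L = z0 / (math.pi * f_c)
    C = 1 / (math.pi * f_c * z0)
    return L, C


z0 = 50.0          # target loaded characteristic impedance, ohm
f_c = 110e9        # target low-pass cutoff, Hz
z0_line = 78.6     # unloaded microstrip impedance, ohm

L_total, C_total = lowpass_section_lc(z0, f_c)

# Assumption: the line provides all of L_total, so its own capacitance
# per section is L_total / Z0T^2 and the rest is available as C_par.
C_line = L_total / z0_line ** 2
C_par_budget = C_total - C_line

print(f"L_total      = {L_total * 1e12:.1f} pH")
print(f"C_total      = {C_total * 1e15:.1f} fF")
print(f"C_line       = {C_line * 1e15:.1f} fF")
print(f"C_par budget = {C_par_budget * 1e15:.1f} fF")
```

The first two printed values round to the 145 pH and 58 fF stated in the text. The capacitance budget is only as good as the stated assumption and should be read as an order-of-magnitude guide.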

Figure 2.7: BEOL stackup and cross-section of microstrip $T$-line in 90-nm SiGe BiCMOS process.

The band-pass DA also includes a high-pass constant-$k$ section to decouple the dc level of each stage. Fig. 2.8 shows the embedded high-pass $T$-section within the standard low-pass filter. For shunt inductance $L_{hp}$ and series capacitors $2C_{hp}$, the characteristic impedance and low-frequency cutoff are

$$Z_{0,h} = \sqrt{\frac{L_{hp}}{C_{hp}}} \quad (2.15)$$

$$f_{c,h} = \frac{1}{4\pi\sqrt{L_{hp} \times C_{hp}}}. \quad (2.16)$$

Matching \( Z_{0,h} = Z_{0,l} = 50 \, \Omega \) and choosing \( f_{c,h} = 8 \, \text{GHz} \) to cover X-band frequencies, (2.15) and (2.16) give \( L_{hp} = 500 \, \text{pH} \) and \( C_{hp} = 200 \, \text{fF} \).
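The high-pass element values can be checked by solving (2.15) and (2.16) simultaneously, as in the short sketch below (the function name is hypothetical). The exact solutions come out slightly under the rounded 500 pH and 200 fF quoted above.

```python
import math


def highpass_section_lc(z0, f_c):
    """Solve (2.15) and (2.16) for L_hp and C_hp."""
    root_lc = 1 / (4 * math.pi * f_c)   # sqrt(L_hp * C_hp)
    return z0 * root_lc, root_lc / z0   # (L_hp, C_hp)


L_hp, C_hp = highpass_section_lc(z0=50.0, f_c=8e9)
print(f"L_hp = {L_hp * 1e12:.1f} pH, C_hp = {C_hp * 1e15:.1f} fF")
```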

As the additional components of the high-pass $T$-section contain parasitic elements, the effect on DA performance should be considered. The shunt inductors contribute not only inductance, but also shunt capacitance and conductance due to winding and substrate leakage. Shunt capacitance sets a self-resonant frequency for the inductor in many applications, but in the artificial transmission line of a DA, it can be included as part of the low-pass $T$-section. Thus, whereas the parasitic capacitance degrades the high-frequency performance in a purely high-pass DA [56], it is absorbed here with $C_{par}$ from the transistor. While this means the capacitance will not modify the response of the high-pass constant-$k$ sections, it does lead to a reduction in the allowable device size, which may result in a reduction in either $f_T$ or $P_{1dB}$ and $P_{sat}$, depending on how the device biasing is optimized. As such, it is imperative to make the inductor footprint as small as possible to occupy less of the shunt capacitance budget. Generally, reducing the line width and turn diameter to decrease the capacitance to ground results in higher series resistive losses through the inductor. However, due to the inductor's already high-impedance nature at RF frequencies, this parasitic resistance is not critical in comparison to capacitive effects. Assuming a fixed resistance $R_S$ over frequency, and an inductor component quality factor greater than 10 in the band of interest,

$$G_{Shunt} = \frac{1}{R_{Shunt}} = \frac{1}{R_S \left[ 1 + \left( \frac{\omega L_S}{R_S} \right)^2 \right]} = \frac{R_S}{R_S^2 + (\omega L_S)^2} \approx \frac{R_S}{(\omega L_S)^2}. \quad (2.17)$$

Figure 2.8: Synthesized band-pass transmission line including inductor parasitics.

Figure 2.9: Simulated shunt inductor Q factor after absorption of capacitive elements into constant-$k$ section.

From (2.17), it is clear that with increasing frequency, the impact of the inductor losses becomes minimal, justifying the use of a low-Q, compact, high turn count, square spiral. Electromagnetic simulations [57] of the inductor show the conductance is less than 2 mS above 20 GHz, significantly less than that of the transistor, indicating negligible impact on DA performance. Fitting the simulated behavior to the lumped-element model shown in Fig. 2.8, it is found that $R_{ser}$
dominates the resistive performance, and thus, the capacitive elements are absorbed into the constant-\(k\) section with minimal effect. The spiral inductor achieves peak \(Q\) of 16 after absorption of 10 fF total shunt capacitance into the artificial transmission line (Fig. 2.9).
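To give a feel for the magnitudes in (2.17), the sketch below evaluates the exact and approximate shunt conductance of a series $R$-$L$ branch. The 500 pH inductance matches $L_{hp}$ above, but the 5 $\Omega$ series resistance is a hypothetical value chosen only for illustration; it is not taken from the extracted inductor model.

```python
import math


def shunt_conductance(f_hz, L_s, R_s):
    """Exact and high-Q approximate conductance of a series R-L branch, per (2.17)."""
    w_l = 2 * math.pi * f_hz * L_s
    exact = R_s / (R_s ** 2 + w_l ** 2)
    approx = R_s / w_l ** 2
    return exact, approx


L_s = 500e-12   # shunt inductance, H (L_hp from the text)
R_s = 5.0       # series resistance, ohm (hypothetical example value)

for f in (20e9, 40e9, 60e9, 80e9, 100e9):
    exact, approx = shunt_conductance(f, L_s, R_s)
    q = 2 * math.pi * f * L_s / R_s
    print(f"{f / 1e9:5.0f} GHz: Q = {q:5.1f}, "
          f"G exact = {exact * 1e3:.3f} mS, G approx = {approx * 1e3:.3f} mS")
```

For this example the conductance is already below 2 mS at 20 GHz and falls roughly as $1/\omega^2$, and the approximation error stays under one percent once $Q$ exceeds about 10.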

In a similar manner as the shunt inductor, the shunt capacitance to ground for the series blocking capacitors of the high-pass \(T\)-section can be included in the low-pass constant-\(k\) line. High density metal-insulator-metal (MIM) capacitors in the SiGe technology used exhibit shunt parasitic capacitance of only 2.5 fF, and hence do not significantly impact the device size. The series inductance of the capacitor can be absorbed into the inductance of the low-pass constant-\(k\) section, and the resistive losses are directly included in the series loss of the artificial transmission lines. Care must be taken in design of the connections to the series capacitors to minimize this resistance. Capacitors from adjacent high-pass sections can be combined to reduce area and loss.

### 2.3.2 Active Element Design

In the 90-nm SiGe BiCMOS process, HBTs have simulated peak transit frequency \(f_t\) upwards of 300 GHz [58, 59]. However, base resistance in bipolar devices creates shunt losses in the synthesized input line, degrading the gain-bandwidth (GBW) product of the DA. Furthermore, as detailed in [17], the input conductance and capacitance of a common-emitter amplifier increase with frequency, incurring extra loss and impedance mismatch. To counteract these effects, resistive emitter degeneration flattens the input characteristic for ultra-wideband operation. For base-emitter capacitance \(C_{be}\), base resistance \(r_b\), transconductance \(g_m\), and degeneration resistance \(R_E\), the capacitance and conductance looking into the base are given by [17]:

$$C'_{in} = \frac{C'}{1 + \omega^2 C'^2 (r_b + R_E)^2} \quad (2.18)$$

$$G'_{in} = \frac{\omega^2 C'^2 (r_b + R_E)}{1 + \omega^2 C'^2 (r_b + R_E)^2}, \quad (2.19)$$

where $C' = C_{be}/(1 + g_m R_E)$. As shown in Fig. 2.10, increasing emitter resistance $R_E$ has diminishing returns on the decrease of input capacitance, which in turn reduces the transmission line loading sensitivity to process variation. On the other hand, while larger $R_E$ linearly reduces the effective $G_m$ of the transistor, the smaller input capacitance and conductance allow for the use of a larger device and lower the shunt loss in the $T$-section. With these considerations, a 20 $\Omega$ resistor is chosen to maintain high overall gain for the target operation bandwidth. The bandwidth gained comes at the expense of a slight degradation in efficiency, as there is power lost across $R_E$. As shown in Fig. 2.11, we trade only 1% in PAE to improve the bandwidth by more than 80 GHz.

The final DA gain stage employs an HBT cascode to mitigate the Miller effect, increase the input and output impedances of the stage, and improve the isolation between base and collector lines. High-performance HBTs in this process achieve maximum $f_t$ at 2 mA/$\mu$m current density. We bias the common-emitter transistor at 1.8 mA/$\mu$m to avoid the $f_t$-rolloff associated with Kirk effect at high current swing. Because of the difference in capacitance seen at the base and collector, emitter lengths of 12 $\mu$m and 6 $\mu$m are chosen for the common-base and common-emitter devices, respectively, to satisfy $Z_{0,l} = 50$ $\Omega$ for both input and output lines. Additionally, a degeneration capacitor of 40 fF is included in parallel with $R_E$ to introduce a high-frequency zero at $\omega_z = 1/(R_E C_E)$ for gain peaking near the low-pass cutoff. To ensure good decoupling of the dc bias network, the base of the cascode device is biased at 2.6 V through a combination of MOS and MIM RC low-pass filters. The MIM capacitor is placed as close to the device as possible to limit the parasitic inductance of the bias path, preventing high-frequency instability.

Figure 2.11: Bandwidth-efficiency tradeoff of the DA versus gain stage degeneration resistance $R_E$.

Figure 2.12: Simulated gain as a function of number of band-pass DA stages.

Figure 2.13: Simulated optimal PAE versus number of independent scaled supplies for an 8-stage DA.
Though an ideal DA exhibits unbounded GBW product as more stages are added, attenuation of the signals along the base and collector transmission lines limits the achievable gain in practice. Fig. 2.12 shows the simulated gain versus number of stages. An 8-stage DA is found to offer a good GBW-to-chip area ratio. A sweep of the number and values of collector dc voltages in Fig. 2.13 then reveals the optimal supply-scaling for peak PAE. Since the improvement in efficiency between four and eight independent supplies is small, we opt to tie the supplies of every two stages together as a tradeoff between efficiency and chip complexity. This results in a simulated 20% improvement in PAE over a uniform DA with negligible impact on gain. Though the peak CE for an ideal supply-scaled DA with four independent supplies is 40% from (2.14), we are only able to scale from 2.7 to 4.0 V, giving a peak CE of 29.6%. Transistor knee voltage, emitter resistance, and transmission line losses further reduce the efficiency. Fig. 2.14 shows the simulated collector voltage magnitude for each successive gain stage. Compared to the voltage distribution found for an ideal low-pass DA in Fig. 2.3, the band-pass DA sees a shift in the zero electrical length frequency due to the high-pass $T$-section. Additionally, the collector voltages exhibit non-idealities as the DA approaches the cutoff frequency of the low-pass artificial transmission line. However, one may observe the expected monotonically increasing voltage distribution in the mid-band.
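The 29.6% figure can be reproduced with a short calculation, sketched below under the same idealizations as (2.12) to (2.14): equal stage currents, zero knee voltage, lossless lines, and a final-stage swing equal to the last supply voltage. This is one plausible reading of how the number arises; the function name and the per-stage supply list are illustrative.

```python
def supply_scaled_ce(stage_supplies_v):
    """Ideal class-A collector efficiency for a list of per-stage supplies.

    Uses P_out = N * I_C * V_{C,N} / 4 (from (2.12) with zero knee voltage)
    and P_DC = I_C * sum(V_{C,n}); the common factor I_C cancels.
    """
    n_stages = len(stage_supplies_v)
    p_out_over_ic = n_stages * stage_supplies_v[-1] / 4
    p_dc_over_ic = sum(stage_supplies_v)
    return p_out_over_ic / p_dc_over_ic


# Four supplies, each shared by two adjacent stages of the 8-stage DA.
scaled = [2.7, 2.7, 3.2, 3.2, 3.6, 3.6, 4.0, 4.0]
uniform = [4.0] * 8

print(f"supply-scaled CE = {100 * supply_scaled_ce(scaled):.1f}%")
print(f"uniform CE       = {100 * supply_scaled_ce(uniform):.1f}%")
```

The scaled supplies give 8/27, or about 29.6%, against 25% for the uniform case, which illustrates how the limited 2.7 V to 4.0 V range keeps the design short of the ideal 40% available with four linearly scaled supplies.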

### 2.4 Measurement Results

Figure 2.14: Simulated collector voltage magnitudes in the 8-stage supply-scaled DA.

Figure 2.15: Schematic of the fabricated 8-stage supply-scaled DA.

The schematic of the fabricated 8-stage supply-scaled DA is shown in Fig. 2.15, and a chip microphotograph is shown in Fig. 2.16. The amplifier occupies an area of $2.65 \text{ mm} \times 0.57 \text{ mm}$, including pads. Measurement of the DA is performed with on-wafer probing, and no de-embedding of pad parasitics is done. Forward and reverse terminations are provided on-chip, and the high-pass $T$-section filters allow dc biases to be supplied without the need for bias-tees. Four supply voltages of 2.7, 3.2, 3.6, and 4.0 V draw 21.6 mA of nominal bias current each, resulting in total dc power consumption of 297 mW for small input signal. Fig. 2.17 shows the measured and simulated $S$-parameters and stability factor $\mu$ of the DA. The amplifier achieves a peak small-signal gain of 12.0 dB with a 3 dB pass-band bandwidth from 14-105 GHz (91 GHz), corresponding to a GBW product of 362 GHz. Measured gain is 2.5 dB lower than simulated, which is consistent with slow HBT process corners on this fabrication run. Additionally, the $S$-parameters show a pronounced ripple in gain and impedance match around 30 GHz. This degradation is mainly caused by imperfect modeling of the dc supply distribution network, whose extra inductance becomes manifest in the high-pass $T$-section filters near the low-frequency cutoff. Except for this ripple, the input and output return losses (from $S_{11}$ and $S_{22}$) are better than 9 dB from 10-90 GHz. The reverse isolation is better than 24 dB, and the amplifier is unconditionally stable across the entire measured frequency range.
Figure 2.17: Measured and simulated S-parameters and μ-stability factor of the supply-scaled DA.
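
The GBW product quoted for the measured amplifier follows from the linear voltage gain multiplied by the 3 dB bandwidth. As a small illustrative sketch (the function name is hypothetical), the same arithmetic also reproduces the GBW column of the comparison table:

```python
def gbw_ghz(gain_db, bandwidth_ghz):
    """Gain-bandwidth product in GHz from voltage gain in dB and bandwidth in GHz."""
    linear_gain = 10 ** (gain_db / 20)
    return linear_gain * bandwidth_ghz


# This work: 12.0 dB of gain over 14-105 GHz.
print(round(gbw_ghz(12.0, 105 - 14)))

# Stage-scaled DA of [17]: 10 dB of gain over 110 GHz.
print(round(gbw_ghz(10.0, 110)))
```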
Large-signal measurements are performed across the entire operating frequency band. Measured and simulated power, gain, and efficiency at the midband frequency of 50 GHz are shown in Fig. 2.18. The DA has a 1 dB gain-compressed output power $P_{-1dB}$ of 14.9 dBm and saturated output power $P_{sat}$ of 17.0 dBm. Peak CE and PAE are 15.1% and 12.6%, respectively. The output power characteristics match simulation results reasonably well, though maximum PAE suffers by 1.2% due to the lower gain. Fig. 2.19 shows measured and simulated $P_{sat}$, $P_{-1dB}$, and peak CE and PAE at various frequencies across the bandwidth. The 3 dB power bandwidth is greater than 70 GHz (15-87 GHz), and PAE is better than 8.5% up to 80 GHz, except for the aforementioned gain degradation at 30 GHz. As a control experiment, the DA was also measured with all supply voltages set equal to the final stage value of 4 V (Fig. 2.20). The uniform DA exhibited 17% reduced CE and PAE and consumed the same current compared to supply-scaled operation, while only a 0.2 dB increase in gain and saturated output power on average was observed.

Supply-scaling comes at the disadvantage of requiring multiple supplies. Nonetheless, given the approximately 95% efficiency of modern switched dc-dc converters, converter inefficiency would reduce the supply-scaled DA efficiency by 5.3% to a PAE of 12.0%. Even accounting for the converter, the supply-scaled DA maintains an overall 11.1% efficiency advantage over the uniform DA PAE of 10.8%. In addition, the introduction of switched-mode power converters opens the possibility of exploring dynamic supply modulation for envelope-tracking techniques under back-off conditions. These measurements verify the supply-scaling theory and design as an effective method of efficiency enhancement without DA performance degradation.

Table 2.1 summarizes the performance of similar published wideband DAs. The supply-scaled DA achieves the largest reported GBW product of any single-stage silicon-based distributed power amplifier (> 10 dBm) to the author’s knowledge. Among amplifiers shown in silicon, this work also has the largest single-ended output power, with comparable 3 dB power bandwidth. In particular, the supply-scaled DA exhibits nearly 3 dB greater output power and 3% higher efficiency above 70 GHz than the stage-scaled SiGe BiCMOS DA in [17]. While this work implemented a single-ended amplifier as opposed to [17], supply-scaling could also be applied to a differential DA to double the output power. A differential design might also be utilized to simplify the biasing circuits. However, differential DAs in the millimeter-wave bands require an output balun with a bandwidth that matches the amplifier bandwidth, and this poses a significant challenge. Compared to silicon-based W-band tuned PAs in Table 2.2, the DA achieves much greater bandwidth while maintaining similar peak power, efficiency, and gain.
Figure 2.19: Measured and simulated output power, 1 dB gain-compressed power, and peak CE and PAE over the operating bandwidth.
Figure 2.20: Measured power gain, CE, and PAE at 50 GHz, and peak CE and PAE across the bandwidth, of the DA with uniform supply biasing.
2.5 Conclusion

This chapter has introduced supply-scaling as a technique to achieve wideband power efficiency enhancement in DAs, and its advantages over impedance tapering techniques are discussed from the point of view of interstage load modulation. Design methodology of a band-pass DA to enable the technique is presented, with focus on the effects of high-pass constant-\( k \) filter element parasitics. To verify the theory of efficiency enhancement, an 8-stage supply-scaled DA is demonstrated in a 90-nm SiGe BiCMOS process with bandwidth greater than 90 GHz. Peak saturated output power of 17 dBm is measured with a relatively high PAE of 12.6% using four supply voltages from 2.7 V to 4.0 V. Compared to previously reported mm-wave silicon-based DAs, the presented amplifier demonstrates superior power and efficiency performance above 70 GHz.

Acknowledgement

This chapter is mostly a reprint of the material as it appears in IEEE Journal of Solid-State Circuits, Sept. 2016, K. Fang; C. S. Levy; J. F. Buckwalter and Proceedings of the 2015 IEEE Bipolar/BiCMOS Circuits and Technology Meeting, Oct. 2015, K. Fang; C. S. Levy; J. F. Buckwalter. The dissertation author was the co-primary author of this material. The authors would like to acknowledge Integrand Software for the use of EMX, the Trusted Foundry Access Program for chip fabrication, and the National Science Foundation for support through a CAREER research award and Graduate Fellowship. Additionally, the support of the Office of Naval Research is appreciated. They also thank Bagher Rabet for valuable discussions.
<table>
<thead>
<tr>
<th>Reference</th>
<th>Gain (dB)</th>
<th>BW (GHz)</th>
<th>GBW (GHz)</th>
<th>Peak $P_{\text{1dB}}$ (dBm)</th>
<th>PAE at $P_{\text{1dB}}$ (%)</th>
<th>Area (mm$^2$)</th>
<th>Technology</th>
</tr>
</thead>
<tbody>
<tr>
<td>[20]</td>
<td>15</td>
<td>80</td>
<td>450</td>
<td>N/A</td>
<td>N/A</td>
<td>0.31</td>
<td>40-nm digital CMOS</td>
</tr>
<tr>
<td>[21]</td>
<td>8.5</td>
<td>135</td>
<td>360</td>
<td>10</td>
<td>7.9</td>
<td>0.36</td>
<td>55-nm SiGe BiCMOS</td>
</tr>
<tr>
<td>[22]</td>
<td>9</td>
<td>92</td>
<td>259</td>
<td>N/A</td>
<td>N/A</td>
<td>0.45</td>
<td>45-nm CMOS SOI</td>
</tr>
<tr>
<td>[23]</td>
<td>11</td>
<td>85 (5-90)</td>
<td>320</td>
<td>12</td>
<td>6.8</td>
<td>1.28</td>
<td>0.12-μm CMOS SOI</td>
</tr>
<tr>
<td>[24]</td>
<td>7.4</td>
<td>80</td>
<td>190</td>
<td>N/A</td>
<td>N/A</td>
<td>0.72</td>
<td>90-nm CMOS</td>
</tr>
<tr>
<td>[25]</td>
<td>10</td>
<td>170</td>
<td>537</td>
<td>7.5</td>
<td>4.5</td>
<td>0.38</td>
<td>0.13-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[26]</td>
<td>13</td>
<td>16 (1-17)</td>
<td>69</td>
<td>35</td>
<td>20</td>
<td>15.3</td>
<td>0.25-μm GaN-SiC</td>
</tr>
<tr>
<td>[17]</td>
<td>10</td>
<td>110</td>
<td>348</td>
<td>16.7*</td>
<td>11.5</td>
<td>2.18</td>
<td>0.13-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[60]</td>
<td>4</td>
<td>87 (4-91)</td>
<td>138</td>
<td>9</td>
<td>4.4</td>
<td>0.80</td>
<td>0.12-μm CMOS SOI</td>
</tr>
<tr>
<td>[61]</td>
<td>14</td>
<td>74</td>
<td>370</td>
<td>3.2</td>
<td>2.4</td>
<td>1.72</td>
<td>90-nm CMOS</td>
</tr>
<tr>
<td>[62]</td>
<td>20</td>
<td>39</td>
<td>394</td>
<td>6.5</td>
<td>1.8</td>
<td>2.24</td>
<td>0.18-μm CMOS</td>
</tr>
<tr>
<td>[63]</td>
<td>7</td>
<td>43</td>
<td>105</td>
<td>N/A</td>
<td>N/A</td>
<td>1.80</td>
<td>0.13-μm CMOS SOI</td>
</tr>
<tr>
<td>[64]</td>
<td>10</td>
<td>45</td>
<td>142</td>
<td>33</td>
<td>N/A</td>
<td>2.76</td>
<td>0.15-μm GaN-SiC</td>
</tr>
<tr>
<td>[65]</td>
<td>12</td>
<td>22</td>
<td>88</td>
<td>32</td>
<td>26</td>
<td>4.35</td>
<td>0.25-μm GaAs</td>
</tr>
<tr>
<td>This Work</td>
<td>12</td>
<td>91 (14-105)</td>
<td>362</td>
<td>14.9</td>
<td>9.7</td>
<td>1.51</td>
<td>90-nm SiGe BiCMOS</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Reference</th>
<th>Gain (dB)</th>
<th>BW (GHz)</th>
<th>Peak $P_{\text{1dB}}$ (dBm)</th>
<th>Peak $P_{\text{sat}}$ (dBm)</th>
<th>Peak PAE (%)</th>
<th>$P_{\text{sat}}$ BW (GHz)</th>
<th>Technology</th>
</tr>
</thead>
<tbody>
<tr>
<td>[17]</td>
<td>10</td>
<td>110</td>
<td>16.7*</td>
<td>17.5*</td>
<td>13.2</td>
<td>77</td>
<td>0.13-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[16]</td>
<td>9</td>
<td>N/A</td>
<td>N/A</td>
<td>21</td>
<td>3.5</td>
<td>24 (73-97)</td>
<td>0.13-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[66]</td>
<td>15</td>
<td>56</td>
<td>14.4</td>
<td>17.2</td>
<td>9.2</td>
<td>15 (77-92)</td>
<td>0.13-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[67]</td>
<td>17</td>
<td>15</td>
<td>14.5</td>
<td>17.5</td>
<td>12.8</td>
<td>23 (65-88)</td>
<td>0.12-μm SiGe BiCMOS</td>
</tr>
<tr>
<td>[68]</td>
<td>12</td>
<td>27</td>
<td>12.5</td>
<td>14.8</td>
<td>8.7</td>
<td>25 (80-105)</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[69]</td>
<td>18</td>
<td>33</td>
<td>12.0</td>
<td>14.0</td>
<td>4.5</td>
<td>33 (77-110)</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>This Work</td>
<td>12</td>
<td>91</td>
<td>14.9</td>
<td>17.0</td>
<td>12.6</td>
<td>70 (15-85)</td>
<td>90-nm SiGe BiCMOS</td>
</tr>
</tbody>
</table>

*Pseudo-differential operation*
Chapter 3

A Tunable Distributed Active Quasi-Circulator in CMOS SOI

3.1 Multiband Frequency-Division and Full-Duplex Communications

As more users engage in wireless communications and transmit and receive more data per user, radio systems must support increasingly large data rates to meet market demand. Since cellular bands are already crowded, designers are looking to higher frequencies of operation for data transmission. While the mm-wave regime sports ample available spectrum, circuit and system design there is more challenging and power hungry. Thus, it is also beneficial to investigate more efficient usage of the cellular bands. Currently, most point-to-multipoint networks utilize either time- or frequency-domain duplexing (TDD/FDD) schemes to divide forward and reverse communication channels, preventing interference between TX and RX signals. Assuming symmetric uplink and downlink data rates, these duplexing methods use only half of either the available time or bandwidth. Recently, full-duplex (FD), or simultaneous transmit and receive (STAR), wireless, as shown in Fig. 3.1, has been drawing significant research attention. By applying self-interference cancellation of a radio’s output signal, a FD/STAR channel access method can be achieved, doubling the theoretical wireless capacity. STAR systems are particularly appropriate for peer-to-peer networks, such as for public safety and military radios, that are flexible in terms of frequency assignment and omit the large tuned surface acoustic wave (SAW) duplexers found in current Long-Term Evolution (LTE) smartphones. Furthermore, while FD channels can be realized with multiple antennas through spatial duplexing, if innovations can be made in STAR systems using a common RF carrier and antenna, circuit area would decrease dramatically.

Figure 3.1: Schematic of an FDD/FD radio front-end. Simultaneous transmit and receive is enabled by an isolating duplexer/circulator.

The key isolating component in both multiband FDD and FD radios, the circulator, is often bulky, expensive, and difficult to integrate with silicon chips. Passive circulators involve ferromagnetic materials that must be carefully confined from silicon FETs, though they exhibit high degrees of isolation and power handling with low insertion loss. Research has been done on microelectromechanical (MEMS) circulators, but they remain unreliable and difficult to fabricate in comparison. On the other hand, active circulators utilize transistor non-reciprocity to provide unidirectional signal flow and are thus more compact and affordable. However, they require dc power, degrade signal-to-noise ratio (SNR), and are limited by transistor breakdown effects. Both active and passive circulators are commonly narrowband in operation due to the phase specificity of their interference cancellation mechanisms; this work aims to widen the operating bandwidth for more mobile radio communications and better spectral efficiency.

### 3.2 Quasi-Circulator Design

Fig. 3.2 shows the schematic of the proposed active quasi-circulator (QC). In a QC, power is transferred unidirectionally between two pairs of ports, and the third pair is isolated. That is, the signal flow is described by the S-parameter matrix:

\[
[S_{QC}] = \begin{bmatrix}
0 & 0 & 0 \\
S_{21} & 0 & 0 \\
0 & S_{32} & 0
\end{bmatrix},
\tag{3.1}
\]

where port 1 is the transmit signal (TX), port 2 is the antenna (ANT), and port 3 is the receive signal (RX) in a radio system. The isolation between ports 1 and 3 prevents desensitization of the receiver from strong TX signals, as well as protects the PA from unwanted load modulation.

The proposed QC achieves TX gain \((S_{21})\) using an integrated distributed amplifier (DA). While the output currents from the gain stages add constructively in the direction of the ANT port, the reverse wave from the second stage cancels with the first in the direction of the RX due to two 90-degree phase shift lines between the gain stage inputs and outputs. A high level of isolation is maintained across a wide frequency range through tuning of a shunt capacitor bank between the amplifier stages to effect out-of-phase cancellation at multiple frequencies. Amplifier gains are also individually tuned to offset frequency-dependent line losses. Transistor non-reciprocity provides isolation from ports 2 and 3 to 1. Due to the distributed nature of the circuit, the PA efficiency is less than that of a tuned amplifier, and loading of the transmission lines by tuning capacitors incurs extra loss. However, supply-scaling of the two-stage amplifier as detailed in [70] offers as much as a 33% efficiency improvement.

Figure 3.2: Schematic of the tunable distributed active quasi-circulator.
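
The cancellation mechanism described above can be illustrated with a phasor sketch. This is an idealized model, not the fabricated circuit: it assumes two lossless, equal-gain stages, identical delay lines on the input and output sides that are exactly 90 degrees long at the 6.3 GHz center frequency, and no tuning capacitors. All names are hypothetical.

```python
import cmath
import math

F0_HZ = 6.3e9  # center frequency at which each line is 90 degrees long


def combined_waves(f_hz, f0_hz=F0_HZ, stage2_gain=1.0):
    """Return (forward, reverse) output phasors of an idealized two-stage DA.

    Stage 1 is the phase reference. Stage 2 is driven through one input-line
    delay; each output-line hop adds one more delay.
    """
    theta = (math.pi / 2) * f_hz / f0_hz   # electrical length of one line
    delay = cmath.exp(-1j * theta)
    # Toward ANT: stage 1 crosses the output line, stage 2 was delayed at its input.
    forward = delay + stage2_gain * delay
    # Toward RX: stage 2 is delayed at its input and again travelling back.
    reverse = 1 + stage2_gain * delay * delay
    return forward, reverse


for f_ghz in (5.3, 6.3, 7.3):
    fwd, rev = combined_waves(f_ghz * 1e9)
    print(f"{f_ghz} GHz: |forward| = {abs(fwd):.3f}, |reverse| = {abs(rev):.3f}")
```

In this simplified picture the reverse wave vanishes only where the lines are exactly a quarter wavelength long, which is consistent with the need for the shunt capacitor bank to move the cancellation null across the band.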

3.2.1 Passive Element Design

In a DA, FET parasitic capacitances are absorbed into transmission lines to form lumped-element low-pass $T$-section constant-$k$ filters. To achieve a 50 $\Omega$ match at all ports, the transmission line characteristic impedance $Z_0$ must be greater than 50 $\Omega$. Furthermore, a larger $Z_0$ allows for larger device loading parasitics, which translates to higher power and gain of the DA. In this process, we find that parallel-stacked inductors (Fig. 3.3(a)) offer better parasitic and area characteristics than microstrip or coplanar-waveguide (CPW) lines below 10 GHz, with effective $Z_0$ up to 130 $\Omega$. For a high-frequency cutoff of 10 GHz, the total series inductance and shunt capacitance per $T$-section stage are solved to be 1.59 nH and 637 fF, respectively [19]. The series inductor is simulated to have a maximum Q factor of 18 and 93 fF total parasitic shunt capacitance, leaving 451 fF shunt capacitance budget.
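
The element values quoted above are consistent with the standard low-pass constant-$k$ design relations, $L = Z_0/(\pi f_c)$ and $C = 1/(\pi f_c Z_0)$, evaluated for a 50 $\Omega$ image impedance. The sketch below applies those textbook relations; the function name is hypothetical, and the 50 $\Omega$ value is an assumption taken from the port match stated in the text.

```python
import math


def lowpass_constant_k(z0_ohm, cutoff_hz):
    """Total series L (H) and shunt C (F) of one low-pass constant-k T-section."""
    inductance_h = z0_ohm / (math.pi * cutoff_hz)
    capacitance_f = 1 / (math.pi * cutoff_hz * z0_ohm)
    return inductance_h, capacitance_f


inductance_h, capacitance_f = lowpass_constant_k(z0_ohm=50, cutoff_hz=10e9)
print(f"L = {inductance_h * 1e9:.2f} nH, C = {capacitance_f * 1e15:.0f} fF")
```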

In order to provide separate dc biases to the amplifier stages and tunable capacitor banks, we adopt high-pass $T$-section filters as well. For a low-frequency cutoff of 1 GHz, a 3.98 nH shunt inductance and 1.59 pF series capacitance per stage are required. As detailed in [70], shunt inductor Q is of less critical importance compared to its parasitic capacitance. Thus, a narrow-width, low-Q spiral is designed for dc biasing, as shown in Fig. 3.3(b). The shunt inductor loads the $T$-section with only 26 fF additional shunt capacitance and has a maximum Q of 10. Series capacitors are realized with interdigitated vertical natural capacitors (VNCAPs).

Figure 3.3: Layout of the (a) series and (b) shunt spiral inductors for the distributed band-pass $T$-section.
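
The high-pass element values can be checked against the standard high-pass constant-$k$ relations, $L = Z_0/(4\pi f_c)$ and $C = 1/(4\pi f_c Z_0)$. As before, this is a sketch using textbook formulas with a hypothetical function name and an assumed 50 $\Omega$ image impedance.

```python
import math


def highpass_constant_k(z0_ohm, cutoff_hz):
    """Shunt L (H) and total series C (F) of one high-pass constant-k T-section."""
    inductance_h = z0_ohm / (4 * math.pi * cutoff_hz)
    capacitance_f = 1 / (4 * math.pi * cutoff_hz * z0_ohm)
    return inductance_h, capacitance_f


shunt_l_h, series_c_f = highpass_constant_k(z0_ohm=50, cutoff_hz=1e9)
print(f"L = {shunt_l_h * 1e9:.2f} nH, C = {series_c_f * 1e12:.2f} pF")
```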

3.2.2 Active Element Design

Active QCs typically exhibit low power-handling capabilities due to limited transistor breakdown voltages [32–34]. In this design, we utilize a three-stack thick-oxide floating-body 112-nm length NFET amplifier as the power amplifier (PA) stage, allowing nominal drain dc voltage and ac voltage swing of greater than 4.5 V. The cancellation stage is a cascode with a dc voltage of 3.0 V since the reverse traveling wave from the PA stage cancels the voltage swing at the drain.

Thick-oxide FETs in this CMOS SOI process have simulated transit frequency $f_t$ upwards of 150 GHz. The common-source (CS) device of the PA stage is sized to meet 50-Ω image impedance for the low-pass $T$-section. The stacked-FET widths and their corresponding gate capacitances are chosen according to [71] for equal drain-source voltage distribution. Sizing of the CS transistor in the cancellation stage is done to perfectly cancel the reverse current from the PA stage at the center frequency of 6.3 GHz. The common-gate (CG) cascode device width is chosen to fill the shunt capacitance budget of the drain line. A small shunt VNCAP is added at the drain of the PA stage to ensure a 50-Ω match as well. Gate biases of each amplifier stage are individually controlled to maintain TX-RX isolation at various frequencies. The gain stages draw 26 and 75 mA of dc current, respectively, under class-AB bias.

3.2.3 Tuning Capacitor Design

For fine tuning of the inter-stage phase, NMOS varactors are used (Fig. 3.4(a)). Series-connecting four varactors in a stack allows for voltage swings of up to 16 $V_{pp}$ without breakdown. As shown in Fig. 3.5, the capacitance range of the varactor stack determines the isolation capability of the active QC since modulation of the capacitor by the RF deteriorates signal cancellation. We choose for this design a stack of 50 $\mu$m $\times$ 0.23 $\mu$m varactors with overall simulated capacitance range of 56 fF.

Figure 3.4: Schematics of the (a) varactor stack and (b) stacked-FET switched capacitor. $V_{\text{ctrl}}$ ranges from -1.5 to 1.5 V. $V_{sw,k}$ turns the $k^{th}$-bit capacitor on/off.

To introduce additional coarse tuning of the shunt capacitance while keeping the varactor variation small, we also utilize a bank of stacked-FET switched capacitors. Similar to the amplifier stage, the switch stack comprises three 84 $\mu$m thick-oxide NFETs to support peak TX voltage swing, as shown in Fig. 3.4(b). For high TX-RX isolation up to 1 GHz away from the center frequency, a total of $\pm$400 fF capacitance tuning is provided with four binary-coded switched capacitors ranging from 400 to 50 fF, where the least significant bit (LSB) value is slightly less than the capacitance range of the varactor stack. The switched capacitor has less than 1 ns simulated switching time, with an on-state Q of 12. Tuning of the shunt element causes the image impedance of the overall $T$-section to deviate from $Z_0$, but due to the distributed nature of the artificial transmission line, the circuit is relatively insensitive to this mismatch.
Figure 3.5: Simulated large-signal TX-RX isolation over increasing shunt varactor capacitance range.
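
The reason for choosing an LSB slightly smaller than the varactor range can be seen by enumerating the switched-capacitor codes. The sketch below assumes binary weights of 400, 200, 100, and 50 fF (only the end values are stated in the text; the two intermediate weights are an assumption) and ignores off-state parasitics.

```python
from itertools import product

BITS_FF = (400, 200, 100, 50)   # assumed binary-weighted bank, MSB to LSB
VARACTOR_RANGE_FF = 56          # simulated varactor stack range from the text


def bank_values(bits_ff=BITS_FF):
    """Sorted list of capacitances reachable by all switch codes, in fF."""
    totals = []
    for code in product((0, 1), repeat=len(bits_ff)):
        totals.append(sum(bit for bit, on in zip(bits_ff, code) if on))
    return sorted(totals)


values = bank_values()
steps = [high - low for low, high in zip(values, values[1:])]
# Continuous coverage requires the varactor to span the largest coarse step.
gap_free = max(steps) <= VARACTOR_RANGE_FF

print(values)
print("largest coarse step (fF):", max(steps))
print("continuous coverage:", gap_free)
```

With these assumed weights, every coarse step equals the LSB, so a varactor range slightly larger than the LSB leaves no untunable gap between adjacent codes.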

### 3.3 Noise Considerations

Although the tunable phase shifters in the input and output lines allow cancellation of the large TX signal, noise from active and resistive components is uncorrelated and can add in power. Thermal noise, or Johnson-Nyquist noise, is the dominant source of noise in MOSFETs and resistors at gigahertz frequencies and is approximately white, having a power spectral density (PSD) that is constant over frequency, with a Gaussian probability density function. Thus, the distributed active QC can be modeled as an additive white Gaussian noise (AWGN) channel, with the ANT-RX line being the main path of concern. As the channel current noise of the cancellation and PA stage are independent, the output noise powers combine at the LNA input. In addition, amplification of thermal noise from passives such as the input line termination and $T$-section losses through the amplifier stages leads to significant noise figure ($NF$) at the RX port. Simulated noise power magnitude ratios for several noise sources in the active QC at 6 GHz are shown in Table 3.1. The main contributors to the RX $NF$ are passive losses, which are minimized by design but unavoidable, and the gate line termination. Channel noise of the PA stage is larger than that of the cancellation stage as expected due to its higher current draw. While noise figure is an important issue that future design architectures would seek to mitigate, some STAR peer-to-peer applications can define a high enough SNR link budget to overcome this limitation.

<table>
<thead>
<tr>
<th>Source</th>
<th>Normalized Noise Power</th>
</tr>
</thead>
<tbody>
<tr>
<td>Passive losses</td>
<td>0.410</td>
</tr>
<tr>
<td>Gate line termination</td>
<td>0.333</td>
</tr>
<tr>
<td>PA stage</td>
<td>0.102</td>
</tr>
<tr>
<td>Cancellation stage</td>
<td>0.076</td>
</tr>
</tbody>
</table>

### 3.4 Measurement Results

A chip microphotograph of the active quasi-circulator is shown in Fig. 3.6. The circuit occupies an area of 1.62 mm × 0.97 mm, which is significantly smaller than passive stand-alone circulators and includes the final PA. Additionally, the chip is built upon a trap-rich high resistivity (HR) substrate split, which offers improved passive element Q factors and substrate decoupling through the capture of parasitic surface conduction (PSC) charges. Measurement of the QC is performed with on-wafer probing, and no de-embedding of pad parasitics is done. The entire chip nominally consumes 415 mW of dc power.

Fig. 3.7 shows the measured S-parameters of the QC over swept switched capacitor codes. The isolation ($S_{31}$) exhibits nulls from 5.3 to 7.3 GHz (32% fractional bandwidth) ranging from 42 dB near the center frequency to 30 dB at the band edge, with a 20 dB cancellation bandwidth of 400 MHz. The varactor control voltage is also swept, and the varactor stack offers a 150-MHz tuning range, larger than that of the LSB capacitor. TX gain ($S_{21}$), RX insertion loss ($S_{32}$), return losses ($S_{xx}$), and reverse isolation ($S_{12}$) vary by less than 0.5 dB with respect to shunt capacitance, and worst-case measurements and simulations are shown in Fig. 3.8. For comparison with passive circulators, TX suppression, defined as the ratio of TX leakage power at RX to output power at ANT, is $S_{31} - S_{21}$ and better than -45 dB at band center. The RX path noise figure is also measured up to the 1 dB gain-compressed output power ($P_{-1dB}$) of the TX. Due to current noise from the active devices, as well as resistive noise from the gate line termination, there is substantial $NF$ at the receiver. This could be reduced in future designs by taking advantage of PA notch-filter degeneration [31] and termination-less DA topologies [64]. Simulation of the QC with an ideal termination-less gate match reduces the noise figure by 7.4 dB across the band. Decreasing the amplifier gate biases further reduces the noise at the cost of gain and operation bandwidth to a minimum value equal to the RX insertion loss.
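
The fractional bandwidth and TX suppression figures follow from simple arithmetic on the quoted values. The sketch below uses hypothetical function names; the example inputs pair the 42 dB isolation null with the 10.5 dB peak TX gain, which is an illustrative assumption, since the text does not state that both occur at exactly the same frequency.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Bandwidth divided by the arithmetic center frequency."""
    center = (f_low_ghz + f_high_ghz) / 2
    return (f_high_ghz - f_low_ghz) / center


def tx_suppression_db(s31_db, s21_db):
    """TX leakage at RX relative to output power at ANT, in dB."""
    return s31_db - s21_db


print(f"fractional bandwidth: {fractional_bandwidth(5.3, 7.3):.1%}")
print(f"TX suppression: {tx_suppression_db(s31_db=-42.0, s21_db=10.5):.1f} dB")
```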

Figure 3.7: Measured TX-RX isolation over swept switched capacitor codes (solid) and varactor control voltage (dashed).

Measured and simulated CW power, gain, and efficiency of the TX path amplifier at 6.3 GHz are shown in Fig. 3.9. The active QC has $P_{-1dB}$ of 14 dBm and saturated output power $P_{sat}$ of 18 dBm. Peak drain efficiency (DE) and power-added efficiency (PAE) are 17% and 12%, respectively. Large-signal TX suppression is also shown and is optimized for a specific power level at the cost of small-signal isolation by varactor and gate bias tuning. Suppression peaks of 42 dB and 35 dB for output powers of 12.5 dBm and 15 dBm, respectively, are shown in dashed lines. As the QC approaches saturation, greater than 22 dB suppression is maintained.

Table 3.2 summarizes the performance of similar published circulators in silicon. Compared to the literature, the tunable distributed QC has the largest output power and fractional operating bandwidth with isolation greater than 30 dB under frequency-division/full-duplex (FDD/FD) operation.

Figure 3.8: Measured (solid) and simulated (dashed) worst-case S-parameters and noise figure of the QC.

Figure 3.9: Measured and simulated power, efficiency, and TX-RX suppression.

3.5 Conclusion

A distributed active quasi-circulator is presented with stacked capacitive tuning elements for high power handling. The fabricated QC in 45-nm CMOS SOI achieves 10.5 dB peak gain and more than 30 dB small-signal TX-RX isolation over a 32% fractional tuning bandwidth (5.3-7.3 GHz). A peak saturated output power of 18 dBm is measured with 12% PAE.

Acknowledgement

This chapter is mostly a reprint of the material as it appears in IEEE Microwave and Wireless Components Letters, Nov. 2017, K. Fang; J. F. Buckwalter. The dissertation author was the primary author of this material. The authors would like to acknowledge Integrand Software for the use of EMX, MOSIS and IBM for chip fabrication, and the National Science Foundation for support through a CAREER research award and Graduate Fellowship. They also thank Peter Asbeck, Cooper Levy, Bagher Rabet, and Narek Rostomyan for technical discussions.
Table 3.2: Comparison To Other Published Circulators

<table>
<thead>
<tr>
<th>Ref.</th>
<th>TX-ANT Gain (dB)</th>
<th>TX-RX Iso. (dB)</th>
<th>BW/Tuning Range (GHz, %)</th>
<th>RX I.L.$^1$ (dB)</th>
<th>$P_{sat}$ (dBm)</th>
<th>TX IIP3/ $IP_{1dB}$ (dBm)</th>
<th>$NF$ (dB)</th>
<th>Area (mm$^2$)</th>
<th>Technology</th>
</tr>
</thead>
<tbody>
<tr>
<td>[27]</td>
<td>-1.8</td>
<td>&gt;20</td>
<td>0.734-0.766 (4.3)</td>
<td>1.7</td>
<td>&gt;-7$^2$</td>
<td>27.5/-</td>
<td>4.0</td>
<td>0.64</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[28]</td>
<td>-2.5</td>
<td>53</td>
<td>2.16 (Narrow)</td>
<td>2.5</td>
<td>7</td>
<td>17/5.5</td>
<td>–</td>
<td>–</td>
<td>Rogers 4350</td>
</tr>
<tr>
<td>[29]</td>
<td>-0.5</td>
<td>&gt;25</td>
<td>0.7-2.5 (112.5)</td>
<td>0.5</td>
<td>–</td>
<td>–</td>
<td>–</td>
<td>&gt;23.5</td>
<td>0.1-μm GaN</td>
</tr>
<tr>
<td>[30]</td>
<td>-5.7</td>
<td>20</td>
<td>24.2-24.6 (1.6)</td>
<td>5.7</td>
<td>6</td>
<td>–/–/9.5</td>
<td>10.8</td>
<td>0.72</td>
<td>0.18-μm CMOS</td>
</tr>
<tr>
<td>[31]$^{3,4}$</td>
<td>–</td>
<td>&gt;8.0</td>
<td>0.3-1.5$^5$ (133)</td>
<td>0</td>
<td>15.5</td>
<td>–/–/–</td>
<td>11</td>
<td>7.20</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[32]</td>
<td>-9</td>
<td>30</td>
<td>23.0-25.0 (8.4)</td>
<td>8.5</td>
<td>–</td>
<td>–/-12</td>
<td>–</td>
<td>0.35</td>
<td>0.18-μm CMOS</td>
</tr>
<tr>
<td>[33]</td>
<td>7</td>
<td>30</td>
<td>9.5-10.5 (10.0)</td>
<td>2.5</td>
<td>3.5</td>
<td>–/-6.1</td>
<td>6.5</td>
<td>0.72</td>
<td>0.18-μm CMOS</td>
</tr>
<tr>
<td>[34]</td>
<td>20</td>
<td>20.5</td>
<td>20.7-25.0 (18.8)</td>
<td>-12</td>
<td>2.5</td>
<td>-11/-19.8</td>
<td>17</td>
<td>3.22</td>
<td>0.18-μm CMOS</td>
</tr>
<tr>
<td>This QC</td>
<td>10.5</td>
<td>&gt;30</td>
<td>5.3-7.3$^5$ (31.7)</td>
<td>5.0</td>
<td>18</td>
<td>20/4.5</td>
<td>20</td>
<td>1.57</td>
<td>45-nm SOI</td>
</tr>
</tbody>
</table>

1 - negative value denotes gain; 2 - $P_{out}$ limited by RX linearity; 3 - full RX chain; 4 - frequency division duplex (FDD) operation; 5 - tuning range
Chapter 4

Efficient Linear Millimeter-Wave Distributed Transceivers in CMOS SOI

4.1 Distributed Amplifier and Front-End Design

4.1.1 RF CMOS SOI Process

The distributed circuits are designed using a 45-nm partially depleted CMOS SOI process with an eight-layer metal stackup including two thick top Cu levels and both high density (HD-) and high Q (HQ-) metal-insulator-metal (MIM) capacitors (Fig. 4.1). Due to the thick top metals and high dielectric height, the attenuation per unit length for microstrip transmission lines (t-lines) is less than that of coplanar-waveguide (CPW) for the same line width from 10 to 70 GHz, as shown in Fig. 4.2. The microstrip t-line simplifies connection to the devices and maintains a continuous ground plane. DA gain per stage is maximized by designing the largest characteristic impedance relative to the distributed series resistance. Consequently, a 2-μm wide M8 Al layer is used over M5 ground, resulting in less than 0.8 dB/mm loss up to 80 GHz. HQ-MIM capacitors are utilized in the implementation of high-pass filter networks and dc blocks, and a combination of HD-MIM and interdigitated vertical natural capacitors (VNCAP) are used for dc bypassing. This process is built on an HR substrate, which is leveraged in the design of low loss bias and gain peaking inductors.

Figure 4.1: BEOL stackup of the 45-nm RF CMOS SOI process.

Figure 4.2: Simulated $Z_0$ and Q factor of the microstrip transmission line.

The RF model for a 32×1.0 μm width double gate contacted NMOS transistor with 2× PC pitch has simulated unity current gain/maximum oscillation frequency \( f_t/f_{max} \) of 280/270 GHz at a class-A bias \( (V_G = 550 \text{ mV}) \) and nominal drain voltage \( V_D \) of 1.0 V when referred to M1. After \( RLC \) interconnection parasitics from M2 to M5 are extracted for the drain and source fingers, and from M2 to M3 for the gate, \( f_t/f_{max} \) reduces to 235/240 GHz (Fig. 4.3). The simulated minimum noise figure \( NF_{min} \) is 1.8 dB at 80 GHz for \( V_G \) of 450 mV and remains less than 1 dB up to 40 GHz. PMOS performance metrics at highly-scaled technology nodes are dominated by the interconnection parasitics, and a PMOS FET of the same size and bias condition has equal \( f_{max} \) of 240 GHz and slightly lower \( f_t \) of 190 GHz post-extraction. \( NF_{min} \) of the PMOS device at 80 GHz is also consistent with that of NMOS at 1.8 dB.

4.1.2 NMOS SSDA

Conventional distributed amplifier operation absorbs transistor parasitic capacitances into lumped-element low-pass constant-\( k \ T \)-section filters [3], and multiple filter sections are cascaded to form artificial transmission lines connecting the gates and drains of each gain stage. Assuming the device \( f_t \) is much greater than the frequency range, the bandwidth of the DA is determined by the image impedance and cutoff frequency of the filter sections [19]. For a characteristic impedance \( Z_0 \) of 50 Ω and low-pass cutoff of 80 GHz, each filter section is designed to have a total series inductance of 200 pH and shunt capacitance of 80 fF. A 352-μm length section of microstrip provides the necessary inductance along with 27-fF distributed shunt capacitance, resulting in a remaining capacitance budget of 53 fF. However, the low-pass characteristic of the conventional DA stage necessitates a shared voltage supply level across all devices, which leads to poor power efficiency in the early stage transistors as the drain voltage swings across only a fraction of the loadline.

Figure 4.3: Simulated $H_{21}$ and $G_{umx}$ (maximum unilateral power gain) at $|V_G| = 550$ mV and $NF_{min}$ at $|V_G| = 450$ mV of a 32×1.0 μm (a) NMOS and (b) PMOS FET for $|V_D| = 1.0$ V.

Figure 4.4: Schematics of the designed 6-stage (a) NMOS and (b) hybrid SSDAs.

The supply-scaling method presented in [70] introduces independent drain biasing to modulate the load impedance at each stage, reducing the wasted voltage headroom and significantly improving DE and PAE. As shown in Fig. 4.4(a), a 50-Ω $Z_0$ high-pass constant-$k$ $T$-section is integrated around both the drain and gate low-pass filter sections to allow independent biases to be fed while maintaining wideband impedance and phase matching. For a low-frequency cutoff of 10 GHz, 160 fF total series capacitance and 400 pH shunt inductance are required per stage. Since the $Q$ of the shunt inductor is less critical than its parasitic capacitance contribution towards the overall DA gain, a narrow width, high turn count, square spiral is designed for the high-pass section. The inductor utilizes the M6 and M7 thick Cu layers with no patterned metal shield underneath. Fig. 4.5 shows the simulated EM [57] characteristics of the designed inductor with the low resistivity (LR) substrate model from previous versions of the process ($\rho = 13.5 \, \Omega \cdot \text{cm}$) and the current HR iteration ($\rho = 2.5 \, \text{k}\Omega \cdot \text{cm}$). The HR substrate improves the parasitic shunt capacitance by 0.5 fF and the $Q$ by up to 0.8. HQ-MIM capacitors are used to realize 160 fF series capacitances (320 fF at either termination), and their frequency responses are plotted in Fig. 4.6. Fig. 4.7 shows the additional losses incurred by the described high-pass filter passives per section, relative to ideal components. The overall thru loss of each band-pass $T$-section filter excluding active device conductances is less than 1 dB from 25-75 GHz.

Figure 4.5: Simulated Q factor and shunt capacitance of the designed 400 pH spiral inductor on HR and previous LR substrates.

Figure 4.6: Simulated Q factors and series capacitances of the 160 and 320 fF HQ-MIM capacitors.

In a conventional common-source (CS) CMOS gain stage, the gate-drain capacitance $C_{gd}$ not only reduces the $f_{\text{max}}$ of the transistor, but also leads to potential instability due to coupling of the transmission lines of a DA. Cascoding the CS FET with a common-gate (CG) stage improves the input-output isolation and mitigates the Miller feedback effect. The cascode cell also provides parasitic capacitance matching for the gate and drain lines through independent sizing of the CS/CG transistors, as well as increased input and output impedances for lower resistive loading losses. Improved performance is realized by tapering the FET width in the CS and CG stages to 32 μm and 56 μm, respectively, in the NMOS SSDA design to achieve 80 fF total shunt capacitance for the band-pass $T$-sections. Input and output capacitances are flat within the operation frequency range, as shown in Fig. 4.8, enabling wideband matching. The gate of the cascode device is carefully ac decoupled through closely placed VNCAPs with M2 to M4 fingers and large shunt resistance to limit the parasitic inductance seen from the bias path.

Figure 4.8: Simulated input and output capacitances of the NMOS and PMOS cascode gain stages.

Figure 4.9: Simulated NMOS DA gain versus number of stages $N$.

In theory, the gain of a DA increases linearly with the number of stages while bandwidth is maintained by the artificial transmission line cutoff(s). Attenuation of the input and output signals throughout each section limits the achievable gain, however, as additional stages bring diminishing gain returns [55]. On the other hand, increasing the number of independent supply biases facilitates finer voltage scaling and enhances the theoretical power efficiency improvement of the SSDA relative to that of a conventional distributed amplifier by a factor of $2N/(N+1)$, where $N$ is the number of scaled supplies, assuming class-A bias [70]. The number of DA sections is swept in Fig. 4.9, and a 6-stage amplifier is chosen for good GBW to chip size ratio. The effectiveness of the supply-scaling methodology is reliant on a large range of operational drain voltage levels, which is limited in real devices by transistor knee voltage on the low side and breakdown voltage on the high side. Previous measurement data of the 45-nm NMOS FETs has shown reliable operation for $V_D$ of up to 1.2 V per device. Fig. 4.10 plots the simulated $S_{21}$ of the 6-stage DA for uniform cascode drain voltages ranging from 1.5 to 2.4 V, and it can be seen that the gain varies by less than 1.5 dB throughout the pass-band bandwidth.
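To give a feel for the $2N/(N+1)$ factor quoted above, the short sketch below tabulates it for a few supply counts. It only evaluates the stated class-A expression; the function name is illustrative, and real amplifiers would fall short of these ideal numbers.

```python
def supply_scaling_factor(n_supplies):
    """Ideal class-A efficiency improvement 2N/(N+1) for N scaled supplies."""
    if n_supplies < 1:
        raise ValueError("at least one supply is required")
    return 2.0 * n_supplies / (n_supplies + 1)

for n in (1, 2, 4, 6, 8):
    print(f"N = {n}: efficiency improvement x{supply_scaling_factor(n):.3f}")
```

A single supply gives a factor of 1 (no improvement), six supplies give 12/7, and the factor approaches 2 as $N$ grows, so most of the ideal benefit is already captured by a 6-stage design.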

At low frequencies, the cascode topology displays excellent gain and noise characteristics, but the parasitic capacitance at the interstage node deteriorates the high-frequency performance as the lower effective impedance leaks signal power to ac ground and disturbs the self-circulation of CG channel noise. An interstage inductor can be added between the drain and source of the CS/CG FETs to resonate the associated capacitances, though care must be taken to avoid excessive peaking that could lead to instability and bandwidth reduction. Fig. 4.11 shows a high-frequency model for the cascode gain cell, including dominant capacitances and interstage inductor and assuming input/output parasitics are absorbed into source/load impedances of $Z_0/2$. The transfer function for this model is given by

$$H(s) = \frac{v_{out}}{v_{in}}(s) = \frac{g_{m,CG}Z_0 (g_{m,CS} - sC_{gd,CS})}{2 (g_{m,CG} + sC_{gs,CG}) (1 + s^2C_{gd,CS}L_i) + sC_{gd,CS}}. \quad (4.1)$$
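As a consistency check on (4.1), setting $s = 0$ removes every reactive term and the common-gate transconductance cancels, so the low-frequency gain of the cell is set by the CS device and the $Z_0/2$ load alone:

$$H(0) = \frac{g_{m,CG} Z_0 \, g_{m,CS}}{2 \, g_{m,CG}} = \frac{g_{m,CS} Z_0}{2}.$$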
Figure 4.10: Simulated NMOS DA gain sweeping uniform $V_{DD}$ from 1.5 to 2.4 V, with cascode gate voltage $V_{Cas} = V_{DD}/2 + 0.55$ V.

Figure 4.11: Small-signal and equivalent noise model of a CMOS cascode gain stage with interstage inductance and input/output capacitances absorbed (gray) into artificial transmission lines.
Note that the ideal transfer function does not include the band-pass filter behavior of the SSDA $T$-section. Nevertheless, it can be seen that increasing $L_i$ contributes gain peaking near the high-frequency cutoff to help offset the filter and cascode stage roll-off. Interstage inductance has a similar effect on the cascode high-frequency noise figure. Considering again the gain cell model with FET channel noise included, the equivalent input-referred noise voltage can be expressed as $v_{eq}(s) = v_n(s) + i_n(s) \cdot Z_0/2$, where $v_n$ and $i_n$ are the series input-referred noise voltage and shunt input-referred noise current, respectively, given in (4.2). The theoretical noise factor for this model, $F(s) = 1 + \frac{v_{eq}^2(s)}{v_{n,s}^2(s)}$, where $v_{n,s}$ is the noise voltage attributed to the source impedance $Z_0/2$, also decreases with larger $L_i$, though additional interstage inductance leads to excess ripple and an ultimate degradation of the 3 dB bandwidth. Simulated $S_{21}$ and $NF$ of the NMOS SSDA with various values of cascode interstage inductance are shown in Fig. 4.12. An $L_i$ value of 12 pH contributes gain peaking of approximately 1.2 dB and lowers the in-band $NF$ by 0.2 dB without significantly reducing the bandwidth of the amplifier. The interstage inductance is realized using 8-$\mu$m wide CPW with M6 signal and M5 ground plane to avoid extra parasitic capacitance from top metal vias required to transition to microstrip line ($Q$ factor is not critical for such small inductor values). The actual inductance is slightly larger than the chosen $L_i$ to account for the shunt capacitance of the CPW itself.

\begin{align*}
v_n(s) &= \frac{1}{G(s)} \left( g_{m,CG}i_{d,CS} - sC_{gs,CG}i_{d,CG} \right) \times \\
& \quad \left( g_{m,CG} + s^2 g_{m,CG}L_iC_{gd,CS} + s \left( C_{gd,CS} + C_{gs,CG} + s^2 L_iC_{gd,CS}C_{gs,CG} \right) \right), \\
i_n(s) &= \frac{sC_{gd,CS}}{G(s)} \left( sC_{gs,CG}i_{d,CG} - g_{m,CG}i_{d,CS} \right) \times \\
& \quad \left( g_{m,CG} + sC_{gs,CG} + g_{m,CS} \left( sL_i \left( g_{m,CG} + sC_{gs,CG} \right) - 1 \right) \right), \\
G(s) &= g_{m,CG} \left( g_{m,CS} - sC_{gd,CS} \right) \left( g_{m,CG} + sC_{gs,CG} \right),
\end{align*}

(4.2)
4.1.3 Hybrid SSDA

The independent biases incorporated by the band-pass SSDA enable the possibility of integrating PMOS FET gain stages in conjunction with standard NMOS. As previously mentioned, PMOS devices in advanced technology nodes exhibit promising large-signal characteristics at mm-wave frequencies deriving from their potentially higher voltage stress handling capabilities for the same mean-time-to-failure [50]. This is due to a lower rate of impact ionization and larger Si-SiO$_2$ energy barrier for holes, which result in better resistance to hot carrier injection (HCI) and time-dependent dielectric breakdown (TDDB) effects. Concurrently, short channel lengths lessen the disparity in transistor speed caused by the difference in electron and hole mobility, and overall $f_{\text{max}}$ becomes largely determined by interconnection parasitics. Other analog performance metrics such as $g_m$ and on-current are comparable as well. As a consequence, PMOS amplifiers can achieve higher output powers while maintaining similar gain compared to NMOS. It has been shown that a PMOS device has equal $f_{\text{max}}$ to that of NMOS for the same layout, FET size, and bias voltage magnitudes, and similarly, the input/output capacitances are approximately equal as well (Fig. 4.8). Thus, PMOS FETs of the same width and configuration can simply replace the CS/CG transistors with no additional changes in topology, aside from minor tuning of the optimal interstage inductance. Slightly higher $|V_G|$ is used to bias the PMOS cascode to enhance the current draw and gain.

Figure 4.13: Simulated gain, $P_{\text{sat}}$, PAE, and $OIP_3$ at 50 GHz versus number of PMOS stages in a 6-stage hybrid SSDA.

To take advantage of the higher breakdown, the later gain stages, which experience the largest output voltage swings, in a second SSDA are substituted with PMOS cascode cells to form a hybrid CMOS amplifier for comparison with the full NMOS DA (Fig. 4.4(b)). Since the DA is voltage-limited (i.e., the available current swing is not fully utilized at the load) due to multiple stages for GBW, an increase in voltage supply by a ratio of $\alpha$ raises the theoretical saturated output power ($P_{\text{sat}}$) by $\alpha^2$ and the drain efficiency by $\alpha$. Large-signal simulations at 50 GHz for varying number of PMOS stages in a 6-stage hybrid SSDA are shown in Fig. 4.13 with a final PMOS cascode supply voltage of -2.6 V. It can be seen that for up to three PMOS gain stages, the increased output power leads to larger maximum PAE before it begins to deteriorate due to lower gain.
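The $\alpha^2$ and $\alpha$ scaling rules above translate directly into decibels and efficiency ratios. The sketch below is a minimal illustration of that arithmetic for an ideal voltage-limited amplifier; the example ratio of 2.6 V to 2.4 V is chosen here purely for illustration and is not a measured operating point.

```python
import math

def psat_increase_db(alpha):
    """Ideal P_sat increase in dB when the supply is scaled by alpha (P ~ alpha^2)."""
    return 20.0 * math.log10(alpha)

def drain_efficiency_ratio(alpha):
    """Ideal drain-efficiency ratio when the supply is scaled by alpha."""
    return alpha

alpha = 2.6 / 2.4  # illustrative supply ratio (assumption)
print(f"alpha = {alpha:.3f}")
print(f"P_sat increase   = {psat_increase_db(alpha):.2f} dB")
print(f"drain efficiency = x{drain_efficiency_ratio(alpha):.3f}")
```

For this illustrative ratio the ideal gain in saturated power is a little under 0.7 dB, which shows why even modest increases in tolerable supply voltage are worthwhile in a voltage-limited design.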

Figure 4.15: Simulated $OIP_3$ of the hybrid SSDA (solid) and $g_{m3}$ (dotted) versus gate voltage bias for the NMOS (blue) and PMOS (red) cascode gain cells.

Another advantage that arises from the hybrid SSDA design is a significant improvement in large-signal amplitude-phase (AM-PM) and third-order intermodulation (IM3) nonlinearities. As shown in Fig. 4.14, the same sign swing on a CS gate causes opposite sign input capacitance variation, the dominant source of AM-PM in class-A/AB power amplifiers, in NMOS and PMOS transistors. Since the gate swing of each gain stage has the same phase relative to its drain, the NMOS and PMOS phase nonlinearities cancel, while the constructive interference of output signals is maintained. In addition, PMOS devices have lower magnitude third-order nonlinearity $g_{m3}$ compared to NMOS, leading to larger output third-order intercept point ($OIP_3$). Deeper bias into saturation due to higher $|V_D|$ and $|V_G|$ for PMOS gain stages results in further improved linearity performance for the same dc current. Fig. 4.15 shows the simulated $OIP_3$ behavior of the 6-stage hybrid SSDA with respect to NMOS/PMOS gate bias (holding the other constant), and Fig. 4.13 shows the increase in linearity as NMOS stages are replaced with PMOS. Finally, interstage inductance is simulated to provide up to a 0.3 and 0.1 dB increase in $OIP_3$ and $P_{sat}$, respectively, at 80 GHz.

4.1.4 Distributed TX/RX Front-End

As technology and data rates scale, the demand for transceiver front-ends that span multiple frequency bands has increased. To keep the chip size small, circuit area is a critical design parameter. While distributed circuits evidently provide wide operational bandwidth, the large passive $T$-sections often restrict their usage in mm-wave phased arrays. Here, an integrated distributed transceiver front-end (DTFE) is presented using a shared antenna line topology for area-efficient power (PA) and low-noise amplification (LNA) from $K_u$ to $V$ band (Fig. 4.16). Targeting the same bandwidth as the NMOS and hybrid SSDAs, the DTFE utilizes the previously designed passive components for its band-pass constant-$k$ $T$-sections. The input TX and output RX lines are thus identical to those of the standalone SSDAs, with capacitive loading by TX CS and RX CG NMOS FETs of 32 and 56 $\mu$m, respectively. For the shared antenna line, which acts simultaneously as the output transmission line for the PA and input line for the LNA, the shunt parasitic capacitance budget is shared by the TX CG and RX CS. While the low-frequency gain of a cascode is independent of the CG device $g_m$ (evident in (4.1)), this is not true at mm-wave frequencies, and there exists a trade-off between the PA and LNA gain due to the allowable combined size of the antenna line interfacing FETs. This is shown in Fig. 4.17 for combinations of TX CG and RX CS transistor widths. Since the gain of both paths is less than that of the stand-alone DA, only NMOS cascode cells are used in the DTFE for the best achievable efficiency and noise figure. It is observed that the LNA gain is a stronger function of the RX CS size than the PA gain is of the TX CG size, and final widths of 20 and 36 μm, respectively, are chosen for in-band RX gain of 9 dB and low TX ripple up to 76 GHz. Interstage inductors for the PA and LNA cascodes are optimized for gain and NF bandwidth, respectively.

Figure 4.17: Simulated PA/LNA gain in TX/RX mode ($V_G = 550/450$ mV, $V_D = 2.4/1.8$ V), sweeping combinations of TX CG and RX CS FET widths.

Figure 4.18: Simulated settling times of the DTFE LNA gain stages with zero (red) and -5 dBm (blue) input in response to switched $V_{dd}$.

Care must be taken to protect the LNA devices from the PA output in the shared antenna line. In a TDD system, the anticipated multiplexing scheme for most global mm-wave communication standards, this is most easily accomplished by disconnecting the RX chain from the TX during transmission. Due to parasitic capacitance budget constraints, a high power handling isolating switch is not feasible at the antenna line interface. Instead, a 320 fF HQ-MIM capacitor in series with the RX CS gate input provides dc blocking with a small degradation in input capacitance and conductance. Then, the LNA drain biases are grounded whilst the PA is turned on, and vice versa. The FET channel cutoff protects the devices from HCI and TDDB effects, even for high ac voltage swings of up to $2V_D$, but the switching speed must be investigated. In the band-pass distributed front-end design, the relatively small shunt inductor associated with higher low-frequency cutoff allows quick turn-on/off of the drain current. Fig. 4.18 shows the simulated switching speed of the LNA gain stages with ideal switches, and 2% on and off settling times of 400 and 250 ps, respectively, are observed. Design of the supply voltage switch lies outside the scope of this work, but high voltage handling CMOS SOI switches with ~1 ns switching time have been demonstrated [72], which falls well within projected guard times for 60 GHz TDD standards [73]. Turn-on/off times for the PA and LNA are similar.
4.2 Measurement Results

4.2.1 Small-Signal Performance

S-parameters for the fabricated distributed amplifiers and transceiver front-end are measured using a Keysight E8361A 67-GHz two-port vector network analyzer (VNA) with Keysight N5260A 110-GHz mm-wave extender controller and test head modules. The SSDAs and DTFE are probed on-wafer using GGB Industries 110-GHz GSG and 67-GHz GSGSG probes, respectively, with unused ports per measurement terminated by a series connection of 67-GHz attenuator and RF 50-Ω termination for high-frequency matching. Other transmission line terminations are provided by on-chip polysilicon resistors. Chip microphotographs of the designed circuits are shown in Fig. 4.19. The NMOS and hybrid SSDAs each have a total circuit area of 2.80 mm × 0.52 mm (2.36 mm × 0.34 mm excluding pads), and the DTFE measures 2.80 mm × 0.65 mm (2.36 mm × 0.48 mm excluding pads). Standard short-open-load-thru calibration on a GGB Industries CS-2 calibration substrate is done to move the measurement reference plane to the chip pads. Furthermore, short-open de-embedding of the RF pads is performed via characterization of GSG(SG) test structures.

Fig. 4.20 plots the measured and simulated S-parameters of the NMOS and hybrid SSDAs and DTFE. The full NMOS amplifier achieves a peak small-signal gain of 13 dB with a 3 dB pass-band bandwidth from 10-82 GHz (72 GHz), corresponding to a GBW product of 322 GHz. The measured gain ($S_{21}$) is in good agreement with simulations, and the low-frequency ripple is flatter than expected. Input and output return losses ($S_{11}, S_{22}$) are better than 10 dB from 11-86 GHz, reverse isolation ($S_{12}$) is greater than 25 dB, and the amplifier is unconditionally stable across the entire measured frequency range. Six independent supply voltages ranging from 1.5 to 2.4 V consume a total of 182 mW of dc power nominally.

Figure 4.19: Chip microphotographs of the (a) NMOS and (b) hybrid SSDAs (2.80 mm x 0.52 mm) and (c) distributed transceiver front-end (2.80 mm x 0.65 mm).

Figure 4.20: Measured (red) and simulated (black) $S$-Parameters and noise figure (blue) of the (a) NMOS and (b) hybrid SSDAs and DTFE (c) PA and (d) LNA paths.

The three NMOS gain stages of the hybrid SSDA are biased identically to the first half of the NMOS SSDA, while the later PMOS stages have nominal $V_G$ of -650 mV and $V_D$ from -2.2 to -2.7 V to enhance the lower current draw of the PMOS FETs and allow higher final voltage swing. The resulting dc power consumption totals 164 mW. The measured $S_{21}$ of the hybrid SSDA is 12.6 dB with 3 dB bandwidth from 11-83 GHz (GBW of 307 GHz), which is slightly less than that of the full NMOS amplifier. $S_{11}$ and $S_{22}$ are less than -10 dB from 10-85 GHz, $S_{12}$ is below -26 dB, and the amplifier is unconditionally stable throughout.
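The quoted GBW products follow from multiplying the linear voltage gain by the 3 dB bandwidth. The sketch below reproduces that arithmetic from the measured gains and band edges stated above; the function name is illustrative.

```python
def gbw_ghz(gain_db, f_low_ghz, f_high_ghz):
    """Gain-bandwidth product: linear voltage gain times 3 dB bandwidth in GHz."""
    voltage_gain = 10.0 ** (gain_db / 20.0)
    return voltage_gain * (f_high_ghz - f_low_ghz)

nmos_gbw = gbw_ghz(13.0, 10.0, 82.0)     # NMOS SSDA: 13 dB over 10-82 GHz
hybrid_gbw = gbw_ghz(12.6, 11.0, 83.0)   # hybrid SSDA: 12.6 dB over 11-83 GHz

print(f"NMOS SSDA GBW   = {nmos_gbw:.0f} GHz")
print(f"Hybrid SSDA GBW = {hybrid_gbw:.0f} GHz")
```

Both values round to the 322 GHz and 307 GHz figures reported in the text.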

The DTFE small-signal performance is characterized for both the TX and RX path up to 70 GHz, but the 67-GHz GSGSG probes are limited in loss and connectivity to the VNA mm-wave extender for upper V band $S$-parameters. Nevertheless, both modes match the simulated behavior reasonably well, and the high-frequency cutoffs can be extrapolated from the measured data. For the PA path with LNA turned off ($V_{D,LNA}$ grounded), the small-signal gain is 11.7 dB with a 3 dB projected pass-band bandwidth from 12-76 GHz (64 GHz). The TX-RX isolation ($S_{31}$) is less than -8 dB across the band, and the signal path can potentially be used for PA calibration and digital pre-distortion (DPD) schemes in lieu of directional couplers. With the PA off, the LNA has 9 dB gain ($S_{32}$) over an 11-77 GHz (66 GHz) bandwidth. The TX path consumes 193 mW of dc power from six voltage supplies ranging from 1.7 to 2.4 V. Since the LNA is not expected to produce high output swing, $V_{G,LNA}$ is set to 450 mV and all $V_{D,LNA}$ to 1.8 V for total power consumption of 88 mW, though the bias levels can be increased for slight $S_{32}$ gain improvement.

Group delays of the measured circuits are extracted in Fig. 4.21. The NMOS and hybrid SSDAs exhibit less than 20 ps variation from 28-80 GHz, and the DTFE PA/LNA from 35-69 GHz. Exceptional flatness of the group delay with respect to frequency enables ultra-wideband amplification of impulse signals without phase distortion.

Noise figures of the SSDAs and DTFE LNA are measured up to 50 GHz using the gain method. A Keysight E4448A 50-GHz spectrum analyzer is used to observe the noise floor increase of the cascade of device-under-test (DUT) and a characterized Centellax TA2U50HA 2-50 GHz PA, from which the $NF$ is calculated by the Friis formula

$$NF_{DUT} = 10 \log (f_{DUT}) = 10 \log \left( \frac{f_{MEAS}}{g_{DUT}} - \frac{f_{PA} - 1}{g_{DUT}} \right), \quad (4.3)$$

where $f_{MEAS}$ is the difference between measured noise and the amplified thermal noise floor, $f_{PA}$ is the noise factor of the TA2U50HA, and $g_{DUT}$ is the DUT gain. Minimum $NF$s for the NMOS and hybrid SSDAs and DTFE LNA are 5.3, 4.9, and 6.2 dB, respectively, and the $NF$ versus frequency curves agree well with simulations up to 50 GHz (Fig. 4.20). $NF$s of the SSDAs across the operating bandwidths can be marginally improved by decreasing the gate voltage magnitudes by approximately 0.1 V (at which the LNA is already biased).
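The de-embedding step in the gain method rests on the standard Friis cascade relation, $f_{total} = f_{DUT} + (f_{PA} - 1)/g_{DUT}$. The sketch below is a minimal illustration of that relation and its inversion. The function names are hypothetical, the cascade noise figure is passed in directly rather than derived from a spectrum analyzer reading, and the 6 dB post-amplifier noise figure is an assumed placeholder, not a datasheet or measured value.

```python
import math

def db_to_lin(x_db):
    """Convert a power ratio from dB to linear."""
    return 10.0 ** (x_db / 10.0)

def lin_to_db(x_lin):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(x_lin)

def cascade_noise_factor(f_dut, g_dut, f_post):
    """Friis formula: noise factor of a DUT followed by a post-amplifier."""
    return f_dut + (f_post - 1.0) / g_dut

def deembed_dut_nf_db(nf_total_db, gain_dut_db, nf_post_db):
    """Invert the Friis formula to recover the DUT noise figure in dB."""
    g_dut = db_to_lin(gain_dut_db)
    f_dut = db_to_lin(nf_total_db) - (db_to_lin(nf_post_db) - 1.0) / g_dut
    if f_dut < 1.0:
        raise ValueError("inconsistent inputs: DUT noise factor below 1")
    return lin_to_db(f_dut)

# Round trip with illustrative numbers (post-amplifier NF is an assumption).
NF_DUT_DB, GAIN_DUT_DB, NF_POST_DB = 5.3, 13.0, 6.0
nf_total_db = lin_to_db(cascade_noise_factor(
    db_to_lin(NF_DUT_DB), db_to_lin(GAIN_DUT_DB), db_to_lin(NF_POST_DB)))
recovered_db = deembed_dut_nf_db(nf_total_db, GAIN_DUT_DB, NF_POST_DB)

print(f"cascade NF   = {nf_total_db:.2f} dB")
print(f"recovered NF = {recovered_db:.2f} dB")
```

Because the post-amplifier contribution is divided by the DUT gain, its effect on the cascade noise figure is small for a 13 dB amplifier, which is what makes the gain method practical for these circuits.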
4.2.2 Large-Signal Performance

Figure 4.22: Measured gain (dashed) and power-added efficiency (marker) of the SSDAs and DTFE PA/LNA modes at 50 GHz.

Large-signal measurements are performed at frequency intervals of 5-10 GHz across the operation bandwidths of the circuits. Up to 50 GHz, a Keysight E8257D 67-GHz analog signal generator provides the input signal to the Centellax pre-amplifier, and the DUT input and output power are monitored by Keysight N1911A power meters with Keysight 8487A 50-GHz power sensors. Measured power, gain, and efficiency at 50 GHz are shown in Fig. 4.22 for the NMOS and hybrid SSDAs and DTFE PA/LNA. The NMOS SSDA has a 1 dB gain-compressed output power $P_{-1dB}$ of 13 dBm and $P_{sat}$ of 17.2 dBm, and peak PAE is 17.4%. The hybrid SSDA exhibits $P_{-1dB}$ of 13.3 dBm, $P_{sat}$ of 17.5 dBm, and peak PAE of 20.2%. Finally, the DTFE PA has $P_{-1dB}$ of 11.5 dBm, $P_{sat}$ of 17 dBm, and peak PAE of 14.2% while the LNA shows $P_{-1dB}$ of 7.5 dBm and $P_{sat}$ of 14 dBm.

Figure 4.23: Measured saturated output power (solid), 1 dB gain-compressed output power (dotted), and peak PAE (dashed) over the operating bandwidth.

From 50-75 GHz, a Quinstar QMM-series active quadrupler and QAM-series full band amplifier chain is utilized to drive the DUT, which outputs to a Keysight V8486A 50-75 GHz power sensor. Fig. 4.23 plots the measured and simulated $P_{\text{sat}}$, $P_{-1dB}$, and peak PAE of the circuits across the $K_u$ to $V$ frequency bands. 3 dB power bandwidths of the SSDAs and DTFE PA/LNA are all larger than 60 GHz. It can be seen that the hybrid SSDA achieves greater saturated output power and PAE than the NMOS SSDA at all frequency points across the band. Further increase of the final PMOS stage supply voltage magnitude up to 3.0 V enables 0.7 dB larger $P_{\text{sat}}$ before the onset of transistor breakdown effects is observed in the dc current drawn. On the other hand, the NMOS SSDA final stage supply can only be increased to 2.6 V. These results are corroborated by previous short-term reliability tests done in this technology, which indicated that PMOS cascodes were stable at higher voltages than NMOS. Thus, it is evident that the PMOS gain stages incorporated in the hybrid SSDA enable superior output power and efficiency performance at mm-wave. Although the DTFE PA does not employ PMOS FETs and is impacted by the shared line LNA, it still maintains a relatively high $P_{sat}$.

Figure 4.24: Two-tone linearity measurement of the SSDAs and DTFE LNA at 50 GHz.

Two-tone linearity measurements are conducted on the SSDAs and DTFE LNA using two analog signal generators supplying input tones 50 MHz apart, which are combined through a magic tee, and the DUT output captured on the spectrum analyzer with Keysight 11970V 50-75 GHz harmonic mixer for extended frequency range. Fig. 4.24 shows the linearity of the circuits at 50 GHz, and measured output third-order intercept points ($OIP_3$) of 21.3, 25.0, and 17.6 dBm are observed for the NMOS and hybrid SSDAs and DTFE LNA, respectively. $OIP_3$ values across the operation bandwidth are plotted in Fig. 4.25. The hybrid SSDA exhibits at least 2 dB greater $OIP_3$ than the NMOS amplifier at all frequencies. The DTFE LNA maintains better than 10 dBm $OIP_3$ across the band.
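For readers unfamiliar with how an intercept point is read off a two-tone spectrum, the sketch below applies the usual extrapolation $OIP_3 = P_{fund} + (P_{fund} - P_{IM3})/2$, with all powers in dBm. It assumes operation in the small-signal region where IM3 products rise 3 dB per dB of input, and the tone powers used are made-up illustrative numbers, not measured data from this work.

```python
def oip3_dbm(p_fund_dbm, p_im3_dbm):
    """Extrapolated output third-order intercept from one two-tone reading.

    p_fund_dbm: output power of one fundamental tone (dBm)
    p_im3_dbm:  output power of one third-order intermodulation product (dBm)
    """
    return p_fund_dbm + (p_fund_dbm - p_im3_dbm) / 2.0

# Illustrative reading (assumed values): 0 dBm per tone, IM3 products at -50 dBm.
example = oip3_dbm(0.0, -50.0)
print(f"OIP3 = {example:.1f} dBm")
```

In practice the reading is taken at several input powers well below compression so that the 3:1 slope assumption can be verified before extrapolating.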

Figure 4.25: Measured $OIP_3$ and 3 dB back-off phase distortion over the operating bandwidth.

Figure 4.26: AM-PM linearity measurement of the SSDAs and DTFE PA at 50 GHz.

Figure 4.27: Measurement setup for 5 GHz wideband 16-QAM modulation.

Figure 4.28: Measured EVM with respect to constellation peaks for 5 GHz 16-QAM signals centered at 42.5 (solid) and 47.5 (dashed) GHz.

AM-PM nonlinearities of the SSDAs and DTFE PA are also measured using the network analyzer. Power calibration of the VNA is performed at frequency intervals with either the Centellax or Quinstar pre-amplifier, and the phase difference in the large-signal S-parameter measurement is recorded. Fig. 4.26 shows the static AM-PM distortion of the circuits at 50 GHz. The hybrid SSDA exhibits flatter phase response than full NMOS for up to 17 dBm output, and both amplifiers have less than 12 degrees of total variation. Measured AM-PM values at 3 dB back-off for the SSDAs and DTFE PA across the band are shown in Fig. 4.25 as well, and it can be seen that the hybrid SSDA achieves superior phase linearity at all frequencies compared to the NMOS SSDA.

In order to demonstrate the feasibility of real ultra-wideband mm-wave signal amplification with the designed circuits, the instrument setup shown in Fig. 4.27 is used to generate and record 5 GHz bandwidth 16-QAM modulated data (20 Gb/s) centered at 42.5 and 47.5 GHz. A Keysight M8195A 65 GSa/s arbitrary waveform generator (AWG) produces the intermediate frequency (IF) modulated signal centered at 10 GHz with 5.4 dB peak-to-average power ratio (PAPR), which is up-converted by a Marki ML1-1050L double-balanced mixer with local oscillator (LO) frequency of 32.5/37.5 GHz. The DUT output is downconverted by another mixer through high-side injection LO of 46/51 GHz and captured by a Keysight DSO80604B 6-GHz real-time oscilloscope, and the channel is equalized using Keysight 89600 VSA software. Measured 16-QAM constellations for 42.5 and 47.5 GHz carriers are shown in Fig. 4.29, and recorded error vector magnitude (EVM) values are plotted versus peak output power in Fig. 4.28. The SSDAs and DTFE PA achieve less than 6% aggregate EVM referenced to constellation peak in either frequency band (modulation setup contributes 4.5% EVM), and the modulation symbol rate is limited only by the oscilloscope bandwidth and filter cutoff frequencies. The hybrid SSDA exhibits the lowest distortion in both bands up to 12.4 dBm peak output power.
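The frequency plan and data rate of this setup can be checked with a few lines of arithmetic. The sketch below assumes upper-sideband up-conversion (RF = LO + IF) and a symbol rate equal to the 5 GHz occupied bandwidth, both of which are inferred from the numbers quoted above rather than stated explicitly; the function names are illustrative.

```python
BITS_PER_SYMBOL = 4  # 16-QAM carries log2(16) = 4 bits per symbol

def data_rate_gbps(symbol_rate_gbaud, bits_per_symbol=BITS_PER_SYMBOL):
    """Raw bit rate for a given symbol rate."""
    return symbol_rate_gbaud * bits_per_symbol

def upconvert_ghz(lo_ghz, if_ghz):
    """Upper-sideband mixing product (assumed): RF = LO + IF."""
    return lo_ghz + if_ghz

def downconvert_high_side_ghz(lo_ghz, rf_ghz):
    """High-side injection: output IF = LO - RF."""
    return lo_ghz - rf_ghz

IF_IN_GHZ, BW_GHZ = 10.0, 5.0
plan = []
for lo_up, lo_down in ((32.5, 46.0), (37.5, 51.0)):
    rf = upconvert_ghz(lo_up, IF_IN_GHZ)
    if_out = downconvert_high_side_ghz(lo_down, rf)
    plan.append((rf, if_out, if_out - BW_GHZ / 2, if_out + BW_GHZ / 2))

print(f"data rate = {data_rate_gbps(5.0):.0f} Gb/s")
for rf, if_out, lo_edge, hi_edge in plan:
    print(f"RF {rf} GHz -> IF {if_out} GHz, occupying {lo_edge}-{hi_edge} GHz")
```

Under these assumptions the down-converted signal occupies 1 to 6 GHz in both bands, which sits right at the edge of the 6-GHz oscilloscope and is consistent with the remark that the symbol rate is limited by the instrument bandwidth.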
Figure 4.29: Measured constellations for 5 GHz 16-QAM signals centered at (a) 42.5 and (b) 47.5 GHz for the (i) NMOS and (ii) hybrid SSDAs and (iii) DTFE PA.
Table 4.1 summarizes the performance of the presented circuits alongside similar published distributed and wideband power and low-noise amplifiers in silicon. The NMOS SSDA achieves the largest reported GBW product of any CMOS SOI distributed power amplifier, and the hybrid SSDA leads all silicon DAs in peak power efficiency, to the author’s knowledge. Compared to other wideband PAs, the SSDAs and DTFE exhibit similar $P_{\text{sat}}$, $P_{-1dB}$, and PAE, and $OIP_3$ and $NF$ values are comparable to those of reported V band LNAs. The presented circuits achieve 3× greater data rates than [74], with the potential to further increase modulated bandwidths. Figures of merit $FOM_{1/2}$ are proposed for the comparison of ultra-wideband TX and RX amplifiers:

$$FOM_1 = \frac{P_{\text{sat}} \times \text{PAE} \times \text{BW}}{P_{\text{dc}}} \quad \text{and} \quad FOM_2 = \frac{OIP_3 \times \text{BW}}{(F - 1) \times P_{\text{dc}}},$$

where $F$ is the noise factor, and powers are expressed in mW. The DTFE PA/LNA exhibit higher $FOM$s than other reported amplifiers, excepting [75], and the hybrid SSDA outperforms all circuits by far with $FOM_{1/2}$ values of 499 and 66.4 GHz, respectively. The demonstrated results support the SSDA as a viable design architecture for wideband power and low-noise amplification.
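The sketch below evaluates the two figures of merit for the NMOS and hybrid SSDAs from the measured values reported in this chapter. It assumes that PAE enters $FOM_1$ in percent and that BW is the 3 dB bandwidth in GHz; these conventions are not stated explicitly but reproduce the tabulated numbers. Function and variable names are illustrative.

```python
def dbm_to_mw(p_dbm):
    """Convert power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)

def fom1_ghz(psat_dbm, pae_percent, bw_ghz, pdc_mw):
    """TX figure of merit: P_sat (mW) x PAE (%) x BW (GHz) / P_dc (mW)."""
    return dbm_to_mw(psat_dbm) * pae_percent * bw_ghz / pdc_mw

def fom2_ghz(oip3_dbm, nf_db, bw_ghz, pdc_mw):
    """RX figure of merit: OIP3 (mW) x BW (GHz) / ((F - 1) x P_dc (mW))."""
    noise_factor = 10.0 ** (nf_db / 10.0)
    return dbm_to_mw(oip3_dbm) * bw_ghz / ((noise_factor - 1.0) * pdc_mw)

# Measured values quoted in the text (72 GHz bandwidth for both SSDAs).
nmos = (fom1_ghz(17.2, 17.4, 72.0, 182.0), fom2_ghz(21.3, 5.3, 72.0, 182.0))
hybrid = (fom1_ghz(17.5, 20.2, 72.0, 164.0), fom2_ghz(25.0, 4.9, 72.0, 164.0))

print(f"NMOS SSDA:   FOM1 = {nmos[0]:.0f} GHz, FOM2 = {nmos[1]:.1f} GHz")
print(f"Hybrid SSDA: FOM1 = {hybrid[0]:.0f} GHz, FOM2 = {hybrid[1]:.1f} GHz")
```

With these conventions the results agree with the 361/22.3 GHz and 499/66.4 GHz entries listed for the two SSDAs in Table 4.1.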

### 4.3 Conclusion

This chapter has introduced the usage of PMOS transistors in mm-wave CMOS SOI distributed amplifiers to further increase the power efficiency improvements provided by the supply-scaling technique while enhancing the achievable output power and linearity. Hybrid and corresponding NMOS SSDAs are fabricated in a 45-nm RF CMOS SOI technology, and the demonstrated circuits achieve bandwidths greater than 70 GHz and peak saturated output powers of 17.5 dBm with 20.2% PAE. A distributed transceiver front-end utilizing a shared antenna line topology is also presented for use in ultra-wideband radio communications systems. The front-end PA exhibits an output power of 17 dBm at 50 GHz, and the LNA has minimum $NF$ of 6.2 dB. High data rates of 20 Gb/s employing 16-QAM modulation are shown with less than 6% EVM referenced to constellation peaks.

Acknowledgement

This chapter is mostly a reprint of the material submitted to IEEE Transactions on Microwave Theory and Techniques, K. Fang; J. F. Buckwalter. The dissertation author was the primary author of this material. The authors would like to acknowledge Integrand Software for the use of EMX, MOSIS and GlobalFoundries for chip fabrication, and the National Science Foundation for support through a CAREER research award and Graduate Fellowship. Additionally, the support of the Office of Naval Research is appreciated. They also thank Peter Asbeck, Cooper Levy, Bagher Rabet, and Narek Rostomyan for technical discussions and measurement assistance.
Table 4.1: Comparison To Published Silicon Distributed and Wideband Amplifiers

<table>
<thead>
<tr>
<th>Reference</th>
<th>Gain (dB)</th>
<th>BW (GHz)</th>
<th>$P_{sat}$ (dBm)</th>
<th>$P_{-1dB}$ (dBm)</th>
<th>PAE (%)</th>
<th>$OIP_3$ (dBm)</th>
<th>NF (dB)</th>
<th>$FOM_{1/2}$ (GHz)</th>
<th>Area (mm²)</th>
<th>Technology</th>
</tr>
</thead>
<tbody>
<tr>
<td>[23]</td>
<td>11</td>
<td>5-90</td>
<td>15</td>
<td>12</td>
<td>6.8</td>
<td>15.5</td>
<td>4.8</td>
<td>87.0/7.11</td>
<td>1.28</td>
<td>120-nm SOI</td>
</tr>
<tr>
<td>[60]</td>
<td>9</td>
<td>4-86</td>
<td>11</td>
<td>10</td>
<td>7.7</td>
<td>15.5</td>
<td>5.5</td>
<td>88.3/12.7</td>
<td>1.05</td>
<td>120-nm SOI</td>
</tr>
<tr>
<td>[76]</td>
<td>22</td>
<td>dc-65</td>
<td>13*</td>
<td>10*</td>
<td>10.2</td>
<td>–</td>
<td>6.9</td>
<td>136/–</td>
<td>0.93†</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[61]</td>
<td>14</td>
<td>dc-74</td>
<td>–</td>
<td>3.2</td>
<td>2.4</td>
<td>–</td>
<td>–</td>
<td>–/–</td>
<td>1.72</td>
<td>90-nm CMOS</td>
</tr>
<tr>
<td>[17]</td>
<td>10</td>
<td>dc-110</td>
<td>17.5</td>
<td>16.7</td>
<td>13.2</td>
<td>–</td>
<td>–</td>
<td>–/–</td>
<td>2.18</td>
<td>130-nm SiGe</td>
</tr>
<tr>
<td>[70]</td>
<td>12</td>
<td>14-105</td>
<td>17.0</td>
<td>14.9</td>
<td>12.6</td>
<td>–</td>
<td>–</td>
<td>–/–</td>
<td>1.51</td>
<td>90-nm SiGe</td>
</tr>
<tr>
<td>[21]</td>
<td>8.5</td>
<td>dc-135</td>
<td>–</td>
<td>10</td>
<td>7.9</td>
<td>–</td>
<td>5.7</td>
<td>–/–</td>
<td>0.36</td>
<td>55-nm SiGe</td>
</tr>
<tr>
<td>[74]</td>
<td>20.8</td>
<td>29-57</td>
<td>16.6</td>
<td>13.4</td>
<td>24.2</td>
<td>–</td>
<td>–</td>
<td>–/–</td>
<td>0.16†</td>
<td>28-nm CMOS</td>
</tr>
<tr>
<td>[75]</td>
<td>18.6</td>
<td>22-33</td>
<td>–</td>
<td>3</td>
<td>–</td>
<td>12.9</td>
<td>4.5</td>
<td>–/6.55</td>
<td>0.46†</td>
<td>180-nm SiGe</td>
</tr>
<tr>
<td>[66]</td>
<td>12.3</td>
<td>36-92</td>
<td>17.2</td>
<td>14.4</td>
<td>9.2</td>
<td>–</td>
<td>–</td>
<td>100/–</td>
<td>0.15†</td>
<td>130-nm SiGe</td>
</tr>
<tr>
<td>[77]</td>
<td>17</td>
<td>46-63</td>
<td>–</td>
<td>–1</td>
<td>–</td>
<td>–</td>
<td>4.4</td>
<td>–/–</td>
<td>0.59</td>
<td>90-nm CMOS</td>
</tr>
<tr>
<td>[78]</td>
<td>16.3</td>
<td>49-74</td>
<td>23.2</td>
<td>19.6</td>
<td>10</td>
<td>–</td>
<td>–</td>
<td>28.2/–</td>
<td>2.04</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[79]</td>
<td>7</td>
<td>24-54</td>
<td>–</td>
<td>2</td>
<td>3.5</td>
<td>–</td>
<td>4</td>
<td>–/–</td>
<td>0.15†</td>
<td>40-nm CMOS</td>
</tr>
<tr>
<td>[80]</td>
<td>13</td>
<td>40-67</td>
<td>13.3</td>
<td>12</td>
<td>16</td>
<td>–</td>
<td>–</td>
<td>88.8/–</td>
<td>0.06†</td>
<td>28-nm CMOS</td>
</tr>
<tr>
<td>[81]</td>
<td>20</td>
<td>42-66</td>
<td>20.6</td>
<td>17.6</td>
<td>20.3</td>
<td>–</td>
<td>–</td>
<td>98.8/–</td>
<td>0.43†</td>
<td>90-nm CMOS</td>
</tr>
<tr>
<td>[82]</td>
<td>17.7</td>
<td>55-90</td>
<td>5.5</td>
<td>3.5</td>
<td>5.4</td>
<td>–</td>
<td>–</td>
<td>–/–</td>
<td>0.37</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>[83]</td>
<td>10</td>
<td>2-16</td>
<td>18.5</td>
<td>15.5</td>
<td>17</td>
<td>22</td>
<td>–</td>
<td>92.6/–</td>
<td>0.82</td>
<td>130-nm CMOS</td>
</tr>
<tr>
<td>[84]</td>
<td>18.5</td>
<td>63-93</td>
<td>–</td>
<td>2.5</td>
<td>6.5</td>
<td>–</td>
<td>5.5</td>
<td>–/–</td>
<td>0.24</td>
<td>65-nm CMOS</td>
</tr>
<tr>
<td>NMOS DA</td>
<td>13</td>
<td>10-82</td>
<td>17.2</td>
<td>13</td>
<td>17.4</td>
<td>21.3</td>
<td>5.3</td>
<td>361/22.3</td>
<td>0.8†</td>
<td>45-nm SOI</td>
</tr>
<tr>
<td>Hybrid DA</td>
<td>12.6</td>
<td>11-83</td>
<td>17.5</td>
<td>13.3</td>
<td>20.2</td>
<td>25</td>
<td>4.9</td>
<td>499/66.4</td>
<td>0.8†</td>
<td>45-nm SOI</td>
</tr>
<tr>
<td>DTFE PA / LNA</td>
<td>11.7</td>
<td>12-76</td>
<td>17</td>
<td>11.5</td>
<td>14.2</td>
<td>–</td>
<td>–</td>
<td>–/236/–</td>
<td>1.13†</td>
<td>45-nm SOI</td>
</tr>
</tbody>
</table>

* Differential operation, † Area w/o pads
Chapter 5

Conclusions

This dissertation presents the analysis, design, and measurement results of distributed circuits in integrated silicon technologies for ultra-wideband amplification in wireless communications systems.

The first portion of this dissertation presents the history of traveling-wave and distributed amplifiers and describes the landscape for modern DA designs. Notable prior work and the state of the art are discussed, and the performance metrics most important to modern UWB system architectures are presented.

The second portion of this dissertation presents an X- to W-band (14-105 GHz) band-pass distributed amplifier implemented in a 90-nm SiGe BiCMOS process. This circuit introduces the novel supply-scaling technique for efficiency enhancement in UWB mm-wave DAs. The design methodology load-pulls the individual gain stages for wideband efficiency improvement and is analyzed from the perspective of forward and reverse traveling waves at successive output nodes. Compared to analogous impedance-tapering techniques, supply scaling is demonstrated to have distinct advantages in bandwidth, mm-wave gain and output power, and design complexity. The fabricated SSDA exhibits record GBW and efficiency performance in SiGe BiCMOS, to the author’s knowledge. Further work would include integration of dc-dc converters for drain voltage supply stepping.

The third portion of this dissertation presents a C-band (5.3-7.3 GHz) distributed tunable active quasi-circulator implemented in a 45-nm CMOS SOI process. The QC incorporates supply-scaling to improve the efficiency of an integrated PA and allow high output power. Additionally, band-pass sections enable independent tuning of TX-RX isolation phase shifters. Stacked-FET switched capacitors and varactors form continuous tuning elements that are capable of handling TX power. Compared to other active circulators and quasi-circulators in silicon, the fabricated circuit demonstrates the largest power handling and fractional operating bandwidth with >30 dB isolation, to the author’s knowledge. To mitigate the larger noise figure inherent to the design, future work would include a termination-less input line and PA notch-filter degeneration, as described in the literature.
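As a rough illustration of the fractional-bandwidth figure of merit mentioned above, the following sketch evaluates it for the 5.3-7.3 GHz band quoted for the quasi-circulator. It assumes the common definition of bandwidth divided by the arithmetic center frequency. The function name `fractional_bandwidth` is hypothetical, and the band over which >30 dB isolation is actually measured may differ from the quoted band edges.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Fractional bandwidth, assumed here to be (f_high - f_low) / f_center,
    with f_center taken as the arithmetic mean of the band edges."""
    if not 0 < f_low_ghz < f_high_ghz:
        raise ValueError("band edges must satisfy 0 < f_low < f_high")
    f_center_ghz = (f_low_ghz + f_high_ghz) / 2.0
    return (f_high_ghz - f_low_ghz) / f_center_ghz


# Band edges quoted in the text for the C-band quasi-circulator (assumed
# here to coincide with the operating bandwidth).
qc_fbw = fractional_bandwidth(5.3, 7.3)
print(f"Quasi-circulator fractional bandwidth: {qc_fbw:.1%}")
```

Under these assumptions the quoted band corresponds to a fractional bandwidth of roughly 32%. A geometric-mean center frequency would give a slightly different value.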

The final portion of this dissertation presents two Kᵤ- to V-band SSDAs and a distributed transceiver front-end (DTFE) for power and low-noise amplification in UWB mm-wave phased arrays. The circuits are fabricated in a 45-nm RF CMOS SOI process with a mm-wave-optimized BEOL. Supply scaling in CMOS SOI enables independent biasing of deeply scaled PMOS gain stages, whose phase properties, complementary to those of NMOS devices, and lower IM3 distortion provide superior amplifier linearity. The greater voltage-stress tolerance and comparable high-frequency gain of PMOS devices also result in greater output power and efficiency. Meanwhile, relatively low losses through the BEOL stack and the high-resistivity substrate enable the design of a compact integrated UWB PA/LNA. The fabricated SSDAs and DTFE achieve the largest single-ended OIP₃, Pₛₐₜ, and PAE of any mm-wave DA in silicon, to the author’s knowledge. Further work would include the design of integrated drain switches for full TDD operation and adaptive biasing for optimal linearity and efficiency.
Bibliography


[40] C.-L. Yang, S.-Y. Shu, and Y.-C. Chiang, “Analysis and design of a chip filter with low insertion loss and two adjustable transmission zeros using 0.18-μm


