Title
Low Power CMOS Circuit Techniques for Optical Interconnects and High Speed Pulse Compression Radar

Permalink
https://escholarship.org/uc/item/9ff3552t

Author
Li, Jun

Publication Date
2015

Peer reviewed|Thesis/dissertation
Low Power CMOS Circuit Techniques for Optical Interconnects and High Speed Pulse Compression Radar

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering (Electronic Circuits and Systems)

by

Jun Li

Committee in charge:

Professor James F. Buckwalter, Chair
Professor Peter Asbeck
Professor Gert Cauwenberghs
Professor Chung Kuan Cheng
Professor Dan Sievenpiper

2015
The dissertation of Jun Li is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego

2015
DEDICATION

To my parents - Mrs. Fengxian Liu and Mr. Yuanshan Li.
# TABLE OF CONTENTS

Signature Page ................................................. iii
Dedication ....................................................... iv
Table of Contents ............................................. v
List of Figures .................................................. viii
List of Tables ................................................... xi
Acknowledgements ............................................. xii
Vita ............................................................... xiv
Abstract of the Dissertation ................................. xv

## Chapter 1
Introduction: Energy Efficient Optical Transceiver and High Resolution Range Sensor ............................................. 1
  1.1 Monolithic Energy Efficient Transmitter ..................... 3
  1.2 Scaling Trends for Silicon Photonic Interconnects in CMOS SOI and FinFET Process .......................... 3
  1.3 High Speed Analog Radar Signal Processor With Offset Calibration ........................................ 4
  1.4 Radar Signal Processor With an IF Correlation Technique ..... 5
  1.5 Dissertation Organization .................................. 5

## Chapter 2
Energy Efficient Transmitter for Optical Interconnects .... 7
  2.1 Silicon Photonic Transmitter ................................ 7
    2.1.1 High Speed Modulator .................................. 7
    2.1.2 Modulator Driver ......................................... 11
  2.2 Measurement Results ...................................... 13
  2.3 Conclusions ................................................ 16

## Chapter 3
Scaling Trends for Silicon Photonic Interconnects in CMOS SOI and FinFET Process ...................................... 17
  3.1 WDM Link for Optical Interconnects ....................... 17
    3.1.1 Link Budget ............................................. 18
  3.2 Silicon Photonic Devices .................................. 20
    3.2.1 Micro-Ring Modulator .................................. 21
    3.2.2 High Speed Photodetector ............................... 22
    3.2.3 Wavelength Multiplexing and De-Multiplexing .... 22
    3.2.4 Tuning and control of ring resonator devices .... 23
    3.2.5 Photonic Component Scaling Trends .......................... 24
  3.3 Driver and Receiver Circuit Design ............................. 25
    3.3.1 Modulator Driver ........................................... 25
    3.3.2 CMOS Push-Pull Amplifiers ................................ 27
    3.3.3 Transimpedance Amplifier Stage ........................... 28
    3.3.4 Transconductance Amplifier Stage ........................ 31
    3.3.5 Cascade of Transimpedance and Transconductance Stages .... 32
    3.3.6 CMOS Device Technology: 14-nm FinFET and 28-nm CMOS SOI .... 34
    3.3.7 Proposed Transceiver in 14-nm FinFET and 28-nm CMOS SOI .... 36
    3.3.8 Contribution of Photonic Elements ......................... 39
    3.3.9 WDM Optical Interconnect Energy-per-Bit ................. 40
  3.4 Conclusions ...................................................... 44

## Chapter 4
High Speed Analog Radar Signal Processor With Offset Calibration ..... 45
  4.1 Pulse Compression Radar System ................................ 45
  4.2 Trade-offs in the PCR System ................................... 47
  4.3 High Speed Analog Correlation Technique ..................... 49
    4.3.1 Variable Gain Amplifier (VGA) ............................. 52
    4.3.2 Wideband Analog Correlator ............................... 53
    4.3.3 Wide Range Delay Lock Loop (DLL) ....................... 54
    4.3.4 High Speed Analog-to-Digital Converter (ADC) ......... 55
  4.4 Calibration Techniques for PCR System ....................... 57
    4.4.1 Digital-assisted Offset Calibration ....................... 58
    4.4.2 Template Alignment and Duty Cycle Distortion .......... 62
  4.5 Experimental Results ........................................... 64
    4.5.1 Wide Range Delay Lock Loop ............................. 66
    4.5.2 DC Offset Calibration ...................................... 68
    4.5.3 Misalignment And Duty Cycle Calibration ................ 68
  4.6 Conclusions ...................................................... 70

## Chapter 5
3 Gb/s Radar Signal Processor With an IF-Correlation Technique ..... 72
  5.1 Bidirectional System for PCR and Point-to-Point Communication ..... 72
    5.1.1 Bidirectional System Frequency Plan .................... 74
    5.1.2 Hybrid Dual-Path PLL for Bidirectional System ....... 75
    5.1.3 IF-Correlation Techniques for PCR ...................... 76
    5.1.4 System Specifications and the Performance Trade-off ..... 77
    5.1.5 Baseband Correlation and IF Correlation Techniques ..... 78
  5.2 Circuit Implementation ..................................... 84
    5.2.1 High Linear SPDT Switch ................................ 85
    5.2.2 PCR Transmitter and QPSK Modulator ..................... 86
    5.2.3 IF Correlator and QPSK Demodulator ..................... 86
  5.3 Measurements ............................................... 88
    5.3.1 Measurement Results in PCR Mode ........................ 90
    5.3.2 Measurement Results in Communication Mode .............. 92
  5.4 Conclusions ................................................ 95

## Chapter 6
Conclusions ..................................................... 97

Bibliography ..................................................................................................................... 99
# LIST OF FIGURES

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 1.1</td>
<td>High Speed Data Center and Pulse Compression Radar System.</td>
<td>2</td>
</tr>
<tr>
<td>Figure 2.1</td>
<td>(a) Cross section (b) Measured DC characteristics and (c) electrical small signal model of Micro-Ring modulator</td>
<td>8</td>
</tr>
<tr>
<td>Figure 2.2</td>
<td>A low-power, reverse-biased PMR modulator driver implemented in a silicon-on-insulator (SOI) process.</td>
<td>11</td>
</tr>
<tr>
<td>Figure 2.3</td>
<td>Transistor sizing and minimum driver power consumption versus data rate for two different process technologies.</td>
<td>13</td>
</tr>
<tr>
<td>Figure 2.4</td>
<td>A micro-photograph of the fabricated transmitter with the ring modulator and the driver circuit shown in the inset pictures.</td>
<td>13</td>
</tr>
<tr>
<td>Figure 2.5</td>
<td>(a,c) 15 Gb/s and 25 Gb/s eye diagrams (b,d) 15 Gb/s and 25 Gb/s bathtub curves, with axes in UI and BER</td>
<td>14</td>
</tr>
<tr>
<td>Figure 3.1</td>
<td>WDM interconnect system.</td>
<td>18</td>
</tr>
<tr>
<td>Figure 3.2</td>
<td>Energy efficiency as a function of data rate for different superlinear terms.</td>
<td>20</td>
</tr>
<tr>
<td>Figure 3.3</td>
<td>(a) Waveguide cross-section diagram (b) circuit model for high speed ring modulator (c) die photo of reverse-biased depletion ring modulator</td>
<td>21</td>
</tr>
<tr>
<td>Figure 3.4</td>
<td>Transmitter power consumption and energy efficiency comparison.</td>
<td>26</td>
</tr>
<tr>
<td>Figure 3.5</td>
<td>Schematic of push-pull amplifier.</td>
<td>27</td>
</tr>
<tr>
<td>Figure 3.6</td>
<td>Schematic of proposed receiver.</td>
<td>28</td>
</tr>
<tr>
<td>Figure 3.7</td>
<td>Sensitivity as a function of bit rate for TIA stage.</td>
<td>30</td>
</tr>
<tr>
<td>Figure 3.8</td>
<td>$f_T$ simulation of (a) 28-nm FD-SOI and (b) 14-nm FinFET devices</td>
<td>34</td>
</tr>
<tr>
<td>Figure 3.9</td>
<td>Intrinsic gain $A_V$ of 28 nm FD-SOI and 14 nm FinFET devices</td>
<td>35</td>
</tr>
<tr>
<td>Figure 3.10</td>
<td>Device noise power spectral density at 1 GHz.</td>
<td>35</td>
</tr>
<tr>
<td>Figure 3.11</td>
<td>Proposed single-ended receiver with self-biased feedback.</td>
<td>36</td>
</tr>
<tr>
<td>Figure 3.12</td>
<td>Receiver energy efficiency and sensitivity as a function of bandwidth.</td>
<td>37</td>
</tr>
<tr>
<td>Figure 3.13</td>
<td>Energy efficiency of the transceiver.</td>
<td>38</td>
</tr>
<tr>
<td>Figure 3.14</td>
<td>Loss and energy efficiency of photonic elements.</td>
<td>40</td>
</tr>
<tr>
<td>Figure 3.15</td>
<td>Proposed WDM link co-design flow.</td>
<td>41</td>
</tr>
<tr>
<td>Figure 3.16</td>
<td>Energy efficiency of the WDM link with 10 dB loss.</td>
<td>42</td>
</tr>
<tr>
<td>Figure 3.17</td>
<td>Energy efficiency of the WDM link with different loss.</td>
<td>43</td>
</tr>
<tr>
<td>Figure 4.1</td>
<td>Proposed mmWave and analog processing based PCR system.</td>
<td>46</td>
</tr>
<tr>
<td>Figure 4.2</td>
<td>Maximum range $R_{MAX}$ and range resolution $\Delta R$ as a function of bandwidth under a 12 dBm peak power and 0 dB SNR constraint.</td>
<td>49</td>
</tr>
<tr>
<td>Figure 4.3</td>
<td>Correlation of received 7-bit Barker code with different templates.</td>
<td>50</td>
</tr>
<tr>
<td>Figure 4.4</td>
<td>(a) Digital correlation system and (b) analog correlation system.</td>
<td>50</td>
</tr>
</tbody>
</table>
Figure 4.5: Dynamic range requirements of each block in the receiver, detailed to illustrate how the baseband signal processing circuit can reduce the required dynamic range. ..... 51
Figure 4.6: Schematic of variable gain amplifier. ..... 52
Figure 4.7: Schematic of analog correlator. ..... 53
Figure 4.8: (a) Multi-range delay lock loop (b) Proposed wideband DLL. ..... 54
Figure 4.9: Delay cell of proposed VCDL. ..... 55
Figure 4.10: (a) 4-bit flash ADC (b) preamplifier with averaging technique. ..... 56
Figure 4.11: (a) Analog calibration (b) Proposed digital-assisted calibration. ..... 57
Figure 4.12: Definition of gain and offset in the baseband circuit. Several offset cancellation schemes using a DAC are shown in red for comparison. ..... 58
Figure 4.13: (a) Rx with offset (b) calibrated Rx (c) correlation with and without dc offsets. ..... 59
Figure 4.14: (a) dc offset extraction (b) dc offset calibration. ..... 61
Figure 4.15: SLR degradation versus DAC resolution (ADC ENOB of 4 bits). ..... 62
Figure 4.16: (a) Distorted template signal and (b) explanation of PPE and PWE. ..... 63
Figure 4.17: SLR degradation as a function of template misalignments. ..... 64
Figure 4.18: Proposed pulse compression radar signal processor test setup. ..... 65
Figure 4.19: PCB of PCR system. ..... 65
Figure 4.20: Clock multiplication in time and frequency domain. ..... 66
Figure 4.21: Multi-band DLL tuning range. ..... 67
Figure 4.22: SLR comparison with digital-assisted dc offset calibration. ..... 68
Figure 4.23: (a) Correlation of 3b Barker (b) Correlation of 5b Barker (c) Correlation of 7b Barker (d) Auto-correlation of 3b Barker (e) Auto-correlation of 5b Barker (f) Auto-correlation of 7b Barker (g) SLR of 3b Barker (h) SLR of 5b Barker (i) SLR of 7b Barker. ..... 69
Figure 5.1: Proposed mmWave and analog processing based PCR system. ..... 73
Figure 5.2: LO and IF frequency plan in PCR mode and point-to-point mode. ..... 74
Figure 5.3: Hybrid Dual Path FracN PLL for Bidirectional Transceiver. ..... 75
Figure 5.4: IF-correlation of received 7-bit Barker codes with different templates. ..... 76
Figure 5.5: Proposed bidirectional system specifications. ..... 77
Figure 5.6: Received signal with offset and analog correlations. ..... 78
Figure 5.7: Baseband correlation with offset calibration. ..... 79
Figure 5.8: Proposed IF-correlation system. ..... 80
Figure 5.9: Illustration of a three-point estimation for the phase misalignment in the correlation. ..... 82
Figure 5.10: Proposed system for range sensing and data communication. ..... 85
Figure 5.11: Insertion loss of SPDT switch. ..... 86
Figure 5.12: 7-bit Barker code modulator and correlator. ..... 87
Figure 5.13: (a) Transmitter test setup (b) Receiver single tone test (c) Transceiver link test setup with two chips. ..... 88
Figure 5.14: Die microphotograph of proposed IF-correlation system. ..... 89
Figure 5.15: Received 7-bit Barker code at 1 Gb/s. ..... 90
Figure 5.16: I/Q correlations of 7-bit Barker code at 1 Gb/s. ..... 91
Figure 5.17: SLR of 7-bit Barker code at 1 Gb/s. ..... 92
Figure 5.18: PCR mode with Barker code (a-c) 200 Mb/s (d-f) 200 Mb/s I/Q correlations (g-i) 1.5 Gb/s (j-l) 1.5 Gb/s I/Q correlations (m-o) SLR performances. ..... 93
Figure 5.19: (a) Modulator spectrum at 1.6 Gb/s and 3 Gb/s (b) Demodulator spectrum at 1.6 Gb/s and 3 Gb/s. ..... 94
Figure 5.20: (a) Demodulator conversion gain (b) Measured 500 MHz I/Q signals. ..... 94
# LIST OF TABLES

Table 2.1: Performance Summary and Comparison ........................ 15
Table 3.1: Power Budget for A WDM SiP Interconnect .................. 24
Table 3.2: Transceiver Performance Comparison at 25 Gb/s .............. 39
Table 4.1: Performance Summary and Comparison ........................ 70
Table 5.1: Comparison of IF-Correlation and Baseband Correlation ...... 83
Table 5.2: Performance Summary and Comparison ........................ 96
ACKNOWLEDGEMENTS

First and foremost, I would like to sincerely thank my advisor Prof. James F. Buckwalter for his constant guidance and support throughout my graduate studies. I have been inspired greatly by his advice and the expertise he shared with me, and benefited from the interesting and challenging projects. Without his tireless mentorship, it would have been impossible for me to get to where I am today. Next, I would like to extend great appreciation to my committee members: Prof. Peter Asbeck, Prof. Dan Sievenpiper, Prof. Gert Cauwenberghs and Prof. Chung Kuan Cheng for their insightful suggestions and comments.

I would also like to thank my master's advisor, Prof. Woogeun Rhee, for his encouragement and support. He opened the world of integrated circuit design to me and taught me the working attitude, skills, and way of thinking in this field. He is the teacher who helped me find my interests and encouraged me to continue on to my Ph.D. study.

Friendship has been immeasurable during these years. Many people have influenced my work and life at UCSD in many ways. I am grateful to my fellow lab-mates Wei Wang, Mehmet Parlak, Tissana Kijsanayotin, Po-Yi Wu, and Cooper Levi for their discussions, encouragement, and friendship.

I would also like to take this chance to express my sincere appreciation to my friends from industry and academia: Xinhua Chen, Gang Zhang, Weibo Hu, Ruili Wu, Zheng Wang, Hao Wang, and Deqiang Song at Qualcomm; Hongrui Wang, Weifeng Feng, and Jingxuan Gong at Broadcom; Hao Liu at Maxlinear; Xuezhe Zheng, Guoliang Li, and Ping Ma at the Oracle hardware lab; Run Chen at the University of Southern California; and Jie Gu at Northwest University, for their discussions, encouragement, and help in many ways.

My deepest and sincere gratitude goes to my parents - Mrs. Fengxian Liu and Mr. Yuanshan Li - for their unconditional love and support. I thank my parents for bringing me to the world, raising me, educating me and supporting me through every step of my life.

Finally, I have been fortunate to make many new friends at UCSD and in the United States. They have not only helped and supported me on various occasions but also made my stay in the US very enjoyable. I will always cherish the great memories with them and hope that the friendships continue. I am fortunate to have gone through my Ph.D. study together with Xin Zhao, Ruinan Chang, Zhanzhan Jia, Lan Liu, Jiang Long, Yang yang, and Yanqin Jin.

The material in this dissertation is based on the following papers, which are either published or submitted for publication. Chapter 2 is mostly a reprint of the material as it appears in Jun Li; Guoliang Li; Xuezhe Zheng; Raj, K.; Krishnamoorthy, A.V.; Buckwalter, J.F., "A 25-Gb/s Monolithic Optical Transmitter With Micro-Ring Modulator in 130-nm SOI CMOS," IEEE Photonics Technology Letters, 2013. Chapter 3 is mostly a reprint of the material as submitted in Jun Li; Xuezhe Zheng; Ashok V. Krishnamoorthy; Buckwalter, J. F., "Scaling Trends for Picojoule per Bit WDM Photonic Interconnects in CMOS SOI and FinFET Processes," IEEE Journal of Lightwave Technology. Chapter 4 is mostly a reprint of the material as it appears in Li Jun, H. Mukai, M. Parlak, M. Matsuo, and J. F. Buckwalter, "A 1Gb/s reconfigurable pulse compression radar signal processor in 90nm CMOS," IEEE Custom Integrated Circuits Conference (CICC), 2013, and in Jun Li; Parlak, M.; Mukai, H.; Matsuo, M.; Buckwalter, J. F., "A Reconfigurable 50-Mb/s-1 Gb/s Pulse Compression Radar Signal Processor With Offset Calibration in 90-nm CMOS," IEEE Transactions on Microwave Theory and Techniques, 2015. Chapter 5 is mostly a reprint of the material as submitted in Jun Li; T. Kijsanayotin; Buckwalter, J. F., "A 3-Gb/s Radar Signal Processor using an IF-Correlation Technique in 90 nm CMOS," IEEE Transactions on Microwave Theory and Techniques.

The dissertation author was the primary author of the work in these chapters, and co-authors have approved the use of the material for this dissertation.

Jun Li
San Diego, CA
December, 2015
VITA

2007 B. E. in Electron Information Science and Technique, Beijing University of Posts and Telecommunications, Beijing, China

2010 M. E. in Institute of Microelectronics (Integrated Circuit Engineering), Tsinghua University

2015 Ph. D. Electrical Engineering (Electronic Circuits and Systems), University of California, San Diego

PUBLICATIONS


ABSTRACT OF THE DISSERTATION

Low Power CMOS Circuit Techniques for Optical Interconnects and High Speed Pulse Compression Radar

by

Jun Li

Doctor of Philosophy in Electrical Engineering (Electronic Circuits and Systems)

University of California, San Diego, 2015

Professor James F. Buckwalter, Chair

High-performance computing and high-resolution range sensors motivate intelligent-system innovations such as smart cars, smart homes and communities, and 3D motion games. In particular, 3D graphics require high-performance computation to deliver high-quality, vivid real-time video, and accurate motion sensing requires a high-resolution radar sensor. In general, however, data transmission limits large-scale computation, while the radar signal processor limits detection accuracy. Low-power designs for high-speed data transmission and high-resolution radar signal processing are therefore desired.

Energy-efficient transceivers are in strong demand for high-performance computing applications. Data center bandwidth reached the TB/s range in 2015 and is expected to reach 20 TB/s within a few years. Low-cost short-range interconnects have therefore become one of the major limitations for high-speed data transmission, especially for processor-to-processor and processor-to-memory links. Silicon photonics (SiP) has attracted great attention for its high energy efficiency and reduced I/O pin count. On the air interface, a dual-mode system with 15 cm sensing resolution and Gb/s data communication enables a layer of network intelligence.

The first portion of the dissertation discusses energy-efficient transceiver design for optical interconnects. A monolithic transmitter based on a micro-ring modulator is presented in 130 nm CMOS SOI. In this process, the electrical driver limits the data rate of monolithic integration. With an optimized micro-ring modulator, the transmitter demonstrates a 25 Gb/s data rate without pre-emphasis.

Secondly, the scaling trends from FD-SOI CMOS to FinFET processes are discussed. Because the limited development of optical devices prevents monolithic integration from scaling energy efficiency as well as hybrid integration, hybrid integration sustains energy-efficiency scaling by virtue of the high-$f_T$ devices in advanced processes. An optical and electrical co-design algorithm for the WDM link is proposed that focuses on energy-efficiency optimization in hybrid integration. The link budget is analyzed, and transceivers in planar 28 nm FD-SOI CMOS and 3D 14 nm FinFET are presented.

Thirdly, a high-speed pulse compression radar (PCR) signal processor is presented in 90 nm CMOS. High-speed analog correlation is proposed to replace conventional digital correlation, which relaxes the speed requirement on the analog-to-digital converter (ADC). The analog correlation signal processor is implemented with a variable gain amplifier (VGA), a correlator, a delay-lock loop (DLL), a 4-bit DAC, and a 4-bit ADC. Additionally, a new closed-loop calibration algorithm is proposed for DC offset and time misalignment.

Finally, a dual-mode bidirectional pulse compression radar is proposed. The radar signal processor is demonstrated in 90 nm CMOS. The system allows 3 Gb/s data transmission in passthrough mode and 10 cm range resolution in radar mode. This is the first demonstration of a pulse compression radar signal processor that combines wireless data communication with high-resolution range sensing.
Chapter 1

Introduction: Energy Efficient Optical Transceiver and High Resolution Range Sensor

Modern computing systems and short-range radar sensors demand wide bandwidth for high-speed data transmission and high-resolution range sensing. Fig. 1.1 illustrates some of the applications that combine data transmission and range sensing systems. Intelligent homes, communities, and high-quality motion games combine accurate range sensors with wireless communication, which poses challenges for low-cost system innovation. E-band (71-76 GHz and 81-86 GHz) circuits are being explored for high-capacity point-to-point links such as backhaul applications [1–3], and short-range radar (77/79 GHz) is being explored for smart sensor systems [4,5]. In addition to the air-interface challenges, high-performance computing in these applications stresses the interconnect bandwidth between processors and between processors and memory. The input/output (I/O) bandwidth of high-end processors has exceeded 5 terabits per second (Tb/s) and will increase to 20 Tb/s within a few years [6, 7].

In data processing, the demands of high-performance computing pose significant challenges for high-speed data transmission [8]. Silicon photonic (SiP) WDM links have emerged as an energy-efficient solution to the data rate, power consumption, and die area demands of dense intrachip and interchip I/O. CMOS compatibility ensures high yield, reproducibility, and high reliability at low cost per part. Recently published work has implemented parallel optical links with vertical-cavity surface-emitting lasers (VCSELs) integrated directly onto the chip package, providing large aggregate bandwidth to enable large-scale system integration [9,10].

Figure 1.1: High Speed Data Center and Pulse Compression Radar System.

In air interfaces, increased path loss and the inability to generate high transmit power at mm-wave frequencies pose additional challenges for transceiver implementation, especially for accurate radar applications. Pulse compression radar (PCR) provides a practical solution that alleviates most of these issues [11]. In addition, the automotive radar band (77/79 GHz) falls between the two non-contiguous E-band segments (71-76 GHz and 81-86 GHz). This allows a fully integrated dual-mode transceiver that supports short-range data transmission and high-resolution range sensing with time-division duplexing. It further enables intelligent networks for smart cars, homes, communities, and other applications to be built at low cost.
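
For orientation, the textbook radar relations behind range sensing can be written out. Here $\tau$ is the round-trip time of the echo, $c$ is the speed of light, and $B$ is the signal bandwidth. Taking $B$ to be roughly equal to the code bit rate is an assumption made only for this illustration.

$$
R = \frac{c\,\tau}{2}, \qquad \Delta R = \frac{c}{2B}
$$

Under that assumption, $B \approx 1$ GHz corresponds to $\Delta R \approx 15$ cm and $B \approx 1.5$ GHz corresponds to $\Delta R \approx 10$ cm.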
1.1 Monolithic Energy Efficient Transmitter

A silicon photonic (SiP) transmitter requires optical and electrical co-design to achieve low fabrication and packaging cost and better energy efficiency. However, energy efficiency considerations have resulted in several variations of SiP system design. Previous work has demonstrated a monolithic 25 Gb/s transceiver in 130 nm CMOS SOI [12]; with a pre-emphasis technique, the energy efficiency of its transmitter is around 3.68 pJ/b. In Chapter 2, an optimized co-design of the ring modulator and transmitter is presented that achieves 680 fJ/b at 25 Gb/s without pre-emphasis [13]. Better performance has been reported in more advanced processes [14–17]; monolithic integration, however, ensures high yield, reproducibility, and high reliability at low cost per part [18–20].
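
To put the two energy-efficiency figures in perspective, the short sketch below converts energy per bit into average power at 25 Gb/s. The function name `transmitter_power_mw` is a hypothetical helper introduced here; only the 3.68 pJ/b, 680 fJ/b, and 25 Gb/s values come from the text above.

```python
def transmitter_power_mw(energy_per_bit_fj: float, data_rate_gbps: float) -> float:
    """Average power in mW implied by an energy per bit (fJ/b) at a data rate (Gb/s)."""
    # fJ/b * Gb/s = 1e-15 J/b * 1e9 b/s = 1e-6 W, i.e. one thousandth of a mW.
    return energy_per_bit_fj * data_rate_gbps / 1000.0


# Energy-per-bit values quoted in the text, both at 25 Gb/s.
prior_work_mw = transmitter_power_mw(3680.0, 25.0)  # 3.68 pJ/b, with pre-emphasis [12]
this_work_mw = transmitter_power_mw(680.0, 25.0)    # 680 fJ/b, without pre-emphasis [13]

print(f"prior work: {prior_work_mw:.1f} mW")
print(f"this work:  {this_work_mw:.1f} mW")
print(f"reduction:  {prior_work_mw / this_work_mw:.2f}x")
```

The conversion shows that the quoted figures correspond to roughly 92 mW and 17 mW of transmitter power at the same data rate.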

1.2 Scaling Trends for Silicon Photonic Interconnects in CMOS SOI and FinFET Process

Scaling trends in CMOS SOI and FinFET processes are critical to the design of data centers and high-performance computing systems. In general, a more advanced process offers higher $f_T$, which allows the transceiver to scale in power and area. However, there is no significant $f_T$ improvement from the 28 nm planar FD-SOI process to the 14 nm FinFET process [21]. The existing literature on optical interconnects does not discuss the advantages of FinFET in terms of power, area, and integration cost. In Chapter 3, a systematic design flow is presented for optimum energy efficiency [22]. The algorithm considers the properties of photonic elements such as the modulator, mux/demux, and photodetector, and of electrical circuits such as the transmitter, receiver, and thermal tuning subsystem. Laser efficiency is also considered in the system optimization. WDM systems have been synthesized in 28 nm FD-SOI and 14 nm FinFET. The analysis concludes that the FinFET process is more energy efficient because of the higher $f_T$ of its P-type devices and its higher intrinsic gain.
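
A minimal sketch of how a link-level energy-per-bit estimate might combine these contributions is given below. It is not the design flow of Chapter 3: the function names, the simple additive model, and every numeric value are placeholders assumed for illustration.

```python
def laser_energy_pj_per_bit(sensitivity_dbm: float, link_loss_db: float,
                            wall_plug_efficiency: float, data_rate_gbps: float) -> float:
    """Electrical laser energy per bit (pJ/b) needed to close the optical link."""
    # Optical power the laser must emit so the receiver still sees its sensitivity level.
    optical_power_mw = 10.0 ** ((sensitivity_dbm + link_loss_db) / 10.0)
    # Electrical power drawn by the laser to produce that optical output.
    electrical_power_mw = optical_power_mw / wall_plug_efficiency
    # mW divided by Gb/s equals pJ/b.
    return electrical_power_mw / data_rate_gbps


def link_energy_pj_per_bit(laser_pj: float, tx_pj: float, rx_pj: float,
                           tuning_pj: float) -> float:
    """Total link energy per bit as the sum of the per-block contributions."""
    return laser_pj + tx_pj + rx_pj + tuning_pj


# Placeholder numbers for illustration only; none are taken from the dissertation.
laser = laser_energy_pj_per_bit(sensitivity_dbm=-10.0, link_loss_db=10.0,
                                wall_plug_efficiency=0.1, data_rate_gbps=25.0)
total = link_energy_pj_per_bit(laser, tx_pj=0.5, rx_pj=0.5, tuning_pj=0.2)
print(f"laser: {laser:.2f} pJ/b, link total: {total:.2f} pJ/b")
```

In a model of this kind, a less sensitive receiver or a lossier photonic path raises the laser term directly, which is why circuit and photonic parameters have to be optimized together.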
1.3 High Speed Analog Radar Signal Processor With Offset Calibration

Advances in silicon technology have made possible new sensor applications at millimeter-wave bands that require low power and low cost. As a result, silicon integrated beamforming architectures and phased arrays have been demonstrated for millimeter-wave automotive radar systems [23], high-definition content streaming [24], and satellite systems [25]. Millimeter-wave radar, especially short-range radar, is preferred for parking assistance and side-crash prevention, which need wide bandwidth for high range resolution. Other applications include range detection to maintain a safe driving distance between vehicles in heavy traffic [26]. In general, two main radar systems have been proposed: frequency modulated continuous wave (FMCW) radar and pulse compression radar (PCR). FMCW radar measures range using linear frequency modulation, which results in a low-cost architecture but requires two isolated antennas for high receiver sensitivity [27]. PCR uses digital signal modulation and time-division duplexing of the RF between transmit and receive [28]. Existing work has demonstrated that analog correlation consumes less power than digital correlation at bandwidths up to the GHz range [29]. However, the high dynamic range, DC offset saturation, and template misalignment make analog correlation challenging to use. First, the received signal strength depends on the object size, angle, and distance, so a linear variable gain amplifier (VGA) with high dynamic range is required. Second, DC offset can saturate the receiver chain because of the high gain of the VGA. Third, alignment of the template and the received signal is critical: the sidelobe ratio (SLR) degrades rapidly with time misalignment between the received signal and the local template. In Chapter 4, an analog radar signal processor with offset calibration is presented to solve all of these issues [30]. A closed-loop calibration algorithm removes the DC offset and extracts the misalignment using multiple echoes. A wideband delay-lock loop is implemented to retime the local template and achieves an alignment resolution finer than $T_{\text{symbol}}/N$, where $T_{\text{symbol}}$ is the symbol period of the Barker code and $N$ is the number of bits.
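
The effect of DC offset on correlation can be illustrated numerically. The sketch below correlates an ideal 7-bit Barker code with its template and then repeats the correlation with a constant offset added to the received samples. It models one sample per bit and ignores noise and gain, and the helper names and the 0.5 offset value are assumptions for illustration, not the hardware described in Chapter 4.

```python
import math

BARKER7 = [1, 1, 1, -1, -1, 1, -1]  # standard 7-bit Barker code


def sliding_correlation(received, template):
    """Correlate `received` with `template` at every lag that gives any overlap."""
    n, m = len(received), len(template)
    result = []
    for lag in range(-(m - 1), n):
        acc = 0.0
        for k in range(m):
            idx = lag + k
            if 0 <= idx < n:
                acc += received[idx] * template[k]
        result.append(acc)
    return result


def sidelobe_ratio_db(corr):
    """Ratio of the correlation peak to the largest sidelobe magnitude, in dB."""
    peak_index = max(range(len(corr)), key=lambda i: abs(corr[i]))
    peak = abs(corr[peak_index])
    sidelobe = max(abs(c) for i, c in enumerate(corr) if i != peak_index)
    return 20.0 * math.log10(peak / sidelobe)


clean_corr = sliding_correlation(BARKER7, BARKER7)
# Hypothetical DC offset of 0.5 (in units of the bit amplitude) on every received sample.
offset_corr = sliding_correlation([b + 0.5 for b in BARKER7], BARKER7)

slr_clean = sidelobe_ratio_db(clean_corr)    # about 16.9 dB for an ideal Barker-7 code
slr_offset = sidelobe_ratio_db(offset_corr)  # lower, because the offset inflates sidelobes
print(f"SLR without offset: {slr_clean:.1f} dB")
print(f"SLR with offset:    {slr_offset:.1f} dB")
```

Even a moderate offset raises the sidelobes relative to the peak, which is the degradation the calibration loop is meant to remove.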
1.4 Radar Signal Processor With an IF Correlation Technique

Analog correlation and offset calibration have been proposed as a low-power, low-cost architecture for pulse compression radar (PCR). For successful detection, two critical steps must be completed every time, and multiple echo signals from the same object are necessary. First, the receiver sweeps the VGA gain to ensure that the received signal has the required swing, and the DC offset calibration is applied in a loop with each gain configuration. Second, once the signal swing and offset meet the correlation requirements, the delay-lock loop shifts the template signal to find the optimum alignment. The distance can then be calculated from the round-trip time. However, this algorithm is effective only for detecting slowly moving objects, because the calibration limits the processing time. In Chapter 5, a high-speed detection processor based on IF-correlation is presented. The proposed radar signal processor not only removes the calibration step but also improves the detection resolution. Furthermore, short-range point-to-point communication is implemented in the same architecture, so the processor has two operation modes: radar sensing and data communication. This motivates intelligent-system innovations such as smart cars, smart homes, and smart communities. IF-correlation inherently removes the effects of DC offset, and complex correlation provides sub-symbol alignment information directly. In other words, no closed-loop calibration is needed, and the system is suitable for moving-object detection.
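
The DC-offset immunity of IF-correlation can be checked with a small numerical sketch. It assumes an idealized model that is not taken from Chapter 5: each code bit carries an integer number of IF cycles, the receiver correlates against in-phase and quadrature templates, and a small timing error appears only as a carrier phase shift. The constants and function names are hypothetical.

```python
import math

BARKER7 = [1, 1, 1, -1, -1, 1, -1]
SAMPLES_PER_BIT = 8    # hypothetical oversampling ratio
IF_CYCLES_PER_BIT = 2  # hypothetical: an integer number of IF cycles in each bit


def if_waveform(code, phase_rad=0.0, dc_offset=0.0, quadrature=False):
    """Code bits modulated onto an IF carrier, with optional phase shift and DC offset."""
    samples = []
    for bit in code:
        for n in range(SAMPLES_PER_BIT):
            theta = 2.0 * math.pi * IF_CYCLES_PER_BIT * n / SAMPLES_PER_BIT + phase_rad
            carrier = math.sin(theta) if quadrature else math.cos(theta)
            samples.append(bit * carrier + dc_offset)
    return samples


def iq_correlate(received, code):
    """Correlate the received samples with in-phase and quadrature IF templates."""
    template_i = if_waveform(code)
    template_q = if_waveform(code, quadrature=True)
    corr_i = sum(r * t for r, t in zip(received, template_i))
    corr_q = sum(r * t for r, t in zip(received, template_q))
    return corr_i, corr_q


phase = math.radians(40.0)  # assumed carrier phase caused by a small timing error
clean_i, clean_q = iq_correlate(if_waveform(BARKER7, phase), BARKER7)
off_i, off_q = iq_correlate(if_waveform(BARKER7, phase, dc_offset=0.5), BARKER7)

magnitude = math.hypot(off_i, off_q)
estimated_phase = math.atan2(-off_q, off_i)
print(f"I/Q without offset: {clean_i:.3f}, {clean_q:.3f}")
print(f"I/Q with offset:    {off_i:.3f}, {off_q:.3f}")
print(f"magnitude: {magnitude:.3f}, phase estimate: {math.degrees(estimated_phase):.1f} deg")
```

Because each template averages to zero over a bit, the constant offset contributes nothing to either correlation, while the ratio of the I and Q outputs recovers the phase. This mirrors the two properties claimed for IF-correlation in the paragraph above, under the stated assumptions.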

1.5 Dissertation Organization

Background material, previous work, and new contributions have been introduced in Chapter 1. The importance of energy-efficient transceivers for optical interconnects and of high-resolution radar signal processors has been outlined.

Chapter 2 presents a comprehensive analysis of monolithic optical transmitter design. A co-design of the optical modulator and electrical driver is proposed for optimum energy efficiency. This co-design analysis is extended to compare monolithic and hybrid integration, and concludes that a monolithic transceiver has the advantages of yield, reproducibility, and high reliability, while hybrid integration has the advantages of energy efficiency and low cost in the long term.

Chapter 3 presents a new analysis for energy-efficient WDM system design. The link analysis includes the laser, modulator, mux/demux, photodetector, and electrical transceiver. The link budget and an optimization algorithm are presented to minimize the energy per bit. High-speed ring modulators, low-loss mux/demux, and wideband photodetectors are reviewed. The optimum data rate is found according to the device cutoff frequency $f_T$ and intrinsic gain $A_V$. In addition, this analysis considers the tuning circuit for link stabilization.

Chapter 4 illustrates the concept of pulse compression radar and a high-speed analog correlator. High resolution and low cost are achieved with pulse compression and analog correlation techniques. DC offset in the receiver chain and misalignment between the template and the received signal are discussed in detail. A closed-loop calibration algorithm is proposed to remove the impairments from DC offset and misalignment. The system comprises a variable gain amplifier (VGA), an analog correlator, a wideband delay-lock loop, a 4-bit ADC, and a 4-bit DAC. The DAC is designed to calibrate the DC offset in front of the correlator. An off-chip calibration engine is implemented in an FPGA. The proposed system is suitable for low-speed but high-accuracy object detection, for example pedestrian detection.

Chapter 5 presents the various concepts used to arrive at the final dual-mode bidirectional radar signal processor. The bidirectional front-end reduces cost in area and antenna count and, in particular, removes the Tx-to-Rx leakage found in separate Tx/Rx architectures. Dual-mode operation refers to a radar mode and a data communication mode; the processor can switch between these functions in time division duplexing (TDD) mode. IF-correlation removes the complex calibration, as discussed previously. The system is suitable for moving object detection and has less detection latency than conventional analog correlation. The concept of the bidirectional front-end, the system frequency plan, and a PVT-insensitive frequency synthesizer are illustrated with simulations. The radar signal processor with the IF-correlation technique is implemented and measured with a back-to-back connection.
Chapter 2

Energy Efficient Transmitter for Optical Interconnects

2.1 Silicon Photonic Transmitter

Silicon photonic (SiP) links promise low energy per bit and high throughput density and should hasten the migration from electrical to optical interconnects for CMOS VLSI. For chip-to-chip serial links, Si photonic circuitry should operate at an energy efficiency under 1 pJ/bit. SiP systems can be developed with hybrid or monolithic foundry processing, each offering different advantages in cost and performance. Hybrid packaging decouples the Si process from the photonic device design and has been demonstrated at 10 Gb/s with a link efficiency of 530 fJ/b [15]. However, the hybrid approach increases manufacturing costs and constrains packaging. Monolithic silicon photonic circuitry would benefit from mature CMOS foundries but typically requires a compromise in the electronic or photonic device performance. In this chapter, a 25 Gb/s monolithic transmitter in 130 nm CMOS SOI is presented with an energy efficiency of 680 fJ/b.

2.1.1 High Speed Modulator

The choice of modulator for energy-efficient SiP interconnects is constrained by the requirements of compact area, multigigabit-per-second (Gb/s) operation, low capacitance, and compatibility with CMOS supply considerations. Silicon Mach-Zehnder interferometer (MZI) modulators typically consume high power due to the distributed nature of the phase shift, in spite of recent work investigating segmented electrodes [31, 32]. Franz-Keldysh (FK) and ring-based modulators are more desirable options for energy efficiency owing to their compact designs, low capacitance, and low drive voltage requirements for high speed modulation. Consequently, this work focuses on a reverse-biased depletion ring modulator with a 2 V voltage swing for operation above 25 Gb/s. Recent research on SiP modulators has resulted in ring modulators that are optimized for speed, thermal tuning, and electrical drive requirements [33–35].

Figure 2.1: (a) Cross section, (b) measured DC characteristics, and (c) electrical small-signal model of the micro-ring modulator.

The structure of a photonic micro-ring resonator (PMR) modulator is based on a reverse-biased p-n diode implemented in an SOI process, as shown in Fig. 2.1(a). Light is confined to the intrinsically-doped material by the highly-doped regions to the left and right and the oxide layer above and below the active silicon layer. The ring modulator consists of a waveguide ring placed in close proximity to a linear waveguide. Light is confined within the ring by an outer p+ and inner n+ doped annular region. At a certain resonant wavelength, light couples from the linear waveguide into the waveguide ring. Changing the field across this intrinsic region changes the velocity of light within the waveguide and changes the resonant wavelength $\lambda_0$. This section characterizes trade-offs between the extinction ratio, modulation bandwidth, and voltage swing required for a photonic micro-ring resonator modulator in a photonic interconnect.

The optical power transmission $T_{opt}$ of a reverse-biased PMR is characterized as a Lorentzian line shape near a particular resonance wavelength $\lambda_0$ as

$$T_{opt} = 1 - \frac{K_{ring}}{1 + 4\left(\frac{\lambda - \lambda_0}{\Delta\lambda}\right)^2} \quad (2.1)$$

where $\lambda$ is the laser wavelength, $\Delta\lambda$ is the linewidth of the resonance, and $K_{ring}$ is the coupling coefficient of the PMR [36]. Ideally the coupling coefficient is close to one and, when $\lambda = \lambda_0$, the optical transmission becomes small, i.e., $T_{opt} = 1 - K_{ring} \approx 0$. As $\lambda$ is shifted away from $\lambda_0$, the optical transmission approaches one.

Modulation of the laser intensity is possible since the resonance, and hence the optical transmission, is shifted with an applied voltage. A linearized model of the reverse-biased PMR resonance is characterized as $\lambda_0 = \lambda_{0,nom} + K_\lambda V_S$, where $\lambda_{0,nom}$ is the nominal resonant wavelength, $K_\lambda$ is the coefficient of the voltage-dependence of the resonance, and $V_S$ is the voltage swing. By increasing $K_\lambda$, the transmission null is pushed further with a smaller applied voltage, which suggests a lower drive voltage. Herein lies one of the first tradeoffs in the link efficiency: $K_\lambda$ cannot be made arbitrarily large without degrading the extinction ratio. Current results indicate that $K_\lambda$ is on the order of 25 picometers-per-volt (pm/V) for SOI ring resonators to reach an extinction ratio of 8 dB. The measured transmission response is plotted in Fig. 2.1(b). With an applied voltage of 2 V, the extinction ratio is more than 8 dB and acceptable from the standpoint of the link reliability.

The trade-off between extinction ratio and power consumption is calculated from equation (2.1). Assuming that $\lambda = \lambda_{0,nom}$, for an input power $P_{\text{in}}$, the output optical powers for a transmitted zero and one are

\[ P_0 = P_{\text{in}}(1 - K_{\text{ring}}) \]  

\[ P_1 = P_{\text{in}}\left(1 - \frac{K_{\text{ring}}}{1 + 4\left(\frac{K_{\lambda}V_S}{\Delta\lambda}\right)^2}\right) \]

Therefore, the extinction ratio is

\[ E_r = \frac{1}{1 - K_{\text{ring}}} \left(1 - \frac{K_{\text{ring}}}{1 + 4\left(\frac{K_{\lambda}V_S}{\Delta\lambda}\right)^2}\right) \]  

This equation relates the extinction ratio to the voltage swing that must be applied to the diode and to the optical linewidth $\Delta\lambda$. To achieve an extinction ratio above 8 dB, this expression indicates that a voltage swing of > 2 V must be applied across the diode for the given linewidth.

In this work, the ring modulator has a radius of 7.5 \( \mu \)m [37]. The entire ring waveguide is doped for PN diode modulation, with junction doping densities of \( 4 \times 10^{18} \) cm\(^{-3} \). Fig. 2.1(a) shows the cross-section schematic of the ring waveguide. The whole-ring modulation combined with high doping density helps to achieve large resonance shift under voltage modulation, low series resistance, and large photon lifetime-limited bandwidth due to low resonator quality (Q) factor. Fig. 2.1(b) shows the resonance spectrum at bias voltages from -0.5 V (forward bias) to 2 V (reverse bias). The ring modulator has a Q factor of 4000 and a voltage induced wavelength shift of 28 pm/V. When operated around 1553.87 nm wavelength and with a voltage swing from 0 V to 2.4 V, this ring modulator can achieve \( \sim 8 \) dB extinction ratio.
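As a rough numerical check of the Lorentzian transmission model and the extinction-ratio expression above, the following Python sketch plugs in the measured values quoted in this section (Q of 4000, 28 pm/V, 2.4 V swing, 1553.87 nm). Two inputs are assumptions that the text does not state: the linewidth is approximated as $\Delta\lambda \approx \lambda/Q$, and `K_RING = 0.98` is a purely illustrative coupling coefficient.

```python
import math

def transmission(lam_pm, lam0_pm, linewidth_pm, k_ring):
    """Lorentzian power transmission of the ring near resonance."""
    detuning = (lam_pm - lam0_pm) / linewidth_pm
    return 1.0 - k_ring / (1.0 + 4.0 * detuning ** 2)

def extinction_ratio_db(k_ring, k_lambda_pm_per_v, v_swing, linewidth_pm):
    """Ratio P1/P0 in dB with the laser parked on the unshifted resonance."""
    # Transmitted zero: resonance coincides with the laser wavelength.
    p0 = transmission(0.0, 0.0, linewidth_pm, k_ring)
    # Transmitted one: resonance shifted by K_lambda * V_S.
    p1 = transmission(0.0, k_lambda_pm_per_v * v_swing, linewidth_pm, k_ring)
    return 10.0 * math.log10(p1 / p0)

wavelength_pm = 1553.87e3                  # laser wavelength from the text, in pm
q_factor = 4000.0                          # measured Q from the text
linewidth_pm = wavelength_pm / q_factor    # assumption: linewidth ~ lambda / Q
K_RING = 0.98                              # hypothetical coupling coefficient

er_db = extinction_ratio_db(K_RING, 28.0, 2.4, linewidth_pm)
print(f"linewidth ~ {linewidth_pm:.0f} pm, ER ~ {er_db:.1f} dB")
```

Under these assumptions the sketch gives an extinction ratio close to 8 dB, in line with the measured value, although the result is quite sensitive to the assumed coupling coefficient.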

The high speed behavior of the ring modulator has been carefully studied using a circuit model as shown in Fig. 2.1(c). $C_p$ represents the capacitance between the electrodes, $C_J$ denotes the capacitance in the reverse-biased diode junction, $R_s$ denotes the diode series resistance, $C_{ox}$ denotes the capacitance through the buried oxide and the bottom Si substrate, and $R_{si}$ is the resistance through the Si substrate. The modulation bandwidth of the ring modulator is subject to both the RC limit and the photon lifetime limit as expressed in

$$BW_{ring} = \sqrt{\left(\frac{1}{1 + \frac{j\omega}{\omega_{RC}}}\right) \cdot \left(\frac{1}{1 + \frac{j\omega}{\omega_{lifetime}}}\right)} \quad (2.4)$$

Based on the measured quality factor, the photon lifetime-limited bandwidth is $\sim 48$ GHz which is not the major limiting factor of the overall bandwidth; the RC-limited bandwidth is $\sim 24$ GHz, which is sufficient for 25 Gb/s operation.
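The photon lifetime figure can be reproduced with a short Python sketch. It assumes the standard resonator relation $\tau = Q/(2\pi f_0)$, so that the photon-lifetime-limited bandwidth $1/(2\pi\tau)$ equals $f_0/Q$; the text quotes only the result, so this relation is an assumption about how it was obtained.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def photon_lifetime_bandwidth_hz(wavelength_m, q_factor):
    """Photon-lifetime-limited bandwidth f0 / Q, assuming tau = Q / (2*pi*f0)."""
    optical_frequency_hz = C_LIGHT / wavelength_m
    return optical_frequency_hz / q_factor

# Wavelength and Q are the measured values quoted in this section.
bw_hz = photon_lifetime_bandwidth_hz(1553.87e-9, 4000.0)
print(f"photon-lifetime bandwidth ~ {bw_hz / 1e9:.1f} GHz")
```

The result is roughly 48 GHz, consistent with the value stated above.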

### 2.1.2 Modulator Driver

![Figure 2.2: A low-power, reverse-biased PMR modulator driver implemented in a silicon-on-insulator (SOI) process.](image)

The modulator driver operates as a large-signal rather than a small-signal circuit. Fig. 2.2 illustrates a modulator driver that takes advantage of the isolated body contact. It is intended to be driven rail-to-rail to provide sufficient voltage swing across the reverse-biased diode. Consequently, the driver is designed according to the slew rate rather than designing each stage for a 3 dB bandwidth. The bit rate $B$ is related to the rise and fall time of the edge by $t_r = 0.35/B$. Based on the total modulator capacitance $C_M$ and voltage swing $V_S$, the maximum output stage current $I_D$ is

$$I_D = 2.86 \cdot V_S \cdot (C_M + C_{out}) \cdot B$$  \hspace{1cm} (2.5)

where $C_{out}$ is the output capacitance of the driver. Since the voltage supply is fixed, the power consumption of the final stage is anticipated to scale proportionally with the bit rate. The current is used to determine the size of the transistor based on the ideal current density $J_{OPT}$ per unit width to satisfy the peak $f_T$ or power consumption. Larger transistors result in higher $C_{out}$ and, in turn, require higher drain current. The optimal size for the modulator driver transistors is chosen according to

$$W_N = \frac{V_S \cdot C_M}{J_{OPT} \cdot t_r - (2(C_{gd,N} + \beta \cdot C_{gd,P}) + (C_{d,N} + \beta \cdot C_{d,P})) \cdot V_S}$$  \hspace{1cm} (2.6)

where $\beta$ is the scaling factor between the N- and P-type devices, i.e. $W_P = \beta \cdot W_N$. A similar expression could be derived for the rising edge transition, which is primarily determined by the width of the PMOS transistor. To maintain rising and falling edge symmetry, a ratio of $\beta = 2.5$ is used.
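The sizing expression (2.6) can be explored numerically with the minimal Python sketch below. It lumps the bracketed per-unit-width capacitance terms into a single parameter, `c_out_f_per_um`; the value 0.3 fF/µm used here is hypothetical and is not calibrated to the process or to Fig. 2.3. The voltage swing, modulator capacitance, and current density are the values quoted later in this section.

```python
def nmos_width_um(v_swing, c_mod_f, j_opt_a_per_um, bit_rate, c_out_f_per_um):
    """Width from W = Vs*Cm / (Jopt*tr - c_out*Vs), with tr = 0.35 / B.

    c_out_f_per_um lumps the bracketed per-width capacitance terms of the
    sizing equation into one number. Returns None past the speed wall.
    """
    t_rise = 0.35 / bit_rate
    denominator = j_opt_a_per_um * t_rise - c_out_f_per_um * v_swing
    if denominator <= 0.0:
        return None  # the device cannot slew even its own output capacitance
    return v_swing * c_mod_f / denominator

def max_bit_rate(v_swing, j_opt_a_per_um, c_out_f_per_um):
    """Bit rate at which the denominator of the sizing equation reaches zero."""
    return 0.35 * j_opt_a_per_um / (c_out_f_per_um * v_swing)

C_OUT_PER_UM = 0.3e-15   # hypothetical lumped output capacitance, F/um
J_OPT = 0.125e-3         # current density at peak f_T, A/um

width_um = nmos_width_um(2.4, 28e-15, J_OPT, 25e9, C_OUT_PER_UM)
print(f"W_N ~ {width_um:.0f} um, I_D ~ {width_um * J_OPT * 1e3:.1f} mA")
print(f"speed wall ~ {max_bit_rate(2.4, J_OPT, C_OUT_PER_UM) / 1e9:.0f} Gb/s")
```

The function returns `None` once the required rise time is shorter than the time the device needs to charge its own output capacitance, which is the behavior behind the sharp rise in power near the maximum data rate.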

In Fig. 2.3, the driver power consumption and energy efficiency are plotted versus the data rate for a given modulator capacitance and transition symmetry. As the data rate increases, the transistor width must increase to provide the marginal drive current. Once a critical speed is reached, the proposed driver cannot support the data rate, i.e., $t_r \leq (2C_{GD} + C_D)V_S/I_{D,N}$. Near this limit the energy per bit increases dramatically, since much more marginal power must be provided for a small change in data rate. Therefore, the process technology constrains not only the realizable speed for the type of proposed modulator driver but also the efficiency. For a 130 nm CMOS SOI node, the data rate cannot exceed 31 Gb/s. However, a 65 nm node can achieve higher rates and reduce the driver power consumption by a factor of 2.5.

Figure 2.3: Transistor sizing and minimum driver power consumption versus data rate for two different process technologies.

For 25 Gb/s, a slew-rate limited driver ideally must offer a rise and fall time of 14 ps. Additionally, in order to achieve a sufficient extinction ratio, the voltage swing is designed to be $2.4 \ V_{pp}$ based on $+1.2 \ V$ and $-1.2 \ V$ supplies. Since the peak $f_T$ in this process occurs for a current density of roughly $J_{D,N} = 0.125 mA/\mu m$, given a ring resonator capacitance of around $28 \ fF$, the optimum drain current is around $8 \ mA$ [12].
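The numbers in this paragraph can be cross-checked against the slew-rate relation (2.5) with the Python sketch below. The last step backs out the driver output capacitance implied by the quoted 8 mA; this implied value is an inference from the relation, not a figure reported in the text.

```python
def rise_time_s(bit_rate):
    """Edge time used in the text, tr = 0.35 / B."""
    return 0.35 / bit_rate

def drain_current_a(v_swing, c_load_f, bit_rate):
    """Slew-rate current I = Vs * C / tr, i.e. about 2.86 * Vs * C * B."""
    return v_swing * c_load_f / rise_time_s(bit_rate)

B = 25e9          # bit rate, b/s
V_S = 2.4         # voltage swing, V
C_M = 28e-15      # ring modulator capacitance, F
I_TARGET = 8e-3   # drain current quoted in the text, A

t_r = rise_time_s(B)
i_mod_only = drain_current_a(V_S, C_M, B)
# Back out the driver output capacitance implied by the 8 mA figure.
c_out_implied = I_TARGET * t_r / V_S - C_M

print(f"tr = {t_r * 1e12:.0f} ps")
print(f"current for the modulator alone = {i_mod_only * 1e3:.1f} mA")
print(f"implied C_out ~ {c_out_implied * 1e15:.0f} fF")
```

The modulator capacitance alone accounts for a little under 5 mA, so the remainder of the 8 mA budget is consistent with a driver output capacitance somewhat below 20 fF.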

2.2 Measurement Results

Figure 2.4: A micro-photograph of the fabricated transmitter with the ring modulator and the driver circuit shown in the inset pictures.

Fig. 2.4(a) shows a micro-photograph of the fabricated test chip for the monolithically integrated transmitter. The driver circuit and the ring modulator, which occupy only a very small area of the chip, are highlighted and shown in the inset pictures. The waveguide routing that connects the ring modulator to the input and output grating couplers (marked as Tx in and Tx out in the picture) is also highlighted, although it is not visible under the multilayer metal tiling. Twenty-six probing pads are placed at the edge of the chip for power supplies, grounds, and high-speed signal inputs. Fig. 2.4(b) shows the high-speed test setup. An Agilent 4903B J-BERT and a 28 Gb/s multiplexer N4876A are used to generate the input data for the driver circuit. The signal voltage swing to the chip is $600 \ mV_{pp}$. An Agilent 8164A tunable laser source generates light at 1553.874 nm with an output power of 5 dBm. The light is sent to a fiber probe and coupled into the on-chip grating couplers and waveguides. The modulated optical signal is coupled out and sent to an Agilent 81600 Communication Analyzer with a 50 GHz optical receiver module.

**Figure 2.5:** (a, c) 15 Gb/s and 25 Gb/s eye diagrams; (b, d) 15 Gb/s and 25 Gb/s bathtub curves (x axis in UI, y axis in BER).

Table 2.1: Performance Summary and Comparison

<table>
<thead>
<tr>
<th></th>
<th>This Work</th>
<th>[12]</th>
<th>[38]</th>
<th>[39]</th>
<th>[40]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology ( nm )</td>
<td>130</td>
<td>130</td>
<td>130</td>
<td>32</td>
<td>90</td>
</tr>
<tr>
<td>Modulator</td>
<td>On-chip PMR</td>
<td>On-chip PMR</td>
<td>On-chip MZM</td>
<td>Off-chip VCSEL</td>
<td>Off-chip VCSEL</td>
</tr>
<tr>
<td>ER ( dB )</td>
<td>&gt; 8</td>
<td>8</td>
<td>3.5</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Data Rate ( Gb/s )</td>
<td>25</td>
<td>25</td>
<td>27</td>
<td>25</td>
<td>18</td>
</tr>
<tr>
<td>Power ( mW )</td>
<td>17</td>
<td>92</td>
<td>N/A</td>
<td>25</td>
<td>133.6</td>
</tr>
</tbody>
</table>

The measured eye diagrams at different data rates are shown in Fig. 2.5. An 8 dB extinction ratio is reached with a +1.2 V/−1.2 V power supply. To evaluate transmission performance, a Photoline optical receiver (BW 27 GHz) is used to complete the whole link. In Fig. 2.5, the transmitter operates up to 25 Gb/s with a $2^{31} - 1$ PRBS input signal and achieves a bit error rate of $10^{-12}$.

Compared to previous works in Table 2.1 [12,38–40], this work demonstrates the highest speed at the lowest energy per bit (680 fJ/b, excluding laser and tuning power) for a monolithic optical transmitter and removes the need for a pre-emphasis circuit for 25 Gb/s operation. A hybrid approach might achieve lower power with the device of this work, but monolithic integration gives a low-cost solution for production, especially for WDM systems. Some 45 nm works [39,40] have also demonstrated monolithic integration with low power, but the forward-biased modulator could not support 25 Gb/s operation without pre-emphasis techniques. 25 Gb/s rings that include a tuning mechanism are reported in [41].
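The energy-per-bit figure follows directly from the power and data rate columns of Table 2.1. The short Python sketch below performs the conversion for the rows that list both quantities; the dictionary layout is only an illustrative way of holding the table entries.

```python
# (power in mW, data rate in Gb/s) taken from Table 2.1; the MZM entry [38]
# is omitted because its power is listed as N/A.
entries = {
    "This work": (17.0, 25.0),
    "[12]": (92.0, 25.0),
    "[39]": (25.0, 25.0),
    "[40]": (133.6, 18.0),
}

def energy_per_bit_fj(power_mw, rate_gbps):
    """mW divided by Gb/s gives pJ/b; scale by 1000 to get fJ/b."""
    return power_mw / rate_gbps * 1e3

for name, (power_mw, rate_gbps) in entries.items():
    print(f"{name}: {energy_per_bit_fj(power_mw, rate_gbps):.0f} fJ/b")
```

For this work, 17 mW at 25 Gb/s corresponds to the 680 fJ/b quoted above.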
2.3 Conclusions

This chapter presents a monolithic SiP transmitter. The co-design of the micro-ring modulator and the electrical driver removes the need for a pre-emphasis circuit for 25 Gb/s operation. This work has demonstrated the highest speed at the lowest energy per bit (680 fJ/b, excluding laser and tuning power).

Acknowledgements

This chapter is mostly a reprint of the material as it appears in Jun Li, Guoliang Li, Xuezhe Zheng, Kannan Raj, A.V. Krishnamoorthy and J. F. Buckwalter, “A 25Gb/s Monolithic Optical Transmitter With Micro-Ring Modulator in 130nm SOI CMOS”, IEEE Photonics Technology Letters, 2013. The work presented in Chapter 2 was completed when the author was a visiting summer intern at Oracle Labs in San Diego and during follow-on collaboration with the Oracle team, and is reproduced as-is in this thesis as Chapter 2 with permission from Oracle. This work was supported, in part, by DARPA under Agreements HR0011-08-09-0001 and W911NF-07-1-0529. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Approved for public release; distribution unlimited. This dissertation author was the primary author of this material.
Chapter 3

Scaling Trends for Silicon Photonic Interconnects in CMOS SOI and FinFET Process

3.1 WDM Link for Optical Interconnects

Scaling of data centers and high-performance computing stresses the interconnect bandwidth between microprocessors and memory. Silicon photonics (SiP) has emerged as an energy-efficient solution for dense intrachip and interchip I/O data rate, power consumption, and die area demands. A wavelength division multiplexed (WDM) interconnect is illustrated in Fig. 3.1. The SiP WDM link has several advantages compared to electrical I/O. First, WDM reduces the number of I/O connections or fibers needed. Second, SiP modulators and photodetectors have demonstrated high-speed (> 40 Gb/s) modulation without the need for equalization [42]. Additionally, ring-based modulators are relatively small (10s of microns) and the low capacitance associated with these devices appreciably lowers power consumption. Finally, CMOS compatibility ensures high yield, reproducibility, and high reliability at low cost per part. Employing pre-emphasis and equalization techniques, VCSEL links were successfully demonstrated with data rates up to 56.1 Gb/s, and high energy efficiency of 1.37 pJ/b at 15 Gb/s and 3.7 pJ/b at 25 Gb/s. On the other hand, a digital chip-to-chip WDM SiP link with an external laser source was reported with 1.37 pJ/b on-chip energy efficiency (4.2 pJ/b including laser efficiency) [19]. VCSEL links could offer the advantages of simple packaging and hence lower cost, but high-speed VCSELs typically require smaller aperture sizes and higher current densities for data rates beyond 25 Gb/s [43].

Figure 3.1: WDM interconnect system.

3.1.1 Link Budget

WDM SiP interconnects optimize the design of optoelectronic devices, lasers, and integrated circuits to achieve minimum energy per bit per wavelength. The energy efficiency of a photonic interconnect is defined by

\[
EPB = \frac{\text{energy}}{\text{bit}} = \frac{P_L + P_{TX} + P_{RX} + P_{TUNE}}{B} \tag{3.1}
\]

where \( P_L \) is the power consumption of the laser, \( P_{TX} \) and \( P_{RX} \) are respectively the power consumption of the electronic transmitter (TX) and receiver (RX), \( P_{TUNE} \) is the power required to tune the optical modulator and associated WDM multiplexer (mux) and demultiplexer (demux), and \( B \) is the bit rate. We will discuss the \( EPB \) as a metric per wavelength for WDM but the losses that are incurred are common to all wavelengths.

To optimize the link \( EPB \), the individual power consumption should be evaluated in terms of the bit rate \( B \). We begin with the hypothesis that the total link power consumption can be modelled as a function \( P_L + P_{TX} + P_{RX} + P_{TUNE} = P(B) = p_0 + p_1 B + p_X B^X \), where \( X > 1 \) is a superlinear exponent. Then the minimum \( EPB \) is

\[
EPB_{min} = p_1 + \left((X - 1)^{\frac{1}{X}} + (X - 1)^{-\frac{X-1}{X}}\right) p_0^{\frac{X-1}{X}} p_X^{\frac{1}{X}} \tag{3.2}
\]

and the optimum bit rate is

\[
B_{opt} = \left( \frac{p_0}{(X - 1)p_X} \right)^\frac{1}{X} \tag{3.3}
\]

The relative contributions of the bit-rate-independent and superlinear coefficients are insightful in these two equations. While both components tend to increase the energy requirements of the link, larger bit-rate-independent contributions push the optimum bit rate to higher values. Rewriting (3.2) in terms of (3.3), the minimum energy is

\[
EPB_{min} = p_1 + \frac{X}{X - 1} \frac{p_0}{B_{opt}} \tag{3.4}
\]

In other words, approximating the bit-rate independent and linear power consumption coefficients, i.e. \( p_0 \) and \( p_1 \), and the exponent of the superlinear term allows an approximation of the minimum \( EPB \). While the choice of an exponent of \( X \) might appear arbitrary, this will be motivated in subsequent circuit analysis. The relevance of the exponent \( X \) is demonstrated in Fig. 3.2. While all three curves achieve the same minimum energy per bit and optimum bit rate, higher superlinear exponents make the energy consumption less robust to deviations from the optimum bit rate and are undesirable from the standpoint of the link design.
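The optimum can be checked numerically. The following Python sketch evaluates the energy per bit for the power model $P(B) = p_0 + p_1 B + p_X B^X$ and compares the closed-form optimum bit rate and minimum energy against direct evaluation. The coefficient values are hypothetical and chosen only so that the numbers are easy to follow (power in mW and bit rate in Gb/s, which gives energy per bit in pJ/b).

```python
def epb(bit_rate, p0, p1, px, x):
    """Energy per bit for P(B) = p0 + p1*B + px*B**x."""
    return p0 / bit_rate + p1 + px * bit_rate ** (x - 1.0)

def optimum_bit_rate(p0, px, x):
    """Stationary point of epb(): B_opt = (p0 / ((x - 1) * px)) ** (1 / x)."""
    return (p0 / ((x - 1.0) * px)) ** (1.0 / x)

def minimum_epb(p0, p1, px, x):
    """Energy per bit at the optimum: p1 + x / (x - 1) * p0 / B_opt."""
    return p1 + x / (x - 1.0) * p0 / optimum_bit_rate(p0, px, x)

# Hypothetical coefficients: power in mW, bit rate in Gb/s.
P0, P1, PX, X = 10.0, 0.5, 0.01, 2.0

b_opt = optimum_bit_rate(P0, PX, X)
print(f"B_opt ~ {b_opt:.1f} Gb/s")
print(f"EPB_min (closed form) ~ {minimum_epb(P0, P1, PX, X):.3f} pJ/b")
print(f"EPB at B_opt (direct) ~ {epb(b_opt, P0, P1, PX, X):.3f} pJ/b")
```

The closed-form and direct values agree, and evaluating `epb` slightly above or below `b_opt` gives a larger energy per bit, as expected at a minimum.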

![Figure 3.2: Energy efficiency as a function of data rate for different superlinear terms.](image)

### 3.2 Silicon Photonic Devices

In this section, the development of SiP devices is reviewed in terms of the scaling of optical path losses with bit rate as well as the impact on electrical device characteristics. The laser power consumption required to support the link margin depends on the receiver sensitivity $P_{SEN}$. As shown in Fig. 3.1, the useful laser output power $P_0$ is reduced by the total interconnect loss $L_T$, which includes waveguide losses as well as modulator and mux/demux insertion loss, and the laser source wall-plug efficiency ($WPE$). Therefore,

$$P_L = \frac{P_{SEN} \cdot L_T}{WPE} \quad (3.5)$$

The contributions to the optical losses are discussed here.
3.2.1 Micro-Ring Modulator

![Image of micro-ring modulator](image)

**Figure 3.3:** (a) Waveguide cross-section diagram, (b) circuit model for the high speed ring modulator, and (c) die photo of the reverse-biased depletion ring modulator.

A high-speed ring modulator is desired in monolithic integration for better energy efficiency. However, advanced processes such as 28 nm FD-SOI and 14 nm FinFET provide $\sim 4 \times$ the $f_T$ of 130 nm CMOS SOI. Therefore, hybrid integration is preferred for the best energy efficiency. Fig. 3.3 illustrates the low-loss design for an optimum WDM link. In general, the link budget is limited by the laser output power due to its low efficiency ($\sim 10\%$). Recently, CMOS transceiver circuits have been reported to consume $< 30 mW$ at 25 Gb/s in FD-SOI and FinFET processes [22]. Therefore, lower loss in the modulator improves system energy efficiency when $> 3 \, dBm$ laser output power is required to complete the WDM link.
3.2.2 High Speed Photodetector

High-speed photodetectors (> 40 Gb/s) have been demonstrated in SiP processes using Ge or III/V dopants with silicon. Silicon photodetectors can be implemented with a responsivity $\Re$ of $\sim 0.85 \text{ A/W}$, a low capacitance $C_{PD}$ of a few $fF$, and a low dark current of 1 $\mu A$ [37, 44–46]. The responsivity of silicon photonic PDs has varied between 0.8 and 0.9 A/W depending on the doping of germanium and the geometry of the photodiode. A common figure of merit in photodiode design is the product of the responsivity and the bandwidth. With a high-performance p-i-n PD, recent work has demonstrated $27 \text{ GHz}$ bandwidth and 1 $\text{ A/W}$ responsivity for C and L band operation [47]. The sensitivity of a receiver with a p-i-n photodetector is

$$P_{SEN} = \frac{Q \sqrt{\overline{I^2}_{n,TIA}}}{\Re} + q \cdot \frac{Q^2 BW_n}{\Re}$$

where $Q$ is the Personick measure of SNR for OOK, $\overline{I^2}_{n,TIA}$ is the input-referred noise of the receiver, and $BW_n$ is the first order Personick noise bandwidth and is typically a factor of 1.11 times the receiver BW [48]. For BER of $10^{-12}$, $Q$ should be roughly 7. Whereas the $Q$ and $\Re$ are fixed, the dependence of the input-referred noise on the bandwidth illustrates the linear dependence on power consumption described in section 3.1.1.
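To give a feel for the magnitudes in the sensitivity expression, the Python sketch below evaluates its two terms separately. $Q = 7$, the responsivity of 0.85 A/W, and the factor of 1.11 come from the text; the 2 µA rms input-referred noise current and the 25 GHz receiver bandwidth are hypothetical values chosen only for illustration.

```python
import math

Q_ELECTRON = 1.602176634e-19  # elementary charge, C

def sensitivity_w(q_factor, i_noise_rms_a, responsivity_a_per_w, noise_bw_hz):
    """Receiver-noise term plus shot-noise term of the p-i-n sensitivity."""
    receiver_term = q_factor * i_noise_rms_a / responsivity_a_per_w
    shot_term = Q_ELECTRON * q_factor ** 2 * noise_bw_hz / responsivity_a_per_w
    return receiver_term + shot_term

def to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)

I_NOISE_RMS = 2e-6          # hypothetical input-referred rms noise current, A
RECEIVER_BW = 25e9          # hypothetical receiver bandwidth, Hz
noise_bw = 1.11 * RECEIVER_BW

p_sen_w = sensitivity_w(7.0, I_NOISE_RMS, 0.85, noise_bw)
print(f"P_SEN ~ {p_sen_w * 1e6:.1f} uW ({to_dbm(p_sen_w):.1f} dBm)")
```

With these inputs the receiver-noise term dominates the shot-noise term by roughly two orders of magnitude, which is why the sensitivity tracks the input-referred noise of the TIA.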

3.2.3 Wavelength Multiplexing and De-Multiplexing

Wavelength multiplexing (MUX) and de-multiplexing (DEMUX) have been demonstrated using cascaded ring resonator based add/drop filters [49,50], echelle grating [51], and arrayed waveguide grating on silicon [52]. For silicon MUX and DEMUX with more than 8 wavelength channels, prior work reports a worst case power penalty of 3 dB including both the insertion loss and the power penalty for high speed modulation attributed to filter shape in both amplitude and phase, optical crosstalk, misalignment between the filter center wavelength and the carrier wavelength, and residual perturbation resulting from the closed-loop tuning controller. Nonetheless, this power penalty is expected to be largely bit rate independent.
3.2.4 Tuning and Control of Ring Resonator Devices

To overcome manufacturing tolerances and ambient temperature changes, closed-loop control is required for reliable operation of the ring resonator-based modulator and wavelength MUX/DEMUX. Closed-loop control represents one of the more significant challenges to the use of high-Q ring resonators. The power consumption of the tuning circuit comprises the power required to heat the ring to temperature $T_o$ from an ambient temperature $T_a$ and the power consumption of the closed-loop controller [24]. Given a tuning sensitivity $K_\lambda$ in $pm/K$ for the ring modulator, the temperature of the ring is controlled to shift the initial wavelength $\lambda_a$ to the desired wavelength $\lambda_o$. This wavelength tuning range $\Delta \lambda = \lambda_o - \lambda_a$ could be as large as the free spectral range (FSR) of the ring. A synthetic resonance comb technique significantly reduces the static tuning range $\Delta \lambda$ needed to overcome manufacturing tolerances to $\sim 1 nm$ [53]. For a WDM link using cascaded rings on a shared bus waveguide, additional dynamic tuning is required, with a range that depends on the maximum ambient temperature range $\Delta T$.

In the worst-case scenario, the rings have to be heated for the static shift $\Delta \lambda$ plus the maximum ambient temperature change $\Delta T$. The maximum power consumption required to control the ring temperature therefore consists of a static component to bias the ring and a dynamic component in response to a temperature change $\Delta T$.

$$P_{TUNE} = \frac{1}{\eta_T K_T} (\frac{\Delta \lambda}{K_\lambda} + \Delta T) \quad (3.7)$$

The power depends on a heating efficiency $K_T$ in $K/mW$ that determines how much of the heating power induces a temperature change in the ring and a power efficiency $\eta_T$, which is the ratio of the heater power to the total power consumption of the ring tuning controller. Using a substrate removal technique [54], the tuning efficiency, i.e. the product $K_\lambda K_T$, can be substantially improved to 2.5 $nm/mW$. Assuming that $\eta_T$ is 50 $\%$, the power required to tune a ring by 1 $nm$ from its room-temperature value is 0.8 $mW$. We anticipate that the cost of WDM is that the channels must be spaced apart in wavelength and that the tuning power increases linearly with the number of channels. While $P_{\text{TUNE}}$ is presumably independent of $B$, the power consumption of the TX and RX circuitry generally depends on $B$.
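The 0.8 mW figure can be reproduced from (3.7) with the Python sketch below. Only the product $K_\lambda K_T = 2.5$ nm/mW is given in the text; the split into 0.08 nm/K and 31.25 K/mW is a hypothetical choice for illustration, as is the 20 K ambient swing in the second call.

```python
def tuning_power_mw(delta_lambda_nm, delta_t_k, k_lambda_nm_per_k,
                    k_t_k_per_mw, eta_t):
    """P_TUNE = (delta_lambda / K_lambda + delta_T) / (eta_T * K_T)."""
    temperature_rise_k = delta_lambda_nm / k_lambda_nm_per_k + delta_t_k
    return temperature_rise_k / (eta_t * k_t_k_per_mw)

K_LAMBDA = 0.08   # hypothetical tuning sensitivity, nm/K
K_T = 31.25       # hypothetical heating efficiency, K/mW (product is 2.5 nm/mW)
ETA_T = 0.5       # controller power efficiency from the text

static_mw = tuning_power_mw(1.0, 0.0, K_LAMBDA, K_T, ETA_T)
with_ambient_mw = tuning_power_mw(1.0, 20.0, K_LAMBDA, K_T, ETA_T)
print(f"static 1 nm shift: {static_mw:.2f} mW")
print(f"plus a 20 K ambient swing: {with_ambient_mw:.2f} mW")
```

The static case depends only on the product $K_\lambda K_T$, so it is independent of the assumed split; the ambient term does depend on $K_T$ alone and should be read as illustrative.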

3.2.5 Photonic Component Scaling Trends

Recent literature has benchmarked efficient laser sources and tuning mechanisms [55, 56]. The total optical loss for the routing of the WDM link is about 10 $dB$, including passive silicon waveguides and multiple interlayer couplers [57]. Although link distances are typically short for these networks, the total link losses are nevertheless substantial. Table 3.1 offers a sample power budget of a complete WDM link based on recent published work [57, 58]. The receiver sensitivity is an extremely important factor in determining exactly how much laser output power and, consequently, laser power consumption is required to close the link, and it significantly affects the $EPB$ and presumably the optimum data rate. For low power optical interconnects, silicon optical receivers are typically optimized for energy efficiency. Ultra-efficient SiP receivers with a sensitivity of $-17 \text{ dBm}$ for a bit error rate (BER) of $10^{-12}$ have been demonstrated at 10 Gb/s to reach an energy efficiency under 300 fJ/b [57].

Table 3.1: Power Budget for A WDM SiP Interconnect

<table>
<thead>
<tr>
<th>Component</th>
<th colspan="2">Power Budget</th>
</tr>
</thead>
<tbody>
<tr>
<td>Receiver (TIA)</td>
<td>$P_{\text{SEN}}$</td>
<td>-14 dBm</td>
</tr>
<tr>
<td>Modulator</td>
<td rowspan="4">$L_T$</td>
<td>8 dB</td>
</tr>
<tr>
<td>MUX</td>
<td>2.5 dB</td>
</tr>
<tr>
<td>DEMUX</td>
<td>2.5 dB</td>
</tr>
<tr>
<td>Routing</td>
<td>6 dB</td>
</tr>
<tr>
<td>WDM laser output</td>
<td>$P_\lambda$</td>
<td>5 dBm</td>
</tr>
<tr>
<td>Laser efficiency</td>
<td>$WPE$</td>
<td>10 %</td>
</tr>
<tr>
<td>Laser power consumption</td>
<td>$P_L$</td>
<td>32 mW</td>
</tr>
</tbody>
</table>
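
The entries of Table 3.1 can be chained together as a quick consistency check. The Python sketch below adds the losses in dB to the receiver sensitivity to obtain the required laser output, then divides by the wall-plug efficiency to obtain the laser power consumption; the variable names are illustrative only.

```python
def dbm_to_mw(power_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)

# Values from Table 3.1.
P_SEN_DBM = -14.0
LOSSES_DB = {"modulator": 8.0, "mux": 2.5, "demux": 2.5, "routing": 6.0}
WPE = 0.10

total_loss_db = sum(LOSSES_DB.values())
laser_output_dbm = P_SEN_DBM + total_loss_db
laser_power_mw = dbm_to_mw(laser_output_dbm) / WPE

print(f"L_T = {total_loss_db:.1f} dB")
print(f"required laser output = {laser_output_dbm:.1f} dBm")
print(f"laser power consumption ~ {laser_power_mw:.0f} mW")
```

The 19 dB of total loss on top of a -14 dBm sensitivity gives the 5 dBm laser output in the table, and a 10 % wall-plug efficiency then yields roughly 32 mW.
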
3.3 Driver and Receiver Circuit Design

Monolithic transceivers with SiP devices in CMOS SOI have been demonstrated in previous work [12, 59, 60]. However, hybrid bonding solutions enable integration of different device technologies to achieve better energy efficiency [14]. For dense interconnects in microprocessor applications, die area is assumed to be at a premium. This compels circuit designs that eliminate peaking inductors, which have been shown to double the bandwidth of the circuit and impact the number of amplifier stages [61–63]. Bandwidth extension techniques such as pre-emphasis and inductor peaking are not necessary in advanced processes for a data rate of 25 Gb/s. Given the device parameters from the previous section, in particular the modulator capacitance $C_M$ and the photodiode capacitance $C_{PD}$, the circuit design is discussed in the following sections to determine the bit rate dependence of each of the terms in (3.1).

3.3.1 Modulator Driver

In the case of FinFETs, the number of fins $N_{FIN}$ and the number of fingers $N_F$ can be used interchangeably to provide the required current. We assume that the n-FinFET and the p-FinFET use the same number of fins.

\[
N_{F,N} \cdot N_{FIN} = \frac{I_D}{J_{OPT,FIN}} \quad (3.8a)
\]

\[
N_{F,P} \cdot N_{FIN} = \frac{I_D}{J_{OPT,FIN}} \quad (3.8b)
\]

Since $C_{OUT}$ is a linear function of the number of fins, the total output capacitance of the modulator driver is

\[
C_{OUT} = C_{FIN}(N_{F,N} + N_{F,P})N_{FIN} \quad (3.9)
\]

Here, $C_{FIN}$ is the output capacitance per fin. From device simulations, the power consumption and energy efficiency versus data rate for different processes are shown in Fig. 3.4. As the data rate increases, the transistor size must increase to provide the additional drive current. The power wall is reached when $t_r < \frac{V_S \cdot (C_M + C_{D,out})}{I_D}$, that is, when the available current can no longer slew the load within the rise time $t_r$. Beyond this point the energy per bit rises sharply, since much more power must be provided for a small change in data rate. With an 80 fF output capacitor, the 65 nm node reaches this limit at around 30 Gb/s, while the limits for 28 nm FD-SOI and 14 nm FinFET are above 50 Gb/s.
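The slew-rate condition above can be turned into a quick estimate of the drive current needed at a given data rate. The sketch below is illustrative only: the function names, the 1 V swing and the choice of a rise time equal to 20% of the bit period are assumptions made here, not values taken from the device simulations.

```python
def required_drive_current(v_swing, c_load, bit_rate, rise_fraction=0.2):
    """Current (A) needed to slew c_load (F) through v_swing (V) within the rise time.

    The rise time is taken as rise_fraction of one bit period; this fraction
    is an assumption of the sketch, not a value from the text.
    """
    t_rise = rise_fraction / bit_rate
    return v_swing * c_load / t_rise


def hits_power_wall(i_available, v_swing, c_load, bit_rate, rise_fraction=0.2):
    """True when the available current cannot slew the load within the rise time."""
    return i_available < required_drive_current(v_swing, c_load, bit_rate, rise_fraction)


if __name__ == "__main__":
    # Hypothetical operating points: 1 V swing into an 80 fF load.
    for rate_gbps in (10, 25, 50):
        i_req = required_drive_current(1.0, 80e-15, rate_gbps * 1e9)
        print(f"{rate_gbps} Gb/s: {i_req * 1e3:.1f} mA")
```

Under these assumed values the required current grows linearly with data rate, so a device whose current saturates will cross the wall at a fixed bit rate.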

![Figure 3.4: Transmitter power consumption and energy efficiency comparison.](image)

Finally, we should consider the drive requirements of the stages preceding the final modulator driver. Applying traditional digital fanout approaches, the ratio of the total output capacitance, here doubled to account for the positive and negative drive on the output stage, and the input capacitance is used to find the optimum number of transmit stages.

\[
N_{TX} = \frac{\log\left(\frac{2(C_M + C_{OUT})}{C_{IN}}\right)}{\log(F)} \quad (3.10)
\]

where \(C_{IN}\) is the input capacitance of the first stage in transmitter and \(F\) is the optimal fanout. Therefore, the total transmitter power consumption is

\[
P_{TX} = C_{OUT}V_{DD}^2B\left(\sum_{i=1}^{N_{TX}-1} \frac{1}{F^{N_{TX}-i}} + 1\right) \quad (3.11)
\]
Note that we double the single-ended driver chain power for the total power calculation. Because each preceding stage is smaller than the next by the fanout factor, the sum is dominated by the final stage, so the transmit power consumption is set primarily by the drive requirements of the modulator driver.
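The stage count and the tapered power sum can be evaluated numerically. The following is a minimal sketch, assuming that the exponent in the power expression is $N_{TX}-i$ (each stage $F$ times smaller than the one it drives) and that the stage count is rounded up to an integer; the function names and the example capacitances are hypothetical.

```python
import math


def num_tx_stages(c_mod, c_out, c_in, fanout):
    """Stage count from the fanout rule, rounded up (rounding is an assumption)."""
    ratio = 2.0 * (c_mod + c_out) / c_in
    return max(1, math.ceil(math.log(ratio) / math.log(fanout)))


def tx_power(c_out, vdd, bit_rate, n_tx, fanout):
    """Transmitter power (W): final stage plus geometrically smaller pre-drivers."""
    taper = sum(1.0 / fanout ** (n_tx - i) for i in range(1, n_tx)) + 1.0
    return c_out * vdd ** 2 * bit_rate * taper


if __name__ == "__main__":
    # Hypothetical values: 80 fF modulator, 20 fF driver output, 10 fF input, F = 3.
    n = num_tx_stages(80e-15, 20e-15, 10e-15, 3.0)
    p = tx_power(20e-15, 1.0, 25e9, n, 3.0)
    print(n, f"{p * 1e3:.3f} mW")
```

With a fanout of 3 the pre-drivers add less than half of the final-stage power, which illustrates why the last stage dominates.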

### 3.3.2 CMOS Push-Pull Amplifiers

Each technology can be compared on the basis of the performance of the push-pull (inverter) amplifier shown in Fig. 3.5. This circuit can be configured to serve as a transimpedance or a transconductance stage, as will be described in subsequent sections. The inverter is a desirable approach to designing an amplifier with the highest tolerance to process variations for several reasons. First, the p-FET devices, as will be seen, reach performance parity with the n-FET devices in scaled technologies. Additionally, as CMOS scales towards 10 nm, the number of dopants in the channel becomes statistically significant, and in applications where low power consumption is required, small devices have high threshold voltage mismatch.

![Figure 3.5: Schematic of push-pull amplifier.](image)

A push-pull amplifier offsets changes in the n-FET and p-FET devices. The small-signal model of this inverter amplifier suggests a gain of

\[
A_V = \frac{g_{m,N} + g_{m,P}}{g_{ds,N} + g_{ds,P}} = \frac{G_m}{G_{ds}} \quad (3.12)
\]

where $g_{m,N}$ and $g_{m,P}$ are the transconductances of the n- and p-type devices and $g_{ds,N}$ and $g_{ds,P}$ are the channel conductances of the n- and p-type devices. For simplicity, we express $G_m = g_{m,N} + g_{m,P}$ and $G_{ds} = g_{ds,N} + g_{ds,P}$. The input capacitance is $C_{in} = C_{gs,N} + C_{gs,P} + C_{miller}$ where $C_{miller} = (A_V + 1)(C_{gd,N} + C_{gd,P})$. The output capacitance is $C_{out} = C_{gd,N} + C_{gd,P} + C_{ds,N} + C_{ds,P}$.

### 3.3.3 Transimpedance Amplifier Stage

Receiver sensitivity plays a key role in determining link performance. Here, the fundamental limitations on receiver sensitivity are reviewed based on the implementation of a highly scaled CMOS process.

![Schematic of proposed receiver.](image)

**Figure 3.6**: Schematic of proposed receiver.

As shown in Fig. 3.6, the receiver starts with a transimpedance amplifier (TIA) stage, which is followed by a transconductance amplifier (TCA). The TIA presents a low input impedance and an output impedance determined by the feedback resistance. The TCA has a large input impedance and a large output resistance. Therefore, a TIA is followed by a TCA stage to produce high-frequency poles in subsequent stages. The transimpedance is given by

\[ R_T = R_F \frac{A_V}{1 + A_V} \] (3.13)

where \( R_F \) is the shunt feedback resistor at the input. The value of \( R_F \) and, hence, the input-referred noise are determined by the receiver bandwidth (BW) requirement. The input pole of each TIA stage is found at

\[ \omega_{p,i} = 2\pi BW = \frac{1}{R_{F,i} \left( C_{in,i} + C_{out,i-1} \right)} \] (3.14)

where \( i = 1 \) for the first TIA stage and \( C_{out,0} = C_{PD} \). To optimize the sensitivity, the input capacitance of the TIA should be equal to the photodetector capacitance [38]. Note that the intrinsic gain of the amplifier is assumed to be relatively constant given the device characteristics found in the previous section. The noise contributions of the push-pull amplifier and the feedback resistor are now calculated as an input-referred noise current in (3.15).

\[ \frac{i_{n,TIA}^2}{R_{F,i}} = \frac{4K}{\pi} \frac{BW_n}{G_{m,i}^2} + \frac{4K}{\pi} \frac{BW_n}{G_{m,i}^2} + \frac{4K}{\pi} \frac{BW_n}{G_{m,i}^2} \left( C_{in,i} + C_{out,i-1} \right)^2 BW_n^2 \]  

(3.15)

In this expression, \( BW_{n2} \) is the second-order Personick noise bandwidth and is assumed to be 1.49 times the \( BW \) for a Butterworth second-order response. Additionally, the factor \( \Gamma \) represents the excess noise factor of the transistors. Substituting (3.14) into (3.15) and recognizing that \( C_{in} = C_{PD} \), this expression is simplified to (3.16).

\[ \frac{i_{n,TIA}^2}{R_{F,i}} = K \cdot (17.6\pi \left( \frac{C_{PD}}{A_V} \right)) BW^2 + \frac{32\pi^2 \Gamma}{G_{m,1} C_{PD}^2} BW^3 \]  

(3.16)

Several features are evident in this expression. First, the TIA noise contribution depends on the ratio of the photodiode capacitance to the intrinsic gain, so reducing the photodiode capacitance or increasing the intrinsic gain improves the sensitivity. Second, a larger transconductance minimizes the contribution of the channel noise. The first term indicates that the sensitivity scales linearly with bandwidth. For wideband designs, the first term in (3.16) dominates the second term since $G_{m,i}R_{F,i} \gg \Gamma$.

The sensitivity for a TIA can be estimated in different technologies as a function of bandwidth by substituting (3.16) into (3.6). This bound is plotted in Fig. 3.7 under two conditions. Assuming that the receiver power consumption is not a constraint, the prediction is that $G_{m,1}$ is increased until channel noise is no longer a dominant noise contribution. At this point, receiver sensitivity is limited by the p-i-n noise and the feedback resistance and, as a point of reference, the sensitivity is $-20$ dBm at 30 Gb/s.

**Figure 3.7:** Sensitivity as a function of bit rate for TIA stage.

The sensitivity in (3.6) depends on the input-referred noise contribution in (3.16) which has the form of $\sqrt{\alpha_{TIA,2}B^2 + \alpha_{TIA,3}B^3}$. Applying a Taylor series approximation, the sensitivity is

$$P_{SEN} = \frac{qQ^2}{\mathcal{R}}B + \sqrt{\alpha_{TIA,2}}\,B + \frac{1}{2}\frac{\alpha_{TIA,3}}{\sqrt{\alpha_{TIA,2}}}B^2 \quad (3.17)$$

More generally, the mean-square current noise contributions must be examined from circuit simulations to determine the dependence of the sensitivity on the superlinear exponent. Finally, the desired laser power is

\[ P_L = P_{LQ} + \frac{L_T}{WPE} \left( \frac{qQ^2}{\mathcal{R}} B + \sqrt{\alpha_{TIA,2} B^2 + \alpha_{TIA,3} B^3} \right) \] (3.18)

where \( P_{LQ} \) is a DC threshold power for the laser.
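The Taylor step applied to the noise term $\sqrt{\alpha_{TIA,2}B^2 + \alpha_{TIA,3}B^3}$ can be checked numerically. Factoring out $\sqrt{\alpha_{TIA,2}}\,B$ and expanding $\sqrt{1+x} \approx 1 + x/2$ gives $\sqrt{\alpha_{TIA,2}}\,B + \tfrac{1}{2}(\alpha_{TIA,3}/\sqrt{\alpha_{TIA,2}})B^2$. The sketch below compares this with the exact value; the coefficient values are arbitrary and chosen only for illustration.

```python
import math


def noise_term_exact(a2, a3, b):
    """Exact value of sqrt(a2*B^2 + a3*B^3)."""
    return math.sqrt(a2 * b ** 2 + a3 * b ** 3)


def noise_term_first_order(a2, a3, b):
    """First-order expansion, valid when a3*B is small compared with a2."""
    return math.sqrt(a2) * b + 0.5 * a3 / math.sqrt(a2) * b ** 2


if __name__ == "__main__":
    # Arbitrary illustrative coefficients (not taken from the simulations).
    a2, a3 = 1.0, 0.01
    for b in (0.5, 1.0, 2.0, 10.0):
        exact = noise_term_exact(a2, a3, b)
        approx = noise_term_first_order(a2, a3, b)
        print(f"B={b}: exact={exact:.5f} approx={approx:.5f}")
```

The approximation degrades as $\alpha_{TIA,3}B$ approaches $\alpha_{TIA,2}$, which is the regime where the superlinear term starts to dominate.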

### 3.3.4 Transconductance Amplifier Stage

The transconductance amplifier (TCA) provides a relatively large input impedance and drives the relatively low impedance of the subsequent transimpedance stage with a transconductance \( G_{m,i} \). The dominant pole of the LA, located between the output of the TIA stage and the input of the push-pull stage, is

\[ \omega_{p,i} = 2\pi BW = \frac{1}{R_{out,i-1}(C_{out,i-1} + C_{in,i})} \] (3.19)

where \( R_{out,i-1} = \frac{R_{F,i-1}}{1 + G_{ds,i-1}R_{F,i-1}} \). This bandwidth should be identical to (3.14) but may be smaller due to the absence of feedback to reduce the input impedance of the push-pull amplifier. The ratio of the capacitance between the two nodes determines whether the bandwidth constraint can be satisfied. In the event that the pole cannot be moved to a higher frequency by shrinking the input capacitance of the TCA stage, the feedback resistor of the TIA is reduced to lower the output resistance. This reduces the transimpedance of the first stage and sacrifices sensitivity. In most practical applications, the push-pull amplifier must be of minimum size to minimize the capacitance on the output node of the TIA stage. The output noise contribution of each LA stage is

\[ \overline{i_{n,LA,i}^2} = 4KTG_{m,i}BW \] (3.20)

### 3.3.5 Cascade of Transimpedance and Transconductance Stages

The overall transimpedance gain from the cascade of \( N_{RX} \) TIA and TCA stages is

\[
R_{T,N_{RX}} = \prod_{i=1,i=\text{odd}}^{2N_{RX}+1} \frac{A_{V}}{1 + A_{V}} R_{F,i} \prod_{i=2,i=\text{even}}^{2N_{RX}} G_{m,i} \tag{3.21}
\]

where \( A_{V} \) is assumed to be constant for all stages, and \( G_{m,i} \) and \( G_{ds,i} \) are the transconductance and conductance of the \( i \)th TCA stage. The cascade of TIA and TCA pairs is repeated to reach the desired transimpedance, where the expressions for the bandwidth are derived from (3.14) and (3.19). For a desired bandwidth \( BW \), (3.21) is

\[
R_{T,N_{RX}} = R_{F,1} \prod_{i=1,i=\text{odd}}^{2N_{RX}+1} \frac{f_{T}}{BW \cdot F} \tag{3.22}
\]

where \( f_{T} \) is the unity current gain cutoff frequency and \( F \) is the optimal fan-in ratio (typically \( 2 \sim 3 \)). Note that a clock and data recovery (CDR) circuit is usually designed to re-sample the data from the last amplifier stage, and the input capacitance of the sampling latch is smaller than the input capacitance of the first-stage TIA because of the large \( C_{PD} \) and noise optimization. Assuming -20 dBm sensitivity, the required transimpedance is \( 117\,k\Omega \), and this determines the required number of stages. Assuming that each stage contributes a pole at the same frequency, i.e. \( \omega_{p} = \omega_{p,i} \), the overall \( BW \) is

\[
BW = \frac{\omega_{p}}{2\pi} \sqrt{2^{1/N_{RX}} - 1} \tag{3.23}
\]
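As a numerical illustration of how identical poles shrink the overall bandwidth, the sketch below uses the standard result for $N$ cascaded first-order stages, $BW = f_p\sqrt{2^{1/N}-1}$. The function names and the 18 GHz target in the example are chosen here for illustration only.

```python
import math


def cascaded_bandwidth(pole_hz, n_stages):
    """Overall -3 dB bandwidth of n identical cascaded first-order poles."""
    return pole_hz * math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


def required_stage_pole(target_bw_hz, n_stages):
    """Per-stage pole frequency needed to reach a target overall bandwidth."""
    return target_bw_hz / math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        shrink = cascaded_bandwidth(1.0, n)
        pole = required_stage_pole(18e9, n) / 1e9
        print(f"N={n}: BW/f_p={shrink:.3f}, pole for 18 GHz overall={pole:.1f} GHz")
```

Four identical stages retain only about 43% of the single-stage bandwidth, so each stage has to be designed well above the target.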

The receiver bandwidth decreases substantially as more receive stages are needed. Hence, the bandwidth of each stage needs to be larger as more stages are required to satisfy the overall transimpedance requirement. Finally, the total input referred noise contribution is (3.24). This expression for the overall receiver noise is used to recalculate the receiver sensitivity in (3.6). The calculated sensitivity, like the bandwidth, is then used to iterate on the number of required stages.

\[
\overline{i_{n}^{2}} = \overline{i_{n,TIA,i}^{2}} + \frac{\overline{i_{n,LA,i}^{2}}}{R_{T}^{2}G_{ds,i}^{2}} + \frac{\overline{i_{n,LA,N_{RX}}^{2}}}{R_{T,N}^{2}G_{ds,N}^{2}} \tag{3.24}
\]
Now, the power consumption of the receiver is calculated from the total power consumption of the TIA and TCA stages.

\[ P_{RX} = V_{DD} \sum_{i=1}^{N_{RX}} I_i \] (3.25)

where the current required per stage can be related to the bandwidth and the bit rate.

\[ I_i = \frac{G_{m,i}V_{DD}}{2} = \frac{(C_{in,i} + C_{out,i-1})f_TV_{DD}}{2} \] (3.26)

Substituting this into (3.25) gives

\[ P_{RX} = \frac{C_Tf_TV_{DD}^2}{2} \sum_{i=2,i=even}^{N_{RX}} \left(1 + \frac{1}{F^{(i - 2)}}\right) \] (3.27)

where \( C_T \) is the total capacitance at the input of the TIA and \( F \) is \( \sim 2.6 \). This expression indicates that the receiver power consumption is independent of the bandwidth and data rate except through the number of stages. Transistor parameters alone can be used to calculate the power consumption given the photodetector capacitance and transistor speed. The first TIA stage consumes more power with a higher-\( f_T \) device while achieving higher sensitivity. The number of limiting amplifier stages is calculated as

\[ N_{RX} = 1 + \frac{\log \left( \frac{BW \cdot C_T V_{DD}}{P_{SEN} \mathcal{R}} \right)}{\log(f_T) - \log(BW)} \] (3.28)

Three conclusions are reached by studying (3.27) and (3.28). First, scaled devices that offer higher \( f_T \) also seem to increase the power consumption. However, this increased power consumption is also associated with improved receiver sensitivity. Second, the higher transimpedance suggests that the number of subsequent amplifier stages might be reduced. Third, a high bandwidth (BW) requires more LA stages and consumes more power because of the low gain of each stage.

### 3.3.6 CMOS Device Technology: 14-nm FinFET and 28-nm CMOS SOI

The device parameters are revisited for two advanced technologies, notably a 14-nm FinFET and 28-nm FD-SOI, to compare the potential power consumption.

**Figure 3.8:** $f_T$ simulation of (a) 28-nm FD-SOI and (b) 14-nm FinFET devices.

The $f_T$ is simulated across drain current in Fig. 3.8. A comparison of the unity current gain cutoff frequency, $f_T$, of a 28-nm fully depleted SOI device and a 14-nm CMOS FinFET device indicates the SOI n-MOST reaches a peak $f_T$ of 400 GHz while the p-MOST reaches a peak $f_T$ of 220 GHz. Alternatively, the 14-nm n-FinFET reaches a peak $f_T$ of 380 GHz but the p-FinFET reaches a peak $f_T$ of 280 GHz. While n-MOST speed is higher than the scaled n-FinFET, the p-FinFET speed does improve and raises the question of how the analog performance of an optoelectronic front-end compares in these technologies [20].

The intrinsic voltage gain is plotted across current consumption in Fig. 3.9. The intrinsic gain for the FD-SOI devices is between 17 and 20.5, which suggests that the push-pull amplifier will have a gain that is bounded by the lower of the intrinsic gains. The intrinsic gain for the FinFET is between 28 and 32 and, notably, the PMOS rather than the NMOS device has the higher intrinsic gain. Based on the minimum intrinsic gain of either device, the FinFET devices offer an advantage of $6 \sim 8$ dB in intrinsic gain. Moreover, it is important to note that the intrinsic gain does not change significantly over a range of currents but begins to reduce as the device is pushed into triode.

**Figure 3.9:** Intrinsic gain $A_V$ of 28 nm FD-SOI and 14 nm FinFET devices.

Finally, the noise of the FD-SOI and FinFET devices is compared in Fig. 3.10. The spot noise at 1 GHz is simulated for comparison. With similar $G_m$ for PMOS and NMOS, the FinFET devices offer lower noise than the FD-SOI devices. Therefore, the high intrinsic gain of the FinFET device achieves lower input-referred noise.

### 3.3.7 Proposed Transceiver in 14-nm FinFET and 28-nm CMOS SOI

The proposed single-ended receiver is shown in Fig. 3.11 and consists of a single-ended TIA, a TCA and self-biased feedback circuits. A self-biased feedback circuit alleviates the effects of process, voltage and temperature (PVT) variation to reach an optimum bias voltage. The self-biased feedback circuits consist of an RC low-pass filter stage and a TCA stage. The unity gain bandwidth of the RC filter stage decides the receiver cutoff frequency.

![Proposed single-ended receiver with self-biased feedback.](image)

**Figure 3.11:** Proposed single-ended receiver with self-biased feedback.

Fig. 3.12 plots the simulated receiver energy efficiency and sensitivity versus data rate. At the top of the figure, $N_{RX}$ is the number of limiting amplifier stages required to achieve a fixed transimpedance; a higher data rate requires more stages to achieve enough gain. Additionally, the sensitivity degrades at high data rates. Since the transistor intrinsic gain $A_V$ of FinFET devices is about twice as large as that of a comparable planar FD-SOI device [21], larger feedback resistors can be used to achieve the same bandwidth. Consequently, the lower input-referred noise enables better receiver sensitivity. For the same BER, the link budget of the FinFET design will potentially require less laser power and achieve better receiver energy efficiency. The receiver is implemented with an 80 fF photodetector. For 25 Gb/s operation, a receiver with 17.5 GHz bandwidth is desired. Since the receiver sensitivity is mainly determined by the TIA stage, a $G_m$ of $>15$ mS is required for $-16$ dBm sensitivity. According to (3.14), a $300\,\Omega$ shunt feedback resistor ($R_{F,1}$) is implemented for 19 GHz bandwidth and $46$ dB$\Omega$ gain. With a $50\,\mu$A input swing, another $40$ dB of gain with a bandwidth of 27 GHz is provided by four limiting amplifier (LA) stages. The receiver then has $86$ dB$\Omega$ gain and 18 GHz bandwidth. According to (3.6), the receiver rms current noise should be $<4\,\mu$A for a bit error rate (BER) of $10^{-12}$. In simulation, the rms noise is $1.71\,\mu$A.

![Figure 3.12: Receiver energy efficiency and sensitivity as a function of bandwidth.](image-url)

The transmitter is implemented assuming an 80 fF capacitive load from the modulator. Based on slew rate, a minimum drain current of 10 mA is required for 25 Gb/s operation. A minimum number of stages is determined for a \( C_{IN} \) of 10 fF. For \( F \) of 2.9, \( N_{TX} \) of 2 is achieved, and the total power consumption of the transmitter is then 6.325 mW.

![Energy efficiency of the transceiver.](image)

**Figure 3.13:** Energy efficiency of the transceiver.

Fig. 3.13 plots the energy efficiency of the transmitter and receiver circuits. The analysis of the energy efficiency matches circuit level simulations at high data rate but more power is consumed at low frequency since the first stage is fixed due to the ratio of \( \frac{C_{mode}}{C_{in}} \).

Additionally, the receiver simulations show a higher energy per bit than the analysis since the simulations take the large-signal behaviour into account. The large-signal gain of the limiting amplifier is lower than the small-signal analysis predicts. Therefore, additional power is consumed to reach the desired receiver gain.

Table 3.2 compares the 25 Gb/s 28 nm FD-SOI simulations with those for 14 nm FinFET. For the 28 nm FD-SOI design, with the same fanout factor \( F \), \( N_{TX} \) equals 2 and the total power consumption of the transmitter is 8.75 mW. The receiver is implemented with the same PD capacitance, and a $G_m$ of > 30 mS is required for -13 dBm sensitivity. Since the FinFET intrinsic gain is usually 6 ~ 8 dB higher than that of the 28 nm CMOS FD-SOI device, $R_{F,i}$ in FD-SOI needs to be 2-3 times smaller for the same bandwidth. In simulation, a 148 Ω resistor ($R_{F,1}$) is implemented for 21 GHz bandwidth and 41 dBΩ gain. With a 100 μA input swing, another 40 dB of gain is achieved by four limiting amplifier (LA) stages. The tuning power is as described above. In summary, for a BER of $10^{-12}$, a receiver with -16 dBm sensitivity is simulated in 14 nm FinFET while a receiver with -13 dBm sensitivity is simulated in 28 nm CMOS FD-SOI, neglecting crosstalk and supply noise impairments. With 14 nm FinFET, the total energy efficiency is then 0.763 pJ/b at 25 Gb/s excluding the laser power. With 28 nm CMOS FD-SOI, the total energy efficiency is 1.19 pJ/b at 25 Gb/s.

**Table 3.2**: Transceiver Performance Comparison at 25 Gb/s

<table>
<thead>
<tr>
<th>Technology</th>
<th>14 nm FinFET</th>
<th>28 nm FD-SOI</th>
</tr>
</thead>
<tbody>
<tr>
<td>$N_{TX}/P_{TX}$</td>
<td>2/6.325 mW</td>
<td>2/8.75 mW</td>
</tr>
<tr>
<td>TIA Transimpedance</td>
<td>46 dBΩ</td>
<td>41 dBΩ</td>
</tr>
<tr>
<td>$N_{RX}/P_{RX}$</td>
<td>4/12.75 mW</td>
<td>4/21 mW</td>
</tr>
<tr>
<td>$\sqrt{i_{n,TIA}^2}$</td>
<td>$\sim 1.71 \mu A$</td>
<td>$\sim 5.69 \mu A$</td>
</tr>
<tr>
<td>Sensitivity</td>
<td>$-16 \text{dBm}$</td>
<td>$-13 \text{dBm}$</td>
</tr>
<tr>
<td>Tuning Power</td>
<td>3.125 mW</td>
<td>3.125 mW</td>
</tr>
</tbody>
</table>
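The energy-per-bit figures quoted for the two technologies follow directly from the powers in Table 3.2, since 1 mW at 1 Gb/s corresponds to 1 pJ/b. The short sketch below reproduces the arithmetic; the dictionary layout and function name are illustrative. Note that the quoted 0.763 pJ/b and 1.19 pJ/b match the transmitter plus receiver power alone, so the tuning power is shown as a separate case.

```python
# Powers in mW taken from Table 3.2; the data structure itself is illustrative.
TABLE_3_2 = {
    "14 nm FinFET": {"tx": 6.325, "rx": 12.75, "tuning": 3.125},
    "28 nm FD-SOI": {"tx": 8.75, "rx": 21.0, "tuning": 3.125},
}


def energy_per_bit_pj(powers_mw, rate_gbps, include_tuning=False):
    """Energy per bit in pJ/b: total power (mW) divided by data rate (Gb/s)."""
    total = powers_mw["tx"] + powers_mw["rx"]
    if include_tuning:
        total += powers_mw["tuning"]
    return total / rate_gbps


if __name__ == "__main__":
    for name, powers in TABLE_3_2.items():
        base = energy_per_bit_pj(powers, 25.0)
        tuned = energy_per_bit_pj(powers, 25.0, include_tuning=True)
        print(f"{name}: {base:.3f} pJ/b (TX+RX), {tuned:.3f} pJ/b with tuning")
```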

### 3.3.8 Contribution of Photonic Elements

Table 3.2 shows that 28 nm FD-SOI designs achieve −13 dBm sensitivity. Hence, 1 dB higher laser power would be required to complete the link. Therefore, the 14 nm transceiver would achieve better efficiency when taking the photonic components into account. Fig. 3.14 depicts the power penalty and energy of photonic elements. The routing and MUX/DEMUX losses are fixed versus data rate, which increases the $p_0$ term of $P(B)$. The power penalty of a reverse-biased ring modulator is approximately proportional to data rate, which increases the $p_1$ term of $P(B)$. The laser power calculated to complete the link is related to the $p_0$, $p_1$ and $p_{1.5}$ terms of $P(B)$. Thus, laser power requirements for different data rates are determined by the corresponding transceiver simulations and the assumption of 10% WPE. The tuning power is estimated to be bounded by $0.8$ mW/nm with $\eta_T$ of 50% [54]. Other related work has achieved a tuning power of $1.93$ mW/nm [64]. For a $>1$ Tb/s WDM system, 40 channels are desired between 1530 nm and 1580 nm. We assume a tuning efficiency of $2.5$ mW/nm. Therefore, a 1.25 nm tuning range for each channel consumes 3.125 mW of tuning power. While many other link configurations are possible, this value is consistent with low-power projections for tuning and control of ring-resonator-based optical links [57].
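The per-channel tuning power quoted above follows from the channel spacing and the assumed tuning efficiency. A minimal sketch of the arithmetic, with illustrative function names:

```python
def channel_spacing_nm(band_start_nm, band_stop_nm, n_channels):
    """Wavelength range available per channel when the band is divided evenly."""
    return (band_stop_nm - band_start_nm) / n_channels


def tuning_power_mw(tuning_range_nm, efficiency_mw_per_nm):
    """Tuning power for one ring, assuming power scales linearly with range."""
    return tuning_range_nm * efficiency_mw_per_nm


if __name__ == "__main__":
    spacing = channel_spacing_nm(1530.0, 1580.0, 40)
    per_channel = tuning_power_mw(spacing, 2.5)
    print(f"{spacing} nm per channel, {per_channel} mW per channel")
    print(f"{40 * per_channel} mW for all 40 channels")
```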

### 3.3.9 WDM Optical Interconnect Energy-per-Bit

Careful photonic-electronic co-design is desired to achieve the optimum WDM parameters since optical sources of loss impact the dc and dynamic power consumption of the system. In this work, we discuss the trade-offs in system co-design to achieve optimum link energy efficiency. As shown in Fig. 3.15, a system-level design flow is proposed to synthesize the parameters of the transmitter and receiver circuits. As illustrated in Table 3.1 and equation (3.1), the link energy depends on the transceiver system, tuning circuit, laser and data rate. The laser source efficiency is typically low, especially for a WDM laser source. For a given on-chip laser power required for a particular link, its efficiency is further limited by the coupling interface. We assume that a minimum of 5 dBm of on-chip waveguide-coupled laser power is needed with a 10% wall-plug efficiency. The mux, demux and routing losses are fixed once the link architecture is chosen, and we assume a 10 dB loss in total. Hence, the system trade-off mainly comes from the ring modulator, photodetector and transceiver circuits. The target is to minimize the laser output power while retaining margin for the desired BER.
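The laser assumptions above can be converted into electrical power and a simple optical budget. The sketch below is only a back-of-the-envelope check: the function names are illustrative, and the remaining margin it prints lumps together the modulator penalty and any other impairments, which are not modelled here.

```python
def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def laser_wall_power_mw(p_optical_dbm, wall_plug_efficiency):
    """Electrical power drawn by the laser for a given on-chip optical power."""
    return dbm_to_mw(p_optical_dbm) / wall_plug_efficiency


def remaining_margin_db(p_laser_dbm, fixed_loss_db, sensitivity_dbm):
    """Optical margin left after fixed losses, relative to receiver sensitivity."""
    return (p_laser_dbm - fixed_loss_db) - sensitivity_dbm


if __name__ == "__main__":
    wall = laser_wall_power_mw(5.0, 0.10)
    print(f"Laser electrical power: {wall:.1f} mW")
    for sens in (-16.0, -13.0):
        margin = remaining_margin_db(5.0, 10.0, sens)
        print(f"Sensitivity {sens} dBm: {margin:.0f} dB left for modulator penalty and margin")
```

With 5 dBm on chip and 10% wall-plug efficiency the laser draws about 31.6 mW, which is consistent with the 32 mW laser power consumption tabulated earlier in the chapter.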

To optimize the energy of the link, we start with the design of the photodetector. Given the photodiode capacitance and a low dark current of less than 1 µA, a high responsivity (> 0.9 A/W) leaves margin for the ring modulator. With the photodiode capacitance, the TIA is designed to achieve the best sensitivity at a BER of $10^{-12}$ with the desired bandwidth, which determines the power consumption. Then, the LA gain is calculated and the number of LA stages is selected. With the given receiver sensitivity and channel loss, the required laser power is determined by the total power penalty of the modulator and the optical losses. By iterating through the algorithm shown in Fig. 3.15, the link is optimized with respect to energy in a specific technology.

**Figure 3.15:** Proposed WDM link co-design flow.

**Figure 3.16**: Energy efficiency of the WDM link with 10 dB loss.

Fig. 3.16 describes the energy for both technologies considering the photonic parameters previously discussed. The curve labelled without laser includes proposed transceiver power consumption and tuning circuit power consumption. A second set of curves accounts for the laser power consumption and the transceiver power consumption. We conclude that FinFET technology has an advantage of lower energy from the circuit perspective and this advantage is substantial when considering the laser power consumption. The minimum energy of the 28 nm link is around 3.3 pJ/bit while the 14 nm link drops to around 2 pJ/bit. We find the optimum data rate to be around 20 Gb/s. Furthermore, the two technologies do not produce substantially different optimum data rates.

Finally, we would like to assess the impact of optical link loss on the energy per bit as predicted in (3.3) and (3.4). Eq. (3.3) indicates that the optimum data rate and energy per bit should shift with different loss. Fig. 3.17 compares the energy efficiency of a lossless link with that of links with 10 dB and 20 dB of routing and MUX/DEMUX loss. Modulator loss is included in all the simulations. As seen in the figure, 0 dB loss achieves a 30 Gb/s optimum data rate since the laser power may be smaller than the power of the transceiver circuits. Furthermore, the low contribution of frequency-independent losses ensures that the energy efficiency changes gradually as the data rate increases. Increasing the loss to 10 dB moves the optimum to between 20 and 25 Gb/s. The receiver sensitivity degrades at high data rates, and the link consumes more power because the laser's low efficiency dominates the power consumption. Additionally, the minimum energy has increased. Finally, 20 dB loss requires higher energy, which is minimized at a 10 Gb/s data rate. The optimum occurs when the laser power consumption is similar to the transceiver power consumption.

![Figure 3.17: Energy efficiency of the WDM link with different loss.](image)
3.4 Conclusions

This chapter describes the co-design of electronic and photonic components for a silicon photonic WDM interconnect for optimum link energy efficiency [65]. The serial interconnect link design trade-offs are discussed in terms of link budget, power consumption and optimum data rate operation. The proposed transceivers are simulated to show the scaling trends in FinFET and CMOS SOI processes and how these different device technologies will impact the energy of the link as well as the bit rate. CMOS device scaling from planar CMOS technologies to FinFET-based CMOS will continue to allow energy scaling in optical interconnects, particularly for the optical transmitter and receiver. When laser efficiency was taken into account (assumed to be 10% in this paper), a bit-rate-optimized energy efficiency of approximately 2 pJ/bit at the optimum bit rate of 25 Gb/s was found. This can be scaled to even lower energies and higher bit rates with improvements in optical link loss and waveguide-coupled laser wall-plug efficiency.

Acknowledgements

This chapter is mostly a reprint of the material as it appears in Jun Li; Buckwalter, J. F., "Energy Efficiency of Optoelectronic Interfaces in Scaled FinFET and SOI CMOS Technologies," Optical Interconnects Conference, IEEE, 2015 and Jun Li; Xuezhe Zheng; Ashok V. Krishnamoorthy; Buckwalter, J. F., "Scaling Trends for Picojoule per Bit WDM Photonic Interconnects in CMOS SOI and FinFET Processes," submitted to Lightwave Technology, Journal of, IEEE. This research was developed with partial funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings contained in this article/presentation are those of the author(s)/presenter(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. This dissertation author was the primary author of this material.
This chapter describes the proposed technique, circuit concepts and prototype demonstration of a high speed analog radar signal processor with offset calibration. The reconfigurable circuit presented in this work is demonstrated with 90 nm CMOS and it achieves 15 cm range resolution with power consumption of 42 mW. Section 4.1 illustrates the pulse compression radar system. Radar sensitivity and range resolution trade-offs are discussed in section 4.2. The design of the prototype analog radar signal processor is presented in section 4.3.

4.1 Pulse Compression Radar System

Advances in silicon technology have made possible new sensor applications at millimeter-wave bands that require low power and low cost. As a result, silicon integrated beamforming architectures and phased arrays have been demonstrated for millimeter-wave automotive radar systems [4], high definition content streaming [24], and satellite systems [25]. This work focuses on millimeter-wave radar circuitry such as short-range radars for parking assistance or side-crash prevention, which need wide bandwidth for high range resolution. Other applications include range detection to maintain a safe driving distance between vehicles in heavy traffic [26]. An intelligent adaptive cruise control (ACC) system is also possible with long-range radars to perform a real-time response by means of the braking system or other protective mechanism. With proper control algorithms, anti-collision systems would greatly reduce traffic casualties.

In general, two main radar systems have been proposed [28]: frequency modulated continuous wave (FMCW) radar [4,27,66] and pulse compression radar (PCR) [29,67–70]. FMCW radar measures the range by using linear frequency modulation which results in a low cost architecture, but requires two isolated antennas for high receiver sensitivity. Pulse compression radar (PCR) uses digital signal modulation and time-division duplexing of the RF between transmit and receive.

![Proposed mmWave and analog processing based PCR system.](image)

**Figure 4.1**: Proposed mmWave and analog processing based PCR system.

In this paper, a high dynamic range baseband signal processor is presented for PCR. As shown in Fig. 4.1, the direction and range of a target are determined through a combination of the spatial selectivity of a millimeter-wave beamformer and analog signal processing. In earlier work, a low-power analog signal correlator and delay-locked loop (DLL) was demonstrated for broadband signals [11]. However, the analog correlator alone cannot offer the high dynamic range required for a radar receiver. Instead, it must be combined with a variable gain amplifier (VGA), which can dynamically adjust the gain of the receiver in anticipation of the round-trip loss of the signal. In this chapter, the requirements on a high dynamic range PCR implemented in a CMOS process are discussed; in particular, the dc offsets from the VGA require calibration techniques. The PCR receiver introduces a digital-to-analog converter (DAC) to compensate the offsets with low power overhead. Compared to conventional analog calibration techniques, the proposed calibration technique does not need additional monitoring and pulse processing circuits. Furthermore, it is synthesizable and scalable in CMOS. This chip is fabricated in 90-nm CMOS and consumes 42 mW at a peak data rate of 1 Gb/s for a 15-cm range resolution.

4.2 Trade-offs in the PCR System

The range resolution, \( \Delta R \), of a conventional pulse compression radar (PCR) system is determined by the bandwidth \( B \) of the transmitted pulse, the types and sizes of targets, and transceiver specifications. The minimum detectable signal \( (P_{R,MIN}) \) of an RF receiver is

\[
P_{R,MIN} = -174\,\mathrm{dBm} + NF + 10\log_{10}B + SNR_{min} \quad (4.1)
\]

where NF is the total receiver noise figure, and \( SNR_{min} \) is the minimum signal-to-noise ratio (SNR) required to reliably receive a specific modulation format. From this expression, the maximum range of a radar system is determined:

\[ R_{\text{MAX}} = \sqrt[4]{\frac{P_T}{P_{\text{R,MIN}}} \frac{\lambda^2 G_T G_R \sigma}{(4\pi)^3}} \] (4.2)

where \( P_T \) is the transmitted power, \( \lambda \) is the carrier wavelength, \( G_T \) and \( G_R \) are the respective transmit and receive antenna gains, and \( \sigma \) is the radar cross section (RCS). At 77 GHz, the reflected wave of a PCR system is attenuated by approximately 105 dB at a distance of 10 meters when \( \sigma \) equals 100 cm\(^2\) and the antenna gain is 18 dBi. With 12 dBm of transmit power and 0 dB SNR, the minimum receiver sensitivity should be -93 dBm. Therefore, the range of a PCR system integrated in a CMOS technology is fundamentally limited by the peak transmit power and the receiver sensitivity. The peak transmitted power is limited by the CMOS process, while receiver sensitivity is improved by increasing the dynamic range of the VGA. However, there is an opportunity to control the maximum range through a trade-off with the resolution of a PCR system. The range resolution is determined by the bandwidth (B) and, consequently, the minimum symbol period \( T_s = 1/B \) that can be transmitted and received.

\[ \Delta R = \frac{c T_s}{2} = \frac{c}{2B} \] (4.3)

where \( c \) is the speed of light. Considering point target detection, Fig. 4.2 indicates the trade-off in the range resolution and the range of a radar system as a function of the signal bandwidth. Increased bandwidth improves the range resolution to centimeter scales. However, the additional bandwidth comes at a penalty of higher noise, reducing the SNR, and range of detection also illustrated in Fig. 4.2.
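The numbers quoted above can be reproduced with the standard monostatic radar range equation, $P_R/P_T = G_T G_R \lambda^2 \sigma / \left((4\pi)^3 R^4\right)$, together with (4.3). The following sketch is illustrative; the function names are chosen here and are not part of the original design.

```python
import math

C0 = 299_792_458.0  # speed of light in m/s


def round_trip_gain_db(freq_hz, range_m, rcs_m2, gt_dbi, gr_dbi):
    """Ratio of received to transmitted power (dB) from the radar range equation."""
    lam = C0 / freq_hz
    gt = 10.0 ** (gt_dbi / 10.0)
    gr = 10.0 ** (gr_dbi / 10.0)
    ratio = gt * gr * lam ** 2 * rcs_m2 / ((4.0 * math.pi) ** 3 * range_m ** 4)
    return 10.0 * math.log10(ratio)


def max_range_m(pt_dbm, pr_min_dbm, freq_hz, rcs_m2, gt_dbi, gr_dbi):
    """Range at which the received power falls to the minimum detectable level."""
    gain_at_1m_db = round_trip_gain_db(freq_hz, 1.0, rcs_m2, gt_dbi, gr_dbi)
    allowed_loss_db = pt_dbm - pr_min_dbm
    return 10.0 ** ((allowed_loss_db + gain_at_1m_db) / 40.0)


def range_resolution_m(bandwidth_hz):
    """Range resolution c / (2B)."""
    return C0 / (2.0 * bandwidth_hz)


if __name__ == "__main__":
    g = round_trip_gain_db(77e9, 10.0, 100e-4, 18.0, 18.0)
    print(f"Round-trip gain at 10 m: {g:.1f} dB")
    print(f"Max range for 12 dBm / -93 dBm: {max_range_m(12.0, -93.0, 77e9, 100e-4, 18.0, 18.0):.2f} m")
    print(f"Resolution at 1 GHz: {range_resolution_m(1e9) * 100:.1f} cm")
```

The $R^4$ dependence means that every 12 dB of extra sensitivity only doubles the range, which is why the bandwidth trade-off matters.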

Therefore, we reach two conclusions. First, a trade-off between range and range resolution is evident. To improve the range resolution, the signal bandwidth should be increased. To improve the range, the signal bandwidth should be decreased. Second, the ratio of the range to the range resolution improves rapidly as more bandwidth can be exploited for the PCR system. Furthermore, a highly sensitive receiver must cope with the dynamic range of the received signal. Therefore, including a VGA is necessary to compensate for signal path loss. A specialized PCR baseband processor can introduce two reconfiguration features. First, the bandwidth is adapted to improve the range at the expense of the resolution [11]. Second, the gain is adjusted to the signal power variation at the signal correlator of the receiver. We explore both of these techniques in the following sections.

Figure 4.2: Maximum range $R_{MAX}$ and range resolution $\Delta R$ as a function of bandwidth under a 12 dBm peak power and 0 dB SNR constraint.

4.3 High Speed Analog Correlation Technique

Since a CMOS-based system has limited peak transmit power, the proposed range is relatively short compared to other radar systems. This implies that the round-trip propagation time of the signal is short and does not allow the use of long radar codes. For example, a target at 1 meter allows for only about 6.7 ns of round-trip travel. Many different codes have been developed for pulse compression radar. Barker codes are particularly well-suited for PCR because the auto-correlation of the code features optimally flat sidelobes.

A 7-b Barker code is illustrated in Fig. 4.3. For a Barker code of length $N$, a peak auto-correlation value of $N$ occurs for zero lag, and the sidelobe magnitude does not exceed 1 for all other signal lags. The correlation between the template and the received signal can be implemented in the digital domain or the analog domain at different circuit costs. Fig. 4.4(a) and Fig. 4.4(b) illustrate the digital and analog correlation circuitry used to detect the received Barker code. After the round-trip loss is compensated by a VGA, a digital correlator first samples the received signal at the symbol rate and then performs the auto-correlation in the digital domain. Since the ADC must operate at the symbol rate, this approach consumes substantial power. Additionally, a high-speed digital multiplier is required in the digital signal processing (DSP).
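The auto-correlation property can be illustrated numerically. The following minimal sketch computes the aperiodic auto-correlation of the length-7 Barker code; the function name is illustrative and not taken from the text.

```python
BARKER_7 = [1, 1, 1, -1, -1, 1, -1]  # length-7 Barker code


def aperiodic_autocorrelation(code):
    """Return the auto-correlation for lags 0 .. len(code) - 1."""
    n = len(code)
    return [
        sum(code[i] * code[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


acf = aperiodic_autocorrelation(BARKER_7)
print("auto-correlation by lag:", acf)
print("peak:", acf[0], "largest sidelobe magnitude:", max(abs(v) for v in acf[1:]))
```

The peak equals the code length at zero lag, and every other lag gives a magnitude of 0 or 1, which is what the correlator relies on to locate the echo.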

The analog correlation approach substantially lowers the power required to implement the correlation. Since the correlation occurs in the analog domain, the sampling rate of the ADC is determined by the code rate, which is lower than the symbol rate by a factor of $N$. However, analog correlation relies on accurate alignment of the template signal with the received signal and on the elimination of dc offsets that might impact the integrator. In this work, a delay-locked loop (DLL) is proposed to provide an 8-phase clock that retimes the template for analog correlation. Digitally assisted calibration techniques are proposed to cancel the dc offset and reduce the timing misalignment. Since the power consumption of the DLL is relatively low, the analog correlation is preferable. By varying the symbol rate from 50 Mb/s to 1 Gb/s, the range resolution can be configured between 3 m and 15 cm.

**Figure 4.5:** Dynamic range requirements of each block in the receiver are detailed to illustrate how the baseband signal processing circuit can reduce the required dynamic range.

Before examining design details, the specifications of the PCR system must be evaluated. Fig. 4.5 outlines an intuitive example of the maximum and minimum detectable signals of a PCR system. With proper baseband signal processing, the VGA gain and code type are adapted to compensate the received signal and achieve enough SNR for successful detection. However, when a dc offset exists in the signal path and is amplified with the highest gain for a low-power input signal, the correlation could be wrong due to the saturation of the receiver chain.

### 4.3.1 Variable Gain Amplifier (VGA)

A variable gain amplifier (VGA) maximizes the dynamic range of the PCR receiver. VGAs can be built with discrete gain steps set by a digital control signal [71–73] or with continuous gain set by an analog control signal [74, 75]. In general, digitally controlled VGAs use binary-weighted arrays of resistors or capacitors for gain variation, and analog VGAs adopt a variable transconductance or resistance to control the gain.


**Figure 4.6:** Schematic of variable gain amplifier.

For a PCR system, a wideband VGA with a wide dynamic range is needed to compensate the large variation in received power. Therefore, a current-splitting approach, shown in Fig. 4.6, is proposed in this paper. With current splitting, both the load and the bias current of the input transistor are constant, which gives a low VGA group delay imbalance and a high gain control range. By cascading two stages, each with 6-bit digital control, a 52 dB dynamic range and an 800 MHz bandwidth are achieved.

### 4.3.2 Wideband Analog Correlator

Analog correlators have been previously investigated for high-speed UWB transmitters [76]. Fig. 4.7 shows the analog correlator and integrator. Since the Friis loss has been compensated by the VGA, the input signal swing to the correlator exhibits less dynamic range.


**Figure 4.7:** Schematic of analog correlator.

When the template and received signal have the same polarity, the output current charges the load capacitor $C_{int}$. When the template and received signal have opposite polarity, the load capacitor is discharged. Consequently, if the Barker code template aligns correctly with the received signal, the voltage on the capacitor is charged to its maximum value and represents the desired auto-correlation of the signal. Unlike previous work [29], the correlator is controlled digitally to reduce power consumption and system complexity. Note that one potential problem with the analog integrator occurs when the system is operated over a wide range of symbol rates. At low rates, the voltage that results from the integration becomes large because of the long symbol period. To compensate, a 2-b capacitor bank is designed for different template speeds. By changing the slew rate properly, the integrated voltage on the capacitor stays approximately the same across symbol rates.

### 4.3.3 Wide Range Delay Lock Loop (DLL)

Since misalignment between the received signal and the template signal degrades the analog correlation performance, re-timing circuitry fine-tunes the phase of the template signal. This alignment covers the entire symbol rate range from 50 Mb/s to 1 Gb/s. There are many ways to achieve this feature. For example, a high-frequency wideband LC-VCO based PLL can be designed, with the wide-range clock taken from the VCO or divider outputs [77–79]. A ring-VCO based PLL with injection locking might also help wide-range operation [80–82]. However, the system needs a low-power and low-jitter solution that achieves > 100% tuning range. A multi-range, multiplying-type wideband DLL is proposed in this paper.


**Figure 4.8:** (a) Multi-range delay lock loop (b) Proposed Wideband DLL.

As shown in Fig. 4.8(b), the proposed voltage controlled delay line (VCDL) is composed of 32 current-starved delay cells. For a low-jitter design, a 4-bit thermometer-controlled current source breaks the delay tuning into 4 discrete curves [83,84], which reduces the gain $K_{DL}$. The minimum delay is controlled with a fine-tuning current source, while the other 3 timing steps are digitally controlled to produce overlapping tuning curves as shown in Fig. 4.8(a). For wide-range operation, an edge combiner is used to further increase the working range [85,86]. Therefore, 12 discrete bands achieve continuous tuning from 50 MHz to 1 GHz. Since a multiphase output clock is needed to shift the template signal, phase selection logic based on a 2-to-1 MUX array provides the proper combination of clock signals to the edge combiner.

**Figure 4.9:** Delay cell of proposed VCDL.

Fig. 4.9 illustrates the details of the delay cell. The current-starved delay cell is designed with both analog and digital control. The phase detector (PD) and charge pump (CP) are the same as in previous work [87]. In order to achieve better jitter performance, the CP is designed to work from 0.3 V to 0.9 V. To generate 8 phases, different combinations of \( MC\langle \ast \rangle \) are chosen by the digital control bits. These signals re-time the template until it aligns with the received signal. Consequently, the DLL can align the template signal with the received signal to within 1/8 of the symbol period \( T_s \).

### 4.3.4 High Speed Analog-to-Digital Converter (ADC)

To sample the auto-correlation of a Barker code \((N \leq 7)\), the ADC resolution should be at least 3 bits. A high-speed 4-bit flash ADC, illustrated in Fig. 4.10(a), is designed using averaging and interpolation techniques [88, 89]. Interpolation reduces the number of pre-amplifiers from 15 to 6, and the output of the integrator in the correlator implements a zero-order hold filter for the input of the ADC. To reduce glitches, a Gray-code encoder is inserted between the thermometer and binary codes. In this work, the ADC can work up to 2 GS/s.
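The encoding path from comparator outputs to a binary word can be sketched in software. The example below is a behavioural illustration only: it assumes 15 comparator outputs for a 4-bit flash ADC and uses a simple ones count in place of the transition-detect logic of a real encoder. All function names are hypothetical.

```python
N_COMPARATORS = 15  # a 4-bit flash ADC has 2**4 - 1 comparator outputs


def thermometer_to_gray(thermo):
    """Map comparator outputs to a Gray code via a ones count (assumed stand-in)."""
    level = sum(thermo)
    return level ^ (level >> 1)


def gray_to_binary(gray):
    """Convert a Gray code word to plain binary by cascaded XOR."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


def encode(thermo):
    """Full encoder path: thermometer code to Gray code to binary."""
    return gray_to_binary(thermometer_to_gray(thermo))


for level in (0, 7, 8, 15):
    thermo = [1] * level + [0] * (N_COMPARATORS - level)
    gray = thermometer_to_gray(thermo)
    print(f"level {level:2d}: gray = {gray:04b}, binary = {encode(thermo):04b}")
```

Because adjacent Gray codes differ in a single bit, a comparator that is still settling at the sampling instant can change the output by at most one code, which is the glitch reduction the text refers to.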

Figure 4.10: (a) 4-bit flash ADC (b) pre-amplifier with averaging technique.

Fig. 4.10(b) shows the proposed pre-amplifier with the interpolation technique [90]. Conventional designs use two differential amplifiers to detect the positive and negative signals simultaneously. This requires the differential pairs to be linear over the full signal range, which results in low transconductance and a high load resistance for high gain. In addition, the comparator becomes a narrowband design due to the high-resistance load. By swapping $V_{refp}$ and $V_{INN}$, the pre-amplifier detects the positive and negative signals separately. The required linear range is then reduced, which allows a higher transconductance and a lower load resistance for high-speed design. For interpolation, the effective resistance $R_X$ is given by

$$R_X = R_A \left(1 + \frac{\sqrt{1 + 4R_L/R_A}}{2}\right) \quad (4.4)$$

By splitting the resistor $R_A$ into three small series-connected resistors, two more interpolated nodes become available to quantize the output. Because the data outputs are high-speed, the quantized signal is transmitted to the FPGA through an on-chip LVDS interface. In this work, the common-mode voltage of the LVDS transmitter is 1.25 V with a 2.5 V power supply. A 100 ohm differential termination is placed on-chip for matching. A level shifter is designed to support the 2.5 V operation of the LVDS interface.
4.4 Calibration Techniques for PCR System

The dc offsets caused by the baseband circuits reduce the sensitivity of the PCR system and high-resolution ADC blocks. For narrowband applications, techniques such as chopping [14] or correlated double sampling (CDS) [15] have been proposed to solve this issue. For wideband applications, continuous time analog calibration is preferable [16]. However, in the PCR system, the signal is wideband for the fine range resolution and the dc offset should be eliminated to maintain SNR.

Figure 4.11: (a) Analog calibration (b) Proposed digital assisted calibration.

Fig. 4.11 illustrates continuous-time analog dc offset calibration and the proposed digitally assisted dc offset calibration. Analog calibration is suitable for wideband systems, but it needs peak detector and integrator circuits, which consume more power. The circuit continuously integrates the amplified signal to extract the dc offset information. In general, the integration time for accurate dc offset extraction varies from microseconds to milliseconds.

However, PCR systems work with pulse signals. Due to the limitation of peak power, the pulse is usually too short to provide enough time for analog integration. Therefore, we propose digitally assisted calibration techniques to reduce the dc offset extraction time. As shown in Fig. 4.11(b), the calibration engine performs a least-mean-square (LMS) search based on the ADC output to eliminate dc offset effects in the signal path. Furthermore, this engine is purely digital and scales with the process. The DAC position shown here is only an example. The trade-offs of inserting the DAC at different nodes are discussed in the following subsection.
4.4.1 Digital-assisted Offset Calibration


**Figure 4.12:** Definition of gain and offset in the baseband circuit. Several candidate offset cancellation points using a DAC are shown in red for comparison.

An offset calibration could be introduced at several points in the PCR signal processing circuit, as shown in Fig. 4.12(a-d). Digital calibration, shown in Fig. 4.12(d), requires that the correlated signal does not saturate the analog circuitry and still requires a high ADC resolution to detect the integrated dc offset independently from the signal. Alternatively, if the offset cancellation voltage is introduced before the ADC as illustrated in Fig. 4.12(c), the dc offset from the VGA may still saturate the analog integration circuitry. Fig. 4.12(a) has the advantage of directly cancelling the offset at the input to the VGA. However, the calibration must accommodate the dynamic range of the VGA, which requires high resolution and, consequently, high power in the DAC. Therefore, Fig. 4.12(b) is the architecture proposed in this work. After round-trip loss compensation, the VGA is intended to amplify the signal to a relatively low dynamic range, as shown in Fig. 4.4(c), of $120\,\text{mV}_{pk-pk}$ with an effective dc offset up to $60\,\text{mV}_{pk-pk}$. Since the analog correlator also has voltage gain, the additional offset voltage referred from the ADC block does not dominate the analog signal processing circuitry.

Fig. 4.13 presents simulation results of the proposed offset calibration. Fig. 4.13(a) shows the compensated signal with an amplified offset of 60 mV. Fig. 4.13(b) shows the calibrated signal with $< 10\,\text{mV}$ offset after the DAC. Fig. 4.13(c) shows the correlation results with and without calibration. In the presence of a dc offset, the integrated signal voltage changes, which degrades the SNR performance or results in an incorrect detection. For example, if the calibrated integration (315 mV) generates digit 7 in an ADC with 45 mV per LSB, the uncalibrated integration (210 mV) reduces the peak-signal-to-sidelobe level from 16.9 dB to 12 dB. Therefore, calibration is required to recover the 4.9 dB and reach the maximum detection SNR.
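The arithmetic of this example can be reproduced in a few lines. The sketch assumes a truncating quantizer and a sidelobe level of one ADC code; both assumptions, and the function names, are illustrative rather than taken from the design.

```python
import math

LSB_MV = 45.0  # ADC step size quoted in the example


def adc_code(v_mv):
    """Truncating quantizer (assumed): number of whole LSBs in the input."""
    return int(v_mv // LSB_MV)


def peak_to_sidelobe_db(peak_mv, sidelobe_code=1):
    """Peak-to-sidelobe level in dB, assuming a sidelobe of one ADC code."""
    return 20.0 * math.log10(adc_code(peak_mv) / sidelobe_code)


calibrated_db = peak_to_sidelobe_db(315.0)
uncalibrated_db = peak_to_sidelobe_db(210.0)
print(f"calibrated:   code {adc_code(315.0)}, {calibrated_db:.1f} dB")
print(f"uncalibrated: code {adc_code(210.0)}, {uncalibrated_db:.1f} dB")
print(f"penalty: {calibrated_db - uncalibrated_db:.1f} dB")
```

With these assumptions, 315 mV maps to code 7 and 210 mV maps to code 4, which gives the 16.9 dB and 12 dB levels quoted above.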

Figure 4.13: (a) Rx with offset (b) calibrated Rx (c) correlation with and without dc offsets.

The proposed digitally assisted calibration adaptively configures the DAC and the DLL re-timing circuit to cancel the dc offset and misalignment effects in a closed loop. First, since the template inputs of the correlator are digital signals, the offset of those ports can be treated as a timing error in the digital domain. Here, the templates are assumed to be perfectly symmetric, which limits the discussion of dc offset calibration to the analog inputs. The dc offset is then expressed as

$$V_{OFF} = V_{VGA}G_{VGA} + V_{COR} + \frac{V_{ADC}}{G_{COR}} \quad (4.5)$$

where $V_{OFF}$ denotes the dc offset introduced before the correlator, $V_{VGA}$ and $G_{VGA}$ are the effective dc offset and voltage gain of the VGA, $V_{COR}$ and $G_{COR}$ are the effective dc offset and voltage gain of the correlator, and $V_{ADC}$ is the effective offset of the ADC.

To determine the appropriate offset compensation for $V_{OFF}$, two steps need to be completed. If the template is set to generate digital values of +1 and −1, shown here as a differential comparison, and the difference is filtered as shown in Fig. 4.14(a), the offset voltage is integrated over the correlation period, in this case the length of the Barker code $N$. The voltage at the input node is expressed as

$$V_R = V_{SIG} + V_{OFF} \quad (4.6)$$

where $V_{SIG}$ is the compensated signal after VGA and $V_R$ is $V_{SIG}$ shifted by the undesired offset $V_{OFF}$. Therefore, the output of the integrator is

$$V_{OUT} = \frac{V_{OUT+} - V_{OUT-}}{2} = NV_{OFF} + \sum_{n=1}^{N} V_{SIG}(n) \quad (4.7)$$

where $V_{OUT+}$ and $V_{OUT-}$ are the differential outputs when the received signal is multiplied with voltage polarity +1 and −1. Since the second term in eq. (4.7) is the sum of the Barker code, the integrated signal should be ±1 LSB. Therefore, the offset voltage is calibrated when the DAC nulls the first term of eq. (4.7), i.e. $NV_{OFF} = 0$. Since the accuracy of $NV_{OFF}$ is limited by the ADC resolution, the calibrated offset voltage is below $1/N$ of 1 LSB. To remove the undesired offset $V_{OFF}$ before the integrator, a DAC voltage $V_{DAC}$ is added to the signal to produce offset cancellation.

$$V_{R-COMP} = V_R + V_{DAC} \quad (4.8)$$

where $V_{OFF} \approx -V_{DAC}$. Therefore, the correlation result is

\[ V_{DET} \approx \left| \sum_{n=1}^{N} V_{R-COMP}(n) \times V_{TEMP}(n) \right| \approx 1\,\text{LSB} \] (4.9)

where \( V_{TEMP} \) is the template signal, and the result should be less than 1 LSB after calibration.

Figure 4.14: (a) dc offset extraction (b) dc offset calibration.
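The offset extraction step in (4.6) to (4.8) can be illustrated with a small numerical sketch. The 60 mV symbol amplitude and 20 mV offset are hypothetical values, the DAC is assumed to cancel the offset exactly, and the helper names are not from the text.

```python
BARKER_7 = [1, 1, 1, -1, -1, 1, -1]
N = len(BARKER_7)

AMPLITUDE_MV = 60.0  # hypothetical symbol amplitude after the VGA
OFFSET_MV = 20.0     # hypothetical dc offset V_OFF


def integrate(received, template):
    """Ideal correlator: multiply sample by sample and accumulate."""
    return sum(r * t for r, t in zip(received, template))


# (4.6): received signal shifted by the undesired offset
received = [AMPLITUDE_MV * c + OFFSET_MV for c in BARKER_7]

# (4.7): integrate with constant +1 and -1 templates and take half the difference
v_out_plus = integrate(received, [+1] * N)
v_out_minus = integrate(received, [-1] * N)
v_out = (v_out_plus - v_out_minus) / 2.0

offset_term = N * OFFSET_MV                # first term of (4.7)
signal_term = AMPLITUDE_MV * sum(BARKER_7)  # second term: one symbol amplitude

# (4.8): add an ideal DAC voltage (assumed exact) and repeat the integration
v_dac = -OFFSET_MV
compensated = [r + v_dac for r in received]
v_after = integrate(compensated, [+1] * N)

print(f"V_OUT = {v_out} mV = {offset_term} mV (offset) + {signal_term} mV (code sum)")
print(f"after compensation: {v_after} mV")
```

Because the Barker code sums to one symbol, the integration with a constant template is dominated by the $N V_{OFF}$ term, which is why it can serve as the error signal for the DAC search.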

The side-lobe reduction (SLR) determines how well the PCR receiver detects the Barker code and is defined as

\[ SLR \approx 20\log\left(\left|Q\left[\sum_{n=1}^{N} V_{SIG}(n) \times V_{TEMP}(n)\right]\right|\right) \] (4.10)

where \( Q[\ast] \) is the quantization operator. Fig. 9 illustrates the SLR versus offset-cancelling DAC resolution. For Barker codes of length up to 7, a 4-bit ADC is capable of quantizing the peak correlation results and the SLR is calculated based on the digital output. Since the DAC resolution limits offset calibration, the DAC requires a minimum of 4 bits with a full-scale range of $\pm 60\,\text{mV}$ to achieve $< 5\,\text{mV}$ offset residue for a $120\,\text{mV}_{pk-pk}$ input signal.
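The 4-bit requirement follows from the DAC step size. The sketch below assumes uniformly spaced DAC levels centred within each step across the ±60 mV range and uses an exhaustive search as a simple stand-in for the LMS search; the level placement and function names are assumptions.

```python
def dac_levels_mv(bits, full_scale_mv=60.0):
    """Uniform levels spanning -full_scale .. +full_scale (mid-step placement assumed)."""
    step = 2.0 * full_scale_mv / 2**bits
    return [-full_scale_mv + step * (k + 0.5) for k in range(2**bits)]


def best_residue_mv(offset_mv, bits):
    """Smallest residual offset reachable by trying every DAC code."""
    return min(abs(offset_mv + level) for level in dac_levels_mv(bits))


def worst_case_residue_mv(bits, full_scale_mv=60.0, grid=1201):
    """Largest residue over a grid of offsets within the full-scale range."""
    offsets = [
        -full_scale_mv + 2.0 * full_scale_mv * i / (grid - 1) for i in range(grid)
    ]
    return max(best_residue_mv(offset, bits) for offset in offsets)


for bits in (3, 4, 5):
    print(f"{bits}-bit DAC: worst-case residue = {worst_case_residue_mv(bits):.2f} mV")
```

Under these assumptions a 3-bit DAC leaves up to 7.5 mV of residue, while a 4-bit DAC leaves at most 3.75 mV, which is consistent with the 5 mV target.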

4.4.2 Template Alignment and Duty Cycle Distortion

Timing misalignment between the received baseband signal and template also degrades the correlation performance. Therefore, a DLL-based re-timing circuit fine-tunes the phase of the template signal. The proposed DLL circuit has been discussed above but is implemented here in conjunction with the offset calibration techniques. Here, we focus on the calibration method of template misalignment and duty cycle distortion.

To analyze the effects of template misalignment and duty cycle distortion, the re-timed template $V'_{TEMP}$ is modelled as the sum of $V_{TEMP}$ and a voltage error $V_{PULSE}$, i.e. $V'_{TEMP} = V_{TEMP} + V_{PULSE}$. This $V_{PULSE}$ is the sum of deterministic clock pulse position errors (PPE) and pulse width errors (PWE) illustrated in Fig. 4.16. Since these timing errors modify the correlation results and degrade the SLR performance, the proper DLL clock phase should be selected so that the template at the correlator is as close as possible to the received signal. Given a DLL time resolution of $\pm T_s/M$, where $M$ is the number of phases, the correlation result for a phase $0 \leq m \leq M$ in the presence of the compensated signal $V_{R-COMP}$ from (4.8) is

$$V_{OUT} = \frac{1}{M} \sum_{k=1}^{N \cdot M} V_{R-COMP}(k) \times V'_{TEMP}(k - m) \quad (4.11)$$

where the final approximation assumes that the offset compensation is appropriately handled. The appropriate DLL clock phase $m$ is chosen to maximize $V_{OUT}$. In the absence of any PPE and PWE errors, the peak value of the correlator integration would be degraded by a phase offset between the received signal and the template according to

$$\frac{1}{M} \sum_{k=1}^{N \cdot M} V_{R-COMP}(k) \times V_{PULSE}(k - m) = N\left(1 - \frac{m}{M}\right) \quad (4.12)$$

This suggests that the appropriate phase must be chosen to correct for the pulse error at the possible expense of lower peak voltage. If the maximum value of $V_{OUT}$ is $N$, the appropriate phase is

$$m \geq \frac{1}{N} \sum_{k=1}^{N \cdot M} V_{R-COMP}(k) \times V_{PULSE}(k - m) \quad (4.13)$$

Fig. 4.17 illustrates the SLR as a function of different timing errors. After dc offset calibration, the analog signal at the correlator has a $120\,\text{mV}_{pk-pk}$ swing, but the template signal could have PPE and PWE with respect to the received signal. The re-timing clock phase $m$ removes the PPE so that the target SLR can be achieved. Since a DLL is used to generate the multiphase clock, the additive jitter is small and should allow the PWE to be neglected. Otherwise, a higher resolution DLL would be needed for calibration.
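The phase selection can be sketched as a search over the $M$ DLL phases for the largest correlation. The model below is idealized: it assumes rectangular symbols, $M = 8$ samples per symbol, no offset, no PWE, and a received signal that arrives 3/8 of a symbol late. The delay value and all function names are hypothetical.

```python
BARKER_7 = [1, 1, 1, -1, -1, 1, -1]
M_PHASES = 8  # DLL phases per symbol period


def oversample(code, m_phases):
    """Hold each symbol for m_phases samples and append one symbol of silence."""
    return [c for c in code for _ in range(m_phases)] + [0] * m_phases


def delayed(samples, delay):
    """Shift right by `delay` samples, padding the front with zeros."""
    return [0] * delay + samples[: len(samples) - delay]


def correlate(received, template, m_phases):
    """Correlation normalized by 1/M, in the form of (4.11)."""
    return sum(r * t for r, t in zip(received, template)) / m_phases


def best_phase(received, code, m_phases):
    """Try every DLL phase and keep the one with the largest correlation."""
    base = oversample(code, m_phases)
    scores = [correlate(received, delayed(base, m), m_phases) for m in range(m_phases)]
    best = max(range(m_phases), key=lambda m: scores[m])
    return best, scores


# Hypothetical scenario: the echo arrives 3/8 of a symbol after the template reference.
received = delayed(oversample(BARKER_7, M_PHASES), 3)
phase, scores = best_phase(received, BARKER_7, M_PHASES)
print("correlation per phase:", scores)
print("selected phase:", phase)
```

In this idealized model the peak drops linearly with the residual misalignment, following the $N(1 - m/M)$ form of (4.12), so one phase step away from the optimum gives $7 \times 7/8 = 6.125$.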

4.5 Experimental Results

The system measurement setup is shown in Fig. 4.18. The chip is reconfigured through a connection to an FPGA board. Three signal generators (864C and E4438C) provide a 500 MHz clock and a 2 GHz clock to the FPGA and DAC, and a 125/250 MHz clock to the on-chip DLL. An Agilent MSO8104A real-time oscilloscope and an E4448A spectrum analyzer are used to measure the DLL output performance. To evaluate the clock performance and check the phase shifts with different configurations, a DLL test buffer is designed. The SPI transmitter and Barker code signal generator are implemented on a Xilinx ML605 FPGA board. An Analog Devices AD9738a-FMC-EBZ DAC board is also used to generate the template signal. The DAC output swing can be programmed from 10 mV to 200 mV. Different amplitudes are used to emulate different received signal swings. The dc offset penalty is also measured for comparison across different Barker codes and bandwidths.

The circuit is fabricated in a 90-nm digital CMOS process. The chip microphotograph is shown in Fig. 4.19; the chip has a measured active area of 1.3 mm$^2$. The circuit operates from a 1.2 V supply and consumes 42 mW, including 10 mW for the VGA, 22 mW for the Flash ADC, 8.5 mW for the DLL, and 1.5 mW for the analog correlator.
4.5.1 Wide Range Delay Lock Loop

Wide-range operation (multiplication) and the multi-phase clock are evaluated in the time and frequency domains. First, Fig. 4.20(a) shows the time-domain waveforms of the frequency multiplication. With a 250 MHz clock as the input signal, the DLL generates 250 MHz, 500 MHz and 1 GHz outputs in ×1, ×2 and ×4 modes. The PWE clearly becomes worse when multiplication is enabled. However, because the retiming circuit is sensitive only to rising edges, the PPE is the more significant concern.

Fig. 4.20(b-d) illustrates the spur performance in different modes when the DLL input clock is 250 MHz. The spurs mainly come from the duty cycle distortion caused by mismatch of the delay cells in the delay line. In ×1 mode, a spur at 250 MHz frequency offset is measured. In ×2 mode, a 500 MHz clock is generated but the spur remains at 250 MHz offset. The same happens in ×4 mode for the 1 GHz clock. Therefore, for the same output clock frequency, a low multiplication factor or a high-frequency clock source is desired.

**Figure 4.20**: Clock multiplication in time and frequency domain.

**Figure 4.21:** Multi-band DLL tuning range.

Fig. 4.21 demonstrates the proposed multi-range concept. In \( \times 1 \) mode (band 1-4), the DLL output range is from 50 MHz to 380 MHz. In \( \times 2 \) mode, the edge combiner multiplies the frequency by 2 and covers 100 MHz to 760 MHz. Due to mismatch of the delay cells and edge combining paths, \( \times 4 \) mode can only work up to 1.2 GHz. With 500 MHz output in \( \times 2 \) mode, the power consumption of the DLL is 8.5 mW.
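The mode coverage can be tabulated with a short sketch. It assumes that each multiplication mode scales the ×1 range of 50 MHz to 380 MHz by its factor, with the ×4 mode capped at the measured 1.2 GHz. The 200 MHz lower edge of the ×4 mode and the rule of preferring the lowest multiplication factor are assumptions derived from the text, not reported measurements.

```python
BASE_RANGE_MHZ = (50.0, 380.0)  # x1 mode range reported in the text
MODE_LIMIT_MHZ = {1: 380.0, 2: 760.0, 4: 1200.0}  # x4 capped at the measured limit


def mode_range_mhz(mult):
    """Output range of a multiplication mode (scaled x1 range, assumed)."""
    low = BASE_RANGE_MHZ[0] * mult
    high = min(BASE_RANGE_MHZ[1] * mult, MODE_LIMIT_MHZ[mult])
    return low, high


def select_mode(target_mhz):
    """Pick the lowest multiplication factor that covers the target frequency."""
    for mult in (1, 2, 4):
        low, high = mode_range_mhz(mult)
        if low <= target_mhz <= high:
            return mult
    return None  # outside the tuning range


for target in (50.0, 250.0, 500.0, 1000.0, 1300.0):
    print(f"{target:6.0f} MHz -> mode x{select_mode(target)}")
```

Preferring the lowest factor reflects the spur discussion above, where a lower multiplication factor is desired for a given output frequency.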
4.5.2  DC Offset Calibration

The proposed dc offset calibration technique is measured as shown in Fig. 4.22, which compares the SLR performance for different symbol rates and Barker codes. Without calibration, the short-code correlation is easily destroyed by the dc offset. The long code also suffers SLR degradation due to the integration of the offset. With the proposed calibration, the SLR performance is mostly recovered. The proposed technique reduces the dc offset effect to below 1 LSB without needing a high-performance peak detector and integrator. In addition, because the proposed calibration is limited by the ADC and DAC resolution, there is still some performance drop when the Barker code runs at up to 1 Gb/s. Better calibration could be achieved with higher ADC and DAC resolution, at the cost of more power.

**Figure 4.22:** SLR comparison with digital-assisted dc offset calibration.

4.5.3  Misalignment And Duty Cycle Calibration

Since the system is reconfigurable from 50 Mb/s to 1 Gb/s, two symbol rates and three types of Barker code are shown in Fig. 4.23. Fig. 4.23(a-c) plots the output of the ADC at symbol rates of 125 Mb/s and 1 Gb/s for 3-, 5-, and 7-b Barker codes. In each case, the peak correlation is expected to be equal to the code length $N$. The Barker code is repeated every 400 ns. The swing of the correlator input signal is $\sim 60\,\text{mV}$. For different DAC output amplitudes, the received signal has already been compensated by the VGA. Fig. 4.23(d-f) superimposes the peak auto-correlation found for the two symbol rates. In both cases, the general shape of the Barker code auto-correlation is evident.

**Figure 4.23:** (a) Correlation of 3b Barker (b) Correlation of 5b Barker (c) Correlation of 7b Barker (d) Auto-correlation of 3b Barker (e) Auto-correlation of 5b Barker (f) Auto-correlation of 7b Barker (g) SLR of 3b Barker (h) SLR of 5b Barker (i) SLR of 7b Barker.

Finally, the system SLR degradation is plotted in Fig. 4.23(g-i) as a function of DLL phase offset. Note that the dc offset calibration technique has already been applied. The system uses the misalignment calibration technique to reduce the alignment error to less than $0.125\,T_s$. In conclusion, the SLR degradation due to timing misalignment is more severe at higher data rates, while lower data rates are more tolerant.

Table 4.1: Performance Summary and Comparison

<table>
<thead>
<tr>
<th></th>
<th>This Work</th>
<th>[4]</th>
<th>[67]</th>
<th>[29]</th>
<th>[68]</th>
<th>[70]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>90 nm CMOS</td>
<td>65 nm CMOS</td>
<td>180 nm CMOS</td>
<td>90 nm CMOS</td>
<td>130 nm CMOS</td>
<td>130 nm CMOS</td>
</tr>
<tr>
<td>Radar</td>
<td>PCR</td>
<td>FMCW</td>
<td>PCR</td>
<td>PCR</td>
<td>IRUWB</td>
<td>IRUWB</td>
</tr>
<tr>
<td>Bandwidth (BW)</td>
<td>1.2 GHz</td>
<td>700 MHz</td>
<td>∼ 5 MHz</td>
<td>1 GHz</td>
<td>6-10 GHz</td>
<td>2 GHz</td>
</tr>
<tr>
<td>Template (Code)</td>
<td>2/3/5/7 Barker</td>
<td>NA</td>
<td>Chirp</td>
<td>2/3/5/7 Barker</td>
<td>BPSK</td>
<td>100Mb/s OOK</td>
</tr>
<tr>
<td>Active Area (mm²)</td>
<td>1.3</td>
<td>1.045</td>
<td>5.67</td>
<td>NA</td>
<td>NA</td>
<td>4</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>42</td>
<td>243</td>
<td>62.6</td>
<td>22.4</td>
<td>253.6</td>
<td>15.4</td>
</tr>
</tbody>
</table>

Table 4.1 compares the performance with previous works. This work demonstrates the lowest power consumption for a PCR signal processing circuit. Compared to reference [67], this work implements a reconfigurable wideband system for high range resolution detection. Compared to previous work [29], this work implements the system with all the blocks and proposes a correlator with lower power. An event-driven SAR-type ADC could also be used to further reduce the system power.

4.6 Conclusions

This paper presents an analog correlation based system architecture for the detection of pulse compression radar signals. The signal processing circuit includes a VGA, a DAC, an analog correlator, a Flash ADC, and a multiplying-type DLL. Digitally assisted calibration techniques are proposed for dc offset cancellation and misalignment correction. An automatic calibration engine could be synthesized on chip in future work. This chip is implemented in 90 nm CMOS and consumes 42 mW at a peak data rate of 1 Gb/s for 15 cm range resolution. Therefore, significant power and area can be saved by this architecture, which leads to a low-cost solution. Moreover, this work provides a realization example that handles a 52 dB dynamic range of the received signal, which reveals promising potential for future automotive applications.

Acknowledgements

Chapter 5

3 Gb/s Radar Signal Processor
With an IF-Correlation Technique

This chapter presents a dual-mode IF signal processing circuit for pulse compression radar (PCR) and symbol recovery. A half-duplex architecture is proposed to support signal transmission and reception. For range sensing, the proposed IF correlation technique supports 1.5 GHz bandwidth and 3/5/7-bit Barker codes for 10-cm range resolution. For data communication, the modulator and demodulator support QPSK signals up to 3 Gb/s. Section 5.1 illustrates the dual-mode bidirectional PCR system. The design of the prototype high-speed radar signal processor is presented in section 5.2.

5.1 Bidirectional System for PCR and Point-to-Point Communication

Automotive sensors at 77 GHz in the US and 79 GHz in Japan have demonstrated the potential for radar applications using silicon technology with low power and low cost. While these sensors are based on frequency-modulated continuous wave (FMCW), pulse compression radar (PCR) techniques are also possible for short-range radar with high range resolution. PCR uses digital signal modulation and time-division duplexing of the RF between transmit and receive. In particular, wideband channels at 57-64 GHz, 71-76 GHz, and 81-86 GHz have been proposed for high data rate communication for short-range and backhaul applications. Therefore, dual-mode transceiver techniques that alternately support data communications and radar could make a significant impact in public safety systems. A network of beamforming sensors that can both detect the range of people and objects in its field of view and support communication with these objects enables a layer of network intelligence that is not currently possible [91].

Figure 5.1: Proposed mmWave and analog processing based PCR system.

This work presents dual-mode, low-power mixed-signal processing circuitry for wideband channels. As shown in Fig. 5.1, I/Q baseband signals are up-converted to a complex intermediate frequency (IF) at 5 GHz. An IF SPDT switch is used to interface with a bidirectional beamformer. The direction and range of the target are determined through a combination of the spatial selectivity of the millimeter-wave beamformer and PCR signal processing.

On the receive path, the IF signal is correlated with an I/Q representation of the IF signals. The product of the IF mixing is integrated to determine the range of the signals. To the author's knowledge, the use of this IF correlation in the receiver has not been proposed previously. In earlier work, a low-power baseband signal correlator and a delay-locked loop (DLL) were demonstrated for broadband signals. Due to the existence of DC offset, that approach requires a DLL to align the template with the received signal and many signal processing steps to complete the time alignment and offset calibration.

The motivation for an IF signal correlation is that DC offset calibration is not required within the IF channel. The PCR system that implements the IF correlation is discussed in Section II. System specifications and the performance trade-offs are introduced. AC coupling in the VGA and correlator design avoids the DC offset calibration. The phase information can be obtained through I/Q correlations, so the DLL is no longer necessary. Fractional bit period alignment is therefore possible with fewer signal processing steps. To demonstrate the concept, a QPSK system with 1.5 GHz bandwidth is demonstrated with low latency. Additionally, the configurable demodulator/correlator can be used for classical data transmission by disabling the template modulation. Furthermore, the proposed system is also suitable for short-range point-to-point wireless communication with data rates up to 3 Gb/s.

5.1.1 Bidirectional System Frequency Plan


Figure 5.2: LO and IF frequency plan in PCR mode and Point-to-point mode.

Fig. 5.2 describes the IF-correlation frequency plan for PCR and point-to-point communication. The IF band extends from 4 to 7.5 GHz and is centered at 5.6 GHz or 6 GHz. In PCR mode, the local oscillator (LO) frequency is centered at 73 GHz. In point-to-point communication mode, LO$_1$ is centered at 67.8 GHz. The template upconversion mixer is configured as the generator of the $2^{nd}$ downconversion LO$_2$, and the correlator is configured as the IF downconversion mixer. The QPSK data can then be demodulated at baseband. This IF implementation implies that the front-end should provide image rejection. If, for instance, the proposed system works in the 71-76 GHz band, an image rejection filter is required to reject the popular 60 GHz band.
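The image-rejection remark can be checked with a few lines of arithmetic. The sketch below assumes upper-sideband conversion, i.e. $f_{RF} = f_{LO} + f_{IF}$ with the image at $f_{LO} - f_{IF}$; this sideband choice is an assumption inferred from the band numbers quoted above, and the function name is purely illustrative.

```python
# Sketch only: assumes upper-sideband conversion (f_RF = f_LO + f_IF),
# so the image band falls at f_LO - f_IF. Names are illustrative.

def rf_and_image_bands(f_lo_ghz, if_band_ghz=(4.0, 7.5)):
    """Return ((rf_low, rf_high), (image_low, image_high)) in GHz."""
    if_low, if_high = if_band_ghz
    rf_band = (f_lo_ghz + if_low, f_lo_ghz + if_high)
    image_band = (f_lo_ghz - if_high, f_lo_ghz - if_low)
    return rf_band, image_band


if __name__ == "__main__":
    for mode, f_lo in (("PCR", 73.0), ("point-to-point", 67.8)):
        rf, image = rf_and_image_bands(f_lo)
        print(f"{mode}: RF {rf[0]:.1f}-{rf[1]:.1f} GHz, "
              f"image {image[0]:.1f}-{image[1]:.1f} GHz")
```

Under this assumption, the point-to-point RF band lies inside 71-76 GHz while its image falls just above 60 GHz, which is why the text calls for an image rejection filter.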

### 5.1.2 Hybrid Dual-Path PLL for Bidirectional System

![Hybrid Dual Path FracN PLL for Bidirectional Transceiver](image)

**Figure 5.3**: Hybrid Dual Path FracN PLL for Bidirectional Transceiver.

Fig. 5.3 illustrates a hybrid dual-path fractional-N PLL for the bidirectional system. A narrowband hybrid PLL has been proposed at low frequency [92]. The hybrid mixed-mode topology makes the control loop insensitive to PVT variation and has been employed for short-range, low-power applications [93–95]. With a wideband VCO, the PLL can be employed for a bidirectional system [79]. Other techniques such as automatic amplitude control (AAC), automatic frequency control (AFC) and fast-settling CP-PLLs can improve the phase noise and locking time [96–98].

### 5.1.3 IF-Correlation Techniques for PCR

The proposed IF correlator can support multiple pulse compression coding schemes such as Barker and complementary codes. Fig. 5.4 illustrates the IF correlation of a 7-bit Barker code. The signal-to-noise ratio (SNR) reaches a peak value that depends on the code length $N$ and occurs when the template aligns with the received signal. Both the received signal and the local template are modulated signals in the IF band. When the received signal and the template are misaligned, the side-lobe reduction (SLR) drops according to the template shift. The Barker code has optimally flat autocorrelation sidelobes, while the complementary code has minimum sidelobes. If the received signal shifts by a whole number of symbols, a correlation result of one or zero is detected. If the received signal shifts by a fractional symbol period, the correlation result is proportional to $N \cos(\theta)$, where $\theta$ is the phase difference between the template and the received Barker code, which provides the fractional symbol period alignment information. The range of $\theta$ is $[-\pi/2, \pi/2]$. The larger $\theta$ is, the more the correlation is attenuated. In addition, if a DC offset is present, the correlation result drops from $N \cos(\theta)$ to $N \cos(\theta) - \alpha$, where $\alpha$ is the DC offset effect.

Figure 5.4: IF-correlation of received 7-bit Barker codes with different templates. (Plot of $|R_{xx}|$ versus relative delay.)
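The whole-symbol behaviour described above can be reproduced numerically. The following is a minimal baseband sketch, not the IF circuit itself: it computes the aperiodic autocorrelation of the standard 3-, 5- and 7-bit Barker codes with NumPy. The dictionary and function names are illustrative.

```python
import numpy as np

# Standard Barker sequences of length 3, 5 and 7 (+1/-1 symbols).
BARKER = {
    3: np.array([1, 1, -1]),
    5: np.array([1, 1, 1, -1, 1]),
    7: np.array([1, 1, 1, -1, -1, 1, -1]),
}


def autocorrelation(code):
    """Aperiodic autocorrelation over all whole-symbol shifts."""
    return np.correlate(code, code, mode="full")


def peak_and_max_sidelobe(code):
    """Return (peak value, largest sidelobe magnitude)."""
    r = autocorrelation(code)
    peak_index = len(code) - 1          # zero-shift position
    sidelobes = np.delete(r, peak_index)
    return int(r[peak_index]), int(np.max(np.abs(sidelobes)))


if __name__ == "__main__":
    for n, code in BARKER.items():
        print(n, autocorrelation(code).tolist(), peak_and_max_sidelobe(code))
```

For each code the peak equals the code length $N$ and every sidelobe has magnitude one or zero, which matches the statement about whole-symbol shifts.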

### 5.1.4 System specifications and the performance trade-off

![Proposed bidirectional system specifications](image)

**Figure 5.5**: Proposed bidirectional system specifications.

Fig. 5.5 illustrates the bidirectional system budget in the two modes. In PCR mode, the range resolution $\Delta R$ is determined by the bandwidth $B$ of the transmitted pulse, the types and sizes of targets, and the transceiver specifications. Given the minimum detectable signal ($P_{R,MIN}$) of an RF receiver and the transmit power $P_T$, the maximum range of a radar system is determined from

$$R_{MAX} = \sqrt[4]{\frac{P_T}{P_{R,MIN}} \frac{\lambda^2 G_T G_R \sigma}{(4\pi)^3}} \tag{5.1}$$

where $\lambda$ is the carrier wavelength, $G_T$ and $G_R$ are the respective transmit and receive antenna gains, and $\sigma$ is the radar cross section (RCS). At 79 GHz, the reflected wave of a PCR system would be attenuated by approximately 97 dB at a distance of 20 m when $\sigma$ equals 1 m$^2$, assuming the system has 6 dBi antenna gain with 16 elements, 12 dBm $P_T$, 100 MHz bandwidth, 9 dB NF and 0 dB SNR. With 2 GHz bandwidth, the maximum range is reduced to 10 m. While the range of the system decreases as the bandwidth increases, the range resolution $\Delta R$ shrinks faster, resulting in a net increase in the ratio of the maximum range to the range resolution. In addition to PCR, this system can be reconfigured for short-range point-to-point wireless communication. The maximum distance for data transmission is extended to 40 m because the link budget does not include the round-trip loss incurred in PCR mode. Since the E-band frequency resources for backhaul networks are not contiguous, the data rate would be limited by the 5 GHz of available bandwidth. This work implements two-mode operation with 3 GHz IF bandwidth. In point-to-point mode, the correlator shown in gray would be configured as the $2^{nd}$ downconversion mixer.
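The trade-off between maximum range and range resolution can be illustrated numerically. This sketch rests on two assumptions that are not spelled out in the text: the standard relation $\Delta R = c/(2B)$, and a minimum detectable signal that scales linearly with $B$ for fixed NF and SNR, so that $R_{MAX} \propto B^{-1/4}$ through Eq. (5.1). The function names are illustrative.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz):
    """Range resolution, assuming the standard relation dR = c / (2 B)."""
    return C_LIGHT / (2.0 * bandwidth_hz)


def scaled_max_range(r_ref_m, b_ref_hz, b_new_hz):
    """Scale a reference maximum range to a new bandwidth.

    Assumes P_R,MIN is proportional to B (fixed NF and SNR), so the
    fourth-root law of Eq. (5.1) gives R_MAX proportional to B**(-1/4).
    """
    return r_ref_m * (b_ref_hz / b_new_hz) ** 0.25


if __name__ == "__main__":
    r_ref, b_ref = 20.0, 100e6  # 20 m at 100 MHz, taken from the text
    for b in (100e6, 1.5e9, 2e9):
        r_max = scaled_max_range(r_ref, b_ref, b)
        d_r = range_resolution(b)
        print(f"B = {b / 1e9:.1f} GHz: R_MAX = {r_max:.1f} m, "
              f"dR = {d_r * 100:.1f} cm, ratio = {r_max / d_r:.0f}")
```

With these assumptions, the 20 m range at 100 MHz scales to roughly 9.5 m at 2 GHz, in line with the 10 m quoted above, while the ratio $R_{MAX}/\Delta R$ grows with bandwidth.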

### 5.1.5 Baseband Correlation and IF Correlation Techniques

Figure 5.6: Received signal with offset and analog correlations.

Prior work has demonstrated low-power correlation with an analog circuit implementation [30]. However, multiple echo signals are required to accomplish the object detection; in other words, the detection procedure incurs a larger latency. The closed-loop calibration therefore limits the detection of moving objects. With IF correlation, DC offset calibration is not necessary. This avoids the closed-loop calibration and makes a quick response possible. However, an I/Q architecture is required to recover the correlation amplitude and remove the effect of carrier phase misalignment.

Figure 5.7: Baseband correlation with offset calibration.

The baseband correlator correlates the template code with the received signal at baseband. Therefore, a classical homodyne or zero-IF architecture can be employed as the RF front-end. With a homodyne receiver, DC offsets are created by LO self-mixing, self-mixing of blocking signals and receiver nonlinearity. The PCR system must calibrate the DC offset to avoid receiver saturation or false detections. The on-chip self-mixing path can potentially be eliminated in the layout, but the offset created by nonlinearity requires a calibration circuit. Reported work proposed an IP2 calibration circuit to improve the linearity of the mixer [99]. Unfortunately, this calibration is not suitable for analog-correlation PCR because of the use of a template. In addition, the LO leakage in the reflected signal still mixes with the receiver LO and creates a DC offset. This offset is highly dependent on the object size, angle and distance, and it worsens when overlapping signals arrive. Therefore, a real-time digitally assisted pulse signal calibration has been proposed, and it has demonstrated low-power baseband correlation with an analog circuit implementation [30]. Fig. 5.6 illustrates the effect of the DC offset in the baseband correlator. The SLR degrades from $N \cos(\theta)$ to $N \cos(\theta) - \alpha$, where the DC offset effect is normalized as $\alpha = \frac{V_{DC,OFF}}{V_{CORR}}$. Here $V_{DC,OFF}$ is the DC offset and $V_{CORR}$ is the signal swing at the input of the baseband correlator. The offset voltage $V_{DC,OFF}$ is removed by the DAC calibration term $V_{DAC}$. Fig. 5.7 describes the baseband correlation and calibration circuits. A 4-bit DAC removes the DC offsets before the correlation product is integrated and sampled.

**Figure 5.8:** Proposed IF-correlation system.

Fig. 5.8 depicts the alternative IF correlation. With AC coupling, the DC offset is removed for the bandpass correlation. In addition, the real-time calibration loop can be removed as well, which improves the sensor latency. However, the IF correlation result is affected by the phase difference $\beta$ between the received carrier and the local carrier, where $\beta \in [0, 2\pi)$. For example, if $\beta = \frac{\pi}{2}$, the sampled integration could always be 0 no matter how well the template aligns with the received Barker code. Therefore, in IF correlation, misalignment of the carrier introduces an additional factor of $\cos(\beta)$, and the I/Q complex architecture removes this $\cos(\beta)$ coefficient.

The IF correlation is calculated by creating an in-phase and a quadrature template for the signal. The template signals are represented as a complex bandpass signal.

$$I_{Temp}(t) = \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s}{T_s}\right] \cos(2\pi f_{IF}t)$$ (5.2)  
$$Q_{Temp}(t) = \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s}{T_s}\right] \sin(2\pi f_{IF}t)$$ (5.3)

where $a(n)$ is $+1$ or $-1$ for each symbol, $\text{rect}[\cdot]$ is the window function for the Barker code and $f_{IF}$ is the IF frequency. The transmitted signal is amplified by a factor of $\gamma$.

$$I_{Tx}(t) = \gamma \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s}{T_s}\right] \cos(2\pi f_{IF}t)$$ (5.4)  
$$Q_{Tx}(t) = \gamma \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s}{T_s}\right] \sin(2\pi f_{IF}t)$$ (5.5)

Then, the complex transmitted signal is expressed in the IF band as

$$S_{Tx}(t) = I_{Tx}(t) + jQ_{Tx}(t) = \gamma \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s}{T_s}\right] e^{j(2\pi f_{IF}t)}$$ (5.6)

The received signal is impacted by a signal loss $\epsilon$, misalignment $\theta$ and unknown phase $\beta$.

$$S_{Rx}(t) = \gamma \epsilon \sum_{n=0}^{N-1} a(n) \text{rect}\left[\frac{t - nT_s - \theta T_s}{T_s}\right] e^{j(2\pi f_{IF}t + \beta)}$$ (5.7)

The sampled I/Q IF correlation result after the low pass filter is

$$I_{COR}(t)|_{t=t_s} = H(f) \otimes [S_{Rx}(t) \otimes I_{Temp}(t)] \approx \gamma \epsilon \frac{N \cos(\theta) \cos(\beta)}{2}$$ (5.8)  
$$Q_{COR}(t)|_{t=t_s} = H(f) \otimes [S_{Rx}(t) \otimes Q_{Temp}(t)] \approx \gamma \epsilon \frac{N \cos(\theta) \sin(\beta)}{2}$$ (5.9)

where $t_s$ is the sampling time of the correlator, $\otimes$ is the convolution operator and $H(f)$ is the low pass filter. Finally, the amplitude and phase of the correlation results are expressed by

$$\sqrt{I^2_{COR}(t_s) + Q^2_{COR}(t_s)} = \gamma \epsilon \frac{N \cos(\theta)}{2} \tag{5.10}$$

$$\arctan\left[\frac{Q_{COR}(t_s)}{I_{COR}(t_s)}\right] = \beta \tag{5.11}$$

Therefore, IF correlation eliminates the need for offset calibration. The I/Q correlation architecture recovers the correlation amplitude which is proportional to $N \cos(\theta)\sqrt{\cos^2(\beta) + \sin^2(\beta)} = N \cos(\theta)$.
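The insensitivity of the I/Q amplitude to the carrier phase $\beta$ can be checked with a small numerical experiment. The sketch below is an idealized discrete-time model and not the measured circuit: it assumes perfect symbol alignment ($\theta = 0$), unit gain ($\gamma\epsilon = 1$), an integer number of IF cycles per symbol, and an ideal integrator in place of the low pass filter. The Q channel is negated so that its sign matches Eq. (5.9); that sign convention and all names are assumptions of this sketch.

```python
import numpy as np

BARKER7 = np.array([1, 1, 1, -1, -1, 1, -1])


def if_iq_correlate(beta, code=BARKER7, cycles_per_symbol=4,
                    samples_per_cycle=16):
    """Idealized IF I/Q correlation for an aligned template (theta = 0).

    Returns (I, Q) normalized to one symbol period, so that the expected
    values are (N/2) cos(beta) and (N/2) sin(beta).
    """
    sps = cycles_per_symbol * samples_per_cycle      # samples per symbol
    k = np.arange(len(code) * sps)
    carrier_phase = 2.0 * np.pi * k / samples_per_cycle
    envelope = np.repeat(code, sps)                  # rectangular symbols

    rx = envelope * np.cos(carrier_phase + beta)     # received IF signal
    temp_i = envelope * np.cos(carrier_phase)        # in-phase template
    temp_q = envelope * np.sin(carrier_phase)        # quadrature template

    i_cor = np.sum(rx * temp_i) / sps
    q_cor = -np.sum(rx * temp_q) / sps               # sign chosen to match Eq. (5.9)
    return i_cor, q_cor


if __name__ == "__main__":
    for beta in (0.0, np.pi / 6, np.pi / 2, 2.0):
        i_cor, q_cor = if_iq_correlate(beta)
        amplitude = np.hypot(i_cor, q_cor)
        phase = np.arctan2(q_cor, i_cor)
        print(f"beta = {beta:.3f}: I = {i_cor:+.3f}, Q = {q_cor:+.3f}, "
              f"|.| = {amplitude:.3f}, phase = {phase:.3f}")
```

In this idealized model the I channel alone vanishes at $\beta = \pi/2$, while the combined amplitude stays at $N/2$ for every $\beta$, as Eq. (5.10) predicts.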

**Figure 5.9:** Illustration of a three-point estimation for the phase misalignment in the correlation.

The SNR of the detected correlation is penalized by ambiguity about the sampling phase, which is represented by $\theta$. The $\cos(\theta)$ factor can be corrected with a three-point estimation. As shown in Fig. 5.9, this angle is inferred by comparing samples at different symbol-period lags. In the absence of sampling error, the correlation results $[x_n]$ for the sampling points $[S_{-1}, S_0, S_1] = [S(t_s - T), S(t_s), S(t_s + T)]$ would be $[x_{-1}, x_0, x_1] = [0, N, 0]$. With a lead or lag of $\theta$, the correlation results are $[\sin(\theta), N \cos(\theta), \sin(\theta)]$. When the timing misalignment is larger than one symbol, the correlation results are $[\sin(\theta), \cos(\theta), \sin(\theta)]$ or $[\cos(\theta), \sin(\theta), \cos(\theta)]$.

In IF correlation, the $\beta$ estimate aids the $\cos(\theta)$ correction, especially once the alignment is within $\pm 0.5T_s$. The misalignment can then be estimated as $\frac{\theta}{\pi}T_s + \frac{\beta}{2\pi}T_{IF}$, where $T_{IF}$ is the period of the IF carrier. At low symbol rates, the $\beta$ estimation is less important since a large number of carrier cycles occur within one symbol period. At a symbol rate of 1.5 Gb/s, the phase difference $\beta$ can provide alignment information of at most $T_{IF} = 0.25T_s$, because each symbol contains 4 periods of the 6 GHz carrier. The fractional symbol alignment information can be extracted by

$$\theta = \arctan\left[\frac{N \cdot x_{-1}}{x_0}\right] = \arctan\left[\frac{N \sin(\theta)}{N \cos(\theta)}\right] \tag{5.12}$$

where $x_{-1}$ and $x_0$ are obtained from the I and Q correlation results of each channel. This implementation limits the types of code that can be used. For example, a Frank code is a phase-modulation code with a frequency-modulation property. The received code can be expressed as

$$S_{RX}(t) = \gamma e^{j\varphi_n T_{IF} t + \beta} e^{j(2\pi f_{IF}t + \beta)}$$

where $\varphi_n$ can be 0, $\frac{\pi}{2}$, $\pi$ or $\frac{3\pi}{2}$. The correlation of $e^{j\varphi_n}$ can be $\pm1$ and 0. Therefore, a Frank code cannot be demodulated, since $\varphi_n$, $\theta$ and $\beta$ are mixed together and the IF I/Q correlation cannot recover the amplitude and phase. In other words, a polyphase code requires RF I/Q mixers in addition to the IF I/Q correlator, as well as complex signal processing in the digital domain.
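For the binary (Barker) case, the three-point estimate of Eq. (5.12) and the timing expression $\frac{\theta}{\pi}T_s + \frac{\beta}{2\pi}T_{IF}$ can be written out as a short sketch. It takes the correlation triple $[\sin(\theta), N\cos(\theta), \sin(\theta)]$ from the text at face value; the function names are illustrative, and the code is a numerical restatement of the formulas rather than the author's implementation.

```python
import math


def estimate_theta(x_minus1, x_0, n):
    """Three-point estimate of Eq. (5.12): theta = arctan(N * x_-1 / x_0)."""
    return math.atan(n * x_minus1 / x_0)


def timing_estimate(theta, beta, t_s, t_if):
    """Misalignment estimate theta/pi * T_s + beta/(2*pi) * T_IF."""
    return theta / math.pi * t_s + beta / (2.0 * math.pi) * t_if


def carrier_cycles_per_symbol(f_if_hz, symbol_rate_hz):
    """Number of IF carrier periods inside one symbol period."""
    return f_if_hz / symbol_rate_hz


if __name__ == "__main__":
    n, theta_true = 7, 0.4
    # Correlation samples as given in the text for a lead or lag of theta.
    x_minus1, x_0 = math.sin(theta_true), n * math.cos(theta_true)
    theta_hat = estimate_theta(x_minus1, x_0, n)

    t_s = 1.0 / 1.5e9          # symbol period at 1.5 Gb/s
    t_if = 1.0 / 6e9           # period of the 6 GHz IF carrier
    print(theta_hat, timing_estimate(theta_hat, 0.0, t_s, t_if))
    print(carrier_cycles_per_symbol(6e9, 1.5e9))
```

The last line reproduces the four carrier periods per symbol quoted for 1.5 Gb/s, which is what bounds the $\beta$-based alignment information to $0.25T_s$.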

<table>
<thead>
<tr>
<th>Table 5.1: Comparison of IF-Correlation and Baseband Correlation</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Correlation Technique</strong></td>
</tr>
<tr>
<td>DC offset</td>
</tr>
<tr>
<td>Misalignment</td>
</tr>
<tr>
<td>Calibration Technique</td>
</tr>
<tr>
<td>Detection latency</td>
</tr>
</tbody>
</table>

Table 5.1 compares the baseband and IF correlation. In terms of circuit overhead, the IF correlation requires an additional mixer to upconvert the baseband template to the IF band in the transmitter and receiver, but it removes the need for a DAC and a DLL. The misalignment information is extracted from three successive correlations, and the three-point estimation requires one correlation per symbol shift. In baseband correlation, the coexistence of DC offset and misalignment prevents the use of this estimation. As a result, the calibration engine has to remove the DC offset first and then shift the template in fractional symbol steps to maximize the SLR. Furthermore, since the DC offset depends on the reflected signal (LO self-mixing), different reflected LO leakage power can create a different DC offset. Therefore, the system needs real-time iterations for DC offset calibration, and multiple echo signals are required. The detection time can therefore be longer, as indicated in Table 5.1 (detection latency).

5.2 Circuit Implementation

The proposed PCR correlation system includes an I/Q clock generator, an SPDT switch, a modulator, a template generator, and a reconfigurable demodulator/correlator. By inserting a highly linear SPDT switch as shown in Fig. 5.10, the proposed system is made compatible with bidirectional time-division duplexing (TDD) systems.

In PCR mode, the I/Q clocks are generated by the divide-by-2 block. The local template codes modulate the I/Q clock to generate the template signal. The IF correlator correlates the template with the received signal using a Gilbert-type mixer. A 4-bit capacitor bank is designed for different data rates. In the transmitter, I/Q data is directly converted to the IF band. In simulation, the inductor peaking technique extends the bandwidth beyond 10 GHz. In point-to-point communication mode, the receiver Gilbert-type mixer is configured as the demodulator, and the 4-bit capacitor bank is configured to achieve maximum bandwidth. According to the frequency plan above, the IF clock can be obtained from the /2 prescaler block of the fractional-N synthesizer. The output I/Q clocks are buffered with additional clock drivers to the modulator and demodulator. Each buffer stage consumes 3.5 mA and provides a 350 mV$_{pk-pk}$ swing. The divider works up to 14 GHz, which supports the two proposed modes of operation.

![Diagram](image)

**Figure 5.10**: Proposed system for range sensing and data communication.

### 5.2.1 Highly Linear SPDT Switch

A highly linear SPDT switch is required for a bidirectional transceiver. Low-loss and high-speed switches in the RF band have been proposed in [100–102]. In this work, a two-stage shunt-shunt SPDT switch is implemented to demonstrate TDD operation and back-to-back measurements.

Since the system has a 3 GHz bandwidth with a carrier at around 5.6 GHz, the switch bandwidth should exceed 7.5 GHz for pulse signal settling ($T_{\text{settling}} > \tau_{\text{SPDT}}$), where $\tau_{\text{SPDT}}$ is the time constant of the SPDT switch. For low loss, two 40 µm/300 nm devices are cascaded in series. As shown in Fig. 5.11, a loss of 1.8 dB and a -3 dB bandwidth above 10 GHz are achieved.

### 5.2.2 PCR Transmitter and QPSK Modulator

Fig. 5.10 describes the proposed pulse compression radar transmitter and QPSK modulator. The I/Q channels are summed in the current domain. In PCR mode, 3/5/7-bit Barker codes modulate the 5.9 GHz IF clock to form the transmitted pulse signal. The inductor peaking technique extends the bandwidth beyond 10 GHz and compensates for the SPDT loss at high frequency. In point-to-point communication mode, the modulation ports in the template generator are configured to be a 1/0 pair, so that the PCR transmitter becomes a traditional IF modulator.

### 5.2.3 IF Correlator and QPSK Demodulator

Since the pulse rate is up to 1.5 Gb/s, the demodulator/correlator input ports are designed with a bandwidth above 10 GHz for pulse signal settling ($T_{barker} > 6\tau_{SPDT}$), where $T_{barker}$ is the symbol period and $\tau_{SPDT}$ is the time constant of the SPDT switch. A duration of $6\tau$ ensures better than 99% amplitude settling of the pulse signal. Note that the input capacitance of the demodulator/correlator increases the time constant of the SPDT switch. The demodulator and analog correlator are reconfigurable, as shown in Fig. 5.10. First, the analog correlator is a block with IF inputs but low-frequency outputs, which allows signal processing in a later stage, for example after the ADC. Second, the correlator works as the 2nd downconversion mixer if the template signal is a pure IF clock. Therefore, we propose a two-stage analog correlator in this work. In PCR mode, the I/Q clock signal is modulated by a Barker code and the template signals are generated with a carrier frequency of 5.9 GHz. In point-to-point communication mode, the modulation signals are configured to be a 1/0 pair. This circuit then acts as a clock buffer, and the second stage becomes a traditional downconversion mixer.

![Figure 5.11: Insertion loss of SPDT switch.](image)
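The $6\tau$ settling criterion used for the correlator input can be put into numbers. The sketch assumes a single-pole (first-order) step response, which is a simplification introduced here and not a statement about the actual SPDT network; the function names are illustrative.

```python
import math


def settled_fraction(n_tau):
    """Fraction of the final amplitude reached after n_tau time constants,
    assuming a single-pole step response 1 - exp(-t / tau)."""
    return 1.0 - math.exp(-n_tau)


def max_time_constant(symbol_rate_hz, n_tau=6.0):
    """Largest tau (in seconds) for which one symbol period spans n_tau
    time constants, i.e. T_barker = n_tau * tau."""
    return 1.0 / (symbol_rate_hz * n_tau)


if __name__ == "__main__":
    print(f"settling after 6 tau: {settled_fraction(6.0) * 100:.2f} %")
    for rate in (200e6, 1e9, 1.5e9):
        tau_ps = max_time_constant(rate) * 1e12
        print(f"{rate / 1e9:.1f} Gb/s: tau must stay below {tau_ps:.0f} ps")
```

Under this first-order assumption, $6\tau$ corresponds to about 99.75% settling, and at 1.5 Gb/s the time constant has to stay below roughly 111 ps.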

**Figure 5.12:** 7-bit Barker code modulator and correlator.

Fig. 5.12 illustrates the simulations of the modulator/correlator in PCR mode. A 7-bit Barker code and its correlation are simulated. The I/Q baseband data Tx-I and Tx-Q are ‘111-1-11-1’ sequences. The pulse signal has a period of 50 ns and the bit rate is 1 Gb/s. The transmitted pulse Tx-IF has a 250 mV$_{pk-pk}$ signal swing, while the SPDT insertion loss is 1.8 dB at the IF frequency. The received signal has a 500 mV$_{pk-pk}$ swing and is correlated with the upconverted template signals Temp-I-IF and Temp-Q-IF to generate OUT-I and OUT-Q. Each channel of the correlator output has a differential signal range of -160 mV to 160 mV. This sets the ADC LSB to around 16 mV. The integration bandwidth is adapted to the signal bandwidth. In data communication mode, the pulse signal becomes classical QPSK data, so the template generator is disabled and the correlator works as the demodulator. The integrator is configured with the smallest capacitor to increase the bandwidth back to 1.5 GHz.

5.3 Measurements

![Figure 5.13](image)

**Figure 5.13:** (a) Transmitter test setup (b) Receiver single tone test (c) Transceiver link test setup with two chips.

The system measurement setups are shown in Fig. 5.13 (a-c). Two chips are configured in Tx mode and Rx mode. An Agilent 81134 pulse pattern generator (PPG) provides the Barker code and I/Q modulation signals. An Agilent E4438C signal generator provides an 11.3 GHz or 12 GHz clock to the chips. The system is evaluated in three steps.

Fig. 5.13(a) describes the test setup of Tx mode. The transmitted Barker code and QPSK signals are measured with an Agilent E4448A spectrum analyzer. Fig. 5.13(b) illustrates the baseband test setup of Rx mode. The receiver gain is measured with a single-tone input. An Agilent MSO8104A real-time oscilloscope is used to evaluate the I/Q gain and phase mismatch. Fig. 5.13(c) describes the transceiver setup with two chips, one in Tx mode and the other in Rx mode. The received signal is then obtained from the output of the Tx-mode chip. Two 180° hybrid couplers and one IF amplifier are inserted to build the TRx link. One coupler is a 2-8 GHz Narda model 4343 and the other is a 2-18 GHz Krytar model 402180. Since this link setup introduces distortion and phase imbalance, the received I/Q signals have gain and phase errors. In QPSK transceiver mode, the template I/Q signals are biased to be a 1/0 pair. In pulse compression radar (PCR) mode, another Agilent 81134 pulse pattern generator (PPG) is used to generate the receiver template Barker codes. 3/5/7-bit Barker code correlations at rates from 200 Mb/s to 1.5 Gb/s are measured.

Figure 5.14: Die microphotograph of proposed IF-correlation system.

The proposed system circuits are implemented with 90-nm CMOS devices. The chip microphotograph is shown in Fig. 5.14, and the measured area is 1.74 \( \text{mm}^2 \). The circuit operates with a 1.3 V supply for the transceiver and a 2.5 V supply for the SPDT switch. Since the SPDT switch is implemented with thick-oxide devices for better headroom, a 2.5 V digital control bit configures the chip to work in transmitter mode or in receiver mode. In PCR mode, the proposed system consumes 79 mW, including 25 mW for the modulator, 26 mW for the correlator, and 28 mW for the current mode logic (CML) divider and clock buffers. In point-to-point communication mode, the correlator is reconfigured as a demodulator that consumes 20 mW, and the transceiver system consumes 69 mW. The remaining 4 mW of power reduction comes from reducing the bias currents of the I/Q template generators, clock buffers and CML divider circuits. Note that the proposed system works in half-duplex mode. Therefore, the maximum power consumption is 54 mW in PCR mode and 49 mW in point-to-point communication mode.
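The half-duplex power figures follow from the block-level numbers by simple bookkeeping, sketched below. Two simplifications are assumptions of this sketch and not statements from the measurements: the clocking block (CML divider and buffers) is taken to be active in both transmit and receive, and the 4 mW saving in point-to-point mode is lumped entirely into that clocking block. The function name is illustrative.

```python
def half_duplex_power(modulator_mw, receiver_mw, clocking_mw):
    """Return (tx_mw, rx_mw, peak_mw) for a half-duplex transceiver in
    which the clocking block is shared by both directions (assumption)."""
    tx_mw = modulator_mw + clocking_mw
    rx_mw = receiver_mw + clocking_mw
    return tx_mw, rx_mw, max(tx_mw, rx_mw)


if __name__ == "__main__":
    # PCR mode: block powers in mW as quoted in the text.
    print("PCR:", half_duplex_power(25, 26, 28))

    # Point-to-point mode: the demodulator takes 20 mW; the 4 mW bias
    # saving is attributed to the clocking block here (assumption).
    print("P2P:", half_duplex_power(25, 20, 28 - 4))
```

With these assumptions the peak values come out as 54 mW in PCR mode (receive direction) and 49 mW in point-to-point mode (transmit direction), matching the totals of 79 mW and 69 mW when all blocks are summed.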

### 5.3.1 Measurement Results in PCR Mode

**Figure 5.15**: Received 7-bits Barker code at 1Gb/s.

For PCR measurements, the IF carrier frequency is tuned to 5.93 GHz. Fig. 5.15 plots the 7-bit Barker code BPSK signal at 1 Gb/s. The I/Q channel data share the same code, but the signals are filtered by a Mini-Circuits VLFX-1050+. As seen in Fig. 5.15, a signal of more than 500 mV$_{pk-pk}$ is received for the correlator. The local upconverted template signal has a 400 mV$_{pk-pk}$ swing in simulation. The maximum output voltage from the correlator ranges from -160 mV to 160 mV, which means that 16 mV ADC resolution is required. A 4-bit binary-coded capacitor bank is designed for different data rates, so that the integrating capacitor can be adapted to the signal bandwidth. Due to the longer switching time of the filtered data and the output network, there is DC wander and AM distortion. Note that DC wander is different from DC offset: the DC wander behaves like a low-frequency signal that affects the bias of the correlator.

An Agilent 81134A PPG generates the template code for the receiver chip. Fig. 5.16 describes the 7-bit Barker code correlations with $\alpha$ of 0 degrees and 30 degrees. Due to the mismatch of the I/Q conversion gain, the amplitude has an error of around 3%. There are also high-frequency ripples in the correlation result due to LO leakage.

Fig. 5.18 describes the 7-bit Barker code SLR performance with different phases. With the proposed bandpass I/Q correlator, the SLR amplitude ends up with < 8% error in peak detection. The transmitted 3-bit, 5-bit and 7-bit Barker codes at 200 Mb/s and 1.5 Gb/s are shown in Fig. 5.18 (a-c) and Fig. 5.18 (g-i). The I/Q channel correlation results are shown in Fig. 5.18 (d-f) and Fig. 5.18 (j-l). With the proposed I/Q architecture, the SLR peak value calibration in post signal processing is relaxed. Fig. 5.18 (m-o) shows the SLR performance for 3-bit, 5-bit and 7-bit code lengths at 200 Mb/s and 1.5 Gb/s. The amplitude is calculated according to equation (??). Phase information is provided as well. With the 1.5 Gb/s Barker code, the integration results are more sensitive to the template data switching. Due to the low gain of the receiver, a 600 mV$_{pk-pk}$ template swing is set in the measurements. A lower swing would reduce the ripples and switching effects.

Figure 5.17: SLR of 7-bit Barker code at 1 Gb/s.

### 5.3.2 Measurement Results in Communication Mode

In passthrough mode, two of the proposed baseband circuits are connected back-to-back, with one circuit configured as the modulator and the other as the demodulator. The proposed reconfigurable correlator/demodulator requires a signal swing of more than 450 mV$_{pk-pk}$. An amplifier is inserted between the two circuits to model the VGA and provide enough signal swing for the demodulator. As shown in Fig. 5.13(c), two hybrid couplers are inserted between the TRx chips and the IF amplifier. The proposed circuits are measured at data rates of 1.6 Gb/s and 3 Gb/s with the IF carrier frequency at 5.65 GHz.

Figure 5.18: PCR mode with Barker code (a-c) 200 Mb/s (d-f) 200 Mb/s I/Q correlations (g-i) 1.5 Gb/s (j-l) 1.5 Gb/s I/Q correlations (m-o) SLR performance.

Fig. 5.19 plots the measured single-ended spectra of the transmitter and receiver. The I/Q signals are generated by two $2^{31} - 1$ pseudo-random binary sequence (PRBS) channels. The transmitter spectrum shows additional loss from 6.1 GHz to 7 GHz. The receiver spectrum has more loss at high frequency due to the link setup, PCB parasitic capacitance, bond-wire inductance and SPDT on-state resistance. Note that this problem is mainly caused by the link setup. It would not be an issue with an on-chip bidirectional front-end, because the signal is upconverted to the 72-78.5 GHz band and a flatter channel response can be expected.

**Figure 5.19**: (a) Modulator spectrum at 1.6 Gb/s and 3 Gb/s (b) Demodulator spectrum at 1.6 Gb/s and 3 Gb/s.

In order to determine the 3-dB bandwidth of the receiver, a small-signal measurement is shown in Fig. 5.20(a). The template generator is configured to provide an IF sinusoidal clock so that the IF-to-baseband conversion gain can be measured. Both channels behave consistently, with a 3-dB bandwidth of 1.4 GHz, but a 1 dB amplitude mismatch exists between the channels.

Fig. 5.20(b) shows the I/Q mismatch of the demodulator. The single-ended 0.5 GHz demodulated I/Q signals are plotted when the IF signal is at 5.15 GHz and the LO is at 5.65 GHz. The phase error is 3.5°, of which 2.5° is attributed to the PCB routing and coupler. Additionally, the wideband 180° hybrid coupler has 0.5 dB amplitude error and < 1° phase error. The power consumption at a peak data rate of 3 Gb/s is 25 mW in transmitter mode and 20 mW in receiver mode.

Table 5.2 summarizes the performance of this work. The signal bandwidth is reconfigurable from 200 MHz to 1.5 GHz for pulse compression radar (PCR). QPSK modulation raises the data rate up to 3 Gb/s. Compared to previous work [30], the proposed double-downconversion system uses AC coupling between the variable gain amplifier (VGA) and the baseband circuits, so an offset calibration algorithm is not required if the demodulator is designed properly. The 5.65 GHz and 5.93 GHz local oscillator (LO) signals in the proposed frequency plan can be implemented with a wideband fractional-N frequency synthesizer. Compared to [11, 68, 103], a higher data rate and a better modulation scheme have been demonstrated.

5.4 Conclusions

This chapter presents a dual-mode baseband architecture for a bidirectional pulse compression radar (PCR) system and a short-range communication system. The signal processing circuit includes an I/Q clock generator, a highly linear SPDT switch, a QPSK modulator, a template generator and a reconfigurable QPSK demodulator/analog correlator [105]. The chip is implemented with 90 nm CMOS devices and consumes 54 mW with a bandwidth of 1.5 GHz for 10 cm range resolution. In point-to-point communication mode, it consumes 49 mW at a peak data rate of 3 Gb/s. Analog correlation reduces the ADC sampling rate to 1/N in PCR mode. Therefore, significant power and area can be saved by the proposed architecture, which leads to a low-cost solution. Moreover, the short-range communication feature opens the possibility of future automotive wireless networks and other applications such as intelligence of things (IoTs).

Table 5.2: Performance Summary and Comparison

<table>
<thead>
<tr>
<th></th>
<th>This Work</th>
<th>[4]</th>
<th>[30]</th>
<th>[67]</th>
<th>[68]</th>
<th>[104]</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Technology</strong> (CMOS)</td>
<td>90 nm</td>
<td>65 nm</td>
<td>90 nm</td>
<td>180 nm</td>
<td>130 nm</td>
<td>65 nm</td>
</tr>
<tr>
<td><strong>Application</strong></td>
<td>PCR</td>
<td>FMCW</td>
<td>PCR</td>
<td>PCR</td>
<td>IRUWB</td>
<td>IRUWB</td>
</tr>
<tr>
<td><strong>Bandwidth (GHz)</strong></td>
<td>1.5</td>
<td>0.7</td>
<td>1</td>
<td>0.005</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td><strong>Data Rate (Gb/s)</strong></td>
<td>0.2-3</td>
<td>NA</td>
<td>0.05-1</td>
<td>0.005</td>
<td>2</td>
<td>0.5</td>
</tr>
<tr>
<td><strong>Template</strong></td>
<td>Barker</td>
<td>NA</td>
<td>Barker</td>
<td>Chirp</td>
<td>BPSK</td>
<td>OOK</td>
</tr>
<tr>
<td><strong>Modulation</strong></td>
<td>QPSK</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
<td>BPSK</td>
<td>OOK</td>
</tr>
<tr>
<td><strong>Area (mm²)</strong></td>
<td>1.74</td>
<td>1.045</td>
<td>1.3</td>
<td>5.67</td>
<td>NA</td>
<td>4</td>
</tr>
<tr>
<td><strong>Power (mW)</strong></td>
<td>54</td>
<td>243</td>
<td>42</td>
<td>62.6</td>
<td>253.6</td>
<td>13.3</td>
</tr>
</tbody>
</table>

**Acknowledgements**

This chapter is mostly a reprint of material submitted as J. Li, T. Kijsanayotin, and J. F. Buckwalter, "A 3-Gb/s Radar Signal Processor using an IF-Correlation Technique in 90nm CMOS," IEEE Transactions on Microwave Theory and Techniques. The dissertation author was the primary author of this material.

Chapter 6

Conclusions

This dissertation presents analysis, design and prototype results in the area of low-power integrated circuits for optical interconnects and high-speed pulse compression radar. Firstly, a monolithically integrated silicon photonic transmitter is presented. The derived equations and analysis describe the WDM link design flow by considering the optical and electrical blocks at the system level. Optimum energy efficiency is obtained with the proposed algorithm. The tuning circuit and laser efficiency are included in the total power optimization. For a specific process, the optimum data rate can be obtained.

Secondly, an analog correlation circuit is presented that enables low-power, high-speed signal processing for pulse compression radar (PCR). The receiver system includes a VGA to demonstrate the dynamic range performance. In addition, DC offset and time misalignment issues are discussed, and a digitally assisted calibration algorithm is presented. Closed-loop calibration removes the DC offset, and a DLL allows the template signal to perform fractional bit period alignment. To the best of the author's knowledge, the measured prototype achieves state-of-the-art performance in terms of dynamic range, bandwidth and power consumption. This work also demonstrates digitally assisted closed-loop calibration using on-chip digital-to-analog converters (DACs).

Finally, a technique to perform IF correlation in a bidirectional radar transceiver is proposed. The system is reconfigurable for high-resolution range sensing and short-range data communication. The bidirectional architecture allows time division duplexing (TDD). The sensing and communication features open the possibility of innovative hardware, especially for intelligent network platforms. The circuit includes a modulator, a correlator/demodulator and a clock divider. The system frequency plan is discussed for two-mode operation. With IF correlation, no DC offset calibration is required. This improves the radar sensing latency and also removes the need for a DLL. The fractional bit period alignment can be extracted from the correlation result. This work demonstrates 10 cm range resolution and a data rate of up to 3 Gb/s at a maximum power consumption of 54 mW. To the best of the author's knowledge, this work is the first demonstration of a two-mode pulse compression radar (PCR) processor with up to 3 Gb/s data rate and 10 cm range resolution in a monolithic circuit.
Bibliography


