Cognitive Serial Interface with Multi-Band Signaling and Channel Learning Mechanism

A dissertation submitted in partial satisfaction of the
requirements for the degree Doctor of Philosophy

in Electrical Engineering

by

Yuan Du

2016

## ABSTRACT OF THE DISSERTATION

Cognitive Serial Interface with Multi-Band Signaling and Channel Learning Mechanism

by

Yuan Du

Doctor of Philosophy in Electrical Engineering

University of California, Los Angeles, 2016

Professor Mau-Chung Frank Chang, Chair

As the data rate of peripheral serial I/O for PCs and mobile computing platforms continues to scale to meet high-bandwidth applications, conventional serial interfaces find it increasingly challenging to achieve high bandwidth (>10 Gb/s per lane), high energy efficiency (<10 pJ/bit) and low cost at the same time, especially under complicated channel conditions such as multi-drop buses and low-cost connectors/cables.

Conventional equalization solutions are very power-hungry, and there is no universal solution capable of handling all channel conditions. This dissertation introduces multi-band signaling with a channel-response learning mechanism and proposes a cognitive serial interface system to address the challenges above. The cognitive multi-band scheme mitigates the equalization requirement and enhances energy efficiency by avoiding frequency notches and utilizing the maximum available signal-to-noise ratio and channel bandwidth.

A cognitive tri-band transmitter (TX) and receiver (RX) with a forwarded clock, using multi-band signaling and high-order digital signal modulations, are designed, implemented and measured for serial link applications. The cognitive serial interface system learns an arbitrary channel response by sending a continuous-wave sweep, detecting the power level at the RX side, and then adapting the modulation scheme, data bandwidth and carrier frequencies based on the detected channel information. The supported modulation schemes range from NRZ/QPSK to PAM-16/256-QAM. The proposed highly reconfigurable transceiver architecture is capable of dealing with low-cost serial channels, such as low-cost connectors, cables or multi-drop buses (MDB) with deep and narrow notches in the frequency domain (e.g., 40 dB loss at notches). The adaptive multi-band scheme mitigates equalization requirements and enhances energy efficiency by avoiding frequency notches and utilizing the maximum available signal-to-noise ratio (SNR) and channel bandwidth. The implemented cognitive serial interface prototype consumes 14.7 mW / 15.2 mW and occupies 0.016 mm² / 0.024 mm² on the TX / RX side, respectively. It achieves a maximum data rate of 16 Gb/s with a forwarded clock through one differential pair and the most energy-efficient Figure of Merit (FoM) of 20.4 µW/Gb/s/dB for TX and 21.1 µW/Gb/s/dB for RX. The FoM is the power consumed to transmit or receive data, normalized per Gb/s of data rate and per dB of worst-case channel loss within the Nyquist frequency.

The dissertation of Yuan Du is approved.

Danijela Cabric

Jason Cong

Xiaochun Li

Mau-Chung Frank Chang, Committee Chair

University of California, Los Angeles

2016
To my beloved wife and son -- Ying Liu and Kysen Du:

“You are my North, my South, my East and West,
You are my working week and my Sunday rest,
You are my noon, my midnight, my talk, and my song.
I love the sun, the moon and you.
   The sun for the day,
   The moon for the night,
   And you forever.”
# Table of Contents

LIST OF ACRONYMS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENT
VITA
PUBLICATIONS

CHAPTER 1 INTRODUCTION
  1.1 Motivation
  1.2 Conventional SerDes Link Overview
  1.3 State-of-the-art Equalization Solutions
  1.4 Major Work and Organization of Thesis

CHAPTER 2 MULTI-BAND SIGNALING
  2.1 Concept Description of Multi-Band Signaling
  2.2 Comparison with Conventional Base-Band Signaling
  2.3 Self-Equalization Effect
  2.4 Source-Synchronization/Forwarded-Clock Architecture
  2.5 Summary of Previous Works on Multi-Band Signaling Serial Link Transceivers

CHAPTER 3 CHANNEL LEARNING MECHANISM
  3.1 Non-coherent Channel Learning
  3.2 Coherent Channel Learning
  3.3 Channel Response Modeling
  3.4 Carrier Synchronization Implementation with the Limited Hardware Resources

  4.1 Link Budget Analysis
  4.2 Delay Mismatch Analysis
  4.3 Inter-Band Interference (IBI) Analysis
  4.4 IQ Interference Analysis (IQI)
  4.5 Carrier Phase Noise Analysis

CHAPTER 5 COGNITIVE CHANNEL LEARNING TRANSCEIVER ARCHITECTURE
  5.1 System Architecture
  5.2 Cognitive Algorithm Design

CHAPTER 6 CIRCUITS DESIGN OF KEY BUILDING BLOCKS
  6.1 4-bit Digital-to-Analog Converter (DAC)
  6.2 Broad-Band Summation Block
  6.3 Receiver Front End

CHAPTER 7 IMPLEMENTATION AND MEASUREMENT RESULTS ANALYSIS
  7.1 Measurement Platform
  7.2 Frequency-Domain Measurement Results
  7.3 Time-Domain Measurement Results
  7.4 Power Consumption and Die Photo

CHAPTER 8 CONCLUSIONS AND FUTURE WORK

REFERENCE

## LIST OF ACRONYMS

<table>
<thead>
<tr>
<th>Acronyms</th>
<th>Meaning</th>
<th>Acronyms</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>HSEL</td>
<td>High-Speed Electronics Lab</td>
<td>RFI</td>
<td>Radio Frequency Interconnect</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed Circuit Board</td>
<td>MDB</td>
<td>Multi-Drop Buses</td>
</tr>
<tr>
<td>TX</td>
<td>Transmitter</td>
<td>RX</td>
<td>Receiver</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal-to-Noise Ratio</td>
<td>FoM</td>
<td>Figure of Merit</td>
</tr>
<tr>
<td>NRZ</td>
<td>Non-Return-to-Zero</td>
<td>PAM</td>
<td>Pulse-Amplitude Modulation</td>
</tr>
<tr>
<td>QPSK</td>
<td>Quadrature Phase-Shift Keying</td>
<td>QAM</td>
<td>Quadrature Amplitude Modulation</td>
</tr>
<tr>
<td>FFE</td>
<td>Feed-Forward Equalization</td>
<td>CTLE</td>
<td>Continuous-Time Linear Equalization</td>
</tr>
<tr>
<td>DFE</td>
<td>Decision Feedback Equalization</td>
<td>BER</td>
<td>Bit Error Rate</td>
</tr>
<tr>
<td>CDR</td>
<td>Clock and Data Recovery</td>
<td>ISI</td>
<td>Inter-Symbol Interference</td>
</tr>
<tr>
<td>SerDes</td>
<td>Serializer/De-serializer</td>
<td>IBI</td>
<td>Inter-Band Interference</td>
</tr>
<tr>
<td>DAC</td>
<td>Digital-to-Analog Converter</td>
<td>OFDM</td>
<td>Orthogonal Frequency-Division Multiplexing</td>
</tr>
<tr>
<td>ADC</td>
<td>Analog-to-Digital Converter</td>
<td>IQI</td>
<td>IQ Interference</td>
</tr>
<tr>
<td>UART</td>
<td>Universal Asynchronous Receiver/Transmitter</td>
<td>PRBS</td>
<td>Pseudo-Random Binary Sequence</td>
</tr>
</tbody>
</table>

## LIST OF FIGURES

Fig. 1.1 Two different channel conditions, (a) Multi-drop buses channel for memory interface application; (b) low-cost connector and cable channel for peripheral serial I/Os
Fig. 1.2 Examples of high-speed via, packaging, connector, and cable technologies from Samtec Inc.
Fig. 1.3 Technology trend with functions per chip, data bandwidth, and CPU clock rate
Fig. 1.4 Evolution of serial interface over the several decades
Fig. 1.5 Trend of power consumption of server-side serial interfaces
Fig. 1.6 Conventional comprehensive combination of equalization, (a) TRX architecture; (b) insertion loss, single-bit response and received eye diagram on MDB channel; (c) insertion loss, single-bit response and received eye diagram on low-cost cable channel
Fig. 2.1 Concept of multi-band signaling with PAM-8 and 64-QAM modulators
Fig. 2.2 Concept comparison of multi-band signaling and baseband signaling
Fig. 2.3 Comparison of multi-band signaling and baseband NRZ signaling on MDB channel
Fig. 2.4 Comparison of multi-band signaling and baseband NRZ signaling on low-cost cable channel
Fig. 2.5 Concept of self-equalization effect on AM modulation and demodulation
Fig. 2.6 Concept of self-equalization effect on quadrature modulation and demodulation
Fig. 2.7 Time-domain signal in-phase and quadrature components analysis
Fig. 2.8 Design specification: Self-equalization effect with non-linear channel loss
Fig. 2.9 Traditional source-synchronization or forwarded-clock architecture
Fig. 2.10 On-chip multi-band RF interconnect transceiver in 2009 VLSI
Fig. 2.11 On-board multi-band RF interconnect transceiver in 2012 ISSCC
Fig. 2.12 On-board multi-band RF serial interconnect transceiver with QPSK in 2015 CICC
Fig. 2.13 On-board multi-band RF serial interconnect transceiver with 16-QAM supporting multi-drop bus (MDB) channel in 2016 ISSCC
Fig. 3.1 Non-coherent channel learning scheme
Fig. 3.2 Coherent channel learning scheme
Fig. 3.3 Optimal phase detection for broadband signaling with two-tone test
Fig. 3.4 Multi-drop bus channel modeling and measurement
Fig. 3.5 Low-cost cable channel modeling and measurement
Fig. 3.6 Complete channel with package, transmission line on PCB, and HDMI Connector/Cable modeling
Fig. 3.7 Measurement results of 2 m and 3 m HDMI connector/cable with wire-bonded packages
Fig. 3.8 Cable/connector coupling/crosstalk measurement platform
Fig. 3.9 Cable/connector coupling/crosstalk measurement results
Fig. 3.10 3-D electromagnetic (EM) simulation and model
Fig. 3.11 Comparison of with or without phase offset
Fig. 3.12 The proposed phase calibration scheme with 1-bit ADC
Fig. 4.1 Link budget calculation
Fig. 4.2 Delay mismatch analysis for different bands (a) different physical channel delay mismatch; (b) MDB channel insertion loss and group delay; (c) low-cost cable channel insertion loss and group delay
Fig. 4.3 Design specification: inter-band interference (IBI)
Fig. 4.4 Design specification: IQ interference (IQI)
Fig. 4.5 Design specification: Jitter/Phase noise requirement for 16-QAM
Fig. 5.1 System architecture of proposed cognitive tri-band transmitter
Fig. 5.2 Cognitive algorithm flow
Fig. 6.1 4-bit DAC and double-balanced mixer schematic
Fig. 6.2 Broad-band summation block schematic
Fig. 6.3 Receiver front-end schematic
Fig. 7.1 Measurement platform
Fig. 7.2 Frequency-domain measurement analysis, (a) MDB channel with enabled channel learning; (b) low-cost cable channel with disabled channel learning; (c) low-cost cable channel with enabled channel learning
Fig. 7.3 Time domain measurement results
Fig. 7.4 Die photo and power consumption breakdown

## LIST OF TABLES

Table I: ADC ENOB specification for the different modulation schemes
Table II: Link budget merits for different modulation schemes
Table III: Performance comparison with other state-of-the-art works

## ACKNOWLEDGEMENT

I would like to gratefully and sincerely thank Prof. M.C. Frank Chang for his guidance, understanding, patience and, most importantly, his love during my graduate studies at UCLA. He told me to think big and behave effectively. He always tried his best to offer me the most opportunities and the fewest limitations. His mentorship was paramount in providing a well-rounded experience consistent with my long-term career goals. He encouraged me to grow not only as an engineer but also as an independent thinker. I would also like to thank Professor Danijela Cabric, Professor Jason Cong and Professor Xiaochun Li for their valuable discussions, their assistance and their effort in serving on my committee.

The members of the High-Speed Electronics Lab (HSEL) have contributed immensely to my personal and professional time at UCLA. The group has always been a source of friendship as well as good advice and collaboration. I especially thank the radio frequency interconnect (RFI) team members: Dr. Po-Tsang Huang from National Chiao Tung University (NCTU), Dr. Weihan Cho, Dr. Yilei Li, Dr. Sheau Jiung Lee, Chien-Heng Wong, and Jieqiong Du. We ran like a start-up company with very high efficiency, which not only generated many publications but also produced several product-level prototypes. I would also like to thank Dr. Hao Wu, Dr. Zuow-Zun Chen, Dr. Yu-Hsiu Wu, Dr. Arian Tang from Jet Propulsion Laboratory (JPL), and Dr. Jianhua (Jack) Lu from Archiwave Micro, who provided valuable suggestions and were available for in-depth discussion at any time.

I would like to express my special appreciation and thanks to my colleagues: Dr. Yan Zhao, Dr. Richard Al Hadi, Rod Yanghyo Kim and Dr. Boyu Hu. I thank them not only for encouraging my research but also for allowing me to join their amazing projects. We set several new world records together by developing the first 128-channel high-resolution terahertz CMOS image system and the first 200 Gb/s per inch wave-connect contactless connector and cable for wireline communication system.

I was lucky and thankful to receive the Broadcom Fellow Scholarship. I would like to thank my great mentor Dr. Afshin Momtaz from Broadcom for his valuable advice on academic research and industry career paths. It was my great honor to work on the first-generation WIFI 802.11ac transmitter design with Chungyeol Paul Lee, Dr. Henry Jian, Dr. Ali Parsa and Dr. Rodney Chandler in the Broadcom RFIC WIFI group in the summer of 2013.

I would especially like to thank our lab assistant Janet Lin, who helped me revise my papers and reports word by word and ordered any components or instruments I needed with great efficiency, and the manager of the Center for High Frequency Electronics (CHFE), Minji Zhu, who supported me in building the measurement platform and in chip packaging/assembly. Without their great effort, I would never have been able to get my chip working in time. I would also like to thank the staff members in the Electrical Engineering department, Deona Columbia, Ryo Arreola, and Mandy Smith, for handling the paperwork and arrangements for my final defense.

Finally, I would like to express my greatest gratitude to my family: my parents, my parents-in-law, my lovely wife, and my son, for the unlimited love and support they have given me throughout my life, without which I would never have come close to where I am today.

## VITA

<table>
<thead>
<tr>
<th>Year</th>
<th>Degree</th>
<th>Institution</th>
</tr>
</thead>
<tbody>
<tr>
<td>2009</td>
<td>B.S., Information Science and Technology</td>
<td>Southeast University, China</td>
</tr>
<tr>
<td>2012</td>
<td>M.S., Electrical Engineering</td>
<td>University of California, Los Angeles</td>
</tr>
<tr>
<td>2016</td>
<td>Graduate Student Researcher</td>
<td>University of California, Los Angeles</td>
</tr>
</tbody>
</table>

PUBLICATIONS

Conference Paper


3. Wei-Han Cho, Y. Li, Y. Kim, P. Huang, **Yuan Du**, S. Lee, and M.-C. Frank Chang, "A 5.4-mW 4-Gb/s 5-Band QPSK transceiver for frequency-division multiplexing memory interface", in IEEE CICC Dig. Tech. Papers, Sept. 2015.

Journal Paper


Patent

1. Yuan Du, Y. Li, W. Cho, C. Wong, J. Du, S. Lee, M.-C. Frank Chang, “ARBITRARY CHANNEL RESPONSES LEARNING APPARATUS AND METHODS IN MULTIBAND RF-INTERCONNECT (MRFI) FOR WIRELINE COMMUNICATION,” (provisionally approved)

# CHAPTER 1 INTRODUCTION

## 1.1 Motivation

The data rate of peripheral serial input/output (I/O) for PC and mobile computing platforms continues to scale to meet high-bandwidth applications including high-resolution displays, camera sensors and large-capacity external storage [1]. With ever-increasing data rates, signal and power integrity become more challenging because of various channel loss mechanisms and discontinuities caused by vias, solder balls, packages, routing-wire impedance mismatches, and connector or cable transitions, which together set the upper bound on bandwidth capacity. Examples of such non-idealities are shown in Fig. 1.1(a) and (b) for the multi-drop bus (MDB) channel of a memory interface and for low-cost peripheral serial I/Os with connector and cable, respectively. In a cable-only case, the dielectric and conduction losses exhibit a simple low-pass characteristic, depicted by the dashed curve in Fig. 1.1(b). However, the packages, solder balls, bonding wires, vias, traces, and connectors make the entire channel suffer higher loss at certain frequencies, depicted by the solid curve in Fig. 1.1(b). The phenomenon is more pronounced in low-cost packaging, PCB, cable and connector technologies. To make matters worse, the frequency response varies across packages and PCB designs.

Fig. 1.1 Two different channel conditions, (a) Multi-drop buses channel for memory interface application; (b) low-cost connector and cable channel for peripheral serial I/Os

One obvious and straightforward solution to reduce such effects is to improve the dielectric material quality of the printed circuit board (PCB) substrate [25-27]. As system speeds increase, it is hard to maintain sufficient signal integrity using lower-cost FR-4 epoxy laminate PCB materials. High-performance laminates are available from several vendors, but at a significant cost increase. As an example, RO4003™ PCB material from Rogers Corp. is a ceramic-filled hydrocarbon laminate with woven glass reinforcement. With its low and consistent value, the material has been developed for broadband analog applications through millimeter-wave frequencies and for low-distortion, high-speed digital applications through 25 Gb/s.

Another solution to alleviate signal integrity issues for high-speed serial interfaces is to invest more resources in via, packaging, connector and cable technologies [28-30]. A current trend is to take high-speed signals off the board using twin-ax copper cables. On-board cable jumpers can connect point-to-point or directly to the backplane or I/O interface. The cable backplane option is becoming an increasingly popular way to greatly reduce routing complexity, laminate layer count, and cost while extending communication distance. For example, the high-speed via, packaging, connector, and cable technologies from Samtec Inc. are shown in Fig. 1.2.

![Fig. 1.2 Examples of high-speed via, packaging, connector, and cable technologies from Samtec Inc.](image)

Furthermore, depending on the data rate requirements relative to the available channel bandwidth and the severity of potential noise sources, a comprehensive combination of equalization schemes, such as feed-forward equalization (FFE), continuous-time linear equalization (CTLE) and decision feedback equalization (DFE), is employed at the transmitter (TX) side or the receiver (RX) side [9-15]. While elegant and backed by rigorous mathematical proof and digital signal processing concepts, this approach inevitably increases the overall system complexity and total power consumption.

The main motivation of this dissertation is to introduce multi-band signaling into wireline communication. Channel conditions differ greatly between applications, and no universal solution exists: a link requires co-design of circuits, packages, channels and equalization algorithms, as well as highly accurate electromagnetic modeling. Even with all of these, the serial interface still suffers from process and fabrication variations of the channel. The existing solutions are imperfect. Passive compensation is costly, sensitive and not adaptable, while active equalization is complicated and power-hungry. The trade-off between high performance and the power/cost budget is a hard one, and a low-power, low-cost, high-efficiency and fully adaptable serial interface is in high demand. The rest of this dissertation introduces multi-band signaling, which brings an entirely new dimension to this trade-off.

## 1.2 Conventional SerDes Link Overview

The blue curve in Fig. 1.3 shows technology scaling, and the red curve shows the number of functions per chip, which increases dramatically with technology scaling. More interesting is the bottom plot: the CPU clock rate has not changed much over 15 years, mainly because power and heat dissipation have become severe problems. Similarly, the number of I/O pads per chip has remained relatively constant, but for a different reason: packaging cost. Nowadays, packaging plus testing already account for a larger share than fabrication in the semiconductor industry, especially for high-speed, high-pin-count chips. The result is a large gap between the I/O data rate and the internal data rate, a difference of 15 to 20 times.

As data rates keep increasing, this gap will grow larger. It calls for increasingly powerful functional blocks that serialize the parallel internal data into serial I/O data and deserialize the high-speed I/O data back into relatively low-speed internal data. That is the serial interface. Serial interfaces are everywhere in today’s computing platforms, including the interfaces to the memory system, to the graphics card/GPU or high-definition display, to multiple computing cores or accelerators, and to peripheral devices.
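
The serialize/deserialize function can be illustrated with a minimal behavioral sketch. It is only a software model of the data reordering, not the circuit described in this thesis; the function names, the 8-bit word width and the LSB-first bit order are assumptions chosen for illustration.

```python
def serialize(words, width):
    """Flatten parallel words into a serial bit list, LSB first (assumed order)."""
    return [(w >> i) & 1 for w in words for i in range(width)]


def deserialize(bits, width):
    """Regroup a serial bit list into parallel words of `width` bits each."""
    return [
        sum(b << i for i, b in enumerate(bits[k:k + width]))
        for k in range(0, len(bits), width)
    ]


words = [0xA5, 0x3C]          # two 8-bit internal words (hypothetical data)
bits = serialize(words, 8)    # 16 bits, sent one per unit interval on the lane
recovered = deserialize(bits, 8)
```

In this model the serial lane must run `width` times faster than the internal word clock, which is the rate gap the serial interface has to bridge.
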

Serial interface design has evolved considerably over the years, as Fig. 1.4 shows. In the 1980s, when data rates were around tens of Mb/s, the channel could be modeled as a capacitor, and the transceiver was a CMOS inverter-based driver. In the 1990s and 2000s, data rates reached hundreds of Mb/s to the Gb/s level, and the channel had to be modeled as a transmission line, which has distributed effects. At that time, many parallel interfaces with the multi-drop feature, such as PCI and older generations of DDR, were replaced by point-to-point serial interfaces.

Today, the data rate of mobile/PC interfaces is reaching beyond 10 Gb/s. However, this relies on complicated and power-hungry equalization techniques. Moreover, the transmission-line channel model is no longer accurate enough: all the discontinuities and non-ideal effects, including vias, bumps, bonding wires, pads, traces in packages and connectors, must be modeled carefully. It is difficult to meet both the data rate requirement and the power and cost budget at the same time, which makes this one of the hottest areas in both academia and industry.

Fig. 1.4 Evolution of serial interface over the several decades

In terms of the power consumption of conventional server-side serial interfaces, Fig. 1.5 shows an example from Oracle processors. It suggests that the serial interface power consumption is almost comparable to that of the computing cores. Another example, published by Intel, indicates that the serial interface power will exceed 50% of total CPU power in the very near future as I/O data rates keep rising.

Fig. 1.5 Trend of power consumption of server-side serial interfaces

## 1.3 State-of-the-art Equalization Solutions

Fig. 1.6(a) illustrates the common serial link transmitter and receiver architecture with a comprehensive combination of all the equalization techniques mentioned above: FFE at the TX side, and CTLE and DFE at the RX side.

Fig. 1.6 Conventional comprehensive combination of equalization, (a) TRX architecture; (b) insertion loss, single-bit response and received eye diagram on MDB channel; (c) insertion loss, single-bit response and received eye diagram on low-cost cable channel

The channel condition in Fig. 1.6(b) is a memory controller with two DIMMs per channel. When the controller communicates with DIMM0, the disabled DIMM1 acts as an open-stub transmission line, which creates a strong resonance at a specific frequency that depends on the length of the traces.
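
The dependence of the notch frequency on stub length can be sketched with the textbook quarter-wave relation for an ideal lossless open-circuited stub, which looks like a short at its input at odd multiples of f = v/(4L). The stub length and effective permittivity below are hypothetical values for illustration and are not taken from the measured MDB channel.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def open_stub_notches(length_m, eps_eff, count=3):
    """Notch frequencies (Hz) of an ideal lossless open stub.

    The stub shorts the main line when it is an odd multiple of a quarter
    wavelength long: f_n = (2n + 1) * v / (4 * L), with v = c / sqrt(eps_eff).
    """
    v = C0 / math.sqrt(eps_eff)  # propagation velocity on the trace
    return [(2 * n + 1) * v / (4.0 * length_m) for n in range(count)]


# Hypothetical example: 25 mm stub with eps_eff = 4.0 (FR-4-like, assumed)
notches = open_stub_notches(0.025, 4.0)
print([round(f / 1e9, 2) for f in notches])  # notch frequencies in GHz
```

With these assumed numbers the first notch falls near 1.5 GHz, and halving the stub length would double every notch frequency, which is why the notch locations shift with board layout.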

In the time domain, if a single bit is sent onto the channel, a long tail appears, which causes very severe inter-symbol interference (ISI). From the frequency-domain point of view, if the data bandwidth is lower than the first notch, for example at 2 Gb/s, the effect of the notches is not observed. However, if we try to send more than 10 Gb/s, the eye diagram closes completely due to strong ISI and the bit error rate (BER) becomes unacceptable.

Similarly, the low-cost connector and cable channel contains many discontinuities, such as bumps, vias, traces in the package, traces on the PCB and connector transitions. Any of these in the signal path can create strong resonances, and they are also sensitive to fabrication variations. For this particular low-cost cable, a single transmitted bit spreads over 24 unit intervals (UI) at the receiver input. In this situation, if two or more bits are sent onto the channel consecutively, the receiver cannot tell a one from a zero without equalization.
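
Why a long pulse-response tail closes the eye can be sketched with a standard peak-distortion estimate: for NRZ symbols of ±1, the worst-case vertical eye opening is twice the main cursor minus the sum of the magnitudes of all other cursors. The pulse-response samples below are hypothetical illustrative numbers, not the measured responses of Fig. 1.6.

```python
def worst_case_eye(pulse, main_idx):
    """Peak-distortion worst-case vertical eye opening for NRZ (+/-1) symbols.

    pulse: UI-spaced samples of the single-bit (pulse) response.
    Returns 2 * (main cursor - sum of |all other cursors|); a value <= 0
    means some bit pattern closes the eye completely.
    """
    main = pulse[main_idx]
    isi = sum(abs(h) for i, h in enumerate(pulse) if i != main_idx)
    return 2.0 * (main - isi)


# Hypothetical long-tail response: main cursor 0.30 followed by a 24-UI tail
# that starts at 0.08 and decays by 0.9x per UI (illustrative values only).
long_tail = [0.30] + [0.08 * 0.9 ** k for k in range(24)]
short_tail = [0.30, 0.08, 0.02]

eye_long = worst_case_eye(long_tail, 0)    # negative: eye fully closed
eye_short = worst_case_eye(short_tail, 0)  # positive: eye still open
print(eye_long, eye_short)
```

Each tail sample is small compared with the main cursor, yet the accumulated ISI over 24 UI exceeds it, so the worst-case bit pattern drives the eye opening below zero.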

To make matters worse, the location and depth of these frequency notches are very sensitive to PCB or connector fabrication variations. Equalizing these frequency notches is challenging and power-hungry, as shown later.

Several equalization methods are commonly used: FIR filter equalization at the transmitter side, and continuous-time linear equalization (CTLE) and nonlinear decision feedback equalization (DFE) at the receiver side. Each has its advantages and disadvantages. The practical approach is to combine them to reach the optimal operating point in terms of energy efficiency and maximum data bandwidth.

The system-level study reveals that, if the conventional comprehensive combination of equalization schemes were used, much energy would be wasted at the notch frequencies because of the worst-case notch compensation principle. The single-bit pulse responses of the two channels in Fig. 1.6(b) and (c) both end up with very long tails (18 UIs in Fig. 1.6(b) and 24 UIs in Fig. 1.6(c)) due to high-frequency loss and strong reflection at specific frequency notches. For the two given channels (multi-drop memory bus (MDB) and low-cost cable/connector), even with 3-tap TX FFE, 1-tap CTLE and 18/24-tap DFE, the horizontal and vertical eye openings are still very limited, as shown in Fig. 1.6(b) and (c). Each tap requires high linearity and high dynamic range to achieve this performance. In this study, a vertical opening of less than 100 mV means very limited SNR, and a horizontal opening of less than 20 ps requires a high-performance clock and data recovery (CDR) system to achieve a reasonable bit error rate (BER).

## 1.4 Major Work and Organization of Thesis

This dissertation summarizes and extends the work in [16-17]. It begins by reviewing the motivation of the thesis. Chapter 1.2 summarizes the conventional serializer/de-serializer (SerDes) link and the power consumption issues of modern high-speed serial interfaces. Chapter 1.3 introduces the state-of-the-art equalization solutions and points out their advantages and disadvantages.

The rest of the thesis is organized as follows:

Chapter 2 gives a general review of the multi-band signaling scheme. Chapter 2.1 introduces the concept with simple examples. Chapter 2.2 compares multi-band signaling with conventional baseband signaling, using simulations of both a multi-drop bus memory interface channel and a more general “linear” loss channel. Chapter 2.3 presents the self-equalization effect, which is unique to multi-band signaling and improves signal integrity through the up-conversion and down-conversion operations. Chapter 2.4 introduces the source-synchronous, or forwarded-clock, architecture, which in a multi-band signaling system forwards the clock without increasing the number of channels or I/Os. Finally, Chapter 2.5 summarizes previous work on multi-band signaling serial link transceivers.

Chapter 3 discusses the channel condition learning mechanism, including non-coherent channel learning in Chapter 3.1 and coherent channel learning in Chapter 3.2. Chapter 3.3 describes how to model the channel response reliably. Finally, Chapter 3.4 studies the carrier synchronization implementation, which has to work within hardware and power limitations.

Chapter 4 studies the system-level design specifications and link analysis. Link budget analysis, delay mismatch analysis, inter-band interference (IBI) analysis, inter-channel interference (ICI) analysis, crosstalk analysis, carrier phase noise analysis, and I/Q mismatch analysis are discussed from Chapter 4.1 to Chapter 4.6.

Chapter 5 introduces the system architecture of the multi-band signaling serial interface system, including the transmitter, the receiver, the cognitive controller, and the extra hardware overhead needed for channel condition learning. The details of the cognitive algorithm are studied in Chapter 5.2.

Chapter 6 describes the circuit design of the main building blocks. The digital-to-analog converter (DAC), the broadband summation block, and the receiver front end are discussed in Chapter 6.1, 6.2, and 6.3 respectively.

Chapter 7 introduces how the prototype system is built and tested. Chapter 7.1 describes the measurement platform with instruments and discrete components. Chapter 7.2 and Chapter 7.3 analyze both the frequency-domain measurement results and time-domain measurement results.

Chapter 8 draws conclusions and discusses potential future work.
2.1 Concept Description of Multi-Band Signaling

The fundamental idea of multi-band signaling is the same as that of cable TV systems or wireless orthogonal frequency-division multiplexing (OFDM) systems. However, both cable TV and wireless OFDM are relatively narrow-band systems, while the serial interface is broadband, and the channel conditions of a serial interface are also very different.

Fig. 2.1 shows PAM-8 and 64-QAM as an example. Fifteen parallel data streams, each running at 1 Gb/s, serve as the data source. The PAM-8 modulator takes three of them; its time-domain waveform remains at baseband but has multiple levels. Six streams go to a 64-QAM modulator whose output is modulated onto the RF carrier frequency $f_1$. Similarly, the remaining six are modulated onto another RF carrier frequency $f_2$. All of these waveforms are then summed together, so the frequency domain contains one baseband, one RF band at $f_1$ and another RF band at $f_2$.
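The bit allocation described above can be sketched in a few lines. This is only an illustration: the natural-binary level mapping, the function names, and the example carriers of 3 GHz and 6 GHz are assumptions, and no pulse shaping or filtering is modeled.

```python
import math

def bits_to_level(bits):
    """Map bits (MSB first) to a symmetric odd-integer level, natural binary (assumed)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return 2 * value - (2 ** len(bits) - 1)  # 3 bits -> -7, -5, ..., +7

def map_tri_band(bits15):
    """Split 15 parallel bits into one PAM-8 level and two 64-QAM (I, Q) symbols."""
    if len(bits15) != 15:
        raise ValueError("expected 15 bits: 3 for PAM-8 and 6 for each 64-QAM band")
    pam8 = bits_to_level(bits15[0:3])
    qam_f1 = (bits_to_level(bits15[3:6]), bits_to_level(bits15[6:9]))
    qam_f2 = (bits_to_level(bits15[9:12]), bits_to_level(bits15[12:15]))
    return pam8, qam_f1, qam_f2

def composite_sample(pam8, qam_f1, qam_f2, f1_hz, f2_hz, t_s):
    """Sum the baseband level and the two quadrature-modulated RF bands at time t_s."""
    w1, w2 = 2 * math.pi * f1_hz, 2 * math.pi * f2_hz
    band1 = qam_f1[0] * math.cos(w1 * t_s) - qam_f1[1] * math.sin(w1 * t_s)
    band2 = qam_f2[0] * math.cos(w2 * t_s) - qam_f2[1] * math.sin(w2 * t_s)
    return pam8 + band1 + band2

bits = [1, 1, 1,  0, 0, 0,  1, 1, 1,  1, 0, 0,  0, 1, 1]
pam8, qam_f1, qam_f2 = map_tri_band(bits)
sample = composite_sample(pam8, qam_f1, qam_f2, 3e9, 6e9, 0.0)
```

With each band running at 1 GBaud, the three bands together carry 3 + 6 + 6 = 15 bits per symbol period, that is 15 Gb/s in aggregate.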
Fig. 2.2 Concept comparison of multi-band signaling and baseband signaling

The merits of frequency-domain multi-band signaling over time-domain baseband are emphasized in Fig. 2.2:

(1) Multi-band signaling offers simultaneous and orthogonal communication channels in the frequency domain;
(2) It adapts easily to channel frequency notches through a careful choice of carrier frequencies;
(3) Multi-band signaling relaxes the equalization requirement because of the self-equalization effect.
2.2 Comparison with Conventional Base-Band Signaling

To understand multi-band signaling more intuitively, we compare simulated multi-band results with those of conventional baseband-only NRZ signaling, assuming the same data rate requirement of 15 Gb/s over the MDB channel. As shown in Fig. 2.3, the energy of the baseband NRZ signal is distributed relatively uniformly over the frequency band. When this uniformly distributed signal passes through the MDB channel with its multiple frequency notches, the signal is distorted. Severe reflections occur at the notch frequencies, leading to strong ISI and complete closure of the data eye. Complicated equalization with large power consumption is necessary to re-open the data eye. In the multi-band signaling case, on the other hand, the energy distribution is deliberately re-shaped according to the channel profile. After the demodulator, the data eyes are clearly open under the same data rate and channel condition assumptions.

For the second simulation, shown in Fig. 2.4, the multi-drop memory interface channel is replaced by the more common ‘linear’ loss channel. (The quotation marks indicate that the loss is not truly linear, only more linear than that of the multi-drop channel.) This channel loss profile is due to ESD, pad loading, the skin effect of the metal traces and the dielectric loss of the substrate material. The data eye closes completely when the loss exceeds 12 dB at the Nyquist frequency. Complicated and power-hungry equalization is necessary to open the data eye.

Another factor worth noting is that the time scales for the multi-band and baseband-only cases are very different. In multi-band signaling, the entire bit stream is divided into multiple sub-bands, each of which operates at a much lower speed than the total bit rate. As a result, the design complexity and power consumption of the clock and data recovery (CDR) system are relaxed.
Fig. 2.3 Comparison of multi-band signaling and baseband NRZ signaling on MDB channel (panels: multi-band signaling of MDB with baseband, RF-Band₁ and RF-Band₂; baseband signaling of MDB)
Fig. 2.4 Comparison of multi-band signaling and baseband NRZ signaling on low-cost cable channel (panels: multi-band signaling of low-cost cable channel with baseband, RF-Band₁ and RF-Band₂; baseband signaling of low-cost cable channel)
2.3 Self-Equalization Effect

Multi-band signaling offers an additional benefit in the form of the self-equalization effect. The related theory was first detailed in our previous work [18]. Taking the basic amplitude modulation (AM) signal as an example, the signal after up-conversion is double-sideband, and the two sidebands contain duplicated information. The channel loss of the upper sideband is typically higher than that of the lower sideband. After down-conversion, the lower sideband compensates for the higher loss of the upper sideband. If the loss is symmetrical about the carrier frequency, the reconstructed signal is evenly attenuated over the broadband frequencies, as shown in Fig. 2.5. A system simulation is conducted to verify this self-equalization effect and to compare it with conventional baseband NRZ signaling, as shown in Fig. 2.4, where the MDB channel is replaced by a frequently used “linear” loss channel.

![Concept of self-equalization effect on AM modulation and demodulation](image)

*W. H. Cho, 2016 ISSCC*
Fig. 2.5 shows only the AM case, which is equivalent to the in-phase component alone. Fig. 2.6 illustrates the complete picture with quadrature modulation. The in-phase terms (I path) are self-equalized well. The quadrature term, however, is not fully canceled; the residue is the IQ interference (and similarly for the Q path at the bottom). The interference occurs mainly during data transitions rather than during the stable data period. As a result, the eye quality is not degraded much by this quadrature residue, as shown in Fig. 2.7. Fig. 2.8 shows the limitation of the self-equalization effect on a non-linear loss channel [18].
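A small numerical sketch can make the AM case concrete. It assumes an idealized zero-phase channel whose amplitude gain falls linearly with frequency around the carrier, and ideal coherent down-conversion; under those assumptions a tone at offset f from the carrier is recovered with gain 0.5·[a(f_c + f) + a(f_c − f)]. The function names and the numeric values are hypothetical.

```python
def sideband_gain(freq_hz, carrier_hz, gain_at_carrier, slope_per_hz):
    """Hypothetical channel: amplitude gain drops linearly with frequency."""
    return gain_at_carrier - slope_per_hz * (freq_hz - carrier_hz)

def demodulated_gain(offset_hz, carrier_hz, gain_at_carrier, slope_per_hz):
    """Baseband gain after coherent down-conversion of a double-sideband AM tone."""
    upper = sideband_gain(carrier_hz + offset_hz, carrier_hz, gain_at_carrier, slope_per_hz)
    lower = sideband_gain(carrier_hz - offset_hz, carrier_hz, gain_at_carrier, slope_per_hz)
    return 0.5 * (upper + lower)  # the two sidebands average out the slope

carrier = 3e9           # assumed carrier frequency
a0, slope = 0.5, 2e-10  # assumed gain at the carrier and gain slope per Hz
offsets = [0.0, 0.1e9, 0.25e9, 0.5e9]
upper_only = [sideband_gain(carrier + f, carrier, a0, slope) for f in offsets]
recovered = [demodulated_gain(f, carrier, a0, slope) for f in offsets]
```

In this sketch `upper_only` falls from 0.5 to 0.4 across the band, while `recovered` stays at 0.5 for every offset, which is the flat attenuation described above. A loss profile that is not symmetrical about the carrier would leave a residual tilt.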
Fig. 2.7 Time-domain signal in-phase and quadrature components analysis

Fig. 2.8 Design specification: Self-equalization effect with non-linear channel loss

*W. H. Cho, 2016 ISSCC*
2.4 Source-Synchronization/ Forwarded-Clock Architecture

A traditional source-synchronized or forwarded-clock system is shown in Fig. 2.9, which reduces the power and complexity of clock generation and data recovery circuits, at the cost of a dedicated physical channel with clock I/O pins for clock forwarding. In contrast, the multi-band signaling benefits from source-synchronized or forwarded-clock communication without paying the cost of the extra clock I/O pins and channel, since the baseband path in multi-band architecture can be configured for clock forwarding purpose, thus eliminating the need for dedicated extra IO pin and channel.

In summary, multi-band signaling can enable simultaneous and orthogonal communication channels in the frequency domain. It offers options to avoid channel frequency notches by carefully allocating carrier frequencies. Multi-band signaling also works well with forwarded-clock schemes without even increasing the number of channels and I/O pins.

Fig. 2.9 Traditional source-synchronization or forwarded-clock architecture
2.5 Summary of Previous Works on Multi-Band Signaling Serial Link Transceivers

This chapter reviews previous work on multi-band signaling for serial interfaces. The first multi-band signaling transceiver was published at the 2009 VLSI Symposium by Dr. Tam [19], as shown in Fig. 2.10. It used 30 GHz and 50 GHz carrier frequencies and achieved a 10 Gb/s aggregate data rate. However, the channel was only a 5 mm on-chip transmission line for point-to-point communication, and the modulation scheme was non-coherent on-off keying.

![On-chip multi-band RF interconnect transceiver in 2009 VLSI](image)

Another multi-band signaling serial interface system was demonstrated by Dr. Kim at ISSCC 2012 [20], as shown in Fig. 2.11. It used an 18 GHz RF carrier frequency and achieved an 8 Gb/s aggregate data rate. The channel condition was much more challenging, since it included 5 cm PCB traces on FR-4 material. However, it was still point-to-point communication with on-off keying modulation.
Fig. 2.11 On-board multi-band RF interconnect transceiver in 2012 ISSCC

A more advanced multi-band RF interconnect transceiver was realized in [21], as shown in Fig. 2.12. It showed that, to extend the communication distance and make the serial interface more industry-friendly, the carrier frequencies have to be reduced from the millimeter-wave range to below 10 GHz, because at millimeter-wave frequencies high energy efficiency is difficult to achieve due to skin-effect metallic loss and dielectric loss. This implementation used five carrier frequencies and achieved a 4 Gb/s aggregate data rate. The channel was a 2-inch transmission line on an FR-4 PCB for point-to-point communication. It was the first demonstration of coherent modulation (QPSK) in a multi-band signaling serial interface transceiver.
Fig. 2.12 On-board multi-band RF serial interconnect transceiver with QPSK in 2015 CICC

The updated multi-band RF serial interconnect transceiver uses 3 GHz and 6 GHz carrier frequencies to achieve a 10 Gb/s aggregate data rate per differential pair [22], as shown in Fig. 2.13. The channel consists of 2-inch copper traces on an FR-4 PCB. More significantly, both point-to-point communication and multi-drop channel communication are demonstrated with a 16-QAM coherent modulation scheme.
Fig. 2.13 On-board multi-band RF serial interconnect transceiver with 16-QAM supporting multi-drop bus (MDB) channel in 2016 ISSCC

In summary, first, all the previous works assume a fixed channel condition, either on-chip/on-board point-to-point transmission lines or a multi-drop bus memory interface. The question is whether the system can be made adaptable to different channel conditions or fabrication variations. To achieve this, a channel learning and adaptive algorithm is necessary. The details will be discussed in Chapter 5.2.

Second, all the previous works use fixed carrier frequencies and a fixed modulation order. The second question is whether both can be made reconfigurable, which would give the user and the serial interface system designer much more flexibility to maintain high performance and high energy efficiency under different channel conditions. The possibilities will be analyzed in Chapter 5.1.

Finally, measuring the condition of a serial communication channel has so far required a network analyzer, which is a large and expensive instrument. The question is whether the channel can be measured and calibrated on-chip in real time without a network analyzer. The potential solutions will be studied in Chapter 3.
CHAPTER 3  CHANNEL LEARNING MECHANISM

The channel condition is the key to every serial interface system design. Impedance matching, the equalization scheme, and coefficient training all depend on the specific channel condition. Depending on the channel condition, the optimal carrier frequency allocation, modulation scheme and data bandwidth allocation in each band can be very different. As a result, a universal multi-band signaling transceiver must adapt to various channel conditions. The first step is to learn the channel condition.

3.1 Non-coherent Channel Learning

As shown in Fig. 3.1, non-coherent channel learning is very straightforward. The TX side sweeps the frequency over the bands of interest using an external oscillator, which is controlled by the cognitive controller. At the RX side, only one power sensor and a low-speed ADC are needed to extract useful channel information, such as notch frequencies, bandwidth, and frequency-dependent channel loss. In practice, another pair of power sensor and low-speed ADC is necessary at the TX side to calibrate the frequency dependency of the TX output power level. This non-coherent detection extracts magnitude information only and provides no phase information. The channel learning process runs only once, at the beginning of data transfer. As long as the channel conditions remain stable during operation, there is no need for additional channel learning, and therefore the power overhead during data transfer can be ignored.
With the non-coherent channel learning scheme, only the magnitude response of the channel is measured. The impedance matching condition and the in-band group delay, which have a major effect on the signal integrity of the received signal, cannot be learned.
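As an illustration of the feature extraction step, the sketch below takes a swept list of frequencies and the detected power at each one and reports the notch frequencies and the relative loss. The function name, the 10 dB notch threshold and the sample readings are hypothetical; the TX-side power calibration mentioned above is not modeled.

```python
def extract_channel_features(freqs_ghz, power_db, notch_depth_db=10.0):
    """Return (notch frequencies, loss per point relative to the strongest reading)."""
    reference = max(power_db)
    loss_db = [reference - p for p in power_db]
    notches = []
    for i in range(1, len(power_db) - 1):
        # A notch is a local minimum that is also deep enough relative to the reference.
        is_local_min = power_db[i] < power_db[i - 1] and power_db[i] <= power_db[i + 1]
        if is_local_min and loss_db[i] >= notch_depth_db:
            notches.append(freqs_ghz[i])
    return notches, loss_db

# Made-up detector readings for a sweep from 1 GHz to 9 GHz.
freqs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
power = [-20.0, -24.0, -22.0, -45.0, -25.0, -27.0, -50.0, -30.0, -33.0]
notches, loss = extract_channel_features(freqs, power)
```

For these readings the shallow dip at 2 GHz is ignored and the deep dips at 4 GHz and 7 GHz are reported as notches, so a controller would place its carriers away from them.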
3.2 Coherent Channel Learning

Coherent channel learning provides more information than non-coherent channel learning: it can detect the non-linear phase response of a practical channel. A non-linear phase response is the most unwanted channel effect because it results in group delay variation, which leads to dispersion or inter-symbol interference and degrades signal integrity and the bit error rate (BER).

Most of the non-linear phase response is due to non-ideal impedance matching. With non-ideal matching, the receiver input sees an incident signal from the TX plus reflected signals, which may be reflected multiple times between TX and RX; the RX sees the superposition of all of them. The non-linear phase response also depends on the channel length because a standing wave forms on the channel: if the length changes slightly due to process variations, the standing-wave pattern, and with it the non-linear phase response, changes substantially. The transceiver therefore needs the ability to calibrate it out. To make matters worse, the response also depends on the data rate, so a phase calibration performed at a single frequency may not be optimal for broadband data transfer. Fig. 3.2 shows the system architecture for coherent channel learning.
\[ \theta_1 = \arg\left[ e^{-j\theta_0}(1 + \gamma) \right] = \arg\left[ e^{-j\theta_0}(1 + |\gamma|e^{-j\theta}) \right] \]

Fig. 3.2 Coherent channel learning scheme
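The phase expression above can be evaluated directly. In the sketch below, θ₀ is read as the nominal channel phase, |γ| as the magnitude of the reflected component relative to the incident one, and θ as its extra phase; this reading of the symbols and the numeric values are assumptions.

```python
import cmath
import math

def received_phase(theta0_rad, gamma_mag, gamma_phase_rad):
    """theta_1 = arg[ e^{-j theta_0} * (1 + |gamma| e^{-j theta}) ]."""
    incident = cmath.exp(-1j * theta0_rad)
    reflection_term = 1 + gamma_mag * cmath.exp(-1j * gamma_phase_rad)
    return cmath.phase(incident * reflection_term)

matched = received_phase(0.5, 0.0, 0.0)            # no reflection
mismatched = received_phase(0.5, 0.3, math.pi / 2)  # 30% reflection, 90 degrees late
extra_phase = mismatched - matched
```

With perfect matching the received phase is simply −θ₀. With the assumed reflection the phase shifts by a further −atan(0.3), about −0.29 rad, and since θ grows with frequency and channel length the shift is not a linear function of frequency.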
To obtain a phase code optimized for broadband data transfer, a two-tone test method is proposed. Rather than sending a constant value on the I-path, a periodic clock signal is sent, whose frequency is the same as the targeted data rate. In this way the calibrated phase code is better optimized for broadband data transfer than with the previous single-tone calibration, because the method extracts more broadband channel phase information.
3.3 Channel Response Modeling

The channel measurement platform includes a network analyzer, a probe station and V-band cables, which support measurements up to 60 GHz, as shown in Fig. 3.3. All the connectors, bonding wires, PCB traces and on-chip traces are modeled in the HFSS EM simulator and measured with the measurement platform. Fig. 3.4 - 3.10 show the measurement and modeling results for the multi-drop bus channel and the low-cost cable channel. The measured S-parameter files are used for transceiver design simulation, impedance matching estimation, and in-band group delay analysis.

Fig. 3.3 Channel measurement platform with network analyzer and probe station
Fig. 3.4 Multi-drop bus channel modeling and measurement

Fig. 3.5 Low-cost cable channel modeling and measurement
Fig. 3.6 Complete channel with package, transmission line on PCB, and HDMI connector/cable modeling
Fig. 3.7 Measurement results of 2 m and 3 m HDMI connector/cable with wire-bonded packages
Fig. 3.8 Cable/connector coupling/crosstalk measurement platform

Fig. 3.9 Cable/connector coupling/crosstalk measurement results (FEXT panels: D2 to D1, D2 to D0 and D2 to CLK coupling; NEXT panels: D2 to D1 and D2 to D0 coupling)
Fig. 3.10 3-D electromagnetic (EM) simulation and model
3.4 Carrier Synchronization Implementation with the Limited Hardware Resources

The phase recovery requirements of wireless systems and serial interfaces are very different. In a wireless system, phase recovery must run in real time and track fast-changing channel characteristics, which is handled by the baseband DSP. In a serial interface system, the channel condition is almost constant, so a simple calibration is sufficient and no dynamic tracking is needed.

![Phase Offset Comparison](image)

- **wo/ Phase Offset**
  - Bit error rate (BER) $\ll 10^{-12}$
- **w/ $15^\circ$ Phase Offset**
  - BER $> 10^{-1}$

Fig. 3.11 Comparison of with or without phase offset

Phase calibration is critical to the serial interface system. Taking 16-QAM as an example, a phase offset rotates the constellation; with a 15-degree rotation, the data eye is completely closed and the BER is worse than $10^{-1}$. Phase recovery or calibration is required to achieve reasonable eye quality and BER, as shown in Fig. 3.11.
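The effect of a static rotation on the 16-QAM decision margin can be estimated with a short noise-free sketch. It assumes an ideal square constellation with levels ±1 and ±3 and slicer thresholds at 0 and ±2; the function names are hypothetical, and noise, ISI and gain errors, which consume the remaining margin in a real link, are not modeled.

```python
import math

LEVELS = (-3, -1, 1, 3)

def axis_margin(value, ideal):
    """Distance from value to the nearest slicer threshold of ideal's bin (negative if outside)."""
    lower = ideal - 1 if ideal > -3 else -math.inf
    upper = ideal + 1 if ideal < 3 else math.inf
    return min(value - lower, upper - value)

def worst_case_margin(rotation_deg):
    """Smallest noise-free decision margin over all 16 points after rotation."""
    phi = math.radians(rotation_deg)
    worst = math.inf
    for i in LEVELS:
        for q in LEVELS:
            rotated_i = i * math.cos(phi) - q * math.sin(phi)
            rotated_q = i * math.sin(phi) + q * math.cos(phi)
            worst = min(worst, axis_margin(rotated_i, i), axis_margin(rotated_q, q))
    return worst

margin_ideal = worst_case_margin(0.0)
margin_15deg = worst_case_margin(15.0)
margin_20deg = worst_case_margin(20.0)
```

In this idealized model the worst-case margin drops from 1.0 with no rotation to about 0.12 at 15 degrees, and becomes negative (a guaranteed decision error on the corner points) by 20 degrees, so very little noise is needed to close the eye at a 15-degree offset.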

If phase calibration were realized in DSP, as in traditional wireless systems, the system would need a high-resolution ADC, which is probably beyond the power budget of most serial interface systems. The ADC resolution requirements for data capture and for phase calibration are very different. Taking 16-QAM as an example, only a 2-bit ADC is required for data capture, whereas an 8-bit ADC is necessary for phase calibration. As calculated in Table I, 256-QAM needs an ADC with 12-bit ENOB running at multi-GHz rates. A state-of-the-art ADC consumes more than 5 watts to meet this specification, which is too much for most serial interface systems.

<table>
<thead>
<tr>
<th>Mod. Scheme</th>
<th>RMS Jitter Spec (°) @ BER 10^{-12}</th>
<th>RMS Jitter Spec (ps) @ 6GHz</th>
<th>ADC ENOB Spec</th>
</tr>
</thead>
<tbody>
<tr>
<td>QPSK</td>
<td>6.4</td>
<td>2.96</td>
<td>4</td>
</tr>
<tr>
<td>QAM16</td>
<td>2.2</td>
<td>1.02</td>
<td>8</td>
</tr>
<tr>
<td>QAM64</td>
<td>1.1</td>
<td>0.51</td>
<td>10</td>
</tr>
<tr>
<td>QAM256</td>
<td>0.4</td>
<td>0.19</td>
<td>12</td>
</tr>
</tbody>
</table>

Table I: ADC ENOB specification for the different modulation schemes
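The two jitter columns of Table I are related by the carrier period: a phase error in degrees corresponds to that fraction of one 6 GHz cycle. A minimal sketch of the conversion, with a hypothetical function name:

```python
def phase_to_time_ps(phase_deg, carrier_hz):
    """Convert an RMS phase error in degrees to RMS time jitter in picoseconds."""
    period_ps = 1e12 / carrier_hz
    return phase_deg / 360.0 * period_ps

# Phase specifications taken from Table I, evaluated at the 6 GHz carrier.
jitter_ps = {name: round(phase_to_time_ps(deg, 6e9), 2)
             for name, deg in [("QPSK", 6.4), ("QAM16", 2.2), ("QAM64", 1.1), ("QAM256", 0.4)]}
```

The results, 2.96 ps, 1.02 ps, 0.51 ps and 0.19 ps, match the third column of the table.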


Fig. 3.12 The proposed phase calibration scheme with 1-bit ADC

The proposed phase calibration scheme is shown in Fig. 3.12 [23]. Before data transfer, the lower Q-path is turned off, and a constant input is sent on the upper I-path. From equations (1) and (2) in Fig. 3.6, the I-path output and the Q-path output at the RX side are both sinusoidal signals with phase error $\Delta \theta$. $\Delta \theta$ is the phase delay due to the channel; it is unknown but constant for a serial interface, and it depends on the channel length, the substrate dielectric and the channel dimensions.

\[
\begin{aligned}
I\text{-path: } & \left[-(1 + \Delta) \sin \omega t\right] \cdot \cos(\omega t + \theta) = 0.5(1 + \Delta) \sin \theta - 0.5(1 + \Delta) \sin(2\omega t + \theta) \\
& \xrightarrow{\text{LPF with gain of 2}} (1 + \Delta) \sin \theta \\
Q\text{-path: } & \left[-(1 + \Delta) \sin \omega t\right] \cdot \sin(\omega t + \theta) = -0.5(1 + \Delta) \cos \theta + 0.5(1 + \Delta) \cos(2\omega t + \theta) \\
& \xrightarrow{\text{LPF with gain of 2}} -(1 + \Delta) \cos \theta
\end{aligned}
\quad (1)
\]

The RX sweeps $\theta$ to calibrate $\Delta \theta$ out. When $\theta$ is equal to $\Delta \theta$, the I-path output is 1 and the Q-path output is 0. The RX then saves this phase code for data transfer; the phase code rotates the constellation back. The proposed approach needs only a 1-bit ADC because it only has to detect the zero-crossing point, which indicates the optimal phase code. Consequently, a high-resolution ADC and complicated baseband DSP are avoided.

\[
\begin{aligned}
I\text{-path: } & \left[-(1 + \Delta) \sin \omega t\right] \cdot \cos(\omega t + \theta) = 0.5(1 + \Delta) \sin \theta - 0.5(1 + \Delta) \sin(2\omega t + \theta) \\
& \xrightarrow{\text{LPF with gain of 2}} (1 + \Delta) \sin \theta \\
Q\text{-path: } & \left[-(1 + \Delta) \sin \omega t\right] \cdot \sin(\omega t + \theta) = -0.5(1 + \Delta) \cos \theta + 0.5(1 + \Delta) \cos(2\omega t + \theta) \\
& \xrightarrow{\text{LPF with gain of 2}} -(1 + \Delta) \cos \theta
\end{aligned}
\quad (2)
\]

Equations (3) and (4) show that phase calibration still works well in the presence of IQ imbalance, because the phase offset error is decoupled from the IQ mismatch. The zero-crossing point stays where it is: IQ imbalance only changes the slope around the zero crossing, not its location. As long as the comparator in the ADC front end is sensitive enough, a reasonably accurate phase code can be obtained.
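The zero-crossing search can be sketched as a sweep of the phase code with a 1-bit comparator on the low-pass-filtered output. The model below assumes the filtered output has the form (1 + Δ)·sin(θ) with θ equal to the difference between the swept phase and the unknown channel phase; that relation, the 0.4-degree step, the function names and the 37.3-degree channel phase are assumptions made for illustration.

```python
import math

def comparator_bit(code_deg, channel_phase_deg, gain_imbalance=0.0):
    """1-bit ADC: sign of the filtered mixer output (1 + Delta) * sin(theta)."""
    theta = math.radians(code_deg - channel_phase_deg)  # assumed residual phase
    return 1 if (1.0 + gain_imbalance) * math.sin(theta) >= 0.0 else 0

def calibrate_phase(channel_phase_deg, gain_imbalance=0.0, step_deg=0.4):
    """Sweep the phase code and return the code at the rising zero crossing."""
    n_steps = round(360.0 / step_deg)
    previous = comparator_bit((n_steps - 1) * step_deg, channel_phase_deg, gain_imbalance)
    for k in range(n_steps):
        current = comparator_bit(k * step_deg, channel_phase_deg, gain_imbalance)
        if previous == 0 and current == 1:
            return k * step_deg  # comparator output flips from 0 to 1 here
        previous = current
    return None

code_balanced = calibrate_phase(37.3)
code_imbalanced = calibrate_phase(37.3, gain_imbalance=0.2)
```

Both calls return the same code, within one step of the assumed channel phase. The gain imbalance scales the amplitude but leaves the sign, and therefore the detected crossing, unchanged.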
CHAPTER 4 SYSTEM-LEVEL DESIGN SPECIFICATIONS AND LINK ANALYSIS

4.1 Link Budget Analysis

A top-level link budget analysis is the first step of wireless radio system design. It is also the first step of the cognitive and adaptive channel learning algorithm.

Fig. 4.1 Link budget calculation

As shown in Fig. 4.1, starting from the transmitter output power, the signal passes through the frequency-dependent lossy channel. When arriving at RX input, the received signal power needs to be higher than the RX sensitivity, which is defined in Equation (5) in dBm [24].

\[ P_{RX, sen} = -174 dBm/Hz + NF + 10\log B + SNR_{required} \]  \hspace{1cm} (5)

where -174 dBm/Hz is the thermal noise floor at room temperature, NF is the RX noise figure in dB, B is the signal bandwidth in Hz and \( SNR_{required} \) is the required signal-to-noise ratio in dB for the chosen modulation scheme.
With the detected channel loss information, the cognitive controller can set the TX output power level according to Equations (6) and (7) by digitally tuning the unit current source in the DAC.

\[ P_{TX} = P_{RX, sen} + L_{CH} + Margin \quad (6) \]
\[ P_{TX} = -174 dBm/Hz + NF + 10 \log B + SNR_{required} + L_{CH} + Margin \quad (7) \]

where \( P_{TX} \) is the TX output power in dBm, \( L_{CH} \) is the channel loss in dB at the frequency of interest and Margin is the link budget margin in dB. Table II summarizes the required SNRs, bits per symbol, data rates and required error vector magnitudes (EVMs, normalized to the average signal power) for QPSK, 16-QAM, 64-QAM and 256-QAM to achieve \( 10^{-12} \) BER.

<table>
<thead>
<tr>
<th>Mod. Scheme</th>
<th>QPSK</th>
<th>QAM16</th>
<th>QAM64</th>
<th>QAM256</th>
</tr>
</thead>
<tbody>
<tr>
<td>Required SNR @ BER &lt; 10^{-12}</td>
<td>17 dB</td>
<td>24 dB</td>
<td>30 dB</td>
<td>36 dB</td>
</tr>
<tr>
<td>Bits per Symbol</td>
<td>2</td>
<td>4</td>
<td>6</td>
<td>8</td>
</tr>
<tr>
<td>Channel BW</td>
<td>1 GHz</td>
<td>1 GHz</td>
<td>1 GHz</td>
<td>1 GHz</td>
</tr>
<tr>
<td>Data Rate</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Required EVM (Norm. to Avg. Power)</td>
<td>-3 dB</td>
<td>-13.3 dB</td>
<td>-21.3 dB</td>
<td>-29.4 dB</td>
</tr>
</tbody>
</table>

Table II: Link budget merits for different modulation schemes
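Equations (5) to (7) translate directly into a small calculation. The sketch below uses the required SNR values quoted for each modulation scheme; the noise figure, channel loss and margin in the example calls are hypothetical inputs, and the function names are illustrative.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0
REQUIRED_SNR_DB = {"QPSK": 17.0, "QAM16": 24.0, "QAM64": 30.0, "QAM256": 36.0}

def rx_sensitivity_dbm(noise_figure_db, bandwidth_hz, modulation):
    """Equation (5): thermal floor + NF + 10*log10(B) + required SNR."""
    return (THERMAL_NOISE_DBM_PER_HZ + noise_figure_db
            + 10.0 * math.log10(bandwidth_hz) + REQUIRED_SNR_DB[modulation])

def tx_power_dbm(noise_figure_db, bandwidth_hz, modulation, channel_loss_db, margin_db):
    """Equations (6) and (7): RX sensitivity plus channel loss plus margin."""
    sensitivity = rx_sensitivity_dbm(noise_figure_db, bandwidth_hz, modulation)
    return sensitivity + channel_loss_db + margin_db

# Assumed example: 10 dB noise figure, 1 GHz bandwidth, 15 dB loss, 3 dB margin.
sens_qpsk = rx_sensitivity_dbm(10.0, 1e9, "QPSK")
ptx_qam16 = tx_power_dbm(10.0, 1e9, "QAM16", 15.0, 3.0)
```

For these assumed inputs the QPSK sensitivity is −57 dBm and the 16-QAM transmit power is −32 dBm. Each step up in modulation order raises the required transmit power by the difference in required SNR.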

The link budget calculation is based on the bit error rate (BER) requirement. Fig. 4.1 shows the BER versus SNR curves for various modulation schemes, including QPSK, 16-QAM, 64-QAM, and 256-QAM. The horizontal line marks the signal-to-noise ratio (SNR) required to achieve \( 10^{-12} \) BER: 17 dB for QPSK, 24 dB for 16-QAM, 30 dB for 64-QAM, and 36 dB for 256-QAM. Assuming a transmitter output power of -20 dBm, the received power at the receiver input after channel loss plus some margin is -38 dBm. The SNR requirement of the chosen modulation, the receiver noise figure, the integration bandwidth and the thermal noise floor together determine the required RX sensitivity or noise figure. The link budget changes with the channel condition and the modulation scheme chosen. Taking 256-QAM as an example, it requires 36 dB SNR and -74 dBm receiver sensitivity.
4.2 Delay Mismatch Analysis

There are two different types of delay mismatch among channels: (1) delay mismatch between physical channels and (2) delay mismatch between multi-bands.

The physical channel delay mismatch (between different differential channels on the cable or PCB) is caused by channel design and fabrication variations, as shown in Fig. 4.2 (a). The proposed multi-band signaling has several advantages over conventional schemes when dealing with this type of delay mismatch. The forwarded clock is embedded in the baseband in the frequency domain and travels with the data stream on the same physical channel. Because each physical channel has its own forwarded sampling clock, the forwarded clock tracks the delay mismatch between different channels.

For delay mismatch between frequency bands (within the same differential traces), a more careful group delay analysis over all the bands used in multi-band signaling is necessary. This type of delay mismatch comes mainly from the channel condition and the impedance matching quality. Because of the relatively low symbol rate, the group delay variation of the transmitter on-chip circuits can be ignored. Taking the multi-drop bus (MDB) channel as an example, as shown in Fig. 4.2 (b), the worst-case group delay variation occurs at the notch frequencies. If these notch frequency bands were used for data transmission, the eye diagram would be degraded not only by the large loss in the magnitude response but also by the large in-band group delay variation in the phase response. In contrast, the group delay is within +/- 100 ps around the baseband, 3 GHz, and 6 GHz bands. To achieve the aggregate 16 Gb/s data rate, the symbol rate within one sub-band is only 1 GBaud, so the horizontal eye period is 1 ns, which is ten times the worst-case in-band group delay variation. The situation is similar for the other channel condition, the low-cost cable channel shown in Fig. 4.2 (c). Thus, no additional delay tuning function is required in the proposed multi-band signaling architecture, although it might become necessary if the symbol rate is increased further.

![Diagram showing delay mismatch analysis](image)

Fig. 4.2 Delay mismatch analysis for different bands (a) different physical channel delay mismatch; (b) MDB channel insertion loss and group delay; (c) low-cost cable channel insertion loss and group delay
4.3 Inter-Band Interference (IBI) Analysis

Inter-band interference (IBI) is a critical specification for any multi-band system. For example, in Fig. 4.3, the 3 GHz band is the aggressor, and the 6 GHz band is the victim. The victim band is located at the 2nd-order harmonic of the aggressor, which is suppressed by differential signaling. In-band IBI is created by the side lobes of the aggressor and can only be reduced by pulse shaping or filtering after the DAC at the TX side. An in-band IBI of 18 dB is present if there is no pulse shaping or filtering block after the DAC. In-band IBI can be improved to around 40 dB by simple RC low-pass filtering. Apart from in-band IBI, all other interference is considered out-of-band IBI, which can be rejected by the receiver-side low-pass filter.

![Normalized PSD graph](image)

Fig. 4.3 Design specification: inter-band interference (IBI)
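The side-lobe mechanism behind in-band IBI can be illustrated with the spectrum of an unshaped rectangular pulse, whose normalized power spectral density is sinc²(f/R_s). The sketch below assumes a 1 GBaud aggressor at 3 GHz and a victim band spanning ±0.5 GHz around 6 GHz; the victim bandwidth, the function names and the rectangular-pulse model are assumptions and are not taken from the simulation setup behind Fig. 4.3.

```python
import math

def rect_pulse_psd(offset_hz, symbol_rate_hz):
    """Normalized PSD of a rectangular (NRZ) pulse: sinc^2(offset / symbol rate)."""
    x = math.pi * offset_hz / symbol_rate_hz
    if x == 0.0:
        return 1.0
    return (math.sin(x) / x) ** 2

def worst_sidelobe_db(aggressor_hz, victim_hz, symbol_rate_hz, half_bw_hz, points=101):
    """Largest aggressor PSD inside the victim band, in dB relative to the main-lobe peak."""
    worst = 0.0
    for k in range(points):
        f = victim_hz - half_bw_hz + 2.0 * half_bw_hz * k / (points - 1)
        worst = max(worst, rect_pulse_psd(f - aggressor_hz, symbol_rate_hz))
    return 10.0 * math.log10(worst)

level_db = worst_sidelobe_db(3e9, 6e9, 1e9, 0.5e9)
```

Under these assumptions the strongest aggressor side lobe inside the victim band sits roughly 18 dB below the aggressor's main-lobe peak, which shows why some filtering after the DAC is needed before higher-order modulation can be used.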
4.4 IQ Interference Analysis (IQI)

Another important interference source is IQ interference. It is mainly caused by I/Q phase imbalance; I/Q magnitude imbalance only causes different SNRs for the I and Q paths and does not contribute to IQ interference. A 32-bit phase interpolator with 0.4-degree accuracy is used, which is equivalent to an integrated RMS jitter of 200 fs at 6 GHz, as shown in Fig. 4.4. This specification is set by 256-QAM, the most demanding modulation scheme.

Fig. 4.4 Design specification: IQ interference (IQI)
4.5 Carrier Phase Noise Analysis

Jitter, or phase noise, is a key factor that significantly affects the received BER. As the equations in Fig. 4.5 show, the BER is calculated from the distribution of demodulated signals on the received I/Q constellation. Taking one received signal point as an example, an error occurs when the received phasor sample falls outside its symbol decision boundary. If the noise is Gaussian, it creates a distribution of sample points around the mean, the “ideal” symbol point. The area under the probability density function (PDF) beyond the symbol decision boundary represents the likelihood of that type of error. The error probability can be calculated by integrating the PDF from the symbol boundary to minus infinity, according to equation (8).

\[
P(x < a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] dx \quad (8)
\]

where \(a\) is the decision boundary, \(\mu\) is the mean value of a group of received symbols and \(\sigma\) is the standard deviation.
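The integral in equation (8) has a closed form in terms of the complementary error function, P(x < a) = 0.5·erfc((μ − a)/(σ√2)). A minimal sketch using the standard library, with a hypothetical function name and example values:

```python
import math

def error_probability(boundary, mean, sigma):
    """P(x < boundary) for x ~ N(mean, sigma^2), the Gaussian tail of equation (8)."""
    return 0.5 * math.erfc((mean - boundary) / (sigma * math.sqrt(2.0)))

# Example: the decision boundary lies 7 standard deviations below the ideal symbol point.
p_seven_sigma = error_probability(boundary=0.0, mean=7.0, sigma=1.0)
p_on_boundary = error_probability(boundary=0.0, mean=0.0, sigma=1.0)
```

A distance of about seven standard deviations between the ideal symbol point and the boundary gives a tail probability on the order of 10⁻¹², which is the kind of margin a BER target at that level implies for a single boundary.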

Fig. 4.5 Design specification: Jitter/Phase noise requirement for 16-QAM
CHAPTER 5 COGNITIVE CHANNEL LEARNING TRANSCEIVER ARCHITECTURE

5.1 System Architecture

The block diagram of the proposed cognitive tri-band TX is shown in Fig. 5.1. A modulation mapping block converts the PRBS binary code to the corresponding digital-to-analog converter (DAC) input. For different modulation schemes, the 4-bit DACs are supplied with different data patterns. The block also maps different data patterns in phase calibration mode and channel learning mode. Placed after the DACs, the analog and RF front end includes two in-phase and quadrature RF band paths and one baseband path for clock forwarding. At the last stage, the signals from all bands are summed together and sent to the channel. A cognitive controller determines the modulation scheme and carrier frequency allocation based on the detected channel response. The cognitive controller also controls a carrier generation oscillator, either to select the carrier frequencies or to sweep the carrier frequency across the whole band of interest in channel learning mode. On the RX side, a power detector and a low-speed analog-to-digital converter (ADC) detect the channel response non-coherently and feed the channel information back to the cognitive controller on the TX side. After detection, the cognitive controller uses the received channel information to determine the carrier frequencies, calculate the link budget and choose the optimal data bandwidth and modulation scheme.
Fig. 5.1 System architecture of proposed cognitive tri-band transmitter
5.2 Cognitive Algorithm Design

The cognitive algorithm is illustrated in Fig. 5.2. The first step of channel learning is non-coherent detection. The detected channel information is sent back through a low-cost, low-speed single-ended channel. The cognitive controller extracts several critical parameters, including the frequency notch locations, the available bandwidth in each band, and the channel loss profile over the entire band of interest. With the extracted channel information, the second step is to choose the carrier frequencies so that they avoid the high-loss notch frequencies, and to choose the modulation scheme based on the system data rate and BER requirements. In the third step, the cognitive controller calculates the link budget and sets the transmitter output power, using a look-up table to find the required SNR for the chosen modulation scheme. In the last step, phase calibration is performed for each carrier frequency before data transmission begins.
Step 1: Non-Coherent Channel Learning

Channel Quality Acquisition
- Frequency sweeping by CW from 0.1GHz to 10GHz @ TX
- Power detection @ RX

Channel Feature Extraction
- Frequency notches
- 3dB bandwidth of Baseband (BW₀)
- Carrier frequencies (fcᵢ)
- Bandwidth of each band (BWᵢ)
- Insertion loss of each band (Lcᵢ)

Step 2: Carrier/Modulation Decision

Inputs: Features of Channel, System Specification
- Optimal Carrier Selection
- Modulation Selection (from QPSK to 256-QAM)
- Maximum Data-rate Calculation
- Check: maximum data-rate > required data-rate; otherwise change modulation scheme

Step 3: Output Power Selection

Calculate link budget for each band
Set TX Output Power

Step 4: Phase Calibration for Each Band

Fig. 5.2 Cognitive algorithm flow
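The feature extraction and carrier selection steps of the flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the notch criterion (a local loss maximum above a threshold), the greedy lowest-loss carrier choice, the minimum carrier spacing, and all sample data are hypothetical, and the sketch ignores the hardware constraint that the carriers in this design are derived from frequency dividers.

```python
from typing import List, Sequence


def find_notches(freqs_ghz: Sequence[float], loss_db: Sequence[float],
                 notch_threshold_db: float) -> List[float]:
    """Return frequencies whose loss is a local maximum at or above the threshold."""
    notches = []
    for i in range(1, len(loss_db) - 1):
        is_peak = loss_db[i] >= loss_db[i - 1] and loss_db[i] >= loss_db[i + 1]
        if is_peak and loss_db[i] >= notch_threshold_db:
            notches.append(freqs_ghz[i])
    return notches


def pick_carriers(freqs_ghz: Sequence[float], loss_db: Sequence[float],
                  n_bands: int, min_spacing_ghz: float) -> List[float]:
    """Greedily pick the lowest-loss frequencies that are at least min_spacing apart."""
    order = sorted(range(len(freqs_ghz)), key=lambda i: loss_db[i])
    chosen: List[float] = []
    for i in order:
        if all(abs(freqs_ghz[i] - c) >= min_spacing_ghz for c in chosen):
            chosen.append(freqs_ghz[i])
        if len(chosen) == n_bands:
            break
    return sorted(chosen)


# Hypothetical swept channel response: loss in dB at each CW frequency in GHz.
sweep_freqs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sweep_loss = [3.0, 5.0, 40.0, 6.0, 8.0, 35.0, 9.0, 12.0]

notches = find_notches(sweep_freqs, sweep_loss, notch_threshold_db=30.0)
carriers = pick_carriers(sweep_freqs, sweep_loss, n_bands=2, min_spacing_ghz=2.0)
```

For this sample sweep the notches are found at 3 GHz and 6 GHz, and the two selected carriers (1 GHz and 4 GHz) both avoid them. A real controller would additionally check the bandwidth available around each carrier and the link budget before fixing the modulation scheme.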
CHAPTER 6  CIRCUITS DESIGN OF KEY BUILDING BLOCKS

A fully differential current-mode architecture is used for all circuit-level designs to suppress common-mode noise and even-order harmonics. It also mitigates simultaneous switching noise (SSN), supply noise, and electromagnetic noise.

6.1 4-bit Digital-to-Analog Converter (DAC)

A 4-bit DAC and a double-balanced mixer are combined to improve energy efficiency, as shown in Fig. 6.1. The 4-bit DAC uses a current-steering structure, and its output current ranges from 20 μA to 950 μA with around 100 mV peak voltage swing. A 1.2 V power supply is chosen instead of the 0.9 V standard core voltage in 28 nm CMOS to provide more linearity headroom. A capacitor added at the DAC output serves as a bandwidth limiter to alleviate the IBI issue, as explained in Section III. The double-balanced mixer is composed of four passive switches, so the TX output power is proportional to the DAC output current. The unit bias current of the DAC is digitally tunable and is set by the cognitive controller based on the link budget calculation and energy efficiency optimization.
Fig. 6.1 4-bit DAC and double-balanced mixer schematic
6.2 Broad-Band Summation Block

The summation block consists of five slices, as shown in Fig. 6.2: four RF slices (I and Q for each of the ÷4 and ÷2 carrier bands) and one baseband slice. A switchable termination resistor is attached in series at the output to improve matching to the channel's characteristic impedance if necessary. The block must sum the signals from all bands and provide broadband operation up to 8 GHz. It must also subtract the DC current to avoid desensitizing the receiver front end.

The input PMOS devices are sized 10x, and current mirrors copy their currents into 1x PMOS devices, whose outputs are summed to sense the DC current:

\[ I_{1y\text{NMOS}} = 0.1 \times (I_P + I_N) = 0.2 \times I_{dc} \quad (9) \]

where \( I_{1y\text{NMOS}} \) is the current in the 1y-size NMOS, \( I_P \) and \( I_N \) are the differential input currents, and \( I_{dc} \) is the input DC current. \( I_{1y\text{NMOS}} \) is then copied by a 5y-size NMOS, which scales it to \( 5 \times 0.2 \times I_{dc} = I_{dc} \), and this current is subtracted from each input current:

\[ I_{out\_N} = I_N - I_{dc}, \quad I_{out\_P} = I_P - I_{dc} \]
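The mirror ratios in equation (9) can be checked numerically. The sketch below models ideal current mirrors only; the function name and the example input currents are assumptions chosen for illustration, not measured values.

```python
import math


def summation_dc_subtract(i_p_ua: float, i_n_ua: float):
    """Model the DC subtraction with ideal mirrors; all currents are in microamps."""
    i_sense = 0.1 * (i_p_ua + i_n_ua)   # 10x -> 1x PMOS mirrors, outputs summed
    i_sub = 5.0 * i_sense               # 1y -> 5y NMOS copy, equals I_dc
    return i_p_ua - i_sub, i_n_ua - i_sub, i_sub


# Hypothetical differential input: 500 uA DC with +/-100 uA of signal.
i_out_p, i_out_n, i_dc_est = summation_dc_subtract(600.0, 400.0)
```

With these inputs the subtracted current equals the 500 μA DC component, leaving a purely differential output of +100 μA and -100 μA.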

Fig. 6.2 Broad-band summation block schematic
6.3 Receiver Front End

The receiver front end is a fully reconfigurable design, as shown in Fig. 6.3. It provides broadband operation and impedance matching from 50 Ohm to 150 Ohm, to cover different channel conditions and to compensate for fabrication variations. A gain-reused structure boosts the gain and improves sensitivity without consuming much additional power. A digitally tunable regulated resistor improves high-frequency broadband operation. The DC bias current is also tunable.

In total, there is a 3-bit slice enable control and 6-bit tuning for each slice. This reconfigurability allows the proposed serial interface to cover a wide range of channel conditions while maintaining high performance and low power, even with fabrication variations.

Fig. 6.3 Receiver front-end schematic
CHAPTER 7  IMPLEMENTATION AND MEASUREMENT RESULTS ANALYSIS

7.1 Measurement Platform

A test chip comprising carrier generation, a digital baseband controller, and the tri-band front end is fabricated in a 28 nm CMOS process and occupies 0.016 mm². The data source is a 16-bit parallel pseudorandom binary sequence (PRBS) generator operating at up to 1 GHz. A universal asynchronous receiver/transmitter (UART) interface is used to configure the control registers and monitor the TX operation status.

As shown in Fig. 7.1, a commercial power detector (LMX2492EVM) with a 12-bit ADC detects the received power through the channel from 100 MHz to 10 GHz during the TX frequency sweep. The detected channel frequency response is processed by a MachXO3L FPGA board, based on which the cognitive algorithm determines the carrier frequency allocation, modulation schemes, maximum achievable data rate, and other reconfigurable parameters. Two channel conditions are tested: a 10-inch low-cost differential cable by 3M™ and an MDB modeled by an open-stub transmission line on a PCB. On the RX side, a broadband power splitter (WSCH 1579), down-conversion mixers (MZ6310C), a broadband 90° hybrid (KRYTAR1230), low-pass filters (SBLP 933), amplifiers (CRBAMP100), and an HP 83460A as the local oscillator (LO) constitute an instrumental receiver that coherently demodulates the TX output signal.
Fig. 7.1 Measurement platform
7.2 Frequency-Domain Measurement Results

The frequency-domain measurement is shown in Fig. 7.2. The first column is the transmitter output spectrum before the signal passes through the channel, and the second column is the receiver input spectrum after the channel. The aggregate data rate is 16 Gb/s, and the baseband is configured for clock forwarding by sending a half-rate clock. In Fig. 7.2 (a), the cognitive controller learns the MDB channel information and then shapes the TX spectrum accordingly, so the main lobe shape is well maintained after the channel. In Fig. 7.2 (b), the MDB channel is replaced by a low-cost cable channel and the channel learning feature is disabled. When the same TX spectrum is sent, the main lobe energy and information are corrupted after the channel. In Fig. 7.2 (c), channel learning is enabled, and the cognitive controller chooses the carrier frequencies and data bandwidth based on the channel information, so the main lobe is again well maintained after the channel. Although the two channel conditions are very different, the proposed cognitive serial transmitter learns the channel information in each case and uses it to optimize its configuration adaptively.
Fig. 7.2 Frequency-domain measurement analysis, (a) MDB channel with enabled channel learning; (b) low-cost cable channel with disabled channel learning; (c) low-cost cable channel with enabled channel learning
7.3 Time-Domain Measurement Results

Time-domain measurement results are shown in Fig. 7.3, which presents the I/Q constellations and eye diagrams for QPSK, 16-QAM, 64-QAM and 256-QAM modulation. The forwarded clock can be used directly to sample the data without a PLL-based CDR. A -30 dB EVM is achieved, with IQ mismatch calibrated at the receiver side. The proposed cognitive tri-band transmitter achieves 16 Gb/s without any equalization or PLL-based CDR. The eye diagram and constellation of 256-QAM are marginal, at $10^{-5}$ BER, which is limited by the instrument noise floor. For all the other modulation schemes, BER is less than $10^{-12}$.

The noise figures (NFs) of the front-end splitter, passive mixer, low-pass filter (LPF), and analog baseband amplifier are 6.5 dB, 7 dB, 1.2 dB, and 3.5 dB, respectively. The maximum resolution of the oscilloscope is 8 bits. Based on the specifications of the instruments and discrete components, the maximum measurable SNR is 31.7 dB, calculated by equation (10).

\[ \text{Max. RX input SNR} = 8 \times 6.02 + 1.76 - NF_{LPF} - NF_{MIXER} - NF_{AMP} - NF_{SPLITTER} \quad (10) \]
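Equation (10) can be evaluated directly from the noise figures and oscilloscope resolution quoted above. The sketch below follows the equation as written, subtracting each noise figure in dB from the ideal quantization SNR; the function and variable names are illustrative.

```python
def max_measurable_snr_db(adc_bits: int, noise_figures_db) -> float:
    """Ideal quantization SNR (6.02*N + 1.76 dB) minus the listed noise figures."""
    return 6.02 * adc_bits + 1.76 - sum(noise_figures_db)


# Values quoted in the text for the instrumental receiver chain.
noise_figures = {"splitter": 6.5, "mixer": 7.0, "lpf": 1.2, "amplifier": 3.5}
snr_limit_db = max_measurable_snr_db(8, noise_figures.values())
```

The result is about 31.7 dB: 49.92 dB of ideal 8-bit quantization SNR minus 18.2 dB of accumulated noise figure.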

As shown in Fig. 7.3, the BER for 256-QAM changes from $10^{-4}$ to $10^{-12}$ as the SNR changes from 32 dB to 37 dB. Consequently, the measured $10^{-5}$ BER is a reasonable result that matches the calculation.
Fig. 7.3 Time domain measurement results
7.4 Power Consumption and Die Photo

The die photo and power consumption breakdown are presented in Fig. 7.4 (a) and (b). The total core area is 0.016 mm$^2$ in 28 nm CMOS technology, with 40 μm x 300 μm for the analog front end, 50 μm x 40 μm for digital control/data generation, and 50 μm x 40 μm for clock generation circuitry. The total power consumption is 14.7 mW, 34% of which is consumed in the summation block, because it interfaces with the off-chip environment and handles broadband operation up to 8 GHz. The power consumption of the controller is relatively small because it runs at only several tens of MHz for initial configuration or calibration.

Fig. 7.4 Die photo and power consumption breakdown
CHAPTER 8    CONCLUSIONS AND FUTURE WORK

In conclusion, a tri-band cognitive transmitter was implemented in 28 nm CMOS technology. It demonstrated the unique capability of learning an arbitrary channel response and adapting its modulation scheme from NRZ or QPSK to PAM-16 or 256-QAM. It achieved a 16 Gb/s data rate on MDB and low-cost cable channels without using equalization. It also used a source-synchronous (forwarded-clock) scheme without increasing the number of clock pins or channels. It accomplished the best FoM of 20.4 µW/Gb/s/dB and occupied an area of 0.016 mm².

Table III summarizes the silicon performance comparison with state-of-the-art serial interface transmitters. Compared to the other works, this work achieves 16 Gb/s per differential pair with 919 fJ/bit energy efficiency and 20.4/23.0 µW/Gb/s/dB FoM for the low-cost cable channel and MDB channel, respectively. The forwarded-clock scheme is used without an extra physical channel or extra clock I/O pins. The last two rows in the table, worst channel loss (dB) within the Nyquist frequency and FoM (µW/Gb/s/dB), both depend on the channel condition.
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Technology</strong></td>
<td>22nm CMOS</td>
<td>28nm CMOS</td>
<td>65nm CMOS</td>
<td>40nm CMOS</td>
<td>28nm CMOS</td>
</tr>
<tr>
<td><strong>Data rate/diff. pair</strong></td>
<td>8 Gb/s</td>
<td>13 Gb/s</td>
<td>14 Gb/s</td>
<td>7.5 Gb/s</td>
<td>16 Gb/s</td>
</tr>
<tr>
<td><strong>Signaling</strong></td>
<td>Base-band NRZ</td>
<td>Base-band NRZ</td>
<td>Base-band NRZ</td>
<td>Bi-band NRZ / QPSK</td>
<td>Tri-band QPSK/ 16/64/256-QAM</td>
</tr>
<tr>
<td><strong>Clock Synchronization Scheme</strong></td>
<td>Forwarded-clock w/ extra channel</td>
<td>Embedded Clock</td>
<td>Embedded Clock</td>
<td>Embedded Clock</td>
<td>Forwarded-clock w/o extra channel</td>
</tr>
<tr>
<td><strong>Area/Lane</strong></td>
<td>--</td>
<td>0.028 mm<sup>2</sup></td>
<td>0.061 mm<sup>2</sup></td>
<td>0.051 mm<sup>2</sup></td>
<td>0.016 mm<sup>2</sup></td>
</tr>
<tr>
<td><strong>Power</strong></td>
<td>2.56 mW</td>
<td>17.0 mW</td>
<td>12.5 mW</td>
<td>7.4 mW</td>
<td>14.7 mW</td>
</tr>
<tr>
<td><strong>Efficiency</strong></td>
<td>320 fJ/bit</td>
<td>1308 fJ/bit</td>
<td>893 fJ/bit</td>
<td>990 fJ/bit</td>
<td>919 fJ/bit</td>
</tr>
<tr>
<td><strong>Worst Channel Loss within Nyquist Freq.</strong></td>
<td>12 dB</td>
<td>35 dB</td>
<td>12 dB</td>
<td>45 dB</td>
<td>45 dB (Cable), 40 dB (MDB)</td>
</tr>
<tr>
<td><strong>FoM (µW/Gb/s/dB)</strong></td>
<td>26.7</td>
<td>37.4</td>
<td>74.4</td>
<td>22.0</td>
<td>20.4 (Cable), 23.0 (MDB)</td>
</tr>
</tbody>
</table>

Table III. Performance comparison with other state-of-the-art works
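The efficiency and FoM rows of Table III follow from the power, data rate, and worst-case channel loss rows. The short sketch below reproduces the entries for this work, using the FoM definition of power per Gb/s per dB of worst-case channel loss within the Nyquist frequency; the function names are illustrative.

```python
def energy_per_bit_fj(power_mw: float, rate_gbps: float) -> float:
    """Energy efficiency in fJ/bit: mW per Gb/s is pJ/bit, times 1000 gives fJ/bit."""
    return power_mw / rate_gbps * 1000.0


def fom_uw_per_gbps_per_db(power_mw: float, rate_gbps: float, loss_db: float) -> float:
    """FoM in uW/Gb/s/dB: power per Gb/s per dB of worst-case channel loss."""
    return power_mw * 1000.0 / rate_gbps / loss_db


# Table III entries for this work: 14.7 mW TX power at 16 Gb/s.
efficiency = energy_per_bit_fj(14.7, 16.0)
fom_cable = fom_uw_per_gbps_per_db(14.7, 16.0, 45.0)   # 45 dB cable loss
fom_mdb = fom_uw_per_gbps_per_db(14.7, 16.0, 40.0)     # 40 dB MDB loss
```

These evaluate to about 919 fJ/bit, 20.4 µW/Gb/s/dB for the cable channel, and 23.0 µW/Gb/s/dB for the MDB channel, matching the last column of the table. The same functions reproduce the other columns, for example 2.56 mW at 8 Gb/s over 12 dB gives about 26.7 µW/Gb/s/dB.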
REFERENCE


