Title
Power-efficient Design of Multi-Gbps Wireless Baseband

Permalink
https://escholarship.org/uc/item/8631t4mq

Author
Park, Ji-Hoon

Publication Date
2011

Peer reviewed|Thesis/dissertation
Power-efficient Design of Multi-Gbps Wireless Baseband

by

Ji-Hoon Park

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Engineering – Electrical Engineering and Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Borivoje Nikolić, Chair
Professor Elad Alon
Professor Paul Wright

Fall 2011

Copyright 2011
by
Ji-Hoon Park
Abstract

Power-efficient Design of Multi-Gbps Wireless Baseband

by

Ji-Hoon Park

Doctor of Philosophy in Engineering – Electrical Engineering and Computer Science

University of California, Berkeley

Professor Borivoje Nikolić, Chair

There is a growing interest in the use of the 7 GHz of unlicensed bandwidth around 60 GHz for high-speed wireless data transfers. Complementary metal-oxide-semiconductor (CMOS) radio frequency (RF) circuits have been demonstrated to effectively operate in this band, but the challenge remains to design a complete high data-rate, energy-efficient system. With data rates of several Gb/s and short wavelengths, the baseband signal processing that compensates for the distortion of the wireless channel presents a significant challenge. This work demonstrates the design of a power-efficient baseband at different levels of abstraction from the algorithm level down to the transistor level.

A method for optimizing the equalizer architecture under power and bit-error rate (BER) constraints has been developed. This method has been used to optimize the number of equalizer taps and the distribution of signal processing between analog and digital domains. Two chips were built to demonstrate the methodology based on the IEEE wireless personal area network (WPAN) standard.

The first, fully-digital chip implements a single-carrier demodulator that minimizes the power consumption using a parallelized distributed arithmetic architecture. A 2 mm × 2 mm test chip in a 65 nm CMOS process implements a 6-tap feedforward and 32-tap feedback equalizer for binary phase-shift keying (BPSK) that can be configured to cancel the response of up to 72 symbols while consuming 5.6 mW at 2 Gb/s throughput.

The second 1.86 mm × 1.86 mm chip implements a reconfigurable 4-bit ADC and 6-tap analog equalizer in addition to the digital equalizer for quadrature phase-shift keying (QPSK) demodulation. The analog preprocessor is measured to consume 1.3 mW for the driver and 300 µW/tap for the analog equalization. The ADC power consumption varies from 1.2 mW to 3.8 mW depending on the resolution at 1.76 Gs/s. It is shown that, given a BER requirement, the mixed-signal reconfigurable receiver architecture can reduce the total link power consumption compared to a full-digital fixed transceiver depending on the propagation condition.
To my parents and truthful brother Sunghoon
Contents

List of Figures
List of Tables

1 Introduction
1.1 Related Work
1.2 Thesis Organization

2 60 GHz Communication System
2.1 Propagation Channel
2.1.1 Frequency Allocation
2.1.2 Channel Characteristics
2.1.3 Statistical Channel Model
2.2 Standardization
2.3 Implementation Issues
2.3.1 Radio frequency (RF)
2.3.2 Baseband (BB)

3 Baseband Design
3.1 System Overview
3.1.1 Modulation
3.1.2 Frame Structure
3.2 Equalization (EQ)
3.2.1 Optimum Receiver
3.2.2 Linear Equalizer (LE)
3.2.3 Decision Feedback Equalizer (DFE)
3.3 Channel Estimation (CE)
3.3.1 Least Square (LS) CE
3.3.2 Correlation Based CE
3.3.3 Adaptive CEs
3.4 Synchronization
3.4.1 Frequency Error Estimation
3.4.2 Timing Estimation
3.4.3 Synchronization Error Recovery by ADC clock adjustment
3.5 Link-level Simulation

4 Mixed-Signal Power Optimization of a Baseband
4.1 Introduction
4.1.1 Digital Limitation
4.1.2 Analog Limitation
4.1.3 Digital-Analog Trade-off
4.2 Analysis Framework
4.3 BER Performance Model
4.3.1 Analysis without Linear Equalizer (LE)
4.3.2 Analysis with Linear Equalizer
4.4 Power Consumption Model
4.4.1 Transmitter
4.4.2 Analog-to-Digital Converter (ADC)
4.4.3 Analog Decision Feedback Equalizer (ADFE)
4.4.4 Digital Decision Feedback Equalizer (DDFE)
4.5 Power Optimization
4.5.1 Simple channel examples
4.5.2 Application to the 60 GHz channels
4.6 Real-time Search for Optimal Configuration
4.6.1 Sensitivity to the implementation parameters
4.6.2 Adaptive search for the optimal point

5 Digital Baseband Implementation
5.1 Power Consumption of Digital Circuits
5.2 Equalizer
5.2.1 Implementation Parameters
5.2.2 Hardware Architecture
5.3 Channel Estimator
5.4 Chip Implementation and Measurement
5.4.1 Test Structure
5.4.2 Measurement Results

6 Mixed-Signal Baseband Implementation
6.1 Analog Circuit Design
6.1.1 ADC
6.1.2 ADFE
6.2 Chip Implementation and Measurement
List of Figures

1.1 Mobile Internet market growth.
1.2 Wireless communications between mobile devices.
1.3 Power saving approach in this work.
1.4 60 GHz CMOS RF implementations published.
1.5 Implemented chips in this work.

2.1 60-GHz frequency allocation [97].
2.2 60-GHz channel allocation [1].
2.3 60 GHz path loss measurement results published.
2.4 60-GHz multipath propagation [1] and impulse response example.
2.5 Usage models (CM) and corresponding statistical channel parameters of the IEEE 802.15.3c channel model [1].
2.6 Parameters for a statistical channel model [1].
2.7 Review of communication systems.

3.1 General block diagram of a digital receiver baseband.
3.2 Complexity comparison between modulations.
3.3 Frame structure and corresponding functions and corresponding blocks.
3.4 Optimum receiver for an ISI channel and additive white Gaussian noise (AWGN) [66].
3.5 LE model.
3.6 Illustration of the noise enhancement problem in a band-limited channel.
3.7 Illustration of $Y(f)$ that shows the aliasing of the symbol-rate sampling.
3.8 Conventional DFE structure.
3.9 Reduced-complexity DFE structure.
3.10 Autocorrelation property of the m-sequence.
3.11 LFSR Implementation options [2]
3.12 4-way parallelized implementation of PRBS31.
3.13 Golay correlator structure [65].
3.14 Preamble of 802.15.3c [33].
3.15 Adaptive channel estimation diagram.
3.16 Frequency estimation concept.
3.17 Mean and variance of frequency estimation.
3.18 Timing recovery concept.
3.19 Timing estimation in a sample-rate sampling system.
3.20 Mean and variance of timing error estimation.
3.21 Mean of timing error estimation with different length pilot.
3.22 Frequency and timing error compensation idea.
3.23 Simulink environment for the link-level simulation.
3.24 BER simulation of NLOS/LOS channels.

4.1 Performance distribution of published ADCs [54].
4.2 Simplified circuit diagram of ADFE.
4.3 Analog-digital power trade-off depending on implementation scenarios.
4.4 Mixed-signal baseband diagram.
4.5 Mixed-signal equalizer.
4.6 BER analysis step.
4.7 System model for BER analysis.
4.8 Impulse response analysis model.
4.9 Required SNR for BER of $10^{-2}$ calculated from impulse responses of the channel model (CM)2.3.
4.10 Power consumption surface of simplified channel scenario
4.11 Power consumption and optimal receiver configuration for a channel with precursor ISI.
4.12 Power consumption and optimal receiver configuration for a LOS condition.
4.13 Power consumption and optimal receiver configuration for an NLOS condition.
4.14 Power consumption surface from the power and the BER models
4.15 Power trade-off and its implication on the BER performance ($P_{loss}=78$dB).
4.16 Sensitivity of power consumption ((a),(c),(e)) and optimum ADC bits ((b),(d),(f)) to the implementation parameters in an NLOS condition: (a),(b) with varying PA efficiency, $\eta$; (c),(d) with varying ADC power coefficient; (e),(f) with varying ADFE power coefficient, $\alpha_{ADFE}$.
4.17 Algorithm diagram to tune the transceiver parameters to reach the minimal power point.
4.18 Example of tap assignment based on the proposed partitioning

5.1 Implemented blocks
5.2 Tap assignment of the equalizer.
5.3 Outage analysis of CM2.3 channel profiles with DFE of infinite number of taps
5.4 Floating-point simulation of equalizer.
5.5 M-DFE (4-way parallelized, 24 tap DA FIR with 4 LUTs), where $\hat{x}_k$ is binary input from the slicer, $\hat{h}_k$ is the estimated impulse response used to calculate the LUT entries.
5.6 Dynamic tap assignment scheme.
5.7 S-DFE (loop-unrolled, 8 tap DA FIR with a LUT).
5.8 Equalizer block diagram with implementation details.
5.9 Channel estimator timing diagram.
5.10 Channel estimator data path with a parallelization factor of $P$.
5.11 Channel estimator buffer operations implementing the required delay when $P=4$.
5.12 Channel estimator control structure.
5.13 Channel estimator cell structure.
5.14 Minimum square error (MSE) of the channel estimator and its impact on the BER performance.
5.15 Block placement.
5.16 Chip photo
5.17 Power breakdown by functions and elements from the electronic design automation (EDA) tool.
5.18 Test interface of the digital baseband chip.
5.19 Test diagram of the digital baseband.
5.20 Test environment of the chip.
5.21 Measured BER performance.
5.22 Measured power and throughput.
5.23 Comparison between transmitted and measured channel impulse response when SNR=$\infty$

6.1 Block diagram of the mixed-signal chip.
6.2 Circuit diagram of the analog portion of the chip.
6.3 ADC block diagram.
6.4 ADC timing diagram.
6.5 Comparator circuit.
6.6 Resistive DAC for offset tuning.
6.7 ADC layout.
6.8 A flow chart of the ADC calibration.
6.9 ADC offset transition by control codes.
6.10 ADC offset histogram.
6.11 Current source circuits for an ADFE tap.
6.12 ADFE layout.
6.13 Top-level layout of the mixed-signal baseband.
6.14 The packaged chip and the test board.
6.15 Measured ADC AC characteristics (Fs=1.76GHz).
6.16 Measured ADC performance.
6.17 Measured BER performance.
6.18 Power breakdown.
6.19 Mixed-signal baseband summary.
6.20 The power reduction by the methodology introduced in this work ($\eta=15\%$, $NF=7\text{dB}$, $G_a=3\text{dB}$, $P_{loss}=78\text{dB}$)
List of Tables

2.1 Propagation channel parameters assumed in this work [91].
3.1 Comparison between CE algorithms.
3.2 Break-down of BER degradation.
4.1 Power consumption of 2 Gs/s, 100fJ/conv ADCs.
4.2 Baseline power coefficients
5.1 Chip summary
5.2 Comparison to prior works
Acknowledgments

Working and studying with brilliant people in the superb environment at UC Berkeley was one of the most fortunate and honorable opportunities in my whole life. Most of all, my gratitude to my research advisor, Prof. Borivoje Nikolić, is so sincere and deep that I can’t even find proper words to express it. Without the dedication, insight, and tolerance he showed me during my stay, I could not have reached this point, much less conducted proper research. I also would like to convey my deepest thanks to Prof. Elad Alon for the discussion and his devotion to the field of research. It was also my honor to have Prof. Ali Niknejad and Prof. Paul Wright on my qualifying examination committee.

The Berkeley Wireless Research Center (BWRC) was the ideal place to study, discuss, and make chips. I’ve been proud of being a member of this great facility from the first day. I still vividly remember my strong feeling of pride when I first got my cubicle at BWRC. In the center, Brian taught me how to make chips literally from scratch with incredible tolerance. He never overlooked even the silliest questions of mine sent out at midnight on weekends. Tom showed me how far devotion and work ethic can reach. I also deeply appreciate Kevin, Gary, Bira, Deirdre, Olivia, and Leslie for their hard work to make the BWRC an efficient, comfortable, well-supported, and safe place to work in. Up in Cory Hall, I was always amazed by Ruth’s kindness and consistency in reaching out and helping grad students in need.

I appreciate the support and help from DCDG (or COM-IC) and BWRC students and visitors. Renaldi and Zhengya have been my role models for grad life, research, and job search. I also would like to thank Farhana and Iffty for their kind concern about my lonely life. I did my first tape-out with Liang-Teck, Kenny, and Lauren, whose help I really appreciate. Dusan, Bill, Radu, Seng, Vinayak, Jaehwa, Milos, Katerina, Charles, and Sharon have been good friends. Stanley and Weihung were always trying their best to help me in my research and coursework. I’m also proud of Jungdong, Nameok, Kwangmo, Kyuhyun, Shinwon, and Cheolwoong-hyung, the Korean circuit students who overcame the difficulties of being the first generation of Koreans in BWRC. The classmates I met and worked with at Berkeley have always inspired and stimulated me. They constantly remind me that great classmates are the best asset that I got from Berkeley. I appreciate Lynn, Asako, David (Chen), Tsung-Te, and Debo as great friends. I also owe thanks to BWRC visitors. My Ph.D. research topic began to take shape out of discussions with Hideo Kasami-san and Ichiro Seto-san from Toshiba. I am also grateful to Prof. Seungjoon Lee, Kenichi-san, and Stefan for being great sources of inspiration.

I had great times with the Seoul Science High alumni at Berkeley. I learned from them how to live and survive in the US and Berkeley, as toddlers do when they begin to walk. Also, whenever I had a gathering with Taeksoon, Woojae, Bumjoon, Sanghoon, Uijae, or Jungwon, I could forget my homesickness for a while by reproducing the night life of Seoul with them. I also would like to thank Daeseok, Jihyun, Hyoungjin, Taejoon, and Hyerin for their kind concern and help.

Most of this dissertation was written during my internship at Marvell Technology Group. My gratitude goes to the kind and competent people there. Thomas, Li, and Jinho-hyung in the WiFi RFIC group generously allowed me ample opportunities to understand communication systems from a different angle while taking best advantage of my digital and communication background. I am also proud to have shared the same workplace with my smart friends Jihwan and Hyukjoon.

The language classes that I took originally as a hobby helped me by relieving the stress of the day and providing me with opportunities to meet good people and have a healthy dose of laughter. I was also impressed by the devotion and hard work of sensei and laoshi. I appreciate Komatsu-sensei, Imagawa-sensei, Takata-sensei, Konno-sensei, and Deng-laoshi for their kindness and for giving me energy to start the day with a sober and bright mind.

My research was supported in part by the Samsung scholarship. I appreciate its generous support and the chances to meet bright young people in its academic camps. Also, I would like to show appreciation for the support from C2S2, SRC, TSMC, and STMicroelectronics in various ways. My studying abroad also became possible through the encouragement, support, and recommendation of Dr. Kyungsup Lee, Prof. Yonghwan Lee, and Prof. Sungchul Kim. I appreciate their continuing concern, advice, and help even until today.

Last but not least, my greatest gratitude goes to my family, who have been my endless source of encouragement and self-confidence even though they were thousands of miles away. Mom always answered my phone calls with a comforting voice and patiently listened to all of my complaints. Without her emotional support and pep talks, I wouldn’t have made it through the desperate, dark moments and many sleepless nights. My diligent brother Sunghoon has carried out the duties that I neglected as the eldest son of the family. I owe a lot to him. I also appreciate Dad’s constant concern and encouragement. In all, this work would have been impossible without the support of my family. I dedicate this humble work to my Mom, Dad, and Sunghoon.
Chapter 1

Introduction

The advent of high-performance mobile devices is increasing wireless data traffic and dramatically changing the landscape of the related industry. The number of mobile subscribers has already surpassed 4 billion worldwide (Figure 1.1(a)) and mobile Internet access is increasingly common in developing parts of the world (Figure 1.1(b)). In the cellular world, Internet data traffic has already become the dominant source of revenue for service providers, replacing voice traffic. This leads to a rapid transition from the 3G cellular service to the more data-oriented 4G service with a higher data rate, which brings more Internet data to mobile devices.

Also, as an increasing fraction of consumer electronics products add multimedia and wireless communication capability, the "last mile" problem that wired Internet access experienced a few decades ago has reemerged in the wireless world as a "last meter" problem: the devices close to end users need to exchange high-speed data generated by the Internet access and multimedia terminals without cumbersome wires (Figure 1.2).

So far, the last meter problem of the wireless connection has been addressed by the wireless local area network (WLAN) and WPAN systems such as WiFi and Bluetooth. However, Bluetooth is progressing slowly in handling the high-speed data [12] because it was originally designed to support only slow data traffic such as audio or control signals. Also, as the need for data traffic and the speed of such connections grow rapidly, the WiFi system begins to suffer from congestion due to its bandwidth limitation.

Similar to other natural resources, the radio bandwidth shortage [37] can be mitigated either by increasing the efficiency of the current usage or by exploring a new territory which was not cultivated before due to the technological barriers and/or cost. In the radio world, the spectral efficiency can be improved by increasing the modulation complexity or by opportunistic usage of the frequency band assigned to a primary user. On the other hand, the 7 GHz of unlicensed bandwidth available around 60 GHz presents itself as the new territory that can accommodate the growing wireless data traffic. The main obstacle that had delayed the commercial usage of the 60 GHz band was the high cost of the RF circuits. Fortunately, the problem was significantly relieved by the recent advance of the CMOS RF technology.
Still, the power consumption that comes with the high data rate of the system is the main technological challenge to be overcome in order to fulfill the need for high-speed wireless data traffic between mobile devices in the 60 GHz band. The wireless communication system requires more sophisticated data processing than its wired counterpart, since it needs to mitigate the fading, distortion, and interference of the wireless propagation channel. The complex signal processing combined with the high operating frequency demands high power, which is scarce in mobile devices powered by chemical batteries. It is widely believed that a recent attempt to realize a short-distance high-speed wireless connection by the UWB technology failed commercially mainly due to its high power consumption, which shows the importance of power-efficient design for mobile devices [60],[57],[56].

Power saving in a communication system can be achieved at different levels of abstraction. Although the power that can be saved by circuit-level optimization is significant especially as the device scaling proceeds, a large part of the power saving comes from higher levels such as protocol, architecture, and algorithm. Considering the fact that the power saving in a communication system cannot be separated from the BER performance, it is necessary to find a way to minimize the power consumption of a high-speed wireless link while meeting a certain performance target. This can be achieved by optimizing the system from the algorithm and architecture level down to the device level in an integrated way.

Specifically, the channel condition of a wireless link varies over a wide range, and the required hardware resources and power change accordingly. By estimating the varying channel conditions and by operating power-scalable hardware at the minimum power level that achieves the required performance, an optimal operating condition can be reached, rather than continuously burning the power that corresponds to the worst-case condition.
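The selection principle in the paragraph above can be sketched as a small constrained search: among the available receiver configurations, keep those whose predicted BER meets the target and pick the one with the lowest power. The sketch below is only an illustration under stated assumptions. The configuration names, power figures, and BER predictions are made-up placeholders, not values from this work, and `select_min_power_config` is a hypothetical helper rather than the tuning algorithm developed in Chapter 4.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ReceiverConfig:
    name: str        # hypothetical label for a hardware setting
    power_mw: float  # hypothetical power draw of that setting


def select_min_power_config(
    configs: Sequence[ReceiverConfig],
    predicted_ber: Callable[[ReceiverConfig], float],
    target_ber: float,
) -> Optional[ReceiverConfig]:
    """Return the lowest-power configuration that meets the BER target.

    Returns None when no configuration is good enough for the channel.
    """
    feasible = [c for c in configs if predicted_ber(c) <= target_ber]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.power_mw)


# Placeholder configurations (assumed values, for illustration only).
CONFIGS = [
    ReceiverConfig("low", 2.0),
    ReceiverConfig("mid", 4.0),
    ReceiverConfig("high", 8.0),
]

# Placeholder BER predictions for a benign and a harsh channel (assumed).
BER_BENIGN = {"low": 1e-4, "mid": 1e-5, "high": 1e-6}
BER_HARSH = {"low": 1e-1, "mid": 5e-3, "high": 1e-4}

best_benign = select_min_power_config(CONFIGS, lambda c: BER_BENIGN[c.name], 1e-3)
best_harsh = select_min_power_config(CONFIGS, lambda c: BER_HARSH[c.name], 1e-3)

if __name__ == "__main__":
    print("benign channel:", best_benign)
    print("harsh channel:", best_harsh)
```

With these placeholder numbers the benign channel is served by the cheapest setting, while the harsh channel forces the most power-hungry one. A receiver fixed at the worst-case setting would spend the higher power in both cases.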

This research began by selecting the modulation and receiver algorithms that are not only reconfigurable, but also power-scalable in different configurations. As the first step, a full-digital receiver was built with various architectural features that reduce the power consumption. It was observed that ADCs in a high-speed system consume a significant portion of the total power. This power consumption can be reduced by pre-processing the analog signal before the sampling, as explored in [82]. The research is extended further to find an optimal partitioning between the digital and analog processing that takes into account the BER performance. A mixed-signal chip is implemented to demonstrate the validity of this approach. The power saving approach in this work is summarized in Figure 1.3.

1.1 Related Work

The baseband design of a wireless system is a topic with a long history that traces back to the 19th century when the electromagnetic wave began to be used for communications. Since then, various analog devices such as mechanical switches, vacuum tubes, and bipolar transistors have been used for the baseband of the radar, AM/FM radios, and TVs [43].

The baseband design shifted to full-digital implementations following the rapid advance of the CMOS integrated circuits. Those techniques were applied to a variety of wireless data communication systems such as the cellular, WLAN, and broadcasting systems [41],[99]. However, the data rate of those wireless systems remained below 100 Mb/s due to their complex signal processing requirements.

On the other hand, the high-speed transceivers that reach up to tens of Gb/s were developed for the inter-chip and backplane interface mostly using full-analog circuit techniques [29],[83],[85]. The read channel receivers for hard disks show design examples of digital-analog mixed-signal implementations [86].

These techniques were applied to the high-speed 60 GHz baseband in [82], which implements an analog equalizer with an ADC with a throughput of 1 Gb/s. A full-analog equalizer targeting this band is demonstrated in [90], which shows that a large amount of signal processing can be implemented with analog techniques.

While there is rich literature reporting RF circuit implementations for the band, as shown in Figures 1.4(a) and 1.4(b), reports on the power-efficient baseband are relatively rare. Most of the reports on digital baseband designs for the 60 GHz band consider the OFDM modulation, which consumes large power in a high-speed system.

1.2 Thesis Organization

Chapter 2 of this dissertation reviews the 60 GHz band as a communication medium for commercial products. Characteristics and models of the channel are summarized, followed by a review of the status of the standardization and industrial landscape. The technical difficulties to be overcome are briefly introduced as well.

Chapter 3 summarizes the theory and design principles behind the baseband for the 60 GHz receiver. This chapter reviews and justifies the modulation and architecture chosen for this work for the equalization, channel estimation, and synchronization. Link-level simulation performed to determine the architecture and parameters of the transceiver is also discussed.

Chapter 4 discusses the optimization methodology to partition the digital and analog circuits. We propose a way to set the total link power as the cost function, and compare the full-digital, the full-analog, and the mixed-signal implementations of the equalizer, which shows that, depending on the channel condition, a significant amount of power can be saved by the mixed-signal optimization.

Chapter 5 describes a full-digital chip that implements the equalizer and channel estimator based on the IEEE WPAN standard (Figure 1.5(a)). The chip was built to minimize the power consumption with architectural techniques.

Chapter 6 describes a mixed-signal chip implementation that demonstrates the methodology developed in Chapter 4. The chip includes reconfigurable and power-scalable ADCs and an analog equalizer in addition to the digital baseband described in Chapter 5 (Figure 1.5(b)).
Chapter 2

60 GHz Communication System

This chapter provides an overview of the 60 GHz band as a wireless communication medium. The first step to building a communication system on a new medium is to understand and characterize the medium as a propagation channel. To accomplish this, measurements are performed, followed by the development of a channel model based on which the transceiver can be designed [69],[40]. Typically, a communication standard defines the operation of a commercial system. This is a feature that distinguishes communication systems from other consumer electronic products: they need a pre-defined specification of the signal format and data exchange protocol, which makes it possible to communicate between devices from different manufacturers. Therefore, once the technology is mature and there is demand in the market, standardization bodies are formed and the standard is determined through extensive technical and political discussions.

For the 60 GHz communication system, while extensive measurement campaigns have been reported and channel models have been developed, standards are in development and there are still technical challenges to be overcome to gain popularity in the market.

In this chapter, the characteristics of the 60 GHz band as a communication channel and its modeling are reviewed in section 2.1. The on-going standardization activities are introduced in section 2.2. The channel property and the standard formed the basis of the parameters used for this work. In section 2.3, challenges involved in the circuit implementation of a 60 GHz communication system are summarized.

2.1 Propagation Channel

2.1.1 Frequency Allocation

The basic motivation of the 60 GHz research effort is the bandwidth available in the unlicensed band located around 60 GHz. In the United States, the band spans from 57 to 64 GHz, while other countries have slightly different allocations as shown in Figure 2.1. As seen in the chart, the regional differences can be resolved by channelizing the band into sub-channels of about 2 GHz. As a result, all emerging standards specify the channel bandwidth to be about 2 GHz. As an example of the channelization, the channel assignment of the Institute of Electrical and Electronics Engineers (IEEE) 802.15.3c standard is illustrated in Figure 2.2 [1]. In this standard, the 9 GHz bandwidth is divided into 4 channels of 2.160 GHz, each including 1.728 GHz of Nyquist bandwidth and guard bands. In this way, devices in the EU can use all the channels, while devices in China can be assigned to use the channels A2 and/or A3.
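The channelization numbers quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constant names are hypothetical, and treating the difference between the channel spacing and the Nyquist bandwidth as the total guard allocation per channel is an assumed reading of the text, not a statement taken from the standard.

```python
# Values quoted in the text, expressed in MHz to keep the arithmetic exact.
CHANNEL_SPACING_MHZ = 2160   # width of one channel
NYQUIST_BW_MHZ = 1728        # Nyquist bandwidth inside one channel
NUM_CHANNELS = 4
ALLOCATED_BAND_MHZ = 9000    # the 9 GHz span mentioned in the text

# Assumed reading: whatever is not Nyquist bandwidth is guard band.
guard_per_channel_mhz = CHANNEL_SPACING_MHZ - NYQUIST_BW_MHZ
occupied_mhz = NUM_CHANNELS * CHANNEL_SPACING_MHZ
unused_mhz = ALLOCATED_BAND_MHZ - occupied_mhz

print(guard_per_channel_mhz, occupied_mhz, unused_mhz)
```

Under these assumptions, each channel leaves 432 MHz for guard bands, and the four channels together occupy 8.64 GHz of the 9 GHz span.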

### 2.1.2 Channel Characteristics

There are numerous publications reporting measurement campaigns conducted on the 60 GHz band [52],[100],[103],[45]. The path loss, temporal and spatial dispersion, and the Doppler shift measured in the literature are essential to determine the architecture and parameters of the power amplifier, the equalizer, antenna array, and frame structure of a communication system, respectively. The channel parameters assumed in this work are summarized in Table 2.1 and are described in this section. It is worth pointing out that some of the parameters in the table are correlated. For example, the coherence time, $T_c$, is a time-domain representation of the Doppler spread, $D_s$, while the coherence bandwidth is a frequency-domain interpretation of the delay spread, $T_D$.

### Path Loss

The 60 GHz communication system has to overcome the effects of oxygen absorption, which is about 15 dB/km at sea level. This potentially limits the usage of the 60 GHz band for long-range communications other than point-to-point links. However, for short-range applications, the oxygen absorption has little importance. For example, a 10 m distance suffers from only 0.15 dB of oxygen absorption in open spaces [82]. In this work, we examine communication distances under 10 m, which is enough for indoor WLAN or WPAN applications.

The path loss at a reference distance of $d_0$ is given from the Friis transmission equation [100],

$$PL(d_0)(dB) = 20 \log \left( \frac{4 \pi d_0}{\lambda} \right),$$

(2.1)

where $\lambda$ is the wavelength, which is 5 mm at 60 GHz in free space. The path loss over distance $d$ is expressed by,

$$PL(d)(dB) = PL(d_0)(dB) + 10n \log \left( \frac{d}{d_0} \right),$$

(2.2)

where $n$ is the path loss exponent, which is 2 for free space propagation. It is reported that the measured values follow this equation well when $n$ is set to a value between 1.4 and 2.0 and $PL(d_0 = 1\,\mathrm{m})$ is about 68 - 70 dB. The mean square error (MSE) between the equation and the measured data points varies from 1.1 dB to 8.6 dB depending on the number of measurements and the choice of the parameters [52], [100], [103]. If we take $n = 2$ and $PL(d_0 = 1\,\mathrm{m}) = 68$ dB, the path loss over 10 m is 88 dB. The path loss plots from the measurement reports are shown in Figure 2.3.

### Table 2.1: Propagation channel parameters assumed in this work [91].

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Symbol</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Carrier frequency</td>
<td>$f_c$</td>
<td>60 GHz</td>
</tr>
<tr>
<td>Data bandwidth</td>
<td>$W$</td>
<td>2 GHz</td>
</tr>
<tr>
<td>Communication distance</td>
<td>$d$</td>
<td>10 m</td>
</tr>
<tr>
<td>Velocity of mobile</td>
<td>$v$</td>
<td>1 m/s</td>
</tr>
<tr>
<td>Doppler shift</td>
<td>$D = \frac{f_c v}{c}$</td>
<td>200 Hz</td>
</tr>
<tr>
<td>Doppler spread</td>
<td>$D_s = 2D$</td>
<td>400 Hz</td>
</tr>
<tr>
<td>Path amplitude time scale</td>
<td>$\frac{d}{v}$</td>
<td>10 s</td>
</tr>
<tr>
<td>Path phase time scale</td>
<td>$\frac{1}{4D}$</td>
<td>1.25 ms</td>
</tr>
<tr>
<td>Path over a tap time scale</td>
<td>$\frac{c}{vW}$</td>
<td>0.15 s</td>
</tr>
<tr>
<td>Coherence time</td>
<td>$T_c = \frac{1}{4D_s}$</td>
<td>0.625 ms</td>
</tr>
<tr>
<td>Delay spread</td>
<td>$T_D$</td>
<td>20 ns</td>
</tr>
<tr>
<td>Coherence bandwidth</td>
<td>$W_c$</td>
<td>25 MHz</td>
</tr>
</tbody>
</table>
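The path loss figures quoted in this subsection can be reproduced from (2.1) and (2.2). The following is a minimal sketch; the function names are hypothetical, the logarithms are taken to be base 10, and the speed of light is rounded to $3 \times 10^8$ m/s so that the wavelength matches the 5 mm stated above.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded, assumption)

def friis_path_loss_db(d0_m, freq_hz):
    """Free-space path loss at the reference distance d0, following (2.1)."""
    wavelength_m = C / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * d0_m / wavelength_m)

def path_loss_db(d_m, pl_d0_db, n, d0_m=1.0):
    """Log-distance path loss at distance d, following (2.2)."""
    return pl_d0_db + 10.0 * n * math.log10(d_m / d0_m)

pl_1m = friis_path_loss_db(1.0, 60e9)       # reference loss at 1 m
pl_10m = path_loss_db(10.0, 68.0, n=2.0)    # loss at 10 m with n = 2
oxygen_db = 15.0 * (10.0 / 1000.0)          # 15 dB/km applied over 10 m

print(round(pl_1m, 1), pl_10m, oxygen_db)
```

With these inputs the reference loss at 1 m comes out at about 68 dB and the loss at 10 m at 88 dB, while the oxygen absorption over the same distance is 0.15 dB, which is why it can be neglected for the short-range links considered here.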

The path loss gets worse if there is no line-of-sight (LOS) component between the transmitter and the receiver. Under non-line-of-sight (NLOS) propagation conditions, the path loss depends on the environment and on the materials of the medium, which determine the transmission and reflection coefficients of the wave propagation. The excess attenuation caused by losing the direct path is reported to range from 17 dB to 45 dB [52].

### Temporal and Spatial Characteristics

In a wireless communication system, multipath propagation is a typical phenomenon that exists between a transmitter and receiver as illustrated in Figure 2.4(a). This causes temporal and spatial dispersion of the received signal.

In the time domain, the multipath represents itself as inter-symbol interference (ISI) in the impulse response, which consists of pre-cursor and post-cursor components divided by a main tap location. Figure 2.4(b) shows an example of an impulse response.

The temporal characteristics of a propagation channel can be represented by time-of-arrival (ToA) parameters including the mean excess delay, $\bar{\tau}$, the delay spread, $T_D$, and the root mean square (RMS) delay spread, $\sigma_\tau$. The parameters can be translated into the frequency domain parameters including the coherence bandwidth, $W_c = \frac{1}{2\pi T_D}$. The parameters are defined by the following equations.

\[
\bar{\tau} = \frac{\sum_{i=1}^{N} P_i \tau_i}{\sum_{i=1}^{N} P_i}, \quad (2.3)
\]

\[
\sigma_\tau = \sqrt{\overline{\tau^2} - (\bar{\tau})^2}, \quad (2.4)
\]

\[
\overline{\tau^2} = \frac{\sum_{i=1}^{N} P_i \tau_i^2}{\sum_{i=1}^{N} P_i}, \quad (2.5)
\]

\[
T_D = \max_{i,j} |\tau_i - \tau_j|, \quad (2.6)
\]

where \(P_i\) and \(\tau_i\) are the power and delay of the \(i\)-th multipath component, and \(N\) is the number of multipath components that show up above the noise floor [91],[100].

The measured RMS delay spreads for the LOS conditions span from 4.5 ns to 34 ns depending on the measurement setup [100],[103]. This value is known to be smaller than that for other frequency bands. This is because the path loss increases as the wavelength shortens, as shown in (2.1), and because the 60 GHz signal additionally suffers from oxygen absorption. However, in terms of the baseband design, what matters is the equivalent number of symbols spanned by the multipath components. For the high-speed baseband signal of 2 Gs/s, a 34 ns delay spread means ISI that reaches almost 70 symbols. The delay spread gets worse when there is an obstruction in the propagation path. A simulation study performed on typical indoor environments reports an RMS delay spread of 65.9 ns at 10 m distance when averaged over different propagation conditions. This number corresponds to about 120 symbols [69],[21]. Equalization of this large number of taps is challenging for a high-speed receiver, especially when the system is targeting mobile applications where the power consumption has to be kept at a minimum.
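The ToA definitions in (2.3) to (2.6) and the conversion from delay spread to a number of symbols can be sketched as follows. The three-path power-delay profile is a made-up example, not measurement data, and the function names are hypothetical.

```python
import math

def toa_parameters(powers, delays_ns):
    """Return (mean excess delay, RMS delay spread, delay spread) in ns."""
    total_power = sum(powers)
    mean_delay = sum(p * t for p, t in zip(powers, delays_ns)) / total_power       # (2.3)
    mean_sq_delay = sum(p * t * t for p, t in zip(powers, delays_ns)) / total_power  # (2.5)
    rms_spread = math.sqrt(max(mean_sq_delay - mean_delay ** 2, 0.0))              # (2.4)
    delay_spread = max(delays_ns) - min(delays_ns)                                 # (2.6)
    return mean_delay, rms_spread, delay_spread

def symbols_spanned(delay_ns, symbol_rate_gsps):
    """Number of symbol periods covered by a given delay (ns times Gs/s)."""
    return delay_ns * symbol_rate_gsps

# Hypothetical profile: linear powers and delays in ns.
mean_delay, rms_spread, delay_spread = toa_parameters([1.0, 0.5, 0.25], [0.0, 10.0, 20.0])
isi_symbols = symbols_spanned(34.0, 2.0)   # the 34 ns LOS case at 2 Gs/s

print(mean_delay, rms_spread, delay_spread, isi_symbols)
```

For the 34 ns case the conversion gives 68 symbol periods, consistent with the "almost 70 symbols" quoted above.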

One way to deal with the delay spread is to exploit the spatial characteristics of the channel. As shown in Figure 2.4(a), the multipath components have the spatial correlation that can be exploited to reduce the burden of the equalizer by spatial signal processing with the directional antenna. The angle-of-arrival (AoA) parameters were extensively measured and reported in [100], in which the angular spread, Λ is defined as,

\[ \Lambda = \sqrt{1 - \frac{|F_1|^2}{F_0^2}} \]  

(2.7)

where

\[ F_n = \int_0^{2\pi} p(\theta) \exp(jn\theta) d\theta, \]  

(2.8)

which is the \( n \)-th order term of the Fourier transform taken on the angular distribution of multipath power, \( p(\theta) \). This parameter ranges from zero to one, with zero representing a LOS case with a single signal propagation component, and one denoting a case when the angular power is uniformly distributed [100]. The measurement shows that \( \Lambda \) spans from 0.12 to 0.86 depending on the setup, which means that the signal is sometimes highly scattered in the spatial domain. The directional antenna is beneficial in the scattered case. Antenna arrays are also convenient to implement in the 60 GHz communication system because the antenna dimension is on the order of a few mm at 60 GHz. However, in terms of the power consumption, the spatial signal processing of the antenna array and the temporal signal processing by the baseband equalizer need to be carefully compared. This is because, while duplicated RF chains for the array can be expensive in terms of the design cost and time as well as the power consumption, increasing the complexity of the baseband equalizer leads to high baseband power consumption. Also, it is worth noting that beamforming does not eliminate the need for the baseband equalizer, since spatial signal processing alone cannot compensate for the temporal dispersion that comes from the channel.
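The two limiting cases of the angular spread in (2.7) can be illustrated numerically. The sketch below assumes a discrete set of multipath components, so the integral in (2.8) is replaced by a sum over paths; this discretization and the function name are assumptions made for illustration only.

```python
import cmath
import math

def angular_spread(powers, angles_rad):
    """Angular spread per (2.7), with F_n evaluated as a sum over discrete paths."""
    f0 = sum(powers)
    f1 = sum(p * cmath.exp(1j * a) for p, a in zip(powers, angles_rad))
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, 1.0 - abs(f1) ** 2 / f0 ** 2))

# Single path: all power arrives from one direction.
spread_single = angular_spread([1.0], [0.3])

# Eight equal-power paths spread evenly over the full circle.
n_paths = 8
spread_uniform = angular_spread([1.0] * n_paths,
                                [2.0 * math.pi * i / n_paths for i in range(n_paths)])

print(spread_single, spread_uniform)
```

The single-path case gives a value of essentially zero and the evenly spread case gives a value of essentially one, matching the two ends of the range described above.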

**Doppler Shift**

The Doppler shift, \( D \), is a channel parameter that represents how fast the propagation channel changes. It depends more on the usage scenario of the system than on the physical characteristics of the channel. For example, in indoor WLAN or WPAN applications, the Doppler shift is mostly limited by the maximum speed at which people carrying the transceivers can move, or by how fast the environment changes around the transceivers, which is at most about 1 m/s. This speed corresponds to a Doppler shift of 200 Hz, that is, the path phase rotates through 360° roughly 200 times per second. If we assume that a slot consists of 256 symbols at the 2 Gs/s symbol rate, the phase change within a slot can be calculated as,

\[
\begin{aligned}
\frac{\Delta \theta}{\text{slot}} &= \frac{256 \text{ symbols}}{1 \text{ slot}} \cdot \frac{1 \text{ s}}{2 \times 10^{9} \text{ symbols}} \cdot 360^\circ \cdot 200 \text{ Hz} \\
&\cong 9.2 \times 10^{-3} \text{ degrees},
\end{aligned}
\]

which is negligible. Moreover, in this usage model, even if the propagation channel changes abruptly, it happens only sporadically and the data can be recovered by retransmission of the data and training sequence. Accordingly, in this work, we assume that the channel is quasi-stationary within a slot or frame period of the data burst.
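The Doppler-related entries of Table 2.1 and the per-slot phase change above follow from a few lines of arithmetic. This is a sketch with hypothetical variable names, using the rounded value $c = 3 \times 10^8$ m/s.

```python
C = 3.0e8             # speed of light in m/s (rounded, assumption)
CARRIER_HZ = 60e9     # carrier frequency f_c
SPEED_M_S = 1.0       # velocity of the mobile, v
SYMBOL_RATE = 2e9     # symbols per second
SLOT_SYMBOLS = 256    # symbols per slot assumed in the text

doppler_hz = CARRIER_HZ * SPEED_M_S / C              # D = f_c * v / c
doppler_spread_hz = 2.0 * doppler_hz                 # D_s = 2D
coherence_time_s = 1.0 / (4.0 * doppler_spread_hz)   # T_c = 1 / (4 D_s)

slot_duration_s = SLOT_SYMBOLS / SYMBOL_RATE
phase_per_slot_deg = 360.0 * doppler_hz * slot_duration_s

print(doppler_hz, coherence_time_s, phase_per_slot_deg)
```

The slot lasts 128 ns, roughly four orders of magnitude shorter than the coherence time of 0.625 ms, which is consistent with the quasi-stationary assumption made above.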

**2.1.3 Statistical Channel Model**

An impulse response of a wireless channel can be predicted with good accuracy using the ray-tracing technique given accurate information about a propagation environment [21]. However, to design a communication system, a channel model needs to represent wide ranges of channel realizations.

The Saleh-Valenzuela (S-V) model and its variants are the most popular statistical models that have been used for wireless channel modeling [75]. The model is based on the observation that the multipath components arriving at a receiver are clustered and that the power profile decays exponentially on average. If the cluster arrivals are assumed to be independent, the arrivals are modeled by a Poisson process and the inter-arrival times between clusters follow an exponential distribution. Accordingly, channel impulse responses can be statistically generated from a few parameters such as the Poisson arrival rate, \( \lambda \), and the decaying exponent, \( \sigma \). The model can be expanded to the spatial domain with the additional parameter of the RMS angular spread, \( \sigma_\phi \). These parameters represent the average behavior of the channel and can be extracted from measurement results [69],[6]. A statistical channel model based on measurement results in the 60 GHz band and a modified S-V model was adopted by the IEEE 802.15.3c standard body [1]. In the model, the channels are categorized by usage scenarios such as office, library, residential, and kiosk environments. The statistical parameters extracted from the measurement in the corresponding environment are used for the channel realizations. Figure 2.5 shows the CM environment and corresponding channel parameters, where $\Lambda$ and $\lambda$ represent the arrival rates of the clusters and the rays within a cluster, respectively. Also, $\Gamma$ and $\gamma$ are the RMS delay spreads of the cluster and the ray. Similarly, $\sigma_{\text{cluster}}$ and $\sigma_{\text{ray}}$ are the power decaying exponents, and $\sigma_{\varphi}$ is the RMS angular spread. An example of a channel realization from this model and these parameters is illustrated in Figure 2.6. This channel model is used to simulate and determine the receiver parameters in this work.
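To make the clustered-arrival idea concrete, the sketch below draws one impulse-response realization from a basic S-V-style model: exponential inter-arrival times for clusters and rays, exponentially decaying mean power, and complex Gaussian tap gains. It is not the modified IEEE 802.15.3c model; the function name, the fixed numbers of clusters and rays, and all parameter values are illustrative assumptions rather than values from the standard or from this work.

```python
import numpy as np

def sv_channel(cluster_rate, ray_rate, cluster_decay, ray_decay,
               n_clusters, n_rays, rng):
    """Return (delays_ns, complex_gains) for one hypothetical realization.

    Rates are in 1/ns and decay constants in ns. The first cluster and the
    first ray of each cluster are placed at zero excess delay.
    """
    # Poisson arrivals are generated through exponential inter-arrival times.
    cluster_gaps = rng.exponential(1.0 / cluster_rate, size=n_clusters - 1)
    cluster_times = np.concatenate(([0.0], np.cumsum(cluster_gaps)))

    delays, gains = [], []
    for t_cluster in cluster_times:
        ray_gaps = rng.exponential(1.0 / ray_rate, size=n_rays - 1)
        ray_times = np.concatenate(([0.0], np.cumsum(ray_gaps)))
        # Mean power decays exponentially across clusters and within a cluster.
        mean_power = np.exp(-t_cluster / cluster_decay) * np.exp(-ray_times / ray_decay)
        # Complex Gaussian gain, i.e. Rayleigh amplitude with uniform phase.
        gain = np.sqrt(mean_power / 2.0) * (rng.standard_normal(n_rays)
                                            + 1j * rng.standard_normal(n_rays))
        delays.append(t_cluster + ray_times)
        gains.append(gain)

    delays = np.concatenate(delays)
    gains = np.concatenate(gains)
    order = np.argsort(delays)
    return delays[order], gains[order]

rng = np.random.default_rng(0)
delays_ns, gains = sv_channel(cluster_rate=0.05, ray_rate=1.0,
                              cluster_decay=20.0, ray_decay=5.0,
                              n_clusters=4, n_rays=6, rng=rng)
print(delays_ns.shape, delays_ns[:3])
```

Repeating the call with different seeds gives an ensemble of realizations over which receiver parameters can be evaluated.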

### 2.2 Standardization

Several standards are competing for the commercial 60 GHz communication system. The first generation of the standards such as IEEE 802.15.3c [1], WirelessHD [98], and ECMA-387 [31] were designed to support uncompressed high-definition video and high-speed data
transfers between wall-plugged devices such as set-top boxes, TVs, and kiosks [21]. Emerging
next generation standards such as IEEE 802.11ad [30] and WiGig [97] were mostly initiated
by the WLAN community as an extension of the current 802.11 WLAN systems, which are
more inclined to general data transfers between mobile devices.

WirelessHD is an effort to make a wireless replacement of HDMI that connects TVs and
set-top boxes. The consortium is supported by consumer electronics companies including
Panasonic, NEC, Samsung, LG, Sony, Philips, and Toshiba. The first standard was finalized
in January 2008, and the products compliant with the standard were commercialized with
a data rate of more than 4 Gb/s. The improved WirelessHD 1.1 was released in April 2010.
The PHY layer of the standard is based on the OFDM PHY of the IEEE 802.15.3c. Because
the standard was intended for wall-plugged devices, the products based on this standard are
known to consume relatively high power [22].

The IEEE 802.15.3c standard activity was initiated by the formation of the millimeter
wave Interest Group (mmWIG) within IEEE 802.15 WPAN in July 2003. The mmWIG
was elevated to IEEE 802.15 Study Group 3c (SG3c) in March 2004 and to IEEE 802.15
Task Group 3c (TG3c) in March 2005. The standard (802.15.3c-2009) was published and
ratified by IEEE in September 2009. The standard supports both single-carrier and OFDM
modulation for its PHY layer. The modulation coding set (MCS) of both modulations spans
from BPSK to 16-QAM with low-density parity check (LDPC) or RS error control coding.
Depending on the class, data rates from 1.5 Gb/s to 3.0 Gb/s were specified. Beamforming is
also supported in the standard. Although the standard was not widely adopted by industry
and the task group went into hibernation after the release, its basic structure was inherited
by later standards.

The Wireless Gigabit Alliance (WiGig) is an industry-driven organization formed by
PC, semiconductor and WLAN companies including AMD, Intel, Broadcom, Marvell, and Samsung. It is targeting wireless connections between handheld devices and PCs. The WiGig was created in May 2009, and its first standard was announced in December 2009. Its specification version 1.1 was released in June 2011. There is a close coordination between WiGig and the IEEE 802.11ad standard; the WiGig standard is confirmed to be the basis for the 802.11ad, which is scheduled to be finalized in December 2012 [30]. The 802.11ad is envisioned to be the next-generation WLAN following the 802.11ac, which is expected to replace the 802.11n within 1-2 years [97]. The PHY structure of WiGig standard is basically the same as that of the 802.15.3c standard with the support of both single-carrier and OFDM, which is why WiGig could finish its standardization in a short amount of time. For handheld and mobile devices, more emphasis is put on the single-carrier modulation in the standard. Prototypes are expected to go into interoperability testing in 2011, and real products to appear in 2012 [97].

The focus of this work is the single carrier modulation. Wherever possible, we attempt to conform to the single carrier PHY specification of the IEEE 802.15.3c and 802.11ad.

2.3 Implementation Issues

2.3.1 Radio frequency (RF)

In addition to the increasing demands for data traffic and more spectrum, the recent interest in commercial 60 GHz communication systems was triggered by the advance of CMOS RF technology, which is a cheaper replacement for the traditional silicon-germanium (SiGe) and gallium-arsenide (GaAs) processes. This advance is owed to device scaling, which increased the transition frequency, $f_T$, of the device to several hundreds of GHz [26]. A CMOS process that is compatible with the digital process also enables a single-chip solution that includes the RF, digital, and baseband analog circuits all on one die [72],[21],[55].

However, the implementation of CMOS RF circuits operating in the 60 GHz band presents its own unique challenges. First, additional device modeling has to be done because most device models do not support operation at frequencies as high as 60 GHz. Also, the power amplifier design is a challenge because it has to efficiently deliver large power with large devices while working at 60 GHz. Fundamentally, all the RF blocks suffer from the fact that it is hard to get enough gain from a given gain-bandwidth product that is limited by the $f_T$ of the device.

Fortunately, most of these challenges have been resolved in clever ways and there are plenty of publications that report CMOS RF solutions working well [47],[19],[17],[73],[88],[79]. As of now, it is widely accepted that the CMOS RF technology is mature enough to be commercialized.
2.3.2 Baseband (BB)

The main challenge to the baseband implementation of the 60 GHz communication system is the high symbol rate, which is unprecedented in traditional wireless communication systems. While high-speed IO links have advanced to support rates as high as 40 Gs/s, such a data rate is too high to efficiently implement the complex signal processing required for wireless communications. Figure 2.7 shows contemporary communication systems, each designed for its own operating scenario and data rate requirement. The figure shows that, as of now, the maximum data rate of a wireless system is below 100 Mb/s. To make things worse, the high symbol rate of a wireless system increases the effective delay spread and makes the equalization more complicated. If the signal processing is implemented with digital circuits, the high-speed ADC required is another challenge. Things get harder for mobile applications, because lowering the power consumption is critical for battery-powered devices. Deciding the optimal digital-analog partition that minimizes the power consumption is therefore worth investigating. Recent advances in circuit techniques and device technology make it possible to implement ADCs of up to 6 bits with several tens of mW at a sampling rate of 2 Gs/s [93],[4].

Figure 2.7: Review of communication systems.
Chapter 3

Baseband Design

As device scaling proceeds and digital signal processing techniques improve, there is a tendency for more of the functionality of a communication receiver to be implemented in digital circuits. For wireless receivers where extensive signal processing is necessary to combat the channel impairments, most of the contemporary baseband implementations are dominated by digital circuits [99],[41].

The basic elements of a digital baseband are illustrated in Figure 3.1. The most essential component of the receiver is an equalizer that erases or compensates the ISI of the multi-path channel. The equalization is done in the frequency domain in the orthogonal frequency division multiplex (OFDM) system or with frequency domain equalization (FDE) in a single carrier system. In that case, the equalizer performs both fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) operation in the receiver. The time-domain equalizer is a typical choice in a single-carrier system, however. A synchronizer estimates and compensates the frequency and timing error. The baseband also needs a channel estimator that extracts the channel information for the equalizer and synchronizer. Although blind equalization and blind channel estimation without pilot signal overhead have been researched for a long time, no blind equalizer is reported to be used in a practical wireless system [32].

In this work, it is assumed that a down-converting direct conversion mixer is included in the RF circuits. Also, the output of the baseband is assumed to be connected to a channel decoder that corrects random bit errors. Among many candidates for the channel codes, the LDPC code has been shown to fit a high-speed communication system well given its parallel nature [102].

This chapter deals with the theory and structures of the baseband elements. In section 3.1, the modulation and frame structure that are chosen for this work are discussed. Sections 3.2 and 3.3 review the algorithms and structures that can be used for the baseband functions and the architectural choices for this work. Finally, section 3.5 shows the software simulation results with the selected algorithms.
CHAPTER 3. BASEBAND DESIGN

3.1 System Overview

The modulation type deeply affects the receiver algorithms and architecture. Also, a frame structure that defines the timing allocation of the pilot signal and preamble constrains the synchronization algorithms and performance.

3.1.1 Modulation

The IEEE WPAN standard includes both the single-carrier and OFDM modulation options. While OFDM has its advantages in its relative immunity to multipath propagation, it results in higher system power. On the transmit side, the high peak-to-average ratio (PAR) of the OFDM signal requires the power-amplifier back-off to maintain its linearity. On the receive side, high-resolution ADC and FFT blocks must be operated regardless of the channel conditions because there is no way to scale down the equalization. Finally, it is generally hard to adaptively turn off the channel coding even in a mild multipath condition in an OFDM system.

Alternatively, the single-carrier receiver lends itself to reconfigurability and its power consumption scales with the actual channel conditions, which is beneficial in high-speed communication systems with a stringent power budget.

The complexity comparison of different equalization schemes is illustrated in Figure 3.2. As illustrated, OFDM consumes constant power regardless of the channel condition because it has to perform the FFT and IFFT all the time. In contrast, the linear equalizer (LE) and the decision feedback equalizer (DFE) can scale their power consumption depending on the channel conditions. From the implementation point of view, the DFE can be constructed with fewer resources than the LE would require. The theories behind the LE and DFE will be discussed in section 3.2.2 and section 3.2.3, respectively.

The FDE is another way to equalize single-carrier signals [23],[94],[89],[20]. A system with the FDE has a structure similar to an OFDM system except that the IFFT located in the transmitter of an OFDM system is moved to the receiver. Although this can relieve the transmit PAR problem of the OFDM system, the FDE still needs to perform FFT and IFFT regardless of channel conditions, which is the same problem the OFDM has in a high data-rate system. The FDE for the 60 GHz baseband might be meaningful considering the fact that the FFT engines can be useful to support both OFDM and single-carrier modulation options of the standards. However, considering the system overhead of having the FFT engine and logics required to share the data path with different data rate, it is challenging to make FDE power efficient.

BPSK and QPSK modulation are selected in our implementation to simplify the design while demonstrating the key concepts. Higher-order modulation schemes such as high-order quadrature amplitude modulation (QAM) are not considered in this work because in the NLOS channels, the required signal-to-noise ratio (SNR) often cannot be achieved even with an ideal equalizer. On the other hand, in a simple propagation condition, a very simple equalizer based on a 1-bit comparator is enough to achieve the BER performance required by the system, which will be described in section 3.2.

Figure 3.2: Complexity comparison between modulations.
3.1.2 Frame Structure

The single-carrier modulation options of emerging standards share a similar frame structure, shown in Figure 3.3. The preamble consists of the SYNC pattern with short pilots for initial synchronization of frequency and timing and the channel estimation sequence (CES) for initial channel estimation. To track the time variance of the channel and to maintain the frequency and timing synchronization, there are short pilot patterns inserted into the data (TS) [1]. Also, there is an inter-frame spacing (IFS) period specified between the preamble and the data bursts stretching several microseconds (several thousands of symbols), which is used to accommodate the latency of the initial, coarse estimators. The IFS can also be used to pre-calculate parameters that are constantly used in the data bursts, and thereby save power. This is the motivation for using distributed arithmetic (DA) in the implementation of equalizers in this work [95],[76],[74].

3.2 Equalization (EQ)

Equalization is a baseband function that compensates for the distortion of a propagation channel. The distortion presents itself as ISI in the receiver. In this sense, an equalizer is a receiver block that cancels or compensates for the ISI of the received signal [66].

3.2.1 Optimum Receiver

The optimal receiver that detects a sequence of data symbols corrupted by ISI is a maximum likelihood sequence detector (MLSD) under the maximum likelihood (ML) criterion [38]. The ML sequence detector can be implemented by a Viterbi detector, which is also used as a convolutional channel decoder. The Viterbi equalizer, which is the Viterbi detector used as an equalizer, has been a popular equalization solution for early-generation digital communication systems such as the Global System for Mobile Communications (GSM, originally Groupe Spécial Mobile).

Figure 3.4: Optimum receiver for an ISI channel and AWGN [66].

Figure 3.4 illustrates the optimum receiver for a received signal with a symbol duration, $T$, corrupted by AWGN and an ISI channel. It assumes that the data, $x_k$, is sent from a transmitter through a channel, $h(t)$. The front-end of the optimal receiver consists of a matched filter, $h^*(t)$, that maximizes the SNR after sampling, and a noise-whitening filter that whitens the noise spectrum colored by the matched filter. The whitening also eases the BER calculation in the following blocks. The whitening filter can be regarded as a precursor equalizer that eliminates precursor ISI generated by the matched filter [38]. The Viterbi detector operates as an MLSD and its output is the estimated transmit sequence, $\hat{x}_k$.

Although it is optimum in terms of performance, the optimal receiver has several problems from the implementation point of view. First, the matched filter requires accurate knowledge of the channel impulse response, $h(t)$ and presumes that the impulse response is not changing with time [38]. In practice, especially in a wireless communication system, it is a condition that can hardly be met. Even if the impulse response can be estimated using a channel estimator as discussed in section 3.3, a small estimation error or fading can deteriorate the performance significantly. Consequently, in most practical wireless communication systems, the sampler output directly feeds into an equalizer as shown in Figure 3.1.

The second problem of the optimum receiver is that the computational complexity of the Viterbi detector grows exponentially with the length of the channel delay spread. As the system assumed in this work has a channel delay of several tens of symbols, the detector is hard to realize in practice. The complexity also increases as the modulation order grows. Therefore, we need to seek suboptimal equalizers that have reasonable complexity with acceptable performance degradation.
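The exponential growth can be made concrete with the standard textbook result that a Viterbi sequence detector for an $M$-ary constellation and a channel memory of $L$ symbols needs $M^L$ trellis states. This formula is not stated in the text above, and the function name and the example memory length are assumptions chosen only to show the scale.

```python
def viterbi_states(modulation_order, channel_memory):
    """Number of trellis states for M-ary symbols and a memory of L symbols."""
    return modulation_order ** channel_memory

# Hypothetical channel memory of 20 symbols.
states_bpsk = viterbi_states(2, 20)   # BPSK
states_qpsk = viterbi_states(4, 20)   # QPSK

print(states_bpsk, states_qpsk)
```

Even at 20 symbols of memory, BPSK already requires about a million states and QPSK about a trillion, which illustrates why a memory of several tens of symbols rules out the Viterbi equalizer here.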

3.2.2 LE

A linear equalizer (LE) is a suboptimal equalizer that can be used instead of the Viterbi detector. It is modeled and implemented as a linear finite impulse response (FIR) filter [66]. Its input
is \( v_k \), which is the output of the noise-whitening filter in the case of the optimum receiver or the
sampler output in a practical receiver. Its output is the estimated signal sequence \( \hat{x}_k \) (Figure
3.5). If we express its complex-valued coefficients as \( w_m \), the input and output of the filter
are related by,

\[
\hat{x}_k = \sum_m w_m \cdot v_{k-m}. \quad (3.1)
\]
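The FIR relation above can be written out directly. The sketch below assumes a finite number of causal taps and treats samples before the start of the record as zero; the tap and input values are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def linear_equalizer(w, v):
    """Evaluate x_hat[k] = sum_m w[m] * v[k - m] for causal taps w[0..M-1]."""
    x_hat = np.zeros(len(v), dtype=complex)
    for k in range(len(v)):
        for m in range(len(w)):
            if k - m >= 0:              # samples before k = 0 are taken as zero
                x_hat[k] += w[m] * v[k - m]
    return x_hat

w = np.array([1.0, -0.5 + 0.1j, 0.2])            # hypothetical complex taps
v = np.array([1.0, -1.0, 1.0j, 1.0, -1.0j])      # hypothetical input samples

x_hat = linear_equalizer(w, v)
x_hat_ref = np.convolve(w, v)[:len(v)]           # same result via convolution
print(x_hat)
```

The explicit double loop is only for readability; the comparison with `np.convolve` shows that the equalizer is an ordinary convolution of the input with the tap vector.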

A bit error occurs when \( \hat{x}_k \) is not identical to \( x_k \). Although it would be desirable to find the \( w_m \) that
minimizes the bit error rate, the bit error rate is unfortunately a highly nonlinear function of \( w_m \). Therefore,
we seek to optimize \( w_m \) using a practical criterion. Among others, two approaches
are widely used. The first one is the zero-forcing (ZF) criterion, which minimizes the peak
distortion. The other one is the minimum mean-square error (MMSE) criterion, which minimizes the
MSE.

**ZF Criterion**

The zero forcing criterion tries to force all the ISI components of the equalizer output to
be zero. If we define the peak distortion as the worst-case ISI, the ZF criterion minimizes
the peak distortion. Let \( f_m \) represent a discrete-time equivalent impulse response of all the
filters that the transmit signal undergoes before the equalizer, which includes the transmit
pulse shaping filter, \( g(t) \), the channel impulse response, \( h(t) \) and the sampler as shown in
Figure 3.5 (it can also include the matched filter and the noise-whitening filter if we assume
an optimum receiver). To have zero ISI, the combined response \( q_m \) of this lumped impulse response
\( f_m \) and the ZF equalizer \( w_m \) has to satisfy the following condition in the time domain,

\[
q_m = \sum_i w_i \cdot f_{m-i} = \begin{cases} 
1 & \text{if } m = 0 \\
0 & \text{if } m \neq 0.
\end{cases} \quad (3.2)
\]

By taking the z-transform of both sides, we obtain,

\[ Q(z) = W(z) \cdot F(z) = 1, \quad (3.3) \]

where [59]

\[ Q(z) = \sum_{m=-\infty}^{\infty} q_m z^{-m}. \quad (3.4) \]

The \( W(z) \) and \( F(z) \) are defined similarly. Therefore, from (3.3), the equalizer response \( W(z) \) has to satisfy,

\[ W(z) = \frac{1}{F(z)}. \quad (3.5) \]

If we regard the noise-whitening filter as a part of the equalizer, the response of the extended equalizer becomes,

\[ W'(z) = \frac{1}{F(z)F^*(z^{-1})} = \frac{1}{G(z)}. \quad (3.6) \]

In both cases, it simply means that the ZF equalizer inverts the frequency response of the channel distortion. Therefore, if there is a spectral null in the channel response, the gain of the ZF equalizer at the point approaches infinity and the noise component is amplified, leading to zero SNR as illustrated in Figure 3.6, where there are spectral nulls at the edges of the signal band. This problem is called noise enhancement. To mitigate the performance degradation caused by this problem, a better criterion that takes into account the noise needs to be considered.
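Noise enhancement can be seen by evaluating (3.5) on the unit circle for a channel with a deep spectral notch. In the sketch below the two-tap channel, the FFT size, and the variable names are illustrative assumptions; the channel is chosen so that its response nearly vanishes at one frequency.

```python
import numpy as np

f = np.array([1.0, 0.95])     # hypothetical channel, |F| = 0.05 at omega = pi
n_fft = 64
F = np.fft.fft(f, n_fft)      # channel response sampled on the unit circle

W_zf = 1.0 / F                # zero-forcing response, as in (3.5)

combined = W_zf * F                          # equals 1 at every frequency, as in (3.3)
peak_gain = np.max(np.abs(W_zf))             # largest gain, found at the notch
noise_gain = np.mean(np.abs(W_zf) ** 2)      # white-noise power amplification

print(peak_gain, noise_gain)
```

The equalizer gain reaches 20 at the notch frequency, and the average noise power gain is well above one, even though the combined response is perfectly flat. A channel with an exact null would drive both quantities to infinity.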

**MMSE Criterion**

In the MMSE criterion, the equalizer coefficients $w_m$ are optimized to minimize the mean square error (MSE) of the estimated signal, which includes the noise [66],

$$\varepsilon_k = \hat{x}_k - x_k. \quad (3.7)$$

The cost function $J(w)$ can be expressed as,

$$J = E[|\varepsilon_k|^2] = E[|\hat{x}_k - x_k|^2], \quad (3.8)$$

where $E[\cdot]$ denotes the expected value. By the orthogonality principle [28] and (3.1), the solution that minimizes $J(w)$ has to satisfy,

$$E[\varepsilon_k v_{k-l}^*] = 0, \quad -\infty < l < \infty \quad (3.9)$$

$$E\left[\left(\sum_m w_m \cdot v_{k-m} - x_k\right) v_{k-l}^*\right] = 0. \quad (3.10)$$

This leads to,

$$\sum_m w_m E[v_{k-m} v_{k-l}^*] = E[x_k v_{k-l}^*]. \quad (3.11)$$

Because the noise is whitened by the noise-whitening filter, the expectation on the left-hand side can be written as,

$$E[v_{k-m} v_{k-l}^*] = \sum_n f_n^* f_{n+l-m} + N_0 \delta(l-m), \quad (3.12)$$

where $\delta(\cdot)$ is the Kronecker delta function and $N_0$ is the noise spectral density. We assume that the transmit signal and the channel response are uncorrelated in our system model (no transmit precoding). Also, because it can be assumed that the signal is equi-probable ($E[x_k] = 0$) and the noise has zero mean ($E[n_k] = 0$), the right-hand side of (3.11) becomes,

$$E[x_k v_{k-l}^*] = f_{-l}^*. \quad (3.13)$$

Substituting (3.12) and (3.13) into (3.11) and taking the z-transform, we obtain

$$W(z) \left(F(z)F^*(z^{-1}) + N_0\right) = F^*(z^{-1})$$

$$W(z) = \frac{F^*(z^{-1})}{F(z)F^*(z^{-1}) + N_0}. \quad (3.14)$$

Similar to the ZF case, if we include the noise-whitening filter as a part of the equalizer, the extended response of the equalizer becomes,

$$W'(z) = \frac{1}{F(z)F^*(z^{-1}) + N_0} = \frac{1}{G(z) + N_0}. \quad (3.15)$$

By comparing (3.15) and (3.6), we can see that the only difference between the two solutions is the \( N_0 \) term in the denominator. This makes sense because, as the noise power approaches zero, the noise enhancement becomes negligible and the MMSE solution should converge to the ZF solution.

In a practical implementation as shown in Figure 3.1, where there is neither a matched filter nor a noise-whitening filter, we can also lump the pulse shaping filter into the channel impulse response \( h(t) \), so that \( F(z) = H(z) \). By taking the discrete Fourier transform rather than the z-transform, the discrete-time equivalent response of the MMSE equalizer coefficients can be expressed as,

\[
W_n = \frac{H_n^*}{|H_n|^2 + N_0},
\]

where \( h_m \) is the discrete-time equivalent response of \( h(t) \) and \( N \) is the FFT block size. \( W_n \) and \( H_n \) are the discrete Fourier transforms of \( w_m \) and \( h_m \), respectively, i.e.,

\[
W_n = \sum_{m=0}^{N-1} w_m \exp\left(-j\frac{2\pi mn}{N}\right)
\]

\[
H_n = \sum_{m=0}^{N-1} h_m \exp\left(-j\frac{2\pi mn}{N}\right)
\]

Again, it can be verified that the solution matches with the ZF solution when there is no noise.
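The frequency-domain MMSE coefficients and their ZF limit can be checked with a short sketch. It assumes a hypothetical three-tap channel and an 8-point DFT, both chosen only for illustration, and computes \( W_n = H_n^* / (|H_n|^2 + N_0) \) bin by bin.

```python
import cmath
import math

def dft(samples, n_fft):
    """N-point DFT of `samples`, implicitly zero-padded to n_fft."""
    return [
        sum(x * cmath.exp(-2j * math.pi * m * n / n_fft) for m, x in enumerate(samples))
        for n in range(n_fft)
    ]

def mmse_coefficients(h, n0, n_fft):
    """W_n = conj(H_n) / (|H_n|^2 + N0) for every frequency bin."""
    return [value.conjugate() / (abs(value) ** 2 + n0) for value in dft(h, n_fft)]

h = [1.0, 0.5, 0.2]   # hypothetical channel taps (no zero on the unit circle)
H = dft(h, 8)
W_mmse = mmse_coefficients(h, n0=0.1, n_fft=8)
W_zf = mmse_coefficients(h, n0=0.0, n_fft=8)

# Equalized response per bin: equal to 1 for N0 = 0, below 1 for N0 > 0.
print([round(abs(w * hn), 3) for w, hn in zip(W_zf, H)])
print([round(abs(w * hn), 3) for w, hn in zip(W_mmse, H)])
```

With \( N_0 = 0 \) the product \( W_n H_n \) equals one in every bin, which is the ZF solution. With \( N_0 > 0 \) the equalizer backs off, most strongly in the bins where \( |H_n| \) is small.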

**Fractionally Spaced Equalizer (FSE)**

In the receiver structures discussed so far, it was assumed that the received signal is sampled at the symbol rate and that the synchronization is ideal. A shortcoming of an equalizer with symbol-rate sampling, however, is that it is sensitive to errors in the timing recovery. One way to see the problem is to express the sampler output \( y_k \) in the frequency domain with a sampling phase error \( \tau_0 \) [66]:

\[
Y(f) = \frac{1}{T} \sum_n G\left(f - \frac{n}{T}\right) \exp\left(j2\pi(f - \frac{n}{T})\tau_0\right),
\]

where \( T \) represents the symbol period. In this expression, each aliased component (\( n \neq 0 \)) carries the additional phase factor \( \exp\left(-j2\pi n\tau_0/T\right) \), which depends on the sampling phase error; this is the term that cannot be compensated by a symbol-rate equalizer. This problem is illustrated in Figure 3.7(a), which shows the aliasing problem when the signal is sampled at the rate of \( \frac{1}{T} \).

One way to deal with this problem is to control the out-of-band component tightly in the transmit pulse shaping filter. Increasing the sampling rate in the receiver is another way to mitigate the problem. Figure 3.7(b) illustrates the frequency response when the signal is sampled at the rate of \( \frac{M}{N} \cdot \frac{1}{T} \) where \( M > N \). The equalizer that works at this oversampled rate, the fractionally spaced equalizer (FSE), can therefore compensate for the sampling phase error and shows a performance improvement [11],[66].

The price to be paid, however, is the additional power consumption needed to increase the operating frequency of the ADC and the digital signal processing circuits. Also, for a given number of taps, increasing the sampling rate shortens the time span that the filter covers. In other words, the FSE effectively decreases the maximum excess delay that the equalizer can compensate. Therefore, the adoption of the FSE needs to be carefully evaluated considering the trade-off between the additional resources necessary for the FSE and the complexity reduction of the timing recovery block; if a better timing recovery block can be implemented with fewer resources than the FSE needs, the FSE might not be the optimal solution.

3.2.3 DFE

The decision feedback equalizer consists of a linear equalizer for its feedforward part and a feedback part that uses the decision output to erase the post-cursor components of the ISI. A conventional DFE that has an $A$-tap feedforward filter and a $B$-tap feedback filter is
illustrated in Figure 3.8. The estimated signal can be expressed as,

\[ \tilde{x}_k = \sum_{m=1}^{A} w_{ff,m} v_{k-m} - \sum_{m=1}^{B} w_{fb,m} \hat{x}_{k-m}. \quad (3.19) \]

The analysis of this structure is somewhat involved because the feedback loop makes the DFE operation nonlinear. However, it has been proven that the MMSE solution that minimizes the cost function (3.8) can be calculated [66]: First, the feedforward filter coefficients are basically the MMSE solution of the pre-cursor components. If we represent the z-transform of the pre-cursor lumped response, \( f_k \) as \( F_{pre}(z) \),

\[ F_{pre}(z) = \sum_{m=1}^{A} f_m z^{-m}, \quad (3.20) \]

the feedforward filter solution becomes similar to (3.14) as,

\[ W_{ff}(z) = \frac{F_{pre}^*(z^{-1})}{F_{pre}(z) F_{pre}^*(z^{-1}) + N_0}. \quad (3.21) \]

The \( w_{fb,m} \) can be obtained after the feedforward filter is calculated by taking convolution of the precursor lumped response and the feedforward coefficients, \( w_{ff,m} \),

\[ w_{fb,k} = \sum_{m=1}^{A} w_{ff,m} f_{k-m}, \quad k = 1, 2, \ldots, B. \quad (3.22) \]

By implementing part of the equalizer in the feedback path, the DFE structure can be implemented with reduced complexity because the input of the feedback filter is the hard decision output, $\hat{x}_k$, which has a limited number of levels, especially when the modulation order is not high. A look-up table based implementation [95] is a good candidate for reducing the power consumption of the feedback filter; its implementation is shown in Chapter 5. The computational complexity of obtaining the coefficients can be reduced by utilizing the DFT and IDFT in the frequency domain [39]. Another way to reduce the complexity is to move the feedforward filter into the feedback loop [7], [24] (Figure 3.9). By relocating the feedforward filter, it is not necessary to perform the convolution of (3.22) for the feedback coefficients; the post-cursor components of the precursor lumped response ($f_{A+1}$, $f_{A+2}$, ..., $f_{A+B}$) can be directly used as the coefficients. In a real system, the post-cursor response can be obtained by channel estimation, as will be introduced in section 3.3.

Figure 3.9: Reduced-complexity DFE structure.
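The idea of using the post-cursor taps directly as feedback coefficients can be sketched as follows. This is a simplified illustration rather than the structure of Figure 3.9: it assumes BPSK symbols, a noiseless hypothetical channel with no pre-cursor ISI (so the feedforward part reduces to a unit gain), and perfect knowledge of the post-cursor taps.

```python
def dfe_detect(received, feedback_taps):
    """Slice BPSK symbols after subtracting post-cursor ISI rebuilt from past decisions."""
    decisions = []
    for sample in received:
        isi = sum(
            tap * decisions[-(m + 1)]
            for m, tap in enumerate(feedback_taps)
            if m < len(decisions)
        )
        decisions.append(1 if sample - isi >= 0 else -1)
    return decisions

channel = [1.0, 0.7, 0.5]   # hypothetical: main tap followed by two post-cursor taps
symbols = [1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1]

# Noiseless received samples v_k = sum_m f_m x_{k-m}
received = [
    sum(f * symbols[k - m] for m, f in enumerate(channel) if k - m >= 0)
    for k in range(len(symbols))
]

with_feedback = dfe_detect(received, channel[1:])   # post-cursor taps as coefficients
slicer_only = dfe_detect(received, [])              # no equalization at all

print(with_feedback == symbols)
print(slicer_only == symbols)
```

Because the combined post-cursor magnitude in this example exceeds the main tap, a plain slicer makes errors, while the feedback path removes the ISI completely as long as the past decisions are correct.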

The coefficients can also be obtained by adaptive algorithms such as least mean square (LMS) and recursive least square (RLS). However, it has been reported that the convergence time of those adaptive algorithms increases as the number of equalizer taps grows. The convergence time also depends on the eigenvalue spread, or the condition number, of the channel response, which increases the uncertainty of the communication link. Generally, the non-recursive approach of obtaining the equalizer coefficients by estimating the channel has been shown to have a performance advantage over the adaptive equalization method [39]. Therefore, for the 60 GHz baseband application, a non-adaptive equalizer with a channel estimator was chosen for this work, as will be shown in Chapter 5.

A drawback of the DFE-based receiver is a problem known as error propagation. If a decision error is made in the slicer of the DFE, the error propagates through the delay line of the feedback filter and causes more errors. Although the DFE has a performance advantage over the LE in the absence of error propagation, the problem causes a BER performance degradation of 1 to 2 dB. In a practical implementation, the complexity and performance trade-off between the DFE and LE has to be carefully balanced. However, in high-speed applications such as the 60 GHz baseband, the significance of the complexity reduction of the DFE easily outweighs the cost of its BER degradation by the error propagation. The forward error correction (FEC) capability of the channel decoder also helps to relieve this issue.

### 3.3 Channel Estimation (CE)

Channel estimation is a baseband function that estimates an impulse response of the propagation channel so that equalization and synchronization parameters can be properly adjusted to minimize the power consumption while maintaining a target performance. There are several types of algorithms that are suitable for the wireless channel of interest.

#### 3.3.1 Least Square (LS) CE

In the least square (LS) approach, the estimator attempts to minimize the squared difference between the received training signal and the assumed noiseless signal [34],[18]. Assume that there is a training sequence, \( a = [a(0), a(1), \cdots, a(N - 1)] \) of length \( N \), and the channel has an \( L \)-tap impulse response denoted as \( h = [h(0), h(1), \cdots, h(L - 1)] \). The received training signal, \( r \), can be represented as [9],

\[
r =
\begin{bmatrix}
    a(L-1) & a(L-2) & \cdots & a(0) \\
    a(L) & a(L-1) & \cdots & a(1) \\
    \vdots & \vdots & \ddots & \vdots \\
    a(N-1) & a(N-2) & \cdots & a(N-L)
\end{bmatrix}
\begin{bmatrix}
    h(0) \\
    h(1) \\
    \vdots \\
    h(L-1)
\end{bmatrix}
+ n \triangleq A \cdot h + n, \tag{3.23}
\]

where \( n \) denotes additive noise. The LS estimation of \( h \) can be found by minimizing the squared error,

\[
\begin{aligned}
J(h) &= (r - A \cdot h)^T (r - A \cdot h) \\
&= r^T r - 2r^T A h + h^T A^T A h.
\end{aligned} \tag{3.24}
\]

The minimum \( J(h) \) can be found by taking a gradient of the function as,

\[
\frac{\partial J(h)}{\partial h} = -2A^T r + 2A^T A h. \tag{3.25}
\]

Setting it to be zero leads to the LS estimation of the channel,

\[
\hat{h} = (A^T A)^{-1} A^T r. \tag{3.26}
\]

The factor \( (A^T A)^{-1} A^T \) on the right-hand side depends only on the training sequence, so it can be precomputed and stored for on-line use, which is convenient when the sequence is not very long. For example, in the GSM system, the LS CE is a popular solution for the channel estimation since \( N \) is only 26. However, as the channel length to be estimated grows, it becomes necessary to have a longer training sequence. In our application, where the channel length \( L \) can reach up to 100 and \( N \) is several hundred, the LS CE is not a suitable solution.
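The LS estimate can be illustrated with a small numerical sketch. The training sequence and the three-tap channel below are hypothetical, the received signal is noiseless, and the normal equations \( (A^T A)\hat{h} = A^T r \) are solved by plain Gaussian elimination so that no external library is assumed.

```python
def convolution_matrix(a, num_taps):
    """Rows [a(k), a(k-1), ..., a(k-L+1)] for k = L-1 .. N-1."""
    return [[a[k - l] for l in range(num_taps)] for k in range(num_taps - 1, len(a))]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def ls_estimate(a, r, num_taps):
    """h_hat = (A^T A)^{-1} A^T r, obtained from the normal equations."""
    A = convolution_matrix(a, num_taps)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(num_taps)]
           for i in range(num_taps)]
    Atr = [sum(row[i] * rk for row, rk in zip(A, r)) for i in range(num_taps)]
    return solve(AtA, Atr)

# Hypothetical +/-1 training sequence and channel, for illustration only
a = [1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0]
h = [0.9, 0.4, -0.2]
A = convolution_matrix(a, len(h))
r = [sum(x * hl for x, hl in zip(row, h)) for row in A]   # noiseless r = A h

h_hat = ls_estimate(a, r, len(h))
print([round(v, 6) for v in h_hat])
```

In a hardware implementation the matrix \( (A^T A)^{-1} A^T \) would be precomputed, which is where the memory cost that grows with the sequence length comes from.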

### 3.3.2 Correlation Based CE

While the LS CE does not assume anything about the training sequence, and therefore an arbitrary pattern can be used as the sequence, the computational complexity can be reduced by utilizing useful mathematical properties of some sequences. For example, an m-sequence, \( c(i) \) of length \( N \), has the following autocorrelation property [41]:

\[
\rho_c(k) = \frac{N + 1}{N} \delta(k) - \frac{1}{N}
= \begin{cases} 
1 & \text{if } k = 0 \\
-\frac{1}{N} & \text{if } k \neq 0,
\end{cases} \quad (3.27)
\]

where

\[
\rho_c(k) = \frac{1}{N} \sum_{i=0}^{N-1} c(i)c(i + k), \quad 0 \leq k \leq N - 1, \quad (3.28)
\]

and \( \delta(k) \) is the Kronecker delta function. The autocorrelation function \( \rho_c(k) \) is illustrated in Figure 3.10. Assume that the propagation channel is a linear time-invariant (LTI) system and that the m-sequence is sent through the channel \( h \) with \( L \) taps. The received signal can be represented as,

\[
r(n) = \sum_{l=0}^{L-1} h(l) \cdot c(n - l). \quad (3.29)
\]

If we calculate the correlation between the received signal and the m-sequence, we can get an estimated channel impulse response, $\hat{h}$, as,

$$
\begin{aligned}
\hat{h}(n) &= \frac{1}{N} \sum_{i=0}^{N-1} r(n+i) \cdot c(i) \\
&= \frac{1}{N} \sum_{i=0}^{N-1} \left( \sum_{l=0}^{L-1} h(l) \cdot c(n+i-l) \right) c(i) \\
&= \sum_{l=0}^{L-1} h(l) \left( \frac{N+1}{N} \delta(n-l) - \frac{1}{N} \right) \\
&= \frac{N+1}{N} h(n) - \frac{1}{N} \sum_{l=0}^{L-1} h(l),
\end{aligned} \quad (3.30)
$$

which shows that the impulse response can be estimated with a DC offset and a scaling factor that can be minimized by increasing the length of the sequence, $N$.
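The scaling factor and DC offset of the correlation based estimate can be reproduced with a short sketch. It assumes a length-31 m-sequence generated by a five-stage LFSR, periodic transmission of the sequence, a hypothetical three-tap channel, and a correlator normalized by \( 1/N \); none of these values are taken from this work.

```python
def m_sequence(degree, taps):
    """One period (2**degree - 1 chips) of a +/-1 m-sequence from a Fibonacci LFSR."""
    state = [1] * degree
    chips = []
    for _ in range(2 ** degree - 1):
        chips.append(1 - 2 * state[-1])      # bit 0 -> +1, bit 1 -> -1
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return chips

def correlate_estimate(received, chips, num_taps):
    """h_hat(n) = (1/N) * sum_i r(n+i) c(i), with indices taken modulo N."""
    n_chips = len(chips)
    return [
        sum(received[(n + i) % n_chips] * chips[i] for i in range(n_chips)) / n_chips
        for n in range(num_taps)
    ]

chips = m_sequence(degree=5, taps=(5, 3))   # taps chosen to give a maximal-length sequence
N = len(chips)
h = [1.0, 0.5, -0.25]                        # hypothetical channel taps

# Periodic transmission: r(n) = sum_l h(l) c(n - l), indices modulo N
received = [sum(h[l] * chips[(n - l) % N] for l in range(len(h))) for n in range(N)]

h_hat = correlate_estimate(received, chips, len(h))
expected = [(N + 1) / N * tap - sum(h) / N for tap in h]
print([round(v, 4) for v in h_hat])
print([round(v, 4) for v in expected])
```

Both printed lists agree, and the deviation from the true taps shrinks as \( N \) grows.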

The m-sequence can be generated by a primitive polynomial over the finite field GF(2) [44],[2],

$$G(X) = g_0 + g_1 X + \cdots + g_{m-1} X^{m-1} + g_m X^m. \quad (3.31)$$

This polynomial can be implemented by a linear feedback shift register (LFSR) in either the Galois or the Fibonacci form shown in Figure 3.11. Although both can be implemented in hardware with relatively low complexity, the Galois implementation is preferred for applications with a low-latency requirement because it avoids the deep logic that the Fibonacci implementation needs for its cascaded adders (Figure 3.11(a)).

Figure 3.12: 4-way parallelized implementation of PRBS31.

Figure 3.13: Golay correlator structure [65].

The LFSR can be parallelized for high-speed applications that need a parallelized data path as shown in Figure 3.12, which shows the implementations of $G(X) = X^{31} + X^{28} + 1$, PRBS31 [80].

The m-sequence can also be used as a random number generator that mimics data traffic. In fact, the parallelized LFSR shown in Figure 3.12 is used as the data source for the bit error rate test (BERT) of the baseband implementation introduced in Chapter 5.
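A bit-serial software model of such a generator is sketched below. It is not the parallelized hardware of Figure 3.12; it assumes the common shift-left Fibonacci convention for a trinomial \( X^{degree} + X^{tap} + 1 \), and the function names are illustrative. The shorter PRBS7 polynomial \( X^7 + X^6 + 1 \) is used to check the period, since the PRBS31 period is too long to enumerate in an example.

```python
def prbs_bits(degree, tap, seed, count):
    """Fibonacci LFSR for G(X) = X^degree + X^tap + 1; returns `count` output bits."""
    mask = (1 << degree) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(count):
        new_bit = ((state >> (degree - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        bits.append(new_bit)
    return bits

def count_bit_errors(reference, received):
    """BERT-style comparison of a received stream against the known PRBS."""
    return sum(1 for ref, rx in zip(reference, received) if ref != rx)

prbs7 = prbs_bits(degree=7, tap=6, seed=1, count=254)     # two periods of PRBS7
prbs31 = prbs_bits(degree=31, tap=28, seed=0x7FFFFFFF, count=64)

corrupted = prbs31[:]
corrupted[10] ^= 1                                        # inject a single bit error
print(prbs7[:127] == prbs7[127:], count_bit_errors(prbs31, corrupted))
```

Because the receiver can regenerate the same sequence locally, bit errors can be counted without storing the transmitted data.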

The computational complexity of the pseudo-random (PN) correlator can be reduced if complementary sequences such as Golay and Chu sequences are used. The Golay sequence correlator has lower complexity because it can be implemented by a pulse compressor as shown in Figure 3.13 [25],[65],[13]. The 60 GHz standards under development adopt the Golay sequence for their CES and training sequence (TS). The properties of the Golay sequence and the high-speed implementation of the Golay correlator will be discussed in Chapter 5.
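The complementary property that makes Golay pairs attractive can be verified with a few lines. The sketch below uses the textbook concatenation construction, which is an assumption made for illustration and does not necessarily produce the exact sequences specified in the 60 GHz standards.

```python
def golay_pair(order):
    """Build a complementary pair of length 2**order by recursive concatenation."""
    a, b = [1], [1]
    for _ in range(order):
        a, b = a + b, a + [-x for x in b]
    return a, b

def aperiodic_autocorr(seq, lag):
    """Aperiodic autocorrelation of `seq` at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))

a, b = golay_pair(7)   # length-128 pair
sums = [aperiodic_autocorr(a, k) + aperiodic_autocorr(b, k) for k in range(len(a))]
print(sums[0], max(abs(s) for s in sums[1:]))
```

The sidelobes of the two autocorrelations cancel exactly, so the summed correlator output is a single peak of height \( 2N \) at zero lag.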

When one designs a training sequence to estimate the channel impulse response with the correlation based CE within the data stream, a prefix and a suffix need to be attached before and after the sequence. This makes the environment that the correlator sees the same for all data bits. Otherwise, the channel estimation results are corrupted by correlation values from data rather than the pilot. This principle has been applied to the CES of the 60 GHz and GSM standards [97],[1]. Figure 3.14 shows the preamble structure of the draft IEEE 802.15.3c standard. The cyclic postfix \((a_{post}, b_{post})\) and cyclic prefix \((a_{pre}, b_{pre})\) are attached to the main sequence \(a_{128}, b_{128}, a_{256}, b_{256}\).

The comparison between the LS, PN correlation based, and Golay correlator based CE algorithms discussed so far is summarized in Table 3.1. The 60 GHz system adopts the Golay correlator mostly owing to its stringent hardware constraints.

### Table 3.1: Comparison between CE algorithms.

<table>
<thead>
<tr>
<th></th>
<th>Computation</th>
<th>MSE</th>
<th>Hardware</th>
</tr>
</thead>
<tbody>
<tr>
<td>LS</td>
<td>\(O(N^2)\)</td>
<td>small</td>
<td>\(N^2\) memory</td>
</tr>
<tr>
<td>PN correlator</td>
<td>\(O(N^2)\)</td>
<td>medium</td>
<td></td>
</tr>
<tr>
<td>Golay correlator</td>
<td>\(O(N \log_2(N))\)</td>
<td>large</td>
<td></td>
</tr>
</tbody>
</table>

Figure 3.14: Preamble of 802.15.3c [33].

### 3.3.3 Adaptive CEs

An adaptive filter can be used to estimate the channel. This application of an adaptive filter is a well-known modeling and system identification problem that is also important in control and signal processing systems such as geophysical exploration, as well as communication systems [96].

The application of the adaptive algorithms in the DFE was briefly discussed in section 3.2.3. The LE coefficients can also be derived using an adaptive algorithm. Even if we use adaptive equalizers, however, the channel estimation has to be considered separately because it serves purposes other than the calculation of equalizer coefficients, such as providing information for synchronization and beamforming.

Although there are plenty of variants of the adaptation algorithms, the LMS and RLS algorithms are the most frequently used ones, and the LMS algorithm is virtually the only adaptive algorithm that can be used in the high-speed communication system we are targeting, considering the power consumption and the complexity of implementation.

The channel impulse response can be derived by training an FIR model of the channel using a known transmit signal. The known signal can also be obtained from decision-directed operation. A diagram of the adaptation is illustrated in Figure 3.15, which is essentially the same as a diagram for system identification [96].

In this work, however, the adaptive CE is not considered because of its long convergence time and the uncertainty of convergence for a channel with a large delay spread or low SNR, which are the same reasons that the adaptive DFE is not used in this work.
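For reference, the system identification view of an adaptive CE can be sketched with the LMS update. The training sequence, channel taps, and step size below are hypothetical, and the received signal is noiseless; the sketch only shows how an FIR model is trained against a known transmit signal, and it is not part of the receiver adopted in this work.

```python
import random

def lms_identify(training, received, num_taps, step_size):
    """Adapt an FIR channel model so that its output tracks the received samples."""
    weights = [0.0] * num_taps
    for k in range(num_taps - 1, len(training)):
        window = [training[k - l] for l in range(num_taps)]
        error = received[k] - sum(w * x for w, x in zip(weights, window))
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
    return weights

rng = random.Random(1)
training = [rng.choice([-1.0, 1.0]) for _ in range(500)]   # hypothetical known symbols
h = [0.8, 0.5, -0.3]                                        # hypothetical channel taps
received = [
    sum(h[l] * training[k - l] for l in range(len(h)) if k - l >= 0)
    for k in range(len(training))
]

h_hat = lms_identify(training, received, num_taps=len(h), step_size=0.05)
print([round(w, 4) for w in h_hat])
```

A smaller step size or a longer channel model would need more training symbols to reach the same accuracy, which reflects the convergence time concern mentioned above.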

### 3.4 Synchronization

Baseband synchronization refers to the estimation and compensation of the frequency and timing error between a transmitter and receiver. The frequency recovery block estimates the frequency error coming from the difference of oscillator frequencies of the transceiver. The timing recovery estimates the best phase of the sampling clock so that the SNR of the sampler can be maximized [51],[50],[53],[38],[3],[101],[84].

There are a variety of ways to achieve the synchronization, which strongly depends on the modulation, channel characteristics, and system requirement [5]. The simplest way to achieve the synchronization is to send a clock separate from the data stream. Although it is a commonly used practice in the high-speed IO link [14] and broadcasting systems [99], most of the wireless systems cannot afford it because of the high cost for the additional bandwidth needed.

On the other extreme, there are synchronization techniques that do not rely on a separate clock or pilot signal. For frequency synchronization, decision-directed frequency recovery [38] estimates the phase rotation caused by the frequency error based on decision values, similar to the DFE. This method, however, does not work well in a channel with large ISI components or in the low-SNR, high-BER range. For timing synchronization, the spectral-line method [38] is another blind algorithm that extracts a timing tone by bandpass filtering. The problem with this method in our target application is that the oversampling it requires is too expensive.

Considering the system constraints of our application (large ISI components, expensive oversampling), the synchronization algorithms that make the best use of the pilot signal (TS) specified in most of the 60 GHz standards [97],[1] are summarized in the following sections 3.4.1 and 3.4.2.

### 3.4.1 Frequency Error Estimation

The frequency difference between a transmitter and receiver can be estimated by measuring the phase difference between two symbols with a known temporal difference. In case of the signal being corrupted by ISI and noise, the channel estimation output can be used for the phase measurement instead of the data symbols. In the 60 GHz standards, TS signals are inserted periodically within the data stream, which can be utilized to estimate the impulse response (Figure 3.16). Assuming that the channel profile is not changing rapidly so that a tap with the maximum amplitude represents the same propagation path, the phase difference of the taps, $\Delta \theta$ can be easily translated into a frequency error, $\Delta f$ as,

$$
\Delta f = \frac{1}{2\pi} \frac{\Delta \theta}{\Delta t} = \frac{1}{2\pi} \cdot \frac{f_{sym}}{N_{slot}} \cdot \Delta \theta,
$$

where $f_{sym}$ is the symbol rate of the system and $N_{slot}$ denotes the number of symbols in a slot including the TS and data. An additional benefit of using the maximum-amplitude taps is that we can maximize the SNR and minimize the variance of the estimation. This scheme, of course, assumes that the phase difference is less than $2\pi$, which can be ensured by initial frequency synchronization utilizing the SYNC sequence in the preamble [97],[1] (Figure 3.14).
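The translation from the phase difference of the peak channel taps to a frequency error can be sketched as follows. The symbol rate, slot length, and tap values are hypothetical; note also that `cmath.phase` returns a value in \( [-\pi, \pi] \), so this particular sketch is unambiguous only while the magnitude of the phase difference stays below \( \pi \).

```python
import cmath
import math

def frequency_error(tap_prev, tap_curr, f_sym, n_slot):
    """Delta f = (1 / (2*pi)) * (f_sym / N_slot) * Delta theta between two peak taps."""
    delta_theta = cmath.phase(tap_curr * tap_prev.conjugate())
    return delta_theta * f_sym / (2 * math.pi * n_slot)

f_sym = 2e9            # hypothetical symbol rate in Hz
n_slot = 512           # hypothetical number of symbols per slot
true_offset = 100e3    # hypothetical frequency error in Hz

# Peak tap of two consecutive channel estimates, rotated by the frequency error
tap_prev = 0.8 * cmath.exp(0.3j)
tap_curr = tap_prev * cmath.exp(2j * math.pi * true_offset * n_slot / f_sym)

estimate = frequency_error(tap_prev, tap_curr, f_sym, n_slot)
print(round(estimate, 3))
```

Multiplying by the conjugate of the earlier tap removes the unknown absolute phase of the propagation path, so only the rotation accumulated over one slot remains.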

Figure 3.17 shows the simulated performance of the frequency error estimator in different channel environments. The mean and variance are calculated from 100 measurements over 101 slots. The mean of the estimation in Figure 3.17(a) shows that the estimation is unbiased even in channel conditions with severe ISI components. The $\pm 50$ ppm limitation is specified in the standard and reflected in the length of the slot: if the slot gets too long, the phase difference can exceed $2\pi$, which prevents a correct frequency error estimation. The variance of the estimation is limited by the channel SNR and the quantization noise from the ADC. Figure 3.17(b) shows gradual degradation of the variance as the SNR and ADC resolution decrease. Averaging can be performed to reduce the variance further. Because the averaging increases the latency of the frequency recovery loop (Figure 3.1), the level of averaging has to be carefully tuned to ensure the stability of the loop while maintaining the frequency synchronization.

The 60 GHz standards specify that the data stream can be sent without TS depending on negotiation between the transmitter and the receiver [1]. When the channel has a strong LOS path and high SNR, the TS could be omitted and the frequency error estimation can be done in the decision-directed mode.

### 3.4.2 Timing Estimation

The timing recovery ensures that the phase of the ADC sampling clock is tuned to maximize the SNR of the sampled signal. Choosing a clock phase other than the optimal one can degrade SNR significantly because there is a pulse shaping filter and channel bandwidth limitation that shape the received pulse [51]. As shown in Figure 3.18, the sampling clock with the phase $\phi_0$ samples the maximum signal value with the maximum SNR while sampling with other phases degrades the SNR [38], [51].

Estimating the best sampling phase or timing error in the symbol-rate sampling receiver assumed in this work is not a trivial task. Given that we can utilize the TS in our target application, one way to find the best sampling phase is to interpolate the TS, calculate the correlation at different phases, and select the phase that has the maximum correlation value [5]. As illustrated in Figure 3.19(a), the method requires an interpolation filter that inserts intermediate samples between the TS samples taken at the symbol rate. It also requires as many correlators as the number of phases to be searched. All these hardware resources are expensive, especially in a system with a high data rate such as our application working at 2 Gb/s.

One way to find the best sampling phase with reduced hardware is to evaluate one phase at a time and sweep the phase in different slots as illustrated in Figure 3.19(b). Despite its hardware reduction, this method assumes that the channel is static, and the timing recovery loop can endure the additional latency. A structure of a clock generator that can switch the phase in a short amount of time, which is necessary in this method, can be implemented by a phase-locked loop (PLL) with ring-oscillator based voltage controlled oscillator (VCO) as illustrated in Figure 3.22. Transient behavior of the clock generator and its interaction with the frequency recovery loop have to be carefully examined in the design phase.
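The phase sweep can be illustrated with a toy model. The sketch below assumes a hypothetical triangular received pulse, a static channel, a short hypothetical training sequence, and four candidate clock phases; the true optimum phase is a simulation parameter that the search does not use directly. Each candidate phase is evaluated with a single correlator, as it would be in successive slots.

```python
def pulse(t):
    """Hypothetical triangular received pulse, two symbol periods wide (t in units of T)."""
    return max(0.0, 1.0 - abs(t))

def received_sample(symbols, k, timing_error):
    """Sample for symbol k when the clock is off by timing_error (in units of T)."""
    return sum(a * pulse(k - m + timing_error) for m, a in enumerate(symbols))

def correlation_metric(symbols, timing_error):
    """Magnitude of the zero-lag correlation between the samples and the known TS."""
    return abs(sum(received_sample(symbols, k, timing_error) * a
                   for k, a in enumerate(symbols)))

def sweep_phases(symbols, candidate_phases, optimum_phase):
    """Evaluate one candidate phase at a time and keep the one with the largest metric."""
    metrics = {p: correlation_metric(symbols, p - optimum_phase) for p in candidate_phases}
    return max(metrics, key=metrics.get), metrics

ts = [1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1]   # hypothetical TS
chosen, metrics = sweep_phases(ts, candidate_phases=[0.0, 0.25, 0.5, 0.75], optimum_phase=0.5)
print(chosen, metrics)
```

Sampling away from the pulse peak both lowers the desired term and adds ISI from the neighboring symbol, so the correlation magnitude is largest at the optimum phase.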

The mean and variance of the timing error estimator with the pilot based correlator are illustrated in Figure 3.20. The timing phase is estimated out of four candidate phases in different channel conditions. The number of phases to be searched could be tuned in the design phase depending on specific implementation parameters such as the pulse shaping filter characteristics. Figure 3.20(a) shows that the estimator is unbiased throughout different channel conditions. Figure 3.20(b) shows that the variance of the estimator depends on the ADC quantization noise, the channel SNR, and the channel conditions. The variance gets worse when the channel has large ISI components.

The impact of the number of pilots on the timing error estimator is shown in Figure 3.21. While Figure 3.20 is from the case with 256 pilot symbols per slot, Figure 3.21(a) shows that 32 pilot symbols per slot also work well in a high ISI environment. The timing error estimator begins to degrade as the number of pilots decreases to 16 symbols in a high ISI channel, as shown in Figure 3.21(b). Therefore, considering the characteristics of the synchronization blocks, the number of pilot symbols in a slot needs to be carefully negotiated between the transmitter and receiver depending on the channel condition. If the channel has a strong LOS path or is in a high SNR range, the timing recovery can be omitted or performed in the decision-directed mode without the need for a TS, similar to the frequency error estimator.

### 3.4.3 Synchronization Error Recovery by ADC clock adjustment

The compensation of the frequency and timing error can be done by adjusting the frequency and phase of the ADC sampling clock as shown in Figure 3.22. The frequency offset can be adjusted by changing the division ratio using a ΣΔ modulator inside the PLL. The optimal timing phase can be chosen from different timing phases if the VCO is based on the ring-oscillator structure. The compensation gets more complicated if we have an analog preprocessor in front of the ADC such as the ADFE. The analog compensator structure has been investigated and implemented in [82].

Figure 3.20: Mean and variance of timing error estimation.

Figure 3.21: Mean of timing error estimation with different length pilot.

### 3.5 Link-level Simulation

Link-level simulations are performed in the Simulink environment to determine and evaluate different architectures and to decide the implementation parameters. The top-level design view of the simulation platform is shown in Figure 3.23, which shows the equalizer, channel estimator, and synchronizer in the model. The simulation platform also supports a fixed-point simulation mode, in which the wordlengths of the digital signal and ADC can be optimized.

Examples of the BER simulation results are shown in Figure 3.24. The receiver parameters are optimized to have less than 1 dB performance degradation from the ideal receiver. Figure 3.24(a) shows the parameters and BER curves for an IEEE CM2.3 channel profile, which has a strong LOS path, while Figure 3.24(b) is drawn from an IEEE CM2.3 profile with strong NLOS ISI components. It can be seen from the curves that the required hardware resources vary significantly with the channel conditions, which motivated our research to find the optimum in the trade-off between performance and hardware resources. An implementation of a power scalable receiver that can adjust itself depending on the channel condition is described in Chapter 6.

The link-level simulation is also useful to identify a critical block that limits the overall system performance. Table 3.2 shows the contribution of each receiver block to the BER performance degradation in the operating condition of Figure 3.24(b). The receiver can be optimized by identifying critical blocks and allocating a proper amount of hardware resources.

Figure 3.23: Simulink environment for the link-level simulation.

<table>
<thead>
<tr>
<th>Error source</th>
<th>Degradation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Finite EQ taps</td>
<td>&lt; 0.1 dB</td>
</tr>
<tr>
<td>Quantization</td>
<td>~0.6 dB</td>
</tr>
<tr>
<td>CE error</td>
<td>~0.3 dB</td>
</tr>
<tr>
<td>DFE error propagation</td>
<td>~0.1 dB</td>
</tr>
</tbody>
</table>

Table 3.2: Break-down of BER degradation.

Figure 3.24: BER simulation of NLOS/LOS channels.

Chapter 4

Mixed-Signal Power Optimization of a Baseband

4.1 Introduction

In addition to the benefits such as noise immunity, EDA support and ease of design, digital circuits enjoy the benefit of the process scaling. Accordingly, baseband signal processors of most of the contemporary wireless communication systems such as digital television (DTV), WiFi, and cellular baseband are implemented in digital circuits [99].

4.1.1 Digital Limitation

The high-speed digital baseband requires multi-Gs/s ADCs that come with significant power consumption. Figure 4.1 shows the performance and power distribution of recently published ADCs presented in conferences [54], where the figure-of-merit (FOM) is defined as,

\[ FOM = \frac{P_{ADC}(W)}{2^{\text{ENOB}} \cdot f_s(Hz)} \quad (4.1) \]

where ENOB stands for the effective number of bits, \( f_s \) is the sampling frequency of the ADC, and \( P_{ADC} \) is the total power consumption of the ADC. As the figure shows, except for some ADCs with extreme operating conditions, most of the ADCs show an FOM of 100 fJ/conv or worse. Table 4.1 shows the power consumption of an ADC with a 100 fJ/conv FOM for different ENOB values. It can be seen that the ADC consumes a significant amount of power if a system demands high-resolution ADCs. The problem is exacerbated for higher-order modulation constellations and complex channel responses, since they increase the required wordlength of the equalizer and the ADC.
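The entries of Table 4.1 follow directly from the FOM definition solved for the power, \( P_{ADC} = FOM \cdot 2^{\text{ENOB}} \cdot f_s \). A minimal sketch (the function name is illustrative):

```python
def adc_power_watts(fom_joules_per_conv, enob, sample_rate_hz):
    """P_ADC = FOM * 2^ENOB * f_s, rearranged from the FOM definition."""
    return fom_joules_per_conv * (2 ** enob) * sample_rate_hz

# 100 fJ/conv ADC running at 2 GS/s, for ENOB = 3 .. 7
for enob in range(3, 8):
    power_mw = adc_power_watts(100e-15, enob, 2e9) * 1e3
    print(f"{enob}-bit: {power_mw:.1f} mW")
```

Each additional effective bit doubles the ADC power at a fixed FOM and sampling rate.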

4.1.2 Analog Limitation

The ADC requirement and the complexity of the digital signal processing can be relaxed if the analog signal is preprocessed in front of the ADC so as to reduce the dynamic range of the analog signal. In [82], an ADFE and analog synchronization blocks were implemented to demonstrate this concept. A full-analog implementation of the baseband is popular in high-speed wired links for backplane and inter-chip communication.

A circuit diagram of the ADFE modeled as a one-pole system is drawn to illustrate the analog limitation in Figure 4.2. The ADFE load capacitance at the input of the ADC, $C_{EQ}$ is expressed in (4.2) as summation of the interconnection capacitance, $C_L$, driver capacitance that is proportional to driving current, $I_D$ ($a \cdot I_D$), and the capacitance of the taps proportional to the driving current and the number of taps ($b \cdot N_{tap} \cdot I_D$) as follows:

$$C_{EQ} = C_L + a \cdot I_D + b \cdot N_{tap} \cdot I_D.$$  \hspace{1cm} (4.2)

The ADFE also has a bandwidth requirement to support the 2 GHz symbol rate. The speed of the ADFE can be specified as a unity-gain frequency, $\omega_u$, which is expressed with

### Table 4.1: Power consumption of 2 Gs/s, 100fJ/conv ADCs.

<table>
<thead>
<tr>
<th>ENOB</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>3-bits</td>
<td>1.6 mW</td>
</tr>
<tr>
<td>4-bits</td>
<td>3.2 mW</td>
</tr>
<tr>
<td>5-bits</td>
<td>6.4 mW</td>
</tr>
<tr>
<td>6-bits</td>
<td>12.8 mW</td>
</tr>
<tr>
<td>7-bits</td>
<td>25.6 mW</td>
</tr>
</tbody>
</table>

By arranging the relations in terms of $I_D$ as (4.4), it can be seen that the driving current necessary to meet the speed requirement approaches singularity when the number of the taps reaches a certain point ($N_{\text{tap}} = \frac{1}{k}$).

\[
\therefore I_D = \frac{C_L}{V_{DSAT} - \omega_u} \cdot \frac{1}{1 - k \cdot N_{\text{tap}}} \propto \frac{1}{1 - k \cdot N_{\text{tap}}}. \tag{4.4}
\]

This means not only that there is a limit on the number of implementable taps, but also that the power consumption increases rapidly beyond a certain point. This is why the number of taps in [82] is set to 16. Ref. [90] significantly increased the number of taps by cascoding the devices of the equalizer taps.
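
To make the blow-up concrete, the sketch below evaluates only the proportionality $I_D \propto 1/(1 - k \cdot N_{tap})$ from (4.4), normalized to the zero-tap current. The value of $k$ used here is a hypothetical placeholder; the real value depends on the device and circuit parameters.

```python
def normalized_drive_current(n_tap, k):
    """Drive current relative to the zero-tap case, using I_D ~ 1 / (1 - k * N_tap)."""
    headroom = 1.0 - k * n_tap
    if headroom <= 0.0:
        # At or beyond N_tap = 1/k no finite current meets the speed requirement.
        return float("inf")
    return 1.0 / headroom


# k = 0.05 is an assumed value, chosen so that the singularity sits at 20 taps.
for n_tap in (0, 4, 8, 12, 16, 19):
    print(n_tap, round(normalized_drive_current(n_tap, k=0.05), 2))
```

With this assumed $k$, the current doubles by 10 taps and grows without bound as the tap count approaches 20, which illustrates why long analog equalizers become power-hungry.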
CHAPTER 4. MIXED-SIGNAL POWER OPTIMIZATION OF A BASEBAND

4.1.3 Digital-Analog Trade-off

There is a trade-off between the digital and analog implementations of the equalizer, as illustrated in Figure 4.3. At one extreme, shown on the left side of the plot, an equalizer can be built fully in digital, as described in Chapter 5 [62],[63]. Although this avoids the overhead of analog circuitry, it results in high power consumption in the digital circuits and ADCs. The ADCs need high resolution, and therefore consume high power, because they must handle complex ISI profiles that result in a high dynamic range. At the other extreme, a fully analog equalizer would consume high power in the analog circuits to meet the bandwidth requirement and the required number of taps, as explained in section 4.1.2. Therefore, the minimum power is achieved somewhere between those two extremes. This is essentially a problem of where to place the ADCs in a receiver (Figure 4.3).

Figure 4.3: Analog-digital power trade-off depending on implementation scenarios.

On the contrary, power consumption of digital circuits can be reduced by a variety of techniques such as parallelization, pipelining, and table look-up with memory, as discussed in Chapter 5 [62],[63].

The optimal partitioning between analog and digital circuits has been a common problem in high-speed systems such as hard disk read channels, high-speed IO for backplanes [15],[42],[36], DTV [99], and wireless baseband [92],[58]. Usually, systems with challenging speed requirements are implemented in analog circuits. As the complexity of the signal processing goes up and the digital circuits get faster with device scaling, the analog blocks are gradually replaced by digital circuits [41]. Finally, full-digital implementations dominate the segment. Historically, this pattern has repeated itself.

The same pattern may happen in the 60 GHz baseband equalizer, whose mixed signal implementation is illustrated in Figure 4.4. Although the implementation of the 60 GHz baseband equalizer so far has been dominated by analog circuits, following this historical pattern, the digital implementations will gradually replace them as device sizes continue to shrink. Interestingly, the data rate of the 60 GHz system lies between the conventional fully digital systems and the high-speed wired links, which are dominated by analog implementations (Figure 2.7).

However, a question that has to be answered in this process is how to determine the optimal partition for the particular implementation with a given technology and system architecture. In a wireless communication system, the partition has to take into account the performance parameters such as BER performance; power reduction makes sense only when a target performance is achieved.

In the baseline equalizer structure shown in Figure 3.2.3, the DFE part can be implemented in both digital and analog domains as illustrated in Figure 4.5, which provides a good framework to analyze the digital and analog trade-off in terms of power consumption and BER performance. In the equalizer, the analog-digital partition is determined by the number of taps handled by the analog and digital parts of the equalizer and by the number of quantization levels of the ADC (Figure 4.5).

The number of quantization levels in the digital signal processor can be determined based on empirical Monte-Carlo simulations or signal-to-quantization noise ratio (SQNR) computations [78],[87]. For the simulation-based methods, unfortunately, there are no straightforward methods to determine the post-equalization BER degradation analytically. Similarly, using the SQNR as a metric to determine the quantization levels can be misleading because the quantization noise is not random, and affects performance in a different way than the thermal or interference noise.

In this chapter, as an effort to find the optimal trade-off between digital and analog circuits, first, we propose an analysis framework in section 4.2 that defines the relationship between the BER performance and the link power consumption. A BER expression for given receiver parameters in the model is derived in section 4.3, followed by a power model that relates this BER expression to the actual circuit power consumption of the link. The application of these models to the 60 GHz channel and the equalizer is presented in section 4.5 [64].

4.2 Analysis Framework

The key steps for the analysis are summarized in Figure 4.6. The goal is to find the link power consumption given the receiver configuration, the channel impulse response, and the BER target. The basic receiver configuration parameters considered here are (1) the ADC bit resolution, \( B \), (2) the number of taps for the ADFE, \( NTAP_A \), and (3) the number of taps for the DDFE, \( NTAP_D \). Once the BER performance is derived in step 1, the required SNR can be determined (step 2), from which the power required in the power amplifier of the transmitter can be calculated. The receiver power consumption can also be calculated from the receiver parameters and added to obtain the total power consumption (step 3). This procedure can be repeated until it reaches an optimal configuration. In the following sections, a BER performance model is developed for step 1, and the power model is introduced for step 3.
4.3 BER Performance Model

A system model is set up to analyze the BER performance as illustrated in Figure 4.7. In the model, \( m \) is a time index, and the BPSK signal \( s_m \) is transmitted through a propagation channel with an impulse response, \( h_n \), and additive noise, \( n_m \). The ISI components in the received signal, \( b_m \), are first erased by an ADFE of \( NTAP_A \) taps. The ADFE output, \( r_m \), is quantized by an ADC with \( B \)-bit resolution and a quantization step, \( \Delta \), and added to a DDFE output, \( D_m \), resulting in \( c_m \), which goes into an LE with \( L \)-tap coefficients (\( l_n \)). The LE output is used as a slicer input and a final hard decision is made. Ideal synchronization is assumed in the model.

Although the ADFE and DDFE perform the same function of erasing the post-cursor ISI, there is a difference involving the quantization error from the ADC, since the ADC is placed between the two blocks as illustrated in Figure 4.8: the ADFE resolution is not limited by the ADC quantization, while the DDFE output has residual error that cannot be erased even with perfect channel estimation. Ideally, the ADFE could erase the post-cursor perfectly with its infinite amount of resolution. Although practical ADFE resolution is limited by analog impairments, in the range of interest for this application (4-6 bit ADC), the ADC dominates the overall quantization error.

Past work [36] has discussed the performance trade-off involving the LE-DFE combination, but only in a qualitative manner. In addition to reduced-complexity techniques for BER performance computation from a channel impulse response [10], this work incorporates the ADC quantization analysis and the equalizer structure to express the BER.

The analysis procedure is illustrated in Figure 4.8. In the first step, selected ISI components from the channel impulse response are erased using the ADFE. After that, BER can be derived that includes residual ISI and ADC quantization, which can be translated into required SNR and transmit power that achieve a target BER. The minimum power point of the total link power consumption can be found after iterating this procedure with different receiver configurations.

4.3.1 Analysis without Linear Equalizer (LE)

To demonstrate the analysis method, we first derive a BER expression for the equalizer without using linear equalization, where the main tap is the first element of the impulse response \((n = 0)\) and there are no precursor components. For simplicity, the error propagation of the DFE is not considered. If desired, error propagation can be modeled as a Markov chain as introduced in [81]. Using the system model from Figure 4.7, the signal received in the baseband front-end, \(b_m\), is a function of the transmitted signal, \(s_m\), the impulse response, \(h_n\), and the sampled noise, \(n_m\), as follows:

\[
b_m = \sum_{n \in \{TAP_A \cup TAP_D\}} h_n s_{m-n} + n_m, \tag{4.5}\]

where \(m\) is a time index, and \(TAP_A\) and \(TAP_D\) are the sets of tap indexes assigned to the analog filter and the digital filter, respectively. \(r_m\) is the signal after the \(TAP_A\) ISI components are erased by the ADFE taps:

\[
r_m = \sum_{n \in \text{TAP}_D} h_n s_{m-n} + n_m \triangleq a_m + n_m. \tag{4.6}
\]

The ADC quantizes \( r_m \), including the noise, \( n_m \). If the ADC has a resolution of \( B \) bits, the quantized signal \( q_m \) becomes,

\[
q_m = \left( i + \frac{1}{2} \right) \Delta \quad \left( -2^{B-1} \leq i \leq 2^{B-1} - 1,\ i \in \mathbb{Z} \right), \tag{4.7}
\]

where \( \Delta \) is the ADC quantization step. The summation of the replica of the ISI, \( D_m \) is generated by the channel estimator output, \( \hat{h}_n \). The slicer output is the same as the transmitted signal if there are no propagated decision errors from previous symbols,

\[
D_m = - \sum_{n \in \text{TAP}_D, n \neq 0} \hat{h}_n s_{m-n}. \tag{4.8}
\]

This replica is subtracted from the quantized signal and fed into the slicer (assuming that the linear equalizer is a one-tap filter with unit gain, \( L = 1 \)),

\[
c_m \triangleq q_m + D_m. \tag{4.9}
\]

If BPSK modulation is assumed for simplicity, the BER for equi-probable signal can be expressed as,

\[
P_{e|a_m} = \frac{1}{2} P(c_m \leq 0|s_m > 0, a_m) + \frac{1}{2} P(c_m > 0|s_m \leq 0, a_m). \tag{4.10}
\]

Under Gaussian noise with a variance \( \sigma^2 \), the first term in the right-hand side can be expressed as [66],

\[
\begin{aligned}
P(c_m \leq 0|s_m > 0, a_m) &= \sum_{i = -2^{B-1}}^{2^{B-1}-1} P \left( q_m = \left( i + \frac{1}{2} \right) \Delta,\ \left( i + \frac{1}{2} \right) \Delta + D_m \leq 0 \,\Big|\, s_m > 0, a_m \right) \\
&= \sum_{i = -2^{B-1}}^{\left\lfloor -\frac{1}{2} - \frac{D_m}{\Delta} \right\rfloor} P \left( q_m = \left( i + \frac{1}{2} \right) \Delta \,\Big|\, s_m > 0, a_m \right) \\
&= \sum_{i = -2^{B-1}}^{\left\lfloor -\frac{1}{2} - \frac{D_m}{\Delta} \right\rfloor} P \left( i \Delta - a_m \leq n_m \leq (i + 1) \Delta - a_m \,\Big|\, s_m > 0, a_m \right) \\
&= \int_{-\infty}^{\left\lfloor \frac{1}{2} - \frac{D_m}{\Delta} \right\rfloor \Delta - a_m} \frac{1}{\sqrt{2\pi} \sigma} e^{-\frac{y^2}{2\sigma^2}} dy \\
&= 1 - Q \left( \frac{\left\lfloor \frac{1}{2} - \frac{D_m}{\Delta} \right\rfloor \Delta - a_m}{\sigma} \right),
\end{aligned} \tag{4.11}
\]
where \(Q(\cdot)\) is the Q-function and \(\lfloor \cdot \rfloor\) is the floor operator. The expression can be further simplified using the property of the flooring operation \((\lfloor -a \rfloor = -\lceil a \rceil)\) and the Q-function \((Q(-a) = 1 - Q(a))\) as,

\[
P(c_m \leq 0|s_m > 0, a_m) = Q \left( \frac{\left\lceil -\frac{1}{2} + \frac{D_m}{\Delta} \right\rceil \Delta + a_m}{\sigma} \right). \tag{4.12}
\]

The second term of (4.10) can be calculated similarly and the conditional BER expression becomes

\[
P_{e|a_m} = \frac{1}{2} Q \left( \frac{\left\lceil -\frac{1}{2} + \frac{D_m}{\Delta} \right\rceil \Delta + a_m}{\sigma} \right) + \frac{1}{2} Q \left( \frac{\left\lfloor \frac{1}{2} - \frac{D_m}{\Delta} \right\rfloor \Delta - a_m}{\sigma} \right). \tag{4.13}
\]

With an equi-probable binary signal \(s_m\) and the ADFE pre-cancellation, \(NTAP_D\) is the number of symbols involved in the ISI. For a particular symbol sequence characterized by the vector \(S\) with \(NTAP_D\) elements, the BER can be calculated by averaging over the possible combinations of \(S\) for a given channel impulse response as [66]

\[
P_e = \sum_S f(S) P_{e|a_m},
\] (4.14)

where \(f(S)\) represents the probability of a particular realization of \(S\). Because the elements of the sequence are binary numbers and the number of the elements is \(NTAP_D\), \(f(S)\) is a constant, \(\frac{1}{2^{NTAP_D}}\) and the bit error probability can be expressed and calculated as follows:

\[
P_e = \sum_{i=1}^{2^{NTAP_D}} \frac{1}{2^{NTAP_D}} P_{e|a_m}.
\] (4.15)
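
The averaging in (4.15) can be sketched numerically as below. This is a minimal illustration rather than the code used for the results in this chapter. It assumes perfect channel estimates (\(\hat{h}_n = h_n\)), a mid-rise quantizer whose two end bins saturate, and a main tap at index 0 that is handled separately from the list `tap_d` of digitally cancelled post-cursor indexes. All function and argument names are hypothetical. Instead of the closed form (4.13), the sketch sums the bin probabilities directly, as in the first line of (4.11), and it also averages over the sign of \(s_m\).

```python
import itertools
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bin_probability(i, a_m, delta, sigma, bits):
    """P(q_m = (i + 1/2) * delta | a_m) for a B-bit mid-rise quantizer with saturating end bins."""
    i_min, i_max = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    lo = -math.inf if i == i_min else i * delta
    hi = math.inf if i == i_max else (i + 1) * delta
    # P(lo <= a_m + n_m < hi) with n_m ~ N(0, sigma^2)
    return q_func((lo - a_m) / sigma) - q_func((hi - a_m) / sigma)


def ber_without_le(h, tap_d, bits, delta, sigma):
    """BER averaged over all equiprobable BPSK patterns of s_m and the symbols on tap_d."""
    i_min, i_max = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    total, count = 0.0, 0
    for s_m in (+1, -1):
        for past in itertools.product((+1, -1), repeat=len(tap_d)):
            isi = sum(h[n] * s for n, s in zip(tap_d, past))
            a_m = h[0] * s_m + isi   # noiseless ADC input, as in (4.6)
            d_m = -isi               # ISI replica with perfect estimates, as in (4.8)
            p_err = 0.0
            for i in range(i_min, i_max + 1):
                c_m = (i + 0.5) * delta + d_m             # slicer input, as in (4.9)
                wrong = (c_m <= 0) if s_m > 0 else (c_m > 0)
                if wrong:
                    p_err += bin_probability(i, a_m, delta, sigma, bits)
            total += p_err
            count += 1
    return total / count


# Illustrative two-tap channel; the numbers are arbitrary.
print(ber_without_le(h=[1.0, 0.3], tap_d=[1], bits=4, delta=0.25, sigma=0.5))
```

For a single-path channel and a 1-bit ADC, the function reduces to the familiar BPSK result \(Q(h_0/\sigma)\), which is a convenient sanity check.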

Figure 4.9 shows the required SNR, \(SNR_{req}\), that achieves a BER of \(10^{-2}\) for the channel impulse responses generated from the IEEE 802.15.3c propagation models (CM2.3) [1] with varying analog-digital partitioning and ADC resolution, without the LE. The planes in the figure show that the \(SNR_{req}\) increases as the ADC resolution and/or the \(NTAP_A\) decreases. This is because lowering the ADC resolution and/or the number of analog taps increases the quantization noise, which degrades the BER performance and raises the \(SNR_{req}\).

### 4.3.2 Analysis with Linear Equalizer

The derivation from the previous section can be generalized to add multiple taps in the LE \((L > 1)\) with coefficients, \(l_n\). This is accomplished by modifying the slicer input and the decision rule (4.9) to be,

\[
c_m = \sum_{n=1}^{L} l_n (q_{m-n} + D_{m-n}) \leq 0.
\] (4.16)
We can introduce vector notations for channel output, \(a_m\), and quantized signal, \(q_m\), to represent the elements in the delay line of the LE as follows:

\[
\begin{aligned}
\mathbf{A} & \triangleq [a_m, a_{m+1}, \cdots, a_{m+L-1}] \\
\mathbf{Q} & \triangleq [q_m, q_{m+1}, \cdots, q_{m+L-1}].
\end{aligned} \tag{4.17}
\]

Similar to the previous section, the BER expression can be derived and numerically calculated as follows:

\[
\begin{aligned}
P_{e|\mathbf{A}} &= \int \cdots \int f_{\mathbf{Q}|\mathbf{A}}(\mathbf{Q}) P_{e|\mathbf{Q},\mathbf{A}}(\mathbf{Q}) d\mathbf{Q} \\
P_e &= \int \cdots \int f_{\mathbf{A}}(\mathbf{A}) P_{e|\mathbf{A}}(\mathbf{A}) d\mathbf{A},
\end{aligned} \tag{4.18}
\]

where \(f_{\mathbf{A}}(\mathbf{A})\) is a discrete probability density function (pdf) of the channel output, \(\mathbf{A}\), and \(f_{\mathbf{Q}|\mathbf{A}}(\mathbf{Q})\) is a pdf of the quantized signal, \(\mathbf{Q}\), given \(\mathbf{A}\), from which the conditional bit error probability, \(P_{e|\mathbf{A}}\), and the final bit error probability, \(P_e\), can be calculated.

Inclusion of the LE in the analysis is important for the application of interest. The analysis is confined to a digital implementation of LE because the coefficients of an analog LE are hard to control and the time span of the precursor may be too long for analog implementation. Further, the DFE coefficients are affected by the analog implementation of the LE, making them hard to predict.

4.4 Power Consumption Model

Based on the performance model, we develop an analytical expression for the total power consumption of the communication link. Although the circuit power consumption strongly depends on the bandwidth, the process technology, and the circuit design, we seek to develop a simplified power consumption model to grasp the trade-offs involved. For the baseline power consumption, we assume a bandwidth of 1.728 GHz as specified in 60 GHz standards. Implementation in a standard 65 nm CMOS process is assumed, as well as Nyquist sampling in the ADC to keep the power consumption low. Although the exact optimal power point may vary with a specific implementation, the fundamental trade-off does not change with these parameters, as discussed in section 4.6.1. The exact point can also be found by the adaptive on-line tuning procedure described in section 4.6.2.

4.4.1 Transmitter

To make a communication link power efficient, we assume that transmit power control is performed in the link. The transmit power control is essential in reducing the overall system power because the transmit power is a large fraction of the system power, and acts as interference to other users [61].

In a system with transmit power control, any degradation of \( SNR_{req} \) can be interpreted as the additional transmit power needed to maintain a target BER, \( BER_{target} \). To minimize the link power consumption, we assume a system that adjusts the transmit power for a given data rate depending on channel conditions, and interference power to achieve a \( BER_{target} \). This feature can be implemented using data fields specified in the 60 GHz standards [1].

The required transmit power, \( P_{TX} \), is related to the channel and antenna parameters as [70],

\[
P_{TX}(dBm) = P_{loss}(dB) + P_{noise}(dBm) - G_a(dB) + SNR_{req}(dB), \tag{4.19}
\]

where \( P_{noise} \) is a thermal noise floor,

\[
\begin{aligned}
P_{noise} &= -174\,\text{dBm/Hz} + 10 \log_{10}(1.728 \text{ GHz}) + NF \\
&= -174\,\text{dBm/Hz} + 92.4\,\text{dB} + 7\,\text{dB} \\
&= -74.6\,\text{dBm},
\end{aligned} \tag{4.20}
\]

and the antenna gain, \( G_a \) is assumed to be 3 dB and noise figure, \( NF \) to be 7 dB [43],[70]. The propagation loss, \( P_{loss} \) is known to be about 68 dB at the propagation distance of 1 m and as high as 88 dB for 10 m of distance [33].
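
The link budget above can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the default arguments are the baseline values quoted in the text, and the 10 dB \( SNR_{req} \) in the usage line is an arbitrary example value.

```python
import math


def noise_floor_dbm(bandwidth_hz=1.728e9, nf_db=7.0):
    """Thermal noise floor: -174 dBm/Hz + 10*log10(bandwidth) + noise figure."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz) + nf_db


def required_tx_power_dbm(p_loss_db, snr_req_db, g_a_db=3.0,
                          bandwidth_hz=1.728e9, nf_db=7.0):
    """Transmit power in dBm: P_loss + P_noise - G_a + SNR_req."""
    return p_loss_db + noise_floor_dbm(bandwidth_hz, nf_db) - g_a_db + snr_req_db


print(f"noise floor: {noise_floor_dbm():.1f} dBm")
# Example only: 1 m link (68 dB loss) with an assumed SNR requirement of 10 dB.
print(f"P_TX: {required_tx_power_dbm(68.0, 10.0):.1f} dBm")
```

Because every term is in dB, each decibel of additional \( SNR_{req} \) translates one-to-one into additional transmit power.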
The power consumption of the transmitter is dominated by a power amplifier (PA), and its power consumption, $P_{PA}$ can be expressed in the linear scale as,

$$
\begin{aligned}
P_{PA}(mW) &= \frac{1}{\eta} \cdot 10^{\frac{P_{TX}}{10}} \\
&= \frac{1}{\eta} \cdot 10^{\frac{P_{loss} + P_{noise} - G_a}{10}} \cdot 10^{\frac{SNR_{req}}{10}} \\
&\triangleq \alpha_{PA} \cdot 10^{\frac{SNR_{req}}{10}},
\end{aligned} \tag{4.21}
$$

where $\eta$ represents the efficiency of the power amplifier. Therefore, the $\alpha_{PA}$ in (4.21) includes the parameters in (4.19) and the PA power efficiency. PAs in the 60 GHz band have rather low efficiency. Although PAs with 15% power efficiency have been reported [17],[19], this analysis assumes 10% power efficiency as a baseline. Although the instantaneous efficiency of a PA varies with the signal power, the efficiency used here is a long-term efficiency, which can be kept high by tuning circuit parameters such as the bias point and supply voltage. Given these baseline values, $\alpha_{PA}$ varies from 0.27 mW ($P_{loss} = 68$ dB) to 27.4 mW ($P_{loss} = 88$ dB) depending on the $P_{loss}$.

### 4.4.2 ADC

In the range of interest (around 2 GS/s, 1-5 bits ENOB), a flash ADC topology [93] or a time-interleaved successive approximation register (SAR) ADC [4] is generally the most energy efficient. We focus on the flash ADC because the conversion latency involved in the SAR ADC can deteriorate the BER performance of the mixed-signal equalizer. Another favorable consideration for the flash ADC is that its power consumption can be scaled down with lower resolutions using clock gating. From the FOM expression of (4.1), the ADC power can be expressed as,

$$
\begin{aligned}
P_{ADC} &= 1.728\,\text{GHz} \cdot FOM \cdot 2^{ENOB} \\
&= \alpha_{ADC} \cdot 2^{ENOB},
\end{aligned} \tag{4.22}
$$

where $\alpha_{ADC}$ is 86-345$\mu$W/level in our range of operation [54]. Notice that the $P_{ADC}$ increases exponentially with ENOB, making the power reduction of the ADC an important component of the total power optimization.

### 4.4.3 ADFE

As shown in Figure 4.4, the analog portion of the DFE is commonly implemented using differential pairs. The current source of the pair makes it easy to digitally control the coefficients of the equalizer. The power consumption of the ADFE can be treated as proportional to the number of taps, $NTAP_A$,

$$
P_{ADFE} = \alpha_{ADFE} \cdot NTAP_A. \tag{4.23}$$
The exact value depends on the circuit implementation. Typical values of $\alpha_{\text{ADFE}}$ reported in the literature range from 200 $\mu$W/tap to 875 $\mu$W/tap [82]. However, the actual value depends on the channel condition and the number of active taps. As a baseline value for our analysis, we use 100 $\mu$W/tap, obtained from our simulation in a 65 nm process (Table 4.2).

The resolution of the tap can be much higher if the analog impairments such as mismatch and finite output impedance are well-controlled. Figure 6.11 shows a current source structure that can achieve very large tap resolutions [46].

The maximum number of the analog taps determines the output capacitance of the equalizer and the speed of the equalizer. By limiting the length of the analog equalizer, this output capacitance can be kept limited.

4.4.4 DDFE

The DDFE can be implemented in many ways, including the direct-form and transpose-form FIR filters. The most energy-efficient implementation of a short-wordlength, high-speed DDFE is based on a look-up table (LUT). A naive implementation of a digital filter with a direct-form FIR filter would consume over 100 mW when operating at 2 Gb/s. The power consumption of the digital equalizer can be further reduced using parallelization and loop-unrolling [63]. In the digital implementation, the equalizer can also be made power-scalable by using clock gating. The power consumption of the digital DFE can therefore be expressed as proportional to the number of taps,

$$P_{\text{DDFE}} = \alpha_{\text{DDFE}} \cdot NTAP_D,$$

where $NTAP_D$ is the number of DDFE taps. The $\alpha_{\text{DDFE}}$ is around 85 $\mu$W/tap in actual implementation [63],[27].

4.5 Power Optimization

By combining the $SNR_{req}$ and the power model of section 4.4, the total link power consumption, $P_{\text{link}}$, in different configurations can be obtained,

$$P_{\text{link}} = P_{\text{PA}} + P_{\text{ADFE}} + P_{\text{ADC}} + P_{\text{DDFE}}.$$

Power coefficients ($\alpha$) for the ADC, ADFE, DDFE, and PA listed in Table 4.2 are used as a baseline. Given the BER and power models, receiver configurations chosen under different criteria, such as minimum BER, all-digital, all-analog, and minimum $P_{\text{link}}$, can be compared.

<table>
<thead>
<tr>
<th>Power coefficients</th>
<th>Baseline values</th>
<th>Related parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\alpha_{PA}$</td>
<td>27.4mW</td>
<td>$P_{\text{loss}} = 88$ dB</td>
</tr>
<tr>
<td></td>
<td></td>
<td>$\eta = 10\%$</td>
</tr>
<tr>
<td></td>
<td></td>
<td>$NF = 7$ dB</td>
</tr>
<tr>
<td></td>
<td></td>
<td>$G_a = 3$ dB</td>
</tr>
<tr>
<td>$\alpha_{ADC}$</td>
<td>0.30mW</td>
<td>$\text{FOM} = 174$ fJ / conv.</td>
</tr>
<tr>
<td>$\alpha_{ADFE}$</td>
<td>0.10mW</td>
<td></td>
</tr>
<tr>
<td>$\alpha_{DDFE}$</td>
<td>0.08mW</td>
<td></td>
</tr>
</tbody>
</table>

Table 4.2: Baseline power coefficients
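
The total link power and the search for its minimum can be sketched as follows, using the baseline coefficients of Table 4.2 as defaults. The function names are hypothetical, and the $SNR_{req}$ values in the example dictionary are made-up placeholders; in the actual analysis they come from the BER model of section 4.3.

```python
def link_power_mw(snr_req_db, adc_bits, ntap_a, ntap_d,
                  alpha_pa=27.4, alpha_adc=0.30, alpha_adfe=0.10, alpha_ddfe=0.08):
    """Total link power in mW as the sum of PA, ADFE, ADC, and DDFE power."""
    p_pa = alpha_pa * 10 ** (snr_req_db / 10.0)  # PA power grows with the required SNR
    p_adc = alpha_adc * 2 ** adc_bits            # ADC power doubles per bit
    p_adfe = alpha_adfe * ntap_a                 # linear in the number of analog taps
    p_ddfe = alpha_ddfe * ntap_d                 # linear in the number of digital taps
    return p_pa + p_adfe + p_adc + p_ddfe


def min_power_config(snr_req_by_config):
    """Pick the (adc_bits, ntap_a, ntap_d) key with the lowest link power."""
    best = min(snr_req_by_config,
               key=lambda cfg: link_power_mw(snr_req_by_config[cfg], *cfg))
    return best, link_power_mw(snr_req_by_config[best], *best)


# Placeholder SNR_req values (dB) for three candidate configurations.
example = {
    (5, 0, 6): 9.0,   # all-digital: 5-bit ADC, 6 DDFE taps
    (3, 4, 2): 9.1,   # mixed: 3-bit ADC, 4 ADFE taps, 2 DDFE taps
    (1, 6, 0): 14.0,  # all-analog: 1-bit ADC, 6 ADFE taps
}
print(min_power_config(example))
```

In this made-up example the mixed configuration wins because the ADC power saved by dropping two bits outweighs the slightly higher transmit power.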

Figure 4.10: Power consumption surface of simplified channel scenario

4.5.1 Simple channel examples

To illustrate the analysis method, the $P_{\text{link}}$ was evaluated for several channel impulse responses. Figure 4.10(a) shows the case when the channel has only one propagation path. In this case, the minimal power can be achieved with just a 1-bit ADC, without using the rest of the equalizer.

Figure 4.10(b) is the case when there are four propagation paths with equal power. The first tap is treated as the main tap and the other taps are erased by either ADFE or DDFE. The power consumption surfaces show that the minimal $P_{\text{link}}$ can be achieved with three taps in the ADFE and by quantizing the remaining signal with a 1-bit ADC regardless of the propagation loss. This confirms the intuition that eliminating the strong ISI components in the analog domain reduces the required dynamic range of the signal and reduces the ADC power consumption significantly.

Another exemplary channel impulse response with precursor ISI is shown in Figure 4.11(a). Different criteria are applied to determine the best receiver configuration, and the \( P_{\text{loss}} \) is swept to see how the optimal points and the \( P_{\text{link}} \) change. The minimal power configuration, labeled "min power" in Figure 4.11(b),(c),(d), is the configuration that minimizes the \( P_{\text{link}} \), while the minimum BER criterion, labeled "min BER", gives the configuration that minimizes the BER. The latter activates all available receiver resources and consumes the largest power. The label "all digital variable ADC" is a minimal power setting with a variable, power-scalable ADC, but without the ADFE. Similarly, "all analog" is a minimum power receiver setting without the DDFE.

Figure 4.11(b) shows that the configuration for the best BER doesn’t overlap with the minimal power setting especially when the distance between the transceivers is short. The minimal BER setting is generally a configuration with the maximum ADC resolution and the maximum number of equalizer taps (Figure 4.11(c),(d)), which consumes relatively large power compared to \( P_{TX} \) when the communication distance decreases.
Figure 4.11(c),(d) show that the ADC and DFE sizes for the optimal points increase as the $P_{\text{loss}}$ grows. This is because the $P_{\text{TX}}$ increases rapidly as the $P_{\text{loss}}$ goes up, which makes the ADC and DFE power consumption in the receiver relatively small. Therefore, even a small $\text{SNR}_{\text{req}}$ improvement from additional receiver resources leads to a large power saving in the transmitter and offsets the additional receiver power consumption.

Most of all, the advantage of the mixed-signal optimization is pronounced in Figure 4.11(b),(c), which show that, by optimizing the receiver in the mixed-signal domain, the power consumption reaches its minimum (Figure 4.11(b)) and the ADC needs one fewer effective bit (ENOB) over most of the $P_{\text{loss}}$ range (Figure 4.11(c)).

The "all analog" curve does not show up in Figure 4.11(b) because this configuration cannot achieve the target BER of $10^{-2}$: it lacks the LE and therefore the capability to equalize the precursors.

### 4.5.2 Application to the 60 GHz channels

The optimization procedure is applied to the channel impulse responses from the IEEE 802.15.3c channel models. Figure 4.12(a) shows a case when the channel has a strong LOS component, while Figure 4.13(a) is a NLOS example. Figure 4.14 shows the link power consumption surfaces obtained from the model and the impulse responses for different $P_{\text{loss}}$ values.

Figures 4.12 and 4.13 show the optimization results with the different criteria that are applied to determine the best receiver configuration. The $P_{\text{loss}}$ is swept to see the change of the optimal points and the $P_{\text{link}}$. The mixed-signal minimum power configuration, labeled "mixed" in Figures 4.12 and 4.13, is the configuration that minimizes the $P_{\text{link}}$, which was obtained by finding the minimum power point on a power surface as shown in Figure 4.10. The label "all digital" represents a setting without the ADFE. Similarly, "all analog" is a receiver setting that includes a 1-bit ADC and no DDFE.

The mixed-signal minimum power configuration achieves the lowest power consumption in both LOS and NLOS cases. Compared to the all-digital implementation, the power saving is as much as 10 dB when the distance between the transceivers is small (Figure 4.13). In addition to the power saving, the reduction of the ADC ENOB requirement can save area and design effort for the ADC in the system. The power saving over the all-analog implementation becomes more pronounced in the NLOS case when $P_{\text{loss}}$ increases and the performance of the ADFE deteriorates.

The $P_{\text{link}}$ surface is related to the BER curves in Figure 4.15 for the NLOS case. Each BER curve in Figure 4.15(b) corresponds to one of the configuration points A, B, and C in Figure 4.15(a). Operating points A and C are fully digital configurations. Because point C has one more bit of ADC resolution, it shows better BER performance and thereby dissipates more power than the other points, as shown in Figure 4.15(a) and Figure 4.15(b). If the ADC resolution decreases by 1 bit, the BER performance degrades by about 2 dB (Figure 4.15(b)). Both the BER and the power consumption can be improved by adding ADFE taps and moving to point B. Point B is the minimal power configuration achieved by the mixed-signal optimization, which needs a 6-tap ADFE and a 3-bit ADC.

Figure 4.14: Power consumption surface from the power and the BER models: (a) LOS, (b) NLOS.

Figure 4.15: Power trade-off and its implication on the BER performance ($P_{\text{loss}}=78$ dB): (a) $P_{\text{link}}$ (dBm) contour for NLOS, (b) corresponding BER performance.

Figure 4.16: Sensitivity of power consumption ((a),(c),(e)) and optimum ADC bits ((b),(d),(f)) to the implementation parameters in a NLOS condition: (a),(b) with varying PA efficiency, $\eta$; (c),(d) with varying ADC power coefficient, $\alpha_{ADC}$; (e),(f) with varying ADFE power coefficient, $\alpha_{ADFE}$.

4.6 Real-time Search for Optimal Configuration

4.6.1 Sensitivity to the implementation parameters

While the baseline power coefficients given in Table 4.2 have been used for the analysis so far, the minimal power point shown above depends on the power coefficients which are related to process technology, circuit implementation, as well as the channel conditions. For the latter, it can be assumed that the channel impulse response can be estimated with good accuracy using the pilot pattern provided by the current 60 GHz standards.

To evaluate how sensitive the optimum point and the analog-digital trade-off are to these parameters, the $P_{\text{link}}$ and the optimum ADC bits are plotted for different power coefficients. Figure 4.16 (b), (d), and (f) show that the optimal number of ADC bits that achieves the minimal power point changes slightly with the actual power consumption of the transmitter and the ADC, while the equalizer power consumption hardly changes the optimal point. Also, it can be observed from Figure 4.16 (a), (c), and (e) that the mixed-signal receiver consumes less power than either the all-digital or the all-analog receiver throughout the range.

4.6.2 Adaptive search for the optimal point

In real implementations, to reach the minimal power point regardless of the actual power consumption, an online adaptive search algorithm can be applied once the channel impulse response is estimated.

The receiver configuration for the minimal power consumption can be easily determined when the impulse response suggests an obvious solution. For example, if there is a strong dominant LOS path, the ADC can be configured at the minimal resolution and the ADFE turned off, while the number of DDFE taps can be minimized to reduce the power without hurting BER performance. However, if the channel turns out to be highly scattered, the large paths other than the main tap can be assigned to the ADFE, and the rest of the ISI taps can be passed to the DDFE with a higher ADC resolution.

On the other hand, in case the shape of the impulse response has a complex pattern and the trade-off between the $P_{PA}$ and receiver power consumption is not obvious, the receiver parameters can be tuned adaptively on-line based on the measured BER or packet error rate (PER). Figure 4.17 shows a conceptual flow chart to tune the parameters.

The adjustment starts from the open-loop $P_{TX}$ control, since it is better to start from a block that accounts for a large portion of the total power consumption, and $P_{\text{link}}$ is dominated by $P_{TX}$ in most cases. The initial open-loop control takes place in the following order. $P_{\text{loss}}$ is estimated by decoding the $P_{TX}$ data in the control field and estimating the actual received power. $P_{TX}$ can then be roughly adjusted based on the $P_{\text{loss}}$ feedback from the receiver. This procedure is common in most cellular systems. Meanwhile, the receiver is set to its minimal-BER configuration. Afterward, $P_{\text{link}}$ can be gradually reduced by trading off the parameters step by step while meeting the BER target, starting from the ADC resolution as the strongest remaining tuning knob. The partitioning between the ADFE and the DDFE can be tuned by gradually increasing the partitioning threshold and transferring more ADFE taps to the DDFE. Figure 4.18 shows an example of the tap assignment and illustrates the concept of the analog-digital partitioning threshold.
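
The tuning order described above can be sketched as a simple greedy search. The following Python sketch is illustrative only and is not the flow chart of Figure 4.17: the function `tune_receiver`, the callback `measure_ber`, and the toy BER model `toy_ber` are hypothetical stand-ins for a BER or PER measurement on real hardware, and the $P_{TX}$ open-loop step is assumed to have been completed beforehand.

```python
def tune_receiver(measure_ber, ber_target, adc_bits_max, adc_bits_min, thresholds):
    """Greedy search: lower the ADC resolution first, then raise the ADFE/DDFE threshold.

    measure_ber(adc_bits, threshold) is a hypothetical callback returning the
    measured BER (or PER) for a receiver configuration. thresholds must be
    sorted in increasing order; thresholds[0] is the starting (minimal-BER)
    partitioning.
    """
    adc_bits = adc_bits_max
    threshold = thresholds[0]

    # Step 1: reduce the ADC resolution while the BER target is still met.
    while adc_bits > adc_bits_min and measure_ber(adc_bits - 1, threshold) <= ber_target:
        adc_bits -= 1

    # Step 2: raise the partitioning threshold, moving ADFE taps to the DDFE.
    for candidate in thresholds[1:]:
        if measure_ber(adc_bits, candidate) > ber_target:
            break
        threshold = candidate

    return adc_bits, threshold


def toy_ber(adc_bits, threshold):
    # Toy model (assumption): BER improves with ADC bits and degrades as
    # more taps are moved from the ADFE to the DDFE.
    return 0.5 * 10.0 ** -(adc_bits - 4.0 * threshold)


config = tune_receiver(toy_ber, ber_target=1e-2, adc_bits_max=4,
                       adc_bits_min=1, thresholds=[0.0, 0.05, 0.1, 0.2])
print(config)
```

Each step is accepted only if the measured BER still meets the target, so the search stops at the last configuration that satisfies the constraint along the chosen tuning order. It does not guarantee a global optimum.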

It is expected that the same methodology can be applied to other high-speed mixed signal communication systems and to determine the quantization resolution of digital signal processing systems. To implement the mixed-signal optimization, first, the ADC should be variable and power-scalable. Also, the ADFE needs to have sufficient accuracy so that the error from the ADFE is far less than the quantization noise of the ADC.

Another application to which this methodology can be applied is analog circuits with digital calibration. While a digital calibration block generally improves the analog performance, the digital calibration itself has a cost. By introducing an overall cost function such as the $P_{\text{link}}$ used in this analysis, the optimal partitioning between analog circuits and digital calibration might be determined.

The minimal power point shown above depends on the power coefficients, which are related to the process technology and circuit implementation, as well as on the channel conditions. For the latter, we can assume that the channel impulse response can be estimated with good accuracy using the pilot pattern provided by most 60 GHz standards.

Once the channel impulse response is estimated, the receiver configuration for the minimal power consumption can be easily determined when the impulse response suggests an obvious solution. For example, if there is a strong dominant LOS path, the ADC can be configured to its minimum resolution and the ADFE turned off, and the number of DDFE taps minimized to reduce the power without hurting BER performance. However, if the channel turns out to be highly scattered, the large paths other than the main tap can be assigned to the ADFE and the rest of the ISI taps can be passed to the DDFE with a higher ADC resolution.

Figure 4.18: Example of tap assignment based on the proposed partitioning

Chapter 5

Digital Baseband Implementation

This chapter describes the implementation, in a 65 nm CMOS process, of the digital equalizer and channel estimator discussed in Chapter 3. The focus of the implementation is to minimize the power consumption while meeting the throughput requirement of 2 Gb/s. The top-level block diagram of the chip is shown in Figure 5.1; it includes a transmitter, a receiver, and test blocks such as a channel emulator, a noise generator, and a BER counter.

This chapter begins with a brief review of digital circuit power consumption and power reduction techniques in section 5.1, followed by a detailed description of the equalizer in section 5.2 and the channel estimator in section 5.3. The chip development, test setup, and measurement results are discussed in section 5.4.

5.1 Power Consumption of Digital Circuits

The power consumption of CMOS digital circuits consists of three major sources, which can be expressed as follows [16],[68]:

\[
\begin{aligned}
P_{\text{total}} &= P_{\text{dyn}} + P_{\text{sc}} + P_{\text{leak}} \\
&= \alpha (C_L \cdot V \cdot V_{dd} \cdot f_{clk}) + I_{sc} \cdot V_{dd} + I_{leak} \cdot V_{dd} \\
&= (\alpha \cdot C_L \cdot V \cdot V_{dd} + I_{\text{peak}} \cdot t_s \cdot V_{dd}) f_{clk} + I_{leak} \cdot V_{dd}.
\end{aligned}
\tag{5.1}
\]

The first term represents the dynamic power consumption that comes from charging and discharging of load capacitance, \( C_L \). Within this term, \( \alpha \) is the activity factor of the switching node, \( V \) is the voltage swing of the signal, \( V_{dd} \) is the supply voltage, and \( f_{clk} \) represents the operating frequency.

The second term is the power consumption due to the short-circuit current that flows during the switching transient when both the PMOS and NMOS devices are conducting. This component is proportional to the transition time, \( t_s \), and the peak short-circuit current, \( I_{\text{peak}} \). Besides reducing \( V_{dd} \) and \( f_{clk} \), the short-circuit current can be minimized by tuning the size of the device and load capacitance so that the rise and fall times are matched [68].

The last term expresses the power consumption due to the leakage current, $I_{leak}$. One source of leakage is the sub-threshold current that flows between the drain and source of a turned-off transistor; it begins to dominate the total power consumption as the supply and threshold voltages scale down and as transistor density and chip size grow. The other source is the junction leakage current that flows through the reverse-biased junction between the drain/source of a device and the substrate when the device is turned off. Although there are active research efforts to reduce leakage, including silicon-on-insulator (SOI) technology and tri-gate transistors, leakage is reduced more effectively through device and process technology than through architecture and circuit improvements.
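
As a small numerical illustration of (5.1), the Python sketch below evaluates the three terms separately. The function name `power_breakdown` and all parameter values are hypothetical and chosen only to make the scaling visible; they are not measurements from the chip.

```python
def power_breakdown(alpha, c_load, v_swing, v_dd, f_clk, i_peak, t_s, i_leak):
    """Return (P_dyn, P_sc, P_leak) in watts, following the last line of (5.1)."""
    p_dyn = alpha * c_load * v_swing * v_dd * f_clk   # charging/discharging C_L
    p_sc = i_peak * t_s * v_dd * f_clk                # short-circuit current
    p_leak = i_leak * v_dd                            # leakage, independent of f_clk
    return p_dyn, p_sc, p_leak


# Hypothetical operating point (illustrative numbers, not from the chip).
params = dict(alpha=0.1, c_load=1e-9, v_swing=1.0, v_dd=1.0,
              i_peak=1e-3, t_s=50e-12, i_leak=1e-3)
full_rate = power_breakdown(f_clk=2e9, **params)
quarter_rate = power_breakdown(f_clk=0.5e9, **params)
print(full_rate, quarter_rate)
```

Lowering $f_{clk}$ alone scales only the first two terms; the leakage term is unchanged.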

The dynamic and short-circuit power consumption shown in (5.1) can be reduced by reducing each contributing factor. For example, $\alpha$ can be reduced by representing signals in a way that minimizes the transitions or by selectively disabling parts of a system [67]. The signal swing, $V$, can be reduced by adopting a pass-transistor logic style or low-voltage differential signaling (LVDS). Also, if $f_{\text{clk}}$ and $V_{\text{dd}}$ can be tuned, it is known that an optimal point that minimizes the power consumption can be found [49].

However, in practical implementations of a communication system that needs to perform complex digital signal processing (DSP), the number of options for power reduction is limited. One reason is that the static-CMOS logic style synthesized with standard cells is virtually the only logic style supported by the EDA tools, which we have to rely on to implement complex DSP functions with realistic design cost and time. Although some core blocks such as adders and multipliers might selectively be implemented in a different logic style, most of the system needs to be implemented in static CMOS. Also, $V_{\text{dd}}$ and the device threshold voltage, $V_t$, for standard cells are usually governed by considerations of process technology, reliability, and leakage, which do not necessarily correspond to an optimal power point. This removes the option of tuning those voltages to achieve the minimal power. Lastly, the throughput is not tunable in a communication system because it is pre-defined by the symbol rate of the system, $f_{\text{sym}}$. The oversampling rate, i.e., the ratio between the sampling rate and $f_{\text{sym}}$, is the only tuning knob that changes the operating frequency, $f_{\text{clk}}$. However, this parameter has to be determined by considering the system performance, not only the circuit power consumption.

Given the limited options available, architectural techniques that reduce the $f_{\text{clk}}$, such as parallelization and pipelining are options that we can choose to reduce the power consumption of the 60 GHz digital baseband.

Parallelization involves implementing a function with multiple paths running at a lower frequency, which effectively decreases $f_{\text{clk}}$. The $f_{\text{clk}}$ reduction relaxes the $V_{\text{dd}}$ requirement, and it is known that parallelization effectively decreases the power consumption for a given throughput [16]. In addition, for our application, where the throughput requirement far exceeds the maximum operating frequency limited by the intrinsic logic delay of the process, parallelization is virtually the only way to meet the symbol rate requirement of the communication system. However, parallelization needs to be applied carefully, mainly because it increases the area, the routing overhead of the design, and the leakage power consumption. Also, a parallelized data path needs to be implemented carefully when there is a feedback loop in the system.

Similarly, the pipelined architecture effectively reduces $f_{\text{clk}}$ at the cost of additional sequential elements such as flip-flops and latches. It is also shown in [16] that the power saved by reducing $V_{\text{dd}}$ or $C_L$ outweighs the overhead of the additional circuitry. One problem with pipelining is that, because it increases latency, it must be applied carefully to a DSP block with feedback, which is the case for the DFE architecture considered in this work.


5.2 Equalizer

5.2.1 Implementation Parameters

The diagram of the reduced-complexity DFE is shown in Figure 5.1(b), which is introduced in section 3.2.3. The tap assignment of the equalizer components to the impulse response is illustrated in Figure 5.2.

The required number of equalizer taps is initially determined by the link outage probability analysis in the statistically-generated NLOS channel profiles [1]. The BER performance target is set to be $10^{-2}$ since the errors at the equalizer output are substantially corrected by error correction codes such as LDPC. The LDPC coding is a part of the proposed standards in the 60 GHz band. The channel decoder is assumed to be adaptively turned off if the channel conditions are good, such as under the LOS condition.

An outage is defined as a case in which the performance target is not achieved. For the BPSK modulation under consideration, an outage occurs when the SNR of the signal after the equalizer ($SNR_{residual}$) is less than 4.2 dB. Therefore, the outage probability of the BPSK signal can be expressed as

$$P(BER > 10^{-2}) = P(SNR_{residual} < 4.2\,\text{dB}). \tag{5.2}$$

The noise term of the $SNR_{residual}$ after an ideal DFE is the sum of the AWGN and the residual ISI terms that are not cancelled by the available DFE taps:

$$SNR_{residual} = \frac{|h_0|^2}{N_0 + \sum_{m \neq 0} |h_m|^2} \tag{5.3}$$

where $h_0$ represents the main tap, $m$ is a time index for excess delay, and $N_0$ is the power spectral density of the noise. Figure 5.3(a) shows the outage probability calculated for 100 channel profiles generated from the NLOS statistical channel model (IEEE residential channel model, CM2.3 [1]). Figure 5.3(b) shows the outage probability of the equalizer with a varying number of taps in the generated channel profiles, which shows that a DFE with approximately 30 taps is enough to keep the outage probability below 10%. The outage probability decreases significantly in other usage scenarios since CM2.3 is the worst model in terms of ISI.
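
The outage calculation in (5.2) and (5.3) can be sketched as a small Monte Carlo experiment. The Python sketch below is a minimal illustration under stated assumptions: the channel profiles are synthetic (exponentially decaying Rayleigh taps with a unit main tap) rather than the CM2.3 model, only postcursor taps are modeled, and the residual ISI is taken to be the energy of the taps beyond the first `num_dfe_taps` postcursors, following the description of (5.3) as the ISI not cancelled by the available DFE taps. All function names and numbers are hypothetical.

```python
import numpy as np


def residual_snr_db(h, num_dfe_taps, n0):
    """Residual SNR in dB; h[0] is the main tap, h[1:] are postcursor taps."""
    main = np.abs(h[0]) ** 2
    # ISI left over after an ideal DFE cancels the first num_dfe_taps postcursors.
    residual_isi = np.sum(np.abs(h[1 + num_dfe_taps:]) ** 2)
    return 10.0 * np.log10(main / (n0 + residual_isi))


def outage_probability(profiles, num_dfe_taps, n0, threshold_db=4.2):
    """Fraction of channel profiles whose residual SNR falls below the threshold."""
    snrs = np.array([residual_snr_db(h, num_dfe_taps, n0) for h in profiles])
    return float(np.mean(snrs < threshold_db))


# Synthetic profiles (assumption): 64 taps, exponentially decaying ISI power.
rng = np.random.default_rng(0)
isi_power = 0.3 * np.exp(-np.arange(64) / 10.0)
profiles = []
for _ in range(100):
    h = (rng.normal(size=64) + 1j * rng.normal(size=64)) * np.sqrt(isi_power / 2)
    h[0] = 1.0  # unit-energy main tap
    profiles.append(h)

outages = [outage_probability(profiles, taps, n0=0.1) for taps in range(0, 64, 8)]
print(outages)
```

Because adding DFE taps can only remove ISI energy from the denominator of (5.3), the estimated outage probability is non-increasing in the number of taps for any fixed set of profiles.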

The actual number of implemented filter taps in the linear equalizer is $A = 6$, in the DFE it is $B = 24$, and in the sub-DFE it is $L = 8$ to meet the latency requirement of the feedback loop inside the equalizer as will be discussed in section 5.2.2. Floating-point link-level simulation is performed using the simulation environment described in section 3.5 to verify the parameters. Figure 5.4 shows examples of impulse responses (IR2-6, and AWGN) generated from the IEEE statistical channel model and corresponding BER performance of the equalizer with the implemented number of filter taps ($A, B, L$), which shows that the BER=$10^{-2}$ is achievable with the equalizer in the reasonable SNR range.

The digital signal wordlengths are also determined by the link-level simulation in accordance with the chosen number of filter taps. The wordlength is minimized to reduce the hardware size, the power consumption, and the LUT size needed for the DA implementation, while keeping the fixed-point loss in BER performance below 1 dB. The wordlengths used for the equalizer are shown in Figure 5.8. Symbol-rate sampling is employed in this work to avoid the power consumption associated with oversampling.

5.2.2 Hardware Architecture

All the equalizers are divided into four parallel data-paths to meet the throughput requirement with low power consumption. Without parallelization, the equalizer would have to be implemented with power-hungry logic styles like dynamic logic, given the high symbol rate and the CMOS process parameters we used.

The DFE is implemented as an FIR filter that calculates a convolution of estimated channel coefficients, $\hat{h}_k$ and the slicer output, $\hat{x}_k$ as follows:

$$ y_k = \sum_{m=1}^{24} \hat{h}_{L+m} \cdot \hat{x}_{k-m} \quad (k \in \mathbb{Z}). \quad (5.4) $$

Although the transposed form of an FIR filter is often preferred for high-speed, low-latency applications [59], the structure is difficult to parallelize because it needs to perform multiple multiply-and-add operations within a clock cycle. On the other hand, the direct-form FIR is parallelized by simply repeating the same structure with time-shifted inputs, which can be expressed as

$$ y_{4q+p} = \sum_{m=1}^{24} \hat{h}_{L+m} \cdot \hat{x}_{4q+p-m} \quad (p = 0, 1, 2, 3). \quad (5.5) $$

From (5.5), it is easy to see that the filter can be parallelized by implementing it with four identical blocks and time-shifted inputs. Within each block, the 24-tap sum can be further split into four 6-tap partial sums, expressed as

\[ y_{4q+p} = \sum_{m=1}^{6} \hat{h}_{L+m} \cdot \hat{x}_{4q+p-m} + \sum_{m=7}^{12} \hat{h}_{L+m} \cdot \hat{x}_{4q+p-m} + \sum_{m=13}^{18} \hat{h}_{L+m} \cdot \hat{x}_{4q+p-m} + \sum_{m=19}^{24} \hat{h}_{L+m} \cdot \hat{x}_{4q+p-m}. \tag{5.6} \]
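
The equivalence between the serial filter in (5.4) and the four-path form in (5.5) and (5.6) can be checked numerically. The Python sketch below is only a behavioral model under stated assumptions: the array `h` holds the 24 coefficients $\hat{h}_{L+1}, \ldots, \hat{h}_{L+24}$, the input history before $k = 0$ is zero, and the function names are hypothetical. In hardware the four paths run concurrently at a quarter of the symbol rate; here they are simply looped.

```python
import numpy as np


def fir_serial(h, x):
    """y_k = sum_{m=1}^{B} h[m-1] * x[k-m], with x[k] = 0 for k < 0, as in (5.4)."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        for m in range(1, len(h) + 1):
            if k - m >= 0:
                y[k] += h[m - 1] * x[k - m]
    return y


def fir_parallel4(h, x, taps_per_group=6):
    """Same filter evaluated as four paths p = 0..3, each producing y[4q + p]."""
    assert len(x) % 4 == 0
    y = np.zeros(len(x))
    for q in range(len(x) // 4):          # one slow clock cycle per q
        for p in range(4):                # four identical blocks
            k = 4 * q + p
            # Four 6-tap partial sums per output, as in (5.6).
            for start in range(1, len(h) + 1, taps_per_group):
                stop = min(start + taps_per_group, len(h) + 1)
                for m in range(start, stop):
                    if k - m >= 0:
                        y[k] += h[m - 1] * x[k - m]
    return y


rng = np.random.default_rng(2)
h = rng.normal(size=24)
x = 1.0 - 2.0 * rng.integers(0, 2, size=64)   # BPSK slicer outputs (+1 / -1)
y_serial = fir_serial(h, x)
y_parallel = fir_parallel4(h, x)
print(np.max(np.abs(y_serial - y_parallel)))
```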

For the FIR filters of the LE and main decision feedback equalizer (M-DFE), the LUT based DA architecture is chosen for each of the parallelized blocks to reduce the latency and implement the filter with very low power consumption.

In the DA architecture, intermediate results of multiply-and-add operations are pre-computed and stored in LUTs [95]. In designing the DA architecture, there is a trade-off between the memory size and latency. The pre-computation can be done during the IFS period. It is necessary only during initial setup and when there are changes in the channel condition.

The size of the LUT depends on the number of coefficients, their wordlengths, and the structure [76]. In this particular implementation, the emphasis is put on meeting the timing requirement to close the feedback loop while shortening the latency. Figure 5.5 illustrates the M-DFE implementing the 24-tap FIR, which uses 4 LUTs. Using only one LUT would minimize the latency of the filter, but would require a LUT with a prohibitive $2^{24}$ entries for the binary input of the BPSK signal. On the other hand, breaking down the LUT reduces the memory requirement while increasing the latency [76]. In this work, four LUT instances, each with $2^{24/4} = 2^6 = 64$ entries, are used. Each of the parallelized paths is marked as $p = 0, 1, 2, 3$.

With the DA structure, sixteen different memory instances would be required to directly implement the filter in (5.6) with a parallelization factor of four. However, because the LUTs share the same contents, they can be implemented with four multi-ported memories. In this implementation of the M-DFE, the four $2^6$-word LUTs are instantiated with D-FFs and multiplexers (MUXs) as shown in Figure 5.5. The LE and the channel emulator share the same architecture, with 6 LUTs (6 taps, 6-bit input, $2^6 = 64$ words each) and 12 LUTs (72 taps, $2^{72/12} = 64$ words each), respectively, implementing the following convolutions:

\[ z_k = \sum_{m=1}^{6} w_m (r_{k-m} - y_{k-m}) \quad \text{(LE)} \tag{5.7} \]

\[ c_k = \sum_{m=1}^{72} h_m \cdot x_{k-m} \quad \text{(TX channel emulator)} \tag{5.8} \]

The sub-DFE (S-DFE) structure, however, has to be implemented differently because of its single-cycle feedback requirement. Therefore, the S-DFE is first combined with the slicers and loop-unrolled [85] and then implemented in a DA architecture. Although the loop-unrolling requires additional combinational logic, the 8-tap filter needs only one LUT with 256 (= $2^8$) entries because the slicer output has only two levels in a BPSK system (Figure 5.7). Figure 5.8 shows the hardware details of the equalizer with its bitwidths and pipeline register allocation; its block diagram is illustrated in Figure 5.1(b). As shown in the figure, the feedback loop has a latency of two clock cycles, which come from the logic delays from register#1 to register#2 and from register#2 back to register#1. This latency is handled by the S-DFE. MUXs are added to the delay line to enable adjustable tap allocation, which makes it possible to configure the equalizer to cancel ISI up to 72 taps long. Figure 5.6 illustrates the structure and an example of this dynamic tap assignment, which allocates four-tap delay line groups to the major multipath clusters with large amplitude, as determined from the channel estimator output.

Figure 5.5: M-DFE (4-way parallelized, 24 tap DA FIR with 4 LUTs), where \( \hat{x}_k \) is binary input from the slicer, \( \hat{h}_k \) is the estimated impulse response used to calculate the LUT entries.

Figure 5.6: Dynamic tap assignment scheme.

Figure 5.7: S-DFE (loop-unrolled, 8 tap DA FIR with a LUT).

The coefficients of the equalizer filters are calculated based on the channel estimation results. The coefficients of the feedforward linear equalizer \((w_1, \cdots, w_A)\) can be calculated on the precursor parts of the impulse response \((h_1, \cdots, h_A)\) using the MMSE criterion, which can be expressed in the frequency domain as introduced in section 3.2.2:

\[
W_n = \frac{H_n^*}{|H_n|^2 + N_0} \quad n = 1, 2, \cdots, A, \tag{5.9}
\]

where \(H_n\) and \(W_n\) are the discrete Fourier transforms of \(h_n\) and \(w_n\), respectively \((H_n \xrightarrow{IDFT} h_n,\ W_n \xrightarrow{IDFT} w_n)\). The complexity of this operation is low because the number of taps in the linear equalizer is minimized to six (\(A = 6\)) and can be reduced further depending on the channel profile. Also, this operation only needs to be performed sporadically, when there is a change in the channel condition. In addition, in a real system, the calculation can easily be done with a general-purpose DSP or CPU, which is commonly present in communication systems to perform analog calibration and medium access control (MAC) operations.

Figure 5.8: Equalizer block diagram with implementation details.
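
The coefficient calculation in (5.9) maps directly onto an FFT, a per-bin division, and an inverse FFT. The Python sketch below is a minimal illustration that assumes an $A$-point DFT over the $A$ taps as written in (5.9); the function name `mmse_le_taps`, the example impulse response, and the value of $N_0$ are hypothetical.

```python
import numpy as np


def mmse_le_taps(h, n0):
    """Frequency-domain MMSE taps: W_n = conj(H_n) / (|H_n|^2 + N_0), then IDFT."""
    H = np.fft.fft(h)
    W = np.conj(H) / (np.abs(H) ** 2 + n0)
    return np.fft.ifft(W)


# Hypothetical 6-tap impulse response segment (A = 6) and noise level.
h = np.array([1.0, 0.4, 0.2, 0.1, 0.05, 0.0])
w = mmse_le_taps(h, n0=0.1)

# With N_0 = 0 the result reduces to channel inversion: W_n * H_n = 1 in every bin.
w_zf = mmse_le_taps(h, n0=0.0)
print(np.round(w, 4))
```

Only one forward and one inverse transform of length $A$ are needed per channel update, which is consistent with the low complexity noted above.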

The exact estimation of \(N_0\) is known not to be critical for the BER performance [94]. For the coefficients of the two DFEs, the channel estimator results, \(\hat{h}_m\), can be used directly. In the case of the M-DFE, the entries of LUT0, \(C_{0,K}\), are calculated as illustrated in Figure 5.5,

\[ C_{0,K} = \sum_{i=1}^{6} \hat{h}_{L+i}(1 - 2b_{K,i}) \quad K = 0, 1, \ldots, 2^6 - 1, \quad (5.10) \]

where \(b_{K,i}\) is a binary digit representing one of the possible combinations of the slicer outputs, related to the integer \(K\) by

\[ K = \sum_{i=1}^{6} b_{K,i} \cdot 2^{i-1}. \quad (5.11) \]

The entries of other LUTs are calculated in a similar way. Although this LUT entry calculation is not implemented on-chip, this operation can be hard-wired via low-power adder trees because the operation only needs to be completed within the IFS length after the latency of the initial estimation blocks.
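
The LUT entry calculation in (5.10) and (5.11) can be sketched in a few lines. The Python sketch below is an illustration only, not the on-chip or off-chip code used for the measurements: it assumes random coefficients in place of $\hat{h}_{L+1}, \ldots, \hat{h}_{L+24}$, maps a slicer bit $b$ to the symbol $1 - 2b$ as in (5.10), and checks that summing one entry from each of the four 64-word LUTs reproduces the direct 24-tap sum. The function names are hypothetical.

```python
import numpy as np

TAPS_PER_LUT = 6


def lut_entries(h_group):
    """C_K = sum_i h_i * (1 - 2*b_{K,i}), with K = sum_i b_{K,i} * 2^(i-1)."""
    entries = np.zeros(2 ** len(h_group))
    for K in range(len(entries)):
        for i, h_i in enumerate(h_group, start=1):
            b = (K >> (i - 1)) & 1          # bit i of the LUT address
            entries[K] += h_i * (1 - 2 * b)
    return entries


def da_fir_output(luts, recent_bits):
    """recent_bits[m-1] is the slicer bit for x_{k-m}, m = 1..24."""
    y = 0.0
    for g, lut in enumerate(luts):
        group = recent_bits[g * TAPS_PER_LUT:(g + 1) * TAPS_PER_LUT]
        K = sum(int(b) << i for i, b in enumerate(group))   # address per (5.11)
        y += lut[K]
    return y


rng = np.random.default_rng(1)
h = rng.normal(size=24)                      # stand-in for the estimated taps
luts = [lut_entries(h[g * TAPS_PER_LUT:(g + 1) * TAPS_PER_LUT]) for g in range(4)]
bits = rng.integers(0, 2, size=24)           # last 24 slicer decisions
y_da = da_fir_output(luts, bits)
y_direct = float(np.dot(h, 1 - 2 * bits))    # direct evaluation of the 24-tap sum
print(y_da, y_direct)
```

Each output needs four table reads and three additions instead of 24 multiply-and-add operations, at the cost of recomputing the 4 × 64 entries whenever the channel estimate changes.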
5.3 Channel Estimator

The IEEE WPAN standard specifies a channel estimation sequence based on Golay codes, both in a preamble (CES) and within data bursts (TS) [1]. The sequence is used to estimate the channel impulse response, which can be used to calculate the equalizer coefficients. The estimation also can be used for the synchronization of frequency and timing. The code is a binary complementary sequence consisting of \(a(i)\) and \(b(i)\) of \(N\) elements that has the following autocorrelation property [25]:

\[
\rho_a(k) + \rho_b(k) = \begin{cases}
1 & \text{if } k = 0 \\
0 & \text{if } k \neq 0
\end{cases}
\tag{5.12}
\]

where

\[
\begin{aligned}
\rho_a(k) &= \sum_{i=0}^{N-k-1} a(i) \cdot a(i+k) \quad 0 \leq k \leq N - 1 \\
\rho_b(k) &= \sum_{i=0}^{N-k-1} b(i) \cdot b(i+k) \quad 0 \leq k \leq N - 1.
\end{aligned}
\tag{5.13}
\]

The channel, \(h_m\) can be reconstructed by the following recursive equations, which consists of shift, add, and subtract operations between two sequences,

\[
a_0(i) = \delta(i), \quad b_0(i) = \delta(i)
\tag{5.14}
\]

\[
a_n(i) = a_{n-1}(i - D_n) + W_n \cdot b_{n-1}(i)
\tag{5.15}
\]

\[
b_n(i) = a_{n-1}(i - D_n) - W_n \cdot b_{n-1}(i)
\tag{5.16}
\]

where \(\delta(i)\) is the Kronecker delta function, \(n\) is the iteration index \((n \in \{1, \ldots, \log_2(N)\})\), \(W_n\) are binary coefficients \((W_n \in \{1, -1\})\), and \(D_n\) is a circular delay. Since the number of operations required for a Golay correlator is $O(N \cdot \log_2(N))$, as opposed to $O(N^2)$ in a PN correlator, it is more suitable for power-constrained high-speed communication systems [65].
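
The complementary property in (5.12) can be checked numerically for a pair built by the recursive construction. The Python sketch below is a minimal illustration under stated assumptions: it uses the sum/difference form of the recursion, in which $a_n$ is the delayed $a_{n-1}$ plus $W_n b_{n-1}$ and $b_n$ is the delayed $a_{n-1}$ minus $W_n b_{n-1}$; the delay and weight vectors are hypothetical and are not the values specified in the standard. The autocorrelations are left unnormalized, so the sum equals $2N$ at $k = 0$; dividing by $2N$ gives the normalized form in (5.12).

```python
import numpy as np


def golay_pair(delays, weights):
    """Build a complementary pair of length 2**len(delays) by shift, add, subtract."""
    n_len = 1 << len(delays)
    a = np.zeros(n_len, dtype=int)
    a[0] = 1                      # a_0(i) = delta(i)
    b = a.copy()                  # b_0(i) = delta(i)
    for d, w in zip(delays, weights):
        a_delayed = np.roll(a, d)             # no wrap-around occurs for these delays
        a, b = a_delayed + w * b, a_delayed - w * b
    return a, b


def autocorr(x):
    """Aperiodic autocorrelation rho(k) for 0 <= k <= N-1, as in (5.13)."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(n)])


# Hypothetical parameters: delays are a permutation of the powers of two below N.
delays = [1, 2, 4, 8, 16, 32, 64]
weights = [1, -1, 1, 1, -1, 1, -1]
a, b = golay_pair(delays, weights)
rho_sum = autocorr(a) + autocorr(b)
print(len(a), rho_sum[0], np.count_nonzero(rho_sum[1:]))
```

Each of the $\log_2(N)$ stages costs one addition and one subtraction per element, which is where the $O(N \cdot \log_2(N))$ operation count comes from.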

Figure 5.9 shows the timing diagram of the implemented channel estimator working on the CES based on a 128-symbol Golay sequence. Only the center portions of the received sequence are buffered to be correlated in order to estimate the impulse response without the influence of the ISI from the irrelevant signals. Although the data path is shared with the equalizer that is parallelized by a factor of four in this work, it is desirable to make the structure easy to reconfigure because the parallelization factor, $P$, needs to be tuned depending on the system latency and power requirement.

The estimator operation is basically an element-by-element addition of $a(i)$ and $b(i)$ after delaying $a(i)$ by $D_n$. Figure 5.10 shows the parallelized datapath of the channel estimator. The delay operations required in the parallel scheme are easily implemented by adding or subtracting offsets to the read address when the delay value, $D_n$, is a multiple of $P$ ($\mathrm{mod}(D_n, P) = 0$). However, when the delay value is not a multiple of $P$, the delay operation is implemented by a swap-and-partial-shift operation on the buffer. Figure 5.11 illustrates the operations for a delay by 2 ($\mathrm{mod}(D_n, P) = 2$) and a delay by 1 ($\mathrm{mod}(D_n, P) = 1$) when $P$ is four. In the figure, the leftmost column of boxes shows the original buffer, in which the data order is represented by the number inside each box. In a clock cycle, the four data in a row are processed at the same time. If we want to delay the data by four, it is sufficient to increase the read pointer of the memory by one. To implement a fractional delay operation, a swap operation is first performed, which swaps the gray and white portions of the buffer. After that, the partial shift operation performs a rotational shift of the white portion of the box, which eventually moves the dark gray boxes from the top to the bottom. It can be seen that, by sequentially reading out the re-ordered buffer, the delay operation is completed. All of these operations can be implemented by pointer management, without any physical movement of the data.
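
The difference between the two delay cases can be seen from plain index arithmetic. The Python sketch below is not the swap-and-partial-shift pointer scheme of Figure 5.11; it is a simplified model, assuming a circular delay over a buffer stored as rows of $P$ words, that reports which lane each output word is read from. The function name `delayed_view` and the buffer size are hypothetical.

```python
import numpy as np


def delayed_view(buffer, delay):
    """Return the circularly delayed buffer and the source lane of every output word."""
    rows, p = buffer.shape
    n = rows * p
    src = (np.arange(n) - delay) % n          # flat index feeding each output slot
    src_row, src_lane = src // p, src % p
    out = buffer[src_row, src_lane].reshape(rows, p)
    return out, src_lane.reshape(rows, p)


P = 4
buf = np.arange(16).reshape(4, P)             # data order 0..15, four words per row

out4, lanes4 = delayed_view(buf, 4)           # delay is a multiple of P
out2, lanes2 = delayed_view(buf, 2)           # delay is not a multiple of P
print(lanes4[0], lanes2[0])
```

When the delay is a multiple of $P$, every output lane reads from the same lane of a different row, so only the row address changes. Otherwise the lanes are rotated as well, which is the case that the swap-and-partial-shift operation addresses.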

The control path of the channel estimator consists of controllers and correlators A and B. Each correlator is composed of \( P \) identical cells built out of MUXed static random-access memory (SRAM) elements. In this way, a different \( P \) can be easily accommodated with slight modification of the design, depending on the system requirements. The data path, control path, and cell structure are illustrated in Figures 5.10, 5.12, and 5.13, respectively.

Figure 5.23 shows an impulse response emulated by the on-chip channel emulator overlaid with the channel estimator output measured from the chip, which demonstrates proper operation of the block. Although the block consumes 33 mW at 2 Gb/s, the channel estimation is performed only for a fraction of the time during the connection, with an activity factor of \( \rho \) (Figure 3.3). In the IEEE WPAN standard, the periodicity of a CES field (768 symbols) can be set to 8192 (\( \rho=9.4\% \)), 16384 (\( \rho=4.7\% \)), 32768 (\( \rho=2.3\% \)), or infinite [5]. Therefore, \( \rho \) can be significantly lower than 1\% by the MAC layer adjustment if the channel is stationary.
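
The activity factors quoted above follow from dividing the CES length by its periodicity. The short Python sketch below reproduces that arithmetic and scales the 33 mW active power by $\rho$. The averaging is a simplifying assumption that the estimator draws no power while idle, so leakage and clock-gating overhead are ignored.

```python
CES_SYMBOLS = 768            # length of the CES field in symbols
CE_ACTIVE_POWER_MW = 33.0    # channel estimator power while active, at 2 Gb/s


def duty_cycle(period_symbols):
    """Activity factor rho = CES length / CES periodicity."""
    return CES_SYMBOLS / period_symbols


for period in (8192, 16384, 32768):
    rho = duty_cycle(period)
    avg_mw = rho * CE_ACTIVE_POWER_MW   # assumes zero power when idle
    print(f"period={period}: rho={100 * rho:.1f}%, average CE power={avg_mw:.2f} mW")
```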

While the channel estimation error is not negligible, particularly in the low SNR range, it is verified through the link-level simulation that the performance degradation caused by this error is less than 1 dB under nominal operating conditions. Figure 5.14 shows the link-level simulation results of the channel estimator in the NLOS channel condition. The MSE of the channel estimator, illustrated in the right-hand side figure, shows that the error depends on the ADC quantization noise and the channel SNR. The BER curves in the left-hand side

Figure 5.14: MSE of the channel estimator and its impact on the BER performance.

The digital part of the design has been synthesized using a customized design flow [48], which integrates Mathworks® Simulink for design editing, Mentor Graphics® ModelSim for simulation, and Synopsys® Design Compiler for digital synthesis. The digital backend flow was performed using Synopsys® IC Compiler. Physical implementation was completed using Cadence® Virtuoso and verified for DRC and LVS violations using Mentor Graphics® Calibre. To interface with the packaged die, a test board was built using Cadence® Design Entry CIS (for schematic) and Cadence® Allegro PCB Editor (for layout). Figure 5.15 illustrates the hierarchical color view of the chip generated by Synopsys® IC Compiler. It shows the placement and relative area of each block.

A 2 mm × 2 mm chip was fabricated by TSMC in 65 nm CMOS. The core size is 1.53 mm by 1.53 mm, and the chip is pad-limited with a utilization factor of 15%. The floorplan and the chip photo are shown in Figures 5.15 and 5.16. Figure 5.17 illustrates the power breakdown of the chip estimated by Synopsys® PrimeTime. A large portion of the power is consumed by non-essential functional elements such as the IO pads and the transmitter. Also, the channel estimator and its memory turned out to be a major power consumer, consuming more power than the equalizer if the activity factor, \( \rho \), is 1.
Figure 5.15: Block placement.

(a) Die photo  (b) Chip bonded

Figure 5.16: Chip photo

5.4.1 Test Structure

Figure 5.18 shows the interface of the chip and Figure 5.19 illustrates the test setup. The on-chip test blocks eliminate the need for high-speed interconnections. The configuration of the chip, such as the operating mode (either data or channel estimation mode) and the delay line offset, is set by a scan chain. The filter coefficients for the equalizers and the channel emulator are initialized through a separate data bus. The chip has simple input control signals for start and reset. Debugging pins are provided to monitor the function of the internal blocks at a low clock frequency. The pins are routed through the FPGA board and monitored using a TLA5202 logic analyzer. In the full-speed test, the BERT_done indicator is the only control signal that needs to be checked. The resulting numbers of bits and errors are read from the debugging interface once a BERT is done. Pictures of the test board and the test environment in the Berkeley Wireless Research Center (BWRC) are shown in Figure 5.20.

5.4.2 Measurement Results

Figure 5.21 shows the measured BER performance for both an AWGN and a multipath channel, verifying the correct operation. The deviation from the theoretical performance shows the effect of the error propagation in the DFE. Figure 5.22 shows the measured total power consumption with varying throughput. As shown in the figure, the post-synthesis estimated power consumption of 46.7 mW is close to the measured 60.7 mW at 2 Gb/s. The power breakdown derived from the synthesis estimates is adjusted proportionally to estimate the actual power consumption of each block. Because the channel estimator is only active during the preamble period, a duty cycle, $\rho$, scales the result.

Figure 5.23 plots the measured channel estimator output overlaid with the transmitted channel impulse response, which is programmed as the filter coefficients of the channel emulator. The two curves show that the channel estimator estimates the channel impulse response properly, as designed.

Table 5.1 summarizes the chip parameters. The power consumption is measured in a multipath propagation condition when EbNo is 4 dB. Although the equalizer power consumption depends on the channel conditions, a stronger dependency on EbNo is observed. This is because a high noise level increases the signal activity factor significantly. The activity factor also depends on the filter coefficient settings, which was the original intent of the chip, aiming at a power-scalable structure.

Table 5.2 compares this chip with previously published work. For a similar throughput, this implementation provides a larger number of equalizer taps and a linear equalization capability with less power consumption.
Figure 5.23: Comparison between transmitted and measured channel impulse response when SNR= ∞.

<table>
<tbody>
<tr>
<td>Technology</td>
<td colspan="2">TSMC 65nm CMOS</td>
</tr>
<tr>
<td>Supply</td>
<td colspan="2">1.0V (core), 2.5V (IO)</td>
</tr>
<tr>
<td>Chip area</td>
<td colspan="2">2 mm × 2 mm</td>
</tr>
<tr>
<td>Throughput</td>
<td colspan="2">0.72 Gb/s (@180 MHz) - 2.8 Gb/s (@700 MHz)</td>
</tr>
<tr>
<td rowspan="5">Power dissipation</td>
<td rowspan="3">Total</td>
<td>11.1 mW (@0.72 Gb/s)</td>
</tr>
<tr>
<td>60.7 mW (@2.0 Gb/s)</td>
</tr>
<tr>
<td>183.8 mW (@2.8 Gb/s)</td>
</tr>
<tr>
<td>Equalizer</td>
<td>5.6 mW (@2.0 Gb/s)</td>
</tr>
<tr>
<td>CE</td>
<td>3.3 mW (@2.0 Gb/s, ρ = 0.1)</td>
</tr>
</tbody>
</table>

Table 5.1: Chip summary

<table>
<thead>
<tr>
<th></th>
<th>[8]</th>
<th>[82]</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.25μm CMOS</td>
<td>90nm CMOS</td>
<td>65nm CMOS</td>
</tr>
<tr>
<td>Data rate</td>
<td>2 Gb/s</td>
<td>1 Gb/s</td>
<td>2 Gb/s</td>
</tr>
<tr>
<td>Number of taps</td>
<td>2-tap DFE</td>
<td>16-tap DFE</td>
<td>6-tap LE, 32-tap DFE</td>
</tr>
<tr>
<td>Power</td>
<td>10 mW</td>
<td>14 mW</td>
<td>5.6 mW</td>
</tr>
</tbody>
</table>

Table 5.2: Comparison to prior works
Chapter 6

Mixed-Signal Baseband Implementation

A mixed-signal baseband chip is implemented to demonstrate the validity of the optimization framework developed in Chapter 4. The chip is a mixed-signal expansion of the digital chip developed in Chapter 5.

There are several requirements for the analog circuits of the chip to demonstrate the methodology developed in Chapter 4. The first is the power scalability of the circuits. By scaling the power, the receiver can reduce its power consumption depending on the propagation condition of the channel. The other requirement, for the ADFE, is that the resolution of its coefficients should be high. This is because one of the basic assumptions of the analysis in Chapter 4 is that the analog circuit does not suffer from the quantization noise of the ADC. Therefore, the analog resolution has to be finer than that of the ADC. The analog circuits of the chip were designed to meet these requirements.

Figure 6.1 shows the block diagram of the implemented mixed-signal chip, where the ADFE and the ADC at the receiver are added to the digital blocks designed for the full-digital chip. Compared to the digital chip, the modulation is upgraded from BPSK to QPSK, which doubles the throughput. The channel estimator is not included in this implementation because the digital implementation of the estimator has already been demonstrated and the optimization framework developed in Chapter 4 is only for the equalizer.

This chapter describes the design, circuit implementations, and measurement results of the mixed-signal chip. The circuit details, the ADFE, and the analog-digital interface designed for the ADC are presented in section 6.1. The chip implementation and measurement results are shown in section 6.2. Section 6.3 demonstrates the power reduction that can be obtained by applying the methodology of Chapter 4 and the circuits implemented.
6.1 Analog Circuit Design

Figure 6.2 shows the circuit diagram of the analog portion of the chip. The driver works as a voltage-to-current converter, which converts the off-chip input voltage signal to a current signal. The ADFE adds or subtracts currents at the output of the driver by the amount programmed by a digital controller. The chip implements a 6-tap ADFE, which corresponds to 6 taps of the equalizer for both the I and Q branches for QPSK reception. The digital tap control signal is translated into a current signal with a current-based digital-to-analog converter (DAC). Another DAC generates the body-bias voltages for the comparators, which are digitally controlled during the ADC calibration. This DAC is a resistive DAC, in contrast to the current-based DAC for the ADFE [35]. The Ser2Par converts the high-speed analog signal to a parallelized digital signal working at the digital clock frequency, which is four times slower than the analog clock. Conversely, the Par2Ser converts the parallelized digital signal (slicer output) to the high-speed analog signal for the ADFE. A clock driver regenerates the input clock signal to restore sharp clock edges and rail-to-rail swing. The driver also has a clock divider that generates the slower digital clock.
6.1.1 ADC

The main objective of the ADC implementation in this work is to design a low-power ADC that is reconfigurable while meeting the speed requirement. The flash ADC is selected for the implementation, as mentioned in Chapter 4. The architectural choice came from several considerations: (1) the flash ADC is the most power-efficient structure in the operating range of interest (2 GS/s, up to 4-bit resolution); (2) the latency of the flash ADC is smaller than that of other ADC structures, which is important in this receiver because the ADC is within the feedback loop of the DFE; and (3) the flash ADC is easy to reconfigure without affecting its latency. Also, the power is scalable across configurations simply by selectively turning the clock signals of the comparators on or off.
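The clock-gating reconfiguration can be illustrated with a minimal sketch. It assumes a 15-comparator flash array with uniformly spaced thresholds, in which a coarser mode keeps every $2^{4-b}$-th comparator clocked. The function name `enabled_comparators` and the choice of which comparators stay active are assumptions made for illustration, not details taken from the chip.

```python
def enabled_comparators(bits, full_bits=4):
    """Return 1-based indices of comparators left clocked in a `bits`-bit mode.

    Assumption: thresholds are uniformly spaced, so a coarser mode keeps
    every 2**(full_bits - bits)-th comparator of the full array.
    """
    if not 1 <= bits <= full_bits:
        raise ValueError("bits must be between 1 and full_bits")
    step = 2 ** (full_bits - bits)
    total = 2 ** full_bits - 1
    return list(range(step, total + 1, step))


if __name__ == "__main__":
    for b in range(1, 5):
        print(b, "bit mode ->", enabled_comparators(b))
```

Under this assumption a $b$-bit mode clocks $2^b - 1$ comparators, so the comparator count (and the clocked power) drops roughly by half for each bit of resolution removed.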

Block Diagram

The block diagram of the ADC is shown in Figure 6.3. The differential input signal is sampled by a sampler, which consists of simple CMOS switches. The constant Vgs switch [35] is not considered in this work because nonlinearity is not a major design limitation given the low ENOB requirement. A dummy switch is added to the main CMOS sampler to relieve the charge injection problem [35]. The comparator output feeds an SR latch that holds the comparator output during the reset phase. An encoder converts the ADC output to a binary signal with correction capability against sparkle errors [82],[77]. The encoder also has a bypass path for testing purposes. The metastability problem is not critical in this design since, in the range of operation, the BER of the receiver is already as low as $10^{-2}$. The ADC timing diagram is shown in Figure 6.4, which also illustrates the timing relation among the analog clock, digital clock, and ADFE
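As an illustration of thermometer-to-binary encoding with error correction, the sketch below uses one common approach: a three-input majority vote on neighbouring comparator outputs followed by a ones count. Whether the implemented encoder follows this scheme is not stated here; the function names `correct_bubbles` and `thermometer_to_binary` are hypothetical.

```python
def correct_bubbles(thermo):
    """Majority-vote each bit with its two neighbours to remove isolated errors.

    `thermo[0]` is the lowest-threshold comparator. The code is padded with a
    1 below and a 0 above, matching an ideal thermometer code.
    """
    padded = [1] + list(thermo) + [0]
    return [1 if padded[i - 1] + padded[i] + padded[i + 1] >= 2 else 0
            for i in range(1, len(padded) - 1)]


def thermometer_to_binary(thermo):
    """Convert a (possibly corrupted) thermometer code to an integer code."""
    return sum(correct_bubbles(thermo))


if __name__ == "__main__":
    ideal = [1] * 5 + [0] * 10
    stray = [1, 1] + [0] * 4 + [1] + [0] * 8   # isolated 1 far above the edge
    print(thermometer_to_binary(ideal), thermometer_to_binary(stray))
```

In the second example a plain ones count would return 3, while the majority vote discards the isolated output and returns 2.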
Comparator

The circuit diagram of the comparator used for the ADC is drawn in Figure 6.5. The comparator is a conventional StrongARM comparator that has cross-coupled inverters at the output nodes and reset switches that clamp the outputs to the supply rail in the reset phase. A gated clock signal is used to implement the reconfigurability. The initial input offset of the comparator is programmed by a MOS capacitor attached to the drain node of an input device. The capacitor slows down the discharge of one branch of the input pair during the evaluation phase and thereby induces the input offset. However, since the MOS capacitance varies widely with process, voltage, and temperature (PVT), the body bias of the input devices is designed to be controlled externally [93] to fine-tune their threshold voltage and thereby adjust the input offset of the comparator.

Input Offset Tuning

The body-bias is generated and controlled digitally through a resistive DAC shown in Figure 6.6 [93]. The input resistor ladder is eliminated in this design because the power consumption of the ladder in high-frequency operation is lower-bounded by the RC constant formed by the ladder and the input capacitance of the comparator array. Because the operating frequency of the resistive DAC is much lower than that of the input signal, the resistance of the ladder can be maximized, thereby minimizing the DC current flowing through the resistor ladder. The limiting factor for the ladder resistance is the settling time of the bias control signal. The worst case happens when the center node is selected by the switches and the equivalent resistance is therefore maximized. Eighty resistors, including dummies, are serially connected, generating $2^5$ voltage levels. Each resistor is tuned to be $350\,\Omega$, which leads to $28\,\text{k}\Omega$ of total resistance and a current of $35.7\,\mu\text{A}$ from the 1.0 V supply. The ladder is shared among the fifteen comparators implemented and can generate the thirty control signals necessary for the differential control of the input device thresholds.

While the power consumption of the resistive ladder can be minimized in this structure, an implementation problem is that it requires a large number of switches and registers that store the control values from the digital circuits. Manual layout of the DAC would be demanding and would take considerable design time. In this implementation, a switch is drawn to be physically compatible with the standard digital cells, as shown on the right-hand side of Figure 6.6. The switches are synthesized, placed, and routed with the standard-cell registers using the standard digital flow, which reduces the design effort significantly.

The top layout of an ADC is shown in Figure 6.7. While the comparators, the sampler, and the resistor ladder are built by manual layout, the other blocks are mostly synthesized to take best advantage of the CAD tools and to reduce the design effort and time. The design has a pair of ADCs to support the I and Q phases of the QPSK modulation.

Calibration

A foreground calibration is used to tune the switching levels of the comparators in the ADC. A flow chart for the calibration is shown in Figure 6.8, where $\text{comp}$ is a comparator index and $Q$ is a comparator output. $\text{INN}$ represents a control word that generates the body-bias of the positive side of the input device ($\text{cal}_n$), while $\text{INP}$ generates the negative-side voltage ($\text{cal}_p$). To reduce the hardware complexity of the resistive DAC, the control word on one side is reduced to 3 bits while the other side has a 5-bit control.

The effectiveness of the input offset tuning by the body-bias adjustment is shown in Figures 6.9 and 6.10. Figure 6.9 plots the measured input offset transitions of the fifteen comparators during the calibration procedure, which shows that the input offsets converge to the ideal equidistant positions as the calibration iterates. Figure 6.10 shows the histogram of the comparator input offsets after the calibration procedure.
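A foreground calibration loop of this kind can be sketched as follows. This is a simplified illustration, not the chip's controller: it assumes a linear comparator model in which each INN step raises the trip point and each INP step lowers it by the same amount, and it assigns the 5-bit range to INN and the 3-bit range to INP arbitrarily. `ComparatorModel`, `calibrate`, and the 2 mV step are hypothetical.

```python
class ComparatorModel:
    """Toy comparator whose trip point moves linearly with two body-bias codes."""

    def __init__(self, raw_offset_mv, step_mv=2):
        self.raw_offset_mv = raw_offset_mv  # trip point with both codes at zero
        self.step_mv = step_mv              # hypothetical shift per code step
        self.inn = 0
        self.inp = 0

    def trip_point(self):
        return self.raw_offset_mv + self.step_mv * (self.inn - self.inp)

    def output(self, vin_mv):
        """Q = 1 when the input is above the current trip point."""
        return 1 if vin_mv > self.trip_point() else 0


def calibrate(comp, target_mv, inn_max=31, inp_max=7):
    """Apply the target level and step INN or INP until the output Q flips."""
    first_q = comp.output(target_mv)
    while comp.output(target_mv) == first_q:
        if first_q == 1:               # trip point below target: INN++ (offset++)
            if comp.inn == inn_max:
                break
            comp.inn += 1
        else:                          # trip point above target: INP++ (offset--)
            if comp.inp == inp_max:
                break
            comp.inp += 1
    return comp.trip_point()


if __name__ == "__main__":
    comp = ComparatorModel(raw_offset_mv=91)
    print(calibrate(comp, target_mv=100), comp.inn, comp.inp)
```

In this model the residual error after calibration is bounded by one code step, provided the required correction lies within the code range.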

6.1.2 ADFE

The circuit diagram of the ADFE is illustrated in Figure 6.11. The ADFE consists of the driver, which converts the input signal voltage to a current, and six differential pairs, each of which acts as an equalizer tap. The amount of current that is added to or subtracted from the input signal is determined by a current-source DAC that generates the tail current of the differential pair. Each DAC is controlled by an 8-bit digital control code. Consequently, there are 48 (8 bits $\times$ 6 taps) DAC control signal lines between the digital circuits and the ADFE. To reduce the size ratio between the current-generating NMOS devices, and thereby minimize mismatch, the current for the LSB side of the DAC is scaled down by a ratioed current mirror [46].

Figure 6.8: A flow chart of the ADC calibration. Each step of the chart sets the input voltage and then applies either $\text{INP}++$ ($\text{offset}--$) or $\text{INN}++$ ($\text{offset}++$).

Figure 6.9: ADC offset transition by control codes.

Figure 6.10: ADC offset histogram.

Also, the data signal that feeds the input devices of the differential pairs comes from the digital slicer output. A Par2Ser converter translates the 4-way parallelized digital signal into the serialized analog input to the ADFE. The delay assignment of the analog taps is programmed by the digital logic in the same way that the digital equalizer taps are assigned, as described in Chapter 5.
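The feedback operation performed by the taps can be summarized behaviorally: each tap subtracts a programmed coefficient times a past decision from the incoming signal before slicing. The sketch below models one real branch (I or Q) with ideal arithmetic; the function name `dfe_equalize` and the dictionary that maps a symbol delay to a coefficient are illustrative assumptions, not the chip's interface.

```python
def dfe_equalize(samples, taps):
    """Decision-feedback equalization of one real branch (I or Q).

    `taps` maps a symbol delay (>= 1) to a feedback coefficient, mirroring
    programmable tap delays. Decisions are +1 or -1.
    """
    decisions = []
    for n, x in enumerate(samples):
        # Sum the post-cursor ISI estimated from earlier decisions.
        feedback = sum(c * decisions[n - d]
                       for d, c in taps.items() if n - d >= 0)
        y = x - feedback
        decisions.append(1 if y >= 0 else -1)
    return decisions


if __name__ == "__main__":
    symbols = [1, -1, -1, 1, 1, -1, 1, -1]
    # Hypothetical channel with a strong post-cursor at one symbol delay.
    rx = [s + (1.2 * symbols[i - 1] if i > 0 else 0.0)
          for i, s in enumerate(symbols)]
    print(dfe_equalize(rx, {1: 1.2}) == symbols)
```

With this example channel a plain slicer makes errors, while the feedback tap removes the post-cursor and recovers the transmitted symbols.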

Figure 6.12 shows the layout of the ADFE. The digital control bits are stored in registers within the ADFE, which were synthesized with the Par2Ser converter. The synthesized blocks are connected to the driver and equalizer taps, which are manually drawn.

6.2 Chip Implementation and Measurement

Figure 6.13 shows the top-level layout of the implemented chip, which has been fabricated by STMicroelectronics in a 65 nm CMOS process. As shown in the layout, the major portion of the chip is occupied by the digital circuits. The chip has been synthesized using the digital synthesis flow described in Chapter 5, in which the analog blocks are compiled as library cells. Those cells are placed and routed together with the digital cells by the synthesis tools. Some of the custom blocks visually stand out in the layout in Figure 6.13. The analog add-on, which includes the ADCs, the ADFE, and the analog clock driver, is shown on the left-hand side of the layout. The digital clock driver is added to provide another clock source for the digital circuitry for testing purposes. The actual digital clock is selected either from the divided analog clock or from the output of the digital clock driver to help the testing procedure. The ADC buffer is a 1024-word SRAM block that stores the ADC output, which is transferred through the debugging interface and analyzed to characterize the performance of the ADC. The buffer is necessary because the speed of the IO interface is limited by the IO pads and the FPGA board used for the testing, which cannot match the high operating frequency of the ADC.

The test setup of the chip is essentially the same as the setup described in Chapter 5, which uses the FPGA board to set the configuration parameters through a serial scan chain interface. The channel and filter coefficients are set through an 8-bit data bus that is also controlled by the FPGA board. The debugging signal routed through the board is monitored by a logic analyzer and a laptop. The only difference from the previous setup is the analog signal interface, where the analog input signal from a sinusoidal signal generator and/or a pattern generator comes in. The test environment for the chip is shown in Figure 6.14.
6.2.1 ADC

The FFT test results with sinusoidal input signals are shown in Figure 6.15. The effect of the ADC calibration is illustrated by plotting distortion components of the ADC output in the frequency domain. Figure 6.15 shows the FFT plots before and after the calibration. The dominant distortion component is shown to be suppressed by more than 10 dB through the calibration. The performance target of the ADC could be met even with the nonlinear distortion components standing out in the FFT plots. That is because the ENOB requirement of the ADC is not stringent.

The differential non-linearity (DNL) and integral non-linearity (INL) of the ADC after the calibration are plotted in Figure 6.16(a). The plot shows that both parameters are within ±0.5 least-significant bit (LSB) and that the ADC is working properly. The SNDR of the ADC at different input signal frequencies with a 1.76 GHz sampling frequency is plotted in Figure 6.16(b) before and after the calibration, which shows that the calibration improves the SNDR by around 5 dB and that the ENOB of the ADC is more than 3 bits up to the Nyquist frequency.
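For reference, these metrics can be computed as in the minimal sketch below. It assumes the code-transition levels of the ADC are available as a list and uses an endpoint-fit LSB; the function names are hypothetical, and the ENOB expression is the standard sine-wave relation $\text{ENOB} = (\text{SNDR} - 1.76\,\text{dB}) / 6.02\,\text{dB}$ rather than anything specific to this chip.

```python
def dnl_inl(thresholds):
    """Return (DNL, INL) in LSB from measured code-transition levels.

    The LSB is the average step between the first and last transitions
    (endpoint fit), so the INL is zero at both ends.
    """
    steps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    lsb = (thresholds[-1] - thresholds[0]) / len(steps)
    dnl = [s / lsb - 1.0 for s in steps]
    inl = [0.0]
    for d in dnl:
        inl.append(inl[-1] + d)   # INL is the running sum of the DNL
    return dnl, inl


def enob_from_sndr(sndr_db):
    """Standard sine-wave relation between SNDR (dB) and ENOB (bits)."""
    return (sndr_db - 1.76) / 6.02


if __name__ == "__main__":
    dnl, inl = dnl_inl([0.0, 1.5, 2.0, 3.0])
    print(dnl, inl, enob_from_sndr(19.82))
```

By this relation, an ENOB above 3 bits corresponds to an SNDR above about 19.8 dB.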

6.2.2 BER Performance

The measured BER performance of the equalizer is shown in Figure 6.17. The BER performance measured in the digital loop-back configuration, shown in Figure 6.17(a), is the same as the performance measured in Chapter 5. Measuring the BER with the analog add-on is more involved because the implemented chip does not have a DAC that converts the digital output of the on-chip transmitter to an analog signal that can feed the ADFE and the ADC. A pattern generator that can produce the PRBS31 sequence is therefore used as the analog transmitter to measure the BER. Figure 6.17(b) shows the BER performance that includes the ADFE, the ADC, and the digital circuits. The figure also shows the effect of the ADFE on the BER performance.

Figure 6.15: Measured ADC AC characteristics (Fs=1.76GHz).

Figure 6.16: Measured ADC performance.
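A BER measurement against a PRBS31 reference can be sketched as below. The generator uses the commonly specified PRBS31 polynomial $x^{31} + x^{28} + 1$ in a Fibonacci LFSR; the seed, the non-inverted output, and the helper names `prbs31` and `bit_error_rate` are assumptions for illustration and do not describe the pattern generator or the on-chip error counter used in the measurement.

```python
def prbs31(n_bits, seed=0x7FFFFFFF):
    """Generate n_bits of a PRBS31 sequence (polynomial x^31 + x^28 + 1)."""
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        # XOR the bits emitted 31 and 28 steps earlier.
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFFFF
        out.append(new_bit)
    return out


def bit_error_rate(reference, received):
    """Fraction of positions where the received bits differ from the reference."""
    if len(reference) != len(received):
        raise ValueError("sequences must have equal length")
    errors = sum(r != x for r, x in zip(reference, received))
    return errors / len(reference)


if __name__ == "__main__":
    tx = prbs31(1000)
    rx = list(tx)
    rx[10] ^= 1                      # inject a single bit error
    print(bit_error_rate(tx, rx))
```

Because the sequence is deterministic, the receiver side can regenerate the reference locally and only needs to align it with the incoming bits before counting errors.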

6.2.3 Power Consumption

The measured power consumption of the chip is shown in Figure 6.18. The power consumption of the analog circuits is measured with a 1.1 V analog supply when operating at a 1.76 GHz sampling frequency. The ADC power consumption is shown to be scalable with different bit configurations. The ADC power does not scale exponentially because circuits such as the resistor ladder and the analog clock driver burn constant power regardless of the configuration. The ADC power consumption in a low-ENOB setting could be further reduced by selectively turning off the supply of the resistor ladder. The power consumption of an ADFE tap varies with its coefficient setting, ranging from 0 to $600\,\mu\text{W}$.

The power consumption of the digital portion of the chip is measured with a 0.8 V digital supply when operating at a 440 MHz digital clock frequency, which corresponds to 3.52 Gb/s throughput in QPSK modulation. The breakdown of the digital power consumption is shown in Figure 6.18(b). Unlike the full-digital baseband implementation, the power consumption of the equalizer is dominated by the leakage power. This is because the D/FF for the LUT of the equalizer was mistakenly mapped to a general purpose (GP), low voltage threshold (LVT) cell by the synthesis tool. The leakage power consumption would have been 10,000 times smaller if an appropriate type of D/FF had been used. This problem can be easily fixed in a later implementation by changing a few setup parameters in the synthesis flow. Without the leakage power, the power consumption of the equalizer is around 3.9 mW, which is comparable to that of the full-digital implementation shown in Chapter 5.
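The relation between the clock rates and the throughput quoted above can be checked with a short calculation. The sketch assumes one QPSK symbol per analog clock cycle and uses the four-way parallelization described in Section 6.1; the constant names are illustrative.

```python
ANALOG_CLOCK_HZ = 1.76e9   # ADC sampling frequency quoted in the text
PARALLELISM = 4            # digital clock is four times slower than the analog clock
BITS_PER_SYMBOL = 2        # QPSK carries one bit on I and one bit on Q

# Assumption: one symbol is received per analog clock cycle.
digital_clock_hz = ANALOG_CLOCK_HZ / PARALLELISM
throughput_bps = ANALOG_CLOCK_HZ * BITS_PER_SYMBOL

if __name__ == "__main__":
    print(digital_clock_hz / 1e6, "MHz digital clock")
    print(throughput_bps / 1e9, "Gb/s throughput")
```

Under these assumptions the digital clock is 440 MHz and the throughput is 3.52 Gb/s, matching the figures quoted in the text.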

The key features of the chip are summarized in Figure 6.19 with a chip photo.

<table>
<thead>
<tr>
<th>Technology</th>
<th>ST 65nm CMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply</td>
<td>1.0V (core), 2.5V (IO)</td>
</tr>
<tr>
<td>Chip area</td>
<td>1.86 mm x 1.86 mm</td>
</tr>
<tr>
<td>Throughput</td>
<td>3.52 Gbps (QPSK)</td>
</tr>
<tr>
<td>DDFE (sw, + int.)</td>
<td>3.9 mW</td>
</tr>
<tr>
<td>ADC</td>
<td>1.2 mW to 3.8 mW</td>
</tr>
<tr>
<td>ADFE</td>
<td>0 to 600 µW</td>
</tr>
<tr>
<td>Driver</td>
<td>1.3 mW</td>
</tr>
</tbody>
</table>

6.3 Power Reduction of the Mixed-signal Transceiver

The mixed-signal power optimization framework developed in Chapter 4 is summarized in Figure 6.20 using the ADFE and ADC power consumption measured from the mixed-signal implementation and the power model developed in Chapter 4 with the assumed link parameters ($\eta=15\%$, $\text{NF}=7\,\text{dB}$, $G_a=3\,\text{dB}$, $P_{\text{loss}}=78\,\text{dB}$). Compared to the full-digital implementation of the receiver, the mixed-signal receiver saves 15% of its power consumption; the saving can be larger at a shorter communication distance, as shown in Chapter 4 for a given channel impulse response. The transceiver is also power-scalable depending on the channel condition. For example, if the channel improves to an AWGN condition, the transceiver can scale down and reduce its power consumption by 73%.
Chapter 7

Conclusion

7.1 Summary

In this work, the implementation of a baseband for the 60 GHz communication system that consumes minimum power for a specified BER performance has been investigated. The research was motivated by the fact that power consumption is a critical limitation in mobile devices, while a high data-rate wireless communication system requires complex signal processing at a high operating frequency, which comes with high power consumption.

The investigation spans the high-level exploration of algorithms and architectures for the modulation, the equalization, and the channel estimation. It has been observed that a large amount of power can be saved by choosing a power-aware architecture and algorithm and by reconfiguring the receiver according to the given channel condition and performance requirement. To meet these requirements, single-carrier modulation with a combined LE-DFE equalizer, which is reconfigurable based on the output of the channel estimator, has been chosen.

Also, it has been observed that, while it is challenging to build a Gb/s-rate communication system with full-digital circuits, the ease of design and noise immunity can justify the digital implementation of the baseband system. The power consumption has been minimized by using parallelization and the DA technique. It has been demonstrated that the implemented chip consumes less power than previously reported designs while having more complex signal processing capability.

Another aspect of the power saving has come from the observation that there is a trade-off between full-analog and full-digital baseband implementations, mainly because of the power-hungry ADCs working at GHz frequencies. In a communication system, the optimization becomes more complicated because it has to take into account the BER performance. An optimization methodology has been developed that minimizes the power consumption of the whole link including the transmitter, so that the BER performance expressed by the SNR can be incorporated into the power optimization framework.

This analysis framework has the potential to be expanded to find the optimal operating condition of any high-speed system that has a performance metric. The digital calibration of analog circuits is a good example.

7.2 Contribution

This work has investigated ways to implement a high-speed wireless baseband with minimum power consumption while combating multipath interference and synchronization error. In the course of this work, the following contributions have been made:

- Detection and estimation algorithms for a high-speed digital baseband have been investigated. For detection of frequency and timing error in a high-ISI environment, algorithms based on the channel estimation have been suggested and simulated. To compensate for the error, a PLL structure with both frequency and phase compensation capability has been suggested.

- Architectures of a high-speed digital channel estimator have been investigated and implemented. To reduce the power consumption and to enable a digital implementation, the structure has been parallelized. Also, a buffer management scheme has been suggested so that physical movements of data are minimized. The channel estimator has been demonstrated by a chip implementation in a 65 nm CMOS process.

- A digital equalizer architecture has been developed with design priority placed on minimum power consumption. The low power consumption has been achieved by the use of parallelism and distributed arithmetic computation. As a proof of concept, a 38-tap equalizer has been implemented in a 65 nm CMOS process.

- An analysis framework has been developed so that a proper partitioning can be determined between the analog and digital circuits. The analysis takes into account both BER performance and the circuit power consumption, so that the total link power consumption can be minimized while meeting a BER performance target.

- A mixed-signal equalizer has been implemented in 65 nm CMOS process to demonstrate the framework. The implementation includes 4-bit reconfigurable ADCs and a 6-tap analog equalizer in addition to a 38-tap digital equalizer for QPSK modulation. It has been shown that the chip can not only reconfigure its parameters but also scale power consumption depending on the channel conditions.

7.3 Future Work

While the overall structure for the synchronization of the frequency offset and timing error has been investigated, the actual development has not been included in this work. The synchronizer is a critical receiver block that needs to be explored further. This might be an interesting topic because there is also a trade-off of complexity between the synchronization and the equalization; complex and costly equalization such as FSE can relax the performance requirement of the synchronization. The analysis methodology and the reconfigurable receiver structure introduced in this work might help to determine the optimal partitioning between the equalization and synchronization, and lead to power-aware synchronization that maximizes the power efficiency.

Another topic worth further investigation is the analog implementation of LE. Although digital implementation of the LE is assumed in this work, an analog LE has potential to reduce the ADC ENOB requirement.
Bibliography


[58] K. Onodera, “Low-power techniques for high-speed wireless baseband applications,”  


oscillators for variability characterization in 45nm CMOS,” IEEE Custom Integrated  
Circuits Conference (CICC’09), Sep. 2009.


LOS/NLOS receiver in the 60 GHz band,” in Proc. Asian Solid-State Circuit Conf.  
(A-SSCC10), Nov. 2010.

[63] ——, “A 2 Gb/s 5.6 mW Digital LOS/NLOS Equalizer for the 60 GHz Band,” IEEE  


2003.

models for factory and open plan building radio communication system design,” IEEE  


Appendix A

Abbreviation

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADC</td>
<td>Analog-to-Digital Converter</td>
</tr>
<tr>
<td>ADFE</td>
<td>Analog Decision Feedback Equalizer</td>
</tr>
<tr>
<td>AoA</td>
<td>angle-of-arrival</td>
</tr>
<tr>
<td>AWGN</td>
<td>additive white Gaussian noise</td>
</tr>
<tr>
<td>BB</td>
<td>Baseband</td>
</tr>
<tr>
<td>BERT</td>
<td>bit error rate test</td>
</tr>
<tr>
<td>BER</td>
<td>bit-error rate</td>
</tr>
<tr>
<td>BPSK</td>
<td>binary phase-shift keying</td>
</tr>
<tr>
<td>BWRC</td>
<td>Berkeley Wireless Research Center</td>
</tr>
<tr>
<td>CES</td>
<td>channel estimation sequence</td>
</tr>
<tr>
<td>CE</td>
<td>Channel Estimation</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary metal-oxide-semiconductor</td>
</tr>
<tr>
<td>CM</td>
<td>channel model</td>
</tr>
<tr>
<td>DAC</td>
<td>digital-to-analog converter</td>
</tr>
<tr>
<td>DA</td>
<td>distributed arithmetic</td>
</tr>
<tr>
<td>DDFE</td>
<td>digital decision feedback equalizer</td>
</tr>
<tr>
<td>DFE</td>
<td>Decision Feedback Equalizer</td>
</tr>
<tr>
<td>DNL</td>
<td>differential non-linearity</td>
</tr>
<tr>
<td>DSP</td>
<td>digital signal processing</td>
</tr>
<tr>
<td>DTV</td>
<td>digital television</td>
</tr>
<tr>
<td>EDA</td>
<td>electronic design automation</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ENOB</td>
<td>effective number of bits</td>
</tr>
<tr>
<td>EQ</td>
<td>Equalization</td>
</tr>
<tr>
<td>FDE</td>
<td>frequency domain equalization</td>
</tr>
<tr>
<td>FFT</td>
<td>fast Fourier transform</td>
</tr>
<tr>
<td>FIR</td>
<td>finite impulse response</td>
</tr>
<tr>
<td>FOM</td>
<td>figure-of-merit</td>
</tr>
<tr>
<td>FSE</td>
<td>fractionally spaced equalizer</td>
</tr>
<tr>
<td>GaAs</td>
<td>gallium-arsenide</td>
</tr>
<tr>
<td>GP</td>
<td>general purpose</td>
</tr>
<tr>
<td>GSM</td>
<td>global system for mobile communications (groupe spécial mobile)</td>
</tr>
<tr>
<td>IEEE</td>
<td>institute of electrical and electronics engineers</td>
</tr>
<tr>
<td>IFFT</td>
<td>inverse fast Fourier transform</td>
</tr>
<tr>
<td>IFS</td>
<td>inter-frame spacing</td>
</tr>
<tr>
<td>INL</td>
<td>integral non-linearity</td>
</tr>
<tr>
<td>IR</td>
<td>impulse response</td>
</tr>
<tr>
<td>ISI</td>
<td>inter-symbol interference</td>
</tr>
<tr>
<td>LDPC</td>
<td>low-density parity check</td>
</tr>
<tr>
<td>LE</td>
<td>Linear Equalizer</td>
</tr>
<tr>
<td>LFSR</td>
<td>linear feedback shift register</td>
</tr>
<tr>
<td>LMS</td>
<td>least mean square</td>
</tr>
<tr>
<td>LOS</td>
<td>line-of-sight</td>
</tr>
<tr>
<td>LSB</td>
<td>least-significant bit</td>
</tr>
<tr>
<td>LTI</td>
<td>linear time invariant</td>
</tr>
<tr>
<td>LUT</td>
<td>look-up table</td>
</tr>
<tr>
<td>LVT</td>
<td>low voltage threshold</td>
</tr>
<tr>
<td>M-DFE</td>
<td>main decision feedback equalizer</td>
</tr>
<tr>
<td>MCS</td>
<td>modulation coding set</td>
</tr>
<tr>
<td>MLSD</td>
<td>maximum likelihood sequence detector</td>
</tr>
<tr>
<td>ML</td>
<td>maximum likelihood</td>
</tr>
<tr>
<td>MMSE</td>
<td>minimum mean-square error</td>
</tr>
<tr>
<td>MSE</td>
<td>mean square error</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Definition</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>MUX</td>
<td>multiplexer</td>
<td>73</td>
</tr>
<tr>
<td>NLOS</td>
<td>non-line-of-sight</td>
<td>10</td>
</tr>
<tr>
<td>OFDM</td>
<td>orthogonal frequency division multiplex</td>
<td>18</td>
</tr>
<tr>
<td>PAR</td>
<td>peak-to-average ratio</td>
<td>19</td>
</tr>
<tr>
<td>PA</td>
<td>power amplifier</td>
<td>56</td>
</tr>
<tr>
<td>PN</td>
<td>pseudo-random</td>
<td>33</td>
</tr>
<tr>
<td>PLL</td>
<td>phase-locked loop</td>
<td>38</td>
</tr>
<tr>
<td>PVT</td>
<td>process, voltage, and temperature</td>
<td>92</td>
</tr>
<tr>
<td>QAM</td>
<td>quadrature amplitude modulation</td>
<td>20</td>
</tr>
<tr>
<td>QPSK</td>
<td>quadrature phase-shift keying</td>
<td>1</td>
</tr>
<tr>
<td>RF</td>
<td>Radio frequency</td>
<td>ii</td>
</tr>
<tr>
<td>RMS</td>
<td>root mean square</td>
<td>10</td>
</tr>
<tr>
<td>RLS</td>
<td>recursive least square</td>
<td>29</td>
</tr>
<tr>
<td>S-DFE</td>
<td>sub-DFE</td>
<td>73</td>
</tr>
<tr>
<td>SAR</td>
<td>successive approximation register</td>
<td>56</td>
</tr>
<tr>
<td>SiGe</td>
<td>silicon-germanium</td>
<td>16</td>
</tr>
<tr>
<td>SNDR</td>
<td>signal-to-noise-and-distortion ratio</td>
<td>20</td>
</tr>
<tr>
<td>SNR</td>
<td>signal-to-noise ratio</td>
<td>20</td>
</tr>
<tr>
<td>SOI</td>
<td>silicon on insulator</td>
<td>68</td>
</tr>
<tr>
<td>SQNR</td>
<td>signal-to-quantization noise ratio</td>
<td>48</td>
</tr>
<tr>
<td>SRAM</td>
<td>static random-access memory</td>
<td>79</td>
</tr>
<tr>
<td>ToA</td>
<td>time-of-arrival</td>
<td>10</td>
</tr>
<tr>
<td>TS</td>
<td>training sequence</td>
<td>33</td>
</tr>
<tr>
<td>TX</td>
<td>transmitter</td>
<td></td>
</tr>
<tr>
<td>VCO</td>
<td>voltage controlled oscillator</td>
<td>38</td>
</tr>
<tr>
<td>WLAN</td>
<td>wireless local area network</td>
<td>1</td>
</tr>
<tr>
<td>WPAN</td>
<td>wireless personal area network</td>
<td>1</td>
</tr>
<tr>
<td>ZF</td>
<td>zero-forcing</td>
<td>23</td>
</tr>
</tbody>
</table>