# A 260 MHz IF Sampling Bit-Stream Processing Digital Beamformer With an Integrated Array of Continuous-Time Band-Pass $\Delta\Sigma$ Modulators

Jaehun Jeong, Nicholas Collins, Student Member, IEEE, and Michael P. Flynn, Fellow, IEEE

Abstract—We propose an ADC-digital codesign approach to IF sampling digital beamforming (DBF) that combines continuous-time band-pass $\Delta\Sigma$ modulators (CTBPDSMs) and bit-stream processing (BSP). This approach enables power- and area-efficient DBF by removing the need for digital multipliers and multiple decimators. The prototype beamformer digitizes eight 260 MHz IF signals at 1040 MS/s with eight CTBPDSMs, and performs digital down conversion and phase shifting with only multiplexers directly on the undecimated CTBPDSM outputs. With two sets of phase shifters, the prototype simultaneously forms two independent beams. Each phase shifter is controlled by a 12 bit programmable complex weight to provide a total of 240 phase-shift steps. By constructively combining inputs from eight elements, an 8.9 dB SNDR improvement is achieved, resulting in an array SNDR of 63.3 dB over a 10 MHz bandwidth. Fabricated in 65 nm CMOS, the eight-element two-beam prototype beamformer is the first IC implementation of IF sampling DBF. It occupies 0.28 mm<sup>2</sup>, and consumes 123.7 mW.

*Index Terms*—Beamforming, bit-stream, delta-sigma, direct IF sampling, phased array.

## I. INTRODUCTION

BEAMFORMING in receivers performs spatial filtering of incoming signals. This spatial filtering separates a desired signal from interferers at different locations. In particular, spatial filtering is useful when the interferer frequency is close to the frequency of the desired signal, because frequency-domain filtering is then not helpful [1]. In addition, beamforming improves the SNR of the received signal by 3 dB for each doubling of the number of antenna elements. More elements give a narrower beamwidth and a larger SNDR improvement. However, power consumption, area, and routing complexity have been bottlenecks in the implementation of efficient beamforming systems.

Beamforming can be performed by introducing adjustable time delays in each antenna path. However, time delays are relatively bulky and costly [2]. Therefore, for narrowband signals, beamforming is often implemented with phase shifters, since a time delay can be approximated by a constant phase-shift over the bandwidth of interest. In a beamforming receiver, phase shifting can be implemented in the analog domain [3]–[9] or in the digital domain as shown in Fig. 1.

Manuscript received August 25, 2015; revised November 03, 2015; accepted November 30, 2015. Date of publication January 14, 2016; date of current version April 28, 2016. This paper was approved by Guest Editor Salvatore Levantino. This work was supported by DARPA-ACT.

The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: jaehun@umich.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2015.2506645

In analog beamforming (ABF), phase shifters can be implemented in the RF signal path [3]–[6] [Fig. 1(a)] or in the local oscillator (LO) path [7]-[9] [Fig. 1(b)]. Traditionally, phase shifting in the RF signal path has been the most popular. In RF-path phase shifting, multiple inputs are combined in the RF domain, and therefore the amount of subsequent hardware including down converters and ADCs is minimized. This early combination of element signals also relaxes the linearity and dynamic range requirements of the down converters and ADCs, since interferers can be suppressed before reaching these components. Especially, at high frequencies (i.e., tens of GHz), the short wavelength enables an area-efficient implementation of passive phase shifters [10]. However, RF-path phase shifting suffers from high insertion loss, limited phase-shift resolution, and component mismatch, which result in the degradation of system performance. In addition, due to the early combination, the information of each received element signal is lost before reaching the baseband digital signal processing (DSP). This limits both flexibility and the ability to form multiple simultaneous beams. In LO-path beamforming, phase shifting is implemented in the LO distribution network. Since phase shifters are not placed in the signal path, LO-path beamforming has less impact on SNR [10]. However, LO-path beamforming requires multiple analog mixers and a large LO distribution network, increasing system complexity and area.

In digital beamforming (DBF), incoming signals received at an antenna array are down-converted to baseband I/Q signals, and digitized by ADCs. By controlling the phase of each down-converted signal  $(x_k)$  at the kth element with DSP, signal paths are constructively or destructively combined. To achieve a phase-shift of  $\theta$ , the baseband I/Q signals are scaled, and combined to generate I'/Q' phase-shifted outputs as follows:

$$I' = \cos(\theta) I + \sin(\theta) Q, \tag{1}$$

$$Q' = -\sin(\theta) I + \cos(\theta) Q. \tag{2}$$

When the I/Q signals are represented as a complex signal, the above operations are equivalent to multiplication by  $e^{j\theta}$ . For this reason, this technique is often called complex weight multiplication (CWM). For a uniformly spaced eight-element linear antenna array, a complex weight of  $e^{j(k\theta)}$  adjusts the delay at the kth element, and then all signal paths are combined to create a beam  $(=\sum_{k=0}^{7} x_k e^{j(k\theta)})$ .
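As a rough numerical illustration of the beam sum above, the following sketch evaluates $\sum_{k=0}^{7} x_k e^{j(k\theta)}$ for an idealized input. The input model $x_k = e^{-jk\phi}$ (a noiseless narrowband plane wave with per-element phase step $\phi$) and the function name `beam_output` are assumptions made for illustration only; they are not taken from the prototype.

```python
import cmath
import math

N_ELEMENTS = 8  # eight-element uniform linear array, as in the text


def beam_output(theta, phi):
    """Return sum_k x_k * exp(j*k*theta) for an assumed input x_k = exp(-j*k*phi)."""
    total = 0j
    for k in range(N_ELEMENTS):
        x_k = cmath.exp(-1j * k * phi)    # assumed per-element phase progression
        w_k = cmath.exp(1j * k * theta)   # complex weight e^{j k theta}
        total += x_k * w_k
    return total


# Weights matched to the input: all eight paths add in phase.
aligned = abs(beam_output(theta=0.7, phi=0.7))

# Input offset by 2*pi/8 from the steering phase: the eight phasors cancel.
misaligned = abs(beam_output(theta=0.7, phi=0.7 + math.pi / 4))
```

With matched weights the magnitude equals the number of elements, while an input offset by $2\pi/8$ falls on a null of the ideal pattern.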



Fig. 1. (a) ABF in the RF signal path. (b) ABF in the LO path. (c) DBF.

Since phase shifting with CWM is performed in the digital domain, DBF achieves the highest accuracy and flexibility. In addition, multiple simultaneous beams can be formed because the digitized and down-converted *I/Q* signals for all antenna elements are available. Moreover, DSP algorithms can be easily applied in DBF for advanced functions including adaptive beamforming and array calibration. However, DBF requires multiple down converters, high-performance ADCs, and an intensive DSP unit, resulting in high power consumption and large die area. Therefore, DBF has not been attractive for low-cost on-chip implementation. Instead, DBF is largely confined to base station applications, and implemented on FPGAs [11] or in software [12].

To enable efficient implementation of DBF, we propose a new DBF architecture based on continuous-time band-pass  $\Delta\Sigma$  modulators (CTBPDSMs) and bit-stream processing (BSP). In this architecture, the 260 MHz IF signals are digitized by an array of CTBPDSMs to take advantage of direct IF sampling. By directly processing the undecimated CTBPDSM digital outputs with BSP, we implement digital down conversion (DDC) and phase shifting with only multiplexers (MUXs). Moreover, directly processing the CTBPDSM outputs avoids the need for multiple decimators for DBF. As a result, the architecture achieves power- and area-efficient IF sampling DBF.

This paper is an extension of [14], and is organized as follows. Section II presents the concept of IF Sampling DBF with CTBPDSMs and BSP. In addition, our prototype beamformer is introduced. Section III details the circuit implementation of the CTBPDSM. Section IV provides measurements of a single CTBPDSM and of the entire beamformer.

## II. IF SAMPLING DBF WITH CTBPDSMS AND BSP

### A. DBF With Direct IF Sampling

The concept of direct IF (or RF) sampling has arisen to enable digitally intensive receivers. By digitizing higher frequencies (i.e., IF or RF), most of the signal processing chain including down conversion and filtering is carried out in the digital domain. This enables perfectly matched digital *I/Q* down conversion as well as high-performance channel-selection filtering. In addition, with a digitally intensive architecture, the receiver can be highly reconfigurable to support multiple standards, and a digital architecture benefits more from CMOS scaling. Furthermore, with direct IF sampling, the receiver is immune to flicker noise and dc offset.

<sup>1</sup>Low-pass $\Delta\Sigma$ modulators combined with variable delay lines are used for ultrasound beamforming [13]. However, the combination of CTBPDSMs and BSP has not been proposed for phase-shift beamforming.

Fig. 2. (a) IF sampling digital beamformer and (b) its MUX-based implementation.

CTBPDSMs [15]–[21] are capable of digitizing relatively high frequencies, and are attractive for direct sampling receivers. Compared to a discrete-time (DT)  $\Delta\Sigma$  modulator, a continuous-time (CT) modulator is more suitable for high-speed operation due to the relaxed op-amp bandwidth requirements. In addition, a CT  $\Delta\Sigma$  modulator presents a resistive input, which is relatively easy to drive in a system compared to a switched-capacitor ADC input. Furthermore, a CT modulator provides implicit antialias filtering, which relaxes the receiver front-end filtering requirements. To simplify DDC in the receiver, the sample rate of the CTBPDSM is often chosen to be four times the input IF (or RF). With this choice of frequencies, the sampled LO sequence for DDC has only three values of -1, 0, and +1 (see Section II-C1).
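The effect of choosing $f_s = 4 f_{\rm IF}$ on the sampled LO can be checked with a few lines of Python. This is only a sketch; the helper name `sampled_lo` is hypothetical.

```python
import math


def sampled_lo(n_samples):
    """Return the I and Q LO samples cos(n*pi/2) and sin(n*pi/2) as integers."""
    i_lo = [round(math.cos(n * math.pi / 2)) for n in range(n_samples)]
    q_lo = [round(math.sin(n * math.pi / 2)) for n in range(n_samples)]
    return i_lo, q_lo


i_lo, q_lo = sampled_lo(8)
# i_lo is the repeating pattern 1, 0, -1, 0 and q_lo is 0, 1, 0, -1.
# At every sample index exactly one of the two sequences is zero.
```

Both sequences take only the values $-1$, 0, and $+1$, and they are never nonzero at the same sample.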

We implement IF sampling DBF with an array of CTBPDSMs as shown in Fig. 2(a). IF input signals are digitized by CTBPDSMs, and digitally down-converted to form baseband I/Q signals. The baseband I/Q signals are phase-shifted with CWM, and summed to create a beam. The IF sampling DBF architecture requires several digital multipliers for DDC and CWM. However, thanks to the $\Delta\Sigma$-modulated low-resolution CTBPDSM digital outputs, the architecture is implemented very efficiently with MUXs [Fig. 2(b)]. As we will see next, BSP allows both DDC and CWM to be implemented with simple MUXs.

### B. Bit-Stream Processing DBF With $\Delta\Sigma$ Modulator Outputs

Fig. 3. (a) DSP after decimation. (b) BSP.

Fig. 4. (a) Bit-stream multiplication with a 2-to-1 MUX. (b) Five-level stream multiplication with a 5-to-1 MUX.

In $\Delta\Sigma$ modulation, the combination of oversampling and noise shaping enables a high SNR modulator output with a single-bit (or low-resolution) quantizer. Conventionally, the low-resolution digital output of the $\Delta\Sigma$ modulator is low-pass filtered and decimated before further DSP [Fig. 3(a)]. After decimation, DSP can be performed at a lower clock rate, but the digital word width grows. In BSP, on the other hand, the bit-stream modulator output is directly processed before decimation [Fig. 3(b)] to take advantage of the low word width. This approach was first proposed in [22] to realize a multiplier-less digital filter with a single-bit $\Delta$ modulator output.

Fig. 5. (a) DSP with multiple decimators. (b) BSP with a single decimator.

Fig. 6. Prototype 260 MHz IF sampling BSP digital beamformer.

A significant advantage of BSP is that it replaces bulky multipliers with simple MUXs. MUX-based multiplication with a bit-stream is described in Fig. 4(a). The bit-stream controls a 2-to-1 MUX to multiply the input bit-stream by a multibit coefficient W, which is stored in a register. Depending on the value of the bit-stream, the 2-to-1 MUX output is selected to be either 0 or W. In this way, the 2-to-1 MUX output represents the result of multiplication of the bit-stream by W. MUX-based multiplication can be extended to a five-level stream which consists of $\pm 2$, $\pm 1$, and 0 [23]. Compared to a bit-stream, the five-level stream contains the additional levels of -2, -1, and +2. To handle these additional levels, two trivial operations are added to the multiplexing: 1) sign inversion and 2) 1 bit left shift [shown as $\ll 1$ in Fig. 4(b)]. When the value of the five-level stream is -1, the sign of W is inverted to implement multiplication by -1. When the value of the five-level stream is +2, W is left-shifted by 1 bit to implement multiplication by +2. When the value of the five-level stream is -2, both sign inversion and 1 bit left shift are performed to implement multiplication by -2. In this way, a 5-to-1 MUX performs multiplication with sign inversion and 1 bit left shift as shown in Fig. 4(b). This MUX-based multiplication is particularly attractive for up to a five-level stream.<sup>2</sup> To exploit this simple MUX-based multiplication for DBF, the sample rate of the CTBPDSM is chosen to be four times the 260 MHz IF, and the CTBPDSM quantizer resolution is chosen to be five levels. These enable a MUX-based implementation of both DDC and CWM [Fig. 2(b)], greatly reducing circuit complexity.
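The five-level MUX-based multiplication of Fig. 4(b) can be modeled behaviorally as a table lookup whose entries are formed only by sign inversion and a 1 bit left shift. The sketch below is a software model under that reading, not the synthesized logic; the function name `mux5_multiply` is hypothetical.

```python
def mux5_multiply(x, w):
    """Select one of five precomputed versions of weight w based on level x.

    x is a five-level sample in {-2, -1, 0, +1, +2}; w is an integer weight.
    No multiplier is used: the candidates come from negation and a 1 bit shift.
    """
    candidates = {
        0: 0,            # zero the output
        1: w,            # pass the weight through
        -1: -w,          # sign inversion
        2: w << 1,       # 1 bit left shift
        -2: -(w << 1),   # left shift and sign inversion
    }
    return candidates[x]


# Example from the text: X = 2 and W = 27 give WX = 54.
example = mux5_multiply(2, 27)
```

Because every selectable value is a shifted or negated copy of W, the cost of the "multiplier" is essentially that of the selector itself.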

Another advantage of directly processing the CTBPDSM outputs in a multiple-input single-output system (e.g., beamformer) is that it reduces the number of decimators to just one. For multiple inputs and multiple  $\Delta\Sigma$  modulators in conventional DSP [Fig. 5(a)], there is a decimator for each modulator. Because of this, the cost of decimation (by M) increases linearly with the number of inputs. In BSP, on the other hand, decimation is performed only once after all the digital signal paths are combined [Fig. 5(b)]. Since decimation consumes a lot of power and requires a large area, the single decimation helps to significantly reduce the power consumption and area of the entire system.

### C. Prototype BSP Beamformer

<sup>2</sup>On the other hand, a seven-level stream, which consists of $\pm 3$, $\pm 2$, $\pm 1$, and 0, is less attractive because simple bit shifting alone cannot be used for multiplication by three.

Fig. 7. (a) DDC and CWM operations and their (b) MUX-based implementation.

Fig. 8. (a) DDC with a 3-to-1 MUX. (b) Multiplication in CWM with a 5-to-1 MUX.

A block diagram of the prototype 260 MHz IF eight-element two-beam BSP digital beamformer is shown in Fig. 6. Eight CTBPDSMs digitize eight 260 MHz IF input signals over a 20 MHz bandwidth to create 1040 MS/s five-level digital outputs. To facilitate MUX-based multiplication (discussed in Section II-B) in the following DDC and phase shifting stages, the sample rate of the CTBPDSM (i.e., 1040 MS/s) is chosen to be four times the 260 MHz IF, and the CTBPDSM output resolution is chosen to be five levels. The MUX-based DDC and CWM are detailed in Fig. 7(b). By exploiting MUX-based BSP on the five-level CTBPDSM digital outputs, the implementation of DDC and phase shifting operations, which normally require six multipliers and two adders [Fig. 7(a)], is achieved with eight MUXs [Fig. 7(b)]. Each phase shifter provides a total of 240 phase-shift steps through a 12 bit programmable complex weight. After phase shifting, all eight signal paths are summed to create 1040 MS/s 10 bit *I/Q* beam outputs. The beam outputs are finally decimated by four to produce 260 MS/s 13 bit *I/Q* beam outputs. The prototype generates two simultaneous beams, and each beam can be independently configured.

1) DDC With a 3-to-1 MUX: The CTBPDSM digital output is multiplied by the I/Q LO signals $\cos(2\pi f_{\rm IF}t)$ and $\sin(2\pi f_{\rm IF}t)$ for DDC to create baseband I/Q streams [Fig. 7(a)]. Since the sample rate $(f_s = 1/T_s)$ of the CTBPDSM is four times the 260 MHz IF $(f_{\rm IF})$, the required I/Q LO signals for DDC, $\cos[2\pi f_{\rm IF}(nT_s)]$ and $\sin[2\pi f_{\rm IF}(nT_s)]$, simplify to $\cos[n\pi/2]$ and $\sin[n\pi/2]$, which take only three values ($\pm 1$ and 0). A 3-to-1 MUX performs multiplication by the three-level LO sequence as shown in Fig. 8(a). Depending on the value of the three-level LO sequence, the five-level CTBPDSM output is passed through, zeroed, or sign-inverted. Furthermore, since multiplication by $\pm 1$ does not change the magnitude of the signal, the down-converted I/Q streams are still represented by five levels ($\pm 2$, $\pm 1$, and 0). This allows multiplication to be implemented with a 5-to-1 MUX in the following phase shifting stage.

2) Phase Shifting With 5-to-1 and 2-to-1 MUXs: The five-level baseband I/Q streams are fed to two sets of phase shifters (Fig. 6). To achieve a phase-shift of $\theta$, each baseband I/Q stream is multiplied by weighting factors ($\cos \theta$ and $\sin \theta$), and combined to create phase-shifted I'/Q' streams [Fig. 7(a)]. The resolution of the weighting factor is chosen to be 6 bit to provide a total of 240 phase-shift steps. In our BSP implementation, the two required operations for phase shifting (i.e., multiplication and combination) are realized by 5-to-1 MUXs and 2-to-1 MUXs [Fig. 7(b)].

Fig. 8(b) shows how a 5-to-1 MUX multiplies the baseband I or Q stream by the 6 bit weighting factor. Depending on the value of the five-level I or Q stream, the 6 bit weighting factor is passed through, zeroed, left-shifted by 1 bit ($\ll 1$), sign-inverted, or both left-shifted and sign-inverted. For example, when the down converter output (X) is 2 and the 6 bit weighting factor stored in the register (W) is 27, the weighting factor is left-shifted by 1 bit, and the resulting 7 bit output of the 5-to-1 MUX (WX) is 54.

After the down-converted I/Q streams are multiplied by the weighting factors, they are added to create phase-shifted I'/Q' streams [Fig. 7(a)]. Addition normally requires an adder. Here, however, the three-level LO sequences $\cos[n\pi/2]$ and $\sin[n\pi/2]$ are alternately zero, so only one of the I and Q down converter outputs is nonzero at any time, and the addition can therefore be implemented with a 2-to-1 MUX [Fig. 7(b)]. The two 2-to-1 MUX outputs represent the phase-shifted I'/Q' streams, which are the results of multiplication of the baseband I/Q streams by a 12 bit complex weight of $e^{j\theta}$ (= $\cos{\theta} + j\sin{\theta}$).
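The following behavioral sketch chains the 3-to-1 MUX (DDC), the 5-to-1 MUX (weight multiplication), and the 2-to-1 MUX (combination), and compares the result with a direct evaluation of (1) and (2). It is a software model under stated assumptions, not the prototype's netlist: the function names are hypothetical, the weights are arbitrary integers, and the $-\sin\theta$ term is modeled by negating the stored weight.

```python
I_LO = (1, 0, -1, 0)   # cos(n*pi/2)
Q_LO = (0, 1, 0, -1)   # sin(n*pi/2)


def mux3(d, lo):
    """3-to-1 MUX: pass, zero, or sign-invert sample d according to LO value."""
    return {1: d, 0: 0, -1: -d}[lo]


def mux5(x, w):
    """5-to-1 MUX: weight w scaled by five-level x using shift and negation."""
    return {0: 0, 1: w, -1: -w, 2: w << 1, -2: -(w << 1)}[x]


def phase_shift_mux(samples, w_cos, w_sin):
    """MUX-only phase shifter; the 2-to-1 MUX select is the sample parity."""
    out = []
    for n, d in enumerate(samples):
        i_bb = mux3(d, I_LO[n % 4])
        q_bb = mux3(d, Q_LO[n % 4])
        if n % 2 == 0:    # only the I stream can be nonzero
            out.append((mux5(i_bb, w_cos), mux5(i_bb, -w_sin)))
        else:             # only the Q stream can be nonzero
            out.append((mux5(q_bb, w_sin), mux5(q_bb, w_cos)))
    return out


def phase_shift_reference(samples, w_cos, w_sin):
    """Direct multiply-and-add evaluation of (1) and (2)."""
    out = []
    for n, d in enumerate(samples):
        i_bb, q_bb = d * I_LO[n % 4], d * Q_LO[n % 4]
        out.append((w_cos * i_bb + w_sin * q_bb, -w_sin * i_bb + w_cos * q_bb))
    return out


stream = [2, -1, 0, 1, -2, 2, 1, -1]   # arbitrary five-level test samples
mux_result = phase_shift_mux(stream, w_cos=27, w_sin=-14)
ref_result = phase_shift_reference(stream, w_cos=27, w_sin=-14)
```

The two results agree sample by sample because one of the two products in each sum is always zero, so selecting the nonzero product is equivalent to adding them.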

3) Summation and Decimation: Phase-shifted I'/Q' signals from eight phase shifters are summed to create a beam. Each phase shifter I'/Q' output is a 7 bit signal, and after summing eight phase shifter outputs, the resulting I or Q beam output is a 1040 MS/s 10 bit signal. This summation is performed with a conventional multibit adder.

The 1040 MS/s 10 bit *I/Q* beam outputs are finally decimated by four to produce 260 MS/s 13 bit *I/Q* beam outputs. Decimation (or down sampling) requires low-pass filtering to avoid aliasing, and the low-pass filtering can be realized by a cascaded sinc filter. For decimation by four, the sinc filter performs a moving average of four input samples. The transfer function of the sinc filter is given by

$$H_{\text{sinc}}(z) = \frac{1}{4} \sum_{i=0}^{3} z^{-i} = \frac{1}{4} \frac{1 - z^{-4}}{1 - z^{-1}}. \tag{3}$$

To decimate the fourth-order CTBPDSM output, five sinc filters are cascaded so that the roll-off of the cascaded filter is steeper than the slope of the shaped noise of the CTBPDSM. The transfer function of the cascade of five sinc filters is given by

$$H_{\text{sinc}}^{5}(z) = \left(\frac{1}{4} \frac{1 - z^{-4}}{1 - z^{-1}}\right)^{5} = \frac{1}{4^{5}} \left(\frac{1}{1 - z^{-1}}\right)^{5} \left(1 - z^{-4}\right)^{5}. \tag{4}$$

As shown in (4), the cascaded sinc filter can be realized by a cascade of five integrators and five differentiators. By down sampling by four before the differentiators, implementing (4) becomes more efficient, replacing $z^{-4}$ with $z^{-1}$ [24].
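The equivalence between the cascaded moving-average form and the integrator, down-sample, differentiator form of (4) can be checked numerically. The sketch below uses unnormalized integer arithmetic (it omits the $1/4^5$ scaling) and zero initial state; the function names are hypothetical and the word-length handling of the prototype is not modeled.

```python
def moving_sum(x, m):
    """Unnormalized m-tap moving sum, i.e. (1 - z^-m) / (1 - z^-1)."""
    return [sum(x[max(0, n - m + 1): n + 1]) for n in range(len(x))]


def reference_decimator(x, order=5, m=4):
    """Cascade `order` moving sums at the full rate, then keep every m-th sample."""
    y = list(x)
    for _ in range(order):
        y = moving_sum(y, m)
    return y[::m]


def cic_decimator(x, order=5, m=4):
    """Integrators at the full rate, down-sample by m, differentiators at the low rate."""
    y = list(x)
    for _ in range(order):                 # 1 / (1 - z^-1), applied `order` times
        acc, integrated = 0, []
        for v in y:
            acc += v
            integrated.append(acc)
        y = integrated
    y = y[::m]                             # down-sample before differentiating
    for _ in range(order):                 # (1 - z^-1) at the low rate
        y = [v - (y[i - 1] if i > 0 else 0) for i, v in enumerate(y)]
    return y


test_input = [(7 * n) % 5 - 2 for n in range(64)]   # arbitrary five-level samples
same = cic_decimator(test_input) == reference_decimator(test_input)
dc_gain = cic_decimator([1] * 64)[-1]               # settles to 4**5
```

Moving the differentiators after the down-sampler means they run at one quarter of the input rate and need only a single-sample delay each.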



Fig. 9. (a) DSP and (b) BSP implementations of eight-element DBF with CTBPDSMs.



Fig. 10. Power and area comparison between DSP and BSP implementations.

### D. Comparison Between DSP and BSP

To demonstrate the efficiency of BSP for eight-element DBF with CTBPDSMs, a BSP implementation with a single decimator [Fig. 9(b)] is compared to a more conventional DSP implementation with multiple decimators [Fig. 9(a)]. For the comparison, each implementation is synthesized with 65 nm CMOS digital standard cells and simulated at the transistor level. As shown in Fig. 10, the area of the BSP implementation is only 32% of that of the conventional DSP implementation due to simple MUX-based CWM and single decimation. Two major observations can be made regarding power consumption. 1) The power consumption of the CWM blocks is comparable in BSP and conventional DSP. This means that the penalty for the higher clock rate in BSP is offset by the simplicity of multiplexing. 2) Decimation is a power-hungry operation, and single decimation greatly reduces the total power consumption. Overall, the power consumption of the BSP implementation is only 36% of that of the DSP implementation.

## III. CTBPDSM

A digital beamformer requires a large number of ADCs, and therefore the power consumption and area of the ADC have a large bearing on the power consumption and area of the entire beamformer. To achieve an area-efficient implementation, the prototype fourth-order CTBPDSM, shown in Fig. 11, is based on single op-amp resonators [21] instead of bulky LC-tank resonators. The feedback structure is also modified to save power and area. Conventionally, a CTBPDSM requires a pair of feedback DACs, consisting of a return-to-zero (RZ) DAC and a half-clock-delayed return-to-zero (HZ) DAC, for each resonator [25]. The addition of a feedforward path allows the elimination of a feedback DAC [21]. In the prototype CTBPDSM, a single feedforward path around the second resonator removes the need for the RZ DAC to the first resonator input, achieving further power and area efficiency. Removing this DAC also has the advantage of reducing noise in the modulator, since this DAC directly contributes to the input-referred noise of the modulator. The feedforward path also reduces the signal swing at the second resonator, resulting in lower power consumption and better linearity. The current through the feedforward path is combined with the output current from the second resonator, and then converted to a voltage by a transimpedance amplifier (TIA). A five-level flash quantizer digitizes this voltage at 1040 MS/s (i.e., four times the resonator center frequency of 260 MHz). Any excess loop delay in the feedback path is corrected by a 3 bit tunable delay shown in Fig. 11, which aligns the quantizer sampling time with the time when the DAC current is fed back to the resonator input.

Fig. 11. Circuit implementation of the fourth-order CTBPDSM.

### A. Single Op-Amp Resonator

A schematic of the single op-amp resonator is shown in Fig. 12, and the transfer function of the resonator  $(H_r\left(s\right))$  is expressed as

$$\begin{aligned} H_{r}(s) &= \frac{I_{\text{out}}(s)}{I_{\text{in}}(s)} \\ &= \frac{R_{1}}{R'} \frac{1 + \tau_{2}s}{1 + \tau's} \frac{\tau's}{1 + (\tau_{1} + \tau_{2} - R_{1}C_{2})s + \tau_{1}\tau_{2}s^{2}}, \end{aligned} \tag{5}$$

where  $\tau_1=R_1C_1$ ,  $\tau_2=R_2C_2$ , and  $\tau'=R'C'$ . To derive (5), we assume that the op-amp is ideal and the inputs are virtual grounds. In addition, the outputs of the resonator are also assumed to be connected to virtual grounds since they are connected to the inputs of the next resonator (or the TIA) in the CTBPDSM, which are virtual grounds. When  $\tau_1=\tau_2=\tau'=\tau$ , (5) is simplified as

$$H_r(s) = \frac{R_1}{R'} \frac{\omega_o s}{s^2 + (\omega_o/Q) s + \omega_o^2} , \qquad (6)$$



Fig. 12. Single op-amp resonator and consideration on two output branches.

where $\omega_o=1/\tau$ and $Q=\tau/\left(2\tau-R_1C_2\right)$. Choosing $R_1=R$, $C_1=C$, $R_2=R/2$, $C_2=2C$, $R'=2R$, and $C'=C/2$ gives $\tau=RC$ and $Q=\infty$. As a result, (6) is expressed as

$$H_r(s) = \frac{0.5 \,\omega_o s}{s^2 + \omega_o^2}.\tag{7}$$
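As a check on this simplification, the chosen component values can be substituted into the definitions given with (5) and (6). This is only a restatement of the algebra implied by the text:

$$\tau_1 = R_1 C_1 = RC, \qquad \tau_2 = R_2 C_2 = \frac{R}{2} \cdot 2C = RC, \qquad \tau' = R'C' = 2R \cdot \frac{C}{2} = RC,$$

$$2\tau - R_1 C_2 = 2RC - R \cdot 2C = 0 \;\Rightarrow\; Q = \frac{\tau}{2\tau - R_1 C_2} \to \infty, \qquad \frac{R_1}{R'} = \frac{R}{2R} = 0.5.$$

All three time constants therefore equal $RC$, the damping term $(\omega_o/Q)s$ in (6) vanishes, and the gain factor is 0.5.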

The center frequency  $(\omega_o)$  is designed to be 260 MHz. Process variation and mismatch of resistors and capacitors can result in a center frequency shift, and a finite Q factor. To adjust the center frequency and to maximize the Q factor,  $C_1$  and  $C_2$  are implemented as tunable capacitors with a 4 bit resolution.

Although the first resonator in the prototype CTBPDSM has two output branches due to the feedforward path, the transfer function from the resonator input to each output branch is still represented by (7). When the resonator has two identical output branches as shown in Fig. 12, the resistors (R') and capacitors (C') in the branches can be merged for analysis, resulting in an equivalent single branch with halved resistance and doubled capacitance. The time constant of the equivalent single branch is still R'C', which is the same as the time constant when there is no feedforward branch. With the same time constant, the transfer function of the resonator with the two identical output branches $(H'_r(s))$ is twice (7) because R' in (6) is replaced with 0.5 R'. As a result, the transfer function $(H'_r(s))$ is given by

$$H'_r(s) = \frac{I'_{\text{out}}(s)}{I_{\text{in}}(s)} = \frac{\omega_o s}{s^2 + \omega_o^2}. \tag{8}$$

The output current of the resonator $(I'_{\rm out}(s))$ is divided equally between the two output branches. Therefore, the transfer function from $I_{\rm in}$ to $I_{\rm out1}$ (or $I_{\rm out2}$) is half of (8), which is the same as (7):

$$\frac{I_{\text{out1}}(s)}{I_{\text{in}}(s)} = \frac{I_{\text{out2}}(s)}{I_{\text{in}}(s)} = \frac{0.5 \,\omega_o s}{s^2 + \omega_o^2}.\tag{9}$$

### B. Quantizer

Fig. 13. (a) Five-level quantizer. (b) Comparator.

Fig. 14. Unit current cell of the DAC.

Fig. 13(a) shows the five-level quantizer (flash ADC), which consists of four comparators and two resistor ladders. The reference voltages of the resistor ladders, $REF_P$ and $REF_N$, are set to 0.9 and 0.6 V. With the double-tail dynamic comparator [26] shown in Fig. 13(b), the input devices can be sized small to minimize input capacitance while the tail current of the output latch is large for fast regeneration. Comparator offsets are calibrated by two 4 bit trim currents [27] as shown in Fig. 13(b). The comparators are followed by SR latches to hold the output for an entire clock period. The output thermometer code (i.e., $T_3$, $T_2$, $T_1$, and $T_0$) directly drives current steering DACs. A summer converts the thermometer code to a 3 bit binary value [28].
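The relation between the four comparator decisions, the thermometer code, and the summed binary value can be sketched as follows. The threshold placement (in units of one quantizer step) and the mapping of the summed code to the signed levels $\pm 2$, $\pm 1$, and 0 are assumptions made for illustration; the text does not specify them, and the name `quantize` is hypothetical.

```python
# Assumed comparator thresholds, in units of one quantizer step.
THRESHOLDS = (-1.5, -0.5, 0.5, 1.5)


def quantize(v):
    """Return (thermometer code [T0..T3], summed code 0..4, signed level -2..+2)."""
    thermometer = [1 if v > t else 0 for t in THRESHOLDS]  # four comparator outputs
    code = sum(thermometer)                                # summer: value 0..4 fits in 3 bits
    level = code - 2                                       # assumed signed five-level mapping
    return thermometer, code, level


mid_scale = quantize(0.2)     # two comparators tripped
full_scale = quantize(5.0)    # all four comparators tripped
```

Summing the thermometer bits is sufficient because a thermometer code carries its value in the count of ones, so no priority encoder is needed.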

### C. Current Steering DAC

The current steering DAC consists of four unit current ($I_{\rm LSB}$) cells driven by the 4 bit thermometer code from the quantizer. As shown in Fig. 14, each unit current cell is composed of current source devices ($M_1$, $M_7$, and $M_8$), cascode devices ($M_2$, $M_5$, and $M_6$), switch devices ($M_3$ and $M_4$), and a latch. The unit current ($I_{\rm LSB}$) through $M_1$ is steered to one of the DAC outputs. $M_7$ and $M_8$ inject a fixed current of half of the unit current to each DAC output. This injected current through $M_7$ and $M_8$ ensures a net dc current of zero from the DAC to the input of the resonator. The current source devices ($M_1$, $M_7$, and $M_8$) are biased with high overdrive voltages to reduce thermal noise. The high overdrive voltage of $M_1$ also reduces mismatch of the unit currents, and therefore improves the linearity of the DAC. Noise and linearity are especially important for the DAC connected to the first resonator input. The cascode devices ($M_2$, $M_5$, and $M_6$) increase the output impedance of the DAC, and the linearity is improved by the increased output impedance. In addition, $M_2$ isolates the large drain capacitance of $M_1$ from the switch devices to achieve fast settling of the output current.

Fig. 15. Die micrograph of the 65 nm CMOS prototype beamformer.

A latch with two digital inputs ($D_L$ and $D_H$) provides complementary outputs ($D_O$ and $\overline{D_O}$) to drive the switch devices ($M_3$ and $M_4$). When the clock (CLK) is low, $M_9$ and $M_{10}$ are turned ON, and $D_L$ and $\overline{D_L}$ are transferred to the outputs. When the clock is high, $M_{11}$ and $M_{12}$ are turned ON, and $D_H$ and $\overline{D_H}$ are transferred to the outputs. Since one of the two digital inputs ($D_L$ and $D_H$) and its complementary signal are transferred to the outputs depending on the clock, both RZ and HZ operations can be realized with the latch. Depending on the DAC configuration (RZ or HZ), one of the two digital inputs is connected to the thermometer code from the quantizer, and the other is tied to the supply or ground. As the switch devices are driven by the complementary outputs, the gate voltages of the switch devices ($V_{G3}$ and $V_{G4}$) cross each other at a high voltage (close to the supply voltage), so that at least one of the switch devices is always conducting current. The high-crossing gate voltage avoids a large voltage fluctuation at the drain of the cascode device $(V_{D2})$ during switching, helping to achieve fast settling of the output current.

## IV. MEASUREMENTS

The eight-element two-beam prototype beamformer is fabricated in 65 nm CMOS (Fig. 15). The entire beamformer consumes 123.7 mW and occupies 0.28 mm<sup>2</sup>. The prototype beamformer contains eight CTBPDSMs. Each modulator consumes 13.1 mW from a 1.4 V supply, and occupies $0.03 \, \mathrm{mm^2}$, which is almost an order of magnitude smaller than the CTBPDSM in [21]. The outputs of the eight CTBPDSMs are fed to the Verilog-synthesized DBF core, which consumes 18.9 mW (15% of the total power consumption) from a 0.9 V supply, and occupies $0.04 \, \mathrm{mm^2}$ (14% of the total area).

Fig. 16. PSD of the CTBPDSM output for (a) 260 and (b) 266 MHz inputs.

Fig. 17. SNDR versus input amplitude.

Fig. 18. Two-tone test.

The measured power spectral density (PSD) of the CTBPDSM output is shown in Fig. 16. For 260 and 266 MHz sinusoidal inputs, the measured SNDR is 56 dB over a 20 MHz bandwidth. Fig. 17 plots the measured SNDR versus input amplitude for a 260 MHz sinusoid. From the plot, the dynamic range of the CTBPDSM is 57 dB. Fig. 18 shows the result of a two-tone test. The two tones are 1 MHz apart, and the measured $IM_3$ is $-62 \, dB$. To assess the power efficiency of the CTBPDSM, a figure-of-merit for a band-pass modulator (FoM<sub>BP</sub>), proposed in [29], is used. FoM<sub>BP</sub> is defined as

$$\mathrm{FoM_{BP}} = \frac{\mathrm{Power}}{2^{\mathrm{ENOB}} \cdot 2\,\mathrm{BW} \cdot (1 + 6f_{\rm IF}/f_s)}. \tag{10}$$

The  $FoM_{\rm BP}$  of the prototype CTBPDSM is 0.25 pJ/conv. Fig. 19 plots the  $FoM_{\rm BP}$  versus area of CTBPDSMs fabricated in CMOS. The plot shows that the prototype CTBPDSM has good power and area efficiency.
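The reported figure can be reproduced approximately from (10) and the measured single-modulator numbers (13.1 mW, 56 dB SNDR, 20 MHz bandwidth, 260 MHz IF, 1040 MS/s). The sketch below assumes the standard conversion $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$, which the text does not state explicitly; the function names are hypothetical.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR in dB (standard conversion, assumed here)."""
    return (sndr_db - 1.76) / 6.02


def fom_bp(power_w, sndr_db, bw_hz, f_if_hz, f_s_hz):
    """Band-pass figure-of-merit of (10), in joules per conversion step."""
    return power_w / (2 ** enob(sndr_db) * 2 * bw_hz * (1 + 6 * f_if_hz / f_s_hz))


fom_pj = 1e12 * fom_bp(
    power_w=13.1e-3,   # per-modulator power
    sndr_db=56.0,      # measured SNDR over 20 MHz
    bw_hz=20e6,
    f_if_hz=260e6,
    f_s_hz=1040e6,
)
# fom_pj evaluates to roughly 0.25 pJ/conv, in line with the value quoted in the text.
```

With $f_{\rm IF}/f_s = 1/4$, the frequency term $(1 + 6f_{\rm IF}/f_s)$ equals 2.5.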

Fig. 19. FoM<sub>BP</sub> versus area of CTBPDSMs fabricated in CMOS.

Fig. 20. PSD of the beam output with constructive combination.

Fig. 21. Ideal and measured beam patterns with one main-lobe.

To measure beam patterns, eight 266 MHz polyphase sinusoidal inputs are generated by eight synchronized direct digital synthesizers (DDSs) to mimic the received signals from a uniformly spaced eight-element linear antenna array with $\lambda/2$ spacing. Eight CTBPDSMs digitize the eight 266 MHz signals at 1040 MS/s, and the CTBPDSM digital outputs are fed to the synthesized DBF core, which forms two simultaneous beams. As discussed in Section I, a set of complex weights of $e^{j(k\theta)}$ adjusts the delay of the received and down-converted signal $(x_k)$ at the kth element to create a beam $(=\sum_{k=0}^{7} x_k e^{j(k\theta)})$ with one main-lobe. When the main-lobe is created, eight CTBPDSM digital outputs (having a measured SNDR of 54.4 dB on average) are constructively combined in baseband. With the constructive combination of eight element signals, the fundamental tone adds coherently and increases by 18 dB, while the uncorrelated channel noise increases by only 9 dB. The result is an overall SNDR of 63.3 dB over a 10 MHz bandwidth, an 8.9 dB improvement, as shown in Fig. 20. This 8.9 dB SNDR improvement is very close to the theoretical limit of 9 dB. Fig. 21 shows the measured single main-lobe beam patterns overlaid on ideal beam patterns. The beam pattern is plotted for an incidence angle from $-90^{\circ}$ to $+90^{\circ}$ and a measurement step size of 2.5°. Combining two single main-lobe responses creates a single beam with two main-lobes $(=\sum_{k=0}^{7} x_k \left(e^{j(k\theta_1)} + e^{j(k\theta_2)}\right)/2)$ as shown in Fig. 22. This can be easily done in the digital domain by using combined complex weights of $\left(e^{j(k\theta_1)} + e^{j(k\theta_2)}\right)/2$ instead of $e^{j(k\theta)}$ at the cost of a 6 dB reduced array gain. The measured beam patterns with two main-lobes are shown in Fig. 23. The measured beam patterns agree closely with the ideal patterns, which is difficult to achieve in analog beamforming.
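The 9 dB theoretical limit and the 6 dB array-gain penalty of the two-main-lobe weights can both be checked with an idealized model. The sketch below assumes a noiseless plane-wave input $x_k = e^{-jk\phi}$ and picks two steering phases that lie on each other's pattern nulls so that the result is exact; these choices and the function name `array_response` are illustrative assumptions, not measurement conditions from the paper.

```python
import cmath
import math

N = 8  # number of array elements

# Coherent signal sum grows as N in amplitude; uncorrelated noise grows as N in power.
signal_gain_db = 20 * math.log10(N)
noise_gain_db = 10 * math.log10(N)
sndr_gain_db = signal_gain_db - noise_gain_db


def array_response(weights, phi):
    """|sum_k w_k x_k| for an assumed input x_k = exp(-j*k*phi)."""
    return abs(sum(w * cmath.exp(-1j * k * phi) for k, w in enumerate(weights)))


theta1, theta2 = math.pi / 4, -math.pi / 2   # assumed steering phases
single = [cmath.exp(1j * k * theta1) for k in range(N)]
dual = [(cmath.exp(1j * k * theta1) + cmath.exp(1j * k * theta2)) / 2 for k in range(N)]

single_peak = array_response(single, theta1)   # full array gain
dual_peak = array_response(dual, theta1)       # half the amplitude at each main-lobe
dual_penalty_db = 20 * math.log10(dual_peak / single_peak)
```

In this model the SNDR gain is $10\log_{10}8 \approx 9.03$ dB, and halving the weight amplitude per lobe costs about 6.02 dB at each main-lobe.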

Table I summarizes the measured performance of the prototype beamformer.



Fig. 22. Creation of a single beam with two main-lobes.



Fig. 23. Ideal and measured beam patterns with two main-lobes.

### TABLE I PERFORMANCE SUMMARY

| Parameter                   | Block    | Value                   | Total |
|-----------------------------|----------|-------------------------|-------|
| Number of elements          |          | 8                       |       |
| Number of beams             |          | 2                       |       |
| IF frequency (MHz)          |          | 260                     |       |
| Bandwidth (MHz)             |          | 20                      |       |
| Sample rate (MS/s)          |          | 1040                    |       |
| Overall array SNDR (dB)     |          | 63.3                    |       |
| SNDR improvement (dB)       |          | 8.9 / 9<sup>†</sup>     |       |
| Number of phase-shift steps |          | 240                     |       |
| Technology                  |          | 65 nm CMOS              |       |
| Power (mW)                  | CTBPDSMs | $13.1 \times 8 = 104.8$ | 123.7 |
|                             | DBF core | 18.9                    |       |
| Core area (mm<sup>2</sup>)  | CTBPDSMs | $0.03 \times 8 = 0.24$  | 0.28  |
|                             | DBF core | 0.04                    |       |

<sup>†</sup>Theoretically expected.
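
As a quick check of the theoretically expected value in Table I, assume $N$ channels whose signals have equal amplitude and add in phase, and whose noise has equal power and is uncorrelated across channels. This is an idealization, not a statement about the measured noise. The signal power then grows by $20\log_{10}N$ dB and the noise power by $10\log_{10}N$ dB:

$$
\Delta\mathrm{SNDR} = 20\log_{10} N - 10\log_{10} N = 10\log_{10} N, \qquad N = 8:\; 18.06 - 9.03 = 9.03\ \mathrm{dB}.
$$

Adding the measured 8.9 dB improvement to the 54.4 dB average single-channel SNDR gives the 63.3 dB array SNDR listed in the table.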

#### V. CONCLUSION

This paper presents the first IC implementation of IF sampling DBF. The unique combination of CTBPDSMs and BSP avoids the high power consumption and large area that have prevented low-cost implementation of DBF. With an array of compact (0.03 mm²) CTBPDSMs and MUX-based DDC and phase shifting, the entire prototype beamformer occupies 0.28 mm², which is smaller than a single CTBPDSM in [16]–[19]. The power consumption per element of the prototype beamformer is only 6% of that of the FPGA implementation in [11].

#### ACKNOWLEDGEMENT

The authors would like to thank W. Chappell for helpful discussions. They would also like to thank Berkeley Design Automation for simulation software, and Analog Devices for providing DDSs.

#### REFERENCES

- [1] B. D. Van Veen and K. M. Buckley, "Beamforming: A versatile approach to spatial filtering," *IEEE ASSP Mag.*, vol. 5, no. 2, pp. 4–24, Apr. 1988.
- [2] A. Natarajan, A. Komijani, and A. Hajimiri, "A fully integrated 24-GHz phased-array transmitter in CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2502–2514, Dec. 2005.
- [3] J. Paramesh, R. Bishop, K. Soumyanath, and D. J. Allstot, "A four-antenna receiver in 90-nm CMOS for beamforming and spatial diversity," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2515–2524, Dec. 2005.
- [4] K.-J. Koh and G. M. Rebeiz, "An X- and Ku-band 8-element phased-array receiver in 0.18-μm SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 43, no. 6, pp. 1360–1371, Jun. 2008.
- [5] T. Yu and G. M. Rebeiz, "A 22–24 GHz 4-element CMOS phased array with on-chip coupling characterization," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2134–2143, Sep. 2008.
- [6] S. Lin, K. Ng, H. Wong, K. M. Luk, S. S. Wong, and A. S. Y. Poon, "A 60 GHz digitally controlled RF beamforming array in 65 nm CMOS with off-chip antennas," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, 2011, pp. 1–4.
- [7] R. Tseng, H. Li, D. H. Kwon, Y. Chiu, and A. S. Y. Poon, "A four-channel beamforming down-converter in 90-nm CMOS utilizing phase-oversampling," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2262–2272, Nov. 2010.
- [8] M. C. M. Soer, E. A. M. Klumperink, B. Nauta, and F. E. van Vliet, "Spatial interferer rejection in a four-element beamforming receiver front-end with a switched-capacitor vector modulator," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2933–2942, Dec. 2011.
- [9] M. C. M. Soer, E. Klumperink, B. Nauta, and F. van Vliet, "A 1.5-to-5.0 GHz input-matched +2 dBm P1 dB all-passive switched-capacitor beamforming receiver front-end in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2012, pp. 174–176.
- [10] A. S. Y. Poon and M. Taghivand, "Supporting and enabling circuits for antenna arrays in wireless communications," *Proc. IEEE*, vol. 100, no. 7, pp. 2207–2218, Jul. 2012.
- [11] H. Aliakbarian, V. Volski, E. van der Westhuizen, R. Wolhuter, and G. A. E. Vandenbosch, "Analogue versus digital for baseband beam steerable array used for LEO satellite applications," in *Proc. 4th Eur. Conf. Antennas Propag. (EuCAP)*, 2010, pp. 1–4.
- [12] Y. Atesal, B. Cetinoneri, K. M. Ho, and G. M. Rebeiz, "A two-channel 8–20-GHz SiGe BiCMOS receiver with selectable IFs for multibeam phased-array digital beamforming applications," *IEEE Trans. Microw. Theory Techn.*, vol. 59, no. 3, pp. 716–726, Mar. 2011.
- [13] S. R. Freeman et al., "Delta-sigma oversampled ultrasound beamformer with dynamic delays," *IEEE Trans. Ultrason., Ferroelectr., Freq. Contr.*, vol. 46, no. 2, pp. 320–332, Mar. 1999.
- [14] J. Jeong, N. Collins, and M. P. Flynn, "An IF 8-element 2-beam bit-stream band-pass beamformer," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, 2015, pp. 287–290.
- [15] N. Beilleau, H. Aboushady, F. Montaudon, and A. Cathelin, "A 1.3 V 26 mW 3.2 GS/s undersampled LC bandpass ΣΔ ADC for a SDR ISM-band receiver in 130 nm CMOS," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, 2009, pp. 383–386.
- [16] J. Ryckaert *et al.*, "A 2.4 GHz low-power sixth-order RF bandpass $\Delta\Sigma$ converter in CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 2873–2880, Nov. 2009.
- [17] J. Ryckaert, A. Geis, L. Bos, G. Van der Plas, and J. Craninckx, "A 6.1 GS/s 52.8 mW 43 dB DR 80 MHz bandwidth 2.4 GHz RF bandpass  $\Delta\Sigma$  ADC in 40 nm CMOS," in *IEEE Radio Freq. Integr. Circuits Symp. Dig.*, 2010, pp. 443–446.
- [18] J. Harrison, M. Nesselroth, R. Mamuad, A. Behzad, A. Adams, and S. Avery, "An LC bandpass  $\Delta\Sigma$  ADC with 70 dB SNDR over 20 MHz bandwidth using CMOS DACs," in *IEEE Int. Solid-State Circuits Conf.* (*ISSCC*) Dig. Tech. Papers, 2012, pp. 146–148.
- [19] E. Martens *et al.*, "RF-to-baseband digitization in 40 nm CMOS with RF bandpass $\Delta\Sigma$ modulator and polyphase decimation filter," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 990–1002, Apr. 2012.
- [20] H. Chae and M. P. Flynn, "A 69 dB SNDR, 25 MHz BW, 800 MS/s continuous-time bandpass ΔΣ ADC using DAC duty cycle control for low power and reconfigurability," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 2013, pp. 62–63.
- [21] H. Chae, J. Jeong, G. Manganaro, and M. P. Flynn, "A 12 mW low power continuous-time bandpass  $\Delta\Sigma$  modulator with 58 dB SNDR and 24 MHz bandwidth at 200 MHz IF," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 405–415, Feb. 2014.
- [22] A. Peled and B. Liu, "A new approach to the realization of nonrecursive digital filters," *IEEE Trans. Audio Electroacoust.*, vol. 21, no. 6, pp. 477–484, Dec. 1973.

- [23] D. A. Johns and D. M. Lewis, "Design and analysis of delta-sigma based IIR filters," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 40, no. 4, pp. 233–240, Apr. 1993.
- [24] J. C. Candy, "Decimation for sigma delta modulation," *IEEE Trans. Commun.*, vol. 34, no. 1, pp. 72–76, Jan. 1986.
- [25] O. Shoaei and W. M. Snelgrove, "A multi-feedback design for LC bandpass delta-sigma modulators," in *Proc. IEEE Int. Symp. Circuits Syst.* (ISCAS), May 1995, vol. 1, pp. 171–174.
- [26] M. Miyahara, Y. Asada, D. Paik, and A. Matsuzawa, "A low-noise self-calibrating dynamic comparator for high-speed ADCs," in *Proc. IEEE Asian Solid-State Circuits Conf. (ASSCC)*, 2008, pp. 269–272.
- [27] G. Mitteregger, C. Ebner, S. Mechnig, T. Blon, C. Holuigue, and E. Romani, "A 20-mW 640-MHz CMOS continuous-time $\Sigma\Delta$ ADC with 20-MHz signal bandwidth, 80-dB dynamic range and 12-bit ENOB," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2641–2649, Dec. 2006.
- [28] C. Donovan and M. P. Flynn, "A "digital" 6-bit ADC in 0.25-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 37, no. 3, pp. 432–437, Mar. 2002.
- [29] I. Galdi, E. Bonizzoni, P. Malcovati, G. Manganaro, and F. Maloberti, "40 MHz IF 1 MHz bandwidth two-path bandpass  $\Delta\Sigma$  modulator with 72 dB DR consuming 16 mW," *IEEE J. Solid-State Circuits*, vol. 43, no. 7, pp. 1648–1656, Jul. 2008.



**Jaehun Jeong** received the B.S. degree in electrical engineering from Seoul National University, Seoul, South Korea, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2011 and 2015, respectively.

He joined Broadcom Corporation, Irvine, CA, USA, in 2015. He received a scholarship from the Korea Foundation for Advanced Studies (KFAS) in 2009.



**Nicholas Collins** (S'08) received the B.S. and M.S. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2008 and 2010, respectively, where he is currently pursuing the Ph.D. degree focusing on SAR ADCs.

In the summers of 2008 and 2009, he worked with the Home and Portable Audio Amplifier Groups, Texas Instruments, Dallas, TX, USA. For the latter half of 2014, he worked with Isocline, Austin, TX, USA. He worked with the Automotive Amplifier Design Group, Harman International, Stamford, CT, USA, in 2015. He joined PsiKick, Inc., Santa Clara, CA, USA, in December 2015.



**Michael P. Flynn** (M'95–SM'98–F'15) received the Ph.D. degree in electrical engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 1995.

From 1988 to 1991, he was with the National Microelectronics Research Centre, Cork, Ireland. He was with National Semiconductor, Santa Clara, CA, USA, from 1993 to 1995. From 1995 to 1997, he was a Member of Technical Staff with Texas Instruments, Dallas, TX, USA. During the 4-year period from 1997 to 2001, he was with Parthus Technologies, Cork, Ireland. He joined the University of Michigan, Ann Arbor, MI, USA, in 2001, where he is currently a Professor. His research interests include RF circuits, data conversion, serial transceivers, and biomedical systems.

Dr. Flynn is the Editor-in-Chief of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC). He served as Associate Editor of the JSSC and of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. He is a former Distinguished Lecturer of the IEEE Solid-State Circuits Society. He serves on the Technical Program Committee of the European Solid State Circuits Conference (ESSCIRC) and formerly served on the Technical Program Committees of the IEEE International Solid State Circuits Conference (ISSCC), the Asian Solid-State Circuits Conference (ASSCC), and the Symposium on VLSI Circuits. He was the recipient of the 2011 Education Excellence Award and the 2010 College of Engineering Ted Kennedy Family Team Excellence Award from the College of Engineering at the University of Michigan, the 2005–2006 Outstanding Achievement Award from the Department of Electrical Engineering and Computer Science, University of Michigan, the NSF Early Career Award in 2004, the 1992–1993 IEEE Solid-State Circuits Pre-doctoral Fellowship, and the 2008 Guggenheim Fellowship.