# A Low-Power 8-PAM Serial Transceiver in 0.5-μm Digital CMOS

David J. Foley and Michael P. Flynn, Senior Member, IEEE

Abstract—An 8-PAM CMOS transceiver is described in this paper. Pre-emphasis is implemented without an increase in DAC resolution or digital computation. The receiver oversamples with three fully differential 3-bit ADCs. The prototype transmits at up to 1.3 Gb/s and has a measured bit error rate of less than 1 in $10^{13}$ for an 810-Mb/s pseudorandom bit sequence transmission. The device, packaged in a 68-pin ceramic leadless chip carrier, is implemented in 0.5-μm digital CMOS, occupies 2 mm<sup>2</sup>, and dissipates 400 mW from a 3.3-V supply.

Index Terms—Analog-digital conversion, calibration, clock synthesizer, pulse amplitude modulation, serial transceiver.

## I. INTRODUCTION

High-speed serial transmission between ICs reduces both the number of IC pins and the number of PCB tracks. The serial transceiver described in this paper employs multilevel signaling (8-PAM). For a given transfer rate, this 8-PAM scheme reduces the channel symbol rate to one third of that of a conventional 2-PAM transceiver [1]. This symbol rate reduction lowers both the intersymbol interference (ISI) in the channel and the maximum required on-chip clock frequency.

Transmitter pre-emphasis is implemented without an increase in DAC resolution or digital computation. The receiver oversamples with three fully differential 3-bit ADCs. From the three samples, the one that is determined to be closest to the center of the data eye is selected as the recovered data. Baseline and gain compensation are performed in the receiver. The transceiver contains a 3-bit fixed pattern generator, a $2^{10}-1$ pseudorandom bit sequence (PRBS) generator, and a verifier for built-in self-test (BIST).

The transceiver also incorporates a delay-locked loop (DLL)-based clock synthesizer that generates clock phases running at three times the reference frequency for the three receive ADCs.
One of these clock phases also activates the transmitter. The DLL is self-correcting; it will not false lock to a multiple of clock periods. The circuit does not require the delay control voltage to be set on power-up. It can recover from missing reference clock pulses, and because the delay range is not restricted, it can accommodate a variable reference clock frequency. This allows the transceiver to be used for a wide range of data frequencies. The synthesizer has a measured rms jitter of 3 ps for a 500-MHz output frequency.

The transceiver transmits at up to 1.3 Gb/s and has a measured bit error rate (BER) of $<10^{-13}$ for an 810-Mb/s PRBS transmission. The device, packaged in a 68-pin ceramic leadless chip carrier (CLCC), is implemented in 0.5-μm digital CMOS, occupies 2 mm<sup>2</sup>, and dissipates 400 mW from a 3.3-V supply. This circuit compares favorably in terms of area with [1] (1 Gb/s, 0.5-μm CMOS, 450 mW, 4 mm<sup>2</sup>) and [2] (8 Gb/s, 0.3-μm CMOS, 1.1 W, 4 mm<sup>2</sup>).

Manuscript received July 23, 2001; revised October 17, 2001. This work was supported by Parthus Technologies, Ireland. D. J. Foley was with Parthus Technologies, Dublin, Ireland, and the Department of Microelectronics, National University of Ireland, Cork, Ireland. He is now with Massana Design, Dublin, Ireland (e-mail: david.foley@massana.com). M. P. Flynn was with Parthus Technologies, Cork, Ireland. He is now with the Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor, MI 48109 USA. Publisher Item Identifier S 0018-9200(02)01686-4.

Fig. 1. (a) Conventional $N$-PAM transmitter block diagram. (b) 2-PAM output waveforms.

In Section II, the transceiver architecture is discussed. The transmitter design is introduced in Section III, and this is followed by a description of the receiver design in Section IV. The performance constraints of the transceiver are analyzed in Section V.
The circuit layout and measured results are presented in Sections VI and VII, respectively. Finally, Section VIII concludes the paper with a summary of the achievements of this work.

## II. TRANSCEIVER ARCHITECTURE

Fig. 1(a) shows a conventional 2-PAM transmitter that converts the input parallel data word to a serial output (o/p). A clock synthesizer provides evenly spaced clock phases that control the serialization. In the example shown in Fig. 1(b), a 3-bit parallel data word is serialized, so the frequency of the serial data is three times that of the parallel data input. In general, for an $N$-bit parallel data word, the data rate at the transmitter output is $N$ times the parallel data rate. The high data rate in the channel can lead to problems with ISI and increased electromagnetic interference (EMI) emissions.

To overcome these problems, a multilevel transmitter can be employed, as shown in Fig. 2(a). The serializer is replaced by a DAC that provides an analog output equivalent to the digital input data. As an example, waveforms for an 8-level or 8-PAM transmitter are shown in Fig. 2(b). The transmitter output symbol rate is then the same as the parallel data word rate.

Fig. 2. (a) $N$-PAM transmitter block diagram. (b) 8-PAM output waveforms.

Fig. 3. Basic multilevel transceiver block diagram.

Fig. 3 shows a basic block diagram of *chip A* using an $N$-bit DAC to transmit data across a channel. *Chip B* recovers this data using an $N$-bit ADC. This basic transceiver architecture functions correctly provided that the ADC can sample the received analog signal close to the center of its symbol eye [Fig. 4(a)]. In one scheme [4], this is achieved by adaptively setting the phase of the sampler. A simple alternative to this approach is to oversample the received data by a factor of three [Fig. 4(b)]. The additional samples are processed to determine the sample that is closest to the center of the transmitted symbol.
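
The selection step can be illustrated with a small behavioral model. This is a hypothetical sketch, not the chip's logic (the actual rule is the one in Fig. 14(b)); the function names `select_center_sample` and `recover`, the 0/1/2 indexing of the three samples, and the rule of holding the previous choice when no single transition is seen are all assumptions made here. It locates the code transition with XOR comparisons between consecutive samples (including the last sample of the previous period) and picks the sample one position after the transition, i.e., the one farthest from both symbol edges.

```python
from typing import List, Sequence

# Hypothetical behavioral model; the chip's actual rule is the one in Fig. 14(b).


def select_center_sample(prev_last: int, samples: Sequence[int], prev_choice: int) -> int:
    """Return the index (0, 1 or 2) of the sample assumed closest to the eye center.

    prev_last   -- last 3-bit sample of the previous period
    samples     -- the three 3-bit samples of this period, in time order
    prev_choice -- index used in the previous period (kept if no single edge is found)
    """
    s = [prev_last, *samples]
    # XOR of consecutive codes is nonzero wherever the received code changes.
    edges = [p for p in range(3) if s[p] ^ s[p + 1]]
    if len(edges) != 1:
        # Repeated symbol (no edge) or ambiguous pattern: hold the previous choice.
        return prev_choice
    # An edge just before sample p means each symbol occupies positions
    # p, p+1, p+2 (mod 3); the middle one is farthest from both edges.
    return (edges[0] + 1) % 3


def recover(stream: List[int], initial_choice: int = 1) -> List[int]:
    """Pick one code per period from a flat stream of three samples per period."""
    out: List[int] = []
    choice, prev_last = initial_choice, stream[0]
    for n in range(0, len(stream) - 2, 3):
        period = stream[n:n + 3]
        choice = select_center_sample(prev_last, period, choice)
        out.append(period[choice])
        prev_last = period[2]
    return out


# Symbols 5, 2, 7, 0 with every edge falling just before the second sample of a period.
print(recover([3, 5, 5, 5, 2, 2, 2, 7, 7, 7, 0, 0]))  # [5, 2, 7, 0]
```

Holding the previous index when a period shows zero or two transitions is one simple way to tolerate repeated symbols and a corrupted sample at the edge; it relies on at least two of the three samples being valid, as Section IV-D requires.
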
Oversampling could be performed using one ADC running at three times the channel symbol rate. This single ADC would have to sample at 1 GS/s or above, and such high sampling frequencies are not feasible in a 0.5-μm CMOS process. A simpler alternative is to use three ADCs, each sampling at the symbol rate. The ADCs are activated in sequence by three equally spaced clock phases.

In Fig. 5, a 3-bit DAC in chip A generates an 8-PAM output. Three interleaved ADCs and a data selection engine in *chip B* decode the transmitted data; the data selection engine selects the ADC output that is closest to the center of the data eye. Clock phases for the transmitter and receiver are derived from a low-speed system or board clock. A low-jitter DLL-based synthesizer generates clock phases ($ck\langle 3:1\rangle$) running at three times the reference frequency for the three receive ADCs. The DLL is self-correcting [3]; it does not false lock to a multiple of clock periods. This allows the transceiver to be used for a wide range of data frequencies. A similar synthesizer generates a clock for the transmitter.

Fig. 4. Receive symbol waveforms. (a) Single sample per symbol period. (b) Three samples per period.

Fig. 5. Two chips employing the transceiver interface.

Fig. 6. Transceiver block diagram.

Fig. 6 shows the transceiver in more detail. In addition to the transmit DAC, the transmitter incorporates a pre-emphasis block, generating pre-emphasis for both step-up and step-down code changes. Pre-emphasis compensates for the limited bandwidth of the package leads and link medium [4]. Baseline and gain compensation are performed in the receiver. The transceiver also contains a 3-bit fixed pattern generator, a $2^{10}-1$ PRBS generator, and a verifier for BIST.

## III. TRANSMITTER DESIGN

The transmitter is shown in more detail in Fig. 7. The output of the 3-bit current DAC, TXDAC, is converted to an 8-PAM voltage signal by the (off-chip) termination network. Pre-emphasis might entail digital preprocessing and an increase in the resolution of the DAC. This approach was judged too slow and power hungry for this application, particularly in 0.5-μm CMOS. Instead, pre-emphasis is implemented with the help of two auxiliary DACs (NDAC and PDAC). While active, NDAC generates a current proportional to the present symbol value. Similarly, PDAC generates a current of opposite polarity, proportional to the previous symbol. In this way, a pre-emphasis pulse is generated that is proportional to the change in the transmit code, but without any digital computation. The reference current for NDAC and PDAC and the pre-emphasis pulsewidth are both programmable. The magnitude of the pre-emphasis current can be increased to compensate for greater attenuation over longer channels. This allows the transceiver to communicate over a range of track and cable lengths.

Fig. 7. Transmitter block diagram.

Fig. 8. Transmitter output waveforms. (a) Without pre-emphasis. (b) With pre-emphasis.

Fig. 8(a) shows waveforms, with pre-emphasis disabled, for the transmitter output and for the output of a bandwidth-limited channel driven by the transmitter. The channel output waveform has very rounded edges, making it difficult to recover the transmitted data correctly. Fig. 8(b) shows the pre-emphasized waveforms. The channel output more closely tracks the original transmitted pattern, making recovery more straightforward. The magnitudes of the pre-emphasis currents are optimized to minimize errors in the received data.

To minimize the settling time of the output current, the three DACs (TXDAC, PDAC, and NDAC) are activated together. To achieve this, PDAC and NDAC employ the same input data latch architecture as TXDAC, and all of these latches are triggered by the same clock.

Fig. 9. Receiver ADC block diagram.

## IV. RECEIVER DESIGN

A block diagram of the receiver can be seen in Fig. 6.
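
Each of the three receive converters described below is a 3-bit flash ADC. The following is a minimal behavioral sketch of one such converter, assuming ideal comparators and a ladder whose seven references ref1 to ref7 sit midway between adjacent 8-PAM levels (so ref1 is 1/2 LSB below the maximum input and ref7 is 1/2 LSB above the minimum). The paper does not spell out its four-level bubble suppression logic, so a simple three-input majority vote on each thermometer bit is used here as a stand-in; all function names are hypothetical.

```python
from typing import List


def ladder_references(v_min: float, v_max: float) -> List[float]:
    """Seven thresholds, ref1 (highest) ... ref7 (lowest), centered between 8 levels."""
    lsb = (v_max - v_min) / 7.0  # spacing of the 8 PAM levels
    return [v_max - (k + 0.5) * lsb for k in range(7)]


def thermometer(v_in: float, refs: List[float]) -> List[int]:
    """Ideal comparator outputs, ordered from the lowest reference (ref7) upward."""
    return [1 if v_in > r else 0 for r in reversed(refs)]


def suppress_bubbles(t: List[int]) -> List[int]:
    """Majority-of-three correction of isolated bubbles (assumed scheme, not the chip's)."""
    padded = [1] + t + [0]  # treat "below the lowest" as 1 and "above the highest" as 0
    return [1 if padded[i - 1] + padded[i] + padded[i + 1] >= 2 else 0
            for i in range(1, len(padded) - 1)]


def decode(t: List[int]) -> int:
    """After bubble suppression, the 3-bit output is the number of ones."""
    return sum(t)


def flash_adc(v_in: float, refs: List[float]) -> int:
    return decode(suppress_bubbles(thermometer(v_in, refs)))


refs = ladder_references(0.0, 7.0)
print([flash_adc(float(k), refs) for k in range(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]

# A ladder spanning only 0 to 4.9 (an uncorrected gain error) skips and clips codes.
bad = ladder_references(0.0, 4.9)
print([flash_adc(float(k), bad) for k in range(8)])  # [0, 1, 3, 4, 6, 7, 7, 7]
```

The second call shows why the ladder must be recentered and rescaled to the received swing, which is what the baseline and gain calibration of Section IV-A does. A hardware decoder would typically locate the single 1-to-0 boundary in the corrected thermometer code and encode its position rather than count ones.
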
The receiver comprises three identical 3-bit ADCs (for oversampling). A clock synthesizer provides the three clock phases for the ADCs. These clock phases ($ck\langle 3:1\rangle$) run at three times the reference frequency and are separated in phase by one ninth of the reference clock period. A data select block selects one of the output words from the three ADCs. A PRBS verifier compares the transmitted and received data; this verifier facilitates testing of the device in a loopback mode. Gain adjust is used during calibration to compensate for any DC attenuation in the channel or in the ADCs. Baseline adjust is also employed during calibration to correct for a common-mode offset in the received signal. The ADCs and this adjust circuitry are described in the following subsection.

### A. Receiver ADC and Calibration

A block diagram of a receiver ADC is presented in Fig. 9. For clarity, the converter is shown with single-ended comparators, but in practice it accepts a differential input. A single resistor ladder is shared by the three flash ADCs. Each ADC consists of seven comparators and incorporates four-level bubble suppression logic and thermometer decode logic.

If the receiver and transmitter use different reference voltages (perhaps because they are on different PCBs), or if there are poly resistor process variations, then the reference voltages from the resistor ladder (ref[7:1]) will not be optimally centered and may not cover the complete input voltage range [Fig. 10(a)]. Ideally, ref1 should be 1/2 LSB below the maximum input voltage and ref7 should be 1/2 LSB above the minimum input voltage. To solve this problem, the receiver can be calibrated for a gain error or a common-mode offset in the transmitted signal.

Fig. 10. Input pulse (max to min swing) and reference voltage levels. (a) Pre-calibration. (b) Baseline calibration. (c) Gain calibration.

Fig. 11. Receiver calibration block diagram.

A block diagram of the calibration circuitry is shown in Fig. 11. Two extra single-ended comparators (comp1 and comp2) are included to allow manual calibration of vbase and Igain. During calibration, a maximum differential input voltage vin is sent to the receiver. comp1 compares vbase to the positive component of vin, and comp2 compares vend to the negative component of vin. vbase is initially set low and, during calibration, is gradually increased by incrementing Ibase\_int until the over control signal is deactivated by comp1. The maximum input voltage is now just below vbase [Fig. 10(b)]. On power-up, Igain is deliberately set low; following the calibration of vbase, it is gradually increased by incrementing Igain\_int until the under control signal is deactivated by comp2. The minimum input voltage is now just above vend [Fig. 10(c)]. The ladder reference voltages now cover the complete input voltage range and are optimally centered, as shown in Fig. 10(c).

### B. ADC Comparator

The comparator schematic is shown in Fig. 12. This circuit is similar to that in [5] but is modified to accommodate a differential reference and a differential input. The comparator is driven by nonoverlapping clocks $\Phi_1$ and $\Phi_2$. P-channel MOSFETs MP1 and MP2 are added to reduce kickback onto the ladder and onto the input signal. The transistors are sized to minimize $V_T$ mismatch, removing the need for autozeroing.

Fig. 12. ADC comparator schematic.

### C. Clock Synthesizer

Fig. 13 shows a simplified block diagram of the self-correcting DLL-based synthesizer. The nine DLL output phases, $\phi(1:9)$, are combined in optimized AND–OR structures to generate the three clocks $ck\langle 3:1\rangle$. These three clocks are separated in phase by one ninth of a reference clock period and have a frequency three times that of the reference clock. The clock synthesizer is described in more detail in [3].

Fig. 13. Clock synthesizer block diagram.

### D. Data Selection

The receiver captures three samples during each symbol period. One of the 3-bit outputs from the three sampling ADCs is selected as the received data word. The position of the center of the symbol eye is redetermined during each symbol period. Fig. 14(a) shows how the sample closest to the center of the eye is chosen, depending on the location of the code transition for these three samples. The three samples are first aligned in time, and XOR logic is then used to determine the transition locations. For the selection algorithm to function correctly, there must be at least two valid samples per symbol period.

Fig. 14. Data selection. (a) Sample selection example waveforms. (b) Selection algorithm.

## V. PERFORMANCE CONSTRAINTS

The data selection algorithm requires that at least two of the three samples be valid. The interphase spacing (one ninth of the reference clock period) must therefore be long enough to account for the following factors: the input data (DATA) settling time and jitter, the comparator setup time, the reference clock (REFCK) jitter, and the phase jitter of the clock generated by the DLL synthesizer (DLLCK). The comparator setup time is the time for which the inputs must be stable before their comparison result can be latched. The maximum frequency at which the transceiver can operate is therefore dictated by this minimum interphase spacing requirement ($T_{\rm min}$):

$$T_{\min} = (DATA_{\text{settle}} + DATA_{\text{jitter}}) + COMPARATOR_{\text{setup\_time}} + REFCK_{\text{jitter}} + DLLCK_{\text{jitter}}. \tag{1}$$

The total DLLCK jitter, including the contribution from the reference clock, was measured at 50 ps [including edge and period (deterministic) jitter]. The comparator setup time was determined, with the aid of simulations, to be at most 300 ps. Jitter on the input signal is assumed to be less than 100 ps.

Fig. 15. Transceiver die photo.

The DATA settling time, $DATA_{\rm settle}$, can be defined in terms of its present
value $V_{\rm out}$, its target value $V_{\rm in}$, the input capacitance $C$, and the termination resistance $R$:

$$DATA_{\text{settle}} = -R \times C \times \ln\left(1 - \frac{V_{\text{out}}}{V_{\text{in}}}\right). \tag{2}$$

For 3-bit accuracy, 50-Ω termination, and 3-pF input capacitance,

$$DATA_{\text{settle}} = 420 \text{ ps}. \tag{3}$$

This corresponds to settling to within 1/2 LSB of the final value ($V_{\rm out}/V_{\rm in} = 15/16$) with $RC = 150$ ps, since $150 \text{ ps} \times \ln 16 \approx 416$ ps. Putting these values into (1), with the measured 50-ps DLLCK jitter covering both the REFCK and DLLCK terms,

$$T_{\text{min}} = (420 \text{ ps} + 100 \text{ ps}) + 300 \text{ ps} + 50 \text{ ps} = 870 \text{ ps}. \tag{4}$$

$T_{\rm min}$ is equal to one ninth of the reference clock period. The data rate is nine times the reference clock frequency (the clock synthesizer multiplies the reference clock frequency by three, and 8-PAM coding effectively multiplies the data rate by three). The maximum effective transmit data rate is therefore 1.15 Gb/s (1/870 ps). Pre-emphasis reduces the data settling time and should therefore permit a faster transmit rate.

The effective data rate of the transceiver could be improved by increasing the number of PAM levels. However, as the number of transmit levels is increased, matching between the DAC output stages becomes more critical. Increasing the transistor size [6] improves matching, but also results in increased output loading. The receiver's ability to recover multiple levels is also restricted by additive noise in the channel. Switching of the comparators is a major source of noise and results in kickback voltage on the received signal (and ladder). Decoupling capacitors on the ladder segments, minimum-size ladder segments, and the extra transistors MP1 and MP2 in the comparator (Fig. 12) ensure that kickback is less than 0.2 LSB.

## VI. LAYOUT

A chip photomicrograph is shown in Fig. 15. The transceiver was fabricated in a generic 0.5-μm triple-metal single-poly digital CMOS process and has a die area of 2 mm<sup>2</sup>.

Fig. 16. 1.3-Gb/s transmission eye diagram.

Fig. 17. 810-Mb/s stairs pattern and recovered data.

## VII. TEST RESULTS

During evaluation, the transmitter output is looped back to the receiver input over 15 cm of 50-Ω PCB track. The on-chip PRBS generator drives the transmitter for BER measurements. A delayed version of this PRBS data is compared to the recovered data at the receiver. If an error is detected, the *error* output is activated; this error signal triggers an external counter.

The prototype is capable of a maximum 8-PAM data transmission rate of 1.3 Gb/s (Fig. 16). With 810-Mb/s pseudorandom data (provided to the transmitter by the on-chip PRBS generator), no receive errors were detected in $1 \times 10^{13}$ bits. Fig. 17 shows an 8-level transmitted staircase, as well as the three received data bits. For a transmitted 8-PAM repetitive staircase pattern, with an effective data rate of 1.1 Gb/s, no receive errors were detected in $1 \times 10^{13}$ bits.

Some success was achieved recovering PRBS data at 1.18 Gb/s. Fig. 18 shows the three recovered bits and the error output signal as captured on the oscilloscope. From the oscilloscope photo, one can see that while the transceiver successfully recovered many transmitted symbols, errors still exist in the recovered data (BER < 1 in $1 \times 10^{3}$).

An eye diagram for 900-Mb/s PRBS transmission, with pre-emphasis disabled, is shown in Fig. 19(a). The advantage of pre-emphasis can be appreciated by comparing this to the eye diagram for 1.3-Gb/s PRBS transmission, with pre-emphasis enabled, shown in Fig. 19(b).

Fig. 18. 1.18-Gb/s recovered PRBS data and error signal.

Fig. 19. 8-PAM transmission eye diagrams. (a) Without pre-emphasis. (b) With pre-emphasis.

The measured chip performance is summarized in Table I. The device consumes 400 mW from a 3.3-V supply for a 1-Gb/s transmission over 15 cm of PCB track. The DLL-based clock synthesizer has a measured rms jitter of 3.1 ps.

TABLE I
SUMMARY OF MEASURED CHARACTERISTICS

| Parameter | Value |
|---|---|
| Synthesizer clock jitter | 3.1 ps (rms), 20 ps (pk-pk) |
| PRBS BER @ 810 Mb/s | $< 1$ in $1 \times 10^{13}$ |
| Periodic stairs pattern BER @ 1.1 Gb/s | $< 1$ in $1 \times 10^{13}$ |
| Supply | 2.8 V to 3.3 V |
| Total power @ 1 Gb/s, 3.3 V | 400 mW |
| TX power (analog) | 120 mW |
| RX power (analog) | 90 mW |
| DLL power | 105 mW |
| Die size | 2 mm<sup>2</sup> |
| Process | 0.5-μm digital CMOS |
| Package | 68-pin ceramic LCC |
| PCB | 4-layer, FR4 substrate |

## VIII. CONCLUSION

A high-speed, low-power 8-PAM transceiver is described in this paper. Transmitter pre-emphasis is implemented without an increase in DAC resolution or digital computation. The device transmits at up to 1.3 Gb/s and has a measured BER of less than 1 in $1\times 10^{13}$ for an 810-Mb/s PRBS transmission over 15 cm of PCB track. This performance was achieved with a standard 0.5-μm CMOS process, a low-cost package (CLCC), and a low-cost board (FR4 material). The 8-PAM scheme employed in this transceiver reduces the symbol rate in the channel and therefore reduces the ISI in the channel, the EMI emissions from the channel, and the maximum required on-chip clock frequency.

## ACKNOWLEDGMENT

The authors thank J. Ryan for his helpful suggestions and also acknowledge contributions from S. Mullins, P. Mullarney, B. Kinsella, A. O'Connell, P. Conlon, F. Quinlan, B. Nauta, P. Kennedy, P. C. Andresen (Nordic VLSI), and ESM.

## REFERENCES

- [1] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, "A 1.0625Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis," in *ISSCC Dig. Tech. Papers*, Feb. 1997, pp. 238–239.
- [2] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. Lee, "A 0.3 μm CMOS 8-Gb/s 4-PAM serial link transceiver," *IEEE J. Solid-State Circuits*, vol. 35, pp. 757–764, May 2000.
- [3] D. Foley and M. Flynn, "CMOS DLL based 2V, 3.2ps jitter, 1GHz clock synthesizer and temperature compensated tunable oscillator," in *Proc. IEEE Custom Integrated Circuits Conf.*, May 2000, pp. 371–374.
- [4] J. Everitt, J. F. Parker, P. Hurst, D. Nack, and K. R. Konda, "A CMOS transceiver for 10-Mb/s and 100-Mb/s Ethernet," *IEEE J. Solid-State Circuits*, vol. 33, pp. 2169–2177, Dec. 1998.
- [5] G. Yin, F. Op't Eynde, and W. Sansen, "A high-speed CMOS comparator with 8-b resolution," *IEEE J. Solid-State Circuits*, vol. 27, pp. 208–211, Feb. 1992.
- [6] M. J. Pelgrom, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, pp. 1433–1439, Oct. 1989.

**David J. Foley** received the B.Eng. degree from the National University of Ireland at Limerick in June 1988. He received the M.Eng.Sc. and Ph.D. degrees from the National University of Ireland at Cork in 1994 and 2001, respectively. He has worked in IC design with the following companies: NEC, Tamagawa, Japan, from 1988 to 1990; AT&T Bell Labs, Tokyo, Japan, from 1990 to 1992; and Parthus Technologies, Dublin, Ireland, from 1994 to 1998. He is currently designing Gigabit Ethernet transceivers with Massana Design, Dublin, Ireland.

**Michael P. Flynn** (S'92–M'95–SM'98) was born in Cork, Ireland. He received the B.E. and M.Eng.Sc. degrees from the National University of Ireland at Cork in 1988 and 1990, respectively. He received the Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, in 1995. From 1988 to 1991, he was with the National Microelectronics Research Centre, Cork. He was with National Semiconductor in Santa Clara, CA, from 1993 to 1995.
From 1995 to 1997, he was a Member of Technical Staff with Texas Instruments, DSP R&D Lab, Dallas, TX. During the four-year period from 1997 to 2001, he was with Parthus Technologies, Cork, where he held the position of Technical Director. During that period, he was also a part-time Lecturer at the Department of Microelectronics, National University of Ireland at Cork. He is currently with the Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor.

Dr. Flynn received the 1992–1993 IEEE Solid-State Circuits Predoctoral Fellowship. He is a member of Sigma Xi.