# A 9-Gbit/s Serial Transceiver for On-Chip Global Signaling Over Lossy Transmission Lines

Jun Young Park, Joshua Kang, Sunghyun Park, and Michael P. Flynn, Senior Member, IEEE

Abstract: A 9-Gbit/s serial link transceiver for on-chip global signaling, and techniques for the design of on-chip transmission lines, are presented. In a prototype device, a transmitter serializes 8-b 1.125-Gbyte/s parallel data and transmits serial data over a 5.8-mm lossy on-chip transmission line. A receiver de-serializes the received data with the help of a digitally tuned interpolator. An on-chip lossy transmission line scheme is described. In the prototype, self-test circuitry verifies the recovered, de-serialized data against the original data and counts the number of discrepancies. The prototype transceiver, implemented in 0.13-μm 8-metal CMOS, achieves 9 Gbit/s with pre-defined data patterns.

Index Terms: On-chip signaling, serial data communication, transceivers, transmission lines.

## I. INTRODUCTION

With the increase in clock frequencies to multi-gigahertz (GHz) rates, it has become impractical to move data across a die in a single clock cycle with conventional parallel-bus-based communication. There are also reliability problems due to timing errors, skew, and jitter. Noise, coupling, and inductive effects become significant for both intermediate-length and global routing. Global signaling is responsible for an ever-increasing portion of total power consumption. Buses are consuming too much area, yet interconnect is reverse scaling, while the required communication bandwidth on an IC is growing exponentially [1].

Repeaters are commonly used to improve long-distance signaling by breaking up long lines into shorter sections. Although this technique is very effective, the optimum number of repeaters can be large. The repeaters have an adverse effect on power consumption and chip area, and also generate significant supply noise.
The performance of repeaters is also sensitive to edge rates. Furthermore, repeaters are not always optimally located; for example, it may not be possible to place repeaters optimally when routing over a critical logic block.

Manuscript received March 12, 2009; revised May 26, 2009. This work was supported by the National Science Foundation under Grant 0429700. First published July 28, 2009; current version published August 21, 2009. This paper was recommended by Guest Editor S. Mirabbasi.

- J. Park and S. Park were with the University of Michigan, Ann Arbor, MI 48109 USA. They are now with Qualcomm, San Diego, CA 92121 USA (e-mail: junyoung@qualcomm.com; sunghyun@qualcomm.com).
- J. Kang was with the University of Michigan, Ann Arbor, MI 48109 USA. He is now with Marvell Semiconductor, Inc., Santa Clara, CA 95054 USA (e-mail: joshuak@marvell.com).
- M. P. Flynn is with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: mpflynn@eecs.umich.edu).

Digital Object Identifier 10.1109/TCSI.2009.2027634

This paper presents techniques for global serial signaling over lossy, on-chip transmission lines, as an alternative to the use of global parallel buses with repeaters. Much of the research in this area has focused on the analysis and design of on-chip wires, in order to increase bandwidth for high-speed data transmission. Transmission line structures such as micro-strip and coplanar lines have become popular, since these provide large bandwidth and a well-defined current return path [1], [2]. Resistive termination provides higher bandwidth compared to conventional capacitive termination [2]-[4]. Loss in on-chip wires can be compensated with regularly spaced negative-impedance converters placed along a line [5]. Twisting differential wire pairs reduces crosstalk to neighboring lines [4], [6], [7]. Pulse-width modulation has been used to decrease inter-symbol interference (ISI) [2], [4].
Capacitively-driven on-chip wires have been investigated in order to further increase power efficiency [7], [8]. Some other schemes for high-speed on-chip signaling use up-conversion [9] and equalization [10], while in [11] special metal processing is used to reduce resistance.

We investigate signaling over long lossy transmission lines without the use of modulation, equalization, or distributed negative-impedance converters. An optimum resistive termination scheme, first presented by the authors in [3], enables large-bandwidth (more than 10 GHz) communication over lossy transmission lines (with more than 200 $\Omega$ parasitic series resistance) without ISI. An interleaved pulse-driver scheme efficiently drives the transmission line. At the receive end, a digitally-tuned sampler samples the received signal at the optimum phase, regardless of clock or signal delays. The prototype transceiver achieves a data rate of 9 Gbit/s over a 5.8 mm on-chip transmission line with a measured BER of less than $10^{-10}$.

In the next section, a technique for the design of on-chip transmission lines is presented. Tradeoffs in the physical parameters of on-chip transmission lines, as well as termination resistance and length, are considered. The overall transceiver architecture and the individual circuit blocks are described in Section III. The prototype transceiver incorporates a transmitter and a receiver, and uses an on-chip transmission line as the data channel. Section IV presents the measured performance of the prototype and is followed by a conclusion in Section V.

## II. ON-CHIP TRANSMISSION LINE

Metal resistivity in conventional CMOS poses significant challenges to implementing transmission lines for long-range (~10 mm), on-chip, digital communication. Series resistance causes dispersion, leading to considerable inter-symbol interference (ISI).
Dispersion is caused by differing propagation velocities for the low-frequency slow-wave (i.e., RC) propagation mode and the higher-frequency TEM mode [12]. For a 10-mm on-chip link, the breakpoint between these two modes can be as high as several GHz. Others have avoided this problem by up-converting to a limited frequency band within the high-frequency TEM region [13]. Significant dispersion is avoided by utilizing only a narrow band at high frequency; however, this approach adds complexity to the link and utilizes only a fraction of the potential bandwidth. Another solution is to use very thick, non-standard, metal interconnect lines. In [11], a 50-GHz bandwidth is achieved over a 20-mm link implemented as a coplanar transmission line, formed with a very thick (5 $\mu$m) non-standard metal layer. We show that on-chip transmission lines, 10 mm long or longer, can be built with standard low-level metal layers through proper selection of characteristic impedance and related physical parameters such as the width of the signal lines and the space between differential lines.

Fig. 1. Response in the case of (a) capacitive and (b) resistive termination.

Fig. 2. Step response for capacitive termination [1], [3].

Fig. 3. Step response of short and long on-chip lossy transmission lines.

Fig. 4. Ideal length of transmission line where initial step rising voltage level is equal to the final steady-state DC level.

## A. Termination

Fig. 1 shows the response of an on-chip transmission line with capacitive termination in (a) and resistive termination in (b), for an input train of high-speed random pulses. Since with capacitive termination the output DC level tends towards the input DC level, the output voltage with capacitive termination may not reach the steady-state level for high-speed inputs. On the other hand, resistive termination limits output steady-state voltage levels so that high-speed data transmission can still be achieved.
The output transient response for capacitive termination of a lossy transmission line can be broken down into two different responses, as shown in Fig. 2. For a step input, the output shows different effects for different frequency components. High-frequency components propagate along the line at the speed of light in the dielectric (wave propagation) and arrive first at the end of the line. Low-frequency components of the input, on the other hand, show a slow *RC* effect at the output. Since the final steady-state DC voltage is the same as the input DC step voltage level, the output is an initial voltage step from the wave propagation, followed by a slow *RC* settling to the final DC level.

Fig. 5. Ideal transmission line lengths versus conductor width and spacing for a differential transmission line implemented in M5.

## B. Length of On-Chip Transmission Line

Due to resistive losses, the response of a lossy transmission line shows both wave propagation and diffusion from the slow RC effect. Assuming that a transmission line is infinitely long, in other words that there is no reflected signal, a step input propagates as a wave along the line (high-frequency TEM mode). The amplitude of this traveling wave is attenuated along the line, and at position $x$ it is given by [2], [14]

$$V_{\text{step}}(x) = V_{\text{step}}(0) \cdot e^{-(r/2Z_0)x}$$ (1)

where $r$ is the series resistance per unit length and $Z_0$ is the characteristic impedance of the transmission line.

Since standard low-level on-chip metal layers have large series resistance, $r$, the amplitude of the traveling wave on lower-level metal layers decays much faster along a line than on higher, less resistive, metal layers. For this reason, the lowest-level metal layers cannot be used to form long transmission lines. Equation (1) also implies that longer lines are possible if the characteristic impedance, $Z_0$, is larger.
This equation is also valid for a transmission line terminated with the characteristic impedance of the transmission line. Fig. 3 shows the step response for two lossy transmission lines, one 2 mm and the other 20 mm long. The amplitude of the step term, due to wave propagation, decreases exponentially along the line. Because of this decay, wave propagation is not observed in very long, lossy, transmission lines, and therefore long lines show only diffusion and cannot operate as transmission lines. After RC settling, the voltage on the line tends to a value set by the parasitic series resistance of the line and the termination resistance. The steady-state voltage at the end of a transmission line of length x, for a step input, is decided by voltage division, described in (2) $$V_{\text{steady-state}}(x) = V_{\text{step}}(0) \cdot \frac{Z_0}{rx + Z_0}$$ (2) where rx is the total resistance of the transmission line. Therefore, the voltage at the end of a transmission line shows a step with an amplitude given in (1) and finally stabilizes to the voltage given in (2). There is a long RC diffusion period from the initial rising step due to wave propagation (level given in (1)), to this final steady-state voltage level [1], [3] as shown in Fig. 2 (low frequency slow-wave propagation mode). For an ideal step-input voltage to a transmission line, and assuming the source resistance is zero and that the line is terminated with $Z_0$ , there exists an ideal line length, where the voltage-step, due to wave propagation, is equal to the final steady-state value in (2), as shown in Fig. 4. Fig. 5 shows the ideal length of transmission line versus transmission line conductor width and spacing for a differential signal line implemented in M5. The strip-line transmission line is formed with two M5 conductors over a ground plane. As an example, the figure indicates an ideal transmission length of 14 mm when the differential signal lines are 8 $\mu$ m wide and separated by 8 $\mu$ m. 
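
The ideal-length condition described above can be explored numerically. The following is a minimal sketch, not the authors' design flow: it equates the traveling-wave amplitude in (1) with the steady-state level in (2) under the same idealizations as the text (zero source resistance, termination equal to $Z_0$) and solves for the length by bisection. The function names and the value of `R_OHM_PER_MM` are illustrative assumptions; only the 30.6 $\Omega$ impedance is taken from the paper (Section III-E).

```python
import math

def step_amplitude(x_mm, r_ohm_per_mm, z0_ohm):
    """Normalized traveling-wave amplitude at position x, per Eq. (1)."""
    return math.exp(-(r_ohm_per_mm / (2.0 * z0_ohm)) * x_mm)

def steady_state_level(x_mm, r_ohm_per_mm, z0_ohm):
    """Normalized steady-state voltage at the end of the line, per Eq. (2)."""
    return z0_ohm / (r_ohm_per_mm * x_mm + z0_ohm)

def ideal_length_mm(r_ohm_per_mm, z0_ohm, tol=1e-12):
    """Length at which Eq. (1) equals Eq. (2), found by bisection.

    Substituting u = r*x/Z0 turns the condition into
    exp(-u/2) = 1/(1 + u), which depends on u alone.
    """
    def mismatch(u):
        return math.exp(-u / 2.0) - 1.0 / (1.0 + u)

    lo, hi = 1.0, 5.0  # mismatch(1) > 0 and mismatch(5) < 0 bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u_ideal = 0.5 * (lo + hi)
    return u_ideal * z0_ohm / r_ohm_per_mm

Z0_OHM = 30.6        # simulated characteristic impedance of the prototype line
R_OHM_PER_MM = 12.0  # ASSUMED series resistance per unit length (illustrative)

length = ideal_length_mm(R_OHM_PER_MM, Z0_OHM)
print(f"ideal length: {length:.2f} mm")
print(f"Eq. (1) level: {step_amplitude(length, R_OHM_PER_MM, Z0_OHM):.4f}")
print(f"Eq. (2) level: {steady_state_level(length, R_OHM_PER_MM, Z0_OHM):.4f}")
```

Under these idealized assumptions the root is near $u \approx 2.51$, so the ideal length corresponds to a total series resistance $rx$ of roughly $2.5\,Z_0$. Fig. 5 remains the reference for real geometries, since $r$ and $Z_0$ both depend on conductor width, spacing, and the choice of ground plane.
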
Fig. 5 is a good starting point in the design of an on-chip transmission line. Once the length of a transmission line is decided, signal line and ground plane combinations, as well as specific physical parameters, can be chosen with the help of this figure. For example, an 8 mm transmission line formed with M5 conductors over an M2 ground plane has an 8 $\mu$m conductor width and 1 $\mu$m conductor spacing. However, M5 conductors with the same physical parameters (8 $\mu$m width and 1 $\mu$m spacing) over an M4 ground plane (4 mm transmission line in Fig. 5) cannot be used for an 8 mm transmission line, since the TEM component is too small when it reaches the receiver. This technique assumes ideal voltage sources, with zero output impedance. Considering non-ideal effects such as finite output impedance of the line drivers and reflection due to impedance mismatch, the physical parameters for a longer transmission line should be used, since these non-idealities tend to reduce the length of the line over which wave propagation is apparent.

Fig. 6. Measured eye-diagram for a prototype 7.2-mm link in 0.18-$\mu$m CMOS.

Fig. 7. Block diagram of the prototype serial transceiver for on-chip global signaling.

Fig. 8. Block diagram of the transmitter.

## C. Prototype Interconnect in 0.18-µm CMOS

To verify the performance of this scheme, a differential lossy transmission line was implemented in 0.18-$\mu$m CMOS [1], [3]. The 7.2-mm differential lossy line is implemented as two 9 $\mu$m wide, 2.3 $\mu$m thick, M6 signal lines, separated by 1 $\mu$m, over a single ground (M2). EM simulation indicates a high-frequency characteristic impedance of 28.6 $\Omega$. Fig. 6 shows the measured eye-diagram for a 14-Gb/s 32-bit PRBS signal. The open eye indicates minimal dispersion and ISI.

## III. TRANSCEIVER FOR ON-CHIP GLOBAL SIGNALING

A serial link makes data transmission possible with only one line, or two lines in the case of differential signaling.
Assuming that the link replaces an 8-bit wide bus, the signaling rate on the serial link must be 8 times higher than that of the original parallel data bus. A significant challenge in implementing a serial link is the requirement of a high clock rate. For this 8-bit serialization, we need a clock rate eight times faster than the original parallel-bus clock rate. An even faster clock rate might be required to oversample data in the receiver. An interleaved implementation employing two or more transmit and receive blocks allows us to instead use multiple phases of a slower clock. In this prototype, two interleaved transmit blocks working off 4.5-GHz clock phases generate the 9-Gbit/s transmit signal. The sampling blocks in the receiver also run off phases of a 4.5-GHz clock.

Fig. 7 shows a block diagram of the transceiver. The prototype device consists of a clock-generator PLL, a transmitter, a receiver, and self-test error-checking logic. In the prototype, an LC-oscillator-based PLL generates a 4.5-GHz differential clock signal, which is routed to both the transmitter and the receiver. The transmitter serializes 8-b wide 1.125-Gbyte/s parallel data and drives the serialized data down the lossy on-chip transmission line. Optimum resistive termination at the receiver end minimizes dispersion and ISI [3]. A phase-tuned receiver samples and de-serializes the received signal. Since the sampling instant is tuned to match the received signal eye, there is no requirement to match clock and signal routing or clock and data signal delays. A self-test error-checking block verifies the recovered data from the receiver against the original data and counts the number of discrepancies.

Fig. 9. Schematic of the data serializer.

Fig. 10. Schematic of the interleaved line drivers and pre-drivers.

Fig. 11. Example waveforms for the line driver and pre-driver; input data *DS\_1* sets up voltages at internal nodes *xn1* and *xp1*, and while clock signal *CK* is high, short pulses are generated at *cn1* and *cp1* depending on the voltages at *xn1* and *xp1*.

## A. Transmitter

Fig. 8 shows a block diagram of the transmitter. Along with the data serializer and transmission-line drivers, the transmitter employs a clock divider to distribute low-frequency clocks at 2.25 and 1.125 GHz. 8-bit 1.125-GHz data from the self-test logic is serialized to a 2-bit 4.5-GHz differential data signal by the serializer [15]. Two interleaved drivers, each running at 4.5 GHz, perform the final serialization to 9 Gbit/s. Two 4.5-GHz clock phases are used in the line driver. The phase difference between these two 4.5-GHz differential clocks is equivalent to the period of the 9-Gbit/s data. A low-output-resistance 9-GHz line driver is impractical in 0.13-$\mu$m CMOS technology, and so instead an interleaved transmitter architecture is adopted. Two interleaved parallel streams of 4-b 1.125-Gbit/s data are serialized to 4.5 Gbit/s. The final 9-Gbit/s serialization is achieved using a pair of interleaved line drivers.

Fig. 12. Block diagram of the receiver.

Fig. 13. Schematic of the RC-CR and phase interpolator.

Fig. 14. Schematic of the comparator.

Fig. 15. Block diagram of the PLL and clock distribution.

Fig. 16. Buffer stage for divide by 32 and TX.

Fig. 9 shows one of the two identical 4-bit serializers in detail. In order to properly serialize the data, the original parallel data ($D1\sim D8$) at 1.125 GHz are distributed in the bit sequence D1, D3, D5, and D7 to one module, and D2, D4, D6, and D8 to the other identical module. The serialized outputs are buffered in order to drive the next stage. The final 9 Gbit/s serialization is achieved using interleaved 4.5 GHz line drivers formed by *M1*, *M2*, *M3*, and *M4* as shown in Fig. 10. The two interleaved 4.5-GHz 1-b data patterns drive *DS\_1* and *DS\_2*.
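
The bit distribution described above can be modeled behaviorally. The sketch below is an illustration only, not the authors' circuit: it assumes that D1 is transmitted first and that each 4:1 serializer emits its bits in the listed order, then alternates bits from the two half-rate streams as the pair of interleaved line drivers would. All function names are hypothetical.

```python
PARALLEL_RATE_GWORD_S = 1.125  # 8-bit words per second (x1e9), from the text
BITS_PER_WORD = 8

def split_word(word):
    """Split an 8-bit word [D1..D8] into the two 4-bit groups."""
    if len(word) != BITS_PER_WORD:
        raise ValueError("expected 8 bits, D1 first")
    odd_bits = word[0::2]   # D1, D3, D5, D7 -> one serializer module
    even_bits = word[1::2]  # D2, D4, D6, D8 -> the other, identical module
    return odd_bits, even_bits

def interleave(stream_1, stream_2):
    """Final 2:1 step: alternate bits from the two 4.5-Gbit/s streams."""
    out = []
    for bit_1, bit_2 in zip(stream_1, stream_2):
        out.extend((bit_1, bit_2))
    return out

def serialize(words):
    """Serialize a list of 8-bit words into one full-rate bit stream."""
    ds_1, ds_2 = [], []
    for word in words:
        odd_bits, even_bits = split_word(word)
        ds_1.extend(odd_bits)
        ds_2.extend(even_bits)
    return interleave(ds_1, ds_2)

pattern = [0, 1, 0, 0, 1, 1, 1, 0]  # the 01001110 test pattern
serial = serialize([pattern, pattern])

half_rate = PARALLEL_RATE_GWORD_S * BITS_PER_WORD / 2  # per line driver
line_rate = PARALLEL_RATE_GWORD_S * BITS_PER_WORD      # on the line
print("serial stream:", "".join(str(b) for b in serial))
print(f"per-driver rate: {half_rate} Gbit/s, line rate: {line_rate} Gbit/s")
```

With this ordering the serial stream reproduces each word bit for bit, which is the property the self-test logic relies on when it compares transmitted and recovered words.
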
Only one driver is active during each half cycle of the 4.5-GHz clock, facilitating 9 Gbit/s serialization at the node *OUT*. Separate dynamic pre-drivers (*M5~M8*, *M9~M12*, *M13~M16*, and *M17~M20*) drive the nMOS and pMOS line driver devices, with the help of differential 4.5 GHz clock phases, *CK* and *CKB*. Data arrives at the pre-drivers ahead of the clock signals so that the internal pre-driver nodes (*xp1*, *xn1*, *xp2*, and *xn2*) are pre-evaluated. During the half clock period while *CK* (or *CKB*) is high, the driver signals (*cp1*, *cn1*, *cp2*, and *cn2*) are generated, activating only one of the four line driver devices (i.e., one of *M1*, *M2*, *M3*, or *M4*). As an example in Fig. 11, if *DS\_1* is one during *CK* high, setting both *xp1* and *xn1* to zero, then only *cp1* is activated. *cp1* goes low on the rising edge of *CK* and returns high on the falling edge of *CK*. The *cn1* node, on the other hand, remains at zero during this time.

Fig. 17. Block diagram of the self-test error-check block.

Fig. 18. Error counter.

## B. Receiver

Fig. 12 shows an overall block diagram of the receiver. It consists of an RC-CR filter and a phase interpolator for phase tuning, interleaved comparators as data samplers, and a FIFO for data de-serialization. The receiver samples the received signal with interleaved samplers operating at 4.5 GHz (the same frequency as the transmitter). The sampled data is de-serialized and down-sampled to 1.125 GHz. To compensate for signal delay over the long transmission line, the phase of the sample clock signals is tuned for proper data recovery. With appropriate phase control, we can employ only two comparators sampling at 4.5 GHz. The comparator clock phases are adjusted to sample the input data at the center of the data eye. An RC-CR filter and a phase interpolator block, shown in Fig. 13, allow control of the sampling phase.
The differential outputs of the PLL, PLL\_out+ and PLL\_out-, are the inputs to a pair of RC-CR filters, which generate four equally-spaced 4.5-GHz clock phases. The digitally-phase-controlled interpolator takes these four clock signals and generates four finely phase-controlled clock signals for the two sampling comparators. Although the two outputs of an RC-CR filter have a 90-deg phase difference at all frequencies, the magnitudes of these two signals are only the same at 1/(RC). Therefore, gain stages follow the RC-CR filters to compensate for amplitude mismatches [16]. The resistance, R and capacitance, C should be selected carefully in order to ensure that the 1/(RC) frequency is close to the operating frequency, since amplitude mismatch between RC-CR filter outputs can cause duty-cycle-distortion of the receiver clock. The phase of the outputs of the interpolator is changed, by changing the tail bias currents. Two sets of differential clock signals (or four phases) with a phase difference of 90-deg, (i.e., 56 ps at 4.5 GHz), drive the inputs of the interpolator. Since there are eight differential control-switches (C1-C8 and C1b-C8b), turning one switch on or off causes approximately 7 ps of advancement or delay, respectively. The gain of the interpolator also helps suppress amplitude mismatches of the signals from the RC-CR filters. Comparators are used at the first stage of the receiver to sample the received signal and convert a small input signal to CMOS voltage levels. Most high-speed comparators consist of a preamplifier followed by a regenerative latching stage, with each stage driven by complementary clock signals. While on, the preamplifier operates as a differential amplifier and the latching stage, in turn, amplifies the signal. The voltage amplitude at the end of a long lossy transmission line (from the transmitter) can be as small as 200 mV due to resistive attenuation along the transmission line. 
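
The RC-CR and interpolator figures quoted earlier in this subsection can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions, not a model of the fabricated circuit: it treats the RC-CR network as an ideal first-order low-pass/high-pass pair and assumes the eight interpolator steps divide the 90-deg span evenly. The value of `R_OHM` is an illustrative assumption; `C_FARAD` is derived from it so that the equal-amplitude point falls at 4.5 GHz.

```python
import cmath
import math

F_CLK_HZ = 4.5e9  # receiver clock frequency, from the text
N_STEPS = 8       # differential control switches C1-C8

def rc_cr_outputs(f_hz, r_ohm, c_farad):
    """Return (low-pass, high-pass) responses of an ideal RC-CR pair."""
    jwrc = 1j * 2.0 * math.pi * f_hz * r_ohm * c_farad
    return 1.0 / (1.0 + jwrc), jwrc / (1.0 + jwrc)

def phase_difference_deg(f_hz, r_ohm, c_farad):
    """Phase lead of the CR (high-pass) output over the RC (low-pass) output."""
    low, high = rc_cr_outputs(f_hz, r_ohm, c_farad)
    return math.degrees(cmath.phase(high / low))

R_OHM = 100.0  # ASSUMED resistor value (illustrative only)
# 1/(RC) is an angular frequency, so f = 1/(2*pi*R*C) is placed at 4.5 GHz.
C_FARAD = 1.0 / (2.0 * math.pi * F_CLK_HZ * R_OHM)

for f_hz in (0.5 * F_CLK_HZ, F_CLK_HZ, 2.0 * F_CLK_HZ):
    low, high = rc_cr_outputs(f_hz, R_OHM, C_FARAD)
    diff = phase_difference_deg(f_hz, R_OHM, C_FARAD)
    print(f"{f_hz / 1e9:.2f} GHz: |RC|={abs(low):.3f} "
          f"|CR|={abs(high):.3f} phase diff={diff:.1f} deg")

quarter_period_ps = 1e12 / (4.0 * F_CLK_HZ)  # 90 deg of a 4.5-GHz clock
step_ps = quarter_period_ps / N_STEPS        # one control switch
print(f"90 deg = {quarter_period_ps:.1f} ps, one step = {step_ps:.2f} ps")
```

The phase difference stays at 90 deg at every frequency while the two amplitudes match only where $\omega RC = 1$, which is why the text places gain stages after the filters. The quarter period and per-step values agree with the approximately 56 ps and 7 ps quoted above.
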
Since the data rate is fast (9 Gbit/s), the comparator is formed as a cascade of two pairs of preamps and latching stages, as shown in Fig. 14, in order to have sufficient gain to avoid metastability. Since the sampling instants for the two interleaved comparators differ in phase by 180-deg, the outputs of both comparators are synchronized to a single clock phase before the data is de-serialized.

Fig. 19. Cross-section of the on-chip transmission line.

Fig. 20. Chip micrograph.

## C. Clock Distribution

Fig. 15 shows a block diagram of the type II charge-pump PLL. The frequency of one of the two output signals from an *LC* oscillator is divided by 32 and compared with the reference clock, and the filtered output of the charge pump controls the VCO frequency.

In order for the RC-CR filter in the receiver to function properly, the input signals to this filter, provided by the PLL, *PLL\_out+* and *PLL\_out-*, should be in the form of sine waves. Typically, the differential outputs of an *LC* oscillator are buffered and distributed to the other functional blocks as square waves. However, since the RC-CR filter requires sine wave inputs, the outputs of the *LC* oscillator are distributed as sine waves to the frequency divider in the PLL, to the clock driver in the transmitter, and to the phase interpolator in the receiver.

The buffer stage, shown in Fig. 16, is employed in the first stage of the frequency divider in the PLL and in the transmitter to convert sine waves from the *LC* oscillator to square waves. The buffer stage consists of an AC coupling capacitor followed by a pair of inverter stages, each with shunt feedback. This feedback configuration sets the DC input voltage level at the first inverter. Both inverters are minimum size. This buffer structure is also used as an output buffer for the *LC* oscillator to drive a large capacitive load ($\sim 2$ pF), with most of the capacitive load coming from the RC-CR filters in the phase-tuned receiver.
The main difference between this output buffer and the previous one is that this buffer stage has a much bigger second stage to allow large drive strength without saturation.

## D. Self Test

A self-test error-check block compares data sent to the transmitter with data from the receiver, and generates an error signal when there is a mismatch. The error counter receives and compares two 8-bit data words every 1.125-GHz clock cycle. As shown in Fig. 17, the original 8-bit data sent to the transmitter is stored in registers in the *Fixed Delay* and *Variable Delay* blocks until the serialized data is sampled and recovered at the receiver, and sent to the self-test block.

In order to decide whether the recovered data is the same as the original data sent by the transmitter, the data from the receiver is first transferred to the clock domain of the transmitter. The clock synchronization block performs clock-domain alignment. A *WINDOW SELECT* block stores recovered data for two 1.125-GHz clock cycles and selects an 8-bit window from the 16 stored data bits for comparison. Every clock cycle, the error counter shown in Fig. 18 compares two 8-bit input data patterns. A bit-wise XOR of the two 8-bit data words, followed by an OR of the result, produces an *ENABLE* signal that increments the error count whenever a mismatch is found.

## E. Transmission Line

In the prototype device, a lossy on-chip differential transmission line, as shown in Fig. 19, is used for global signaling over 5.8 mm. A differential strip-line transmission line is formed with two 6 $\mu$m wide metal-5 wires separated by 3 $\mu$m, over a 21 $\mu$m wide metal-2 ground plane. From Fig. 5, with these physical parameters the ideal length is more than 6 mm. The simulated characteristic impedance is 30.6 $\Omega$ and the line is terminated at the receiver with a 30.6 $\Omega$ polysilicon resistor.

Fig. 21. Measured clock and serialized data at the output of the drivers.

Fig. 22. Output of the self-test logic with (left) and without (right) deliberate timing error for the (a) 10101010 and (b) 01001110 patterns.

Fig. 23. Measurement setup; Cascade GSG probes are used to measure the operating frequency, RMS jitter, and the serialized data.

#### TABLE I. PERFORMANCE SUMMARY

| Parameter | Item | Value |
|-----------|------|-------|
| Area | Chip (including pads) | 1.7 mm × 2.1 mm |
| Area | TX | 0.5 mm × 0.3 mm |
| Area | RX | 1.4 mm × 0.4 mm |
| Power (Ref. CLK: 140 MHz, VDD: 1.5 V) | Analog | 105 mW |
| Power (Ref. CLK: 140 MHz, VDD: 1.5 V) | TX | 420 mW |
| Power (Ref. CLK: 140 MHz, VDD: 1.5 V) | RX | 180 mW |
| Power (Ref. CLK: 140 MHz, VDD: 1.5 V) | Self-test error-check | 240 mW |
| BER | 10101010 pattern | $10^{-12}$ |
| BER | 01001110 pattern | $10^{-10}$ |
| Technology | | 0.13 μm 8M CMOS |

## IV. EXPERIMENTAL RESULTS

The prototype is fabricated in a 0.13 $\mu$m 8-metal CMOS process, and measures 1.7 mm $\times$ 2.1 mm including pads. The transmitter and receiver occupy 0.15 and 0.56 mm<sup>2</sup>, respectively. The chip micrograph is shown in Fig. 20. The 5.8-mm transmission line is routed from the transmitter to the receiver outside the I/O pads.

Functionality of the prototype is verified with pre-defined test data patterns. Measured data waveforms are shown in Figs. 21 and 22. Fig. 21 shows the pre-defined serialized patterns, 10101010, 11001100, 1110000, and 01001110, at the output of the drivers, along with the 4.5-GHz PLL clock, all of which are measured directly using a GSG probe on a probe station. The received de-serialized data are monitored by the on-chip self-test error-checking logic, and the error count output is recorded with a mixed-signal oscilloscope.

Fig. 23 shows the test setup used to measure the serialized data and the recovered data. Since the frequency of the main PLL is high at 4.5 GHz, a Cascade ACP40 GSG probe is used for direct measurement.
Another GSG probe is used at the same time to record serialized data at the output of the transmitter. A Tektronix TDS 694C digital real-time oscilloscope is used for the measurement of the recovered and de-serialized data from the receiver, and the three MSBs of the 8-bit error counter are monitored with an Agilent 54641D mixed-signal oscilloscope (the 8-bit recovered data words are static when there is no timing error). The device was packaged in a low-cost ceramic LCC package, soldered to a custom-designed four-layer FR4 PCB.

When a deliberate mismatch is introduced between transmitted and recovered data through a deliberate timing error, the error counter accumulates, as shown on the left side of Fig. 22. When the receiver correctly recovers data, the error counter does not increment, as shown on the right side of Fig. 22. The measured BER is less than $10^{-12}$ with a 10101010 pattern, and is less than $10^{-10}$ with a 01001110 pattern.

The prototype operates with a 1.5 V supply, and the total currents for the analog circuits (i.e., charge pump and LC oscillator in the PLL, comparator, and phase interpolator), transmitter, receiver, and self-test error-checking logic are 70, 280, 120, and 160 mA, respectively. The performance of the chip is summarized in Table I.

## V. CONCLUSION

A complete on-chip transceiver communicating over a 5.8-mm lossy on-chip transmission line is described. Even though standard on-chip metal layers show relatively high resistivity, ISI can be avoided by careful selection of appropriate termination and the physical parameters of the transmission line. Proper selection of metal layers and optimal selection of characteristic impedance and its related physical parameters allow the large bandwidth required for high-speed digital data communication.

## REFERENCES

- [1] M. P. Flynn and J. J. Kang, "Global signaling over lossy transmission lines," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Des.*, 2005, pp. 985–992.
- [2] A. P. Jose, G.
Patounakis, and K. L. Shepard, "Pulsed current-mode signaling for nearly speed-of-light intrachip communication," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 772–780, Apr. 2006.
- [3] J. J. Kang, J. Park, and M. P. Flynn, "Global high-speed signaling in nanometer CMOS," in *Proc. Asian Solid-State Circuits Conf.*, 2005, pp. 393–396.
- [4] D. Schinkel, E. Mensink, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, "A 3-Gb/s/ch transceiver for 10-mm uninterrupted RC-limited global on-chip interconnects," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 297–306, Jan. 2006.
- [5] A. P. Jose and K. L. Shepard, "Distributed loss-compensation techniques for energy-efficient low-latency on-chip communication," *IEEE J. Solid-State Circuits*, vol. 42, no. 6, pp. 1415–1424, Jun. 2007.
- [6] E. Mensink, D. Schinkel, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, "Optimal positions of twists in global on-chip differential interconnects," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 4, pp. 438–446, Apr. 2007.
- [7] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "Low-power, high-speed transceivers for network-on-chip communication," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 1, pp. 12–21, Jan. 2009.
- [8] R. Ho, T. Ono, R. D. Hopkins, A. Chow, J. Schauer, F. Y. Liu, and R. Drost, "High speed and low energy capacitively driven on-chip wires," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 52–60, Jan. 2008.
- [9] R. T. Chang, N. Talwalkar, C. P. Yue, and S. S. Wong, "Near speed-of-light signaling over on-chip electrical interconnects," *IEEE J. Solid-State Circuits*, vol. 38, no. 5, pp. 834–838, May 2003.
- [10] R. Ho, K. Mai, and M. Horowitz, "Efficient on-chip global interconnects," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, 2003, pp. 271–274.
- [11] B. Kleveland, T. H. Lee, and S. S. Wong, "50-GHz interconnect design in standard silicon technology," in *Proc. IEEE MTT-S Int. Microw. Symp.
Dig.*, 1998, pp. 1913–1916.
- [12] H. Hasegawa, M. Furukawa, and H. Yanai, "Properties of microstrip line on Si-SiO<sub>2</sub> system," *IEEE Trans. Microw. Theory Techn.*, vol. MTT-19, no. 11, pp. 869–881, Nov. 1971.
- [13] B. Kleveland, C. H. Diaz, D. Vook, L. Madden, T. H. Lee, and S. S. Wong, "Exploiting CMOS reverse interconnect scaling in multigigahertz amplifier and oscillator design," *IEEE J. Solid-State Circuits*, vol. 36, no. 10, pp. 1480–1488, Oct. 2001.
- [14] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*. Englewood Cliffs, NJ: Prentice Hall, 2003.
- [15] P. Chiang, W. J. Dally, M. J. E. Lee, R. Senthinathan, O. Yangjin, and M. A. Horowitz, "A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer," *IEEE J. Solid-State Circuits*, vol. 40, pp. 1004–1011, 2005.
- [16] B. Razavi, *RF Microelectronics*. Englewood Cliffs, NJ: Prentice Hall, 1998.

**Jun Young Park** (M'99) received the B.S. and M.S. degrees in electrical engineering from Inha University, Incheon, South Korea, in 1998 and 2000, respectively, and the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 2008. He joined the mixed-signal group at Qualcomm, San Diego, CA, in 2008 and participated in the design of oscillators, PLLs, and DDR memory interface timing circuits. His research interests are in high-speed circuits for on-chip and off-chip communication and RF circuits for wireless communication.

**Joshua Kang** (M'99) received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1998, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, in 2004 and 2009, respectively. From 1998 to 2002, he worked at a biomedical electronics venture, MGB Endoscopy, where he was involved in designing medical electronic devices. He joined Marvell Semiconductor, Inc., Santa Clara, CA, in 2008.
His current field of research is the design of integrated circuits for wired communication applications. Dr. Kang received a Samsung Scholarship for his Ph.D. study in 2002 and was a co-recipient of the Outstanding Student Designer Award from Analog Devices, Inc. in 2004.

**Sunghyun Park** (S'02–M'06) received the B.S. degree from Seoul National University, South Korea, in 1998, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, in 2003 and 2006, respectively, all in electrical engineering. He has been with Qualcomm Inc., Santa Clara, CA, since 2006. In 1998, he was with the Optimal Robust Control Laboratory, Seoul National University, Seoul, Korea. From 1998 to 2001, he was in the Republic of Korea Army. From 2004 to 2006, he was with Intel in Hillsboro, OR. His technical interests include integrated circuit design in the fields of data conversion, active analog filtering, power management, and calibration systems. Dr. Park received the Korean IT national scholarship in 2001, the Analog Devices Outstanding Student Designer Award in 2003, and the Intel Foundation Ph.D. fellowship in 2004.

**Michael P. Flynn** (S'92–M'95–SM'98) was born in Cork, Ireland. He received the B.E. and M.Eng.Sc. degrees from the National University of Ireland at Cork, Ireland, in 1988 and 1990, respectively, and the Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, in 1995. From 1988 to 1991, he was with the National Microelectronics Research Centre, Cork, Ireland, and from 1993 to 1995, he was with National Semiconductor, Santa Clara, CA. From 1995 to 1997, he was a Member of Technical Staff with the DSP R&D Laboratory, Texas Instruments Incorporated, Dallas, TX. During the four-year period from 1997 to 2001, he was with Parthus Technologies, Cork, Ireland, where he held the positions of Technical Director and Fellow. During that time, he was also a part-time faculty member at the Department of Microelectronics, National University of Ireland (UCC), Cork.
Since 2002, he has been with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor. His technical interests are in data conversion, gigabit serial transceivers, and RF circuits. Dr. Flynn received the 1992–1993 IEEE Solid-State Circuits Pre-doctoral Fellowship. He received the NSF Early Career Award in 2004. In March 2006, he received the 2005–2006 Outstanding Achievement Award from the Department of Electrical Engineering and Computer Science at the University of Michigan. He is a 2007 Guggenheim Fellow. He was Associate Editor of the IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing from 2002 to 2004. He is an Associate Editor of the IEEE Journal of Solid-State Circuits (JSSC) and serves on the Technical Program Committees of the International Solid-State Circuits Conference (ISSCC). He is a member of Sigma Xi. He is Thrust Leader responsible for Wireless Interfaces at Michigan's Wireless Integrated Microsystems NSF Engineering Research Center.