Roverato, Enrico; Kosunen, Marko; Lemberg, Jerry; Martelius, Mikko; Stadius, Kari; Anttila, Lauri; Valkama, Mikko; Ryynanen, Jussi

A High-Speed DSP Engine for First-Order Hold Digital Phase Modulation in 28-nm CMOS

Published in:
IEEE Transactions on Circuits and Systems II: Express Briefs

DOI:
10.1109/TCSII.2018.2818759

Published: 01/01/2018

Document Version
Peer reviewed version

Please cite the original version:
A High-Speed DSP Engine for First-Order Hold Digital Phase Modulation in 28-nm CMOS

Enrico Roverato, Marko Kosunen, Member, IEEE, Jerry Lemberg, Student Member, IEEE, Mikko Martelius, Student Member, IEEE, Kari Stadius, Member, IEEE, Lauri Anttila, Member, IEEE, Mikko Valkama, Senior Member, IEEE, and Jussi Ryynänen, Senior Member, IEEE

Abstract—Conventional delay-based digital phase modulators use a zero-order hold (ZOH) phase control word to modulate the square-wave RF carrier. Recently, new architectures capable of performing first-order hold (FOH) digital phase modulation have been proposed, thus improving the wideband performance to a level suitable for 5G base stations. While currently available literature focuses on the generic operation principle, this brief details the first on-chip implementation of the DSP engine required for actual FOH computations. The circuit is based on a simple iterative algorithm, which can be pipelined for high-speed operation. The DSP engine has been integrated as part of a prototype 5G base-station outphasing transmitter, fabricated in 28-nm CMOS. When processing a 100-MHz orthogonal frequency-division multiplexing signal, the DSP achieves an adjacent-channel leakage ratio of –53 dBc, which is 12 dB better than with conventional ZOH phase modulation. Furthermore, the system enables flexible upconversion to any frequency between 0.35 and 2.1 GHz from a fixed 1.5-GHz reference clock. The power consumption of a single engine is lower than 18 mW.

Index Terms—Digital phase modulator, DSP, first-order hold (FOH), phase interpolation.

I. INTRODUCTION

In recent years, digital-intensive delay-based architectures have become a popular approach to implementing phase modulation in RF transmitters for wireless communications [1]–[7]. The basic idea behind delay-based phase modulation, illustrated in Fig. 1, is to change the delay of the square-wave local oscillator (LO) signal on a sample-by-sample basis, according to the phase control word $\Phi_w[n]$. The implementation of this concept requires only digital building blocks such as buffers and multiplexers, thus taking advantage of the improved time resolution of modern deep-submicrometer CMOS processes.

The system of Fig. 1 is equivalent to performing phase modulation with the zero-order hold (ZOH) version of the original continuous-time phase signal $\Phi(t)$. The interaction between such discrete-time operation and the harmonics of the square-wave RF carrier was recently analyzed in the context of pulse-width modulation [8]–[11] and outphasing [12] digital transmitter architectures. In the latter case, the sampling images of $\Phi_w[n]$, only weakly attenuated by the ZOH, are shifted in frequency by the LO harmonics and fall on top of the main signal band. This causes a degradation of the adjacent-channel leakage ratio (ACLR), which becomes worse as the RF bandwidth increases. Therefore, the ZOH is a fundamental performance-limiting factor for wideband operation of digital outphasing transmitters.

To overcome this problem, [12] introduced the digital interpolating phase modulator (DIPM), a new architecture capable of performing first-order hold (FOH) phase interpolation. FOH enables significantly better sampling image attenuation compared to ZOH, which improves the wideband performance to a level suitable for 5G base stations. The new architecture relies on DSP hardware to perform the actual FOH computations, since it requires real-time processing of the interpolation problem illustrated in Fig. 2. However, despite the central role of the DSP engine in FOH digital phase modulation, the focus of currently available literature is mainly on the generic operation principle, while the DSP details are omitted.

This brief presents the first on-chip implementation of the DSP engine for an FOH digital phase modulator. The circuit is based on an iterative algorithm, which uses only elementary arithmetic and can be pipelined for high-speed operation. In order to demonstrate the superior wideband performance and high reconfigurability enabled by FOH, a prototype 5G base-station outphasing transmitter has been fabricated in 28-nm CMOS [13]. The chip includes two DIPMs driven by eight instances of the proposed DSP engine, which are integrated as part of the digital front-end (DFE). Experimental results show that FOH improves the ACLR of a 100-MHz orthogonal frequency-division multiplexing (OFDM) signal from −41 to −53 dBc, when compared to conventional ZOH phase modulation. Furthermore, the system enables flexible upconversion to any frequency between 0.35 and 2.1 GHz, while utilizing a fixed 1.5-GHz reference clock.

The rest of this brief is organized as follows. Section II briefly introduces the outphasing transmitter architecture. The background on FOH digital phase modulation is summarized in Section III. Section IV explains the architectural details of the proposed DSP. Chip implementation and experimental results are discussed in Section V. Finally, Section VI draws the conclusions.

II. THE OUTPHASING TRANSMITTER

In the outphasing transmitter architecture, the RF carrier \( V(t) \) with amplitude and phase modulation is formed by combining two constant-envelope phase-modulated signals \( S_1(t) \) and \( S_2(t) \). Outphasing is defined by the set of equations

\[
V(t) = S_1(t) + S_2(t), \tag{1}
\]

\[
S_{1,2}(t) = \frac{1}{2} \sin(\omega_c t + \Phi_{1,2}(t)), \tag{2}
\]

\[
\Phi_1(t) = \phi(t) + \theta(t), \tag{3}
\]

\[
\Phi_2(t) = \phi(t) - \theta(t), \tag{4}
\]

\[
\theta(t) = \arccos(r(t)), \quad r(t) \in [0,1], \tag{5}
\]

where \( \omega_c \) is the angular carrier frequency, and \( r(t) \) and \( \phi(t) \) are the normalized magnitude and phase of the complex baseband data, respectively.

The outphasing signals \( S_1(t) \) and \( S_2(t) \) are traditionally defined with the cosine function. For the sake of coherence with the notation adopted in the following Sections, here they are defined with the sine function. As \( \cos(\alpha) = \sin(\alpha + \pi/2) \), using the sine function simply leads to an irrelevant \( \pi/2 \) phase shift in the RF carrier.
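The decomposition in (1)–(5) can be checked numerically. The following is a minimal sketch, not part of the original design: the function names `outphasing_phases` and `outphasing_sample` are illustrative assumptions. It relies on the identity \( \tfrac{1}{2}\sin(\alpha + \theta) + \tfrac{1}{2}\sin(\alpha - \theta) = \cos(\theta)\sin(\alpha) \), so that the sum of the two constant-envelope signals equals \( r \sin(\omega_c t + \phi) \).

```python
import math

def outphasing_phases(r, phi):
    """Split normalized magnitude r in [0, 1] and phase phi into (Phi_1, Phi_2)."""
    theta = math.acos(r)               # outphasing angle, eq. (5)
    return phi + theta, phi - theta    # eqs. (3) and (4)

def outphasing_sample(r, phi, carrier_phase):
    """Evaluate V = S_1 + S_2 at one instant; carrier_phase stands for omega_c * t."""
    phi1, phi2 = outphasing_phases(r, phi)
    s1 = 0.5 * math.sin(carrier_phase + phi1)   # eq. (2), branch 1
    s2 = 0.5 * math.sin(carrier_phase + phi2)   # eq. (2), branch 2
    return s1 + s2                              # eq. (1)
```

For any valid input, `outphasing_sample(r, phi, a)` should agree with `r * math.sin(a + phi)` up to floating-point rounding.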

III. FOH DIGITAL PHASE MODULATION

A. General Problem Formulation

In ideal phase modulation, a continuous-time signal \( \Phi(t) \in [-\pi, \pi] \) modulates the phase of the RF carrier as

\[
S_{\text{id}}(t) = \sin(\omega_c t + \Phi(t)). \tag{6}
\]

In digital phase modulation, all signals are rail-to-rail, and the modulator would ideally generate the square-wave version of (6) given by

\[
S(t) = \text{sgn}(S_{\text{id}}(t)) = \text{sgn}(\sin(\omega_c t + \Phi(t))), \tag{7}
\]

where \( \text{sgn}(\cdot) \) is the sign function. By examining (7), it is clear that \( S(t) \) toggles whenever the argument of the sine function crosses any integer multiple of \( \pi \), as expressed by the condition

\[
(\omega_c t + \Phi(t)) \bmod \pi = 0. \tag{8}
\]

Hence, digital phase modulation can be realized by toggling the RF carrier at the time instants which fulfill (8).

Fig. 3 shows the circuit architecture of a digital phase modulator based on this concept [12], [13]. The basic idea is to toggle a T flip-flop at the time instants generated by a digital-to-time converter (DTC). Because (8) can occur multiple times within one sample period \( T_s \), four DTCs are implemented in parallel, where each DTC covers 25% of \( T_s \). This configuration enables instantaneous frequencies up to twice the sample rate \( F_s = 1/T_s \).

The challenge is to design efficient DSP hardware that is able to compute the solutions of (8) in real time.

B. Piecewise Linear Approximation

For sufficiently high sample rates, we can estimate \( \Phi(t) \) from the discrete-time phase control word \( \Phi_{\text{w}}[n] = \Phi(nT_s) \) by linearly interpolating between consecutive samples. This is equivalent to using the FOH approximation

\[
\Phi(t) \approx \Phi_F(t) = \sum_{n=-\infty}^{+\infty} \Phi_{\text{w}}[n] \cdot \text{tri}\left(\frac{t-nT_s}{T_s}\right), \tag{9}
\]

where \( \text{tri}(x) \) is the triangular function:

\[
\text{tri}(x) = \begin{cases}
1 - |x| & \text{for } |x| < 1 \\
0 & \text{otherwise}.
\end{cases} \tag{10}
\]

Because (9) is the equation of a linear time-invariant filter, the spectrum of \( \Phi_F(t) \) contains the digital images of \( \Phi_w[n] \) attenuated by a \( \text{sinc}^2 \) response, which is the Fourier transform of the triangular impulse response of the filter. Furthermore, by applying (9) to (6)–(8), the term \( \omega_c t + \Phi_F(t) \) becomes a piecewise linear function.
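To give a feel for the difference between the two hold responses, the sketch below compares the attenuation of a sampling image under a sinc (ZOH) and a \( \text{sinc}^2 \) (FOH) envelope. It is only an illustration under stated assumptions: the helper names are hypothetical, the responses are normalized to unity at DC, and the frequency is expressed as a fraction of \( F_s \).

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def hold_attenuation_db(f_over_fs, order):
    """Gain in dB of a ZOH (order=0) or FOH (order=1) at frequency f = f_over_fs * Fs.

    Assumes f_over_fs is not a nonzero integer, where the response has a null.
    """
    magnitude = abs(sinc(f_over_fs)) ** (order + 1)   # sinc for ZOH, sinc^2 for FOH
    return 20.0 * math.log10(magnitude)
```

In decibels the FOH attenuation is exactly twice the ZOH attenuation at every frequency, so an image that the ZOH suppresses only weakly is suppressed much more strongly by the FOH.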

Next, we want to find the solutions of (8) with a resolution of \( 2^R \) within each sample period, where \( R \in \mathbb{N} \). By defining

\[
\rho[n] = \omega_c \cdot n T_s + \Phi_w[n], \tag{11}
\]

\[
\Delta \rho[n] = \rho[n] - \rho[n - 1], \tag{12}
\]

we can formulate the piecewise linear phase increment of \( \omega_c t + \Phi_F(t) \) between samples \( (n - 1)T_s \) and \( nT_s \) as

\[
\rho_F[n,k] = \rho[n - 1] + \Delta \rho[n] \cdot \frac{k}{2^R}, \tag{13}
\]

where \( k \in \{0, \ldots, 2^R - 1\} \) gives the wanted resolution. According to the implementation of Fig. 3, we divide (13) into four “sections”, each of which covers 25% of \( T_s \) and yields at most one \( \pi \) crossing. The linear phase increment within the four sections is thus given by

\[
\rho_{F,i}[n,k_i] = \underbrace{\rho[n - 1] + i \cdot \frac{\Delta \rho[n]}{4}}_{\rho_{F,i}[n,0]} + \Delta \rho[n] \cdot \frac{k_i}{2^R}, \tag{14}
\]

where \( i = 0, \ldots, 3 \), \( k_i \in \{0, \ldots, 2^{R-2} - 1\} \), and the initial phase for each section \( \rho_{F,i}[n,0] \) is highlighted. Note that the generalization \( \rho_{F,4}[n,0] = \rho[n] \) will be used later in the text.

In conclusion, the computing task can be formulated as follows. During each sample period, we need to calculate in real-time the values \( k_i[n] \) which yield the best possible approximation of

\[
\rho_{F,i}[n,k_i[n]] \bmod \pi \approx 0, \tag{15}
\]

for \( i = 0, \ldots, 3 \). These values will be used as the control words for the DTCs of Fig. 3, in order to determine the toggle instants of \( S(t) \). In addition, each individual DTC must also be capable of not generating a toggle event, if (15) is not satisfied for any \( k_i \in \{0, \ldots, 2^{R-2} - 1\} \). This can be handled by calculating additional enable conditions \( e_i[n] \), for \( i = 0, \ldots, 3 \).
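The computing task can be illustrated with a brute-force reference model in floating point. This sketch is not the hardware algorithm: it simply scans every \( k_i \) of (14) and keeps the one closest to a multiple of \( \pi \), which is useful as a golden reference when checking a faster implementation. All function names are assumptions, and a section with constant phase is arbitrarily treated as producing no toggle.

```python
import math

def dist_to_pi_multiple(x):
    """Distance from x to the nearest integer multiple of pi."""
    m = math.fmod(abs(x), math.pi)
    return min(m, math.pi - m)

def has_crossing(start, end):
    """True if a multiple of pi lies in the section (start included, end excluded)."""
    if end > start:                                   # rising phase
        return math.ceil(start / math.pi) * math.pi < end
    if end < start:                                   # falling phase
        return math.floor(start / math.pi) * math.pi > end
    return False                                      # constant phase (assumption)

def reference_section(rho_prev, rho_curr, i, R):
    """Exhaustive search for section i; returns (e_i, k_i)."""
    delta = rho_curr - rho_prev                       # eq. (12)
    start = rho_prev + i * delta / 4                  # rho_{F,i}[n,0]
    n_steps = 2 ** (R - 2)
    phases = [start + delta * k / 2 ** R for k in range(n_steps)]   # eq. (14)
    k_best = min(range(n_steps), key=lambda k: dist_to_pi_multiple(phases[k]))
    # k_best is only meaningful when the enable flag is True.
    return has_crossing(start, start + delta / 4), k_best
```

For example, with an unmodulated carrier at \( f_c = F_s \), so that \( \Delta\rho[n] = 2\pi \), two of the four sections contain a crossing and the other two are disabled.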

IV. DSP IMPLEMENTATION

Fig. 4 depicts the top-level block diagram of the proposed DSP, which can be used in conjunction with the RF front-end of Fig. 3. At the core of the circuit are four DSP engines, which compute the control words \( k_i[n] \) and enable signals \( e_i[n] \) for the corresponding DTCs. The rest of the logic calculates \( \rho_{F,i}[n,0] \) and \( \Delta \rho[n]/4 \), which are used by the DSP engines to process (14)–(15).

The implementation complexity is greatly simplified by employing fixed-point signed arithmetic with proper choice of the full-scale signal ranges. Referring to Fig. 4, \( \rho[n] \) requires a range of \([-4\pi, +4\pi]\) in order to account for the worst-case increment of \( \omega_c t + \Phi_F(t) \) during one \( T_s \). On the other hand, all other signals describe the phase during 25% of \( T_s \), and a reduced range \([-\pi, +\pi]\) is sufficient.

In the remainder of this Section, the datapath used to compute \( e_i[n] \) and \( k_i[n] \) within each DSP engine is described in detail.

A. Enable Logic

For each \( i = 0, \ldots, 3 \), the enable condition \( e_i[n] \) determines whether the \( i \)-th DTC generates a toggle event during the \( n \)-th sample period. Therefore, \( e_i[n] \) is true if the piecewise linear term \( \omega_c t + \Phi_F(t) \) crosses any integer multiple of \( \pi \) during the time interval

\[
\left(n - 1 + \frac{i}{4}\right) T_s \leq t < \left(n - 1 + \frac{i + 1}{4}\right) T_s. \tag{16}
\]

The enable conditions are straightforward to calculate. Fig. 5 shows the datapath for \( e_i[n] \) implemented within each DSP engine. A crossing of \( \pi \) happens during (16) if:

- \( \rho_{F,i}[n,0] \bmod \pi = 0 \) (i.e., the crossing is located at the beginning of the section), OR
- the signs of \( \rho_{F,i}[n,0] \) and \( \rho_{F,i+1}[n,0] \) are different, AND \( \rho_{F,i+1}[n,0] \bmod \pi \neq 0 \) (i.e., the crossing is not located at the beginning of the next section).

Note that because of the fixed-point range \([-\pi, +\pi]\), all “mod \( \pi \)” operations correspond to discarding the MSB of the operand and interpreting the result as unsigned, while all \( \text{sgn}(\cdot) \) operations are equivalent to selecting the MSB of the operand.
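The bit-level shortcuts described above can be sketched as follows. This is a behavioral illustration only: the word length `W`, the helper names, and the rounding used in `to_word` are assumptions, not details taken from the chip. A phase is stored as a `W`-bit two's-complement pattern whose full scale spans \([-\pi, +\pi)\), so that overflow naturally wraps the phase by \( 2\pi \).

```python
import math

W = 12  # assumed word length in bits (hypothetical)

def to_word(phase, width=W):
    """Quantize a phase in radians to a width-bit two's-complement bit pattern."""
    scale = 2 ** (width - 1) / math.pi       # pi maps to 2^(width-1)
    return round(phase * scale) % 2 ** width  # modulo models two's-complement wrap

def msb(word, width=W):
    """sgn(.) reduces to reading the most significant bit."""
    return (word >> (width - 1)) & 1

def mod_pi(word, width=W):
    """'mod pi' reduces to dropping the MSB and reading the rest as unsigned."""
    return word & (2 ** (width - 1) - 1)

def enable(word_start, word_next, width=W):
    """Enable condition e_i from the initial phases of sections i and i+1."""
    at_start = mod_pi(word_start, width) == 0
    sign_change = msb(word_start, width) != msb(word_next, width)
    return at_start or (sign_change and mod_pi(word_next, width) != 0)
```

Because both a crossing of 0 and a wrap from \( +\pi \) to \( -\pi \) flip the MSB, a single sign comparison detects a crossing of any integer multiple of \( \pi \), provided that each section contains at most one crossing.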

B. FOH Logic

Fig. 6. (a) Overview of the structure, showing the initial conditions followed by a cascade of stages. (b) Implementation of the $\rho_j$ path inside the $j$-th stage (the $x_j$ path is implemented in a similar fashion).

For each \( i = 0, \ldots, 3 \), the control word \( k_i[n] \) determines the location of the toggle event (if any) generated by the \( i \)-th DTC during (16). In the proposed DSP engine, \( k_i[n] \) is calculated by means of an iterative algorithm with $R - 2$ stages, with pseudocode given below.

\[
\begin{aligned}
& x_0 = 0 \\
& \rho_0 = \rho_{F,i}[n,0] \\
& \text{for } j = 1 \text{ to } R - 2 \text{ do} \\
& \quad \text{if } \rho_{j-1} \bmod \pi = 0 \text{ then} \\
& \quad\quad x_j \leftarrow x_{j-1} \\
& \quad\quad \rho_j \leftarrow \rho_{j-1} \\
& \quad \text{else if } \text{sgn}(\rho_{j-1}) \neq \text{sgn}(\rho_{F,i+1}[n,0]) \text{ then} \\
& \quad\quad x_j \leftarrow x_{j-1} + 2^{R-2-j} \\
& \quad\quad \rho_j \leftarrow \rho_{j-1} + 2^{-j} \cdot \Delta \rho[n]/4 \\
& \quad \text{else} \\
& \quad\quad x_j \leftarrow x_{j-1} - 2^{R-2-j} \\
& \quad\quad \rho_j \leftarrow \rho_{j-1} - 2^{-j} \cdot \Delta \rho[n]/4 \\
& \quad \text{end if} \\
& \text{end for} \\
& k_i[n] = x_{R-2} \\
& \rho_{F,i}[n,k_i[n]] = \rho_{R-2}
\end{aligned}
\]

By analyzing the code, it can be seen that the algorithm starts from the initial phase $\rho_{F,i}[n,0]$, and successively approximates the $\pi$ crossing within a $\pm 2^{-j}$ relative error after the $j$-th iteration, in a binary search fashion. Ultimately, the algorithm converges to $k_i[n]$ with the desired resolution of $2^{R-2}$.
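The binary search can also be written as a short behavioral model in integer arithmetic. The sketch below is an illustration under stated assumptions rather than the implemented datapath: the word length `W`, the names `wrap` and `foh_search`, and the use of a truncating right shift for \( 2^{-j} \cdot \Delta\rho[n]/4 \) are all hypothetical choices. Phases are integer codes in which `HALF` represents \( \pi \).

```python
W = 16                 # assumed word length: W-bit two's complement spans [-pi, +pi)
HALF = 1 << (W - 1)    # integer code representing pi

def wrap(v):
    """Two's-complement overflow: fold an integer phase code into [-HALF, HALF)."""
    return (v + HALF) % (2 * HALF) - HALF

def foh_search(rho_start, quarter_delta, R):
    """Return (k_i, final phase) for one section.

    rho_start     : initial phase rho_{F,i}[n,0] as an integer code
    quarter_delta : delta_rho[n] / 4 as an integer code
    The result is only meaningful when the section contains a crossing.
    """
    rho_next = wrap(rho_start + quarter_delta)   # rho_{F,i+1}[n,0]
    x, rho = 0, wrap(rho_start)
    for j in range(1, R - 1):                    # stages j = 1 .. R-2
        if rho % HALF == 0:                      # exactly on a multiple of pi: hold
            continue
        step = quarter_delta >> j                # 2^-j * delta_rho/4, truncated
        if (rho < 0) != (rho_next < 0):          # crossing still ahead: move forward
            x += 1 << (R - 2 - j)
            rho = wrap(rho + step)
        else:                                    # crossing already passed: move back
            x -= 1 << (R - 2 - j)
            rho = wrap(rho - step)
    return x, rho
```

As a usage example, with $R = 10$ and `quarter_delta = 16384` (a quarter-period advance of $\pi/2$), one DTC step corresponds to 64 codes. Starting 6400 codes below $\pi$, that is `foh_search(HALF - 6400, 16384, 10)`, the search lands exactly on the crossing at $k_i = 100$ and then holds for the remaining stages.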

Fig. 6(a) shows the datapath for $k_i[n]$ implemented within each DSP engine. In order to achieve a throughput of $F_s$, the $R - 2$ algorithm iterations are unrolled and implemented in a cascade of stages. Therefore, the circuit can be pipelined for high-speed operation. Fig. 6(b) shows the path used to update the value of $\rho_j$ within the $j$-th stage. Only elementary arithmetic operations and logic gates are used, and the same observations about "mod $\pi$" and $\text{sgn}(\cdot)$ hold here as for the enable logic. The circuit for the $x_j$ path is similar to Fig. 6(b).

V. EXPERIMENTAL RESULTS

A prototype 5G base-station outphasing transmitter has been fabricated in a 28-nm FDSOI CMOS process [13]. The two signal components $S_{1,2}(t)$ are generated by two DIPMs (Fig. 3), each including the DSP of Fig. 4. Therefore, eight instances of the DSP engine described in Section IV are integrated into the DFE of the chip. The chosen phase resolution $R = 10$ bits requires 8 cascaded stages for the datapath of Fig. 6(a), which are pipelined into 16 levels in order to run at $F_s = 1.5$ GHz. Pipelining is implemented by taking advantage of the retiming feature of the digital synthesis software, which automatically distributes the pipeline registers to optimal locations within the datapath. An FPGA feeds the discrete-time phase samples

\[\rho_{1,2}[n] = \omega_c \cdot nT_s + \Phi_{1,2}(nT_s) \tag{17}\]

for the two modulators through a 48-Gb/s deserializer with eight data lanes. Note that $\omega_c$ in (17) is a variable, and can thus be used to perform upconversion to any carrier frequency. The only constraint is that (15) yield at most one toggle instant for each 25% section of $T_s$.
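As a rough illustration of this constraint, assume that it is satisfied whenever the phase advances by no more than \( \pi \) within each section. This is a sufficient condition derived here from (11), (12), and (14), not a statement taken from the measurements. Under that assumption,

\[
\frac{|\Delta\rho[n]|}{4} \le \pi
\quad\Longleftrightarrow\quad
\left| \omega_c T_s + \Phi_w[n] - \Phi_w[n-1] \right| \le 4\pi .
\]

For an unmodulated carrier at the upper end of the demonstrated range, $f_c = 2.1$ GHz with $F_s = 1.5$ GHz gives $\omega_c T_s = 2\pi \cdot 2.1 / 1.5 = 2.8\pi$, i.e., $0.7\pi$ per section, which leaves a margin of $1.2\pi$ per sample period for the phase modulation itself.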

The chip micrograph is shown in Fig. 7. Besides the DSP block described in this brief, the DFE also includes the data alignment logic for deserialization, as well as some control functionality for the entire chip. The $1.28 \times 0.36$-mm$^2$ DFE is synthesized and place-and-routed from 42.4k standard cells with an area utilization of 9%. The layout is made intentionally large in order to fill up all empty space within the input data pads with supply decoupling cells.

Fig. 8 shows the spectra of the signal at the DFE output. This signal is reconstructed with an ideal system-level model of the RF front-end, in order to evaluate the performance of the DFE alone. In Fig. 8(a), a baseband signal with 100-MHz RF bandwidth, formed by aggregating five 20-MHz OFDM carriers, is upconverted to a carrier frequency $f_c = \omega_c / (2\pi) = F_s = 1.5$ GHz. For comparison, the same signal generated with a system-level model of ZOH digital outphasing modulation is also shown in Fig. 8(a). The ACLR improvement is 12 dB, from $-41$ to $-53$ dBc, or equivalently 93.7% lower in absolute terms. This enhancement is due to the $\text{sinc}^2$ frequency response of FOH, which enables significantly better sampling image attenuation compared to the $\text{sinc}$ response of ZOH. Fig. 8(b) demonstrates flexible upconversion of a 20-MHz OFDM signal to carrier frequencies between 0.35 and 2.1 GHz, performed without changing the 1.5-GHz reference clock. Further measurement results can be found in [13], demonstrating that the circuit also works when part of the full base-station 5G transmitter.

100 MHz is the maximum RF bandwidth specified by the newly released 5G New Radio standard (3GPP TS 38.104) for frequency bands below the mm-wave range. Generating this bandwidth by aggregation of five 20-MHz OFDM carriers (as in 4G) does not change the RF properties of the signal.

Because the DFE and deserializer share the same supply, it is not possible to measure the power consumption of the DSP directly. Nevertheless, according to post-layout simulations, the DSP consumes at most 60% of the total combined power. Since the measured power consumption of DFE+deserializer is 233 mW, and the DSP includes eight engines, we can estimate that a single DSP engine consumes less than 18 mW.
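Spelled out, and taking the 60% share from post-layout simulation as an upper bound, the estimate reads

\[
P_{\text{engine}} \le \frac{0.60 \times 233\ \text{mW}}{8} \approx 17.5\ \text{mW} < 18\ \text{mW}.
\]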

VI. CONCLUSION

We presented the first on-chip DSP engine for FOH digital phase modulation in 28-nm CMOS. In outphasing configuration, the system improves the ACLR of a 100-MHz OFDM signal by 12 dB compared to ZOH phase modulation, and can perform upconversion to any frequency between 0.35 and 2.1 GHz with a fixed 1.5-GHz reference clock. Therefore, the DSP described in this brief is the key enabler of superior wideband performance and high reconfigurability in digital outphasing transmitters, making it a competitive approach for future 5G base stations.

REFERENCES