# **Real-time Implementation of Digital Coherent Detection**

R. Noé, U. Rückert, S. Hoffmann, R. Peveling, T. Pfau, M. El-Darawy, A. Al-Bermani



UNIVERSITÄT PADERBORN: The University for the Information Society

University of Paderborn, Electrical Engineering, Optical Communications and High-Frequency Engineering

# Outline

- Introduction
- Real-time constraints for coherent receiver algorithms
- Angle-based phase estimation for QPSK
- Combination with polarization multiplex
- Integrated DSPU for a PDM-QPSK receiver
- Real-time measurement results
- Conclusion and outlook

# **Coherent QPSK transmission**

- QPSK transports 2 bits per transmitted optical symbol, twice as many as OOK.
- The lower symbol rate enhances chromatic and polarization mode dispersion tolerance (10 Gbaud polarization-multiplexed QPSK = 40 Gbit/s).
- Feedforward receiver concepts are easier to implement with digital signal processing than the classical optical phase-locked loop (OPLL) approach.
- Off-the-shelf, low-cost, small-sized DFB lasers suffice despite their phase noise.
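The 40 Gbit/s figure follows from symbol rate × bits per symbol × number of polarization channels. A minimal sketch of that arithmetic; the function names are hypothetical and only serve the illustration:

```python
def line_rate_gbps(symbol_rate: float, bits_per_symbol: int, polarizations: int = 1) -> float:
    """Line rate in Gbit/s: symbol rate (Gbaud) x bits per symbol x polarization channels."""
    return symbol_rate * bits_per_symbol * polarizations


def symbol_rate_gbaud(line_rate: float, bits_per_symbol: int, polarizations: int = 1) -> float:
    """Symbol rate in Gbaud needed to carry `line_rate` Gbit/s."""
    return line_rate / (bits_per_symbol * polarizations)


print(line_rate_gbps(10.0, 1))         # OOK, one polarization
print(line_rate_gbps(10.0, 2, 2))      # polarization-multiplexed QPSK
print(symbol_rate_gbaud(40.0, 1))      # symbol rate OOK would need for 40 Gbit/s
print(symbol_rate_gbaud(40.0, 2, 2))   # symbol rate PDM-QPSK needs for 40 Gbit/s
```

For the same 40 Gbit/s, PDM-QPSK runs at a quarter of the OOK symbol rate, which is the basis of the dispersion-tolerance argument.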

#### **Coherent optical receiver structure**



Compensation of intermediate frequency and phase noise, polarization crosstalk, PMD, CD, (nonlinear effects,) ... using digital signal processing.

### Internal structure of the DSPU



# **Real-time constraints for receiver DSP algorithms**

- Demultiplexing and parallelization allow the use of standard logic elements at relaxed clock speeds.
- Control algorithms must be robust against delay wherever feedback loops cannot be avoided.
- Efficient hardware is required to enable a high degree of parallelization with moderate area and power consumption.

## **Feasibility of parallel processing**

ADC sampling frequency: 10 GHz to 56 GHz

Demultiplexing into 16 to 128 parallel channels

DSPU clock frequency: 100 MHz to 800 MHz
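These three figures are tied together by one ratio: the demultiplexing factor must be at least the ADC sampling rate divided by the DSPU clock. A sketch of that arithmetic, using as example inputs the 10 GS/s ADC and the 625 MHz standard-cell clock quoted in the chip specifications; the function names are hypothetical:

```python
import math


def channels_needed(f_adc_hz: float, f_clk_hz: float) -> int:
    """Smallest demultiplexing factor so each parallel channel runs at or below f_clk_hz."""
    return math.ceil(f_adc_hz / f_clk_hz)


def channel_clock_hz(f_adc_hz: float, channels: int) -> float:
    """Clock frequency of each parallel channel for a given demultiplexing factor."""
    return f_adc_hz / channels


print(channels_needed(10e9, 625e6))        # 10 GS/s at a 625 MHz DSPU clock
print(channel_clock_hz(56e9, 128) / 1e6)   # 56 GS/s split into 128 channels, in MHz
```

The second case gives 437.5 MHz per channel, inside the 100 MHz to 800 MHz range listed above.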



### **Comparison of FIR and IIR filters**



# **Real-time constraint: Hardware efficiency**

Computationally complex algorithms increase the required chip area, power consumption and cost.

Ways to increase hardware efficiency:

- Signal transformations, e.g. FFT/IFFT: convolution in the time domain becomes multiplication in the frequency domain
- Use of look-up tables
- Optimization of the required precision
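The FFT/IFFT bullet rests on the convolution theorem: a linear convolution equals the inverse FFT of the product of the zero-padded FFTs. A minimal pure-Python sketch (a textbook radix-2 FFT, not the DSPU's hardware architecture) that checks this against direct convolution:

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def fft_convolve(x, h):
    """Linear convolution: zero-pad to a power of two, multiply spectra, transform back."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    return ifft([a * b for a, b in zip(X, H)])[:n_out]


def direct_convolve(x, h):
    """Reference time-domain convolution, about len(x) * len(h) multiplications."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, -1.0, 0.25]
print([round(v.real, 6) for v in fft_convolve(x, h)])
print([round(v.real, 6) for v in direct_convolve(x, h)])
```

Direct convolution of N samples with an L-tap filter costs about N·L multiplications, while the FFT route costs on the order of N log N, which pays off for long filters.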

## **Real-time constraint: Tolerance against feedback delays**

- Digital signal processing for coherent optical receivers requires
  - massive parallel processing,
  - pipelining.
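Why pipelining hurts feedback loops can be illustrated with a toy model: a first-order phase-tracking loop whose error signal arrives `delay` updates late, as it would after several pipeline stages. This is a hypothetical linear model (no noise, no phase wrapping), not the receiver's actual loop:

```python
def track_phase(gain: float, delay: int, steps: int = 2000, phase: float = 1.0) -> float:
    """Final |phase error| of est[k+1] = est[k] + gain * (phase - est[k - delay])."""
    history = [0.0] * (delay + 1)   # est[k - delay], ..., est[k]; all start at 0
    for _ in range(steps):
        err = phase - history[0]     # error based on the estimate from `delay` updates ago
        history.append(history[-1] + gain * err)
        history.pop(0)               # drop the oldest estimate
    return abs(phase - history[-1])


for gain, delay in [(0.5, 0), (0.5, 1), (0.5, 10), (0.02, 10)]:
    print(gain, delay, track_phase(gain, delay))
```

With gain 0.5 the loop settles for a delay of 0 or 1 update but diverges for a delay of 10; lowering the gain to 0.02 restores stability at the cost of slower tracking. For this particular model the largest stable gain is 2 sin(π/(4D+2)) for a delay of D updates, i.e. roughly inversely proportional to the delay.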



### **Decision-directed carrier recovery**



### **Feedforward carrier recovery**

Viterbi & Viterbi Algorithm<sup>[1]</sup>:



[1] R. Noé, IEEE J. Lightwave Technology, Vol. 23, No. 2, Feb. 2005, pp. 802-808
[2] S. Hoffmann et al., IEEE Photon. Technol. Lett., Vol. 21, No. 3, Feb. 2009, pp. 137-139
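The basic fourth-power form of the Viterbi & Viterbi estimator can be sketched as follows. This is the textbook variant without amplitude weighting, applied to one block with a constant carrier phase (an assumption); the receiver described here may use a different variant. The helper names are hypothetical:

```python
import cmath
import random


def vv_phase(block):
    """Basic 4th-power (Viterbi & Viterbi) carrier-phase estimate for one QPSK block.

    QPSK data phases pi/4 + k*pi/2 become pi after raising to the 4th power, so
    -x**4 carries only four times the carrier phase. The estimate lies in
    [-pi/4, pi/4]: a quadrant (pi/2) ambiguity remains.
    """
    return cmath.phase(sum(-(x ** 4) for x in block)) / 4


def remove_phase(block, phi):
    """Rotate every sample back by the estimated carrier phase."""
    return [x * cmath.exp(-1j * phi) for x in block]


def decide(z):
    """Hard QPSK decision to the nearest point +-1 +-j."""
    return complex(1 if z.real > 0 else -1, 1 if z.imag > 0 else -1)


rng = random.Random(0)
qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
true_phase = 0.3  # radians, constant over the block (assumption)
tx = [rng.choice(qpsk) for _ in range(64)]
rx = [s * cmath.exp(1j * true_phase) + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
      for s in tx]

phi = vv_phase(rx)
decisions = [decide(z) for z in remove_phase(rx, phi)]
errors = sum(d != s for d, s in zip(decisions, tx))
print(f"estimated phase: {phi:.3f} rad, symbol errors: {errors}")
```

Because a rotation by π/2 leaves the fourth power unchanged, the estimate cannot tell quadrants apart, which is why the quadrant number is differentially encoded.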

### **Barycenter algorithm**



# **Digital synchronous QPSK receiver scheme**

Differential encoding of data quadrant number  $n_d$  in TX



Functionally identical to the analog scheme


 $\underline{X} \sim \underline{E}_{TX} \underline{E}_{LO}^*$ 





### Detection and correction of quadrant phase jump



- Data bits $d_1, d_2 \Leftrightarrow$ quadrant number $n_d$
- Differential encoding of the quadrant number in the transmitter: $n_c(i) = (n_d(i) + n_c(i-1)) \bmod 4$
- Differential decoding of the quadrant number in the receiver, taking phase jumps into account



| $d_1$, Re $\underline{c}$, $o_1$ | $d_2$, Im $\underline{c}$, $o_2$ | $n_d$, $n_c$, $n_o$ |
|--------------------------------------------------------------------|------------------------------------|-----------------|
| 1                                                                  | 1                                  | 0               |
| -1                                                                 | 1                                  | 1               |
| -1                                                                 | -1                                 | 2               |
| 1                                                                  | -1                                 | 3               |
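A minimal sketch of differential encoding and plain differential decoding with the quadrant mapping from the table. It assumes the decoder simply takes $(n_o(i) - n_o(i-1)) \bmod 4$; the jump detection and correction performed by the scheme above is not modeled. Names are hypothetical:

```python
# Quadrant number for the bit pair (d1, d2), as in the table (Gray-coded:
# neighbouring quadrants differ in one bit).
QUADRANT = {(1, 1): 0, (-1, 1): 1, (-1, -1): 2, (1, -1): 3}
BITS = {n: bits for bits, n in QUADRANT.items()}


def diff_encode(nd, nc_prev=0):
    """Transmitter: n_c(i) = (n_d(i) + n_c(i-1)) mod 4."""
    nc = []
    for n in nd:
        nc_prev = (n + nc_prev) % 4
        nc.append(nc_prev)
    return nc


def diff_decode(no, no_prev=0):
    """Receiver (plain): n_d(i) = (n_o(i) - n_o(i-1)) mod 4."""
    nd = []
    for n in no:
        nd.append((n - no_prev) % 4)
        no_prev = n
    return nd


nd = [QUADRANT[b] for b in [(1, 1), (-1, 1), (-1, -1), (1, -1), (-1, 1), (1, 1)]]
nc = diff_encode(nd)
# Channel: the carrier-phase estimate slips by one quadrant from symbol 3 on.
no = [(c + 1) % 4 if i >= 3 else c for i, c in enumerate(nc)]
decoded = diff_decode(no)
print([BITS[n] for n in decoded])
```

A single quadrant jump corrupts only the symbol at which it occurs; all later symbols decode correctly because only phase differences are evaluated.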

# **Electronic polarization control**







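- Let $\langle \mathbf{Q} \rangle$ be a perfect estimate of $\mathbf{MJ}$ ($\mathbf{J}$: fiber Jones matrix, $\mathbf{M}$: receiver polarization control matrix) $\Rightarrow \mathbf{M} := \langle \mathbf{Q} \rangle^{-1} \mathbf{M} = (\mathbf{MJ})^{-1} \mathbf{M} = \mathbf{J}^{-1}$
- $\langle \mathbf{Q} \rangle \rightarrow \mathbf{1} \Rightarrow \langle \mathbf{Q} \rangle^{-1} = (\mathbf{1} - (\mathbf{1} - \langle \mathbf{Q} \rangle))^{-1} \approx \mathbf{1} + (\mathbf{1} - \langle \mathbf{Q} \rangle)$
- Let $\mathbf{A}$ be a data vector normalized so that $\langle \mathbf{A}\mathbf{A}^+ \rangle = \mathbf{1}$ $\Rightarrow \langle \mathbf{N} \rangle = \langle (\mathbf{NA})\mathbf{A}^+ \rangle$ (for unnormalized symbols $\pm 1 \pm j$, $\langle \mathbf{A}\mathbf{A}^+ \rangle = 2 \cdot \mathbf{1}$, hence the factor $1/2$ in $\mathbf{Q}(i)$)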

$$\mathbf{Q}(i) = (1/2) \cdot \mathbf{X}(i-N) \cdot e^{-j\varphi(i)} \cdot \mathbf{r}(i)^{+}$$

 $\mathbf{M} := (\mathbf{1} + g(\mathbf{1} - \mathbf{Q}))\mathbf{M}$ 

- A gain $g \ge 10^{-3}$ yields sufficiently accurate matrix elements and a control time constant of at most about $10^3$ cycles.
- At 10 Gbaud, this allows control time constants of 100 ns or less.
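The update $\mathbf{M} := (\mathbf{1} + g(\mathbf{1} - \mathbf{Q}))\mathbf{M}$ can be simulated in a few lines. This sketch rests on stated assumptions: $\mathbf{X}$ is taken as the output of the polarization control, $\mathbf{X} = \mathbf{MJA}$ (so that $\langle \mathbf{Q} \rangle$ estimates $\mathbf{MJ}$); known transmitted data replace the decisions $\mathbf{r}(i)$; $\varphi = 0$, $N = 0$, and there is no noise. The Jones matrix and all function names are hypothetical:

```python
import cmath
import math
import random

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
QPSK = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def matvec(a, v):
    """2x2 matrix times a 2-element column vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def adapt(jones, g=1e-3, steps=20000, seed=1):
    """Iterate M := (1 + g(1 - Q)) M with Q = (1/2) X A^+ and X = M J A."""
    rng = random.Random(seed)
    m = [row[:] for row in IDENTITY]
    for _ in range(steps):
        a = [rng.choice(QPSK), rng.choice(QPSK)]   # data vector A, both polarizations
        x = matvec(matmul(m, jones), a)             # polarization-controlled output X
        q = [[0.5 * x[i] * a[j].conjugate() for j in range(2)] for i in range(2)]
        update = [[IDENTITY[i][j] + g * (IDENTITY[i][j] - q[i][j]) for j in range(2)]
                  for i in range(2)]
        m = matmul(update, m)                       # left-multiply, as in the update rule
    return m


# Hypothetical fiber Jones matrix: a rotation combined with a retardation.
c, s, alpha = math.cos(0.7), math.sin(0.7), 0.3
J = [[c * cmath.exp(1j * alpha), -s * cmath.exp(-1j * alpha)],
     [s * cmath.exp(1j * alpha), c * cmath.exp(-1j * alpha)]]
M = adapt(J)
residual = max(abs(v - w) for row, irow in zip(matmul(M, J), IDENTITY)
               for v, w in zip(row, irow))
print(f"max |MJ - 1| = {residual:.3f}")
```

With $g = 10^{-3}$ the average dynamics have a time constant of about $1/g = 10^3$ updates; the remaining jitter of the matrix elements shrinks as $g$ is reduced, which is the accuracy/speed trade-off behind the choice of $g$.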

## **Decision-directed polarization control**



R. Noé, IEEE Photon. Technol. Lett., Vol. 17, No. 4, April 2005, pp. 887-889

# DSP components for real-time synchronous QPSK transmission

### Single-chip system



- Highest integration
  - $\rightarrow$  small footprint
- Simple interfacing
- Common technology for ADCs and DSPU
  - $\rightarrow$  suboptimal performance

### Multi-chip system

- Optimum performance
- Possibility to use commercial ADCs
- Complex interfacing
- Increased footprint

# **Chip Specifications**

| SiGe ADC                   |                |
|----------------------------|----------------|
| Technology                 | 0.25 µm SiGe   |
| Resolution                 | 5 bit          |
| Number of transistors      | 3378           |
| Chip size                  | 5.4 mm²        |
| Supply voltages            | -4 V, 1.8 V    |
| Measured power consumption | 2.7 W          |
| Measured full-scale range  | 410 mV         |
| Measured DNL               | < ±0.45 LSB    |
| Measured INL               | < ±0.35 LSB    |
| Sampling frequency         | > 10 GHz       |
| Measured input bandwidth   | > 5 GHz        |

| CMOS ASIC, standard-cell design |          |
|---------------------------------|----------|
| Gates                           | 306,963  |
| Std. cells                      | 121,576  |
| Max. frequency                  | 625 MHz  |
| Supply voltage                  | 1.2 V    |
| Power consumption               | 0.5 W    |

| CMOS ASIC, full-custom design |              |
|-------------------------------|--------------|
| Devices                       | 11,838       |
| Max. frequency                | 10 GHz       |
| Supply voltages               | 1.8 V, 1.2 V |
| Power consumption             | 1.5 W        |

| CMOS ASIC, combined standard-cell and full-custom designs |                  |
|-----------------------------------------------------------|------------------|
| Technology                                                | 130 nm bulk CMOS |
| Size                                                      | 15.737 mm²       |
| Pads                                                      | 146              |
| Supply voltages                                           | 1.2 V, 1.8 V     |
| Power                                                     | 2 W              |

### 5-bit 10-GS/s analog-to-digital converter



| technology                            | 0.25 µm SiGe:C BiCMOS  |
|---------------------------------------|------------------------|
| resolution                            | 5 bit                  |
| number of transistors                 | 3378                   |
| chip size                             | 5.4 mm²                |
| supply voltages                       | -4 V, +1.8 V           |
| measured power consumption            | 2.7 W                  |
| measured full-scale range ($V_{FSR}$) | 410 mV                 |
| measured DNL                          | < ±0.45 LSB            |
| measured INL                          | < ±0.35 LSB            |
| sampling frequency                    | > 10 GHz               |
| measured input bandwidth              | > 5 GHz (10 GSymbol/s) |
| measured SNR                          | up to 30 dB            |
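Two standard formulas relate the entries of this table: the LSB size is the full-scale range divided by $2^N$, and the quantization-limited SNR of an ideal $N$-bit converter for a full-scale sine is about $6.02 N + 1.76$ dB. A sketch, assuming $V_{FSR}$ denotes the full peak-to-peak input range; the function names are hypothetical:

```python
def lsb_mv(v_fsr_mv: float, bits: int) -> float:
    """Size of one LSB in mV for a converter with full-scale range v_fsr_mv."""
    return v_fsr_mv / 2 ** bits


def ideal_snr_db(bits: int) -> float:
    """Quantization-limited SNR of an ideal N-bit ADC for a full-scale sine wave."""
    return 6.02 * bits + 1.76


print(lsb_mv(410.0, 5))            # LSB size of the 5-bit, 410 mV converter, in mV
print(round(ideal_snr_db(5), 2))   # ideal 5-bit SNR, in dB
```

This gives an LSB of about 12.8 mV and an ideal SNR of about 31.9 dB, so the measured SNR of up to 30 dB lies within roughly 2 dB of the 5-bit quantization limit.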


O. Adamczyk et al., Electron. Lett., Vol. 44, No. 15, July 2008, pp. 895-896

# **Digital signal processing unit**





|                          | Full-custom     | Standard-cell | ASIC            |
|--------------------------|-----------------|---------------|-----------------|
| Complexity [transistors] | 11,838          | 1,216,000     | 1,227,838       |
| Area [mm²]               | 5.952           | 5.34          | 15.737          |
| Frequency [MHz]          | 5,000 half-rate | 625           | 5,000 half-rate |
| Power Supply [V]         | 1.8             | 1.2           | 1.8, 1.2        |


## **Components**



5-bit 10-GS/s flash A/D converter chip: 2.1 mm × 2.55 mm, 0.25 µm SiGe

CMOS ASIC: 4.1 mm × 4.1 mm, 130 nm bulk CMOS

Co-packaged module: ceramic substrate, 8.5 cm × 6.0 cm

# 10 Gb/s polarization-multiplexed QPSK transmission



- no x-talk: the SOP is manually adjusted so that polarization crosstalk is minimized $\rightarrow$ switching noise is minimized.
- x-talk: the SOP is manually adjusted so that polarization crosstalk is maximized $\rightarrow$ switching noise is maximized.
- 50 rad/s: the SOP is scrambled at 50 rad/s on the Poincaré sphere $\rightarrow$ switching noise is the average of the best and worst cases.

### **Experimental transmission setup**



# Measurement results – fast polarization changes & receiver sensitivity penalty



## **Measurement results – polarization-dependent loss**








# Conclusion

General real-time requirements for the DSPU:

Parallelization, delay tolerance, hardware efficiency

- Angle-based phase recovery concept (barycenter): simple, linewidth-tolerant
- Polarization diversity with automatic polarization demultiplex
- Real-time coherent receiver implementation: SiGe ADC, CMOS DSPU
- Test results: 10 Gb/s, polarization changes at 40 krad/s

### Acknowledgement

European Commission FP6 contract 004631 http://ont.upb.de/synQPSK



**synQPSK** partners: Univ. Paderborn, Germany; CeLight, Israel; Photline, France; IPAG, Germany; Univ. Duisburg-Essen, Germany

# Thank you for your attention!