Enabling the Next Generation of Cloud & AI Using 800Gb/s Optical Modules
1. Cloud Expansion Sets Pace for Optical Modules
Cloud applications, AR/VR, AI, and 5G generate ever more traffic, and this explosive growth requires higher bandwidth. As shown in Figure 1, global interconnection bandwidth capacity is projected to grow at a 48% CAGR from 2017 to 2021.
Figure 1 – Worldwide growth of interconnection bandwidth capacity (Tbps), 2017–2021, by region (US, EU, AP, LATAM)
As shown in Figure 2, market analysts project initial adoption of 400G datacom modules in 2020, with larger-scale adoption of 2x400G/800G modules in 2022-23.
Figure 2 – Projection of the market revenue for datacom modules: sales ($M) per year, 2020–2024, by module class (100G, 200G, 400G, 2x400G). (Source: LightCounting)
www.800Gmsa.com
“Our LightCounting Forecast model indicates that operators of Cloud datacenters will need to deploy 800G optics by 2023-2024
to keep up with the growth of data traffic,” stated founder and CEO of LightCounting Market Research, Vladimir Kozlov, PhD.
“Most of 800G will be still pluggable transceivers, but we expect to see some implementation of co-packaged optics as well.”
Data center cloud architectures are being paced by the capacity scaling of switching ASICs, which is doubling approximately every two years, unfazed by talk about the end of Moore's Law. Today, 12.8Tb/s Ethernet switching chips are being commercially deployed, with the first chip design firms already prototyping 25.6Tb/s silicon for deployment next year, as shown in Figure 3. This puts further pressure on the densification of optical interconnects, which do not scale at the speed of CMOS due to the lack of a common design methodology across the various components and of a common large-scale process.
In the past few years, the rapid expansion of cloud services was fueled by the rapid adoption and price erosion of 100G short-reach optical modules based on direct detection and non-return-to-zero (NRZ) modulation. After the 400GbE Bandwidth Assessment activity began in IEEE in March 2011, initial deployment of 400G optical modules is finally
starting in 2020 with a stronger ramp projected for 2021, as shown in Figure 2.
In fact, in the initial use cases, 400G modules will mainly be used to transport 4x100G over 500m in the DR4 application and 2x200G FR4 optics over 2km, without making use of the 400GbE MAC. At the same time, it seems unlikely that
IEEE would soon standardize the next generation of optics, such as 800GbE,
meaning that the standardization of higher density optics for the transport of
8x100GbE or 2x400GbE for the 25.6Tb/s and 51.2Tb/s switching generations
would be well behind actual market timeline requirements of 2021-22. This
raises the need for 800G industry interoperability outside of the established
standard bodies.
Figure 3 – Evolution of Ethernet switching chip capacity and the corresponding optical module generations:

Switch capacity | SerDes lanes    | Module generation      | Faceplate density
1.28T           | 128 x 10G       | 40G QSFP+              | 32x@1U
3.2T            | 128 x 25G       | 100G QSFP28            | 32x@1U
6.4T            | 256 x 25G       | 100G QSFP28            | 64x@2U
12.8T           | 256 x 50G       | 400G QSFP56-DD & OSFP  | 32x@1U
25.6T/51.2T     | 256/512 x 100G  | 800G QSFP112-DD & OSFP | 32x@1U / 64x@2U
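The capacity ladder of Figure 3 is simple arithmetic: switch capacity equals SerDes lane count times lane rate, and the faceplate module count follows from dividing by the module rate. A minimal sketch (the function names are ours, for illustration only):

```python
# Back-of-the-envelope check of switch capacity vs. SerDes lanes and module count.
# Values taken from the switch generations discussed in the text.

def switch_capacity_tbps(lanes: int, serdes_gbps: int) -> float:
    """Aggregate switch capacity = number of SerDes lanes x lane rate."""
    return lanes * serdes_gbps / 1000.0

def modules_needed(capacity_tbps: float, module_gbps: int) -> int:
    """Faceplate modules required to expose the full switch capacity."""
    return int(capacity_tbps * 1000 // module_gbps)

# 12.8T generation: 256 lanes of 50G SerDes, exposed as 32 x 400G modules in 1U.
assert switch_capacity_tbps(256, 50) == 12.8
assert modules_needed(12.8, 400) == 32

# 25.6T generation: 256 lanes of 100G SerDes -> 32 x 800G modules still fit in 1U.
assert switch_capacity_tbps(256, 100) == 25.6
assert modules_needed(25.6, 800) == 32
```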
www.800Gmsa.com 02
Enabling The Next Generation Of
Cloud & Ai Using 800Gb/s Optical Modules
At least two main types of typical data center architectures can be distinguished. Figure 4 shows the common abstraction of a hyper-scale data center and its optical interconnect roadmap. In general, these architectures are larger, have a certain convergence from layer to layer, e.g. 3:1, and rely on coherent ZR interconnects at the Spine layer. An important boundary constraint for 800G
networking in this case is that 200G interconnects, albeit not serial, are used at the server to TOR layer, whereas the TOR-leaf/
spine layer would typically rely on PSM4 4x200G in a fan-out configuration.
Figure 4 – Typical hyper-scale data center (Server–TOR–Leaf–Spine) and its optical module evolution at the Leaf–TOR layer: 40G QSFP+ SR4 (2012), 100G QSFP28 SR4/PSM4 (2016), 400G QSFP-DD SR8/DR4 (2019), 800G PSM8/4 (2022)
For the typical hyper-scale data center network (DCN), deploying 200G servers will require an 800G fabric. This is a traffic-convergence network, whose convergence ratio reflects the balance between service requirements and Capex optimization. Table 1 shows the
detailed reach requirements depending on the DCN layer.
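The relationship between server rate, convergence ratio, and fabric rate can be sketched as follows (the 24-servers-per-rack figure is an assumption for illustration; the 200G server rate and 3:1 convergence ratio come from the text):

```python
# Illustrative TOR uplink bandwidth check for a converged (oversubscribed) fabric.
# The 24-servers-per-rack figure is a hypothetical example, not from the text.

def uplink_bandwidth_gbps(servers: int, server_gbps: int, convergence: float) -> float:
    """Required uplink bandwidth = total downlink bandwidth / convergence ratio."""
    return servers * server_gbps / convergence

uplink = uplink_bandwidth_gbps(24, 200, 3.0)
print(uplink)        # 1600.0 Gb/s of uplinks per TOR
print(uplink / 800)  # i.e. two 800G modules
```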
Figure 5 shows the data center network architecture of an AI cluster, which has fewer layers than the hyper-scale network because it lacks any convergence between layers. The design of an AI cloud implies different traffic flows, with much larger big-data flows and less frequent switching.
Figure 5 – AI/HPC cluster data center network (Server–Leaf–Spine) and its optical module rate evolution: Leaf–Server 2x200GE (2019) to 2x400GE (2021); Spine–Leaf 400G PSM4 (2019) to 800G PSM8 (2021)
For the AI/HPC cluster DCN, deploying 400G servers will require an 800G fabric. This DCN has no traffic convergence, and its deployment pace is faster than in the case of Figure 4. Table 2 shows the detailed requirements.
Table 2 – Distance requirements in the AI/HPC cluster: 4m within rack and 20m cross-rack at the Leaf–Server layer; 500m at the Spine–Leaf layer
Not explicitly shown, but also relevant, are DC networks for smaller clouds or enterprises, where the downstream to the server is decoupled from the fan-out rates of the Leaf-Spine layer and server interconnect speeds are typically slower.
Figure 6 – 800G module logic architecture for 100G/lane: two 400GBASE-R stacks (Reconciliation, 400GMII, 400GBASE-R PCS, PMA) interface the module over 8x100G lanes, with a module DSP and PMA feeding the PMDs; the 800G MSA defines the PMA and PMD layers
Table 3 – Fiber channel bandwidth and transmission distance of MMF reckoned by the theoretical model used in IEEE

Bit rate | Signal type | Fiber type | Fiber channel bandwidth (GHz·km) | Transmission distance (m) | IEEE standards
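To first order, the bandwidth–distance trade-off behind Table 3 is that reach is limited by the fiber's effective modal bandwidth divided by the bandwidth the signal needs. The sketch below is a deliberate simplification of the IEEE channel model, which also accounts for chromatic dispersion and additional penalties; the 35 GHz figure is an illustrative assumption:

```python
# First-order MMF reach estimate: reach limited by effective modal bandwidth.
# This deliberately simplifies the IEEE channel model behind Table 3, which
# also accounts for chromatic dispersion and additional penalties.

def mmf_reach_m(modal_bw_ghz_km: float, required_bw_ghz: float) -> float:
    """Distance (m) at which the fiber bandwidth falls to what the signal needs."""
    return modal_bw_ghz_km / required_bw_ghz * 1000.0

# OM4 fiber (4.7 GHz*km effective modal bandwidth) with a signal needing
# ~35 GHz of channel bandwidth (illustrative figure for a ~53 GBd PAM4 lane):
print(round(mmf_reach_m(4.7, 35.0)))  # ~134 m
```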
To preserve the cost and power-consumption advantages of the SMF-based solution, reasonable PMD standard requirements are indispensable for 800G-SR8. The PMD requirements to be defined should ensure that 1) diverse transmitter technologies, such as DML, EML, and silicon photonics (SiPh), can be applied in this scenario; 2) the full potential of the components can be exploited to achieve the target link performance; and 3) key PMD-layer parameters are relaxed as much as possible while maintaining reliable link performance. Following these three principles, we briefly investigate and discuss the key parameters below.
The power budget of the SMF-based 800G-SR8 solution would be quite similar to that defined in IEEE 400G-SR8. The only open issue is the insertion loss of the newly defined PSM8 SMF connectors. This means the power budget in the SR scenario can be achieved without difficulty using currently mature optical and electronic components and the DSP ASICs used in 400GE optical interconnects. Therefore, apart from specifying the connector for the PSM8 modules, the key issue for defining the PMD parameters in the 800G SR8 scenario is to find suitable values for the optical modulation amplitude (OMA), extinction ratio (ER), and transmitter dispersion eye closure quaternary (TDECQ) of the transmitter, and for the sensitivity of the receiver. To set these parameters appropriately, the bit error ratio (BER) performance of the diverse transmitters is investigated and assessed below.
Figure 7 – (a) EML BER vs. OMA, (b) silicon photonics BER vs. OMA, and (c) DML BER vs. OMA, all measured with commercially available 400G DSP ASICs
Figure 7 shows three BER vs. OMA curves for 100Gbps PAM4 signals, each corresponding to a different transmitter technology, obtained online using commercially available 400G DSP ASICs. The BER performances of EML and SiPh at 100G per lane, illustrated in Figure 7 (a) and (b), are well-known results, since these two solutions have been discussed extensively over the past few years. Considering the relatively low launch optical power of SiPh transmitters and the sufficient sensitivity of all three solutions, we recommend relaxing the minimum OMA requirement in 800G-SR8 appropriately.
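The steep slope of the curves in Figure 7 follows from the textbook PAM4 error model under additive Gaussian noise. The sketch below is a generic approximation we add for illustration; it is not fitted to the measured data:

```python
import math

# Gray-coded PAM4 BER approximation under additive Gaussian noise:
# SER ~= (3/2) * Q(d/sigma) for equally spaced levels, BER ~= SER / 2,
# where d is the half eye opening and sigma the noise standard deviation.

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pam4_ber(snr_per_eye: float) -> float:
    """BER as a function of d/sigma (linear eye opening over noise)."""
    return 0.75 * q_func(snr_per_eye)

# Increasing OMA scales d linearly for a thermal-noise-limited receiver,
# so a few dB of OMA move the BER by several decades, as in Figure 7:
print(pam4_ber(3.0))   # ~1.0e-3
print(pam4_ber(6.0))   # ~7.4e-10
```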
The BER performance of the DML in Figure 7 (c) shows that its OMA sensitivity is comparable to that of the EML or SiPh, even though the commercial DML used here has lower bandwidth than the EML and SiPh. This result implies that commercial DSP ASICs used in practice have much stronger equalization ability than the reference receiver IEEE defined for 400GE, and can thus support a transmitter with comparatively low bandwidth in achieving the target power budget required by 800G-SR. To release the full potential of the DSP for the 800G SR8 PMD, the reference receiver for compliance testing (i.e. TDECQ) needs to be redefined to match the practical equalization ability of commercial DSPs, i.e. more taps than the currently defined 5 are desired. Meanwhile, considering the relatively low sensitivity requirement in the SR scenario and the power-consumption constraints of the 800G module, a low-complexity DSP mode is recommended for future modules. Another key parameter is ER, which is directly related to power consumption. A lower ER is favored as long as it does not impact the reliability of the link. Based on the above analysis, we believe that a low-cost, low-power SMF-based solution is feasible and promising for the 800G-SR8 scenario.
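As an illustration of why more reference-receiver taps matter, a feed-forward equalizer (FFE) is just an adapted FIR filter. The sketch below trains a generic 5-tap FFE with LMS on a toy ISI channel; it is our illustration, not the IEEE TDECQ reference receiver:

```python
# Minimal FFE sketch: an FIR filter whose taps are adapted with LMS.
# Generic illustration of how taps recover a bandwidth-limited channel;
# this is NOT the IEEE-defined TDECQ reference receiver.

def ffe_equalize(samples, taps):
    """Apply an FIR feed-forward equalizer to a sampled waveform."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, w in enumerate(taps):
            if i - j >= 0:
                acc += w * samples[i - j]
        out.append(acc)
    return out

def lms_train(samples, ideal, n_taps=5, mu=0.01, epochs=200):
    """Adapt tap weights with least-mean-squares toward the ideal symbols."""
    taps = [0.0] * n_taps
    taps[0] = 1.0
    for _ in range(epochs):
        for i in range(len(samples)):
            y = sum(taps[j] * samples[i - j] for j in range(n_taps) if i - j >= 0)
            e = ideal[i] - y
            for j in range(n_taps):
                if i - j >= 0:
                    taps[j] += mu * e * samples[i - j]
    return taps

# A toy ISI channel: each sample leaks 40% of its energy into the next one.
ideal = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0] * 8
rx = [ideal[i] + (0.4 * ideal[i - 1] if i else 0.0) for i in range(len(ideal))]
taps = lms_train(rx, ideal)
eq = ffe_equalize(rx, taps)
# After equalization the samples sit close to the ideal +/-1 levels.
assert max(abs(e - d) for e, d in zip(eq[5:], ideal[5:])) < 0.2
```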
Figure 8 – 800G module logic architecture for 200G/lane: two 400GBASE-R stacks (Reconciliation, 400GMII, 400GBASE-R PCS) with 8x112G electrical lanes, and DSPs and PMAs converting to four 200G optical lanes toward the medium
Since the SNR deteriorates by about 3 dB relative to 100G/lane as the baud rate doubles, a stronger FEC is expected to be necessary to maintain a reasonable receiver sensitivity (~-5dBm) and margin to the error floor. Therefore, as mentioned above, on top of KP4, an additional low-power, low-latency FEC will be implemented as a wrapper in the optical module. The threshold of the new FEC is determined by the link performance and power budget requirements.
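The ~3 dB figure above can be checked directly from the doubled noise bandwidth, assuming a thermal-noise-limited receiver whose noise bandwidth tracks the baud rate:

```python
import math

# Noise-bandwidth argument behind the ~3 dB SNR penalty: for a receiver whose
# noise bandwidth scales with the baud rate, doubling the rate doubles the
# integrated noise power while the signal power stays fixed.

def snr_penalty_db(bandwidth_ratio: float) -> float:
    """SNR loss (dB) when the receiver noise bandwidth grows by this ratio."""
    return 10.0 * math.log10(bandwidth_ratio)

print(round(snr_penalty_db(2.0), 2))  # 3.01 -> the ~3 dB quoted above
```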
Link performance at 200G/lane is presented using simulation and experiment. The parameters of the devices adopted in the link are listed in Table 5. The experimental results show that the receiver sensitivity can reach the target value when the new FEC's threshold is set to 2E-3, as depicted in Figure 9 (a). However, in this experiment, maximum likelihood sequence estimation (MLSE) was required to compensate the excessive inter-symbol interference induced by the channel bandwidth limitation. The dashed line in Figure 9 (a) shows a simulation based on a model that adopts the measured parameters of the devices used in the experiment. Together with the experimental results, the simulations show that the system is limited by the bandwidth of components such as the AD/DA, driver, and E/O modulators. Considering that higher-bandwidth components are expected to become available in the coming years, simulation results using the same system model but with expanded bandwidth are illustrated in Figure 9 (b). They show that the receiver sensitivity at a BER of 2E-3 can meet the above-mentioned requirement with only FFE equalization in the DSP, in accordance with the theoretical expectation.
Based on the above analysis, TDECQ is still suggested for compliance testing in the 800G-FR4 scenario. However, the number of FFE taps in the reference receiver adopted for the TDECQ measurement is expected to be increased to a reasonable value, which needs further discussion. Additionally, it should be noted that if future devices targeting 100Gbaud underperform our expectations, more complicated algorithms (e.g. MLSE) may be used in FR4 scenarios, which implies that a new compliance metrology must be developed.
Figure 9 – (a) 200G/lane experiment and simulation results match well with each other; (b) 200G/lane simulation result: FFE
equalization can meet the requirement of power budget when component bandwidth in link is improved.
Figure 10 – S21 of two packaging solutions (A and B) for the transmitter. The simulation takes the RF line, the wire bonding, and the modulator into consideration; the -3 dB bandwidth of the EML COC is 60GHz.
At the receive side, a high-bandwidth photodiode (PD) with low parasitic capacitance and a high-bandwidth trans-impedance amplifier (TIA) are needed to ensure the bandwidth performance of the receiver. There is no obstacle to realizing these components with state-of-the-art semiconductor technology. As far as we know, several stakeholders in the industry are already investing heavily in developing these components, which are expected to be available within 1-2 years. The connection between the PD and TIA is also critical: parasitic effects in this connection always degrade performance and should therefore be carefully analyzed and optimized.
4.4 Forward error correction (FEC) code for 200G per lane
A stronger FEC with a threshold of 2E-3 is required to achieve the sensitivity requirement of the 200G PAM receiver. Figure 11 illustrates a comparison between a terminated scheme and a concatenated scheme. In the first option, KP4 is terminated and replaced with a new FEC with larger overhead; termination has advantages in NCG and overhead. In the second option, a concatenated scheme keeps KP4 as the outer code and combines it with a new inner code; concatenation has advantages in latency and power consumption and is more suitable for the 800G-FR4 application scenario.
Figure 11 – Terminated vs. concatenated FEC schemes: in the terminated option, the legacy KP4 of the 100G/lane AUI C2M interface is terminated in the module and replaced by a new FEC; in the concatenated option, KP4 is kept end-to-end and combined with a new inner FEC
Serial concatenation of KP4 and an algebraic code, shown in Figure 12, is a straightforward way to achieve the 2E-3 BER threshold while minimizing power consumption and end-to-end latency, since KP4 is not terminated. Noise with bit error rate p_e < 1E-5 introduced in the C2M electrical interface is transparent to the PMA, and the overall performance of the concatenated scheme is not degraded by it, since p_e is much lower than the decoding threshold of KP4. Hamming codes with single-error-correcting capability and BCH codes with double-error-correcting capability are good candidates for the algebraic code in this concatenated scheme; the overhead of both inner-code candidates is ~6%. With a simple soft-in hard-out (SIHO) Chase decoding algorithm using about 64 test patterns, both Hamming and BCH codes can achieve a BER threshold better than 2E-3. The symbol distribution defined in 400GBASE-R is inherently an interleaver and can thus serve as the interleaver πe; an interleaver πo with ~10k bit latency is sufficient to decorrelate the noise introduced in the optical medium.
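As a toy illustration of a single-error-correcting inner code, a Hamming(7,4) encoder/decoder shows the mechanism. The inner codes actually proposed are much longer Hamming/BCH codes with ~6% overhead, decoded with SIHO Chase; this short code only demonstrates syndrome-based correction:

```python
# Toy Hamming(7,4) single-error-correcting code, illustrating the inner-code
# mechanism of the concatenated scheme. A production inner code would be a
# long Hamming/BCH code with ~6% overhead and soft-input Chase decoding.

def hamming74_encode(d):
    """Encode 4 data bits into 7 bits (positions: p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4    # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4    # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4    # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one bit error and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[4] ^= 1                           # inject a single bit error
assert hamming74_decode(code) == data  # the error is corrected
```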
Figure 12 – Block diagram of the concatenated FEC scheme of KP4 and an algebraic code: KP4 encoder → interleaver πe → algebraic code encoder → interleaver πo → PAM-4 mapper → channel (1 + α·D, with electrical noise p_e and AWGN) → equalizer and de-mapper → πo⁻¹ → algebraic code decoder → πe⁻¹ → KP4 decoder, spanning the PCS, PMA, and PMD across the electrical and optical interfaces
The 800G Pluggable MSA targets releasing its first specifications in Q4 2020, with several subcomponents targeted in the MSA already being prototyped and the first 800G modules expected to sample in 2021. With the 400GbE generation ready to roll out in the market, 800G pluggable modules will leverage this new ecosystem and offer higher-density, cost-optimized 100G/lane and 200G/lane interconnects for the next generation of 25.6T and 51.2T switches.
Looking beyond 800G towards 1.6T, the industry is beginning to see the possible limitations of pluggable modules. SerDes for C2M interconnects is unlikely to scale to 200G/lane over classical PCBs, which might require bringing analog electronics and optics closer to the switching ASIC. But whether the path leads to co-packaged optics, on-board optics, or an evolution of pluggables, we believe that the 200G/lane interconnects defined in this MSA will be an essential building block of the 800G and 1.6T interconnect generations.
About Us
The 800G Pluggable MSA group was formed on September 5, 2019 and promotes joint industry exchange and collaboration between data center operators and vendors of infrastructure equipment, optical modules, optoelectronic chips, and connectors. It focuses on the data center network interconnection scenario, aiming to determine the optimal interconnect architecture, define interface specifications for 800G pluggable optical modules, build the ecosystem, and guide the healthy development of the industry.