How to Build the Ultimate Deep Learning Beast in the Post-Moore Era
Over the past few years we have witnessed the enormous effort of the semiconductor industry to tackle the fundamental limits of the post-Moore’s Law era, in which performance gains no longer arrive at the pace of the past few decades. The drive for higher performance per watt has pushed hardware architects to look beyond conventional general-purpose CPUs and GPUs. Several heterogeneous designs, and even analog and neuromorphic accelerators, have emerged as promising approaches, but they are still far from overcoming the major bandwidth and energy bottlenecks that stand in the way of future exascale systems and beyond.
It is in this scenario that photonics is emerging as one of the most promising ways to overcome these challenges. The advent of silicon photonics, which allows cost-effective integration of optical components using CMOS manufacturing, has been one of the major catalysts for chip-scale photonic designs, both in interconnects and in accelerators. In particular, over the past 30 years several proposals for photonic deep learning accelerators have emerged, promising unprecedented levels of energy efficiency and parallelism and in some cases reporting orders-of-magnitude better performance than the most energy-efficient electronic accelerators available today.
More recently, the startup Lightmatter introduced Envise, which it presents as the world’s first general-purpose AI photonic accelerator, promising unprecedented levels of performance and energy per operation: 3x more inferences per second than the Nvidia DGX-A100, with 7x the inferences/s/W, on BERT-Base with the SQuAD dataset.
As astonishing as it may seem, this announcement is just the first step in a photonic revolution that is coming to the AI market. Looking at the patents recently filed by Intel, we can get a real sense of some fundamental breakthroughs that are on the way.
The new Intel AI photonic accelerator
Earlier this month two new Intel patents were published that reveal how the company plans to overtake Nvidia and other competitors in the AI market: by developing its own photonic accelerator and by building a complete heterogeneous system that integrates this accelerator with its Xeon processors.
The first patent describes a method of implementing a 2×2 optical unitary matrix multiplier, which serves as the building block for photonic acceleration of optical neural networks. The underlying approach was first proposed in 1994 by Reck, Zeilinger, Bernstein, and Bertani in their famous article “Experimental Realization of Any Discrete Unitary Operator”. In this scheme, Mach-Zehnder interferometers (MZIs) perform the linear transformation of the data encoded in the laser pulses, while optical attenuators scale the value carried in each waveguide, which together realize a complete multiply-accumulate (MAC) operation.
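To make the 2×2 building block concrete, here is a minimal numerical sketch of an idealized, lossless MZI. It assumes a textbook layout that is not taken from the patent: two 50:50 beam splitters with one internal phase shifter (theta) and one input phase shifter (phi). The function names `mzi`, `apply`, and `attenuate` are illustrative only.

```python
import cmath
import math

def matmul2(a, b):
    """Multiply two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mzi(theta, phi):
    """Transfer matrix of an idealized MZI (assumed layout, see lead-in).

    Light passes, in order: input phase shifter (phi), 50:50 splitter,
    internal phase shifter (theta), second 50:50 splitter.
    """
    s = 1 / math.sqrt(2)
    splitter = [[s, 1j * s], [1j * s, s]]
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]
    return matmul2(splitter, matmul2(inner, matmul2(splitter, outer)))

def apply(t, x):
    """Apply a 2x2 transfer matrix to a pair of optical field amplitudes."""
    return [t[0][0] * x[0] + t[0][1] * x[1],
            t[1][0] * x[0] + t[1][1] * x[1]]

def attenuate(x, gains):
    """Scale each waveguide amplitude by a gain between 0 and 1."""
    return [g * v for g, v in zip(gains, x)]

# Light enters the top waveguide only; theta sets how the power is split.
theta, phi = 0.7, 1.3
out = apply(mzi(theta, phi), [1 + 0j, 0j])
power = [abs(v) ** 2 for v in out]
print(power, sum(power))
```

In this idealized model the total output power always equals the input power, and the fraction leaving the top port is sin²(theta/2), so tuning theta steers the light between the two outputs. The attenuators are the only element that changes the total power.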
Through this method it is possible to factor any N×N unitary matrix into a sequence of two-dimensional beam splitter transformations. This factorization is the foundation of both the photonic accelerator proposed by Intel and the photonic processor revealed by Lightmatter. In both cases the photonic elements are organized in highly programmable meshes so that together they perform MAC operations.
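The factorization can be illustrated numerically. The sketch below is not the exact mesh ordering from the 1994 paper or from either patent. It is a QR-style elimination, chosen here for simplicity, that uses only 2×2 unitaries acting on adjacent rows (the role an MZI plays between neighboring waveguides) and leaves a diagonal of pure phases. The helper names and the choice of a 4×4 discrete Fourier matrix as the test unitary are assumptions made for the example.

```python
import cmath
import math

def dft_matrix(n):
    """Unitary n x n discrete Fourier matrix, used here only as test input."""
    w = cmath.exp(-2j * math.pi / n)
    return [[w ** (r * c) / math.sqrt(n) for c in range(n)] for r in range(n)]

def decompose(u):
    """Reduce a unitary to a diagonal of phases using adjacent 2x2 unitaries.

    Returns (rotations, phases), where each rotation is (k, g) and g acts
    on rows k and k+1. Always yields n*(n-1)/2 rotations.
    """
    n = len(u)
    a = [row[:] for row in u]
    rotations = []
    for col in range(n - 1):
        for row in range(n - 1, col, -1):
            p, q = a[row - 1][col], a[row][col]
            r = math.sqrt(abs(p) ** 2 + abs(q) ** 2)
            if r < 1e-15:
                g = [[1, 0], [0, 1]]  # nothing to null; keep the count fixed
            else:
                # 2x2 unitary that maps the pair (p, q) to (r, 0)
                g = [[p.conjugate() / r, q.conjugate() / r],
                     [-q / r, p / r]]
            top, bottom = a[row - 1], a[row]
            a[row - 1] = [g[0][0] * x + g[0][1] * y for x, y in zip(top, bottom)]
            a[row] = [g[1][0] * x + g[1][1] * y for x, y in zip(top, bottom)]
            rotations.append((row - 1, g))
    # A triangular unitary is diagonal, so only unit-modulus phases remain.
    phases = [a[i][i] for i in range(n)]
    return rotations, phases

def reconstruct(rotations, phases):
    """Rebuild the unitary by undoing the rotations in reverse order."""
    n = len(phases)
    a = [[phases[i] if i == j else 0j for j in range(n)] for i in range(n)]
    for k, g in reversed(rotations):
        top, bottom = a[k], a[k + 1]
        # Multiply rows k, k+1 by the conjugate transpose of g
        a[k] = [g[0][0].conjugate() * x + g[1][0].conjugate() * y
                for x, y in zip(top, bottom)]
        a[k + 1] = [g[0][1].conjugate() * x + g[1][1].conjugate() * y
                    for x, y in zip(top, bottom)]
    return a

u = dft_matrix(4)
rotations, phases = decompose(u)
rebuilt = reconstruct(rotations, phases)
error = max(abs(rebuilt[i][j] - u[i][j]) for i in range(4) for j in range(4))
print(len(rotations), error)
```

For an N×N unitary this procedure uses N(N-1)/2 two-dimensional transformations, which is the number of programmable 2×2 elements a triangular mesh of this kind needs.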
The second patent shows several ways of implementing this photonic accelerator for different applications, but the one that deserves special attention here is the proposal to integrate it with a Xeon processor. The proposed heterogeneous accelerator has three parts. The first is the photonic IC (PIC), which includes an optical neural network made of multiple layers of optical unitary matrix multipliers and an optical nonlinearity function implemented with nonlinear optical devices. The second is an ASIC (EIC), responsible for substantially all of the pre- and post-processing of data passed between the PIC die and the CPU. The third is the CPU + SRAM combination, whose memory can be a chiplet, high-efficiency angular division multiplexing (ADM) memory chiplets, or a RAMBO memory chiplet. The patent also shows that the EIC can be stacked vertically above or below the PIC and coupled to it via wireless or EMIB interconnection.
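The division of labor between the EIC and the PIC can be pictured as a simple data flow. The sketch below is only a toy model under stated assumptions: the function names are hypothetical, the nonlinearity is an arbitrary saturating placeholder (the article does not say which device response the patent uses), and the readout is modeled as ideal photodetection of optical intensity.

```python
import math

def matvec(m, x):
    """Multiply a square complex matrix by a vector of field amplitudes."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]

def eic_encode(values):
    """Pre-processing (hypothetical): map real inputs to optical amplitudes."""
    return [complex(v) for v in values]

def placeholder_nonlinearity(field):
    """Arbitrary saturating response, standing in for a nonlinear device."""
    return field / (1 + abs(field) ** 2)

def pic_forward(layers, fields):
    """Photonic part: each layer is a unitary multiply plus a nonlinearity."""
    for unitary in layers:
        fields = [placeholder_nonlinearity(f) for f in matvec(unitary, fields)]
    return fields

def eic_decode(fields):
    """Post-processing (hypothetical): photodetectors report intensity."""
    return [abs(f) ** 2 for f in fields]

# Two-layer toy network built from ideal 50:50 splitters as the unitaries.
s = 1 / math.sqrt(2)
splitter = [[s, 1j * s], [1j * s, s]]
outputs = eic_decode(pic_forward([splitter, splitter], eic_encode([1.0, 0.0])))
print(outputs)
```

In this picture the matrix multiplications happen in the optical domain, and the electronics only handle encoding and readout at the boundary of the photonic die.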
The watershed of the post-Moore’s Law era
Perhaps the reader is wondering how powerful this AI photonic accelerator proposed by Intel can be and how Nvidia and AMD can compete against it. What Intel is proposing in its patents is a complete watershed, an unprecedented breakthrough in the development of AI accelerators. Even though certain technical details of the implementation are not yet confirmed, the principles on which this accelerator is built suggest that we can expect at least the same performance as the Lightmatter photonic accelerator in every respect. In fact, when we consider the integration of this photonic accelerator into the Xeon platform together with Intel’s entire AI software development effort, we can expect better performance and software integration than the Lightmatter solution will be able to offer. Based on my research in multiple patent databases, it is possible to state that neither Nvidia nor AMD has any solution even minimally capable of competing directly against this photonic behemoth.
By integrating this photonic accelerator into its processors, Intel may be able to develop the Ultimate Deep Learning beast for the Post-Moore Era, capable of achieving unprecedented levels of performance and energy efficiency. At this point we are left wondering how Intel’s competitors will react once this monstrous device is released, who among them will try to develop their own photonic solution, and perhaps most importantly, who will try to acquire Lightmatter first.
Editor’s note: Nvidia’s chief scientist Bill Dally recently revealed a photonics DGX AI system prototype currently in development at the company. Details are limited, but it appears to focus mostly on signaling and interconnects. Source: YouTube
Some references and reading recommendations:
- Lightmatter press release – Introducing Envise, Idiom and Passage – Next Generation AI Compute, Compile and Interconnect Platforms
- US20210063645 – 2×2 optical unitary matrix multiplier – Wenhua Lin – Intel
- US20210064958 – Heterogeneously integrated optical neural network accelerator – Lin et al. – Intel
- Michael Reck, Anton Zeilinger, Herbert J. Bernstein, and Philip Bertani – “Experimental realization of any discrete unitary operator” – Phys. Rev. Lett. 73, 58 (1994)