IBM has developed an AI chip with internal storage, and it’s an order of magnitude faster than Nvidia accelerators

IBM's AI chip is faster than Nvidia's

IBM said it has completed testing of a new prototype processor for artificial intelligence tasks. The chip, codenamed NorthPole, is 4,000 times faster than the company’s previous AI architecture, TrueNorth, and “mind-blowingly” outperformed the most advanced CPUs and GPUs.

The NorthPole chip is manufactured on a 12nm process and packs 22 billion transistors into an area of 800 mm². It is essentially a “network on a chip”: the processor contains 256 cores connected by an on-chip network, along with on-chip memory. That on-chip memory is what allows the processor to achieve industry-leading energy efficiency, latency and area efficiency.

In a single clock cycle, each NorthPole core performs 2,048 operations at 8-bit precision. At 4-bit and 2-bit precision, the number of operations is doubled and quadrupled, respectively. This capability is aimed primarily at image processing, and more specifically at machine vision for applications such as autonomous driving and robotic surgery.
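To put those per-core figures in perspective, here is a rough back-of-the-envelope sketch that multiplies them by the 256 cores mentioned above. It assumes every core is fully busy on every cycle, which is a theoretical peak rather than a measured result; the function and constant names are illustrative and do not come from IBM.

```python
# Figures quoted in the article.
CORES = 256
OPS_PER_CORE_PER_CYCLE_8BIT = 2048

# Throughput multiplier relative to 8-bit precision, as stated in the article.
PRECISION_MULTIPLIER = {8: 1, 4: 2, 2: 4}


def ops_per_cycle(bits: int) -> int:
    """Theoretical peak operations per clock cycle for the whole chip.

    Assumption: all cores are fully utilised on every cycle.
    """
    return CORES * OPS_PER_CORE_PER_CYCLE_8BIT * PRECISION_MULTIPLIER[bits]


for bits in (8, 4, 2):
    print(f"{bits}-bit: {ops_per_cycle(bits):,} operations per cycle")
# 8-bit: 524,288 operations per cycle
# 4-bit: 1,048,576 operations per cycle
# 2-bit: 2,097,152 operations per cycle
```

Turning these numbers into operations per second would require the chip's clock frequency, which the article does not state.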

The bottleneck of the von Neumann architecture has always been the separation of memory and processor, which forces data to shuttle back and forth between the two. IBM's developers sidestepped this obstacle by building a processor that keeps all of its data on the chip itself rather than sending it to external memory.

Testing on ResNet50, a 50-layer neural network commonly used as a benchmark for image recognition and classification, showed that the NorthPole chip’s energy efficiency is 25 times better than that of conventional 12nm GPUs and 14nm CPUs. Latency on IBM’s chip was also 22 times lower. The developers called this a “mind-blowing” result. Finally, in terms of chip area used (measured by number of transistors), IBM’s architecture also outperformed all competitors, including even 4nm GPUs.
