Your device's processor performs billions of calculations every second and is responsible for running your computer. Inside the CPU is the Arithmetic Logic Unit (ALU), which is responsible for mathematical tasks and is controlled by CPU microcode.
The set of instructions a CPU understands is not static and can be extended, and one such extension is Intel's AVX-512 instruction set. However, Intel intends to kill AVX-512 by permanently removing its functionality from its latest consumer processors. But why? Why is Intel killing AVX-512?
How does the ALU work?
Before getting to know the AVX-512 instruction set, it is important to understand how the ALU works.
As the name suggests, the ALU performs arithmetic and logic operations. These tasks include operations such as addition, multiplication, and floating-point calculations. To perform them, the ALU uses a dedicated digital circuit that is driven by the CPU clock.
Therefore, the CPU clock rate sets the pace at which the ALU processes instructions. So, if your CPU is running at 5 GHz, it completes 5 billion clock cycles every second, and the number of instructions it can finish per second scales with that figure. For this reason, CPU performance improves as clock speed increases.
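The relationship between clock rate and throughput can be written as a one-line calculation. This is a minimal sketch: the function name and the instructions-per-cycle values are illustrative assumptions, since the real figure depends on the processor design and the workload.

```python
def instructions_per_second(clock_hz: float, instructions_per_cycle: float) -> float:
    """Rough throughput estimate: clock cycles per second times instructions per cycle."""
    return clock_hz * instructions_per_cycle


clock_hz = 5e9  # 5 GHz, the figure used in the text

# One instruction per cycle (a simplifying assumption)
print(instructions_per_second(clock_hz, 1.0))

# A hypothetical core that finishes four instructions per cycle
print(instructions_per_second(clock_hz, 4.0))
```

The clock rate fixes the number of cycles per second; how many instructions complete in each cycle is a separate property of the core.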
However, as the processor clock speed increases, the amount of heat generated by the processor also increases. For this reason, extreme overclockers cool their systems with liquid nitrogen. Unfortunately, the heat generated at high frequencies prevents processor manufacturers from raising clock speeds beyond a certain threshold.
So how does a new-generation processor provide better performance than older versions? Well, CPU manufacturers use parallelism to improve performance. One form of parallelism is a multi-core architecture, which uses several processing cores to increase the processing power of the CPU.
Another way to improve performance is to use a SIMD instruction set. Simply put, a Single Instruction, Multiple Data instruction allows an ALU to apply the same operation to several data points at once. This type of parallelism improves CPU performance, and AVX-512 is a SIMD instruction set used to improve CPU performance when performing certain tasks.
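The difference between scalar and SIMD execution can be sketched in plain Python. This is only a model: Python does not issue SIMD instructions here, and the function names and the instruction counter are hypothetical devices used to show how many "instructions" each approach would need.

```python
def scalar_add(a, b):
    """Add two lists element by element, counting one instruction per pair."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, lanes):
    """Add two lists in chunks of `lanes` elements, counting one instruction per chunk."""
    out = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # One modeled SIMD instruction handles up to `lanes` pairs at once
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = list(range(16))
b = [10] * 16
print(scalar_add(a, b)[1])          # instructions needed one element at a time
print(simd_add(a, b, lanes=4)[1])   # instructions needed four elements at a time
```

Both functions produce the same result; only the number of modeled instructions differs.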
How does data get into the ALU?
Now that we have a basic understanding of how the ALU works, we need to understand how data gets into the ALU.
To reach the ALU, the data must pass through different storage systems. This data path is based on the memory hierarchy of the computing system. A brief overview of this hierarchy is given below:
- Secondary memory: The secondary memory on a computing device consists of storage drives such as hard disks and solid-state drives. These devices can store data permanently, but they are far slower than the central processing unit. Because of this, the CPU does not work on data directly from the secondary storage system.
- Main memory: The primary storage system consists of random access memory (RAM). This storage system is faster than the secondary storage system, but cannot store data permanently. So when you open a file on your system, it moves from your hard drive to RAM. However, even RAM is not fast enough for the processor.
- Cache: Cache memory is built into the CPU and is much faster than RAM. This memory system is divided into three levels, namely L1, L2 and L3 cache. Any data that needs to be processed by the ALU is moved from the hard disk to RAM and then to the cache. However, the ALU cannot access data directly from the cache.
- Processor registers: The registers in a CPU are very small and are the fastest storage in the system. Depending on the architecture of the computer, a general-purpose register can hold 32 or 64 bits of data. Once the data is moved into these registers, the ALU can access it and perform its task.
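The path through the hierarchy above can be modeled with a toy lookup. This is a simplified sketch, not how real hardware is implemented: the `load` function and the dictionary-based levels are hypothetical, and real systems move data in blocks under the control of the operating system and the cache hardware.

```python
# Levels ordered from fastest and smallest to slowest and largest
LEVELS = ["registers", "cache", "RAM", "secondary storage"]


def load(address, hierarchy):
    """Find `address` in the fastest level that holds it, copy the value
    into every faster level, and return the name of the level where it was found."""
    for depth, level in enumerate(LEVELS):
        if address in hierarchy[level]:
            value = hierarchy[level][address]
            for faster in LEVELS[:depth]:
                hierarchy[faster][address] = value
            return level
    raise KeyError(address)


hierarchy = {level: {} for level in LEVELS}
hierarchy["secondary storage"]["x"] = 42

print(load("x", hierarchy))  # first access has to go all the way to secondary storage
print(load("x", hierarchy))  # second access finds the value already in the registers
```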
What is AVX-512 and how does it work?
The AVX-512 instruction set is the successor to AVX and AVX2, and Intel announced it in 2013. Short for Advanced Vector Extensions, the AVX family gained AVX-512 first in the Intel Xeon Phi (Knights Landing) architecture, and it was later brought to Intel's server processors and the Skylake-X line.
In addition, the AVX-512 instruction set found its way into consumer systems with Cannon Lake and was later supported by the Ice Lake and Tiger Lake architectures.
The main purpose of this instruction set was to speed up tasks related to data compression, image processing, and cryptographic computing. By doubling the vector width of earlier AVX versions, the AVX-512 instruction set can provide a significant performance boost in these workloads.
So how did Intel double the theoretical throughput of its processors using the AVX-512 instruction set?
Well, as explained earlier, the ALU can only access the data present in the CPU's registers. The Advanced Vector Extensions instruction set increases the size of these registers.
Because of this increase in size, the ALU can process multiple data points in a single instruction, improving system performance.
In terms of register size, the AVX-512 instruction set offers thirty-two 512-bit registers. That is twice the width of the 256-bit registers in the older AVX instruction set, and twice as many registers as its sixteen.
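The effect of register width can be worked out with simple arithmetic. The sketch below uses hypothetical helper names and assumes 32-bit elements (such as single-precision floats) for the example; it counts the best-case number of vector instructions and ignores everything else that affects real performance.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits


def vector_instructions(n_elements, register_bits, element_bits):
    """Best-case number of vector instructions needed to touch every element once."""
    per_instruction = lanes(register_bits, element_bits)
    return -(-n_elements // per_instruction)  # ceiling division


# 32-bit elements in a 256-bit AVX register and a 512-bit AVX-512 register
print(lanes(256, 32))
print(lanes(512, 32))

# Instructions needed to process 1024 such elements at each width
print(vector_instructions(1024, 256, 32))
print(vector_instructions(1024, 512, 32))
```

Doubling the register width halves the instruction count in this idealized model, which is where the "twice the throughput" figure comes from.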
Why is Intel ending AVX-512?
As explained earlier, the AVX-512 instruction set offers several computational advantages. In fact, popular libraries such as TensorFlow use the instruction set to provide faster computations on processors that support the instruction set.
So why is Intel disabling AVX-512 on its latest Alder Lake processors?
Well, Alder Lake processors are different from the older ones made by Intel. While older systems used cores built on a single architecture, Alder Lake processors use two different kinds of cores. These are known as P-cores and E-cores and are based on different architectures.
While the P-cores use the Golden Cove microarchitecture, the E-cores use the Gracemont microarchitecture. This difference prevents the scheduler from working correctly when an instruction can be executed on one kind of core but not on the other, because a running program may be moved to a core that cannot execute it.
In the case of Alder Lake processors, the AVX-512 instruction set is one such example, since P-cores have the hardware to process the instructions, while E-cores do not.
For this reason, Alder Lake processors do not support the AVX-512 instruction set.
However, AVX-512 instructions may still work on some Alder Lake processors where Intel has not physically disabled the feature. To use them, users must disable the E-cores in the BIOS.
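The scheduling problem can be sketched as a set intersection: a program may land on any core, so only features that every active core supports are safe to use. The feature sets below are a simplified, illustrative subset and the function name is hypothetical.

```python
# Simplified, illustrative feature sets; real cores expose many more features
P_CORE_FEATURES = {"sse4_2", "avx2", "avx512f"}  # Golden Cove has AVX-512 hardware
E_CORE_FEATURES = {"sse4_2", "avx2"}             # Gracemont does not


def safe_features(*core_feature_sets):
    """Features a program can rely on when it may be scheduled on any of the given cores."""
    return set.intersection(*core_feature_sets)


# With both core types active, AVX-512 drops out of the common feature set
print(sorted(safe_features(P_CORE_FEATURES, E_CORE_FEATURES)))

# With E-cores disabled, only P-cores remain and AVX-512 is back in the set
print(sorted(safe_features(P_CORE_FEATURES)))
```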
Is AVX-512 needed on consumer chips?
The AVX-512 instruction set increases the CPU register size to improve CPU performance. This allows the CPU to crunch numbers faster, letting users run video and audio compression algorithms at higher speeds.
However, this performance improvement can only be observed when the program's code is optimized to use AVX-512 instructions.
For this reason, instruction sets such as AVX-512 are better suited to server workloads, and consumer-grade processors can operate without complex instruction sets such as AVX-512.
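Because the gain only appears when software actually uses AVX-512, programs commonly check what the CPU supports at run time and pick a code path accordingly. The sketch below assumes an x86 Linux system, where /proc/cpuinfo lists CPU features on a "flags" line and the AVX-512 Foundation subset appears as "avx512f". The function names and the kernel labels are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def choose_kernel(flags):
    """Pick the widest code path the CPU supports, falling back to scalar code."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unavailable
    print(choose_kernel(cpu_flags(text)))
```

On a processor without AVX-512, the same program simply falls back to a narrower path, which is why consumer software keeps working either way.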