PicoZ

Haswell GPU Architecture & Iris Pro

In 2010, Intel’s Clarkdale and Arrandale CPUs dropped the GMA (Graphics Media Accelerator) label from their integrated graphics. From that point on, all Intel graphics would be known as Intel HD Graphics. With certain versions of Haswell, Intel once again parts ways with its old brand and introduces a new one, and this time the change is much more significant.

Intel attempted to simplify the naming confusion with this slide:

While Sandy and Ivy Bridge featured two different GPU implementations (GT1 and GT2), Haswell adds a third (GT3).

Basically, it boils down to this: Haswell GT1 is simply called Intel HD Graphics, and Haswell GT2 is HD 4200/4400/4600. Haswell GT3 at or below 1.1GHz is called HD 5000, Haswell GT3 capable of hitting 1.3GHz is called Iris 5100, and finally Haswell GT3e (GT3 plus embedded DRAM) is called Iris Pro 5200.
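Since the naming depends on just three things (GT tier, maximum GPU clock, and whether eDRAM is present), the rules can be written as a small lookup. This is a minimal sketch; the function name haswell_gpu_brand and the tier strings are hypothetical, and GT3 clocks between 1.1GHz and 1.3GHz are treated as undefined because the naming rules don't cover them.

```python
def haswell_gpu_brand(gt_tier: str, max_freq_mhz: int) -> str:
    """Map a Haswell GPU tier and max clock to its brand name (hypothetical helper)."""
    if gt_tier == "GT1":
        return "Intel HD Graphics"
    if gt_tier == "GT2":
        # The tier alone doesn't pick between 4200, 4400 and 4600.
        return "Intel HD Graphics 4200/4400/4600"
    if gt_tier == "GT3e":
        # GT3 plus embedded DRAM
        return "Intel Iris Pro 5200"
    if gt_tier == "GT3":
        if max_freq_mhz <= 1100:
            return "Intel HD Graphics 5000"
        if max_freq_mhz >= 1300:
            return "Intel Iris 5100"
        raise ValueError("GT3 clocks between 1.1GHz and 1.3GHz aren't covered by the naming rules")
    raise ValueError(f"unknown Haswell GPU tier: {gt_tier!r}")


print(haswell_gpu_brand("GT3", 1100))   # Intel HD Graphics 5000
print(haswell_gpu_brand("GT3", 1300))   # Intel Iris 5100
print(haswell_gpu_brand("GT3e", 1300))  # Intel Iris Pro 5200
```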

The fundamental GPU architecture hasn’t changed much between Ivy Bridge and Haswell. There are some enhancements, but for the most part what we’re looking at here is a dramatic increase in the amount of die area allocated for graphics.

All GPU vendors have some fundamental building block they scale up/down to hit various performance/power/price targets. AMD calls theirs a Compute Unit, NVIDIA’s is known as an SMX, and Intel’s is called a sub-slice.

In Haswell, each graphics sub-slice features 10 EUs. Each EU is a dual-issue SIMD machine with two 4-wide vector ALUs:

Low-Level Architecture Comparison
Feature | AMD GCN | Intel Gen7 Graphics | NVIDIA Kepler
Building Block | GCN Compute Unit | Sub-Slice | Kepler SMX
Shader Building Block | 16-wide Vector SIMD | 2 x 4-wide Vector SIMD | 32-wide Vector SIMD
Smallest Implementation | 4 SIMDs | 10 SIMDs | 6 SIMDs
Smallest Implementation (ALUs) | 64 | 80 | 192
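The ALU row follows directly from the rows above it: multiply the number of SIMD units in the building block by the width of each unit. For Haswell, that means 10 EUs, each with two 4-wide ALUs. A quick sketch of the arithmetic (the dictionary layout and alu_count are illustrative, not anything from Intel, AMD or NVIDIA):

```python
# (number of SIMD units, lanes per SIMD unit) for each vendor's building block,
# taken from the comparison table
BUILDING_BLOCKS = {
    "AMD GCN Compute Unit": (4, 16),
    "Intel Haswell sub-slice": (10 * 2, 4),  # 10 EUs, each with two 4-wide ALUs
    "NVIDIA Kepler SMX": (6, 32),
}


def alu_count(simd_units: int, lanes_per_simd: int) -> int:
    """Total ALU lanes in one building block."""
    return simd_units * lanes_per_simd


for name, (units, width) in BUILDING_BLOCKS.items():
    print(f"{name}: {alu_count(units, width)} ALUs")
```

This prints 64, 80 and 192 ALUs respectively, matching the table.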

There are limitations as to what can be co-issued down each EU’s pair of pipes. Intel addressed many of the co-issue limitations last generation with Ivy Bridge, but there are still some that remain.

Architecturally, this makes Intel’s Gen7 graphics core a bit odd compared to AMD’s GCN and NVIDIA’s Kepler, both of which feature much wider SIMD arrays without any co-issue requirements. A single Haswell sub-slice, however, delivers an ALU count competitive with AMD's and NVIDIA's smallest building blocks.

Intel had a decent building block with Ivy Bridge, but it chose not to scale it up as far as it would go. With Haswell that changes. In its highest performing configuration, Haswell implements four sub-slices, or 40 EUs. Doing the math reveals a very competent-looking part on paper:

Peak Theoretical GPU Performance
GPU | Cores/EUs | Peak FP Ops per Core/EU per Clock | Max GPU Frequency | Peak GFLOPS
Intel Iris 5100 / Iris Pro 5200 | 40 | 16 | 1300MHz | 832 GFLOPS
Intel HD Graphics 5000 | 40 | 16 | 1100MHz | 704 GFLOPS
NVIDIA GeForce GT 650M | 384 | 2 | 900MHz | 691.2 GFLOPS
Intel HD Graphics 4600 | 20 | 16 | 1350MHz | 432 GFLOPS
Intel HD Graphics 4000 | 16 | 16 | 1150MHz | 294.4 GFLOPS
Intel HD Graphics 3000 | 12 | 12 | 1350MHz | 194.4 GFLOPS
Intel HD Graphics 2000 | 6 | 12 | 1350MHz | 97.2 GFLOPS
Apple A6X | 32 | 8 | 300MHz | 76.8 GFLOPS
Apple A6X328300MHz76.8 GFLOPS

In its highest-end configuration, Iris has more raw compute power than a GeForce GT 650M, and even more than a GeForce GT 750M. We’re comparing across architectures here, so this won’t necessarily translate into a performance advantage in games, but the takeaway is that with HD 5000, Iris 5100 and Iris Pro 5200, Intel is finally walking the walk of a GPU company.

Peak theoretical performance falls off steeply as soon as you start looking at the GT2 and GT1 implementations. With one quarter to one half the execution resources of the GT3 implementation, and no corresponding increase in frequency to offset the loss, the slower parts are substantially less capable. The good news is that Haswell GT2 (HD 4600) is at least more capable than Ivy Bridge GT2 (HD 4000).
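To put "no corresponding increase in frequency" in numbers: with the same FP ops per EU per clock, a part with fewer EUs would need a proportionally higher clock to match GT3's peak. The sketch assumes 20 EUs for GT2 (the HD 4600 row) and 10 EUs for GT1 (inferred from the one-quarter ratio above, not stated directly); clock_to_match_gt3 is a hypothetical helper.

```python
GT3_EUS = 40
GT3_CLOCK_MHZ = 1300  # Iris Pro 5200 max GPU clock


def clock_to_match_gt3(eus: int) -> float:
    """Clock (MHz) a part with `eus` EUs would need to match GT3's peak FP rate,
    assuming the same FP ops per EU per clock."""
    return GT3_CLOCK_MHZ * GT3_EUS / eus


# Assumed EU counts: GT2 from the HD 4600 row, GT1 inferred from the 1/4 ratio
for tier, eus in (("GT2", 20), ("GT1", 10)):
    print(f"{tier}: {eus / GT3_EUS:.2f} of GT3's EUs, needs {clock_to_match_gt3(eus):.0f}MHz")

# GT2 generation over generation, using peak GFLOPS from the table
print(f"HD 4600 vs HD 4000: {432.0 / 294.4:.2f}x")
```

GT2 would need 2600MHz (and GT1 5200MHz) to match GT3, while the HD 4600 actually tops out at 1350MHz. The same arithmetic shows HD 4600 at roughly 1.47x the peak compute of HD 4000.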

Taking a step back and looking at the rest of the theoretical numbers gives us a more well-rounded look at Intel’s graphics architecture:

Peak Theoretical GPU Performance
GPU | Peak Pixel Fill Rate | Peak Texel Rate | Peak Polygon Rate | Peak GFLOPS
Intel Iris 5100 / Iris Pro 5200 | 10.4 GPixels/s | 20.8 GTexels/s | 650 MPolys/s | 832 GFLOPS
Intel HD Graphics 5000 | 8.8 GPixels/s | 17.6 GTexels/s | 550 MPolys/s | 704 GFLOPS
NVIDIA GeForce GT 650M | 14.4 GPixels/s | 28.8 GTexels/s | 900 MPolys/s | 691.2 GFLOPS
Intel HD Graphics 4600 | 5.4 GPixels/s | 10.8 GTexels/s | 675 MPolys/s | 432 GFLOPS
AMD Radeon HD 7660D (Desktop Trinity, A10-5800K) | 6.4 GPixels/s | 19.2 GTexels/s | 800 MPolys/s | 614 GFLOPS
AMD Radeon HD 7660G (Mobile Trinity, A10-4600M) | 3.97 GPixels/s | 11.9 GTexels/s | 496 MPolys/s | 380 GFLOPS
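Like the GFLOPS column, these rates scale linearly with clock speed. Dividing each figure by the max GPU frequency from the earlier table gives per-clock throughput (for example, 10.4 GPixels/s at 1.3GHz is 8 pixels per clock). Those per-clock values are our back-calculation rather than published specs, and GpuFrontEnd is a hypothetical class; the sketch simply multiplies them back out.

```python
from dataclasses import dataclass


@dataclass
class GpuFrontEnd:
    clock_mhz: int
    pixels_per_clock: float  # back-calculated from the table, not an official spec
    texels_per_clock: float
    polys_per_clock: float

    def peak_rates(self) -> tuple:
        """Return (GPixels/s, GTexels/s, MPolys/s) at the max clock."""
        ghz = self.clock_mhz / 1000
        return (
            self.pixels_per_clock * ghz,
            self.texels_per_clock * ghz,
            self.polys_per_clock * self.clock_mhz,
        )


PARTS = {
    "Intel Iris Pro 5200": GpuFrontEnd(1300, 8, 16, 0.5),
    "Intel HD Graphics 5000": GpuFrontEnd(1100, 8, 16, 0.5),
    "Intel HD Graphics 4600": GpuFrontEnd(1350, 4, 8, 0.5),
    "NVIDIA GeForce GT 650M": GpuFrontEnd(900, 16, 32, 1),
}

for name, gpu in PARTS.items():
    pix, tex, polys = gpu.peak_rates()
    print(f"{name}: {pix:.1f} GPixels/s, {tex:.1f} GTexels/s, {polys:.0f} MPolys/s")
```

Seen per clock, the GT 650M handles twice the pixels and texels and twice the triangles of Iris Pro 5200, which more than offsets its 900MHz clock in those columns.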

Intel may have more raw compute, but NVIDIA invested more everywhere else in the pipeline. Triangle, texturing and pixel throughput are all higher on the 650M than on Iris Pro 5200. Compared to AMD’s Trinity, however, Intel has a big advantage: Iris Pro 5200 leads even the desktop A10-5800K in pixel fill, texel rate and compute, trailing only in peak polygon rate.
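Expressing each column as a ratio makes those comparisons explicit. This sketch uses only the peak figures from the table; the METRICS and PARTS structures and the ratios function are illustrative. A ratio above 1 means Iris Pro 5200 leads on that metric.

```python
# Peak figures from the table: (GPixels/s, GTexels/s, MPolys/s, GFLOPS)
METRICS = ("pixel fill", "texel rate", "polygon rate", "compute")
PARTS = {
    "Iris Pro 5200": (10.4, 20.8, 650, 832),
    "GeForce GT 650M": (14.4, 28.8, 900, 691.2),
    "Radeon HD 7660D": (6.4, 19.2, 800, 614),
}


def ratios(part_a: str, part_b: str) -> dict:
    """Return part_a / part_b for each metric."""
    a, b = PARTS[part_a], PARTS[part_b]
    return {metric: x / y for metric, x, y in zip(METRICS, a, b)}


for rival in ("GeForce GT 650M", "Radeon HD 7660D"):
    for metric, r in ratios("Iris Pro 5200", rival).items():
        print(f"Iris Pro 5200 vs {rival}, {metric}: {r:.2f}x")
```

Against the 650M, only the compute ratio comes out above 1; against the desktop Trinity, only the polygon ratio falls below 1.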


Jenniffer Sheldon

Update: 2024-08-13