Q – Why do consoles massively outperform their PC equivalent hardware?
A – Because they use a streamlined software stack and optimised hardware.
Let’s ignore the former, as it doesn’t tell us anything interesting: while every real console (i.e. not the Ouya) will seek both optimised software and optimised hardware, only the latter will help us with the big questions. Consoles are machines specifically designed to pump out graphics, so the determining factor will be the GPU. From a console vendor’s perspective, what is the prime consideration when choosing a GPU?
Price/performance. This comes down to a combination of the number of transistors and their clock speed:
1. The number of transistors necessary to turn a combination of instruction inputs into a visual screen output
2. The speed at which the transistors must be run in order to achieve the desired performance, and the wattage that requires
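As a rough illustration of the second point, dynamic power in CMOS logic is commonly approximated as P ≈ α·C·V²·f (activity factor, switched capacitance, voltage squared, frequency). The sketch below uses only that textbook approximation. The function name and the example figures are assumptions chosen for illustration, not measurements of any real GPU.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float) -> float:
    """Dynamic power relative to a baseline, using P ~ a * C * V^2 * f.

    The activity factor and switched capacitance are assumed unchanged,
    so they cancel out of the ratio.
    """
    return freq_scale * voltage_scale ** 2


# Hypothetical example: clock and voltage both reduced by 10%.
print(round(relative_dynamic_power(0.9, 0.9), 3))
```

Because higher clocks usually also need higher voltage, power tends to climb faster than performance, which is why a modestly down-clocked part can be disproportionately cheaper to cool and power.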
The cost of a GPU is in large part determined by the number of units that fit on a silicon wafer, versus the number of those units that survive the manufacturing process and can function at the desired level of performance.
This is why we have mid-range GPUs that are either physically smaller designs, run at less demanding clock speeds, have a reduced number of functions, or some combination of all three.
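The wafer economics above can be sketched with two widely used approximations: a dies-per-wafer estimate with an edge-loss term, and a simple Poisson yield model. Everything numeric here (wafer cost, defect density, die sizes) is a hypothetical placeholder rather than real foundry data, and the model ignores binning by clock speed, so treat it as a sketch of the trend only.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate die count: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(int(whole - edge_loss), 0)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def cost_per_good_die(wafer_cost: float, wafer_diameter_mm: float,
                      die_area_mm2: float, defects_per_mm2: float) -> float:
    """Wafer cost spread over the dies expected to work."""
    good = (gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2)
            * poisson_yield(die_area_mm2, defects_per_mm2))
    return wafer_cost / good if good > 0 else math.inf


# Hypothetical inputs: 300mm wafer, $5000 per wafer, 0.002 defects per mm^2.
for area in (200, 400):
    cost = cost_per_good_die(5000, 300, area, 0.002)
    print(f"{area} mm^2 die: ${cost:.2f} per good die")
```

Under these assumed inputs, doubling the die area more than doubles the cost per working chip, because fewer dies fit on the wafer and a smaller share of them survive.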
Doesn’t everyone try to make cost-efficient products that maximise price/performance?
Sure, but the key point here is that a GPU for use in a computer does far more than merely pump out graphics, and so it is the compromise between these competing tasks that determines how competing vendors compare on price/performance for pure graphics work.
The most important alternative task in recent years, and hence one that will inform the next-gen consoles, is GPU-compute. To cut a very long story short: adding the ability to perform high-performance compute tasks to a GPU requires additional complexity in the shaders and schedulers, which in turn requires more transistors and more watts to run.
To quote Ryan Smith of Anandtech talking about the AMD HD 6970 (Cayman):
Back in the days of yore, when shading was new and pixel and vertex shaders were still separate entities, AMD (née ATI) settled on a VLIW5 design for their vertex shaders. Based on their data this was deemed the ideal configuration for a vertex shader block, as it allowed them to process a 4 component dot product (e.g. w, x, y, z) and a scalar component (e.g. lighting) at the same time.
Fast forward to 2007 and the introduction of AMD’s Radeon HD 2000 series (R600), where AMD introduced their first unified architecture for the PC. AMD went with a VLIW5 design once more, as even though the product was their first DX10 product it still made sense to build something that could optimally handle DX9 vertex shaders. This was also well before GPGPU had a significant impact on the market, as AMD had at best toyed around with the idea late in the X1K series’ lifetime (and well after R600 was started).
Now let us jump to 2008, when Cayman’s predecessors were being drawn up. GPGPU computing is still fairly new – NVIDIA is at the forefront of a market that only amounts to a few million dollars at best – and DX10 games are still relatively rare. With 2+ years to bring up a GPU, AMD has to be looking forward at where things will be in 2010. Their predictions are that GPGPU computing will finally become important, and that DX9 games will fade in importance to DX10/11 games. It’s time to reevaluate VLIW5.
Right about now we should say that the WiiU, Nintendo’s new console for 2012, is using a very efficient VLIW5 GPU provided by AMD and based on their HD 4xxx series cards (R700). However, whatever Sony and Microsoft replace the PS3 and Xbox 360 with is not due for at least another year, and will be a higher-priced offering pushing even further up the PC graphics evolutionary tree.
In the past Nvidia, with its heavy focus on providing GPU-compute via its CUDA toolkit, has delivered worse price/performance for purely graphical tasks.
To quote Derek Wilson of Anandtech talking about the Nvidia GTX 280 (Tesla):
Intel’s Montecito processor (their dual core Itanium 2) weighs in at over 1.7 billion transistors, but the vast majority of this is L3 cache (over 1.5 billion transistors for 24MB of on die memory). In contrast, the vast majority of the transistors on NVIDIA’s GT200 chip are used for compute power. Whether or not NVIDIA has used these transistors well is certainly the most important consideration for consumers, but there’s no reason we can’t take a second to be in awe of the sheer magnitude of the hardware. This chip is packed full of logic and it is huge.
The graphics card is no longer a toy. The combination of CUDA’s academic acceptance as a teaching tool and the availability of 64-bit floating point in GT200 make GPUs a mission critical computing tool that will act as a truly disruptive technology. Not only will many major markets that depend on high performance floating-point processing realize this, but every consumer with an NVIDIA graphics card will be able to take advantage of hundreds of gigaflops of performance from CUDA based consumer applications.
But then we have the question of whether or not you should buy one of these things. As impressive as the GT200 is, the GeForce GTX 280 is simply overpriced for the performance it delivers. It is NVIDIA’s fastest single-card, single-GPU solution, but for $150 less than a GTX 280 you get a faster graphics card with NVIDIA’s own GeForce 9800 GX2. The obvious downside to the GX2 over the GTX 280 is that it is a multi-GPU card and there are going to be some situations where it doesn’t scale well, but overall it is a far better buy than the GTX 280.
However, two things have changed in the last eighteen months. One: AMD released its new GCN architecture, designed for complex compute tasks. Two: Nvidia has now split into two architectures, with a low-end for graphics and a high-end for compute.
This has rather turned events on their head!
To quote Ryan Smith of Anandtech talking about AMD’s GCN (HD 7970):
GCN at its core is the basis of a GPU that performs well at both graphical and computing tasks. AMD has stretched their traditional VLIW architecture as far as they reasonably can for computing purposes, and as more developers get on board for GPU computing a clean break is needed in order to build a better performing GPU to meet their needs. This is in essence AMD’s Fermi: a new architecture and a radical overhaul to make a GPU that is as monstrous at computing as it is at graphics. And this is the story of the architecture that AMD will be building to make it happen.
The fundamental issue moving forward is that VLIW designs are great for graphics; they are not so great for computing. However AMD has for all intents and purposes bet the company on GPU computing – their Fusion initiative isn’t just about putting a decent GPU right on die with a CPU, but then utilizing the radically different design attributes of a GPU to do the computational work that the CPU struggles at. So a GPU design that is great at graphics and poor at computing work simply isn’t sustainable for AMD’s future
Moving on, it’s interesting that GCN effectively affirms most of NVIDIA’s architectural changes with Fermi. GCN is all about creating a GPU good for graphics and good for computing purposes; Unified addressing, C++ capabilities, ECC, etc were all features NVIDIA introduced with Fermi more than a year ago to bring about their own compute architecture. I don’t believe there’s ever been a question whether NVIDIA was “right”, but the question has been whether it’s time to devote so much engineering effort and die space on technologies that benefit compute as opposed to putting in more graphics units.
To quote Ryan Smith of Anandtech talking about Nvidia’s GTX 680 (Kepler):
What is clear at this time though is that NVIDIA is pitching GTX 680 specifically for consumer graphics while downplaying compute, which says a lot right there. Given their call for efficiency and how some of Fermi’s compute capabilities were already stripped for GF114, this does read like an attempt to further strip compute capabilities from their consumer GPUs in order to boost efficiency. Amusingly, whereas AMD seems to have moved closer to Fermi (Nvidia 580) with GCN by adding compute performance, NVIDIA seems to have moved closer to Cayman (AMD 69xx) with Kepler by taking it away.
So NVIDIA has replaced Fermi’s complex scheduler with a far more simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA’s compiler. In essence it’s a return to static scheduling. Ultimately it remains to be seen just what the impact of this move will be. Hardware scheduling makes all the sense in the world for complex compute applications, which is a big reason why Fermi had hardware scheduling in the first place, and for that matter why AMD moved to hardware scheduling with GCN.
What makes this launch particularly interesting if not amusing though is how we’ve ended up here. Since Cypress and Fermi NVIDIA and AMD have effectively swapped positions. It’s now AMD who has produced a higher TDP video card that is strong in both compute and gaming, while NVIDIA has produced the lower TDP part that is similar to the Radeon HD 5870 right down to the display outputs.
Interesting, huh? For the last twelve months all rumours have pointed to AMD getting at least one of the two big next-gen consoles, over and above the WiiU, and many have suggested that AMD have pulled off a hat-trick in bagging all three. Either the next-gen consoles are using old AMD architectures, pre-GCN, or the rumours are standing on pretty shaky foundations.
The graphic below hopefully highlights the trend:
2007 marking the end of AMD’s (ATI as it was then) enthusiasm for big GPUs.
2008 to 2011 showing how AMD aimed to compete with small, power-efficient GPUs.
2012 marking the reversal, with AMD’s big compute part and Nvidia’s graphics GPU.
So where does this leave us with regard to the Q4 2013 – Q3 2014 launch window for the PS4 and Xbox 720?
There have been credible rumours that both Sony and Microsoft are going ‘cheap’ for the next-gen, as neither can afford the exorbitant cost of a bleeding-edge console in the era of the $0.99 mobile app-store game. This could mean using an older architecture, perhaps a respin of the AMD HD 5870 or the mid-range Nvidia GTX 560 on a 28nm process node. That, after all, would appear to be what Nintendo has done with its DX10 HD 4xxx part from AMD. However, that comes with its own costs, as reworking a GPU designed for manufacture on one process node so that it works as well on another takes money. So unless the GPU is particularly small, to maximise units-per-wafer, it ‘probably’ makes more sense to use a current-generation mid-range GPU straight off the production line. Sony and Microsoft might be unwilling to create another five-year corporate profit-vampire, but they will still want to provide a competitive product, so my money is on a modern GPU architecture.
If that is true, then Nvidia’s Kepler architecture is looking pretty damned good right now, especially when you consider the very limited power budget available to slim and attractive consumer electronics that need to operate silently in your living room. Compare the current Nvidia/AMD competition across a range of price points:
In every case the AMD solution uses a larger and more power-hungry die than its competing Nvidia solution. While that brings an advantage in compute, in gaming the advantage actually goes to Nvidia’s more streamlined design, and that is a lot of added cost to little purpose in a games console.
Might I suggest a down-clocked Nvidia GeForce GTX 660 with 768MB of video memory and a TDP of not more than 100W? After all, it seems technically feasible…
Update 21.04.2014 – Chalk this one up for the completely wrong side of the tally sheet. 😉