The AMD Radeon RX 7900 XTX has officially been announced, the first in a salvo of RDNA 3 architecture GPUs that will compete with the best graphics cards. It's a radical new approach to GPU design, using chiplets in a similar fashion to AMD's Zen 3 and Zen 4 CPUs, except that here the chiplet approach has been tuned and tweaked to work best for graphics rather than CPUs.
The specifications for the Radeon RX 7900 XTX are mostly known at this point, though there are a few missing pieces. For example, AMD provided details on its "Game Clock" but not on the Boost Clock, and it appears to have rounded things off to the nearest 100 MHz. Maybe. We do have a compute teraflops figure that seems to be based on the boost clock, however, which means we mostly have what we need.
AMD Radeon RX 7900 XTX Specifications | |
---|---|
Architecture | Navi 31 |
Process Technology | TSMC N5 + N6 |
Transistors (Billion) | 58 |
Die size (mm^2) | 300 + 222 |
Compute Units | 96 |
GPU Cores (Shaders) | 12288 |
Ray Accelerators | 96 |
Boost Clock (MHz) | 2500 |
VRAM Speed (Gbps) | 20 |
VRAM (GB) | 24 |
VRAM Bus Width (bits) | 384 |
Infinity Cache (MB) | 96 |
Render Outputs | 192 |
Texture Mapping Units | 384 |
FP32 TFLOPS (Single-Precision) | 61.4 |
FP16 TFLOPS (Half-Precision) | 122.8 |
Bandwidth (GB/s) | 960 |
Total Board Power (Watts) | 355 |
Launch Date | December 13, 2022 |
Launch Price | $999 |
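
As a quick sanity check, the bandwidth row in the table can be reproduced from the bus width and memory speed rows. The sketch below is our own arithmetic rather than anything AMD supplied, and the function name is purely illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each of the bus's data pins transfers `speed_gbps` gigabits per second,
    and dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * speed_gbps / 8


# Values taken from the spec table: 384-bit bus, 20 Gbps memory.
bandwidth = memory_bandwidth_gbs(bus_width_bits=384, speed_gbps=20)
print(f"Peak bandwidth: {bandwidth:.0f} GB/s")
```

This is peak theoretical bandwidth to VRAM only; it does not account for any effective bandwidth gained from the Infinity Cache.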
AMD says the RX 7900 XTX won't be competing with the RTX 4090, and that Nvidia's top Ada Lovelace part is effectively in a league of its own. Instead, it will target the next step down, the upcoming RTX 4080, and it certainly looks like AMD will give Nvidia some much-needed competition. Total theoretical compute on the 7900 XTX is 61.4 teraflops, based on a tentative boost clock of 2500 MHz; given past experience, the actual boost clock will probably land within about 10 MHz of that figure.
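
To show where the teraflops figure comes from, here is a minimal sketch of the usual calculation. It assumes each shader performs one fused multiply-add (counted as two floating-point operations) per clock, which is the standard convention for these numbers but is our assumption rather than something AMD spelled out. The helper names are hypothetical.

```python
SHADERS = 12288                   # GPU cores from the spec table
BOOST_CLOCK_MHZ = 2500            # tentative boost clock from the article
FLOPS_PER_SHADER_PER_CLOCK = 2    # assumption: one FMA counts as 2 FLOPs


def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical FP32 throughput in teraflops at a given clock."""
    return shaders * FLOPS_PER_SHADER_PER_CLOCK * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(shaders: int, tflops: float) -> float:
    """Clock speed (MHz) needed to reach a quoted teraflops figure."""
    return tflops * 1e12 / (shaders * FLOPS_PER_SHADER_PER_CLOCK) / 1e6


fp32 = fp32_tflops(SHADERS, BOOST_CLOCK_MHZ)
fp16 = 2 * fp32  # the spec table lists FP16 at twice the FP32 rate
clock_from_quoted = implied_clock_mhz(SHADERS, 61.4)

print(f"FP32 at {BOOST_CLOCK_MHZ} MHz: {fp32:.2f} TFLOPS")
print(f"FP16 at {BOOST_CLOCK_MHZ} MHz: {fp16:.2f} TFLOPS")
print(f"Clock implied by 61.4 TFLOPS: {clock_from_quoted:.0f} MHz")
```

Working backward from the quoted 61.4 teraflops gives a clock just under 2500 MHz, which is why we treat 2500 MHz as a tentative boost clock.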
Where things get interesting is in some of the architectural changes. AMD now has double the FP32 throughput per Compute Unit (CU), while the INT32 rate per CU stays the same. Put another way, AMD has followed Nvidia's example: each CU now has two blocks of 64 dual-issue stream processors for FP32 work, with only half as many units available for INT32. What does that mean for performance? AMD's real-world performance per theoretical teraflop will probably now track closer to Nvidia's. We don't know that for sure yet, as we haven't been able to test the cards, but the architectural changes do make that a distinct possibility.
AMD's GPU chiplets are super interesting and should help reduce the overall price of the graphics cards quite a bit. The Graphics Compute Die (GCD) uses TSMC's "5nm" N5 process node, which should be relatively similar to the 4N node Nvidia uses for Ada Lovelace in terms of transistor density and power characteristics — both are effectively 5nm-class, at least. The Memory Cache Die (MCD) meanwhile uses TSMC's slightly older N6 (7nm-class) process node, which probably costs less than half as much per wafer.
The result of the chiplet approach is that the big GCD is less than half the size of Nvidia's AD102 chip, and then the cache and memory interface logic — which doesn't scale well to smaller process nodes — gets put on a bunch of tiny MCDs. Furthermore, AMD can put up to six (functional) MCDs around the GCD, but as we'll see with the RX 7900 XT, it can also do fewer MCDs with a partially disabled GCD to improve overall yields and reduce costs.
Let's quickly cover the die sizes and chips-per-wafer estimates as well. The GCD measures 300mm^2, with dimensions of approximately 24.6 x 12.2mm. Based on that size, AMD can get around 174 chips per N5 wafer, which is nearly double the number of AD102 chips Nvidia can get, or effectively half the cost per chip. The MCDs meanwhile are 37mm^2, or approximately 7.5 x 4.9mm, which means AMD gets around 1,650 dies per wafer. There's no need to even worry about harvesting partial dies to improve yields with a chip that small; it either works properly or it gets discarded (or turned into a "dummy die" for something like the 7900 XT).
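
For readers who want to play with the numbers, the following is a rough sketch of a common textbook approximation for gross dies per wafer. It is not the calculator behind the figures above: it assumes a 300mm wafer and ignores scribe lines, edge exclusion, die aspect ratio, and defects, so it lands somewhat above the 174 and 1,650 estimates quoted here. The function name is hypothetical.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer.

    First term: wafer area divided by die area.
    Second term: correction for partial dies lost around the wafer edge.
    Assumption: no scribe lines, no edge exclusion, no defect losses.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


gcd_dies = dies_per_wafer(300.0)  # Graphics Compute Die, 300mm^2
mcd_dies = dies_per_wafer(37.0)   # Memory Cache Die, 37mm^2

print(f"GCD (300mm^2): ~{gcd_dies} gross dies per wafer")
print(f"MCD (37mm^2):  ~{mcd_dies} gross dies per wafer")
```

The useful takeaway is the ratio rather than the absolute counts: a die one-eighth the size yields roughly nine times as many candidates per wafer, because less area is wasted at the edge.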
Total combined die size for the 7900 XTX is 522mm^2, so it's still smaller than Nvidia's biggest Ada chip, with a cost that's roughly equivalent to that of a monolithic ~370mm^2 N5 chip. But if AMD had gone the monolithic route, Navi 31 would still be in the ~500mm^2 range, so the net cost savings for AMD look to be around 25–30 percent.
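
The savings estimate can be checked with some simple area arithmetic. This sketch treats cost as proportional to N5-equivalent silicon area, which is a simplification of ours that ignores yield differences and packaging costs. The variable names are illustrative.

```python
GCD_AREA_MM2 = 300.0        # N5 Graphics Compute Die
MCD_AREA_MM2 = 37.0         # N6 Memory Cache Die
MCD_COUNT = 6
N5_EQUIVALENT_MM2 = 370.0   # article's cost-equivalent monolithic N5 size
MONOLITHIC_MM2 = 500.0      # article's estimate for a monolithic Navi 31

mcd_total = MCD_AREA_MM2 * MCD_COUNT
combined = GCD_AREA_MM2 + mcd_total

# Relative cost per mm^2 of MCD silicon (vs. N5) that would make the
# 370mm^2 equivalence hold, under the area-proportional assumption.
implied_n6_ratio = (N5_EQUIVALENT_MM2 - GCD_AREA_MM2) / mcd_total

savings_vs_monolithic = 1 - N5_EQUIVALENT_MM2 / MONOLITHIC_MM2
savings_vs_combined = 1 - N5_EQUIVALENT_MM2 / combined

print(f"Total MCD area: {mcd_total:.0f} mm^2, combined: {combined:.0f} mm^2")
print(f"Implied MCD cost per mm^2 vs. N5: {implied_n6_ratio:.0%}")
print(f"Savings vs. ~500mm^2 monolithic: {savings_vs_monolithic:.0%}")
print(f"Savings vs. 522mm^2 all on N5:   {savings_vs_combined:.0%}")
```

Both ways of framing the comparison fall inside the 25–30 percent range, and the implied cost ratio for the MCD silicon is consistent with N6 wafers costing less than half as much as N5.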
To quickly cover the speeds and feeds, AMD says it improved overall utilization of the shader resources by increasing the various cache and buffer sizes. It's not clear exactly how much faster the new RDNA 3 CU is versus the RDNA 2 CU, but there's a lot more FP32 available: 160% more than the RX 6950 XT. The Ray Accelerators are also 50% faster, which should help to narrow the performance gap in ray tracing games. Clock speeds aren't too much higher than on RDNA 2, and power requirements haven't really changed much either. Where the RX 6950 XT had a 335W TBP, the RX 7900 XTX is only 355W, which is easily handled with slightly tweaked cooling designs.
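
To put the generational numbers side by side, here is a small sketch comparing theoretical FP32 throughput and board power. The RX 6950 XT's 23.65 teraflops figure is not stated in this article; it is taken from AMD's published specifications for that card and should be treated as an outside assumption. These are paper numbers only, not measured performance.

```python
# RX 7900 XTX figures come from the spec table in this article.
XTX_FP32_TFLOPS = 61.4
XTX_TBP_WATTS = 355

# Assumption: RX 6950 XT peak FP32 from AMD's published spec sheet.
RX6950XT_FP32_TFLOPS = 23.65
RX6950XT_TBP_WATTS = 335  # stated in the article

fp32_gain_pct = (XTX_FP32_TFLOPS / RX6950XT_FP32_TFLOPS - 1) * 100
power_gain_pct = (XTX_TBP_WATTS / RX6950XT_TBP_WATTS - 1) * 100

# Theoretical teraflops per watt, new card relative to old.
tflops_per_watt_ratio = (XTX_FP32_TFLOPS / XTX_TBP_WATTS) / (
    RX6950XT_FP32_TFLOPS / RX6950XT_TBP_WATTS
)

print(f"FP32 increase:  {fp32_gain_pct:.0f}%")
print(f"Power increase: {power_gain_pct:.0f}%")
print(f"Theoretical TFLOPS per watt: {tflops_per_watt_ratio:.2f}x")
```

Because much of the FP32 increase comes from dual-issue shaders, real-world gains per watt will almost certainly be smaller than the theoretical ratio suggests.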
AMD didn't go the 16-pin 12VHPWR route either, which means we can use the tried-and-true dual 8-pin power connectors. Partner cards will almost certainly have triple 8-pin connectors on the higher-end models, and TBP will probably go up into the 400–450W range on factory overclocked cards, but we'll have to wait and see what that does for actual performance. We suspect AMD targeted optimal efficiency for its own cards, rather than following in the footsteps of Nvidia's power-hungry RTX 4090.