The video card of the future
According to AMD’s own figures, the RX 7900 XTX is on average up to 11 percent faster than an RTX 4080 at 4k ultra without ray tracing, but 6.6 percent slower once ray tracing is enabled. So far, the manufacturer has avoided comparisons with the RTX 4090 altogether, and given those figures, it is easy to see why. That has not stopped AMD from aiming high with the model numbers of the new duo. In this review, you can read how the Radeon RX 7900 XT and 7900 XTX compare to their predecessors and the competition, and how the new video cards are built.
Chiplets: challenges and benefits
The computing power, and therefore the speed, of a video card traditionally depends on the number of computing cores in its GPU. For those cores to be used effectively, the other parts of the video card must be in balance with them. The more computing power is required, the more transistors are needed and the larger the chip becomes. A larger chip is also significantly more expensive to produce, since fewer chips fit on a wafer and a relatively larger share has to be discarded because of defects.
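To make the cost argument concrete, the sketch below combines a common first-order approximation for the number of whole dies on a round wafer with a simple Poisson yield model. The 300mm wafer, the defect density of 0.001 defects per mm² (0.1 per cm²) and the die areas are illustrative assumptions, not AMD or TSMC figures, and the function names are hypothetical.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate of whole dies per wafer.

    The first term is wafer area divided by die area; the second subtracts
    the partial dies lost along the circular edge of the wafer.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(gross - edge_loss))


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies without a single defect under a simple Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def good_dies_per_wafer(die_area_mm2: float, defects_per_mm2: float = 0.001) -> float:
    """Expected number of defect-free dies per wafer (assumed defect density)."""
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_mm2)


if __name__ == "__main__":
    for area in (150.0, 300.0, 600.0):
        print(f"{area:5.0f} mm2: {dies_per_wafer(area)} dies, "
              f"{good_dies_per_wafer(area):.0f} defect-free")
```

With these assumptions, quadrupling the die area leaves fewer than a quarter as many defect-free dies per wafer, because both the edge losses and the chance of a defect grow with die size.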
AMD’s Ryzen processors already use multiple chiplets, with the I/O die linking the processor cores together, and the Navi 31 chip on the new Radeon cards likewise consists of multiple chiplets. The difference is that AMD has housed all of the Navi GPU’s computing power in the central, larger chip, which AMD calls the ‘graphics compute die’ (GCD). The small chips placed around the GCD are called ‘memory cache dies’ (MCDs) and contain both the memory controllers and the Infinity Cache.
AMD itself says that connecting the GCD to multiple MCDs was already a huge challenge, because GPUs require much more internal bandwidth than processors. A GPU involves thousands of signals per component per cluster, compared to hundreds in a processor’s CCD, and the required internal bandwidth is more than ten times higher on a GPU than on a CPU.
To provide the bandwidth between the GCD and the MCDs, AMD uses Infinity Links. According to AMD, accessing the cache over these links costs considerably less energy per bit than accessing VRAM. It is not a perfect solution, however, as it introduces higher latency than the fully on-die cache of Navi 21. AMD says it has compensated for this by running the Infinity Fabric at a higher speed, which still makes the latency on Navi 31 10 percent lower than on Navi 21.
Because a powerful GPU is usually large, and much larger than most CPUs, using chiplets for a video card is an obvious step. With the Navi 31 GPU, AMD keeps all computing cores on a single GCD, made on a fairly new production process. These components scale well to a more modern, smaller process, which also allows higher clock speeds. AMD has the memory controllers and the cache memory, parts that do not scale well to a newer process, made as MCDs on the slightly ‘older’ 6nm process. This way, the newer and more expensive 5nm process is used only for the parts of the GPU that largely determine performance.
Another advantage for AMD is that the memory controller in the MCD could be copied almost one-to-one from RDNA2. According to the manufacturer, because 6nm and 7nm belong to the same node family, reusing the memory controller saved valuable time and effort, which was then spent on optimizing the Infinity Links.
The RDNA3 architecture
Navi 31 is the name of the chip on the new Radeon cards and is based on the RDNA3 architecture. Like previous generations, the third version of RDNA is also fully focused on gaming.
The large chip is called the GCD, short for graphics compute die, and contains all the shaders. The MCD, short for memory cache die, contains the Infinity Cache as well as the memory controllers. The chips are interconnected via Infinity Fabric, which according to AMD is a new generation that can handle significantly higher bandwidth.
Navi 31 has a total of six shader engines, and each shader engine contains eight dual compute units, in addition to the L1 cache and rasterization units. This layout brings Navi 31 to a total of 96 compute units, 20 percent more than the 80 compute units of the RX 6900 XT and RX 6950 XT. Because AMD still places 64 stream processors in each compute unit, the number of shaders totals 6144, which is also 20 percent more than the Radeon top model of the previous generation.
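The unit counts in this paragraph multiply out as follows. This is only arithmetic on the figures quoted above; the constant names are our own.

```python
# Navi 31 layout as described in the text
SHADER_ENGINES = 6
DUAL_CUS_PER_SHADER_ENGINE = 8
CUS_PER_DUAL_CU = 2
STREAM_PROCESSORS_PER_CU = 64

NAVI21_COMPUTE_UNITS = 80  # RX 6900 XT / RX 6950 XT

compute_units = SHADER_ENGINES * DUAL_CUS_PER_SHADER_ENGINE * CUS_PER_DUAL_CU
stream_processors = compute_units * STREAM_PROCESSORS_PER_CU
navi21_stream_processors = NAVI21_COMPUTE_UNITS * STREAM_PROCESSORS_PER_CU
increase_pct = (compute_units / NAVI21_COMPUTE_UNITS - 1) * 100

print(f"{compute_units} CUs, {stream_processors} stream processors "
      f"(vs {navi21_stream_processors}), +{increase_pct:.0f}%")
```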
To maximize performance per watt, AMD has decoupled the clock speeds of the shaders and the front end in Navi 31. According to the manufacturer, a high front-end clock speed is important for performance, while the shaders themselves can run a little slower without degrading performance. On the RX 7900 XTX, the shaders run at up to 2300MHz, while the front end of the chip runs at 2500MHz. AMD says this strikes the best balance: clock speeds are effectively 15 percent higher than on the previous generation, while energy consumption drops by 25 percent.
The compute unit on the drawing board
Higher figures look especially nice in a specification table, but AMD says it has also focused on extracting as much computing power as possible per transistor. To that end, the manufacturer went back to the drawing board with its compute unit and made a series of tweaks and improvements to make RDNA3 more efficient than its predecessor.
The compute unit in RDNA3 features dual-issue SIMD units, broadly the same adjustment that Nvidia made with its Ampere architecture. Each stream processor can now work simultaneously on a floating-point, integer, or AI task. This doubles the theoretical computing power, but unlike Nvidia with its CUDA cores, AMD does not double the stated number of computing cores as a result. Also new are the AI accelerators, two per compute unit. These are expected to be used for FSR3, the counterpart of DLSS3 that generates intermediate frames instead of rendering them.
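Why dual issue doubles the theoretical figure without changing the shader count can be shown with the usual peak-throughput formula: stream processors × FLOPs per clock × clock speed, where one fused multiply-add counts as two FLOPs. As an assumption, the example plugs in 2.5GHz, the front-end clock mentioned earlier; with the 2.3GHz shader clock instead, the result is roughly 56.5 TFLOPS. This is a theoretical peak only: the second issue slot helps only when two suitable instructions can be paired.

```python
def peak_fp32_tflops(stream_processors: int, clock_ghz: float, dual_issue: bool) -> float:
    """Theoretical FP32 peak in TFLOPS, assuming every cycle is fully used."""
    flops_per_clock = 2  # one fused multiply-add = one multiply + one add
    if dual_issue:
        flops_per_clock *= 2  # a second instruction issued in the same cycle
    # stream processors x FLOPs per clock x GHz gives GFLOPS; divide by 1000 for TFLOPS
    return stream_processors * flops_per_clock * clock_ghz / 1000


print(peak_fp32_tflops(6144, 2.5, dual_issue=False))  # single issue
print(peak_fp32_tflops(6144, 2.5, dual_issue=True))   # dual issue, same shader count
```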
New is the multidraw-indirect accelerator, which supports the graphics command processor by collecting and delivering MultiDraw instructions faster. AMD says this reduces the overhead of both the driver and the graphics API. Nvidia is also not the only manufacturer talking about out-of-order execution on GPUs this generation: AMD says it has developed a new synchronization mechanism that lets more tasks run in parallel in the pipeline, increasing shader efficiency.
RDNA3 can also process up to 50 percent more primitives and vertices per clock. This is achieved with improved native culling, in which the hardware determines which primitives, polygons, and other geometry in a frame will not be visible, so that they can be skipped. In addition, rasterization performance per clock is up to 50 percent higher, which should help especially at high resolutions.
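The principle of culling can be illustrated with the simplest software variant: discarding triangles that face away from the camera or have zero area on screen, based on the sign of the triangle’s area in screen space. This is a generic textbook technique, not AMD’s hardware implementation, which is not detailed here; the function names and the convention that counter-clockwise triangles face the viewer (with the y axis pointing up) are assumptions.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]


def signed_area(tri: Triangle) -> float:
    """Half the 2D cross product of two edges; positive for counter-clockwise winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def cull(triangles: List[Triangle]) -> List[Triangle]:
    """Drop back-facing (clockwise) and degenerate (zero-area) triangles."""
    return [tri for tri in triangles if signed_area(tri) > 0.0]


front = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))       # counter-clockwise: kept
back = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))        # clockwise: culled
degenerate = ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))  # zero area: culled
print(cull([front, back, degenerate]))
```

Every triangle dropped this way is one the rasterizer and pixel shaders never have to touch, which is where the savings come from.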
To keep a GPU from waiting for data before it can get started, it helps to have the necessary data close at hand. That is the job of cache memory, but a balance must always be struck between the performance gained and the cost the cache entails.
Cache memory received a lot of attention in the development of RDNA3. The L0 and L1 caches are twice the size of those in Navi 21, and the L2 cache has also grown by 50 percent, to 6MB, as has the bandwidth between these cache levels. The 96MB Infinity Cache is smaller than the 128MB of the RX 6900 XT, but its bandwidth has more than doubled. Finally, the memory bus is 50 percent wider and the associated GDDR6 memory runs at a higher speed. The bottom line is a VRAM bandwidth of 960GB/s, and thanks to the Infinity Cache, the total effective bandwidth rises to no less than 3.5TB/s.
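The 960GB/s figure follows from bus width times per-pin data rate. The 384-bit bus is 50 percent wider than Navi 21’s 256-bit bus, and dividing 960GB/s by 48 bytes per transfer implies 20Gbit/s per pin; the 16Gbit/s used for the RX 6900 XT is that card’s memory speed. The Infinity Cache figure cannot be derived this way, since effective bandwidth depends on the cache hit rate.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak VRAM bandwidth in GB/s: pins x Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


navi31 = memory_bandwidth_gbs(384, 20.0)  # RX 7900 XTX: 384-bit, 20 Gbit/s implied by 960GB/s
navi21 = memory_bandwidth_gbs(256, 16.0)  # RX 6900 XT: 256-bit, 16 Gbit/s GDDR6
print(f"{navi31:.0f} GB/s vs {navi21:.0f} GB/s ({navi31 / navi21 - 1:.0%} more)")
```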
With RDNA2, AMD brought hardware acceleration for ray tracing to its video cards for the first time. AMD claims to achieve up to 80 percent better ray tracing performance with RDNA3 than it achieved on its RX 6000 series, thanks to various optimizations in both hardware and software. Much of the performance gain comes from the previously discussed culling, which skips areas not visible to the player during rendering. Improved cache structures allow more rays to be included in calculations.
According to AMD, all of the above improvements together result in 20 percent more performance per square millimeter and, per clock, 17.4 percent more performance than the previous generation.
Radeon RX 7900 XTX and RX 7900 XT
AMD’s reference versions of the RX 7900 XT and XTX are relatively compact, especially when compared to Nvidia’s RTX 4080 and 4090. The RX 7900 XT is 27.6 centimeters long, the XTX is 28.7 centimeters. That is shorter than the Founders Edition of the RTX 4080, which comes out at a length of 30.4 centimeters.
What is striking about the RDNA3 cards is their modern video outputs. The RX 7900 XT and XTX support the brand-new DisplayPort 2.1 standard and can output 12 bits per channel. According to AMD, the enormous 54Gbit/s bandwidth of DP 2.1 makes it possible to drive an 8k screen at up to 165Hz or a 4k screen at up to 480Hz with DSC. That is a huge step up from DisplayPort 1.4, which the RX 6000 series uses, as do Nvidia’s new RTX 40-series cards. With DP 1.4, the limit with compression is 8k at 60Hz and 4k at 300Hz.
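A rough calculation shows why DSC is needed for 4k at 480Hz even on DP 2.1. The sketch counts only active pixels at 10 bits per channel (30 bits per pixel) and ignores blanking intervals, which add some overhead in practice; the 3:1 DSC ratio (30 to 10 bits per pixel) is an illustrative assumption. DP 2.1 links use 128b/132b encoding, so the usable payload of a 54Gbit/s link is about 52.4Gbit/s.

```python
UHBR13_5_RAW_GBPS = 54.0  # 4 lanes x 13.5 Gbit/s
UHBR13_5_PAYLOAD_GBPS = UHBR13_5_RAW_GBPS * 128 / 132  # 128b/132b line coding


def pixel_data_rate_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: float) -> float:
    """Active-pixel data rate only; blanking intervals are ignored."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


uncompressed = pixel_data_rate_gbps(3840, 2160, 480, 30)  # 10 bits per channel
with_dsc = pixel_data_rate_gbps(3840, 2160, 480, 10)      # assumed 3:1 DSC

print(f"link payload: {UHBR13_5_PAYLOAD_GBPS:.1f} Gbit/s")
print(f"uncompressed: {uncompressed:.1f} Gbit/s, fits: {uncompressed <= UHBR13_5_PAYLOAD_GBPS}")
print(f"with DSC 3:1: {with_dsc:.1f} Gbit/s, fits: {with_dsc <= UHBR13_5_PAYLOAD_GBPS}")
```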
While the video outputs are modern, AMD has opted for the conventional eight-pin PEG connector for power, and both the RX 7900 XT and the XTX have two of them. Including the PCIe slot, that means the cards can theoretically draw up to 375 watts according to the specification. The stated total board power is somewhat lower, however: 355W for the RX 7900 XTX and 315W for the RX 7900 XT. The latter was specified at 300W at the announcement, but AMD has since indicated that during power tuning it found that an extra 15 watts yields a meaningful performance gain on this card.
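The 375W ceiling is simple addition under the PCI Express limits of 150W per eight-pin PEG connector and 75W for a x16 graphics slot. The snippet below repeats that sum and shows how much headroom the stated board powers leave; the helper name is our own.

```python
EIGHT_PIN_PEG_W = 150  # PCIe limit per eight-pin connector
PCIE_X16_SLOT_W = 75   # PCIe limit for a x16 graphics slot


def board_power_ceiling(eight_pin_connectors: int) -> int:
    """Maximum in-spec board power for a card with the given number of eight-pin connectors."""
    return eight_pin_connectors * EIGHT_PIN_PEG_W + PCIE_X16_SLOT_W


ceiling = board_power_ceiling(2)
for card, total_board_power in (("RX 7900 XTX", 355), ("RX 7900 XT", 315)):
    print(f"{card}: {total_board_power}W of {ceiling}W, headroom {ceiling - total_board_power}W")
```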
Like Nvidia, AMD is skipping PCIe 5.0 for now, so the RDNA3 cards use the same PCI Express generation as their predecessors. Given the Infinity Cache and the large amount of fast VRAM on the RX 7900 XT and XTX, this is unlikely to be a problem, and the bandwidth of PCIe 4.0 should suffice.
AMD has set the suggested retail prices of the RX 7900 XT and 7900 XTX for the Benelux at 1049 and 1159 euros respectively. Amounts above 1000 euros for high-end video cards, unfortunately, seem to be the new normal these days. Compared to Nvidia, the prices are not too bad. The GeForce manufacturer released its RTX 4080 with a suggested retail price of 1469 euros and the RTX 4090 was even set at 1949 euros. Those suggested retail prices have now been adjusted to 1399 and 1869 euros respectively, but the new Radeons are still cheaper.
|Video card||RX 7900 XTX||RX 7900 XT||RX 6950 XT|
|GPU||Navi 31||Navi 31||Navi 21|
|Transistors||57.7 billion||57.7 billion||26.8 billion|
|Manufacturing process||TSMC 5nm + 6nm||TSMC 5nm + 6nm||TSMC 7nm|
|Memory||24GB GDDR6||20GB GDDR6||16GB GDDR6|
|PCI Express generation||PCIe 4.0||PCIe 4.0||PCIe 4.0|
|Release date||December 13, 2022||December 13, 2022||May 10, 2022|
Our starting point in GPU tests is always that the rest of the system should form as small a bottleneck as possible. That’s why we upgraded our test system in recent weeks, after using an overclocked AMD Ryzen 9 5950X for a long time. Our new GPU test system is built around a water-cooled Intel Core i9-13900K, overclocked to 5.5GHz all-core. Because this processor’s E-cores cause performance degradation in some of our tests, we have disabled them completely, leaving only the eight performance cores with sixteen threads active. For working memory, we use 32GB of fast DDR5-7200 with relatively tight timings, so that the CPU and RAM limit the GPU measurements as little as possible.
The exact specifications of the test system can be found in the table below.
|Test system GPUs|
|Processor||Intel Core i9-13900K (p-cores @5.5GHz, e-cores disabled)|
|Motherboard||Gigabyte Aorus Z790 Master|
|Memory||G.Skill Trident Z RGB F5 32GB (2x 16GB) DDR5-7200 CL34-45-45-115|
|SSD||Silicon Power XS70 4TB|
|Power supply||FSP Hydro PTM Pro ATX3.0 1200W|
|Cooling||Alphacool Eisblock XPX, Alphacool XT45 480mm radiator, Alphacool D5 water pump, be quiet Pure Wings 2 fans|
|Test bench||Streacom BC1 V2 Benchtable|
|Operating system||Windows 11|
Tested video cards
For our GPU tests, we always use reference cards or Founders Editions, unless the GPU in question has not been released as such or we do not have one available. In that case, we use a custom model that is as close as possible to the reference specification in terms of clock speeds.
For this review we used the following models:
- Nvidia GeForce RTX 4090 Founders Edition
- Nvidia GeForce RTX 4080 Founders Edition
- Gigabyte GeForce RTX 3090 Ti Gaming OC
- Nvidia GeForce RTX 3090 Founders Edition
- AMD Radeon RX 6950 XT reference card
- Nvidia GeForce RTX 3080 Ti Founders Edition
- Nvidia GeForce RTX 3080 Founders Edition
- AMD Radeon RX 6800 XT reference card
- Nvidia GeForce RTX 2080 Ti Founders Edition
- Nvidia GeForce GTX 1080 Ti Founders Edition
- AMD Radeon RX 5700 XT reference card
Drivers and measurement method
We tested all video cards for this review with the latest driver available when we started testing. For the other AMD Radeon cards we used Adrenalin 22.11.2, and for the RX 7900 XT and 7900 XTX the Radeon press driver 22.40; for the Nvidia GeForce cards it was GeForce 526.98.
Using PresentMon, we measure performance in each tested game and calculate both the average frame rate (fps) and the 99th- and 99.9th-percentile frame times, which we report in milliseconds.
The graphs on the following pages show composite bars consisting of the 99th-percentile frame time converted to frames per second, labeled as the minimum frame rate, followed by the average frame rate. The graphs are sorted primarily by that second score, the average number of frames per second a video card can render. The frame times do not indicate the average frame rate but the negative outliers, which can make a game feel less smooth despite a good average.
The time it takes to render an image in a 3D game, and therefore in our benchmark, varies from frame to frame. Our frame-time measurement stores the render time of every individual frame. We then discard the 1 percent slowest frames. The highest render time among the remaining 99 percent, that is, the slowest remaining frame, is the 99th-percentile frame time.
At the request of some readers, we have also included the 99.9th-percentile values, for which we discard only the 0.1 percent slowest frames. In theory this is even more precise, but in practice incidental causes and measurement errors sometimes throw a spanner in the works. We have listed these values in the review for now, so keep that in mind when looking at these results.
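The percentile definition above translates directly into code. The sketch assumes a plain list of per-frame render times in milliseconds, such as can be extracted from a PresentMon capture. The function names are hypothetical, and both the way the number of discarded frames is rounded and computing the average as total frames divided by total time are our assumptions about details the text does not spell out.

```python
from typing import Sequence


def average_fps(frame_times_ms: Sequence[float]) -> float:
    """Frames rendered divided by total time, not the mean of per-frame fps."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def percentile_frame_time(frame_times_ms: Sequence[float], percentile: float) -> float:
    """Discard the slowest (100 - percentile)% of frames and return the slowest frame left."""
    ordered = sorted(frame_times_ms)
    discard = round(len(ordered) * (100.0 - percentile) / 100.0)
    keep = max(1, len(ordered) - discard)
    return ordered[keep - 1]


def ms_to_fps(frame_time_ms: float) -> float:
    """Convert a frame time to the equivalent frame rate (used for the 'minimum' bar)."""
    return 1000.0 / frame_time_ms


frames = [10.0] * 990 + [50.0] * 10  # hypothetical capture: 1000 frames, 10 hitches
p99 = percentile_frame_time(frames, 99.0)
p999 = percentile_frame_time(frames, 99.9)
print(f"average: {average_fps(frames):.1f} fps")
print(f"99th percentile: {p99:.1f} ms (minimum frame rate {ms_to_fps(p99):.0f} fps)")
print(f"99.9th percentile: {p999:.1f} ms")
```

In this example, the ten 50ms hitches fall entirely within the discarded 1 percent, so the 99th percentile reports 10ms while the 99.9th percentile catches a hitch at 50ms, which shows why the two figures can differ so much.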
We regularly review our selection of games, choosing them based on each game’s API, engine, genre, AMD/Nvidia ratio, age, and technical benchmark details to arrive at the most representative suite possible.
|Game||Release date||API||Engine|
|Assassin’s Creed: Valhalla||November 2020||DX12||Anvil Next 2.0|
|Call of Duty: Modern Warfare II||October 2022||DX12||IW 9.0|
|Cyberpunk 2077||December 2020||DX12||REDengine 4|
|Doom Eternal||March 2020||Vulkan||id Tech 7|
|F1 2022||July 2022||DX12||EGO Engine 4.0|
|Forza Horizon 5||November 2021||DX12||ForzaTech|
|Guardians of the Galaxy||October 2021||DX12||Dawn Engine|
|Metro Exodus (+Enhanced)||February 2019||DX12||4A Engine|
|Red Dead Redemption 2||November 2019||Vulkan||RAGE|
|Total War: Warhammer III||February 2022||DX11||TW Engine 3|
In addition to performance, we measure the power consumption of video cards. We measure current using a riser card from the manufacturer Adex, which we place between the PCIe slot and the video card. This way we can measure not only the current running through the separate power cables but also the power the video card draws directly from the PCIe slot. For the measurements, we use several Tinkerforge Voltage/Current Bricklet 2.0 units. For the slot measurement, the riser card is fitted with one such bricklet. To measure the current through the PEG cables, the bricklets were mounted together with the necessary connectors on a printed circuit board designed specifically for our test. Depending on the number of power cables a video card requires, we use the corresponding number of meters.
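Total board power is the sum of voltage times current over every measured supply path: the PCIe slot via the riser card plus each PEG cable. The sketch below assumes the readings have already been collected from the meters; the readings and the function name are hypothetical, and it deliberately does not use the Tinkerforge API itself. A real measurement would also include the slot’s 3.3V rail and average many samples over time.

```python
from typing import Iterable, Tuple

Reading = Tuple[str, float, float]  # (supply path, volts, amperes)


def total_board_power_w(readings: Iterable[Reading]) -> float:
    """Sum P = V * I over all measured supply paths."""
    return sum(volts * amps for _, volts, amps in readings)


# Hypothetical snapshot: PCIe slot 12V rail plus two eight-pin PEG cables
readings = [
    ("PCIe slot 12V", 12.0, 2.5),
    ("PEG cable 1", 12.0, 14.0),
    ("PEG cable 2", 12.0, 13.0),
]
print(f"{total_board_power_w(readings):.1f} W")
```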
3DMark and Unigine Superposition
In Time Spy, the RX 7900 XTX is almost 8 percent faster than the RTX 4080 and over 40 percent faster than the RX 6950 XT. The RX 7900 XT is 4 percent slower than the RTX 4080, but 25 percent faster than the RX 6950 XT. In Fire Strike Ultra, the RX 7900 XTX is 13.5 percent ahead of the RTX 4080, and in Fire Strike Extreme the RX 7900 XT even ties the RTX 4080.
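Percentages like these are relative differences, so they are not symmetric: if card A scores 8 percent higher than card B, card B is about 7.4 percent slower than card A, not 8 percent. A minimal sketch with hypothetical scores, assuming a higher score means a faster card:

```python
def percent_faster(score_a: float, score_b: float) -> float:
    """How much higher score_a is than score_b, in percent (negative = slower)."""
    return (score_a / score_b - 1.0) * 100.0


print(f"{percent_faster(108.0, 100.0):.1f}%")  # A relative to B
print(f"{percent_faster(100.0, 108.0):.1f}%")  # B relative to A
```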
3DMark Port Royal
Port Royal is 3DMark’s benchmark in which raytracing performance is measured. It is a synthetic test that produces both a score and an average frame rate.
In this test, the RX 7900 XTX ranks between the RTX 3090 Ti and RTX 4080, making it 55 percent faster than the RX 6950 XT. The RX 7900 XT ends up between the RTX 3090 and 3090 Ti, and is 35 percent faster than the RX 6950 XT here.
Superposition is a Unigine benchmark that can run in both OpenGL and DirectX 11. For our tests we use DX11 and two of the available graphics presets. In the 4k test, the RX 7900 XTX trails the RTX 4080 only narrowly, while one step lower, the RX 7900 XT ties the RTX 3090 Ti. At 1080p, the RX 7900 XTX doesn’t quite keep up with the RTX 4080, but the RX 7900 XT appears to be slightly faster than the RTX 3090 Ti.
Assassin’s Creed: Valhalla
In Valhalla, the RX 7900 XTX matches the RTX 4080 at 4k ultra. At lower settings and resolutions the news only gets better for AMD, as the new Radeons come out increasingly favorably. At 1440p ultra, the RX 7900 XT is also slightly faster than the RTX 4080. The RTX 4090, the flagship of Nvidia’s RTX 40 series, usually remains the fastest; only at 1080p medium is the RX 7900 XTX slightly faster.
Call of Duty: Modern Warfare II
We’ve seen AMD perform well in Modern Warfare II before, and the RDNA3 cards are no exception. At 4k extreme, the RX 7900 XT is already faster than the RTX 4080, and at lower resolutions the Radeons’ advantage quickly grows in this game as well. At 1440p ultra, the RX 7900 XTX outpaces the RTX 4090, and at 1080p the RX 7900 XT can also compete with this green opponent.
Cyberpunk 2077
The RX 7900 XTX is 10 percent faster than the RTX 4080 in Cyberpunk at 4k ultra, while the RX 7900 XT trails it by less than 5 percent. Notably, unlike in the games on the previous two pages, AMD does not perform relatively better at lower resolutions in this game. Nevertheless, the RDNA3 cards still achieve playable frame rates even at 4k ultra. With ray tracing enabled, the new Radeons fall back a bit further, to the level of the faster RTX 30-series cards. RDNA3’s ray-tracing improvements over RDNA2 are particularly noticeable at 4k.
Doom Eternal
In Doom at 4k ultra, the RX 7900 XTX is almost 3 percent slower than the RTX 4080, but 40 percent faster than the RX 6950 XT. At lower resolutions, the RDNA3 cards perform relatively better, with the RX 7900 XT in particular pulling further ahead of the RTX 3090 Ti.
Enabling ray tracing in Doom is less of a blow for the RX 7900 XTX and 7900 XT than for the RDNA2 cards. Nvidia’s performance remains better in this case and the new Radeons nestle between the top models of the previous RTX 30 series.
F1 2022
In F1 at 4k ultra, the RX 7900 XTX positions itself between the RTX 4090 and 4080, while the RX 7900 XT closes in on the latter. At 1440p and 1080p, the new AMD cards are at the top. With ray tracing enabled, AMD’s lead over Nvidia shrinks to more normal proportions, although the RDNA3 cards then perform considerably better than their predecessors.
Forza Horizon 5
In Forza Horizon 5, the performance of the RX 7900 XTX and 7900 XT is okay, but the RX 6000 series stands out in particular. The RDNA3 cards are less distinguishable from their predecessors, especially if we look at the ratios at lower resolutions and compare them to previously reviewed games. So compared to the RTX 40 series, the RX 7900 cards don’t do badly; the performance of the RX 6000 series was simply outstanding in this game.
Guardians of the Galaxy
With ray tracing enabled, the new Radeons outperform AMD’s previous flagship in this game by up to 58 percent. However, that gain is not enough to keep up with Nvidia’s RTX 40 series; the new Radeons even struggle to keep up with the fastest RTX 30-series cards. At medium settings without ray tracing, the balance is not too bad.
Metro Exodus
In Metro, the RX 7900 XTX is practically on par with the RTX 4080, while the RX 7900 XT holds its own against the RTX 3090 Ti. At 4k ultra, the RX 7900 XT and RX 7900 XTX are 22 and 42 percent faster than the RX 6950 XT, respectively. With ray tracing enabled, the Radeons fall back only slightly relative to the Nvidia cards, at least less than in Guardians of the Galaxy, for example.
Red Dead Redemption 2
Red Dead Redemption 2 is Rockstar’s well-known open-world game from late 2019 and runs on the Rockstar Advanced Game Engine. The game supports both DirectX 12 and Vulkan; because DX12 causes problems on our test system, we use Vulkan, which works without issues. For this test we use the game’s built-in benchmark, using only the last and longest scene.
The RX 7900 XTX can still compete well with the RTX 4080 at 4k, but at lower resolutions the GeForce card, strikingly, extends its lead. The RX 7900 XTX and RX 7900 XT also move closer together, while the RTX 4080 is a step above both at 1080p ultra.