Original Link: https://www.anandtech.com/show/11549/the-intel-kaby-lake-x-i7-7740x-and-i5-7640x-review-the-new-single-thread-champion-oc-to-5ghz



Intel’s direction for the high-end desktop space has taken an interesting turn. After several years of iterative updates, slowly increasing core counts and IPC, we have gotten used to the HEDT platform being at least one generation of microarchitecture behind the mainstream consumer processor families. There are many reasons for this, including enterprise requirements for long platform support as well as enterprise update cycles. For 2017, Intel is steering the ship in a slightly different direction, and launching the latest microarchitecture on the HEDT platform. These CPUs don’t feature the high core counts of the other HEDT parts, but they sit higher up the voltage/frequency curve, making them the fastest single-thread processors money can buy. They also overclock quite well.

The High-Level Product

Back at Computex, Intel announced its new high-end desktop (HEDT) platform code named Basin Falls. There are three elements to Basin Falls: the new X299 chipset which brings a number of enhancements from the more mainstream consumer line, a set of Skylake-X processors based on the Skylake-SP microarchitecture (and slightly different from the mainstream Skylake-S processors launched in 2015) from 6 cores to 18 cores, and a pair of Kaby Lake-X processors, reusing the Kaby Lake-S silicon launched earlier this year but with a bit more frequency and power consumption.

We covered all three parts in our launch pieces (linked below), but here is a quick recap.

Pushing Peak Performance

The main reasons an enthusiast invests in the HEDT platform usually revolve around multi-threaded performance, the need to support multiple graphics cards, and potentially 1.21 jiggabytes of DRAM (ed: you mean 128GB). Kaby Lake-X (KBL-X for short) doesn’t have any of the features we normally ascribe to HEDT: it only has four cores, not ten or higher; it only supports one GPU at full PCIe speed, up to two in SLI or three in Crossfire if you are willing to accept a lower link speed; and it only has dual-channel memory for a maximum of 64GB. So why is Intel bringing KBL-X to HEDT? The answer lies primarily in peak performance.

Users that have been in the enthusiast space for a while will have realized that the Holy Grail for PC performance is single-thread (ST) performance. If you master single-thread throughput, then arguably the rest, such as scaling out to more cores, is easier. There are usually three barriers to high ST performance: instructions per clock (IPC), frequency, and power consumption. A high IPC is fundamental to such a design, as any gains will propagate through the platform, but it is becoming a tough job. Over the last few generations, Intel has increased IPC by 3-10% each generation, making for a 30-45% increase since Sandy Bridge in 2010 depending on the benchmark, but we’re unlikely to see 50-100% jumps per generation any time soon. Any IPC gains are multiplied by the frequency at which the processor runs, which can be limited by a combination of things: production process (e.g. 14nm), voltage characteristics, stability, yield, etc. These factors tie directly into power consumption, which scales with the square of voltage and linearly with frequency and capacitance. With some designs, +10% frequency might be possible, but at the cost of +100% power, so there also needs to be a balance to have something marketable that people will want to buy.
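To put rough numbers on that voltage/frequency trade-off, dynamic power scales approximately as P ≈ C·V²·f, and higher frequencies typically require higher voltages. A minimal sketch, with invented voltage figures purely for illustration:

```python
# Rough dynamic power model: P ~ C * V^2 * f.
# The voltage values here are invented for illustration only.
def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage**2 * frequency

base = dynamic_power(1.0, 1.20, 4.5e9)  # 4.5 GHz at a notional 1.20 V
oc   = dynamic_power(1.0, 1.35, 5.0e9)  # 5.0 GHz needing, say, 1.35 V

print(f"frequency gain: {5.0/4.5 - 1:.0%}, power increase: {oc/base - 1:.0%}")
# ~11% more frequency costs ~41% more power, because voltage enters squared
```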

Kaby Lake-X pairs Intel’s latest x86 microarchitecture, with the highest IPC, with the highest frequency at which Intel has ever launched a consumer CPU, all within a reasonable power window. Users can overclock another 10% fairly easily, for a slight increase in power. Simply put, Kaby Lake-X is the highest single-thread performing x86 processor out of the box. This is a different take on HEDT compared to previous generations of HEDT CPUs.

With the extra single-thread performance, the chips would appear to appeal to traders who need fast systems, gamers where single-thread performance limits the frame rate (or minimum frame rates), or science/enterprise workloads where serial code is the limiting factor for simulations or professional applications.

The Core i7 and the Core i5

At heart, the new KBL-X processors are no different to their mainstream-platform KBL-S brethren. The silicon is the same, but potentially binned for a better voltage/frequency curve, and then packaged for the HEDT platform rather than the mainstream platform. Unlike the mainstream processor stack though, Intel is only launching two processors: the Core i7-7740X and the Core i5-7640X.

Intel Kaby Lake Processors

|  | Core i7-7740X | Core i7-7700K | Core i5-7640X | Core i5-7600K |
|---|---|---|---|---|
| Socket | LGA2066 | LGA1151 | LGA2066 | LGA1151 |
| Cores/Threads | 4/8 | 4/8 | 4/4 | 4/4 |
| Base Frequency | 4.3 GHz | 4.2 GHz | 4.0 GHz | 3.8 GHz |
| Turbo Frequency | 4.5 GHz | 4.5 GHz | 4.2 GHz | 4.2 GHz |
| TDP | 112 W | 91 W | 112 W | 91 W |
| L2 Cache | 256 KB/core | 256 KB/core | 256 KB/core | 256 KB/core |
| L3 Cache | 8 MB | 8 MB | 6 MB | 6 MB |
| DRAM Channels | 2 | 2 | 2 | 2 |
| DRAM Support | DDR4-2666 | DDR4-2400 | DDR4-2666 | DDR4-2400 |
| Graphics | None | HD 630 | None | HD 630 |
| Price (MSRP) | $350 | - | $250 | - |
| Price (7/21) | $349 | $309 | $248 | $239 |
| Launched | July 2017 | Jan 2017 | July 2017 | Jan 2017 |

Both CPUs are quad core, with the Core i7 having Hyperthreading for a total of eight threads. The Core i5 does not have hyperthreading, making it the first processor of the modern Core era to come to HEDT without it. Both will have identical support to their KBL-S siblings, although the increased base/turbo frequencies have resulted in Intel raising the TDP from 91W to 112W. The TDP rating is a guide for appropriate cooling: the KBL-S processors were actually very good on their power consumption at stock frequencies, and as shown later, so are the KBL-X processors. Intel could arguably have kept the 91W TDP rating here.

So given what was said in the previous section about peak performance, the Core i7 fits the bill: it has the highest frequencies, and supports hyperthreading for increased performance. This raises the question of why the Core i5 exists at all. There are a few guesses as to why:

  • For some enterprise users, hyperthreading is not needed
  • It is a cheaper entry point into the platform
  • It overclocks just as well to perform the same as the Core i7

Intel’s official line is about giving customers options. We will not be seeing a Core i3 on HEDT any time soon, though.

Kaby Lake-X does not come with a bundled cooler, but Intel is promoting its own TS13X closed-loop liquid cooler alongside the processors. This CLC has been on the market for a good number of years now.

Competition and Market

Because these new KBL-X parts are updates to the KBL-S family, Intel is in direct competition with itself. The Core i7-7740X is an alternative to the Core i7-7700K on the mainstream platform, being similar in frequency but carrying much higher platform costs.

Comparison: Intel Core i7-7740X vs Core i7-7700K

| Intel Core i7-7740X | Features | Intel Core i7-7700K |
|---|---|---|
| X299 | Platform | Z270 etc |
| LGA2066 | Socket | LGA1151 |
| 4 / 8 | Cores/Threads | 4 / 8 |
| 4.3 / 4.5 GHz | Base/Turbo | 4.2 / 4.5 GHz |
| 16 | PCIe 3.0 Lanes | 16 |
| 256 KB/core | L2 Cache | 256 KB/core |
| 8 MB | L3 Cache | 8 MB |
| 112W | TDP | 91W |
| $349 | Retail Price (7/21) | $309 |
| MSI X299 Raider: $230; ASUS Prime X299-A: $310; GIGABYTE X299-UD4: $249 | Cheap OC Motherboard Cost | GIGABYTE Z270-HD3: $114; MSI Z270 SLI PLUS: $136; ASUS TUF Z270-MK2: $120 |

Similar platform builds for the two put an extra $100-150 of cost on the motherboard, potentially limiting the usefulness of the new KBL-X parts on paper, as that extra cost for the build as a whole might not warrant another 100 MHz on the turbo frequency. The proof is in the pudding, which is why we are doing this review. The Core i5-7640X against the Core i5-7600K is in a similar situation.

However, there is also AMD to discuss. With the launch of the Ryzen family of processors, there are now significant comparisons to be made between what Intel offers and what AMD offers. On a price-for-price comparison, using MSRP prices (not sale prices or distributor prices), the Core i7-7740X squares off against AMD’s Ryzen 7 1700.

Comparison: Intel Core i7-7740X vs AMD Ryzen 7 1700

| Intel Core i7-7740X | Features | AMD Ryzen 7 1700 |
|---|---|---|
| X299 | Platform | X370, B350, A320 |
| LGA2066 | Socket | AM4 |
| 4 / 8 | Cores/Threads | 8 / 16 |
| 4.3 / 4.5 GHz | Base/Turbo | 3.0 / 3.7 GHz |
| 16 | PCIe 3.0 Lanes | 16 |
| 256 KB/core | L2 Cache | 512 KB/core |
| 8 MB | L3 Cache | 16 MB (Victim Cache) |
| 112W | TDP | 65W |
| $349 | Retail Price (7/21) | $270 |
| MSI X299 Raider: $230; ASUS Prime X299-A: $310; GIGABYTE X299-UD4: $249 | Cheap OC Motherboard Cost | ASUS Prime B350-Plus: $90; MSI B350 Tomahawk: $100; ASRock AB350 Pro4: $100 |

At this price, both sides have their merits. Intel wins on IPC, which is about 5-15% ahead clock-for-clock, and is also at a higher frequency, compounding the difference. In its favor, AMD wins on cores and threads, offering eight cores and sixteen threads where Intel only offers four cores and eight threads. Intel wins on IO and chipset, offering 24 PCIe 3.0 lanes for USB 3.1/SATA/Ethernet/storage, while AMD is limited on that front, having 8 PCIe 2.0 lanes from the chipset. Depending on the workload, each side has its strengths: AMD will state that it is geared towards more professional multi-threaded workloads, while Intel is geared towards serial code scenarios, which still dominate certain tasks. There’s also the platform cost, where motherboards are $150-$200 cheaper for AMD when looking at cheap overclockable systems.

For the Core i5-7640X, it sits at a similar price to AMD’s Ryzen 5 1600X.

Comparison: Intel Core i5-7640X vs AMD Ryzen 5 1600X

| Intel Core i5-7640X | Features | AMD Ryzen 5 1600X |
|---|---|---|
| X299 | Platform | X370, B350, A320 |
| LGA2066 | Socket | AM4 |
| 4 / 4 | Cores/Threads | 6 / 12 |
| 4.0 / 4.2 GHz | Base/Turbo | 3.6 / 4.0 GHz |
| 16 | PCIe 3.0 Lanes | 16 |
| 256 KB/core | L2 Cache | 512 KB/core |
| 6 MB | L3 Cache | 16 MB (Victim Cache) |
| 112W | TDP | 95W |
| $248 | Retail Price (7/21) | $230 |
| MSI X299 Raider: $230; ASUS Prime X299-A: $310; GIGABYTE X299-UD4: $249 | Cheap OC Motherboard Cost | ASUS Prime B350-Plus: $90; MSI B350 Tomahawk: $100; ASRock AB350 Pro4: $100 |

For this review, we have also included numbers for other processors. These include AMD’s Ryzen 7 1800X, which at $499 is an alternative due to the money saved by investing in the AM4 platform; the Skylake-X based Core i7-7800X, which has an MSRP not so far from the i7-7740X and opens up the platform with more PCIe lanes and two more cores, at the expense of single-thread performance; and finally the perennial Core i7-2600K from the Sandy Bridge era, for the benchmarks for which we have data.

Given everything said above, it would appear Intel has an uphill struggle to convince users that Kaby Lake-X is worth the investment. In this review we will test and see what the performance story is.

Please note that our AMD data is before AGESA 1.0.0.6. We are planning an article with an update of the latest AGESA soon.


Navigating the X299 Minefield: Kaby Lake-X Support

When building a platform, keeping it limited to one particular type of product makes it simple and easy to understand, at the expense of flexibility. Flexibility is something Intel and AMD have experimented with in the past, enabling users to stay on the same underlying platform and upgrade in future generations, but with X299, Intel is taking a large step forward in support. This is both a good and a bad thing, depending on how different the support for the different generations needs to be. In this context, Skylake-X and Kaby Lake-X are like chalk and cheese, which can trip up users new to building systems, and it has already caused some minor headaches for system builders and motherboard manufacturers.

To recap, the three elements of the Basin Falls platform launch were the motherboards/X299 chipset, the SKL-X processors and the KBL-X Processors.

X299: What Is It?

The X299 chipset supports the new Skylake-X and Kaby Lake-X processors, and like its Z170 and Z270 counterparts on the mainstream consumer line, it is basically a big PCIe switch. One of the issues with the older X99 chipset was its limited capability to drive many PCIe devices – this changes with the big-switch mentality on X299. On the far side of the DMI 3.0 link into the chipset (basically a PCIe 3.0 x4), the chipset offers up to 24 PCIe 3.0 lanes for network controllers, RAID controllers, USB 3.1 controllers, Thunderbolt controllers, SATA controllers, 10GbE controllers, audio cards, more PCIe slot support, special controllers, accelerators, and anything else that requires PCIe lanes in an x4, x2 or x1 link.

The total uplink is limited by the DMI 3.0 link, but there will be very few situations where this is saturated. There are a few limits to what support is available (some ports are restricted in what they can handle), and only three PCIe 3.0 x4 drives can use the in-built PCIe RAID, but this should satiate all but the most hardcore enthusiasts.
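For those wanting to put a number on that uplink, a quick back-of-the-envelope calculation shows why chipset RAID can hit the ceiling. The drive throughput figures here are illustrative, not any specific product:

```python
# PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding.
lane_gb_s = 8e9 * (128 / 130) / 8 / 1e9   # ~0.985 GB/s usable per lane
dmi_gb_s = 4 * lane_gb_s                  # DMI 3.0 is effectively an x4 link

# Two hypothetical NVMe drives in RAID-0 behind the chipset,
# each capable of ~2.5 GB/s sequential reads (illustrative numbers).
raid_demand_gb_s = 2 * 2.5

print(f"DMI 3.0 uplink: {dmi_gb_s:.2f} GB/s")      # ~3.94 GB/s
print(f"RAID-0 demand:  {raid_demand_gb_s:.1f} GB/s")  # ~5.0 GB/s
# The array can request more bandwidth than the uplink can carry.
```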

Skylake-X CPUs: Coming in Two Stages

The Skylake-X family of processors for Basin Falls comes in two stages, based on the way the processors are developed. Normally HEDT processors are cut-down versions of enterprise processors, usually through restricting certain functions, and the enterprise processors are typically derived from three different silicon layouts during manufacturing. Internally Intel calls these three layouts LCC (low core-count), HCC (high core-count) and XCC (extreme core-count), based on the maximum number of cores they support. Nominally Intel does not disclose which silicon layout it uses for which processors, though it is usually straightforward to work out as long as Intel has disclosed the configurations of the LCC/HCC/XCC dies. In this case, Intel has officially left everyone guessing, but the point here is that historically Intel only uses the LCC silicon from the enterprise line for its consumer desktop processors.

In previous generations, this meant either a 6, 8 or 10-core processor at the top of the stack for consumers, with lower core-count models provided by disabling cores. (This is a complex topic involving the quality of the manufacturing process and determining voltage/frequency in a process called binning. We’ve covered it before, but it’s something all manufacturers have to do to get good yields, as making processors isn’t a perfect process.) Each year we expected one of a few things from the top-end SKU: more frequency, less power, or more cores, and as such the march of progress has been predictable. If you had asked us two months ago, we would have fully expected Skylake-X to top out with LCC silicon at 10 or 12 cores, depending on how Intel was planning the manufacturing.

So the first element of Intel’s launch is the LCC processors, running up to 10 cores. We previously published that the LCC silicon was 12 cores, but we can now show it is 10 – more on that later. The three Skylake-X CPUs launching today are using LCC silicon with 6, 8 or 10 cores as the Core i7-7800X, Core i7-7820X and Core i9-7900X respectively. Intel is further separating these parts by adjusting the level of official supported DRAM frequency, as well as the PCIe lanes.

The second element of the Skylake-X launch is the one that has somewhat surprised most of the industry: the launch will contain four processors based on the HCC silicon. Technically these processors will not be out until Q4 this year (one SKU coming in August), and the fact that Intel did not have frequency numbers to share when announcing these parts shows that they were not finalized, calling into question when they were added to the roadmap (and whether they were a direct response to AMD announcing a 16-core part for this summer). We’ve written a detailed analysis in our launch coverage, but Intel is set to launch 12, 14, 16 and 18-core consumer-level processors later this year, with the top part carrying a tray price (when you buy 1k CPUs at a time) of $1999, so we expect retail to be nearer $2099.

It should be noted that due to a number of factors, the Skylake-X cores and the communication pathways therein are built slightly differently to the consumer version of Skylake-S, which is something discussed and analyzed in our Skylake-X review.

Kaby Lake-X: The Outliers

The final element of the Basin Falls launch is Kaby Lake-X, and it is the aspect of the platform that deviates most from previous generations. Intel’s HEDT line has historically been one generation behind the mainstream consumer platform, due to enterprise life cycles as well as the added difficulty of producing these larger chips. As a result, the enterprise and HEDT parts have never had the peak per-clock performance (IPC, instructions per clock) of the latest designs, and have sat in the wings, waiting. By bringing the Kaby Lake microarchitecture to HEDT in the form of a Core i7 and a Core i5, this changes the scene, albeit slightly.

Rather than bringing a new big core featuring the latest microarchitecture, Intel is repurposing the Kaby Lake-S mainstream consumer silicon, binning it to slightly more stringent requirements for frequency and power, disabling the integrated graphics, and then putting it in a package for the high-end desktop platform. There are still some significant limitations, such as having only 16 PCIe 3.0 lanes and dual channel memory which might exclude it from the traditional designation of being a true HEDT processor; however Intel has stated that these parts fill a request from customers to have the latest microarchitecture on the HEDT platform. They also overclock quite well, which is worth noting.

The Kaby Lake-X parts will consist of a Core i7 and Core i5, both of which are quad core parts, with the i7 supporting hyperthreading.

Problem Number 1: PCIe Layouts

Users can choose an X299 motherboard with a SKL-X processor, or an X299 motherboard with a KBL-X processor. Every X299 motherboard has to support both, and it is the level of support each processor needs that makes this a more difficult task than one might imagine. The obvious difference between the two is the number of PCIe lanes, and where they come from.

KBL-X processors have 16 PCIe 3.0 lanes from the processor, coming from a single PCIe root complex, and these can be bifurcated into x8/x8 or x8/x4/x4 depending on what the motherboard manufacturer wants to implement.

SKL-X processors have either 28 or 44 PCIe 3.0 lanes, depending on which model you buy, and these come from up to three PCIe x16 root complexes (some complexes are cut down to fit the lane count). Each one can still be bifurcated into x8/x4/x4, but typically one would expect one PCIe root complex to fill the first x16 slot, with the next x16 slot fed from the second root complex, filled at x8 on the 28-lane processors or x16 on the 44-lane processors. The PCIe lanes can also be routed off to support other features, such as storage or Ethernet controllers.
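To illustrate the routing problem this creates for board designers, here is a hypothetical slot-wiring table for the three lane counts; this is a sketch, not any specific vendor's layout:

```python
# Hypothetical slot wiring for a single X299 board design; real boards
# differ, and many use PCIe switches to re-route lanes per CPU.
SLOT_WIRING = {
    16: ("x16", "-", "-"),     # KBL-X: one GPU at full speed (or x8/x8)
    28: ("x16", "x8", "x4"),   # SKL-X, 28-lane models
    44: ("x16", "x16", "x8"),  # SKL-X, 44-lane models
}

for cpu_lanes, slots in SLOT_WIRING.items():
    print(f"{cpu_lanes}-lane CPU -> slots: {', '.join(slots)}")
```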

The issue here is that motherboard manufacturers have to design for all three different PCIe lane counts. It is very easy to design a motherboard for SKL-X where, when a user installs a KBL-X processor, half the features do not work. This is true for a lot of PCIe slots, and in order to manage it all, manufacturers have to equip the motherboards with PCIe switches to make sure everything is routed correctly for both CPUs; these switches add cost to the platform. If there were two different platforms, there would not be this added per-board cost (but vendors would have to build two boards instead, each one easier to design).


Chipset diagram of MSI's X299 XPower Gaming AC, their high-end motherboard

Aside from the PCIe slots, storage is also going to become an issue. With the previous generation of X99, we started to see M.2 PCIe storage hanging off the processor – guaranteeing uplink bandwidth, especially when RAID was used. On the new X299 platform, because of KBL-X support, most M.2 slots will be derived from the chipset, adding a small amount of latency and capping bandwidth at the chipset-to-CPU uplink of PCIe 3.0 x4. Two fast M.2 drives in RAID via the chipset will be limited by that connection. Motherboard vendors know this, but they also want as many features as possible supported in all situations, so finding a motherboard with a CPU-derived M.2 slot is going to be a rare thing indeed.

The solution to some of this is to have specific motherboards for each set of CPUs. Sure, both CPUs will still work in the motherboard, but when using the wrong family for the CPU design, functionality is going to be severely limited. ASUS has already posted some details about its new Extreme motherboard for X299, with the disclaimer ‘not designed for Kaby Lake-X CPUs’ because the features on board are aimed at SKL-X customers only. We might see more of this filtering through.

There’s also the DRAM: KBL-X is a dual-channel design, while SKL-X is quad-channel. On an eight-slot X299 motherboard, only four slots are operational with KBL-X, wasting board space. The primary four slots for KBL-X also differ from those for SKL-X, so a user switching CPUs may need to move DRAM around. If a user buys an X299 motherboard with only four slots, chances are only two will work with KBL-X.

One argument here is that a user can upgrade from KBL-X to SKL-X later, or to the beefier KBL-X CPUs launched next generation.

Problem Number 2: Power

Skylake and Kaby Lake are different x86 microarchitectures – the KBL core design was an ‘optimization’ pass on Skylake, picking off some low-hanging fruit and using an updated 14nm process to give better power consumption and a better voltage/frequency response from the silicon. There isn’t much drastic change in the cores, but there is in how power is delivered.

Skylake-X uses an integrated voltage regulator, or IVR. If you recognize the term, that is because Intel launched its Broadwell-based CPUs with a FIVR, or fully-integrated voltage regulator. Skylake-X does not go all-in like Broadwell did, but for some of the voltage inputs to the CPU, the processor takes in a single voltage and splits it internally, rather than relying on the external circuitry of the motherboard to do so. This affords some benefits, such as consistency in voltage delivery and, to a certain extent, some power efficiency gains, and it should simplify the motherboard design - unless you also have to design for non-IVR CPUs, like Kaby Lake-X.

Kaby Lake-X has a more typical power delivery design, with all the required voltages supplied by the motherboard. That means the motherboard has to support both types of voltage delivery, and also adjust itself at POST if a different CPU has been placed in the socket. This check adds to the boot time, but it is required, as some voltages have to be moved around: too high a voltage can kill a processor. We’ve already killed one.

Specifically, the VRIN voltage on Skylake-X needs to be a 1.8V input into the processor for the IVR to work. The same rail on Kaby Lake-X needs to be 1.1 volts for VCCIO. If the motherboard was originally running a SKL-X processor and does not detect that a KBL-X processor is now installed, it will supply 1.8 volts into the KBL-X rail and the chip will soon die.

When we received samples for SKL-X and KBL-X, we were told by our motherboard partners that when switching between the two CPUs, we would have to flush the BIOS. This involves removing AC power when the system is switched off, and holding the Clear CMOS button for 30-60 seconds to drain the capacitors and essentially reset the BIOS to defaults, so it could then detect which CPU was installed before applying full voltages.

We did this, and still ended up with a dead Kaby Lake i7-7740X. There is now a lump of sand in my CPU box. The interesting thing is that this CPU did not die instantly: we started SYSmark, which involves several reboots during the initial preparation phase. On about the fourth reboot, the system got stuck at BIOS code 0d. Nothing I did was able to get beyond this code. I put in our Kaby i5 and that ran fine; I put in SKL-X and that ran fine; I put the Kaby i5 back in and it ran benchmarks fine. It would appear that our initial Kaby i7 did not have much headroom, and we had to get a replacement for some of the benchmarks.

Incidentally, we also had an i9-7900X die on us. That seems to be unrelated to this issue.

So The Solution?

Motherboard manufacturers have told us that there may be chip-specific motherboards out there in the future. But as it stands, users looking at KBL-X would save a lot of money (and headache) by staying with Z270, as the motherboards are cheaper and more streamlined for a Kaby Lake design. Users looking at the top Skylake-X CPUs have nothing to worry about – unless they really want PCIe storage direct from the CPU. In that case the user will have to find the one or two motherboards that support it, or invest in a PCIe-to-M.2 riser card and enable it that way – as long as the card goes into a CPU-based PCIe slot.

So why even bother testing KBL-X if the platform situation comes across so downbeat? KBL-X still warrants testing as the highest-frequency processor Intel has released on its latest CPU microarchitecture. As mentioned above, overclocking on KBL-X seems very good, and some users will require the peak single-thread performance possible. The argument is more that some of these issues complicate the platform, reducing accessibility for new builders and causing extra work/time/headaches for motherboard manufacturers and system builders. The issues above are not a significant barrier for any user willing to put in some time to ensure what they buy is suited for their workload.



Test Bed and Setup

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry, who might want to shave a cent or two from the cost or stay within the margins set by the manufacturer. Where possible, we will extend our testing to include faster memory modules, either at the same time as the review or at a later date.

| Test Setup | |
|---|---|
| Processor | Intel Core i7-7740X (4C/8T, 112W, 4.3 GHz); Intel Core i5-7640X (4C/4T, 112W, 4.0 GHz) |
| Motherboards | ASRock X299 Taichi; MSI X299 Gaming Pro Carbon; GIGABYTE X299 Gaming 9 |
| Cooling | Thermalright TRUE Copper; Silverstone AR10-115XS (for LGA1151) |
| Power Supply | Corsair AX760i PSU; Corsair AX1200i Platinum PSU |
| Memory | Corsair Vengeance Pro DDR4-2666 2x8 GB |
| Video Cards | MSI GTX 1080 Gaming 8GB; ASUS GTX 1060 Strix; Sapphire R9 Fury 4GB; Sapphire RX 480 8GB; Sapphire RX 460 2GB |
| Hard Drive | Crucial MX200 1TB |
| Optical Drive | LG GH22NS50 |
| Case | Open Test Bed |
| Operating System | Windows 10 Pro 64-bit |

Many thanks to...

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Thank you to Sapphire for providing us with several of their AMD GPUs. We met with Sapphire back at Computex 2016 and discussed a platform for our future testing on AMD GPUs with their hardware for several upcoming projects. As a result, they were able to sample us the latest silicon that AMD has to offer. At the top of the list was a pair of Sapphire Nitro R9 Fury 4GB GPUs, based on the first generation of HBM technology and AMD’s Fiji platform. As the first consumer GPU to use HBM, the R9 Fury is a key moment in graphics history, and these Nitro cards come with 3584 SPs running at 1050 MHz on the GPU, with 4GB of 4096-bit HBM memory at 1000 MHz.

Further Reading: AnandTech’s Sapphire Nitro R9 Fury Review

Following the Fury, Sapphire also supplied a pair of their latest Nitro RX 480 8GB cards to represent AMD’s current performance silicon on 14nm (as of March 2017). The move to 14nm yielded significant power consumption improvements for AMD, which combined with the latest version of GCN helped bring the target of a VR-ready graphics card as close to $200 as possible. The Sapphire Nitro RX 480 8GB OC graphics card is designed to be a premium member of the RX 480 family, having a full set of 8GB of GDDR5 memory at 6 Gbps with 2304 SPs at 1208/1342 MHz engine clocks.

Further Reading: AnandTech’s AMD RX 480 Review

With the R9 Fury and RX 480 assigned to our gaming tests, Sapphire also passed on a pair of RX 460s to be used as our CPU testing cards. The amount of GPU power available can have a direct effect on CPU performance, especially if the CPU has to spend all its time dealing with the GPU display. The RX 460 is a nice card to have here, as it is powerful yet low on power consumption and does not require any additional power connectors. The Sapphire Nitro RX 460 2GB still follows on from the Nitro philosophy, and in this case is designed to provide power at a low price point. Its 896 SPs run at 1090/1216 MHz frequencies, and it is paired with 2GB of GDDR5 at an effective 7000 MHz.

We must also say thank you to MSI for providing us with their GTX 1080 Gaming X 8GB GPUs. Despite the size of AnandTech, securing high-end graphics cards for CPU gaming tests is rather difficult. MSI stepped up to the plate in good fashion and high spirits with a pair of their high-end graphics cards. The MSI GTX 1080 Gaming X 8GB graphics card is their premium air-cooled product, sitting below the water-cooled Seahawk but above the Aero and Armor versions. The card is large with twin Torx fans, a custom PCB design, Zero-Frozr technology, enhanced PWM and a big backplate to assist with cooling. The card uses a GP104-400 silicon die from a 16nm TSMC process, contains 2560 CUDA cores, and can run up to 1847 MHz in OC mode (or 1607-1733 MHz in Silent mode). The memory interface is 8GB of GDDR5X, running at 10010 MHz. For a good amount of time, the GTX 1080 was king of the hill.

Further Reading: AnandTech’s NVIDIA GTX 1080 Founders Edition Review

Thank you to ASUS for providing us with their GTX 1060 6GB Strix GPU. To complete the high/low cases for both AMD and NVIDIA GPUs, we looked towards the GTX 1060 6GB cards to balance price and performance while giving a hefty crack at >1080p gaming in a single graphics card. ASUS offered a hand here, supplying a Strix variant of the GTX 1060. This card is even longer than our GTX 1080, with three fans and LEDs crammed under the hood. STRIX is now ASUS’ lower cost gaming brand behind ROG, and the Strix 1060 sits at nearly half a 1080, with 1280 CUDA cores but running at 1506 MHz base frequency up to 1746 MHz in OC mode. The 6 GB of GDDR5 runs at a healthy 8008 MHz across a 192-bit memory interface.

Further Reading: AnandTech’s ASUS GTX 1060 6GB STRIX Review

Thank you to Crucial for providing us with MX200 SSDs. Crucial stepped up to the plate as our benchmark list grows larger with newer benchmarks and titles, and the 1TB MX200 units are strong performers. Based on Marvell's 88SS9189 controller and using Micron's 16nm 128Gbit MLC flash, these are 7mm high, 2.5-inch drives rated for 100K random read IOPs and 555/500 MB/s sequential read and write speeds. The 1TB models we are using here support TCG Opal 2.0 and IEEE-1667 (eDrive) encryption and have a 320TB rated endurance with a three-year warranty.

Further Reading: AnandTech's Crucial MX200 (250 GB, 500 GB & 1TB) Review

Thank you to Corsair for providing us with an AX1200i PSU. The AX1200i was the first power supply to offer digital control and management via Corsair's Link system, but under the hood it commands a 1200W rating at 50C with 80 PLUS Platinum certification. This allows for a minimum 89-92% efficiency at 115V and 90-94% at 230V. The AX1200i is completely modular, running the larger 200mm design, with a dual ball bearing 140mm fan to assist high-performance use. The AX1200i is designed to be a workhorse, with up to 8 PCIe connectors for suitable four-way GPU setups. The AX1200i also comes with a Zero RPM mode for the fan, which due to the design allows the fan to be switched off when the power supply is under 30% load.

Further Reading: AnandTech's Corsair AX1500i Power Supply Review

Thank you to G.Skill for providing us with memory. G.Skill has been a long-time supporter of AnandTech over the years, for testing beyond our CPU and motherboard memory reviews. We've reported on their high capacity and high-frequency kits, and every year at Computex G.Skill holds a world overclocking tournament with liquid nitrogen right on the show floor.

Further Reading: AnandTech's Memory Scaling on Haswell Review, with G.Skill DDR3-3000



2017 CPU Benchmarking

For our review, we are implementing our fresh CPU testing benchmark suite, using new scripts developed specifically for this testing. This means that with a fresh OS install, we can configure the OS to be more consistent, install the new benchmarks, maintain version consistency without random updates, and start running the tests in under 5 minutes. After that, it's a one-button press to start an 8-10 hour test (with a high-performance CPU) covering nearly 100 relevant data points in the benchmarks given below. The tests cover a wide range of segments, some of which will be familiar, while others are new to benchmarking in general but still highly relevant for the markets they come from.
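Our scripts are in-house, but the skeleton of a one-button harness like this is simple enough to sketch. The benchmark commands below are placeholders rather than our actual suite:

```python
import csv
import subprocess
import time

# Placeholder benchmark commands; the real suite wraps ~100 tests,
# each with its own result parser and per-test settings.
BENCHMARKS = {
    "7zip": ["7z", "b"],
    "cinebench_r15_mt": ["cinebench.exe", "-cb_cpux"],  # hypothetical path/flags
}

def run_all(outfile="results.csv"):
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["benchmark", "seconds", "returncode"])
        for name, cmd in BENCHMARKS.items():
            start = time.perf_counter()
            proc = subprocess.run(cmd, capture_output=True, text=True)
            writer.writerow([name, time.perf_counter() - start, proc.returncode])

if __name__ == "__main__":
    run_all()
```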

Our new CPU tests go through six main areas. We cover the Web (we've got an un-updateable version of Chrome 56), general system tests (opening tricky PDFs, emulation, brain simulation, AI, 2D image to 3D model conversion), rendering (ray tracing, modeling), encoding (compression, AES, h264 and HEVC), office based tests (PCMark and others), and our legacy tests, throwbacks from another generation of bad code but interesting to compare.

Our graphs typically list CPUs with microarchitecture, SKU name, cost and power. The cost will be one of two numbers: either the 1k-unit 'tray price' for when a business customer purchases 1000 CPUs, or the MSRP likely to be found at retail. The problem here is that neither Intel nor AMD is consistent: Intel has a tray price for every CPU, but an MSRP only for parts sold at retail. AMD typically quotes MSRP for CPUs at retail, tray prices for enterprise CPUs, and doesn't say much about OEM-only parts. We try to find a balance here, so prices may be $10-$20 off what you might expect.

A side note on OS preparation. As we're using Windows 10, there's a large opportunity for something to come in and disrupt our testing. So our default strategy is multi-pronged: disable the ability to update as much as possible, disable Windows Defender, uninstall OneDrive, disable Cortana as much as possible, enable the high-performance mode in the power options, and disable the internal platform clock, which can drift if the base frequency drifts and leave the timing inaccurate.

Web Tests on Chrome 56

Sunspider 1.0.2
Mozilla Kraken 1.1
Google Octane 2.0
WebXPRT15

System Tests

PDF Opening
FCAT
3DPM v2.1
Dolphin v5.0
DigiCortex v1.20
Agisoft PhotoScan v1.0

Rendering Tests

Corona 1.3
Blender 2.78
LuxMark CPU C++
LuxMark CPU OpenCL
POV-Ray 3.7.1b4
Cinebench R15 ST
Cinebench R15 MT

Encoding Tests

7-Zip 9.2
WinRAR 5.40
AES Encoding (TrueCrypt 7.2)
HandBrake v1.0.2 x264 LQ
HandBrake v1.0.2 x264-HQ
HandBrake v1.0.2 HEVC-4K

Office / Professional

PCMark8
Chromium Compile (v56)
SYSmark 2014 SE

Legacy Tests

3DPM v1 ST / MT
x264 HD 3 Pass 1, Pass 2
Cinebench R11.5 ST / MT
Cinebench R10 ST / MT

A side note: a couple of benchmarks (LuxMark) were not giving consistently good data during testing. We need to go back and re-work this part of our testing.

2017 CPU Gaming Tests

For our new set of GPU tests, we wanted to think big. There are a lot of users in the ecosystem that prioritize gaming above all else, especially when it comes to choosing the correct CPU. If there's a chance to save $50 and get a better graphics card for no loss in performance, then this is the route that gamers would prefer to tread. The angle here though is tough - lots of games have different requirements and cause different stresses on a system, with various graphics cards having different reactions to the code flow of a game. Then users also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, let us add in DirectX 12 titles that make it easier to use more CPU cores in a game to enhance fidelity.

Our original list of nine games planned in February quickly became six, due to the lack of professional-grade controls in Ubisoft titles. If you want to see For Honor, Steep or Ghost Recon: Wildlands benchmarked on AnandTech, point Ubisoft Annecy or Ubisoft Montreal in my direction. While these games have in-game benchmarks worth using, they unfortunately do not provide enough frame-by-frame detail to the end user, despite using it internally to produce the data the user eventually sees (and it typically ends up obfuscated by another layer as well). I would perhaps choose to automate these benchmarks via scripted inputs instead; however, the extremely variable loading times are a strong barrier to this.

So we have the following benchmarks as part of our 4/2 script, automated to the point of a one-button run and out pops the results four hours later, per GPU. Also listed are the resolutions and settings used.

  • Civilization 6 (1080p Ultra, 4K Ultra)
  • Ashes of the Singularity: Escalation* (1080p Extreme, 4K Extreme)
  • Shadow of Mordor (1080p Ultra, 4K Ultra)
  • Rise of the Tomb Raider #1 - GeoValley (1080p High, 4K Medium)
  • Rise of the Tomb Raider #2 - Prophets (1080p High, 4K Medium)
  • Rise of the Tomb Raider #3 - Mountain (1080p High, 4K Medium)
  • Rocket League (1080p Ultra, 4K Ultra)
  • Grand Theft Auto V (1080p Very High, 4K High)

For each of the GPUs in our testing, these games (at each resolution/setting combination) are run four times each, with outliers discarded. Average frame rates, 99th percentiles and 'Time Under x FPS' data are recorded, and the raw data is archived.
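For the curious, these derived metrics fall straight out of raw frame times. A minimal sketch of how average FPS, a 99th percentile figure, and 'Time Under x FPS' can be computed; this is illustrative, not our tooling's actual output format:

```python
import numpy as np

def frame_stats(frame_times_ms, fps_floor=60.0):
    """Derive summary metrics from raw per-frame render times."""
    ft = np.asarray(frame_times_ms, dtype=float)
    avg_fps = 1000.0 / ft.mean()
    # 99th percentile frame time -> the FPS sustained 99% of the time
    p99_fps = 1000.0 / np.percentile(ft, 99)
    # Total seconds spent on frames slower than the FPS floor
    slow = ft[ft > 1000.0 / fps_floor]
    time_under_s = slow.sum() / 1000.0
    return avg_fps, p99_fps, time_under_s

avg, p99, under = frame_stats([15.2, 16.1, 33.0, 14.8, 40.5], fps_floor=60)
print(f"avg {avg:.1f} FPS, 99th pct {p99:.1f} FPS, {under:.3f}s under 60 FPS")
```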

The four GPUs we've managed to obtain for these tests are:

  • MSI GTX 1080 Gaming X 8G
  • ASUS GTX 1060 Strix 6G
  • Sapphire Nitro R9 Fury 4GB
  • Sapphire Nitro RX 480 8GB

In our testing script, we save a couple of special things for the GTX 1080 here. The following tests are also added:

  • Civilization 6 (8K Ultra, 16K Lowest)

This benchmark, with a little coercion, can be run beyond the specifications of the monitor being used, allowing for 'future' testing of GPUs at 8K and 16K, with some amusing results. We are only running these tests on the GTX 1080, because there's no point watching a slideshow more than once.



Benchmarking Performance: CPU System Tests

Our first set of tests is our general system tests. This set of tests is meant to emulate what people usually do on a system, like opening large files or processing small stacks of data. This is a bit different to our office testing, which uses more industry-standard benchmarks, and a few of the benchmarks here are relatively new and different.

All of our benchmark results can also be found in our benchmark engine, Bench.

PDF Opening

First up is a self-penned test using a monstrous PDF we once received in advance of attending an event. While the PDF was only a single page, it had so many high-quality layers embedded that it was taking north of 15 seconds to open and to gain control on the mid-range notebook I was using at the time. This made it a great candidate for our 'let's open an obnoxious PDF' test. Here we use Adobe Reader DC, and disable all the update functionality within. The benchmark sets the screen to 1080p, opens the PDF in fit-to-screen mode, and measures the time from sending the command to open the PDF until it is fully displayed and the user can take control of the software again. The test is repeated ten times, and the average time taken. Results are in milliseconds.

System: PDF Opening with Adobe Reader DC

PDF opening is all about single thread frequency and IPC, giving the win to the new KBL-X chips.

FCAT Processing: link

One of the more interesting workloads that has crossed our desks in recent quarters is FCAT - the tool we use to measure stuttering in gaming due to dropped or runt frames. The FCAT process requires enabling a color-based overlay onto a game, recording the gameplay, and then parsing the video file through the analysis software. The software is mostly single-threaded, however because the video is basically in a raw format, the file size is large and requires moving a lot of data around. For our test, we take a 90-second clip of the Rise of the Tomb Raider benchmark running on a GTX 980 Ti at 1440p, which comes in around 21 GB, and measure the time it takes to process through the visual analysis tool.

System: FCAT Processing ROTR 1440p GTX1080 Data

FCAT similarly favors frequency and IPC. For this sort of workload, the Core i7 is the best chip to get.

3D Movement Algorithm Test v2.1: link

This is the latest version of the self-penned 3DPM benchmark. The goal of 3DPM is to simulate semi-optimized scientific algorithms taken directly from my doctorate thesis. Version 2.1 improves over 2.0 by passing the main particle structs by reference rather than by value, and by decreasing the number of double->float->double recasts the compiler was adding in. It affords a ~25% speed-up over v2.0, which means new data.
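The recast overhead is easy to reproduce in any language: bouncing data between single and double precision in a hot loop costs real time. A small numpy sketch of the effect (not the 3DPM code itself):

```python
import time
import numpy as np

data = np.random.rand(10_000_000)         # float64 'particles'

def with_recasts(a):
    # Force a float64 -> float32 -> float64 round trip on every step
    return (a.astype(np.float32) * np.float32(1.0001)).astype(np.float64)

def without_recasts(a):
    return a * 1.0001                     # stays float64 throughout

for fn in (with_recasts, without_recasts):
    start = time.perf_counter()
    for _ in range(10):
        fn(data)
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```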

System: 3D Particle Movement v2.1

As 3DPM expands across several threads, the new quad-core parts easily get trounced here by AMD's 8-core parts at the same price. The Core i7-7800X puts on a good showing, as per core, Intel's chips give a higher score.

DigiCortex v1.20: link

Despite being a couple of years old, the DigiCortex software is a pet project for the visualization of neuron and synapse activity in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron/1.8B synapse simulation. The results on the output are given as a fraction of whether the system can simulate in real-time, so anything above a value of one is suitable for real-time work. The benchmark offers a 'no firing synapse' mode, which in essence detects DRAM and bus speed, however we take the firing mode which adds CPU work with every firing.

System: DigiCortex 1.20 (32k Neuron, 1.8B Synapse)

DigiCortex likes a bit of everything: cores, threads, IPC, frequency, uncore frequency, and memory frequency. The Core i7 parts roughly double the Core i5s due to thread count, and the AMD Ryzen parts skip ahead as well, having double the threads of the Core i7.

Agisoft Photoscan 1.0: link

Photoscan stays in our benchmark suite from the previous version, however now we are running it on Windows 10, so features such as Speed Shift on the latest processors come into play. The concept of Photoscan is translating many 2D images into a 3D model - so the more detailed the images, and the more you have, the better the model. The algorithm has four stages, some single-threaded and some multi-threaded, along with some cache/memory dependency in there as well. For the more variably threaded workloads, features such as Speed Shift and XFR will be able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.

System: Agisoft Photoscan 1.0 Total Time

Agisoft is like a Formula 1 race circuit: the long fast straights and technical corners make it a nightmare to have the technology to be the best at both. Photoscan has enough serial code for high single-thread performance to take advantage, but also massively parallel sections where having 12-18 threads makes a difference. Despite having half the threads, the single-core performance of the Core i7-7740X makes it pull ahead of the Ryzen 7 chips, but when comparing the four threads of the Core i5-7640X to the twelve threads of the Ryzen 5 processors, having 12 threads wins.



Benchmarking Performance: CPU Rendering Tests

Rendering tests are a long-time favorite of reviewers and benchmarkers, as the code used by rendering packages is usually highly optimized to squeeze every little bit of performance out. Sometimes rendering programs end up being heavily memory dependent as well - when you have that many threads flying about with a ton of data, having low latency memory can be key to everything. Here we take a few of the usual rendering packages under Windows 10, as well as a few new interesting benchmarks.

All of our benchmark results can also be found in our benchmark engine, Bench.

Corona 1.3: link

Corona is a standalone package designed to assist software like 3ds Max and Maya with photorealism via ray tracing. It's simple - shoot rays, get pixels. OK, it's more complicated than that, but the benchmark renders a fixed scene six times and offers results in terms of time and rays per second. The official benchmark tables list user submitted results in terms of time, however I feel rays per second is a better metric (in general, scores where higher is better seem to be easier to explain anyway). Corona likes to pile on the threads, so the results end up being very staggered based on thread count.

Rendering: Corona Photorealism

More threads win the day, although the Core i7 does knock at the door of the Ryzen 5 (presumably with $110 in hand as well). It is worth noting that the Core i5-7640X and the older Core i7-2600K are on equal terms.

Blender 2.78: link

For a renderer that has been around for what seems like ages, Blender is still a highly popular tool. We managed to wrap a standard workload into the February 5 nightly build of Blender and measure the time it takes to render the first frame of the scene. Being one of the bigger open-source tools out there, both AMD and Intel work actively to help improve the codebase, for better or for worse on their own/each other's microarchitectures.

Rendering: Blender 2.78

Similar to Corona, more threads means a faster time.

LuxMark v3.1: Link

As a synthetic, LuxMark might come across as somewhat arbitrary as a renderer, given that it's mainly used to test GPUs, but it does offer both an OpenCL and a standard C++ mode. In this instance, aside from seeing the comparison in each coding mode for cores and IPC, we also get to see the difference in performance moving from a C++ based code-stack to an OpenCL one with a CPU as the main host.

Rendering: LuxMark CPU C++

Rendering: LuxMark CPU OpenCL

LuxMark is more thread and cache dependent, and so the Core i7 nips at the heels of the AMD parts that have double the threads. The Core i5 sits behind the Ryzen 5 parts though, due to the 1:3 thread difference.

POV-Ray 3.7.1b4: link

Another regular benchmark in most suites, POV-Ray is another ray-tracer but has been around for many years. It just so happens that during the run up to AMD's Ryzen launch, the code base started to get active again with developers making changes to the code and pushing out updates. Our version and benchmarking started just before that was happening, but given time we will see where the POV-Ray code ends up and adjust in due course.

Rendering: POV-Ray 3.7

Mirror Mirror on the wall...

Cinebench R15: link

The latest version of Cinebench has also become one of those 'used everywhere' benchmarks, particularly as an indicator of single-thread performance. High IPC and high frequency give performance in the ST test, whereas good scaling and many cores win out in the MT test.

Rendering: CineBench 15 SingleThreaded

Rendering: CineBench 15 MultiThreaded

Cinebench's single-threaded numbers make it clear who rules the roost, with scores almost reaching 200. The Core i7-2600K, due to its lack of newer instruction support, sits in the corner.



Benchmarking Performance: CPU Web Tests

One of the issues when running web-based tests is the tendency of modern browsers to automatically install updates. This means any sustained period of benchmarking will invariably fall foul of the 'it's updated beyond the state of comparison' rule, especially when browsers will update if you give them half a second to think about it. Despite this, we were able to find a series of commands to create an un-updatable version of Chrome 56 for our 2017 test suite. While this means we might not be on the bleeding edge of the latest browser, it makes the scores between CPUs comparable.

All of our benchmark results can also be found in our benchmark engine, Bench.

SunSpider 1.0.2: link

The oldest web-based benchmark in this portion of our test is SunSpider. This is a very basic JavaScript algorithm tool, and ends up being more a measure of IPC and latency than anything else, with most high-performance CPUs scoring roughly the same. The basic test is looped 10 times and the average taken. We run this 4 times.

Web: SunSpider on Chrome 56

SunSpider has a single threaded focus, and we see the Kaby Lake-X processors take their spots at the top of the graph.

Mozilla Kraken 1.1: link

Kraken is another JavaScript-based benchmark, using the same test harness as SunSpider, but focusing on more stringent real-world use cases and libraries, such as audio processing and image filters. Again, the basic test is looped ten times, and we run it four times.

Web: Mozilla Kraken 1.1 on Chrome 56

Kraken, too, relies on single-threaded IPC and frequency.

Google Octane 2.0: link

Like Mozilla, Google is a major browser developer, and having peak JS performance is typically a critical asset when comparing against the other OS developers. In the same way that SunSpider is a very early JS benchmark and Kraken is a bit newer, Octane aims to be more relevant to real workloads, especially on power-constrained devices such as smartphones and tablets.

Web: Google Octane 2.0 on Chrome 56

Octane is an interesting benchmark, requiring both cores and ST performance, but mostly the latter. It also seems that either Intel's design is optimized for the benchmark or vice versa, given the substantial difference in performance. There's no way for the benchmark to use all of the threads from AMD, nor the 12 threads of the Core i7-7800X, which has lower single-thread performance.

WebXPRT 2015: link

While the previous three benchmarks run calculations in the background and report a score, WebXPRT is designed to be a better interpretation of visual workloads that a professional user might have, such as browser-based applications, graphing, image editing, sort/analysis, scientific analysis and financial tools.

Web: WebXPRT 15 on Chrome 56

WebXPRT is a mix of ST and MT, but being web-based it still relies heavily on ST performance. Given the variable loading of the benchmark, Intel's newest features such as Speed Shift help keep it at the top.



Benchmarking Performance: CPU Encoding Tests

One of the interesting elements of modern processors is encoding performance. This includes encryption/decryption, as well as video transcoding from one video format to another. In the encrypt/decrypt scenario, this remains pertinent to on-the-fly encryption of sensitive data - a process more modern devices are leaning on for software security. Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who want to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.

All of our benchmark results can also be found in our benchmark engine, Bench.

7-Zip 9.2: link

One of the freeware compression tools that offers good scaling performance between processors is 7-Zip. It runs under an open-source licence, and is a fast, easy-to-use tool for power users. We run the benchmark mode via the command line for four loops and take the output score.
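The benchmark mode is invoked as `7z b` from the command line. A hedged sketch of the loop-and-parse step; the parsing assumes the 'Tot:' summary line format and may need adjusting between 7-Zip versions:

```python
import re
import subprocess

def run_7zip_bench(loops=4):
    """Run '7z b' several times and average the overall MIPS rating."""
    scores = []
    for _ in range(loops):
        out = subprocess.run(["7z", "b"], capture_output=True, text=True).stdout
        # The summary line looks like "Tot: <usage> <r/u> <rating>";
        # we take the final number as the overall rating.
        tot = [l for l in out.splitlines() if l.strip().startswith("Tot:")][-1]
        scores.append(int(re.findall(r"\d+", tot)[-1]))
    return sum(scores) / len(scores)

print(f"Average 7-Zip rating: {run_7zip_bench():.0f} MIPS")
```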

Encoding: 7-Zip

7z loves threads. 7z loves them.

WinRAR 5.40: link

For the 2017 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user-friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode, as we did with 7-Zip, here we take a set of files representative of a generic stack (33 video files totalling 1.37 GB, and 2834 smaller website files in 370 folders totalling 150 MB) of compressible and incompressible formats. The results shown are the time taken to encode the files. Due to DRAM caching, we run the test 10 times and take the average of the last five runs, when the benchmark is in a steady state.

Encoding: WinRAR 5.40

WinRAR is another benchmark like Agisoft, with some parts being serial and others multithreaded. When we compare the Core i7 to the Ryzen 7, the high ST performance helps push the Core i7 to the top despite the 2:1 thread deficit. On the other hand, the Core i5 has a 3:1 thread deficit to the Ryzen 5, and falls beneath it in the results.

AES Encoding

Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.
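A similar in-memory measurement can be approximated with any AES-NI-backed library. Below is a sketch using Python's cryptography package over 1GB of DRAM-resident data; note that our published numbers come from TrueCrypt's built-in benchmark, not this:

```python
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

data = os.urandom(1 << 30)                 # 1 GB of in-DRAM data
key, iv = os.urandom(32), os.urandom(16)

# AES-256 in CTR mode; AES-NI acceleration is used where available.
encryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()

start = time.perf_counter()
ciphertext = encryptor.update(data) + encryptor.finalize()
elapsed = time.perf_counter() - start

print(f"AES-256-CTR encryption: {1.0 / elapsed:.2f} GB/s")
```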

Encoding: AES

AES is an optimized problem for modern processors, so add frequency and cores to get a proportionally better result. Again, the Core i7-2600K and the Core i5-7640X are almost neck-and-neck.

HandBrake v1.0.2 H264 and HEVC: link

As mentioned above, video transcoding (both encode and decode) is a hot topic in performance metrics, as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file size, trade quality for file size, or all of the above. Alongside Google's favorite codec, VP9, two others have taken hold: H264, the older codec, is practically everywhere and designed to be optimized for 1080p video, while HEVC (or H265) aims to provide the same quality as H264 at a lower file size (or better quality at the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content.

Handbrake is a favored tool for transcoding, and so our test regime covers three areas.

Low Quality/Resolution H264: Here we transcode a 640x266 H264 rip of a two-hour film, and change the encoding from Main profile to High profile, using the very-fast preset.
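For reference, the rough command-line equivalent of this test through HandBrakeCLI looks like the following; the file names are placeholders and exact flag spellings may vary by HandBrake version:

```python
import subprocess

# Approximate HandBrakeCLI equivalent of our LQ test: re-encode an H264
# file from Main to High profile with the very-fast x264 preset.
cmd = [
    "HandBrakeCLI",
    "-i", "film_640x266.mp4",        # placeholder input path
    "-o", "film_high_profile.mp4",   # placeholder output path
    "--encoder", "x264",
    "--encoder-preset", "veryfast",
    "--encoder-profile", "high",
]
subprocess.run(cmd, check=True)
```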

Encoding: Handbrake H264 (LQ)

High Quality/Resolution H264: A similar test, but this time we take a ten-minute double 4K (3840x4320) file running at 60 Hz and transcode from Main to High, using the very-fast preset.

Encoding: Handbrake H264 (HQ)

HEVC Test: Using the same video in HQ, we change the resolution and codec of the original video from 4K60 in H264 into 4K60 HEVC.

Encoding: Handbrake HEVC (4K)



Benchmarking Performance: CPU Office Tests

The office programs we use for benchmarking aren't specific programs per se, but industry-standard tests that hold weight with professionals. The goal of these tests is to use an array of software and techniques that a typical office user might encounter, such as video conferencing, document editing, architectural modeling, and so on and so forth.

All of our benchmark results can also be found in our benchmark engine, Bench.

Chromium Compile (v56)

Our new compilation test uses Windows 10 Pro, VS Community 2015.3 with the Win10 SDK to compile a nightly build of Chromium. We've fixed the test to a build from late March 2017, and we run a fresh full compile in our test. Compilation is the typical example of a variable threaded workload - some of the compiling and linking is linear, whereas other parts are multithreaded.
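For context, a Chromium build boils down to generating ninja build files and timing the compile of the 'chrome' target. A rough sketch of the sequence (the output directory name is arbitrary, and the 2017-era toolchain setup steps are omitted):

```python
import subprocess
import time

# Generate ninja build files, then time a full build of the 'chrome' target.
# 'out/Default' is an arbitrary output directory name.
subprocess.run(["gn", "gen", "out/Default"], check=True)

start = time.perf_counter()
subprocess.run(["ninja", "-C", "out/Default", "chrome"], check=True)
print(f"Full compile: {(time.perf_counter() - start) / 60:.1f} minutes")
```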

Office: Chromium Compile (v56)

Our Chromium Compile test is a mix of loads, but also loves L3 cache. We've seen before that the L3 victim cache on AMD can be a deficit here, but even then, the Core i5 cannot overcome the 3:1 thread deficit to the Ryzen 5 CPUs. The Core i7-7740X hits the nail on the head for threads and single thread performance, although users that play in this space would look straight to the Core i7-7800X, and likely decide that +16.5% better performance is worth the +18.2% extra cost.

PCMark8: link

Despite its age, Futuremark has maintained PCMark8 to remain relevant in 2017. On the scale of complicated tasks, PCMark focuses more on the low-to-mid range of professional workloads, making it a good indicator of what people consider 'office' work. We run the benchmark from the command line in 'conventional' mode, meaning C++ over OpenCL, to remove the graphics card from the equation and focus purely on the CPU. PCMark8 offers Home, Work and Creative workloads, with some software tests shared and others unique to each benchmark set.

Office: PCMark8 Creative (non-OpenCL)

Office: PCMark8 Home (non-OpenCL)

Office: PCMark8 Work (non-OpenCL)

SYSmark 2014 SE: link

SYSmark is developed by BAPCo, a consortium of industry CPU companies. The goal of SYSmark is to take stripped-down versions of popular software, such as Photoshop and OneNote, and measure how long it takes to process certain tasks within that software. The end result is a score for each of the three segments (Office, Media, Data) as well as an overall score. A reference system (Core i3-6100, 4GB DDR3, 256GB SSD, integrated HD 530 graphics) is used to provide a baseline score of 1000 in each test.

A note on context for these numbers: AMD left BAPCo within the last two years due to differences of opinion on how the benchmarking suites were chosen. AMD believed the tests were angled towards Intel processors, with optimizations showing bigger differences than AMD felt were actually present. The following benchmarks are provided as data, with the conflict of opinion between the two companies on the validity of the benchmark offered as context for the numbers.

Office: SYSMark 2014 SE (Office)

Office: SYSMark 2014 SE (Media)

Office: SYSMark 2014 SE (Data)

Office: SYSMark 2014 SE (Responsiveness)

Office: SYSMark 2014 SE (Overall)



Benchmarking Performance: CPU Legacy Tests

Our legacy tests represent benchmarks that were once at the height of their popularity. Some of these are industry standard synthetics, and we have data going back over 10 years. All of the data here has been rerun on Windows 10, and we plan to go back several generations of components to see how performance has evolved.

All of our benchmark results can also be found in our benchmark engine, Bench.

3D Particle Movement v1

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single-thread version, whereas the multithreaded version has to handle the threads and loves more cores. This is the original version, written in the style of a typical non-computer-science student coding up an algorithm for their theoretical problem, without any non-obvious optimizations beyond what the compiler performs, and without avoiding pitfalls such as false sharing.
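To give a flavor of what 3DPM does, here is a minimal Python sketch of the same idea, assuming random steps on a unit sphere; it is illustrative only and shares no code with the actual benchmark. The single-thread pass stresses FP throughput and IPC, while the multiprocessing pass scales with cores.

```python
import math
import random
import time
from multiprocessing import Pool

STEPS, PARTICLES = 10_000, 1_000  # illustrative sizes, not the benchmark's

def walk(seed):
    """Move one particle through STEPS random 3D steps and return its drift."""
    rng = random.Random(seed)
    x = y = z = 0.0
    for _ in range(STEPS):
        # Pick a random direction on the unit sphere and take one step.
        theta = rng.uniform(0.0, math.pi)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x += math.sin(theta) * math.cos(phi)
        y += math.sin(theta) * math.sin(phi)
        z += math.cos(theta)
    return x + y + z

if __name__ == "__main__":
    start = time.perf_counter()
    sum(walk(i) for i in range(PARTICLES))   # single-threaded pass
    st = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:                      # one worker per core
        pool.map(walk, range(PARTICLES))
    mt = time.perf_counter() - start
    print(f"single: {st:.2f} s  multi: {mt:.2f} s")
```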

Legacy: 3DPM v1 Single Threaded

Legacy: 3DPM v1 MultiThreaded

CineBench 11.5 and 10

Cinebench is a widely known benchmarking tool for measuring performance relative to MAXON's animation software Cinema 4D. Cinebench has been optimized over a decade and focuses purely on CPU horsepower, meaning that if there is a discrepancy in pure throughput characteristics, Cinebench is likely to show it. Arguably other software doesn't make use of all the tools available, so the real-world relevance might be purely academic, but given our large database of Cinebench data, it seems difficult to ignore a small five-minute test. We run the modern version 15 earlier in this review, and the older 11.5 and 10 here because of our back data.

Legacy: CineBench 11.5 Single Threaded

Legacy: CineBench 11.5 MultiThreaded

Legacy: CineBench 10 Single Threaded

Legacy: CineBench 10 MultiThreaded

x264 HD 3.0

Similarly, the x264 HD 3.0 package we use here is kept for historical regression data. The latest version is 5.0.1, which encodes a 1080p video clip into a high-quality x264 file. Version 3.0 performs the same test on a 720p file, and in most circumstances the software hits its performance limit on high-end processors, but it still works well for mainstream and low-end chips. This version also takes only a few minutes to run, whereas the latest can take over 90 minutes.
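The test itself is a two-pass encode, which can be approximated with the standalone x264 binary as sketched below; the source filename and bitrate are placeholders, not the benchmark's exact parameters.

```python
import subprocess

# Hypothetical 720p source; the x264 binary must be on the PATH. The
# benchmark reports frames per second for each of the two passes.
SRC, LOG = "source_720p.y4m", "x264_stats.log"

for p in ("1", "2"):
    subprocess.run([
        "x264", "--pass", p, "--bitrate", "4000", "--stats", LOG,
        # Pass 1 only gathers statistics, so its video output is discarded.
        "-o", "NUL" if p == "1" else "out.mkv", SRC,
    ], check=True)
```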

Legacy: x264 3.0 Pass 1

Legacy: x264 3.0 Pass 2



Civilization 6

First up in our CPU gaming tests is Civilization 6. Originally penned by Sid Meier and his team, the Civ series of turn-based strategy games is a cult classic, and many an excuse for an all-nighter trying to get Gandhi to declare war on you due to an integer overflow. Truth be told, I never actually played the first version, but I have played every edition from the second to the sixth, including the fourth as voiced by the late Leonard Nimoy. It is a game that is easy to pick up, but hard to master.

Benchmarking Civilization has always been somewhat of an oxymoron: for a turn-based strategy game, the frame rate is not necessarily the important thing, and in the right mood, something as low as 5 frames per second can be enough. With Civilization 6, however, Firaxis went hardcore on visual fidelity, trying to pull you into the game. As a result, Civilization can be taxing on graphics and CPUs as we crank up the details, especially in DirectX 12.

Perhaps a more poignant benchmark would be during the late game, when in older versions of Civilization it could take 20 minutes to cycle through the AI players before the human regained control. The new version of Civilization has an integrated 'AI Benchmark', although it is not yet part of our benchmark portfolio due to technical issues we are trying to solve. Instead, we run the graphics test, which provides an example of a mid-game setup at our settings.

At both 1920x1080 and 4K resolutions, we run the same settings. Civilization 6 has sliders for MSAA, Performance Impact and Memory Impact. The latter two refer to detail and texture size respectively, and are rated from 0 (lowest) to 5 (extreme). We run our Civ6 benchmark at position four for performance (ultra) and 0 for memory, with MSAA set to 2x.

For reviews where we include 8K and 16K benchmarks (Civ6 allows us to benchmark extreme resolutions on any monitor) on our GTX 1080, we run the 8K tests similar to the 4K tests, but the 16K tests are set to the lowest option for Performance.

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K / 8K / 16K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

Civilization 6 Conclusion

In all our testing scenarios, AMD wins at 1080p with minor margins on the frame rates but considerable gains in the time under analysis. Intel pushes ahead in almost all of the 4K results, except with the time under analysis at 4K using an R9 Fury, perhaps indicating that AMD is offering a steadier range in its frame rate, despite the average being lower.



Ashes of the Singularity: Escalation

Seen as the holy child of DirectX 12, Ashes of the Singularity (AoTS, or just Ashes) was the first title to actively explore as many of DirectX 12's features as it possibly could. Stardock, the developer behind the Nitrous engine that powers the game, has ensured that the real-time strategy title takes advantage of multiple cores and multiple graphics cards, in as many configurations as possible.

As a real-time strategy title, Ashes is all about responsiveness, during both wide-open shots and concentrated battles. With DirectX 12 at the helm, the ability to issue more draw calls per second allows the engine to render substantial unit depth and effects that other RTS titles had to achieve with combined draw calls, which ultimately made some of their combined unit structures very rigid.

Stardock clearly understands the importance of an in-game benchmark, ensuring that such a tool was available and capable from day one; with all the additional DX12 features in use, being able to characterize how they affected the title was important for the developer. The in-game benchmark performs a four-minute fixed-seed battle environment with a variety of shots, and outputs a vast amount of data to analyze.

For our benchmark, we run a fixed v2.11 version of the game due to some peculiarities of the splash screen added after the merger with the standalone Escalation expansion, and have an automated tool to call the benchmark on the command line. (Prior to v2.11, the benchmark also supported 8K/16K testing, however v2.11 has odd behavior which nukes this.)

At both 1920x1080 and 4K resolutions, we run the same settings. Ashes has dropdown options for MSAA, Light Quality, Object Quality, Shading Samples, Shadow Quality, Textures, and separate options for the terrain. There are several presets, from Very Low to Extreme: we run our benchmarks at the Extreme settings, and take the frame-time output for our average, percentile, and time under analysis.

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

Ashes Conclusion

Pretty much across the board, no matter the GPU or the resolution, Intel gets the win here. This is most noticeable in the time under analysis, although AMD seems to do better when the faster cards are running at the lower resolution. That's nothing to brag about, though.



Shadow of Mordor

The next title in our testing is a battle of system performance with the open-world action-adventure title Middle Earth: Shadow of Mordor (SoM for short). Produced by Monolith and using the LithTech Jupiter EX engine with numerous detail add-ons, SoM goes for detail and complexity. The main story was written by the same writer as Red Dead Redemption, and the game received Zero Punctuation's Game of the Year in 2014.

A 2014 game is fairly old to be testing now, however SoM has a stable code and player base, and can still stress a PC down to the ones and zeroes. At the time, SoM was unique, offering a dynamic screen resolution setting allowing users to render at high resolutions that are then scaled down to the monitor. This form of natural oversampling was designed to let the user experience a truer vision of what the developers wanted, assuming you had the graphics hardware to power it but had a sub-4K monitor.

The title has an in-game benchmark, which we run via an automated script that implements the graphics settings, selects the benchmark, and parses the frame-time output dumped to the drive. The graphics settings include standard options such as Graphical Quality, Lighting, Mesh, Motion Blur, Shadow Quality, Textures, Vegetation Range, Depth of Field, Transparency and Tessellation. There are standard presets as well.

We run the benchmark at 1080p and at native 4K, using our 4K monitors, at the Ultra preset. Results are averaged across four runs, and we report the average frame rate, 99th percentile frame rate, and time under analysis.
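For readers curious what the parsing step involves, below is a minimal Python sketch of the three metrics we report, assuming a log with one frame time (in milliseconds) per line; the real per-game log formats differ.

```python
# Minimal sketch of the frame-time analysis. The input format is an
# assumption: one frame time in milliseconds per line of the dumped log.
def analyze(path, threshold_fps=60.0):
    with open(path) as f:
        frame_ms = sorted(float(line) for line in f if line.strip())

    total_ms = sum(frame_ms)
    avg_fps = 1000.0 * len(frame_ms) / total_ms

    # 99th percentile frame rate: the frame time that 99% of frames beat.
    p99_ms = frame_ms[int(0.99 * (len(frame_ms) - 1))]
    p99_fps = 1000.0 / p99_ms

    # 'Time under': share of the run spent on frames slower than the target.
    limit_ms = 1000.0 / threshold_fps
    under = sum(ms for ms in frame_ms if ms > limit_ms) / total_ms
    return avg_fps, p99_fps, 100.0 * under

avg, p99, under = analyze("som_frametimes.txt")  # hypothetical log file
print(f"avg {avg:.1f} FPS, 99th pct {p99:.1f} FPS, {under:.1f}% under 60 FPS")
```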

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

Shadow of Mordor Conclusions

Again, a win across the board for Intel, with the Core i7 taking the top spot in pretty much every scenario. AMD isn't that far behind for the most part.



Rise of the Tomb Raider

One of the newest games in the gaming benchmark suite is Rise of the Tomb Raider (RoTR), developed by Crystal Dynamics, and the sequel to the popular Tomb Raider, which was loved for its automated benchmark mode. But don't let that fool you: the benchmark mode in RoTR is very different this time around.

Visually, the previous Tomb Raider pushed realism to the limits with features such as TressFX, and the new RoTR goes one stage further when it comes to graphics fidelity. This leads to an interesting set of requirements in hardware: some sections of the game are typically GPU limited, whereas others with a lot of long-range physics can be CPU limited, depending on how the driver can translate the DirectX 12 workload.

Where the old game had one benchmark scene, the new game has three different scenes with different requirements: Geothermal Valley (1-Valley), Prophet's Tomb (2-Prophet) and Spine of the Mountain (3-Mountain), and we test all three. The scenes are designed to be representative of the game, though it has been noted that a scene like 2-Prophet can be the most CPU-limited portion of its entire level, and only a small portion of that level at that. Because of this, we report the results for each scene on each graphics card separately.

Graphics options for RoTR are similar to other games of this type, offering presets or allowing the user to configure texture quality, anisotropic filtering levels, shadow quality, soft shadows, occlusion, depth of field, tessellation, reflections, foliage, bloom, and features like PureHair, which builds on TressFX from the previous game.

Again, we test at 1920x1080 and 4K using our native 4K displays. At 1080p we run the High preset, while at 4K we use the Medium preset which still takes a sizable hit in frame rate.

It is worth noting that RoTR is a little different to our other benchmarks in that it keeps its graphics settings in the registry rather than a standard ini file, and unlike the previous TR game the benchmark cannot be called from the command-line. Nonetheless we scripted around these issues to automate the benchmark four times and parse the results. From the frame time data, we report the averages, 99th percentiles, and our time under analysis.

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

#1 Geothermal Valley

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

RoTR: Geothermal Valley Conclusions

If we were only looking at a single GTX 1080 at 1080p, the graph might look a little odd. All the quad-core, non-HT processors (so, the Core i5s) get the best frame rates and percentiles in this specific test on this specific hardware, by a good margin. The rest of the tests do not mirror that result, though, with the results ping-ponging between Intel and AMD depending on the resolution and the graphics card.

#2 Prophet's Tomb 

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

RoTR: Prophet's Tomb Conclusions

For Prophet's Tomb, we again see the Core i5s pull a win at 1080p using the GTX 1080, but the rest of the tests are a mix of results, some siding with AMD and others for Intel. There is the odd outlier in the Time Under analysis, which may warrant further inspection.

#3 Spine of the Mountain 

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

RoTR: Spine of the Mountain Conclusions

Core i5, we're assigning you to run at 1080p with a GTX 1080. That's an order. The rest of you, stand easy.



Rocket League

Hilariously simple pick-up-and-play games are great fun. I'm a massive fan of the Katamari franchise for that reason: pressing start on a controller and rolling around, picking up things to get bigger, is extremely simple. Until we get a PC version of Katamari that I can benchmark, we'll focus on Rocket League.

Rocket League combines the elements of pick-up-and-play, allowing users to jump into a game with other people (or bots) to play football with cars, with zero rules. The title is built on Unreal Engine 3, which is somewhat old at this point, but it allows users to run the game on super-low-end systems while still taxing the big ones. Since its release in 2015, it has sold over 5 million copies and seems to be a fixture at LANs and game shows. Competitive users get very serious, playing in teams and leagues, and with very few settings to configure, everyone is on the same level. Rocket League is quickly becoming one of the favored titles for e-sports tournaments, especially as contests can be viewed directly from the game interface.

Based on these factors, plus the fact that it is an extremely fun title to load and play, we set out to find the best way to benchmark it. Unfortunately, automatic benchmark modes for games are few and far between. Partly because of this, but also because it is built on the Unreal 3 engine, Rocket League does not have a benchmark mode. In this case, we have to develop a consistent run and record the frame rate.

Read our initial analysis on our Rocket League benchmark on low-end graphics here.

Since Rocket League has no benchmark mode, we perform a series of automated actions, similar to a racing game having a fixed number of laps. We take the following approach: using Fraps to record the time taken to show each frame (and the overall frame rates), we use an automation tool to set up a consistent 4v4 bot match on easy, with the system applying a series of inputs throughout the run, such as switching camera angles and driving around.

It turns out that this method is nicely indicative of a real bot match: the car drives up walls, boosts, and even puts in the odd assist, save and/or goal, as weird as that sounds for an automated set of commands. To maintain consistency, the commands we apply are not random but time-fixed, and we also keep the map the same (Aquadome, known to be a tough map for GPUs due to water/transparency) and the car customization constant. We start recording just after a match starts, and record for 4 minutes of game time (think 5 laps of a DIRT: Rally benchmark), with average frame rates, 99th percentile and frame times all provided.
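The sketch below illustrates the time-fixed input idea using the third-party pyautogui package; the timings and key bindings are hypothetical, not our actual script.

```python
import time
import pyautogui  # third-party; replays the same key presses every run

# A fixed, time-stamped script of inputs so every run of the bot match is
# identical. Keys and timings are hypothetical, not Rocket League defaults.
SCRIPT = [
    (5.0,  "w", 3.0),      # (start at t seconds, key, hold duration): drive
    (10.0, "c", 0.2),      # switch camera angle
    (40.0, "space", 0.2),  # jump
]

t0 = time.time()
for start, key, hold in SCRIPT:
    # Wait until the scheduled timestamp, then hold the key for 'hold' seconds.
    time.sleep(max(0.0, start - (time.time() - t0)))
    pyautogui.keyDown(key)
    time.sleep(hold)
    pyautogui.keyUp(key)
```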

The graphics settings for Rocket League come in four broad, generic settings: Low, Medium, High and High FXAA. There are advanced settings in place for shadows and details; however, for these tests, we keep to the generic settings. For both 1920x1080 and 4K resolutions, we test at the High preset with an unlimited frame cap.

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

Rocket League Conclusions

The map we use in our testing, Aquadome, is known to be strenuous on a system, hence frame rates are lower than what people expect for Rocket League; we are trying to cover the worst-case scenario. But the results also show that AMD CPUs and NVIDIA GPUs do not seem to be playing ball with each other, which we've been told is likely related to drivers. The AMD GPUs work fine here regardless of resolution, and both AMD and Intel CPUs get in the mix.



Grand Theft Auto

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark. The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data.

There are no presets for the graphics options on GTA, allowing the user to adjust options such as population density and distance scaling on sliders, while others, such as texture/shadow/shader/water quality, run from Low to Very High. Other options include MSAA, soft shadows, post effects, shadow resolution and extended draw distance options. There is a handy option at the top which shows how much video memory the options are expected to consume, with obvious repercussions if a user requests more video memory than is present on the card (although there's no obvious indication if you have a low-end GPU with lots of GPU memory, like an R7 240 4GB).

To that end, we run the benchmark at 1920x1080 using an average of Very High on the settings, and also at 4K using High on most of them. We take the average results of four runs, reporting frame rate averages, 99th percentiles, and our time under analysis.

For all our results, we show the average frame rate at 1080p first. Mouse over the other graphs underneath to see 99th percentile frame rates and 'Time Under' graphs, as well as results for other resolutions. All of our benchmark results can also be found in our benchmark engine, Bench.

MSI GTX 1080 Gaming 8G Performance [graphs: 1080p / 4K]

ASUS GTX 1060 Strix 6GB Performance [graphs: 1080p / 4K]

Sapphire R9 Fury 4GB Performance [graphs: 1080p / 4K]

Sapphire RX 480 8GB Performance [graphs: 1080p / 4K]

Grand Theft Auto Conclusions

Looking through the data, there seems to be a difference between the results with an AMD GPU and an NVIDIA GPU. With the GTX 1080, there's a mix of AMD and Intel results, but Intel takes a beating in the Time Under analysis at 1080p. The GTX 1060 is a mix at 1080p, but Intel takes the lead at 4K. When an AMD GPU is paired with the processor, all flags fly Intel.



Power Consumption

With the two Kaby Lake-X processors, the main comparison for power consumption is going to be against their Kaby Lake-S counterparts on the mainstream platform. For an extra 100 MHz, Intel is moving from a 95W TDP to a 112W TDP, indicating that these processors will need better cooling – but when we tested KBL-S we were rather surprised by the power, especially on the Core i5.

Power: Total Package (1T)

Power: Total Package (Full Load)

The Core i5-7640X almost perfectly matches up with its Core i5 counterparts, even beating the Core i5-7600K in every situation but especially at load. For users who want to overclock near to the official TDP, it looks like there’s some headroom (other circumstances permitting). The Core i7-7740X is noticeably more power hungry at full load, hitting 79W, but lower than the Core i7-7700K at 87W.

Overclocking to 5.0 GHz

One of the elements promoted at the launch of Basin Falls was overclockability: Intel ran a private, invite-only extreme overclocking workshop with sub-zero coolants that week, which was then followed by ASUS and GIGABYTE both stepping it up another gear by using liquid helium rather than liquid nitrogen. For some of these chips, the colder the better, and liquid helium is going to be the limit for cooling (and cost), given the inherent difficulty in using sub-zero coolants for records.

Those sub-zero events for records are just on single benchmarks, or single frequency records, and are not meant to be run at those speeds all day every day. This makes the air and water overclocking potential of the CPUs more important for regular users and enthusiasts.

When Kaby Lake-S was launched, we saw good overclocking performance on the Core i7-7700K: our set of 4.2 GHz base processors reached a range of 4.6 to 4.8 GHz, giving an 8-15% overclock, and given some of the numbers we had seen in the wild, the consistency between the processors meant that most 7700K CPUs could reach that level. One reason for such good performance was that Intel had incorporated a new overclocking feature, the AVX offset. This allows overclockers to reduce the frequency at which AVX code runs, as AVX can be the limiting factor for a stable overclock (for example, a 5.0 GHz core clock with a -10 offset runs AVX-heavy code at 4.0 GHz). This feature is in Kaby Lake-X as well.

The end result is that the Core i7-7740X went one further in our testing, and hit 5.0 GHz all-cores at 1.25V, and completed our CPU testing suite with a -10 AVX offset. We went for 5.1 GHz all-core, up to 1.35 volts, but the temperatures were getting high and the system was still not stable. 5.0 GHz is a very respectable result, even if we only have a sample of one.

The results at 5.0 GHz are as follows. Gain is shown for the 5.0 GHz run relative to stock; anything within 2.0% should be considered a tie.

Comparison: Intel Core i7-7740X at Stock vs 5.0 GHz
Stock: 4.3 GHz Base / 4.5 GHz Turbo / Auto AVX
Overclocked: 5.0 GHz Base / 5.0 GHz Turbo / 4.0 GHz AVX

Benchmark | Stock Score/Time | 5.0 GHz Score/Time | Gain
Web Benchmarks | | |
Sunspider | 135 ms | 133 ms | +1.6%
Kraken | 622 ms | 642 ms | -3.1%
Octane | 45485 | 45279 | -0.5%
WebXPRT15 | 561 | 570 | +1.5%
System Tests | | |
PDF Opening | 2229 ms | 2055 ms | +8.5%
FCAT | 46.172 s | 45.656 s | +1.1%
3DPM 2.1 | 1287 | 1429 | +11.0%
Dolphin v5 | 282 s | 280 s | +0.7%
Agisoft Total | 891.56 s | 830.06 s | +7.4%
Rendering | | |
Corona | 2474593 | 2660002 | +7.5%
Blender | 436 s | 485 s | -10.1%
Cinebench 15 ST | 195 | 215 | +10.6%
Cinebench 15 MT | 986 | 1084 | +10.0%
Encoding | | |
7-Zip | 27433 | 29433 | +8.9%
WinRAR | 48.698 s | 46.633 s | +4.4%
AES | 5.5 GB/s | 6.2 GB/s | +12.7%
Handbrake HEVC | 25.2 FPS | 22.6 FPS | -10.2%
Office Tests | | |
PCMark8 Creative | 4878 | 5084 | +4.2%
PCMark8 Home | 4844 | 5071 | +4.7%
PCMark8 Work | 4110 | 3984 | -3.1%
Chrome Compile | 19.45 per day | 17.70 per day | -9.0%
GTX 1080 | | |
1080p Ashes Avg | 72.56 FPS | 76.19 FPS | +4.8%
1080p Ashes 99% | 53.95 FPS | 55.45 FPS | +2.7%
4K Ashes Avg | 62.30 FPS | 63.46 FPS | +1.8%
4K Ashes 99% | 41.24 FPS | 40.79 FPS | -1.1%
1080p RoTR-1 Avg | 103.66 FPS | 106.9 FPS | +3.0%
1080p RoTR-1 99% | 85.14 FPS | 85.88 FPS | +0.9%
4K RoTR-1 Avg | 61.60 FPS | 61.33 FPS | -0.4%
4K RoTR-1 99% | 50.25 FPS | 50.43 FPS | +0.3%

There are a few obvious trends, and a couple of red flags. Most benchmarks split into two groups: benchmarks with a heavy CPU component tend to get a 4-12% gain when overclocked, while those that require a lot of DRAM involvement or AVX tend to sit in the 2% region. The red flags are the three big 9-10% losses in CPU-heavy benchmarks: Blender, Handbrake HEVC and the Chromium compile.

All three benchmarks are sizable workloads with sustained high frequencies throughout, and have heavy AVX components as well. The 4.0 GHz setting for AVX could account for most of that difference, however other AVX benchmarks in our testing (Corona, Cinebench, elements of PCMark, Agisoft) do not have the same effect. The only real difference here is time: the amount of time each benchmark stays at a sustained load.

The three that had the biggest dips are our longest benchmarks: Blender at 8 minutes, Handbrake HEVC at 25 minutes, and the Chromium compile at over an hour. In this case it seems we are hitting thermal limits on the power delivery, as explained by Igor Wallossek over at Tom's Hardware. He tested the emerging theory that early X299 boards are not up to the task of cooling their VRMs under heavy load, and his analysis determined that this was likely to occur in heavily overclocked scenarios: the overclocked systems he tested would hit thermal limits, throttle back down, and then ramp up again in a cyclical manner. His testing was on Skylake-X, which draws a lot more power when overclocked than our KBL-X setup here, so it likely isn't affecting our setup as much, but still enough for certain benchmarks. I fully suspect we will see second-wave X299 motherboards with substantial heatsinks on the power delivery to overcome this.

More testing is needed, especially in upcoming X299 motherboard reviews. We used an early X299 prototype for our CPU testing here, and a Gaming 9 motherboard on the GPU tests (which showed little variation and no problems). We did take power numbers for this setup but when looking back on the results, they seemed off so we need to go back and re-test those.



The Fastest for Serial Workloads

If you asked ‘what made the best processor’ fifteen years ago, the obvious answers were performance, power and price. As time has marched on, this list has started to include integrated graphics, bandwidth, platform integration, platform upgradability, core-to-core latency, and of course, cores. Marching up from a single x86 core through to CPUs that carry 10 cores for consumers, 28 cores for enterprise and 72 cores for add-in cards makes the task of creating a perfect processor almost impossible – there is no way to satisfy all of the properties that build a processor today. Both AMD and Intel start from basic building blocks (a single core) and then configure processors around that core, adding in more cores, connectivity, and then binning to the right voltage/frequency and pricing appropriately. The end result is a stack of processors aimed at different segments of the market.

The pair of Kaby Lake-X processors cover one main area listed above more than any others: core performance. By having the latest CPU microarchitecture and placing it on the newest high-end desktop platform there is room at the top for more frequency leading to a higher pure performance product. As a byproduct these CPUs are power efficient, leading to a high performance per watt, and are situated in a platform with extensive IO options. Ultimately this is where the Kaby Lake-X customer will sit: someone who wants high single thread performance but is not after massive multi-core performance. This would typically cover the majority of gamers and enthusiasts, but not necessarily content creators.

The benefits in the benchmarks are clear against the nearest competition: these are the fastest CPUs to open a complex PDF, at the top for office work, and at the top for most web interactions by a noticeable amount.

The downsides are pure throughput workflows, such as neuron simulation, rendering and non-video encoding.

The parts in the middle are the ones to dissect, and these get interesting. Let me pull up a few graphs that illustrate this middle of the road position: Chromium Compilation, Agisoft Photoscan and WinRAR.

Office: Chromium Compile (v56)

System: Agisoft Photoscan 1.0 Total Time

Encoding: WinRAR 5.40

These three results show the Core i7-7740X performing above any AMD chips of similar price, but the Core i5-7640X performing below any Ryzen 7 or Ryzen 5 parts. This comes down to the workload in each of these benchmarks, and how the processor configurations handle it. All three of these real-world benchmarks are variable-threaded workloads: some elements are serialized and rely on high single-thread performance, while other elements are fully parallelizable and can take advantage of cores and threads (though sometimes extra threads do not help). The benchmarks are ultimately governed by Amdahl's Law, where single thread speed affects the whole test, but multiple threads only help the parallelizable parts. With sufficiently parallelizable code, it becomes a balance between the two.
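A back-of-the-envelope Amdahl's Law calculation shows why the 2:1 thread ratio is survivable but the 3:1 ratio is not. The parallel fraction and the single-thread multipliers below are illustrative assumptions, not measured values.

```python
# Amdahl's Law: speedup = 1 / ((1 - p) + p / n) for a parallel fraction p
# on n threads; scale by relative per-thread speed to compare chips.
# The p value and the ~1.3x single-thread advantages are illustrative guesses.
def throughput(st_perf, threads, p=0.85):
    return st_perf / ((1.0 - p) + p / threads)

i7 = throughput(1.30, 8)    # Core i7-7740X: high ST perf, 8 threads
r7 = throughput(1.00, 16)   # Ryzen 7 1700: 16 threads
i5 = throughput(1.25, 4)    # Core i5-7640X: no HT, 4 threads
r5 = throughput(1.00, 12)   # Ryzen 5 1600X: 12 threads

# With these numbers, the i7 edges out the R7 (~1.03x) while the
# thread-starved i5 falls well behind the R5 (~0.76x).
print(f"i7 vs R7: {i7/r7:.2f}x   i5 vs R5: {i5/r5:.2f}x")
```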

For the Core i7-7740X, up against the Ryzen 7 1700 at an equivalent price, the Core i7 has eight threads where the Ryzen 7 has sixteen, but the Core i7 has much higher single thread performance. For these benchmarks, that advantage means that despite having half the cores/threads of the AMD part, the Core i7 can take the lead very easily.

But the Core i5-7640X has a different task. It has four cores, like the Core i7, but no hyperthreading, so it sits at four threads. Its direct competitor, the Ryzen 5 1600X, has six cores with simultaneous multithreading, leading to twelve threads. This gives the AMD processor a 3:1 advantage in threads, and for each of these three benchmarks it can parallelize the code sufficiently that the single thread performance of the Intel CPU is not enough. Moving from a 2:1 ratio with the Core i7 to a 3:1 ratio with the Core i5 is a turning point for ST performance compared to MT performance.

So with the X299 confusion, are these CPUs worth recommending?

When Kaby Lake-X first came out, a number of technology experts were confused at Intel’s plans. It made sense to launch the latest microarchitecture on the high-end desktop platform, although launching it in a quad-core form was an idea out-of-left-field, especially for a platform that is geared towards multiple cores, more memory, and more memory bandwidth. In that paradigm, the Kaby Lake-X is an oddball processor design choice.

There are bigger factors at play however – if Intel launched 6-10 core parts on KBL, it would cannibalize their Skylake-X and Skylake-SP sales. Also, as we’ve seen with Skylake-X CPUs, those enterprise cores are now different to the consumer Skylake-S cores, with different cache structures and AVX-512. So if Intel had launched >4 cores on KBL-X, they would have likely had to scrap Skylake-X.

But that’s a slight tangent.

The Core i7-7740X appeals to users who want the fastest out-of-the-box single thread x86 processor on the market today. This means financial traders, gamers, and professionals working with serial code bases, or anyone with deep pockets that might think about upgrading to Skylake-X in the future. Enthusiast overclockers are likely to find the better binned CPUs fun as well.

That’s if you do not mind paying a premium for the X299 platform. For users who mind the cost, the Core i7-7700K is 98% of the way there on performance, saves a hundred dollars on the motherboard, and offers the same functionality. In some of our benchmarks, where more cores helped despite the high single thread performance, spending a little more on the six-core Skylake-X Core i7-7800X is beneficial: for example, Luxmark and POV-Ray scored +33% for the 7800X over the 7740X.

The Core i7-7740X makes sense in a number of niche scenarios. By contrast, the Core i5-7640X doesn’t make much sense at all. There’s still the benefit of high single-thread performance and some good gaming performance in older titles, but in variable-threaded workloads it loses to AMD’s processors, sometimes by as much as 45%. For a chip that comes in at $242, users should expect to pay about the same for a motherboard, whereas either an AMD part or the Core i5-7600K can go in a $120 motherboard and still be overclocked.

There are only two scenarios I can see where the Core i5 adds up. First, users who just want to get onto X299 now and upgrade to a bigger CPU for quad-channel memory and more PCIe lanes later. Second, professionals who know their code cannot take advantage of hyperthreading and are happy with the performance. Perhaps in light of the hyperthreading bug (which is severely limited to minor niche edge cases), Intel felt a non-HT version was required.

In our recent CPU Buyers’ Guide (link autoupdates to the latest CPU guide) we suggested the Core i7-7740X for anyone wanting a Peak VR experience, and we still stand by that statement. It has enough threads and the biggest grunt to take on VR and the majority of enthusiast gaming experiences, if a user has pockets big enough.

The recommendations for the new CPUs boil down to platform costs. They are a minor upgrade over the Kaby Lake-S K-series processors on the Z270 platform, a platform that caters to a big audience with a more cost-sensitive structure for motherboards in mind.
