Introduction
V-Ray’s GPU rendering and NVIDIA’s hardware are constantly improving. Recently, there have been major advances in both, so we thought now would be the perfect time to run new benchmarks and find out how much faster everything might be.
The hardware
With 40 logical CPU cores and 128GB of RAM, the Lenovo P900 is a powerful machine. It’s great for GPU tests, since there’s space for three double-slot GPUs and one single-slot GPU. Plus, the tool-less chassis makes it quick to pop cards in and out; the tests felt like an F1 pit stop for GPUs.
The GPUs we decided to test are as follows:
GPU | Architecture | Cores | RAM type | RAM | Power | Slots | Street Price |
---|---|---|---|---|---|---|---|
GP100 | Pascal | 3584 | HBM2 | 16GB | 235W | 2 | N/A |
P6000 | Pascal | 3840 | GDDR5X | 24GB | 250W | 2 | $4,699 |
P5000 | Pascal | 2560 | GDDR5X | 16GB | 180W | 2 | $2,499 |
P4000 | Pascal | 1792 | GDDR5X | 8GB | 105W | 1 | N/A |
M6000 | Maxwell | 3072 | GDDR5 | 24GB | 250W | 2 | $4,539 |
Titan X (Pascal) | Pascal | 3584 | GDDR5X | 12GB | 250W | 2 | $1,599 |
*Street prices approximate, based on a quick search at Newegg and Amazon. The GP100 and P4000 are not public yet, so no pricing is available.
The benchmark test
Even before the benchmarks started, we were very interested to see NVIDIA’s new NVLink tech in action. Because NVLink allows cards to share memory, we were curious to see what sort of performance we could get using two new GP100s. More on this later.
Our lead GPU developer, Blago Taskov, and I set up the benchmarks. To get better data, we decided it would be best to test multiple scenes instead of just one. We batch rendered nine different scenes and recorded the time to complete each one. Then, we added up the total time for all nine.
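If you want to run a similar batch yourself, here’s a minimal sketch of the kind of timing harness we’re describing, written in Python. It assumes the V-Ray Standalone command-line renderer (vray) is on your PATH and that the nine scenes are exported as .vrscene files with made-up names; adjust the command and flags to match your own setup.

```python
import subprocess
import time

# Hypothetical scene names standing in for the nine benchmark scenes.
SCENES = [f"scene_{i:02d}.vrscene" for i in range(1, 10)]

def render_and_time(scene):
    """Render one scene with V-Ray Standalone and return wall-clock seconds."""
    start = time.perf_counter()
    # Assumes the "vray" standalone renderer is on the PATH; swap in whatever
    # flags your installation needs (GPU engine, output path, etc.).
    subprocess.run(["vray", f"-sceneFile={scene}", "-display=0"], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    times = [render_and_time(s) for s in SCENES]
    for scene, t in zip(SCENES, times):
        print(f"{scene}: {t:.2f}")
    print(f"Total: {sum(times):.2f}")
```

Summing the per-scene times gives the totals shown in the tables below.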
Here are the results:
GPU | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6 | Test 7 | Test 8 | Test 9 | Total time |
---|---|---|---|---|---|---|---|---|---|---|
GP100 x 2 | 46.49 | 130.36 | 156.69 | 29.43 | 112.99 | 39.88 | 40.21 | 107.75 | 19.94 | 683.74 |
GP100 | 90.72 | 251.81 | 295.52 | 50.84 | 220.51 | 77.72 | 76.94 | 202.28 | 38.02 | 1304.36 |
P6000 | 127.21 | 363.18 | 410.72 | 72.17 | 348.99 | 131.39 | 109.64 | 264.82 | 61.83 | 1889.95 |
P5000 | 188.18 | 536.69 | | | | | | | | |
P4000 | 212.54 | 636.84 | 724.22 | 131.86 | 565.83 | 207.79 | 178.6 | 455.61 | 104.62 | 3217.91 |
M6000 | 140.13 | 483.71 | 538.86 | 97.59 | 423.11 | 159.04 | 134.79 | 351.91 | 73.15 | 2402.29 |
The table below compares the cards against each other as ratios of total render time. Each cell is the column card’s total divided by the row card’s total, so a value above 1 means the card in that row was that many times faster:
GPU | GP100 x 2 | GP100 | P6000 | P5000 | P4000 | M6000 | Titan X (Pascal) |
---|---|---|---|---|---|---|---|
GP100 x 2 | 1 | 1.907684 | 2.764135 | 4.059496 | 4.706336 | 3.513455 | 2.7042882 |
GP100 | 0.524196 | 1 | 1.448948 | 2.127971 | 2.467041 | 1.841738 | 1.4175764 |
P6000 | 0.361777 | 0.690156 | 1 | 1.468631 | 1.702643 | 1.271087 | 0.9783486 |
P5000 | 0.246336 | 0.469931 | 0.680906 | 1 | 1.15934 | 0.86549 | 0.6661635 |
P4000 | 0.21248 | 0.405344 | 0.587322 | 0.86256 | 1 | 0.746537 | 0.5746059 |
M6000 | 0.28462 | 0.542965 | 0.786728 | 1.155414 | 1.339518 | 1 | 0.7696947 |
Titan X (Pascal) | 0.369783 | 0.705429 | 1.022131 | 1.501133 | 1.740323 | 1.299216 | 1 |
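If you’d like to reproduce this comparison from your own runs, each cell is simply one total divided by another. Here’s a quick sketch using the totals listed in the results table above:

```python
# Total render times from the results table above.
totals = {
    "GP100 x 2": 683.74,
    "GP100": 1304.36,
    "P6000": 1889.95,
    "P4000": 3217.91,
    "M6000": 2402.29,
}

# Each cell is column_total / row_total, so the "GP100 x 2" row shows how many
# times longer every other card took compared to the pair of GP100s.
for row, row_total in totals.items():
    ratios = {col: round(col_total / row_total, 6)
              for col, col_total in totals.items()}
    print(f"{row}: {ratios}")
```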
A note about RAM
RAM plays a big part in the value of these cards. For example, the Titan X (Pascal) and P6000 showed similar times across all the tests. On some tests the Titan X was faster, and on others the P6000 beat it outright. In overall time, the Titan X narrowly edged out the P6000. But that’s not the whole story. While the two cards were neck and neck in speed, the choice (and cost) comes down to RAM. The Titan X is significantly less expensive, but it has 12GB of RAM, while the P6000 can fit much more data with its 24GB. You might be able to give yourself a little more breathing room on that 12GB card with V-Ray 3.5’s On-demand Mip-mapping, which can dramatically reduce the RAM needed to load textures. Ultimately, it comes down to your budget and how much memory you really need.
Let’s say you want to render a huge scene with lots of geometry and textures. If you need more than 24GB, that’s where NVLink comes in.
What is NVLink?
Currently, the GP100 is the only Quadro card to support NVLink. NVLink is a high-bandwidth interconnect between the cards, fast enough that their HBM2 memory can effectively be shared across them. It may look similar to SLI, but it’s not the same. In our setup we connected two GP100s; in theory, with specialized hardware, it’s possible to link more. For example, NVIDIA’s DGX-1 does this with eight P100 GPUs, but at $129,000 it’s a little out of our price range. We’re looking forward to testing that one, and when we do, we’ll be sure to share the results.
V-Ray and NVLink
We’ve enabled NVLink support in the latest V-Ray nightly builds. To test it, we enlisted the help of our friends at Dabarti Studio, who created this torture test.
Model and assets courtesy of Dabarti: 169 million polygons and 150+ 6K textures
This scene contains 169 million polygons and over 150 6K images. The geometry alone won’t fit on a single card, not to mention all those high-res textures.
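To get a feel for why, here’s a rough back-of-the-envelope estimate for the textures alone, assuming 6144 x 6144 pixels, four channels at 8 bits each, and uncompressed storage on the GPU (the real scene’s formats may differ):

```python
# Rough texture memory estimate: 150 uncompressed 6K RGBA textures at 8 bits
# per channel. Actual usage depends on bit depth, channel count and compression.
width = height = 6144
channels, bytes_per_channel = 4, 1
num_textures = 150

bytes_per_texture = width * height * channels * bytes_per_channel
total_gib = num_textures * bytes_per_texture / 2**30
print(f"{bytes_per_texture / 2**20:.0f} MiB per texture, "
      f"~{total_gib:.1f} GiB for {num_textures} textures")
# -> roughly 144 MiB each, ~21 GiB total, before any geometry is loaded
```

Even under those conservative assumptions, the textures alone already exceed a single 16GB card before the 169 million polygons are loaded.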
Time to render. First, we set all objects to Dynamic Geometry in the V-Ray Properties. This made it possible for the geometry to be shared across the cards. Then, we disabled On-demand Mip-mapping to force the full resolution textures to load. Once the cards were fully loaded, each one used 13GB of its 16GB RAM. That’s a total of 26GB RAM on both cards – more than the 24GB a P6000 can hold.
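Per-card memory use like this is easy to watch from a second terminal while the render loads. The sketch below uses standard nvidia-smi query fields to print how much memory each GPU is using:

```python
import subprocess

# Ask nvidia-smi for per-GPU memory use; run this while the render is loading
# to watch each card fill up.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True)

for line in result.stdout.strip().splitlines():
    idx, name, used, total = [field.strip() for field in line.split(",")]
    print(f"GPU {idx} ({name}): {used} MiB / {total} MiB used")
```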
It worked, and we noticed little or no performance loss with NVLink. It’s still early, but the initial results are positive. Maybe with a few driver updates and V-Ray tweaks, NVLink will perform even better in the future.
Conclusion
Moore’s Law is alive and well. The M6000 arrived about two years ago, and today the GP100 finishes our benchmarks almost twice as fast (1304.36 versus 2402.29 total, about 1.84x) – right on schedule. The combination of NVIDIA’s latest tech and V-Ray’s most recent advances in GPU rendering seems to remove some of the early memory limitations. And that paints a bright future for GPU rendering. We’ll keep testing and share more results as new hardware arrives for benchmarking.
Special thanks
Thanks to NVIDIA for loaning us their latest and greatest hardware for stress testing. Also, thanks to Lenovo for supplying Chaos Group Labs with a workstation that can handle some serious computing. And thanks to Tomasz Wyszolmirski at Dabarti Studio for helping us continue to push GPU rendering to its limits.