RTX 3080, Thunderbolt 3, and Throughput Saturation?
 

tilchev
(@tilchev)
Eminent Member
Joined: 7 months ago
 

@arizor, from the NVIDIA quote:

... The impact is typically less than a few percent going from a x16 PCIE 4.0 to x16 PCIE 3.0. ...

What they are saying may indeed be true for an x16 connection, but much less so for an x8, and it surely isn't true for an x4, which is the one that concerns us.

2015 15" Dell Latitude E5570 (R7 M370) [6th,4C,H] + GTX 1060 @ 32Gbps-M2 (ADT-Link R43SG) + Win10 [build link]  

Gakkou
(@gakkou)
Eminent Member
Joined: 4 months ago
 

I don't think it will be as bad as some make it out to be, but we will see once someone here buys one and actually benchmarks it.

To do: Create my signature with system and expected eGPU configuration information to give context to my posts. I have no builds.


Arizor
(@arizor)
Trusted Member
Joined: 9 months ago
 
Posted by: @tilchev

@arizor, from the NVIDIA quote:

... The impact is typically less than a few percent going from a x16 PCIE 4.0 to x16 PCIE 3.0. ...

What they are saying may indeed be true for an x16 connection, but much less so for an x8, and it surely isn't true for an x4, which is the one that concerns us.

Sure, I was just answering the concern about PCIe 4.0 vs PCIe 3.0.

The gap between x16 and x4 isn't that big; most cards don't really take advantage of even x8 yet. Here's one benchmark of the GTX 1080, where the gap at 2160p is negligible: https://www.techpowerup.com/review/nvidia-geforce-gtx-1080-pci-express-scaling/24.html

 

Edit: another video benchmark using the Radeon VII puts the loss at 1440p at 2-8%. Obviously the gap will be bigger for the 30 series, but I'd wager it's less than 10% at 2160p.

2019 16-inch MacBook Pro Retina (2.3ghz Intel i9) // Radeon 5500M (8GB).
MacOS Catalina // Bootcamp Win10 2004 (Build 19041).
Razer Core X // Vega 64 // 2m Active Thunderbolt cable.


tilchev
(@tilchev)
Eminent Member
Joined: 7 months ago
 

@arizor, My point is that the gap between Gen 3 x4 and Gen 4 x16 is quite big, because Gen 3 x4 == Gen 4 x2. Obviously that's fine for most current cards, but given the massive performance boost of the 3080/90, I'm a bit skeptical. Even the current 2080 Ti can fully use a 3.0 x8 slot, and the 3080/90 have roughly double its performance, meaning they could reach the full potential of a Gen 3 x16 slot. Seriously, take a look at this. For current Gen 3 eGPU interfaces I would at most get a 3070. Once Gen 4 eGPU interfaces are out, a 3080/90 would be much more viable. Of course, you could get a 3080/90 now and upgrade your interface later when such products become available.
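To put rough numbers on that, here's a minimal sketch of theoretical PCIe payload bandwidth by generation and lane count. The per-lane figures are the usual post-128b/130b-encoding values; the ~22 Gbps note at the end is the commonly reported usable PCIe share of a TB3 link, not an official spec.

```python
# Theoretical one-direction PCIe payload bandwidth after 128b/130b encoding.
# Gen 3: 8 GT/s per lane -> ~0.985 GB/s; Gen 4: 16 GT/s per lane -> ~1.969 GB/s.
PER_LANE_GBS = {3: 0.985, 4: 1.969}

def pcie_bandwidth(gen: int, lanes: int) -> float:
    """Return theoretical bandwidth in GB/s for a given generation and lane count."""
    return PER_LANE_GBS[gen] * lanes

for gen, lanes in [(4, 16), (3, 16), (4, 2), (3, 4), (3, 2)]:
    print(f"Gen {gen} x{lanes:>2}: {pcie_bandwidth(gen, lanes):5.1f} GB/s")

# Gen 3 x4 (~3.9 GB/s) matches Gen 4 x2 and is a quarter of Gen 3 x16 (~15.8 GB/s).
# A TB3 eGPU presents as Gen 3 x4, and in practice only about 22 Gbps (~2.75 GB/s)
# of the 40 Gbps link is commonly reported as available for PCIe traffic.
```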

2015 15" Dell Latitude E5570 (R7 M370) [6th,4C,H] + GTX 1060 @ 32Gbps-M2 (ADT-Link R43SG) + Win10 [build link]  

Arizor
(@arizor)
Trusted Member
Joined: 9 months ago
 

@tilchev,

For sure, the performance leap is enormous (and very welcome!). But remember not to confuse the raw performance of a card with the real bottleneck: passing data to the display.

The TB3 bottleneck is lessened at 4K (over, say, 1080p) because 4K puts the onus on the graphics card to do the work. The bottleneck is not how rapidly the GPU can process the data, but how quickly it can get that data to the display over TB3. At 4K the GPU needs to work harder before transmitting data (i.e. this manifests as a lower frame rate when running games at 4K versus 1080p), which lessens the bottleneck.

For sure, if you are running games at 1080p, don't bother with a 30-series RTX: frame rates are so high that they hit the data limit, and the bottleneck is severe. At 4K, the slower transmission (i.e. lower frame rate) reduces the bottleneck, so a 30-series card is very much viable for a 4K ultra-settings goal.

This is all of course hypothetical until we try one! But just wanted to elucidate some of the ideas around performance and bottlenecks in practical application.
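To make the scaling concrete, here's a minimal sketch of why host-to-eGPU traffic tracks frame rate rather than output resolution. The ~22 Gbps figure is the commonly cited usable PCIe share of the 40 Gbps TB3 link, and the per-frame upload size is a purely hypothetical number for command buffers and dynamic data, not a measurement.

```python
# Host -> eGPU traffic scales with frames per second, not with output resolution.
# PER_FRAME_MB is a purely hypothetical per-frame upload (command buffers,
# dynamic vertex/constant data); real games vary widely.
USABLE_TB3_GBPS = 22   # commonly cited usable PCIe share of the 40 Gbps TB3 link
PER_FRAME_MB = 15      # hypothetical upload per rendered frame

for label, fps in [("1080p, GPU-limited at", 140),
                   ("1440p, GPU-limited at", 90),
                   ("2160p, GPU-limited at", 55)]:
    gbps = PER_FRAME_MB * 8 * fps / 1000          # MB per frame -> Gb/s
    print(f"{label} {fps} fps: ~{gbps:.1f} Gb/s host->GPU "
          f"({gbps / USABLE_TB3_GBPS:.0%} of usable TB3)")
```

With the same per-frame cost, the 1080p/high-FPS case eats roughly three quarters of the link, while the GPU-limited 4K case uses under a third.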

2019 16-inch MacBook Pro Retina (2.3ghz Intel i9) // Radeon 5500M (8GB).
MacOS Catalina // Bootcamp Win10 2004 (Build 19041).
Razer Core X // Vega 64 // 2m Active Thunderbolt cable.


tilchev
(@tilchev)
Eminent Member
Joined: 7 months ago
 

@arizor, Now I get your point; I was just a bit confused when you gave an example with the GTX 1080, which is a bit of an old chip by now. Thanks for clarifying. Actually, isn't this bottleneck eliminated, or at least much less relevant, if you are using a monitor that is plugged directly into the eGPU instead of the internal laptop display? I don't really understand how frames move between the GPU, CPU, monitor and the interfaces between them; perhaps you can recommend a good read or video on that?

Another question: in your 1080p vs 4K example, aren't 4K frames bigger, so that they use up more bandwidth even though there are fewer of them?

2015 15" Dell Latitude E5570 (R7 M370) [6th,4C,H] + GTX 1060 @ 32Gbps-M2 (ADT-Link R43SG) + Win10 [build link]  

odin
(@odin)
Estimable Member
Joined: 2 years ago
 
Posted by: @arizor

@tilchev,

For sure, the performance leap is enormous (and very welcome!). But remember not to confuse the raw performance of a card with the real bottleneck: passing data to the display.

The TB3 bottleneck is lessened at 4K (over, say, 1080p) because 4K puts the onus on the graphics card to do the work. The bottleneck is not how rapidly the GPU can process the data, but how quickly it can get that data to the display over TB3. At 4K the GPU needs to work harder before transmitting data (i.e. this manifests as a lower frame rate when running games at 4K versus 1080p), which lessens the bottleneck.

For sure, if you are running games at 1080p, don't bother with a 30-series RTX: frame rates are so high that they hit the data limit, and the bottleneck is severe. At 4K, the slower transmission (i.e. lower frame rate) reduces the bottleneck, so a 30-series card is very much viable for a 4K ultra-settings goal.

This is all of course hypothetical until we try one! But just wanted to elucidate some of the ideas around performance and bottlenecks in practical application.

I'm probably going to get a 3080 just so I can crank up the RT effects to their fullest. It SHOULD be able to keep 60 FPS at 1600p (Gram 17 native res) with details slammed to Ultra for at least a year or two until games start getting heavy. Throw DLSS advancements in there and we should be good to go. I also just want to keep it for a few years and not feel like I need to upgrade; that extra 30% or so of performance should be nice. Also, if I get to a point where a desktop system is viable, I'll just move it over to that.

In my experience, some games utilize PCIe bandwidth more than others and take a bigger hit when using an eGPU. I will admit that is one thing I'm a little concerned about, among other unknowns:

  • Will more games start to use additional PCIe bandwidth in general (especially for people like me who can't run an external monitor straight off the eGPU)?
  • Whether my i7-8565U will be able to keep up and feed it enough data. Hopefully the 2021 LG Gram 17 will use Tiger Lake; I'm not banking on an AMD model that would use USB4 with TB3 interoperability.
  • It seems the 550W Sonnet will be able to power the 3080 just fine, but who knows: it's a 320W card against the 375W max card TGP Sonnet quotes, yet they also don't recommend the Vega 64 or Radeon VII for the 550, and both of those are under 320W on paper (see the rough power-budget sketch after this list).
  • With TB4 seemingly not providing any further bandwidth, I'm guessing we're going to be waiting quite some time for an 8-lane solution.
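On the 550W question, a rough power-budget sketch: the 550W, 375W and 320W numbers come from the post above, while the 87W charging allocation and the 1.4x transient factor are my own assumptions, so treat the verdict as illustrative rather than a spec check.

```python
# Back-of-envelope power budget for the Breakaway Box 550 -- illustrative only.
PSU_RATING_W = 550        # enclosure rating
PD_CHARGING_W = 87        # assumed upstream Power Delivery draw (check Sonnet's spec sheet)
CARD_TGP_W = 320          # RTX 3080 Founders Edition TGP
SONNET_MAX_CARD_W = 375   # max card TGP Sonnet quotes for this box
SPIKE_FACTOR = 1.4        # assumed short transient excursion above TGP

budget = PSU_RATING_W - PD_CHARGING_W
print(f"Left for the card after laptop charging: {budget} W "
      f"(Sonnet's own cap: {SONNET_MAX_CARD_W} W)")
print(f"Sustained draw {CARD_TGP_W} W, assumed transients ~{CARD_TGP_W * SPIKE_FACTOR:.0f} W: "
      f"{'fits on paper' if CARD_TGP_W * SPIKE_FACTOR <= budget else 'tight on paper'}")
```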

 

LG Gram 17 | Sonnet Breakaway Box 550 | Asus Strix RTX 2070 OC Edition | Win 10 Pro 20H2 + Fedora 32 Dual Boot
Build Link

 
2018 17" LG Gram 17 [8th,4C,U] + RTX 2070 @ 32Gbps-TB3 (Sonnet Breakaway 550) + Win10 [build link]  


Arizor
(@arizor)
Trusted Member
Joined: 9 months ago
 

@tilchev, that's correct. The bottleneck is much smaller with an external display, since with an internal display the link does the work twice: data goes down the Thunderbolt cable to your eGPU, and the rendered frames come back up to the internal display.

Don't think of 4K vs 1080p as transferring a finished image between the devices, as if you were downloading a 4K picture versus a 1080p one.

The initial transfer of data (from the laptop down the thunderbolt into the eGPU) is, to put it in basic terms, raw data for the GPU to process. There really isn't much difference at this point between 1080p data and 4k data; it's just the CPU saying "HERE'S A BUNCH OF STUFF THAT NEEDS COMPUTATION". So the more times per second the data is being sent (i.e. frames), the bigger the TB3 bottleneck. As I probably don't need to explain, running at 1080p means the CPU is sending many more requests per second (i.e. FPS) than at 4K, thus the bottleneck becomes the TB3. At 4K, the requests per second (FPS) is much lower, thus the bottleneck becomes your GPU, rather than the TB3.

The long and short of it is, you can pretty much aim for 4K @ 60Hz over a TB3 connection, and with a powerful enough GPU you can hit it without much degradation. Higher frame rates than that will start to hit the TB3 bottleneck, so if you want to take advantage of those 120/144Hz monitors, an eGPU is probably not a good investment.

But for those of us with 4K 60Hz monitors, it's a suitable solution.
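To put a rough number on the internal-versus-external display point: if every finished frame has to travel back across the link uncompressed at 4 bytes per pixel (an assumption; the real driver path may be smarter), the return traffic alone looks like this. The ~22 Gbps usable-PCIe figure is the commonly cited value, not an official spec.

```python
# Return traffic needed just to carry finished frames back to an internal display,
# assuming uncompressed 32-bit pixels. None of this applies to a monitor plugged
# directly into the eGPU.
USABLE_TB3_GBPS = 22  # commonly cited usable PCIe share of the 40 Gbps link

def readback_gbps(width, height, fps, bytes_per_px=4):
    return width * height * bytes_per_px * fps * 8 / 1e9

for label, w, h, fps in [("1080p @ 144 Hz", 1920, 1080, 144),
                         ("1440p @ 60 Hz", 2560, 1440, 60),
                         ("2160p @ 60 Hz", 3840, 2160, 60)]:
    gbps = readback_gbps(w, h, fps)
    print(f"{label}: ~{gbps:4.1f} Gb/s back to the laptop "
          f"({gbps / USABLE_TB3_GBPS:.0%} of usable TB3), before any game traffic")
```

Which is why a monitor hanging off the eGPU itself fares so much better, as @tilchev guessed.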

 

2019 16-inch MacBook Pro Retina (2.3ghz Intel i9) // Radeon 5500M (8GB).
MacOS Catalina // Bootcamp Win10 2004 (Build 19041).
Razer Core X // Vega 64 // 2m Active Thunderbolt cable.


odin
(@odin)
Estimable Member
Joined: 2 years ago
 
Posted by: @arizor

@tilchev, that's correct. The bottleneck is much smaller with an external display, since with an internal display the link does the work twice: data goes down the Thunderbolt cable to your eGPU, and the rendered frames come back up to the internal display.

Don't think of 4K vs 1080p as transferring a finished image between the devices, as if you were downloading a 4K picture versus a 1080p one.

The initial transfer of data (from the laptop down the thunderbolt into the eGPU) is, to put it in basic terms, raw data for the GPU to process. There really isn't much difference at this point between 1080p data and 4k data; it's just the CPU saying "HERE'S A BUNCH OF STUFF THAT NEEDS COMPUTATION". So the more times per second the data is being sent (i.e. frames), the bigger the TB3 bottleneck. As I probably don't need to explain, running at 1080p means the CPU is sending many more requests per second (i.e. FPS) than at 4K, thus the bottleneck becomes the TB3. At 4K, the requests per second (FPS) is much lower, thus the bottleneck becomes your GPU, rather than the TB3.

The long and short of it is, you can pretty much aim for 4K @ 60Hz over a TB3 connection, and with a powerful enough GPU you can hit it without much degradation. Higher frame rates than that will start to hit the TB3 bottleneck, so if you want to take advantage of those 120/144Hz monitors, an eGPU is probably not a good investment.

But for those of us with 4K 60Hz monitors, it's a suitable solution.

 

One of the other areas is how many assets are streamed to the eGPU from memory or storage now that we're getting DirectStorage (textures, layers of effects and passes, more instructions for more fidelity). PCIe bandwidth usage increases as the GPU does more and more work and there is more geometry and there are higher-quality assets on screen at once. Whether that creates a breaking point for the available bandwidth, especially for those of us who can't use an external monitor, remains to be seen. I think we might be at least a couple of years away from that.

The hit is obviously already there, but games are still largely playable right now. For instance, I just got done with the final expansion for Control and decided to run around the world there again, as it had been a while since I played it. It plays GREAT at 1600p with RTX on High, obviously using DLSS, and I think a mix of high and medium settings in the game itself. It could be a tad smoother, but it's definitely playable. I wonder how long this lasts as games become more and more complicated. DLSS is going to be pretty key for us: it keeps asset size down and lets the GPU do the work of making them look like higher-quality (and consequently larger) assets.

The Assassin's Creed series (I don't know about the newest ones, but I definitely see it with Black Flag and Unity up through Syndicate) is one of those that seems to take a larger hit with reduced PCIe bandwidth. They must be feeding a lot of data to the GPU, perhaps more than they need to. One thing that would be interesting, if it could be measured, is the PCIe bandwidth usage of different games; I would imagine some have much higher usage than others.
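On measuring it: one rough way, assuming an NVIDIA card and the pynvml Python bindings, is to poll NVML's PCIe throughput counters while a game runs (nvidia-smi dmon exposes similar counters from the command line). The counters are sampled over a short window, so treat the readings as ballpark figures rather than profiler output.

```python
# Rough PCIe throughput monitor for an NVIDIA eGPU -- run alongside a game.
# Requires the pynvml bindings for NVML.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust the index if needed

try:
    while True:
        # NVML reports throughput in KB/s over a short sampling window.
        tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
        rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"TX {tx / 1e6:6.2f} GB/s   RX {rx / 1e6:6.2f} GB/s")
        time.sleep(1.0)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```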

LG Gram 17 | Sonnet Breakaway Box 550 | Asus Strix RTX 2070 OC Edition | Win 10 Pro 20H2 + Fedora 32 Dual Boot
Build Link

 
2018 17" LG Gram 17 [8th,4C,U] + RTX 2070 @ 32Gbps-TB3 (Sonnet Breakaway 550) + Win10 [build link]  


Arizor
(@arizor)
Trusted Member
Joined: 9 months ago
 

@odin, for sure, DLSS is a very interesting technology in terms of how it shapes bandwidth.

Yeah, certain games do indeed do strange things with bandwidth. I think Horizon Zero Dawn is the newest strange case, where there is a large difference between x16 and x8, let alone x4. It's very odd, and I suspect they're doing some very aggressive loading of assets in the background.

 

2019 16-inch MacBook Pro Retina (2.3ghz Intel i9) // Radeon 5500M (8GB).
MacOS Catalina // Bootcamp Win10 2004 (Build 19041).
Razer Core X // Vega 64 // 2m Active Thunderbolt cable.

