eGPU Performance Loss - PCI Express vs. Thunderbolt
 

Donwey
(@donwey)
Active Member
Joined: 3 years ago
 

Reading about this topic here, I came to the conclusion that since I have an AORUS Gaming Box GTX 1080 and would like even more performance at 1440p, it would be better to buy a better laptop to fully utilize the GTX 1080 rather than buying a new RTX 2080. But the question is: when could we see a big performance improvement with eGPUs? Will the coming Thunderbolt 4 help? I guess not, since we are all using external monitors, and with any version of Thunderbolt they show the same ~20% performance drop. When could we see more performance from eGPUs?

Dell XPS 15 9560 @ i7-7700HQ + AORUS Gaming Box GTX 1080 + external 1440p monitor

 
2017 15" Dell XPS 15 9560 (GTX1050) [7th,4C,H] + GTX 1080 @ 16Gbps-TB3 (AORUS Gaming Box) + Win10 [build link]  


joevt
(@joevt)
Noble Member
Joined: 4 years ago
 

A new version of Thunderbolt would be required for more performance from eGPUs. Another option is new graphics cards that are better at transmitting data over narrow pipes. Maybe new drivers could do that with existing graphics cards. The idea would be to compress the data before sending it through the narrow pipes and then uncompress it at the other end. The time for the compress/decompress process needs to be less than the time it would take to transmit that data. Maybe AMD XConnect does this but all I see is marketing speak. Maybe AMD XConnect is just drivers that are less dumb - supporting hot plug/surprise removal like they should have always done (Thunderbolt is not the first interface to support hot plug).
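
To make that break-even condition concrete, here is a minimal sketch; all figures are illustrative assumptions, not measurements:

```python
# Break-even test for compressing data before sending it through a narrow pipe.
# Compression wins only if codec time plus the smaller transfer beats the
# raw transfer. All figures below are illustrative assumptions.

def transfer_time_s(size_bits, link_gbps):
    """Seconds to push size_bits through a link_gbps link."""
    return size_bits / (link_gbps * 1e9)

def compression_wins(size_bits, link_gbps, ratio, t_comp_s, t_decomp_s):
    """True if compress -> send -> decompress beats sending the raw data."""
    raw = transfer_time_s(size_bits, link_gbps)
    compressed = t_comp_s + transfer_time_s(size_bits / ratio, link_gbps) + t_decomp_s
    return compressed < raw

# Example: 256 MiB of texture data over ~22 Gbps (TB3's usable PCIe budget),
# assuming 2:1 compression and 5 ms of codec time on each end.
size_bits = 256 * 2**20 * 8
print(compression_wins(size_bits, 22, ratio=2.0, t_comp_s=0.005, t_decomp_s=0.005))
# -> True: ~0.059 s compressed vs ~0.098 s raw
```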

About Thunderbolt 4: there are a couple of options (compared in the sketch after this list):
1) Increase bandwidth per lane from the current 20 Gbps. That's already quite high; consider that PCIe 4.0 is only 16 GT/s per lane, and PCIe 5.0 might be 32 GT/s.
2) Add another lane to the existing two (an increase of 50%). Maybe Thunderbolt 3 could use a special USB-C cable that replaces the USB 2.0 lines of USB-C with higher-performance lines. This is what VirtualLink does, but they only go up to 10 Gbps.
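
For scale, a back-of-envelope comparison of those two options against PCIe link rates. A minimal sketch: these are raw signaling rates, and usable payload bandwidth is lower after encoding and protocol overhead:

```python
# Raw per-direction link rates in Gbps (illustrative; payload bandwidth
# is lower after encoding and protocol overhead).
links = {
    "TB3 today (2 lanes x 20 Gbps)":       2 * 20,
    "Option 1: faster lanes (2 x 32)":     2 * 32,   # hypothetical PCIe-5.0-class rate
    "Option 2: add a third lane (3 x 20)": 3 * 20,   # the +50% option
    "PCIe 3.0 x4 (4 x 8 GT/s)":            4 * 8,
    "PCIe 4.0 x4 (4 x 16 GT/s)":           4 * 16,
}
for name, gbps in links.items():
    print(f"{name:38s} {gbps:3d} Gbps")
```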

Mac mini (2018), Mac Pro (Early 2008), GA-Z170X-Gaming 7, Sapphire Pulse Radeon RX 580 8GB GDDR5, Radeon Pro W5700, Sonnet Echo Express III-D, Trebleet Thunderbolt 3 to NVMe M.2 case


adamk77
(@adamk77)
Eminent Member
Joined: 3 years ago
 
Posted by: P-Mac
Posted by: enjoy

I have said that many times...

I'm trying to prevent misinformation and incorrect conclusions from being drawn from your barrage of redundant data, when the fact of the matter is that the eGPU performance delta is a complex issue.

This pretty little list of resolutions vs. GPUs shouldn't be touted as some definitive reference of real-world eGPU gaming performance, for a few reasons:

  1. You are not using a real-world gaming test (playing actual games) for the majority of your testing data, so you can't draw a complete conclusion without looking at the whole story. Modern games with large texture sets utilize the PCIe bus in ways and amounts that vary significantly compared to Heaven, and all your list tells us is that we can sit there and watch Heaven all day at those resolutions. In real-world gaming scenarios, performance loss via eGPU is a more complicated issue.
  2. You aren't listing refresh rate, which changes computational requirements significantly and also affects how much of the eGPU performance delta the TB bottleneck is responsible for.
  3. This list is basically preaching to the choir, since the relative performance of the listed cards is already known.

For example, playing The Division (4K 60Hz, maxed settings, adaptive vsync), there are situations in which my Titan is (see the sketch after this list):

  • Not bottlenecked at all - simple scene, 60fps is being achieved, and as a result the card is not at full utilization
  • TB bottlenecked - moderately complex scenes, but traveling around the game world from point A to point B. Card utilization isn't near 99%, meaning the GPU is spending a lot of time idle waiting for textures or data to be sent to it for processing
  • CPU bottlenecked - complex scenes (e.g. gunfights in wide-open spaces) with a lot of particles, AI, and physics effects - GPU utilization in the 90s, moderate bus utilization, but the CPU melting as it tries to set up these complex scenes with a lot of game overhead
  • GPU bottlenecked - very complex scenes (lots of volumetric fog, foliage with subsurface scattering and shadows) with little to no AI/particles/physics - GPU utilization at 99%, bus and CPU usage being moderate. This is when the performance delta is the least, compared to desktop-installed equivalent cards. This also happens to be the workload in a lot of GPU benchmarks (3DMark graphics, Heaven, Valley, SP)
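
Those four signatures, written out as a rough heuristic classifier. This is just a sketch: the utilization thresholds are illustrative assumptions, not calibrated values:

```python
def classify_bottleneck(gpu_util, cpu_util, fps, fps_target):
    """Rough mapping of utilization readings to the four scenarios above.
    Thresholds are illustrative assumptions, not calibrated values."""
    if fps >= fps_target and gpu_util < 90:
        return "not bottlenecked (frame target met, card not fully utilized)"
    if gpu_util >= 97:
        return "GPU bottlenecked (the closest an eGPU gets to desktop performance)"
    if cpu_util >= 90:
        return "CPU bottlenecked (complex scene setup, heavy game overhead)"
    # Neither chip is pegged and the target isn't met: the GPU is starving,
    # idle while it waits for data over the link - the TB-bottleneck signature.
    return "likely TB bottlenecked (GPU idle waiting for data)"

# Example readings while traveling across the game world:
print(classify_bottleneck(gpu_util=75, cpu_util=60, fps=48, fps_target=60))
```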

A 10-20% performance loss with an eGPU is a best-case scenario, not an average, as far as gaming performance is concerned. The answer to someone asking "which GPU should I put in my eGPU" should always be "what are you looking to do with it?"

The resolution someone plays at, the framerate they are targeting, their Thunderbolt version, their CPU, and even the specific games that they play can have a huge impact on which GPU they should be looking at and what kind of performance they will experience. 

---

The reason I am so interested in this topic is that I've seen the 10-20% number thrown around since I first got my eGPU setup running (around June 2016), and in that time I have learned a lot about the different bottlenecks in a laptop + eGPU system and how they relate to real-world gaming performance, because, well... I play games with my computer, I don't solely use it for benchmarks. I've run a 980 and a 980 Ti in my eGPU setup before. Overclocked GPUs, BIOS mods, thermal paste replacements, underclocking & undervolting the CPU for thermal headroom... I love tinkering, and if there's a setting that could potentially benefit gaming performance, you can bet I've touched it in my quest to understand how our unique eGPU-based systems deal with the workloads of modern games.

The problem with the "oh cool the GPU will basically be 10-20% slower than desktop-installed" conclusion is this: I was fortunate enough to upgrade to a TITAN Xp in my eGPU. You'd figure "10-20% loss" from the most powerful GPU out right now would be totally fine to handle 1080p Ultra gaming, right? 

In DOOM, I've seen FPS dips down into the 60s and 70s, and I play at 144Hz since it's a twitch-style shooter. ROTTR was barely a better experience than my old 980 Ti. VR (90fps) in Elite Dangerous & DIRT Rally is a mixed bag. The Division (60Hz), as outlined above, is also a mixed bag... but Elite Dangerous on the 1080p display runs fine at 4K DSR 60Hz with barely any dips below 60fps. GTA V has huge framerate swings and dips, but DSR 4K helps mitigate that.

A modern game is a much different beast from a focused benchmark and that's why I caution against drawing a general conclusion about gaming performance when the majority of data that we are drawing the conclusion from is benchmark-based.

This reply is so spot on.  I was banging my head against the wall wondering why I was getting 60 FPS in 1440p in games like Division 2 and Rise of the Tomb Raider, but I was only getting 30-40 FPS in a game like Monster Hunter World. 

I am seeing both CPU and GPU utilization around 70%, which means they're just idling, waiting for data to do something with. It's being constrained by the Thunderbolt 3 bandwidth.

I'm still a bit perplexed as to why this would still be the case if I set the resolution to something ridiculously low like 640x480. I would assume that there isn't much texture data to push around at this resolution, so it should not be TB-bottlenecked. If @p-mac is still around, I'd love to get your insights on what you think may be going on.

 2018 Mac Mini i7, 32GB RAM, 512GB SSD, 10Gbit ethernet
 Sonnet eGFX Breakaway Box 350W with Sapphire Vega 56 Pulse

 
2018 Mac Mini [8th,6C,B] + RX Vega 56 @ 32Gbps-TB3 (Sonnet Breakaway 350) + macOS 10.14.1 & Win10 1803 [build link]  


andyegoo
(@andyegoo)
Eminent Member
Joined: 3 years ago
 

Hello everyone. I noticed that Thunderbolt 1 allows you to comfortably play at 30 fps at 4K resolution in Forza 4 with graphics settings on high; if you set 60 fps, micro-freezes start. Is this the limit of 10 gigabits?
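
A rough back-of-envelope supports that guess, assuming the rendered frames have to travel back over the link (e.g. to an internal display) and ignoring any compression; this is a sketch, not a measurement:

```python
# Raw frame traffic if rendered frames return over the Thunderbolt link.
# Assumes uncompressed 4-byte RGBA pixels; ignores game asset traffic.
def frame_traffic_gbps(width, height, fps, bytes_per_pixel=4):
    return width * height * bytes_per_pixel * fps * 8 / 1e9

for fps in (30, 60):
    print(f"4K @ {fps} fps: {frame_traffic_gbps(3840, 2160, fps):.1f} Gbps")
# 4K @ 30 fps: ~8.0 Gbps  -> fits under Thunderbolt 1's ~10 Gbps
# 4K @ 60 fps: ~15.9 Gbps -> exceeds it, consistent with the micro-freezes
```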


2012 15" MacBook Pro Retina (GT650M) [3rd,4C,Q] + GTX 1060 @ 10Gbps-TB1>TB3 (ASUS XG Station 2)+Win10 [build link]  

P-Mac
(@p-mac)
Trusted Member
Joined: 5 years ago
 
Posted by: adamk77
I'm still a bit perplexed as to why this would still be the case if I set the resolution to something ridiculously low like 640x480. I would assume that there isn't much texture data to push around at this resolution, so it should not be TB-bottlenecked. If @p-mac is still around, I'd love to get your insights on what you think may be going on.

@adamk77, Hi...yeah, it's been a while since I've been around these parts. Still using an eGPU (Vega 64 + Alphacool Eiswolf water cooler / Akitio Node...gotta update my post) but I got disheartened about posting here due to frustrations with people spreading misinformation without doing their part to research what exactly was causing their issues.

I currently believe that this post (along with the performance benchmarks @itsage meticulously performed in the quoted post) paints the best picture about potential bottlenecks using an eGPU. From what I've been able to glean from that thread:

  • You do not actually get the full PCIe 3.0 x4 bandwidth of 32 Gbps as you would assume, because Thunderbolt 3 reserves 18 Gbps out of the 40 available, leaving only 22 Gbps for the eGPU.
  • Thunderbolt encoding/decoding causes an additional performance deficit, independent of bandwidth constraints. A good example of this in action is @itsage's Superposition results: 64.6 FPS (x16 internal), 64.7 (x4 internal), 64.4 (x2 internal), and 55.3 (TB3). The fact that the x16, x4, and even x2 (!!) internal results are within margin of error suggests that this particular test is not bandwidth-constrained, so this, combined with the TB3 result being lower, means that logically the only other thing to blame here is Thunderbolt encoding overhead (worked through in the sketch below).
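
Putting rough numbers on that second point, a quick sketch using the quoted Superposition figures:

```python
# Per-link deficit vs. the x16 internal baseline, from @itsage's
# Superposition results quoted above (positive = slower than x16).
results_fps = {"x16 internal": 64.6, "x4 internal": 64.7,
               "x2 internal": 64.4, "TB3": 55.3}
baseline = results_fps["x16 internal"]
for link, fps in results_fps.items():
    print(f"{link:13s} {fps:5.1f} fps  {100 * (1 - fps / baseline):+5.1f}%")
# x16/x4/x2 land within ~0.5% of each other (margin of error), while TB3
# sits ~14% down: encoding overhead, not bandwidth, since even x2 keeps up.
```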

I haven't played Monster Hunter World, so I don't have any personal experience to comment with, that being said:

CPU bottlenecking can still occur at utilization levels below 100%: the game may simply not be multi-threaded enough (Crysis 1 comes to mind), and the host computer's cooling (usually a laptop's) may not be able to sustain high boost clocks, causing the CPU to bottleneck the game's maximum framerate because poor clockspeed hurts single-core performance. You may also truly be bandwidth-bottlenecked; more and more modern games are starting to leverage PCIe bandwidth to stream textures & shaders (e.g. for in-game areas the player may travel to), and this can cause significant frame drops.

I would assume that there isn't much texture data to push around at this resolution, so it should not be TB-bottlenecked.

Display resolution (aka rendering resolution, for games with built-in scaling) is independent of texture resolution. You can drag a 1080 Ti down to 30fps at 1080p if you slam it with a heavy ray tracing workload like the Star Wars Reflections GPU demo...the point I'm getting to is that your resolution may not be the primary factor in capping your performance if the card is still doing heavy shading work alongside constantly streaming high resolution textures in.

For me, Battlefield V is one of the worst offenders: performance is all over the place, sometimes dropping to the 30s even on a Vega 64 with no obvious CPU bottleneck. Sometimes the CPU will bottleneck (very high CPU util, low GPU util) in scenes, sometimes it'll be an "unclear bottleneck" which I generally attribute to TB3 bandwidth or encoding, and sometimes it will run at full tilt and I can get my monitor's full 100fps.

The fact that modern games are moving away from the old "load EVERYTHING into the GPU then start playing" paradigm is primarily the reason I've stopped trusting synthetic benchmarks completely for determining eGPU performance for gaming, since they all generally work old-school like that. They're still great for isolating the TB3 encoding bottleneck itself, but one look at how the internal PCIe bandwidth changes affect fps on @itsage's benchmark table vs. their actual game tests shows just how much texture streaming relies on PCIe bandwidth.

MAG341CQ 34" Ultrawide // Logitech G900 & HyperX Alloy FPS Pro (MX Blues) // Scarlett 2i4 + Yamaha HS7

 
2016 15" MacBook Pro (RP460) [6th,4C,H] + RX Vega 64 LC @ 32Gbps-TB3 (AKiTiO Node) + macOS 10.14 & Win10 [build link]  


adamk77
(@adamk77)
Eminent Member
Joined: 3 years ago
 
Posted by: P-Mac

Display resolution (aka rendering resolution, for games with built-in scaling) is independent of texture resolution. You can drag a 1080 Ti down to 30fps at 1080p if you slam it with a heavy ray tracing workload like the Star Wars Reflections GPU demo...the point I'm getting to is that your resolution may not be the primary factor in capping your performance if the card is still doing heavy shading work alongside constantly streaming high resolution textures in.

For me, Battlefield V is one of the worst offenders: performance is all over the place, sometimes dropping to the 30s even on a Vega 64 with no obvious CPU bottleneck. Sometimes the CPU will bottleneck (very high CPU util, low GPU util) in scenes, sometimes it'll be an "unclear bottleneck" which I generally attribute to TB3 bandwidth or encoding, and sometimes it will run at full tilt and I can get my monitor's full 100fps.

The fact that modern games are moving away from the old "load EVERYTHING into the GPU then start playing" paradigm is primarily the reason I've stopped trusting synthetic benchmarks completely for determining eGPU performance for gaming, since they all generally work old-school like that. They're still great for isolating the TB3 encoding bottleneck itself, but one look at how the internal PCIe bandwidth changes affect fps on @itsage's benchmark table vs. their actual game tests shows just how much texture streaming relies on PCIe bandwidth.

Thanks for this awesome response. It was highly educational.

I think that this (built-in scaling) is what's going on with Monster Hunter World. I'm pretty sure I am being TB3 bandwidth-constrained.

 2018 Mac Mini i7, 32GB RAM, 512GB SSD, 10Gbit ethernet
 Sonnet eGFX Breakaway Box 350W with Sapphire Vega 56 Pulse

 
2018 Mac Mini [8th,6C,B] + RX Vega 56 @ 32Gbps-TB3 (Sonnet Breakaway 350) + macOS 10.14.1 & Win10 1803 [build link]  


sxr71
(@sxr71)
Active Member
Joined: 3 years ago
 

How about giving us our full 40 Gbps for a start? I was so disappointed to learn about the bandwidth reserved for DisplayPort. I'm obviously not using it with an eGPU, so why not give us a mode that just takes that bandwidth back?

I understand Intel wants to open up the standard, so maybe then we might get firmware to do it?

itsage
(@itsage)
Founder Admin
Joined: 5 years ago
Builds: 155
 

The ideal situation would be a Thunderbolt Firmware Baking Tool where users get to select max bandwidth and Power Delivery.

LinkedIn | Twitter | Facebook | Youtube | Instagram
 
external graphics card builds
best laptops for external GPU
eGPU enclosure buyer's guide

 
2021 14" Microsoft Surface Laptop Studio [11th,4C,H] + RTX 2080 Ti @ 32Gbps-TB4 (WD_Black D50) + Win11 [build link]  


ataylor
(@ataylor)
Eminent Member
Joined: 3 years ago
 

Had a question: for those of us using, say, a 3D rendering application, how does that affect the math (i.e. OctaneBench etc.)? Is it better to have the monitors hooked up directly to the dedicated "AMD" GPU and have, say, dual GTX cards headless doing CUDA work?
I don't currently have loopback adapters on those and just boot with my two external monitors connected to the Radeon card of the MacBook Pro i9.

2018 MacBook Pro 15" w/ Touch Bar - i9 - 32 GB RAM - AORUS Gaming Box with GTX 1080 - Sonnet Breakaway 550 with GTX 1080 - still need to get both GTX 980 Tis working for four GTX cards total


P-Mac
(@p-mac)
Trusted Member
Joined: 5 years ago
 
Posted by: ataylor

Had a question: for those of us using, say, a 3D rendering application, how does that affect the math (i.e. OctaneBench etc.)? Is it better to have the monitors hooked up directly to the dedicated "AMD" GPU and have, say, dual GTX cards headless doing CUDA work?
I don't currently have loopback adapters on those and just boot with my two external monitors connected to the Radeon card of the MacBook Pro i9.

Computational workloads tend to have the least performance loss, as nearly all of the work is done “on the card” and the Thunderbolt bus isn’t being thrashed too hard.

Here’s an analogy: 

Pretend you have a calculator and are ready to do math problems: you’re the GPU. A video game workload with heavy streaming and bus usage would be as if there were a list of different math problems for you to solve, but you’re forced to listen to a friend (the CPU and game engine) tell you each problem one at a time over a walkie-talkie (the Thunderbolt bus). You’re gonna solve them a lot slower than you otherwise could, because you’re spending a lot of time waiting for your friend to tell you the next problem.

In this analogy, computational workloads like 3D rendering would be as if you simply got a math test handed to you on a piece of paper; you can see all the problems all at once and can work through them as fast as you can. 

Most synthetic benchmarks like 3DMark, or especially Superposition, are like the latter. Everything the GPU will need to do its math is fully loaded onto it before it begins its work (for the most part)...that’s why you see worse performance loss in games vs. synthetics or computational workloads.
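
To put toy numbers on the analogy, a sketch with made-up figures (not measurements):

```python
# Toy timing model of the walkie-talkie analogy: total time to finish
# n "math problems" when each must arrive over the link one at a time,
# vs. being handed over up front. All numbers are illustrative.
def streamed_s(n, solve_s, link_latency_s):
    # Game-like: wait for each piece of work to arrive, then compute it.
    return n * (link_latency_s + solve_s)

def preloaded_s(n, solve_s, upload_s):
    # Benchmark/render-like: pay for one upload, then compute flat out.
    return upload_s + n * solve_s

n, solve_s = 1000, 0.001
print(f"streamed : {streamed_s(n, solve_s, link_latency_s=0.002):.2f} s")
print(f"preloaded: {preloaded_s(n, solve_s, upload_s=0.5):.2f} s")
# streamed : 3.00 s (the "GPU" sits idle two-thirds of the time)
# preloaded: 1.50 s (link cost paid once; the GPU stays busy)
```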

EDIT: to answer your monitor question: no, having a monitor attached to the eGPU shouldn’t affect the speed at which it does its workload, since a rendering job isn’t a video game trying to output a frame buffer directly to a monitor in real time.

This post was modified 3 years ago

MAG341CQ 34" Ultrawide // Logitech G900 & HyperX Alloy FPS Pro (MX Blues) // Scarlett 2i4 + Yamaha HS7

 
2016 15" MacBook Pro (RP460) [6th,4C,H] + RX Vega 64 LC @ 32Gbps-TB3 (AKiTiO Node) + macOS 10.14 & Win10 [build link]  

