A Call For Measurements: Isolating the Thunderbolt Effect.
 
itsage
(@itsage)
Illustrious Member Admin
Joined: 4 years ago
 

It's the same system. You can click on the numbers to see the screen captures. I noticed that the eGPU tests were run with older Radeon drivers (from the week before). I will find time to run them again with the same Radeon driver version.


 
2020 13" MacBook Pro [10th,4C,G] + RTX 2080 Ti @ 32Gbps-TB3 (AORUS Gaming Box) + Win10 2004 [build link]  


switch
(@switch)
Estimable Member
Joined: 3 years ago
 

These numbers suggest that the bottleneck is not raw bandwidth but the TB overhead (not to mention the PCH), coupled with the fact that full-speed TB3 is actually PCIe x2, not x4, due to the DP and USB-C reservation.

2016 15" HP ZBook 15 G3 (Xeon E3-1545M, Iris Pro p580, no dGPU) + [email protected] (Aorus Gaming Box) + Win10pro

 
2018 15" HP ZBook 15 G5 (Q P1000) [8th,6C,H] + GTX 1070 @ 32Gbps-TB3 (AORUS Gaming Box) + Win10 [build link]  


nando4
(@nando4)
Noble Member Admin
Joined: 4 years ago
 
Posted by: itsage

I followed Nando's recommendation to tape the GPU contact pins so that I could run it as dGPU on the Z170 Test Bench at different PCIe speeds. It was an interesting experiment. I used GPU-Z to confirm the card was running at the speed I wanted. Invisible tape was very handy and made for an easy cleanup, just in case you want to replicate this test yourself.

...

Here are the results of the Radeon RX 580 run as a dGPU (top PCIe slot) at x16, x4, x2, and x1. I’m also adding the x4 TB3 eGPU results for quick reference.

Radeon RX 580              x16 PCIe 3.0   x4 PCIe 3.0   x2 PCIe 3.0   x1 PCIe 3.0   x4 TB3 eGPU

Unigine Valley             56.6 FPS       56.3 FPS      55.4 FPS      53.5 FPS      50.8 FPS
Unigine Heaven             56.8 FPS       56.7 FPS      56.3 FPS      55.0 FPS      49.2 FPS
Unigine Superposition      64.6 FPS       64.7 FPS      64.4 FPS      63.8 FPS      55.3 FPS
3DMark Time Spy            30.4 FPS       30.2 FPS      30.0 FPS      29.3 FPS      27.4 FPS
3DMark Fire Strike         68.2 FPS       67.6 FPS      67.5 FPS      66.5 FPS      57.0 FPS

Rise of the Tomb Raider    60.0 FPS       60.0 FPS      60.0 FPS      59.6 FPS      58.1 FPS
Tom Clancy’s Ghost Recon   70.8 FPS       67.8 FPS      61.4 FPS      49.1 FPS      41.5 FPS
Shadow of Mordor           101.9 FPS      100.5 FPS     99.8 FPS      94.4 FPS      83.3 FPS
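
To put numbers on the gap, here is a minimal Python sketch that computes the FPS loss of the x1 PCIe 3.0 and x4 TB3 results relative to the x16 baseline. The FPS values are copied from the results above; the names `RESULTS`, `percent_loss`, and `summarize` are made up for illustration and are not part of any benchmark tool.

```python
# FPS figures copied from the RX 580 results posted in this thread.
# Column order: x16, x4, x2, x1 (direct PCIe 3.0), then x4 TB3 eGPU.
RESULTS = {
    "Unigine Valley": (56.6, 56.3, 55.4, 53.5, 50.8),
    "Unigine Heaven": (56.8, 56.7, 56.3, 55.0, 49.2),
    "Unigine Superposition": (64.6, 64.7, 64.4, 63.8, 55.3),
    "3DMark Time Spy": (30.4, 30.2, 30.0, 29.3, 27.4),
    "3DMark Fire Strike": (68.2, 67.6, 67.5, 66.5, 57.0),
    "Rise of the Tomb Raider": (60.0, 60.0, 60.0, 59.6, 58.1),
    "Tom Clancy's Ghost Recon": (70.8, 67.8, 61.4, 49.1, 41.5),
    "Shadow of Mordor": (101.9, 100.5, 99.8, 94.4, 83.3),
}


def percent_loss(baseline, value):
    """Return how far `value` falls below `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


def summarize(results):
    """Map each benchmark to (x1 loss %, TB3 loss %) versus x16."""
    summary = {}
    for name, (x16, _x4, _x2, x1, tb3) in results.items():
        summary[name] = (percent_loss(x16, x1), percent_loss(x16, tb3))
    return summary


if __name__ == "__main__":
    for name, (x1_loss, tb3_loss) in summarize(RESULTS).items():
        print(f"{name:26s} x1: {x1_loss:5.1f}%   TB3: {tb3_loss:5.1f}%")
```

With these figures, the TB3 loss is larger than the x1 loss in every one of the eight benchmarks.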

 @itsage, this is a significant test result. We see that even the x1 3.0 dGPU direct PCIe benchmarks have better results than x4 3.0 TB3. In these sample benchmarks, bandwidth isn't the drag on FPS, and neither is PCH vs CPU connectivity, since x1 3.0 isn't going to bottleneck DMI.

Instead, as advised privately by a vendor doing TB3 testing, what we are seeing here is the Thunderbolt 3 controller's encoding/decoding speed causing the performance loss.

I hope Intel improves encoding/decoding speed in Thunderbolt 4, along with the other TB3 performance detractors we've noted.

eGPU Setup 1.35    •    eGPU Port Bandwidth Reference Table

 
2015 15" Dell Precision 7510 (Q M1000M) [6th,4C,H] + GTX 1080 Ti @32Gbps-M2 (ADT-Link R43SG) + Win10 1803 [build link]  


Sky11
(@sky11)
Reputable Member
Joined: 3 years ago
 
Posted by: nando4
Posted by: itsage

I followed Nando's recommendation to tape the GPU contact pins so that I could run it as dGPU on the Z170 Test Bench at different PCIe speeds. It was an interesting experiment. I used GPU-Z to confirm the card was running at the speed I wanted. Invisible tape was very handy and made for an easy cleanup, just in case you want to replicate this test yourself.

...

Here are the results of the Radeon RX 580 run as a dGPU (top PCIe slot) at x16, x4, x2, and x1. I'm also adding the x4 TB3 eGPU results for quick reference.

 @itsage, this is a significant test result. We see that even the x1 3.0 dGPU direct PCIe benchmarks have better results than x4 3.0 TB3. This shows that bandwidth isn't the drag on the Thunderbolt 3 benchmarks, and neither is PCH vs CPU connectivity, since x1 3.0 isn't going to bottleneck DMI.

Instead, as advised privately by a vendor doing TB3 testing, it's the Thunderbolt 3 controller's encoding/decoding speed that is causing the performance loss. Your results support this.

I hope Intel improves encoding/decoding speed in Thunderbolt 4, along with the other TB3 performance detractors we've noted.

It is not a trivial task, guys... it may require both silicon and software optimizations... and a big hairy hand pushing to make that happen...

ed_co
(@ed_co)
Reputable Member
Joined: 3 years ago
 

You should reactivate this thread with the new X299 TB3 rigs!! And see if there is any mobo that provides a direct TB3-CPU connection. Cheers.

2017 15" MacBook Pro (RP560) [7th,4C,H] + GTX 1080 Ti @ 32Gbps-TB3 (Mantiz Venus) + macOS 10.13 & Win10 [build link]  

itsage
(@itsage)
Illustrious Member Admin
Joined: 4 years ago
 

I saw this Gigabyte X299 Designare motherboard at CES. It most likely has a TB3-CPU arrangement. It's one of the very few boards that has dual Thunderbolt 3 ports and dual DisplayPort In. Someone who's looking to build a Hackintosh to use the LG 5K UltraFine or a TB3 display can do so with this motherboard.

itsage
(@itsage)
Illustrious Member Admin
Joined: 4 years ago
 

@ed_co The older X99 Designare has a TB3-CPU connection and it's a very good value at the moment. I just upgraded my TB3 test bench to this motherboard. Details are here.

NCC74656
(@ncc74656)
Eminent Member
Joined: 3 years ago
 

I'm looking at a new laptop that is compatible with both M.2 and TB3.

So from what I read here, the performance difference between M.2 and TB3 is rather negligible. With a newer laptop that can do either, one might see a small boost (a few FPS) with M.2, but TB3's compatibility and easier setup might mean M.2 isn't worth the effort?

karatekid430
(@karatekid430)
Estimable Member
Joined: 3 years ago
 

@nando4

These are the critical CUDA-Z host-to-device (H2D) values I've seen posted here and elsewhere that can be used to gauge relative bandwidth to each other:

x4 3.0 M.2 = 2940MiB/s = 24.66Gbps
TB3 = 2260MiB/s = 18.96Gbps
TB2 = 1250MiB/s = 10.49Gbps
TB1 = 790MiB/s = 6.63Gbps
EC2 = 380MiB/s =  3.19Gbps
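
For anyone wanting to check the arithmetic, the list above converts MiB/s to Gbps using 1 MiB = 1024 x 1024 bytes and 1 Gbps = 10^9 bits per second. A minimal Python sketch follows; the names `mib_per_s_to_gbps` and `H2D_MIB_PER_S` are made up for illustration and are not part of CUDA-Z.

```python
MIB = 1024 ** 2  # bytes in one MiB

# Host-to-device values quoted above, in MiB/s.
H2D_MIB_PER_S = {
    "x4 3.0 M.2": 2940,
    "TB3": 2260,
    "TB2": 1250,
    "TB1": 790,
    "EC2": 380,
}


def mib_per_s_to_gbps(mib_per_s):
    """Convert MiB/s to decimal gigabits per second."""
    return mib_per_s * MIB * 8 / 1e9


if __name__ == "__main__":
    for port, rate in H2D_MIB_PER_S.items():
        print(f"{port:12s} {rate:5d} MiB/s = {mib_per_s_to_gbps(rate):5.2f} Gbps")
```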

I have measured the full 22Gbps = 2750MB/s on Thunderbolt 3 to my eGPU. I know x4 3.0 M.2 is 3.94GB/s theoretically, and we have the Samsung 970 Pro advertising 3500MB/s reads. So the figures above look a little low. Just curious though, what is EC2 aside from Elastic Compute Cloud (Amazon)? It does not seem to match the context of ports.
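
The two reference figures in this post can be reproduced with a short Python sketch. It assumes the standard PCIe 3.0 parameters (8 GT/s per lane with 128b/130b encoding) and decimal units; the function names are made up for illustration.

```python
def pcie3_gb_per_s(lanes):
    """Theoretical PCIe 3.0 throughput in GB/s for a given lane count.

    Assumes 8 GT/s per lane and 128b/130b line encoding; protocol
    overhead (packet headers, flow control) is not modelled.
    """
    bits_per_s = 8e9 * (128 / 130) * lanes
    return bits_per_s / 8 / 1e9


def gbps_to_mb_per_s(gbps):
    """Convert decimal gigabits per second to decimal megabytes per second."""
    return gbps * 1e9 / 8 / 1e6


if __name__ == "__main__":
    print(f"PCIe 3.0 x4: {pcie3_gb_per_s(4):.2f} GB/s")
    print(f"22 Gbps    : {gbps_to_mb_per_s(22):.0f} MB/s")
```

Real transfers land below the theoretical x4 figure because of the overhead the sketch leaves out, which is consistent with the 3500MB/s advertised for the SSD.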

karatekid430
(@karatekid430)
Estimable Member
Joined: 3 years ago
 

Okay, so do we have word from Intel about this? I was emailing Mika Westerberg from Intel about the Linux Thunderbolt driver, and even he did not know about the 22Gbps limit and asked me for sources. So it is clearly not widely publicised - Intel probably does not want people to realise that the comparison of 40Gbps to USB 3.1 10Gbps is off.

The best explanation so far is for dual-port controllers. Somebody said it is reserving 10Gbps for USB on the other port. But that does not explain why they cannot compete for bandwidth - at least it will all be there if only one is used at a time. Also, why do they have to reserve USB bandwidth and not PCIe bandwidth? Why could it not be different for single port controllers? Why could there not be an override?

They are going to be screwed with USB 3.2 - trying to reserve 20Gbps for USB, if that is the case.

My proposed solution would be to enable 32Gbps for devices that do not utilise DisplayPort, such as my eGPU and my Dell Thunderbolt 3 NVMe SSD (it has the exact same 512GB Toshiba M.2 NVMe drive that Dell uses in the 9370). There would be no downside for such devices.

Perhaps Intel did not want to confuse people. But honestly, a lot of people find USB-C confusing, anyway. So I do not see the problem. This is what has to happen in technology, and the faster we learn to embrace the changes, the happier we will be. If you embrace the changes whole-heartedly and quickly, the painful transition period gets over and done with. USB-C is only about dongles whilst you are holding onto the legacy stuff. Once you ditch the legacy stuff, the only cable you will ever need is the USB-C to USB-C (or Thunderbolt 3 certified equivalent).

I got my hands on the MSI Thunderbolt M3 AIC. It was extremely difficult to find, and I only recently realised it even exists. It has a JHL6540, reports NVM 19 in Linux, and bears a lot of resemblance to the ASRock AIC. It arrived, and soon after both USB-C ports had fallen off the PCB. They were not soldered properly. However, it does have a JTAG header, so if anybody has any ideas how to modify the NVM firmware to help this situation, I would be willing to give it a shot. I will try to solder the ports back on in the meantime.
