Nvidia eGPU MBP TB3 port underperformance: 16xxMiB/s instead of 22xxMiB/s under macOS or Windows+apple_set_os.efi
CUDA-Z H2D bandwidth (MiB/s)

| System | macOS | macOS, hotplugged at boot | Windows via apple_set_os.efi | Windows, no apple_set_os.efi | Reported by |
|---|---|---|---|---|---|
| 13" MBP (2017) | 16xx | - | 16xx | 22xx | lexine |
| 13" MBP (2016) | - | 22xx (not verified) | - | 22xx | Smackintosh |
| 13" MBP (2016) | 16xx | - | 16xx | - | goalque |
| 15" MBP (2016) | - | - | - | 22xx | VOD |
| 15" MBP (2017) | 16xx | - | - | 22xx | Max Pham |
| 15" MBP (2017) | - | - | 16xx | 22xx | chaosmage |
We have encountered yet another TB3 performance quirk, which I am bringing to the attention of our MacBook eGPU community and Apple. 2016-or-newer 13" and 15" TB3 MacBook Pros with Nvidia eGPUs show CUDA-Z H2D performance maxing out at 16xxMiB/s (13.42Gbps at 1600MiB/s).
- 13.42Gbps is notably lower than the 22Gbps PCIe bandwidth that Intel specifies for Thunderbolt 3.
- AMD eGPUs are not affected (ref: goalque for Windows test, itsage for macOS test).
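The 13.42Gbps figure follows from treating 1 MiB as 2^20 bytes and 1 Gbps as 10^9 bits per second. A minimal Python sketch of that conversion, useful for putting CUDA-Z readings and Intel's Gbps figure on the same scale (the function names are illustrative, not part of any tool):

```python
# Sketch: convert CUDA-Z bandwidth readings (MiB/s) to Gbps and back.
# Assumes 1 MiB = 2**20 bytes and 1 Gbps = 10**9 bits/s (decimal gigabits),
# which is how 1600MiB/s works out to 13.42Gbps.

BYTES_PER_MIB = 2 ** 20
BITS_PER_BYTE = 8


def mib_per_s_to_gbps(mib_per_s: float) -> float:
    """Convert a MiB/s reading to decimal gigabits per second."""
    return mib_per_s * BYTES_PER_MIB * BITS_PER_BYTE / 1e9


def gbps_to_mib_per_s(gbps: float) -> float:
    """Convert decimal gigabits per second to MiB/s."""
    return gbps * 1e9 / (BYTES_PER_MIB * BITS_PER_BYTE)


if __name__ == "__main__":
    # Lower bounds of the 16xx and 22xx ranges reported in the table.
    for reading in (1600, 2200):
        print(f"{reading} MiB/s = {mib_per_s_to_gbps(reading):.2f} Gbps")
    # Intel's 22Gbps PCIe figure expressed in CUDA-Z units.
    print(f"22 Gbps = {gbps_to_mib_per_s(22):.0f} MiB/s")
```

Run as a script, this prints 13.42 Gbps for 1600MiB/s, 18.45 Gbps for 2200MiB/s, and about 2623MiB/s for 22Gbps.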
This occurs when these systems are used with:
- macOS
- Windows booted with apple_set_os.efi, which makes the Apple firmware believe it is booting macOS
It seems I get some H2D bandwidth degradation when using apple_set_os.efi instead of hot-plugging at the Windows boot logo.
[Screenshots: CUDA-Z Host to Device with apple_set_os.efi; CUDA-Z Host to Device with hot-plug at Windows boot]
Affected users may therefore want to avoid apple_set_os.efi when booting Windows:
- 2016+ 13″ MBPs have only an iGPU, so apple_set_os.efi has been needed there solely to avoid hotplugging.
-> Here's how to manually hotplug instead.
- 2016 15″ MBPs with a DSDT override can have error 12 solved without apple_set_os.efi, but then require an external LCD because the iGPU is not active. This boot process also avoids the need for gpu-switch. 2017+ 15″ MBPs have a factory ‘large memory’ configuration, so they do not need a DSDT override to achieve this, as explained.
macOS and Windows+apple_set_os.efi both underperform
We can use the TB3 bandwidth recorded in the Interface Performance Reference Table as a baseline.
Under macOS, I have not seen any 32Gbps-TB3 CUDA-Z results in the 22xxMiB/s range. They have always been in the 16xxMiB/s range or lower, matching what this apple_set_os.efi Windows boot report tells us.
Seeking a refund may prompt Apple to fix this problem
Requesting a partial or full refund from Apple for underperformance may prompt Apple to fix this in a timely manner.
Why? Under macOS, their TB3 ports deliver only 16xxMiB/s (13.42Gbps) of H2D bandwidth to Nvidia cards, nowhere near the 22Gbps PCIe bandwidth specified by Intel.
You can verify this by running CUDA-Z on your eGPU, or by attaching fast external TB3 SSD storage and benchmarking it. In either case, the all-important host-to-device (memory write) result will peak at 16xxMiB/s.
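For the external SSD route, any sequential-write benchmark will do. As a rough sketch under stated assumptions (this is not the tool used in the thread), the Python below writes incompressible data to a temporary file on the target volume, forces it out with `os.fsync`, and reports MiB/s. The function name and default sizes are hypothetical choices. Note that on macOS `os.fsync` does not force the drive's own write cache to flush, so treat the number as approximate.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory: str, total_mib: int = 1024, chunk_mib: int = 8) -> float:
    """Write roughly total_mib of random data into `directory` and return MiB/s.

    Hypothetical helper: the file is created with tempfile and deleted afterwards.
    """
    chunk = os.urandom(chunk_mib * 2 ** 20)  # random bytes so compression can't inflate the result
    written_mib = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written_mib < total_mib:
                f.write(chunk)  # BufferedWriter writes the whole chunk or raises
                written_mib += chunk_mib
            f.flush()
            os.fsync(f.fileno())  # push data out of the OS page cache before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written_mib / elapsed
```

For example, `sequential_write_mib_per_s("/Volumes/ExternalSSD")` would benchmark a drive mounted at that (hypothetical) path. Use a `total_mib` well above the drive's cache size for a sustained figure.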
This is interesting news.
My question is: is this related to the half-bandwidth issue caused by the TI83 version observed with Thunderbolt 3 enclosures? I also use apple_set_os.efi in my setup, but with a Thunderbolt 2 connection and an Akitio Thunder 2. apple_set_os.efi is a must for me because my MBP is equipped with a dGPU. I didn't notice any performance impact in my benchmarks (a 25% bandwidth loss would bring me close to the bandwidth of the Thunderbolt 1 interface), but this might be difficult to confirm.
While I don't have Windows installed on my internal drive (so I can't test hot plugging), I can compare CUDA-Z on OS X against CUDA-Z on Windows booted from my modified Clover (which does what apple_set_os.efi does). Both runs are on the "slower" port of my 2016 13" MBP (upper right port).
@yifanlu, can you maneuver your enclosure so that you can run the CUDA-Z benchmark on one of the faster ports? It would then be useful to have three CUDA-Z results: macOS, Windows via apple_set_os.efi, and Windows (hotplugged).
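Once those three results are in, the interesting number is how far each configuration falls short of the hotplugged Windows run. A minimal Python sketch of that comparison; the helper name is hypothetical, and the sample inputs are simply the lower bounds of the 16xx/22xx ranges in the table, not measurements from any one machine:

```python
def shortfall_vs_baseline(results: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each configuration's H2D shortfall relative to `baseline`, in percent."""
    ref = results[baseline]
    return {name: 100.0 * (1.0 - value / ref) for name, value in results.items()}


if __name__ == "__main__":
    # Illustrative inputs only: range lower bounds from the table above.
    h2d_mib_per_s = {
        "macOS": 1600.0,
        "Windows via apple_set_os.efi": 1600.0,
        "Windows (hotplugged)": 2200.0,
    }
    for name, pct in shortfall_vs_baseline(h2d_mib_per_s, "Windows (hotplugged)").items():
        print(f"{name}: {pct:.1f}% below hotplugged Windows")
```

With those bounds, 1600 against 2200MiB/s is a shortfall of about 27.3%, and the baseline itself reports 0.0%.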
I’m guesstimating that Apple, too, may be reserving bandwidth for phantom USB devices, just as we saw with the Mantiz Venus's static bandwidth allocation. This should not affect TB2 users, but you never know: Thunderbolt bandwidth has been full of surprises.
Very interesting. Host to device showed a slight increase, as expected, but device to host doubled! Unfortunately, that doesn't necessarily mean FPS will double, because it doesn't matter how much data can be sent "back" if the amount of data to process is constrained.
I just remembered something. When I ran Overwatch without turning off the internal display, I would get only 45FPS instead of 60. If the iGPU were driving the internal display, that would not be the case, right? Can people test the Heaven benchmark with hot plugging vs. apple_set_os.efi, crossed with the internal display turned off vs. turned on? That's four configurations.
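The requested test plan is a 2×2 cross of boot method and internal display state. A small Python sketch that enumerates the runs so each tester covers the same four cases; the labels are illustrative strings, and the boot-method factor assumes the apple_set_os.efi boot discussed in this thread:

```python
from itertools import product

# The two factors from the request: boot method x internal display state.
BOOT_METHODS = ("hot plugging", "apple_set_os.efi")
INTERNAL_DISPLAY = ("internal display off", "internal display on")


def config_matrix() -> list[tuple[str, str]]:
    """Return every (boot method, internal display state) combination."""
    return list(product(BOOT_METHODS, INTERNAL_DISPLAY))


if __name__ == "__main__":
    for run, (boot, display) in enumerate(config_matrix(), start=1):
        print(f"Run {run}: {boot}, {display}")
```

Using `itertools.product` keeps the plan complete if another factor (say, which TB3 port) is added later: append a tuple and the run count multiplies accordingly.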
The performance degradation is quite noticeable. Battlefield 1 dips down to ~50 FPS occasionally at 1920x1200 Ultra.
No performance degradation on a TB2 MacBook Pro 15" Late 2013. Tested on an external monitor.
Enclosure: Akitio Node, firmware 23.1
GPU: Radeon RX 580
[Screenshots: booting via apple_set_os.efi; booting without apple_set_os.efi]
I actually see the same performance issue, where:
- Windows in Boot Camp: CUDA-Z shows HtD as 22xx MiB/s and DtH as 26xx MiB/s
Was this an issue that was introduced recently, or was it only just discovered? I'm out of the loop with TB3 (damn :o).