So, why are we calling it "10Gbps-TB1"?  

 
Yukikaze
(@yukikaze)
Honorable Member Moderator

This is probably bugging me more than it should, but: In the implementations table, we use the following options (for Thunderbolt connections) in the Interface column:

10Gbps-TB1

16Gbps-TB2

32Gbps-TB3

Shouldn't 10Gbps be 8Gbps for TB1? Or shouldn't it be 20Gbps for TB2 and 40Gbps for TB3? Either it should list the interconnect speed (which is also less confusing to less technical users, since it matches what these interfaces advertise), or it should list the effective connection speeds, but it should not vary the two between the different TB generations.

Edited: 6 months  ago

My eGPU Zoo - Link to my Implementations.

"Always listen to experts. They'll tell you what can't be done, and why. Then do it."- Robert A. Heinlein, "Time Enough for Love."

Posted : November 30, 2017 7:10 pm nando4 liked
nando4
(@nando4)
Noble Member Admin

The 20Gbps-TB2 and 40Gbps-TB3 figures are Intel marketing talk. That much PCIe bandwidth is unattainable, since the PCIe bandwidth being encapsulated is less.

So 10Gbps-TB1, 16Gbps-TB2 and 32Gbps-TB3 are the lesser of the PCIe link and TB channel link bandwidth. This works to:

1. Buck the Intel 20Gbps, 40Gbps terminology, drawing user attention as to why. Intel does not clearly denote that 40Gbps-TB3 is actually 22Gbps-TB3 of PCIe bandwidth. They do say that a TB3 controller takes "up to 4-lanes of PCIe 3.0 as input".

2. Imply PCIe lanes as a reference point for TB2/TB3 bandwidth. Important because:

- is often a discussion point. eg: 4-lane TB3, 2-lane TB3, GT4/GT2 OPI.
- gives a comparison point back to other direct PCIe interfaces (eg: M.2, EC) where any measured underperformance points back to the TB controller functionality. 

3. More closely approximates the H2D bandwidth ratio between these three TB interfaces. TB3 gives 2.8x TB1 H2D bandwidth, not 4x.
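As a rough illustration of the "lesser of the PCIe link and TB channel link" rule, here is a minimal Python sketch. The per-lane line rates and encodings are the standard PCIe 2.0 and 3.0 values; the function names are hypothetical, and the x4 link width for TB2 is an assumption (the thread only states x4 2.0 for TB1 and up to x4 3.0 for TB3).

```python
# Per-lane PCIe line rate (GT/s) and encoding efficiency (payload bits / line bits).
PCIE_GEN = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def pcie_link_gbps(gen, lanes):
    """Post-encoding bandwidth of a PCIe link in Gbps."""
    rate, efficiency = PCIE_GEN[gen]
    return rate * efficiency * lanes


def interface_label(name, channel_gbps, gen, lanes):
    """Label an interface by the lesser of TB channel and PCIe link bandwidth."""
    usable = min(channel_gbps, pcie_link_gbps(gen, lanes))
    return f"{round(usable)}Gbps-{name}"


# (name, TB channel Gbps, PCIe generation, lanes); TB2 at x4 2.0 is assumed.
LINKS = [("TB1", 10, 2, 4), ("TB2", 20, 2, 4), ("TB3", 40, 3, 4)]

for name, channel, gen, lanes in LINKS:
    print(interface_label(name, channel, gen, lanes))
```

This reproduces the three labels used in the table. It does not model the 22Gbps ceiling on PCIe traffic over TB3 mentioned above, which is a separate controller limit.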

Please see the following link for measured interface performance:

https://egpu.io/external-gpu-implementations-table/#perf

Edited: 6 months  ago

eGPU Port Bandwidth Reference Table
eGPU Setup 1.35

Posted : November 30, 2017 11:34 pm
Yukikaze
(@yukikaze)
Honorable Member Moderator

I know all this, and you are right!

Then 10Gbps-TB1 should be 8Gbps-TB1, which is the point I am making here. There is no 10Gbps of PCIe BW across TB1!

Edited: 6 months  ago

Posted : November 30, 2017 11:53 pm
nando4
(@nando4)
Noble Member Admin

Yukikaze, TB1 connects at x4 2.0. It is limited by the TB1 controller to 10Gbps, which is why it is labeled 10Gbps-TB1.

It has more than x2 2.0 (8Gbps) of bandwidth across it, as can be seen from the comparison of 10Gbps-TB1 and 8Gbps-M2 at:

https://egpu.io/external-gpu-implementations-table/#perf

I personally confirmed that 10Gbps-TB1 is more bandwidth than x2 2.0 when doing comparisons of my old BPlus TH05 (neutered x2 2.0 TB1 controller link) against TB1 enclosures connecting at x4 2.0. I won't point you to where to find that info as you know already 🙂

Edited: 6 months  ago

Posted : December 1, 2017 12:00 am 3RYL and 4chip4 liked
goalque
(@goalque)
Prominent Member Admin

Thunderbolt carries video (DP) data as well. A single 4K display does not affect the 22Gbps PCIe data bandwidth, but two 4K displays through TB3 leave only 8Gbps download for PCIe data.

Source:  https://sgcdn.startech.com/005329/media/sets/TB31PCIEX16_Manual/TB31PCIEX16_PCIe_Thunderbolt_3_manual.pdf

Edited: 6 months  ago

automate-eGPU.sh
apple_set_os.efi
--
late-2016 13" Macbook Pro nTB + Vega64@32Gbps-TB3 (Netstor HL23T) + macOS & Win10
late-2016 13" Macbook Pro nTB + GTX980/RX580@32Gbps-TB3 (Netstor HL23T) + macOS10.13 & Win10

Posted : December 1, 2017 12:16 am chx, 4chip4 and nando4 liked
goalque
(@goalque)
Prominent Member Admin

More detailed explanation:

http://en.community.dell.com/support-forums/laptop/f/3518/t/20017807
https://thunderbolttechnology.net/tech/faq (What are the video formats supported by Thunderbolt 3?)

"A Thunderbolt 3 port can work as a regular USB-C port, but when running in Thunderbolt mode, it can carry up to 8 lanes of DisplayPort 1.2 (i.e. two full-bandwidth outputs), and up to 4 lanes of PCIe Gen 3. Thunderbolt docks that offer USB connectivity do so by incorporating a PCIe-based USB 3.1 controller. Note that the capabilities of a given system’s Thunderbolt 3 interface will depend on how many GPU outputs and how many PCIe lanes are wired to its Thunderbolt 3 controller, although the spec mandates at least 1 full DisplayPort 1.2 output (4 lanes) and 2 PCIe Gen 3 lanes. The way this can all be carried simultaneously over just 4 USB-C lanes is that the Thunderbolt controller multiplexes the DisplayPort and PCIe signals into just a “Thunderbolt signal” before sending it out of the USB-C connector, then the device on the other end de-multiplexes it as needed. Those of you doing some quick math here may have realized that Thunderbolt 3’s maximum 40 Gbps rate is not high enough to handle both traffic types running their respective max bandwidth simultaneously. Dual 4K @ 60 Hz displays would consume ~32 Gbps all on its own, for example, and PCIe Gen 3 x4 is another 32 Gbps. Note however that Thunderbolt can carry 40 Gbps in each direction simultaneously – so for example you could theoretically use dual 4K displays (consuming 32 Gbps of only transmit bandwidth) while also receiving data from a PCIe-based capture device at its full 32 Gbps. When there simply isn’t enough bandwidth to meet demand, Thunderbolt 3 gives priority to display traffic, then PCIe."

"PCIe-based capture device at its full 32 Gbps" is not correct. It is up to 22 Gbps.

However, the question is... what happens when there is an eGPU (PCIe 3.0 x4) between the TB3 controller and 5K or dual 4K displays? Can we benefit from the 8 lanes of DP 1.2 protocol in that situation as well?

Posted : December 1, 2017 6:03 pm ed_co and 4chip4 liked
joevt
(@joevt3)
Active Member
Posted by: goalque

However, the question is... what happens when there is an eGPU (PCIe 3.0 x4) between the TB3 controller and 5K or dual 4K displays? Can we benefit from the 8 lanes of DP 1.2 protocol in that situation as well?

I'm not sure what you mean. If your eGPU is connected to the first port of a Thunderbolt 3 controller, then the controller can still send 8 lanes of DP 1.2 over the second Thunderbolt port. If you mean that there should be more bandwidth for the first port because it is connected to an eGPU which doesn't use any of the 8 lanes of DP 1.2, then I don't think it will go over 22 Gbps anyway. It's the same with any PCIe device connected by Thunderbolt. Maybe Intel could fix the firmware to allow greater than 22 Gbps, if that is not the real limit of PCIe data over Thunderbolt because of limitations in the controller hardware.

 

Posted : December 4, 2017 5:30 am
goalque
(@goalque)
Prominent Member Admin
Posted by: joevt

I'm not sure what you mean. If your eGPU is connected to the first port of a Thunderbolt 3 controller, then the controller can still send 8 lanes of DP 1.2 over the second Thunderbolt port. If you mean that there should be more bandwidth for the first port because it is connected to an eGPU which doesn't use any of the 8 lanes of DP 1.2, then I don't think it will go over 22 Gbps anyway. It's the same with any PCIe device connected by Thunderbolt. Maybe Intel could fix the firmware to allow greater than 22 Gbps, if that is not the real limit of PCIe data over Thunderbolt because of limitations in the controller hardware.

I meant the following topology:

1) TB3 MBP <-> TB3 cable <-> TB3 controller <-> direct DP interface (as with TB31PCIEX16) <-> DP cable <-> 4K display

I/O bandwidth remaining: 22Gbps (download) / 22Gbps (upload)

2) TB3 MBP <-> TB3 cable <-> TB3 controller <-> PCIe 3.0 x4 slot <-> PCIe 3.0 backplane with x16 slot <-> eGPU <-> DP cable <-> 4K display

I/O bandwidth remaining?

Posted : December 4, 2017 1:09 pm ed_co liked
joevt
(@joevt3)
Active Member
Posted by: goalque

I meant the following topology:

1) TB3 MBP <-> TB3 cable <-> TB3 controller <-> direct DP interface (as with TB31PCIEX16) <-> DP cable <-> 4K display

I/O bandwidth remaining: 22Gbps (download) / 22Gbps (upload)

2) TB3 MBP <-> TB3 cable <-> TB3 controller <-> PCIe 3.0 x4 slot <-> PCIe 3.0 backplane with x16 slot <-> eGPU <-> DP cable <-> 4K display

I/O bandwidth remaining?

1) 16 Gbps is being used by DP (for 4K) but TB3 has 40 Gbps total and gives up to 22 Gbps to PCIe I/O

2) If the computer is not sending textures or whatever to/from the graphics card, then you have a large percent of the PCIe 22 Gbps still available (upload and download) and you have the rest available for DP from your internal GPU (as in situation #1) without impacting the eGPU. Download from Thunderbolt is impacted if you're sending video output from the eGPU to your internal graphics card because the graphics buffer copy is sent using the 22 Gbps PCIe bandwidth for each frame that is copied. This is assuming the information at  https://egpu.io/thunderbolt-3-news-for-egpus/#tb3perf is correct.

 

Posted : December 4, 2017 2:38 pm goalque liked
goalque
(@goalque)
Prominent Member Admin
Posted by: joevt

2) If the computer is not sending textures or whatever to/from the graphics card, then you have a large percent of the PCIe 22 Gbps still available (upload and download) and you have the rest available for DP from your internal GPU (as in situation #1) without impacting the eGPU.

So are you saying that when the eGPU is just displaying desktop graphics, and I am not transferring any data, the 22Gbps PCIe data part stays mostly untouched because of the DP data/PCIe data separation?

In the above situation (2), if we connect two 4K displays (DP) to the eGPU, would the remaining H2D PCIe data bandwidth be 8Gbps (40Gbps - 2 x 16Gbps) and the D2H PCIe data part (22Gbps) would not be affected?

For some reason, Startech document states 8Gbps as "download" whereas Intel graph indicates H2D direction (upload).

https://thunderbolttechnology.net/sites/default/files/Thunderbolt3_TechBrief_FINAL.pdf

Intel says that "Two links of (4 lane) DisplayPort 1.2 consume 2x (4 x 5.4 Gbps) or 43.2 Gbps", so one DP 1.2 has up to 21.6Gbps bandwidth but the actual video data rate @60Hz [4K 30bpp] is 16Gbps (as you mentioned).

https://www.amd.com/Documents/50279_AMD_FirePro_DisplayPort_1-2_WP.pdf

Edited: 6 months  ago

Posted : December 4, 2017 8:21 pm
joevt
(@joevt3)
Active Member
Posted by: goalque

So are you saying that when the eGPU is just displaying desktop graphics, and I am not transferring any data, the 22Gbps PCIe data part stays mostly untouched because of the DP data/PCIe data separation?

Yes. It's the same situation where you move a graphics card from the CPU slot to a PCH slot. You can still get full performance from an NVMe device that is also in a PCH slot. A PCH slot is limited by the DMI link which is similar to PCIe 3.0 x4 or an NVMe slot.

Posted by: goalque

In the above situation (2), if we connect two 4K displays (DP) to the eGPU, would the remaining H2D PCIe data bandwidth be 8Gbps (40Gbps - 2 x 16Gbps) and the D2H PCIe data part (22Gbps) would not be affected?

No. Multiple display connections to an eGPU just means bigger screen real-estate and does not increase PCIe traffic much or at all so the bandwidth remains unchanged. The Thunderbolt controller has no idea how many displays are connected to the eGPU and therefore cannot adjust itself with that information. DisplayPort traffic is not sent over Thunderbolt unless a DisplayPort device is connected to a Thunderbolt port in the chain of devices. A DisplayPort device connected to an eGPU (graphics card) is not connected to the Thunderbolt. An eGPU box might have another Thunderbolt port (I think Intel frowns on that). A DisplayPort device connected to that port would affect PCIe bandwidth if the DisplayPort traffic exceeds 18 Gbps since Thunderbolt allows up to 22 Gbps for PCIe bandwidth.
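The 18 Gbps threshold follows from simple subtraction, which can be sketched in Python. This is a simplified model, not controller behaviour taken from any specification: it assumes a 40 Gbps link per direction, the 22 Gbps PCIe ceiling quoted in this thread, and that DisplayPort traffic is served first (as the Dell quote earlier in the thread states). The function and constant names are hypothetical.

```python
TB3_LINK_GBPS = 40      # Thunderbolt 3 link rate, per direction
TB3_PCIE_CAP_GBPS = 22  # PCIe ceiling over TB3 as quoted in this thread


def pcie_remaining_gbps(dp_streams_gbps):
    """PCIe bandwidth left in the host-to-device direction.

    dp_streams_gbps lists the data rates of DisplayPort streams that are
    actually tunneled over Thunderbolt (not displays plugged into the eGPU).
    """
    dp_total = sum(dp_streams_gbps)
    left_over = max(TB3_LINK_GBPS - dp_total, 0)
    return min(TB3_PCIE_CAP_GBPS, left_over)


print(pcie_remaining_gbps([]))        # no DisplayPort over Thunderbolt
print(pcie_remaining_gbps([16]))      # one 4K display tunneled over TB3
print(pcie_remaining_gbps([16, 16]))  # two 4K displays tunneled over TB3
```

Under these assumptions PCIe only starts losing bandwidth once the tunneled DisplayPort traffic exceeds 40 - 22 = 18 Gbps, and two 16 Gbps streams leave 8 Gbps.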

Posted by: goalque

For some reason, Startech document states 8Gbps as "download" whereas Intel graph indicates H2D direction (upload).

https://thunderbolttechnology.net/sites/default/files/Thunderbolt3_TechBrief_FINAL.pdf

Yes, it's confusing if you don't define your source and destination when using terms like download and upload and receive and transmit, etc. I think H2D and D2H are more descriptive, but you may need to define the host and device to be absolutely clear.

Posted by: goalque

Intel says that "Two links of (4 lane) DisplayPort 1.2 consume 2x (4 x 5.4 Gbps) or 43.2 Gbps", so one DP 1.2 has up to 21.6Gbps bandwidth but the actual video data rate @60Hz [4K 30bpp] is 16Gbps (as you mentioned).

https://www.amd.com/Documents/50279_AMD_FirePro_DisplayPort_1-2_WP.pdf

DisplayPort uses 10 bits per 8 bit symbol (similar to PCIe 1.0 and PCIe 2.0), so that 21.6Gbps is actually 17.28Gbps (20% drop). PCIe 3.0 uses 130 bits per sixteen 8 bit symbols (128 bits) (1.5% drop). DisplayPort sends "stuffing symbols" during the horizontal and vertical blanking areas as well as between pixels to fill the 17.28 Gbps bandwidth (or whatever the main link is using - 1, 2, or 4 lanes at RBR:1.62, HBR:2.7, HBR2:5.4, HBR3:8.1 Gbps). When transmitted as Thunderbolt, the Thunderbolt controller excludes all the stuffing symbols giving more room for PCIe 3.0 traffic. The receiving Thunderbolt controller will add stuffing symbols for DisplayPort output. The 16Gbps for 4K and 22 Gbps for 5K given in Thunderbolt3_TechBrief_FINAL.pdf show that the vertical and horizontal blanking pixels are excluded from the used video bandwidth.

For example:

4096 * 2160 pixels/frame * 30 bits/pixel * 60 frames/second = 15.9 Gbps for DisplayPort over Thunderbolt.

If you add the horizontal and vertical blanking pixels (which are transported over DisplayPort as mostly stuffing symbols plus maybe a stream attribute packet containing the image height, width, etc. of the main video stream) then:

4256 * 2222 * 30 * 60 = 17.02 Gbps for DisplayPort, the rest of the 17.28 Gbps is filled with stuffing symbols.

Or you could use the pixel clock:

567.31 MHz * 30 bits = 17.02 Gbps
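The arithmetic in this post can be checked with a short Python sketch. The helper names are hypothetical, and 4096 x 2160 is assumed as the active 4K area because it reproduces the 15.9 Gbps figure; the 4256 x 2222 total timing is taken from the post.

```python
def payload_gbps(line_rate_gbps, payload_bits, line_bits):
    """Usable data rate after line encoding (e.g. 8b/10b)."""
    return line_rate_gbps * payload_bits / line_bits


def video_gbps(width, height, bits_per_pixel, refresh_hz):
    """Data rate of a video stream in Gbps."""
    return width * height * bits_per_pixel * refresh_hz / 1e9


# DisplayPort 1.2: 4 lanes at HBR2 (5.4 Gbps each), 8b/10b encoded.
dp_raw = 4 * 5.4
dp_payload = payload_gbps(dp_raw, 8, 10)

# PCIe 3.0 uses 128b/130b, so the encoding overhead is small.
pcie3_overhead = 1 - 128 / 130

# 4K at 60 Hz, 30 bits per pixel: active pixels only vs. including blanking.
active = video_gbps(4096, 2160, 30, 60)
with_blanking = video_gbps(4256, 2222, 30, 60)

print(f"DP 1.2 raw:        {dp_raw:.2f} Gbps")
print(f"DP 1.2 payload:    {dp_payload:.2f} Gbps")
print(f"PCIe 3.0 overhead: {pcie3_overhead:.1%}")
print(f"4K active pixels:  {active:.2f} Gbps")
print(f"4K with blanking:  {with_blanking:.2f} Gbps")
print(f"Stuffing symbols:  {dp_payload - with_blanking:.2f} Gbps")
```

The gap between the active-pixel rate and the DisplayPort payload rate is what the Thunderbolt controller can reclaim for PCIe traffic by not transporting blanking and stuffing symbols.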

 

Edited: 6 months  ago
Posted : December 5, 2017 6:07 am chx and goalque liked
 chx
(@chx)
Trusted Member

 if you're sending video output from the eGPU to your internal graphics card because the graphics buffer copy is sent using the 22 Gbps PCIe bandwidth for each frame that is copied. This is assuming the information at  https://egpu.io/thunderbolt-3-news-for-egpus/#tb3perf is correct.

 

blink you are sending rendered frames over the PCIe bus? Who receives those? How? What?

Posted : December 11, 2017 12:37 am
joevt
(@joevt3)
Active Member
Posted by: chx

blink you are sending rendered frames over the PCIe bus? Who receives those? How? What?

Graphics rendered by an eGPU can only be shown on your laptop's built-in display if the OS copies the contents from the eGPU to the frame buffer of your laptop's GPU. This can only happen over the PCIe bus through Thunderbolt, using part of Thunderbolt's allotted device-to-host PCIe bandwidth. That bandwidth is not affected by DisplayPort traffic, because DisplayPort only flows host to device, but the copy does compete with reads from another device, such as a hard drive, on the same Thunderbolt bus.

https://egpu.io/how-to-egpu-accelerated-internal-display-macos/

Posted : December 11, 2017 4:04 pm
 chx
(@chx)
Trusted Member
Posted by: joevt
 
Graphics rendered by an eGPU can only be shown on your laptop's built in display if the OS copies the contents from the eGPU to the frame buffer of your laptop's GPU. This can only happen over the PCIe bus through Thunderbolt using part of Thunderbolt's alloted device to host PCIe bandwidth [...]  and affects reads from another device such as a hard drive on the same Thunderbolt bus.

https://egpu.io/how-to-egpu-accelerated-internal-display-macos/

Ah, so a memory copy between two PCIe devices; sure, that can work. I guess that's how bridgeless SLI works too. I got very confused and didn't consider the built-in GPU still having a role, although it's quite interesting that the framebuffer has such an ironclad standard that this can work; usually two GPUs can't agree on the name of the day. Also, this doesn't affect the eGPU bandwidth itself because it's bidirectional, does it?

Edited: 6 months  ago
Posted : December 11, 2017 9:26 pm
joevt
(@joevt3)
Active Member
Posted by: chx

I got very confused and didn't consider the built in GPU still having a role

The built in display is connected to the built in GPU. The pixels must therefore come from a frame buffer of the built in GPU. The eGPU is doing all the drawing because it's better at doing that (faster 3D performance). The eGPU is chosen by the software (such as a game) to do the drawing. The eGPU does the drawing in its own buffers. Therefore, a method of getting the pixels from the eGPU to the GPU must exist.

Think about what happens when a window overlaps two displays. There are two options:

  1. A draw command is done twice, once for each display.
  2. A draw command is done once, and the pixels are copied for the other display.

Option 1 happened a lot in classic Mac OS. Option 2 is what we're talking about. It's used by apps and games targeting a specific GPU.

Posted by: chx

it's quite interesting that the framebuffer has such an ironclad standard that this can work; usually two GPUs can't agree on the name of the day.

It's done at a higher level than that - in software (drivers) instead of hardware. The drivers are the standard. The drivers must provide certain APIs to do drawing and copying. How the hardware works doesn't matter. The drivers probably ask for a certain format of the pixels. It's a simple task for the eGPU's drivers to provide that format, possibly doing some conversion using the eGPU. How the transfer of data occurs doesn't matter (DMA, or CPU copies, or whatever) - though some methods are more efficient, all the pixels must come over the PCIe/Thunderbolt bus.

Posted by: chx

Also, this doesn't affect the eGPU bandwidth itself because it's bidirectional, does it?

The OS asks the eGPU (using the drivers) to send the pixels to a buffer that can be used by the built in GPU (maybe a buffer in RAM that is sent to the built in GPU by a call to the built-in GPU's drivers). That takes time, a tiny bit of bandwidth to the eGPU (the commands from the driver), and a lot of bandwidth from the eGPU (the pixels). Thunderbolt is bidirectional which means receive and transmit use different paths so they don't interfere.
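To get a feel for "a lot of bandwidth from the eGPU", here is a back-of-the-envelope Python sketch of the frame copy cost. Everything in it is an assumption for illustration: the 2560 x 1600 internal display, 4 bytes per pixel, one uncompressed copy per frame at 60 fps, and the 22 Gbps device-to-host ceiling quoted earlier in the thread. Real drivers may use other formats or skip unchanged frames.

```python
PCIE_D2H_CAP_GBPS = 22  # TB3 PCIe ceiling quoted in this thread


def frame_copy_gbps(width, height, bytes_per_pixel, frames_per_second):
    """Device-to-host bandwidth needed to copy every rendered frame."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * frames_per_second / 1e9


# Hypothetical internal display: 2560 x 1600, 4 bytes per pixel, 60 fps.
copy = frame_copy_gbps(2560, 1600, 4, 60)
share = copy / PCIE_D2H_CAP_GBPS

print(f"{copy:.2f} Gbps, {share:.0%} of the device-to-host PCIe budget")
```

Because the copy travels device to host, it competes with other reads coming back over Thunderbolt rather than with the commands and textures being sent to the eGPU.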

Posted : December 12, 2017 7:28 am