Linux Wayland: need to know current state, how-to, and whether it benefits eGPUs

karatekid430
(@karatekid430)
Estimable Member
Joined: 1 year  ago
Posts: 134
July 18, 2018 4:10 am  

Hello, fellow forum-goers!

I have been messing around with Thunderbolt 3 and eGPUs *a lot*, in particular on Linux. Windows more or less works: I have yet to come across a computer with resource errors, with one exception. I got resource errors on a Gigabyte Z170X-Designare when I had the built-in controller and a Thunderbolt AIC running simultaneously (four ports, insane). That was fixed by going into the BIOS and specifying the correct amounts of PCI bridge memory and prefetchable memory (it took a lot of experimenting, since allocating too much destabilised the system). I think the quantities that worked were, in order of appearance in the BIOS menu, 256, 384, 512 and 768MB. That allowed the R9 Nano eGPU to work on any of the four ports, with both the Intel GPU and the internal AMD Radeon R9 Nano enabled (I have two cards).

On the other hand, Linux does not suffer from these resource issues. According to my friend (he cannot remember his source), Windows still botches resource allocation in order to maintain compatibility with 36-bit PAE applications. Pathetic. Linux does, however, suffer from using X11, which is an absolute dinosaur. I do enjoy being able to pipe X windows over TCP/IP, but I have come to discover that X11 is totally messed up, not just with eGPUs but more generally with any setup that has monitors running off multiple GPUs, including laptop switchable graphics.

With no Xorg configuration files in /etc/X11/…, it normally manages to autoconfigure to a working state, which is useful for eGPUs where the configuration changes. (On the other hand, if you do specify a configuration and X cannot find the specified device on the specified bus, it just implodes instead of ignoring it and moving on.) However, it does seem to select a primary GPU. If you change the monitor configuration to display only on the monitor attached to the primary GPU, it runs perfectly smoothly, even with eGPUs. If you mirror, it becomes a tiny bit jumpy. If you extend, it is a little slower on the monitor attached to the secondary GPU. If you select the monitor on the secondary GPU alone, it becomes a literal slideshow (sub-1 FPS and completely unusable). My guess is that it is still rendering on the primary GPU, which goes into low-power mode when its monitor is switched off, making it unusable.
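A first step in untangling this is knowing which GPU physically drives which monitor. A minimal sketch, assuming the standard Linux DRM sysfs layout, where connectors appear under /sys/class/drm as entries like card0-eDP-1 or card1-DP-1, each with a status file reading "connected" or "disconnected". It only reports the wiring; it does not reveal which GPU X is actually rendering on. The function names are illustrative, not from any existing tool.

```python
import os
import re
from collections import defaultdict

# Connector entries look like "card1-DP-2"; bare "card1" or "renderD128"
# entries are the devices themselves and do not match this pattern.
CONNECTOR_RE = re.compile(r"^(card\d+)-(.+)$")


def connected_outputs(entries):
    """Group connected connectors by DRM card.

    `entries` maps a /sys/class/drm entry name to the contents of its
    `status` file (e.g. "connected\n"), so the logic can be tested
    without real hardware.
    """
    result = defaultdict(list)
    for name, status in sorted(entries.items()):
        match = CONNECTOR_RE.match(name)
        if match and status.strip() == "connected":
            card, connector = match.groups()
            result[card].append(connector)
    return dict(result)


def read_sysfs(root="/sys/class/drm"):
    """Read connector status files from sysfs (Linux only)."""
    entries = {}
    if not os.path.isdir(root):
        return entries
    for name in os.listdir(root):
        status_path = os.path.join(root, name, "status")
        try:
            with open(status_path) as f:
                entries[name] = f.read()
        except OSError:
            continue  # not a connector, or not readable
    return entries


if __name__ == "__main__":
    for card, outputs in connected_outputs(read_sysfs()).items():
        print(card, "drives", ", ".join(outputs))
```

With an internal GPU and an eGPU, each monitor should show up under a different cardN, which makes it easier to relate "monitor on the primary GPU" versus "monitor on the secondary GPU" to actual devices.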

This is clearly not how it should be done - ideally it should render the content for each monitor on the GPU driving that monitor. But will Wayland fix this? It has taken years and it is still in its infancy. Any guides on how to enable Wayland seem hazy, and they don't always seem to work out when you follow them. Can anybody comment on this, specifically in the context of Ubuntu and Arch Linux? Like what packages are available, supported browsers, desktop environments, etc?

Also, how do I set Linux up with no trace of X11 remaining? Is there a Wayland only setup with functional desktop environment and browser yet? If X11 is still installed, it has a knack of interfering, in my experience. Is the only desktop environment Weston? Or do some of the others claiming to use Wayland actually work? Whenever I install them and query which type it is using, it either says X11 or an indeterminate answer.

Anyway, this is one of three things needed to get eGPUs working correctly on Linux:
1) Fix the AMD Fiji ATOM BIOS error in dmesg, which makes the card fail on one Thunderbolt port and pass when moved to the other one (no idea what causes it or why it works on the other port).
2) Fix amdgpu.ko to handle surprise removal properly. The Windows driver handles it fine, so why does the Linux driver have to crash the system?
3) Wayland support needs to mature (assuming it fixes the problems with X11).

Or, feel free to share your experiences regarding Linux and eGPU / switchable graphics!

Cheers!


(@timur_kristof)
Active Member
Joined: 8 months  ago
Posts: 17
July 23, 2018 2:23 pm  
Posted by: karatekid430

Any guides on how to enable Wayland seem hazy, and they don't always seem to work out when you follow them. Can anybody comment on this, specifically in the context of Ubuntu and Arch Linux? Like what packages are available, supported browsers, desktop environments, etc?

Also, how do I set Linux up with no trace of X11 remaining? Is there a Wayland only setup with functional desktop environment and browser yet? If X11 is still installed, it has a knack of interfering, in my experience. Is the only desktop environment Weston? Or do some of the others claiming to use Wayland actually work? Whenever I install them and query which type it is using, it either says X11 or an indeterminate answer.

Fedora 28 ships a Wayland-based Gnome session by default, which works pretty well out of the box. I would recommend trying that one. I have not tried it with an eGPU yet, because I'm not sure it would work with an AMD card and I couldn't be bothered to buy any NVidia hardware. Still, I will try to answer your questions.

Also, how do I set Linux up with no trace of X11 remaining?

Even if you run a Wayland session, it is still a good thing to have XWayland around for clients that don't support Wayland natively. Whether X can be removed entirely depends on your desktop environment. Most DEs these days support either just X or both Wayland and X, and in the latter case they still need to link against the X libraries. Work is currently underway in Gnome to make it possible to remove the X dependency completely, but they are not there yet.

Is there a Wayland only setup with functional desktop environment and browser yet?

Yes, Gnome on Fedora 28 definitely works. AFAIK they even have a patched Firefox that works on Wayland.

If X11 is still installed, it has a knack of interfering, in my experience.

Even on a Wayland session, it's still nice to run XWayland for legacy clients.

Is the only desktop environment Weston? Or do some of the others claiming to use Wayland actually work?

Weston is just a reference compositor, not an actual DE. There are several DEs that support Wayland though. I've already mentioned Gnome. Later versions of KDE also support Wayland to some extent (though I'm not sure how much) and there is sway if you're looking to replace your tiling window manager. I'm sure there are others, but these are the ones that come to mind.

Whenever I install them and query which type it is using, it either says X11 or an indeterminate answer.

Not sure how you queried that. Since XWayland is enabled by default, most of your X-only apps will happily run and think they are running on an X server. If you want to understand what Wayland is and how it improves on X, I recommend this page: https://wayland.freedesktop.org/architecture.html
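On the question of how to query the session type: a minimal sketch, assuming the usual environment-variable conventions (XDG_SESSION_TYPE set by the login manager, WAYLAND_DISPLAY set for Wayland clients, DISPLAY set for X clients). Because X-only apps running under XWayland still see DISPLAY, checking DISPLAY alone (or asking from inside such an app) can report X11 even on a Wayland session, which may explain indeterminate answers. This describes the session, not individual windows, and the variables can be missing, e.g. over SSH. The helper names are hypothetical.

```python
import os


def session_type(env=None):
    """Best-effort guess at the display server behind the current session.

    Checks XDG_SESSION_TYPE first, then WAYLAND_DISPLAY, then DISPLAY.
    These are conventions, not guarantees, so "unknown" is a valid answer.
    """
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("wayland", "x11"):
        return declared
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def has_xwayland_fallback(env=None):
    """True when a Wayland session also exposes DISPLAY, meaning X-only
    apps can connect (typically through XWayland)."""
    env = os.environ if env is None else env
    return session_type(env) == "wayland" and bool(env.get("DISPLAY"))


if __name__ == "__main__":
    print("session:", session_type())
    print("X fallback available:", has_xwayland_fallback())
```

The order of checks is deliberate: DISPLAY is consulted last because it is present in both plain X sessions and Wayland sessions with XWayland.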

Hope this helps! 🙂


rstrube
(@rstrube)
Active Member
Joined: 3 weeks  ago
Posts: 11
October 25, 2018 5:22 pm  

@karatekid430 I realize that your post is about a year old, but I was pointed to it by @nu_ninja.

I've been struggling to get my RX 580 working as an eGPU on Linux.

See the forum post here: https://egpu.io/forums/thunderbolt-linux-setup/problems-with-amd-rx-580-akitio-node-tb3-ubuntu-18-10/

I've also opened an official bug report with amdgpu located here:
https://bugs.freedesktop.org/show_bug.cgi?id=108521

So far I've tried a variety of hacks that involved making small changes to the kernel source code, recompiling, and testing. I've tried completely disabling the internal Vega M GPU by commenting out its device IDs in the amdgpu driver code. I've also tried increasing the amount of time that amdgpu allows for initializing the ATOM BIOS (from 5 seconds to 15 seconds).

Recently we discovered that the Thunderbolt bridge on my system isn't being granted all the PCI resources it requires, and the amdgpu developers think this is the primary reason the eGPU initialization fails. Here is some output from dmesg where you can see the resource issues:

[    0.436946] pci 0000:04:00.0: BAR 13: no space for [io  size 0x4000]
[    0.436947] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x4000]
[    0.436949] pci 0000:04:00.0: BAR 13: assigned [io  0xc000-0xcfff]
[    0.436950] pci 0000:04:00.0: BAR 13: [io  0xc000-0xcfff] (failed to expand by 0x3000)
[    0.436951] pci 0000:04:00.0: failed to add 3000 res[13]=[io  0xc000-0xcfff]
[    0.436955] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[    0.436956] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[    0.436957] pci 0000:05:01.0: BAR 13: no space for [io  size 0x2000]
[    0.436958] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x2000]
[    0.436959] pci 0000:05:02.0: BAR 13: assigned [io  0xc000-0xcfff]
[    0.436960] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[    0.436961] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[    0.436963] pci 0000:05:01.0: BAR 13: assigned [io  0xc000-0xcfff]
[    0.436964] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[    0.436965] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[    0.436967] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[    0.436968] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[    0.436969] pci 0000:05:02.0: BAR 13: no space for [io  size 0x1000]
[    0.436970] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[    0.436971] pci 0000:05:01.0: BAR 13: [io  0xc000-0xcfff] (failed to expand by 0x1000)
[    0.436972] pci 0000:05:01.0: failed to add 1000 res[13]=[io  0xc000-0xcfff]
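The log can be read as "requested vs. granted" per bridge window. A minimal sketch, assuming the message formats exactly as they appear in this log (other kernel versions may word them differently): it takes the requested size from each "no space for" line and the size of each "assigned" range (end - start + 1), then lists the windows that came up short. For 0000:04:00.0 that is 0x4000 requested against 0xc000-0xcfff (0x1000) granted, a shortfall of 0x3000, which matches the kernel's own "failed to expand by 0x3000". On x86 kernels with SR-IOV support, BAR indices 13, 14 and 15 usually correspond to a bridge's I/O, memory and prefetchable memory windows. The function name and SAMPLE are illustrative.

```python
import re
from collections import defaultdict

# Patterns for the two message types used here, as worded in the log above.
NO_SPACE = re.compile(r"pci (\S+): BAR (\d+): no space for \[(\w+)\s+size (0x[0-9a-f]+)")
ASSIGNED = re.compile(r"pci (\S+): BAR (\d+): assigned \[(\w+)\s+(0x[0-9a-f]+)-(0x[0-9a-f]+)\]")


def summarize(dmesg_text):
    """Return {(device, bar): {"type", "wanted", "got"}} for short windows.

    "wanted" is the largest size the kernel had no space for; "got" is the
    size of the last range it did assign (0 if it assigned nothing).
    """
    summary = defaultdict(lambda: {"type": None, "wanted": 0, "got": 0})
    for line in dmesg_text.splitlines():
        m = NO_SPACE.search(line)
        if m:
            dev, bar, kind, size = m.groups()
            entry = summary[(dev, int(bar))]
            entry["type"] = kind
            entry["wanted"] = max(entry["wanted"], int(size, 16))
            continue
        m = ASSIGNED.search(line)
        if m:
            dev, bar, kind, start, end = m.groups()
            entry = summary[(dev, int(bar))]
            entry["type"] = kind
            entry["got"] = int(end, 16) - int(start, 16) + 1
    # Keep only windows that received less than they asked for.
    return {k: v for k, v in summary.items() if v["wanted"] > v["got"]}


SAMPLE = """\
[    0.436946] pci 0000:04:00.0: BAR 13: no space for [io  size 0x4000]
[    0.436949] pci 0000:04:00.0: BAR 13: assigned [io  0xc000-0xcfff]
[    0.436955] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
"""

if __name__ == "__main__":
    for (dev, bar), info in sorted(summarize(SAMPLE).items()):
        short = info["wanted"] - info["got"]
        print(f"{dev} BAR {bar} ({info['type']}): wanted {info['wanted']:#x}, "
              f"got {info['got']:#x}, short by {short:#x}")
```

Run on SAMPLE, the first line printed is "0000:04:00.0 BAR 13 (io): wanted 0x4000, got 0x1000, short by 0x3000".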

You mentioned in your post that:

But that was fixed by going into BIOS and specifying the correct amounts of PCI bridge memory and prefetchable memory correctly (took a lot of experimenting - allocating too much destabilised the system). I think in order of appearance in the BIOS menu, the quantities that worked were 256, 384, 512, 768MB. That allowed R9 Nano eGPU to work on any of the four ports, all with Intel GPU and internal AMD Radeon R9 Nano enabled (I have two cards).

Unfortunately my BIOS does not support this level of customization. Is there any other way to specify the PCI bridge and/or prefetchable memory (perhaps using kernel boot parameters)? I feel like your post has illuminated the root problem: a PCI resource issue that causes the eGPU to fail initialization.
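Not established anywhere in this thread, but one avenue worth testing: the Linux kernel documents `pci=` boot options for reserving extra space behind hotplug bridges (`hpiosize`, `hpmemsize`, and on newer kernels `hpmmiosize` / `hpmmioprefsize`) and `realloc` for letting the kernel redo bridge allocations the firmware made too small. Multiple options go in one comma-separated `pci=` string. A minimal sketch that only builds that string, assuming those option names (check the kernel-parameters documentation for your kernel version). The sizes below are placeholders; 16K is simply the 0x4000 I/O window the bridge asked for in the log, and whether any of this helps on this machine is untested.

```python
# Option names assumed from the kernel's documented pci= parameters;
# hpmmiosize / hpmmioprefsize only exist on newer kernels.
ALLOWED = {"hpiosize", "hpmemsize", "hpmmiosize", "hpmmioprefsize"}


def format_size(num_bytes):
    """Render a byte count with the K/M/G suffixes the kernel accepts."""
    for suffix, unit in (("G", 1 << 30), ("M", 1 << 20), ("K", 1 << 10)):
        if num_bytes >= unit and num_bytes % unit == 0:
            return f"{num_bytes // unit}{suffix}"
    return str(num_bytes)


def pci_cmdline(realloc=True, **sizes):
    """Build a single pci= option string, e.g. pci=realloc,hpiosize=16K."""
    options = ["realloc"] if realloc else []
    for name, value in sizes.items():
        if name not in ALLOWED:
            raise ValueError(f"unexpected pci= option: {name}")
        options.append(f"{name}={format_size(value)}")
    if not options:
        raise ValueError("no options given")
    return "pci=" + ",".join(options)


if __name__ == "__main__":
    # Placeholder values, not tuned for any particular system.
    print(pci_cmdline(hpiosize=0x4000, hpmemsize=256 << 20))
```

This prints `pci=realloc,hpiosize=16K,hpmemsize=256M`. On Ubuntu, such a string would typically be appended to GRUB_CMDLINE_LINUX in /etc/default/grub, followed by `sudo update-grub` and a reboot; checking dmesg afterwards shows whether the "no space for" messages change.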

P.S. I tried to send you a PM, but because I'm a new forum user, this is not yet allowed.

Thanks for any help and/or assistance you can offer!
Rob

