Problems with AMD RX 580 + Akitio Node (TB3) + Ubuntu 18.10
 


rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Hello All,

Here's my system information
System: Dell XPS 15 9575 2 in 1
Built in GPUs: Intel iGPU, Vega M
OS: Ubuntu 18.10
Kernel: 4.18
eGPU: RX 580

Unfortunately I'm struggling to get my RX 580 working correctly as an eGPU on my Ubuntu 18.10 based system.

First of all, I've been able to successfully authorize the Akitio Node as a Thunderbolt device. The new 4.18 kernel makes this trivial, as Ubuntu prompts you to authorize a new TB device as soon as you plug it in. In addition, the Akitio Node *and* the RX 580 are visible in lspci:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 05)
00:13.0 Non-VGA unclassified device: Intel Corporation 100 Series/C230 Series Chipset Family Integrated Sensor Hub (rev 31)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:15.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 (rev 31)
00:15.1 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation QM175 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation CM238 HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
08:01.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
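To pick the GPUs out of a long lspci listing like the one above, a small filter on the PCI class name is enough. The sketch below is a hypothetical helper, not something used in this thread. It assumes lspci's default "bus:device.function Class: description" line format and only knows the class names lspci commonly prints for GPUs. Run against the lines above, it lists the Intel iGPU (00:02.0), the Vega M (01:00.0) and the RX 580 eGPU (09:00.0).

```python
import re

# A subset of the lspci lines quoted above.
LSPCI = """\
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
"""

# Class names lspci commonly prints for GPUs (assumed list, not exhaustive).
DISPLAY_CLASSES = {"VGA compatible controller", "Display controller", "3D controller"}

# "bb:dd.f Class name: description" (lspci without -D, so no PCI domain)
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?P<cls>[^:]+): (?P<desc>.*)$"
)


def display_devices(lspci_text):
    """Return (address, description) pairs for display-class devices."""
    found = []
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("cls") in DISPLAY_CLASSES:
            found.append((m.group("addr"), m.group("desc")))
    return found


if __name__ == "__main__":
    for addr, desc in display_devices(LSPCI):
        print(addr, desc)
```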

Here's the detailed information for the RX 580:

09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7) (prog-if 00 [VGA controller])
Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/570X/580/580X]
Flags: fast devsel, IRQ 18
Memory at 2fb0000000 (64-bit, prefetchable) [size=256M]
Memory at 2fc0000000 (64-bit, prefetchable) [size=2M]
I/O ports at 2000 [size=256]
Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at bc040000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] #15
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
Capabilities: [370] L1 PM Substates
Kernel modules: amdgpu

If I look at dmesg, I see some distressing entries in the system logs:

[    8.534250] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)

[    8.534756] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[    8.537567] [drm] register mmio base: 0xBC000000
[    8.537568] [drm] register mmio size: 262144
[    8.537598] [drm] add ip block number 0 <vi_common>
[    8.537599] [drm] add ip block number 1 <gmc_v8_0>
[    8.537599] [drm] add ip block number 2 <tonga_ih>
[    8.537599] [drm] add ip block number 3 <powerplay>
[    8.537600] [drm] add ip block number 4 <dm>
[    8.537600] [drm] add ip block number 5 <gfx_v8_0>
[    8.537601] [drm] add ip block number 6 <sdma_v3_0>
[    8.537602] [drm] add ip block number 7 <uvd_v6_0>
[    8.537602] [drm] add ip block number 8 <vce_v3_0>
[    8.537608] kfd kfd: skipped device 1002:67df, PCI rejects atomics
[    8.537630] [drm] UVD is enabled in VM mode
[    8.537630] [drm] UVD ENC is enabled in VM mode
[    8.537636] [drm] VCE enabled in VM mode
[    8.614467] ATOM BIOS: 401815-171128-QS1
[    8.614512] [drm] GPU posting now...
[   13.621276] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   13.621310] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing E650 (len 187, WS 0, PS 4) @ 0xE6FA
[   13.621341] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C53A (len 193, WS 4, PS 4) @ 0xC569
[   13.621359] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC47C
[   13.621361] amdgpu 0000:09:00.0: gpu post error!
[   13.621363] amdgpu 0000:09:00.0: Fatal error during GPU init
[   13.621370] [drm] amdgpu: finishing device.
[   13.621792] amdgpu: probe of 0000:09:00.0 failed with error -22

I noticed a couple of other people have posted about similar problems:

https://forum.manjaro.org/t/rx-580-in-a-thunderbolt-egpu-dock/58210
https://egpu.io/forums/thunderbolt-linux-setup/egpus-under-linux-an-advanced-guide/#post-33304

I've opened an official bug report for amdgpu here:

https://bugs.freedesktop.org/show_bug.cgi?id=108521

If anybody has any suggestions they would be greatly appreciated!

On a final note, Ubuntu 18.10 ships with Kernel 4.18 but I've also tried 4.19 and I'm experiencing the same problems.

Thanks!
Rob

Pending: Add my system information and expected eGPU configuration to my signature to give context to my posts


nu_ninja
(@nu_ninja)
Estimable Member
Joined: 1 year ago
 

I'd start by configuring X with only the external display active, using something like this in /etc/X11/xorg.conf.d/:

Section "Device"
     Identifier "AMD"
     Driver "amdgpu"
     BusID "PCI:10:0:0" ##ID in decimal, convert from hex if necessary
     Option "AllowEmptyInitialConfiguration"
     Option "AllowExternalGpus"
EndSection

Then, if that works, try setting up a config with the internal screen bound to the iGPU and the external screen as primary on the eGPU (I posted a config file in this post). I've not had any problems with the amdgpu driver once it's properly configured in X with the bus ID and external display.
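Xorg's BusID expects decimal bus, device and function numbers, while lspci prints them in hex. A minimal sketch of the conversion, assuming the default lspci address format with an optional PCI domain that must be 0000 (xorg_busid is a hypothetical helper name). For the RX 580 at 09:00.0 in the lspci output earlier in this thread it gives PCI:9:0:0; hex and decimal only diverge once the bus number passes 09, e.g. bus 3a becomes 58.

```python
def xorg_busid(lspci_addr):
    """Convert an lspci address ('09:00.0' or '0000:09:00.0') to an Xorg
    BusID string ('PCI:bus:device:function', all decimal)."""
    parts = lspci_addr.split(":")
    if len(parts) == 3:
        # Leading PCI domain; only domain 0000 is handled in this sketch.
        if int(parts[0], 16) != 0:
            raise ValueError(f"non-zero PCI domain not handled: {lspci_addr!r}")
        parts = parts[1:]
    if len(parts) != 2 or "." not in parts[1]:
        raise ValueError(f"unexpected PCI address: {lspci_addr!r}")
    bus_hex, devfn = parts
    dev_hex, fn_hex = devfn.split(".", 1)
    return f"PCI:{int(bus_hex, 16)}:{int(dev_hex, 16)}:{int(fn_hex, 16)}"


if __name__ == "__main__":
    print(xorg_busid("09:00.0"))       # PCI:9:0:0  (the RX 580 above)
    print(xorg_busid("0000:3a:00.0"))  # PCI:58:0:0
```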

Mid-2012 13" Macbook Pro (MacBookPro9,2) TB1 -> RX 460/560 (AKiTiO Node/Thunder2)
+ macOS 10.14+Win10
+ Linux Mint 19.1


rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Hi Ninja, thanks for the response!

I actually was following your guides pretty closely and did try a variety of different xorg config files, all with no luck unfortunately.

I think the issue is at the kernel level, as my dmesg output shows a failure while initializing the eGPU. My guess (more of a hunch, really) is that the problem comes from my laptop already having two GPUs: the Intel iGPU (i915) and the Vega M GPU (amdgpu). The eGPU would be a third GPU that also uses the amdgpu kernel driver. Perhaps there is a conflict there? Right now I'm trying to see if there's some way to completely disable the Vega M discrete GPU (via kernel boot parameters) to test this theory.

I'm wondering if you wouldn't mind sharing the dmesg output from your system after you've plugged in the eGPU via TB3.  I'd love to see if you see similar errors, or if the eGPU is initialized successfully.

Thanks!
Rob

nu_ninja
(@nu_ninja)
Estimable Member
Joined: 1 year ago
 

OK, I've attached the relevant parts of my dmesg output. It looks like I'm not getting the ATOM BIOS entries, maybe because I'm using an older card? That part of the code is obviously the problem, but I'm not sure whether you could change it.

This might sound crazy, but just for testing you could try creating a Device section for the dGPU and deliberately giving it the wrong driver, such as the nouveau driver for NVIDIA cards, to make sure it doesn't use the amdgpu driver.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Thanks for the response!

I've been fooling around with a ton of different Xorg config files, and I really think it's less about configuration, and more about the lower level amdgpu kernel driver.

My hunch right now is that the Vega M (and specifically its power management) is somehow interfering with the initialization of the new eGPU. I've been trying to find a way to completely disable the Vega M at boot to see if the eGPU will work, but I've been struggling to accomplish this.

I found some interesting posts on using the pci-stub kernel module to "reserve" a device so that the amdgpu module can't initialize it, but unfortunately this doesn't appear to work correctly.

See here:
https://superuser.com/questions/503697/prevent-radeon-driver-from-attaching-to-specific-pci-devices
https://superuser.com/questions/914810/how-to-disable-a-plugged-in-pci-e-graphic-card-on-os-level

I also tried to disable the Vega M in my BIOS but unfortunately that's not an option.

I'm gonna see what the amdgpu devs come up with.

Thanks for your response again!

nu_ninja
(@nu_ninja)
Estimable Member
Joined: 1 year ago
 

Yeah, looking into it, I agree it's a deeper problem than X. Probably good to see what the devs say. One thing you might or might not have already tried: setting amdgpu.dc=0 as a kernel parameter, since DC (Display Core) seems to be on by default starting with Vega cards, per this article.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Thanks for the suggestion. I tried adding amdgpu.dc=0 to my kernel boot parameters, but unfortunately amdgpu still appeared to bind to the Vega M, and the eGPU still failed to initialize. I think at this point I'm gonna wait to hear back from the kernel developers and see what they say.

I'm still trying to figure out a way to completely disable the Vega M GPU, but nothing I've tried seems to work.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

For those of you that are interested, an amdgpu developer advised me to comment out the device IDs for Vega M in the kernel source (using 4.19) located here: /drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

You can see the full source for this file here:
https://elixir.bootlin.com/linux/v4....u/amdgpu_drv.c

These are the lines in question:
/* VEGAM */
{0x1002, 0x694C, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGAM},
{0x1002, 0x694E, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGAM},
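Why does commenting out those two lines stop amdgpu from touching the Vega M? The kernel's PCI core only asks a driver to probe devices that match an entry in its pci_device_id table, with PCI_ANY_ID acting as a wildcard. The Python below is a simplified model of that matching, not kernel code: it ignores the class/class_mask fields (both 0 in these entries, which matches any class) and dynamic IDs added through new_id. The POLARIS10 entry for 0x1002:0x67DF corresponds to the RX 580 IDs in the dmesg output above; the subsystem IDs used for the Vega M check are placeholders.

```python
PCI_ANY_ID = 0xFFFFFFFF  # the kernel's (~0) wildcard for the 32-bit ID fields

# (vendor, device, subvendor, subdevice, chip): a tiny excerpt of the table.
# class/class_mask are omitted; they are 0, 0 in these entries (match any class).
ID_TABLE = [
    (0x1002, 0x67DF, PCI_ANY_ID, PCI_ANY_ID, "CHIP_POLARIS10"),
    # VEGAM entries, commented out as in the patched amdgpu_drv.c:
    # (0x1002, 0x694C, PCI_ANY_ID, PCI_ANY_ID, "CHIP_VEGAM"),
    # (0x1002, 0x694E, PCI_ANY_ID, PCI_ANY_ID, "CHIP_VEGAM"),
]


def field_matches(table_value, device_value):
    return table_value == PCI_ANY_ID or table_value == device_value


def match_chip(vendor, device, subvendor, subdevice, table=ID_TABLE):
    """Return the chip type of the first matching entry, or None.

    None models "no table entry", in which case the driver is never asked
    to probe the device at all.
    """
    dev_ids = (vendor, device, subvendor, subdevice)
    for *ids, chip in table:
        if all(field_matches(t, d) for t, d in zip(ids, dev_ids)):
            return chip
    return None


if __name__ == "__main__":
    # RX 580 IDs from the dmesg line: 0x1002:0x67DF, subsystem 0x1682:0xC580.
    print(match_chip(0x1002, 0x67DF, 0x1682, 0xC580))  # CHIP_POLARIS10
    # Vega M with placeholder subsystem IDs: no entry left, so no probe.
    print(match_chip(0x1002, 0x694E, 0x0000, 0x0000))  # None
```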

This did indeed cause my Vega M to not be initialized, *but* the problem I'm having with the eGPU remains. So it appears my hunch that the Vega M is interfering with the eGPU initialization was incorrect, and I'm back to square one...

nu_ninja
(@nu_ninja)
Estimable Member
Joined: 1 year ago
 

@rstrube

See this post, particularly point 1) at the bottom. This may have been @karatekid430's workaround.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Thanks for the heads-up about that post! I actually just responded to that thread, and I agree that the information seems very relevant to the problems I'm experiencing!

Rob

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

So after many, many hours of debugging and trying different things, I finally figured it out. There is a bug, triggered when ACPI is enabled, that prevents the Thunderbolt PCI bridges from receiving their proper resources. If I disable ACPI via the kernel boot parameter:

acpi=off

Then the eGPU gets correctly initialized! One side effect is that the Vega M GPU is completely disabled with acpi=off.

It looks like a Linux ACPI/Thunderbolt bug, or at least a bug in the XPS 9575 BIOS when ACPI is enabled.
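Since several boot parameters come and go in this thread (amdgpu.dc=0, acpi=off, and the amdgpu power-management options), it helps to confirm what the running kernel actually booted with. A rough sketch that reads /proc/cmdline and keeps only the acpi and amdgpu.* options; it splits on whitespace and ignores quoting, an assumption that holds for the parameters discussed here. The helper names are hypothetical.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: splits on whitespace and ignores quoting.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first '=' only
        params[name] = value if sep else None
    return params


def relevant_params(params):
    """Keep only the acpi and amdgpu.* options discussed in this thread."""
    return {k: v for k, v in params.items() if k == "acpi" or k.startswith("amdgpu.")}


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(relevant_params(parse_cmdline(f.read())))
    except OSError:
        print("/proc/cmdline not available")
```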

nu_ninja
(@nu_ninja)
Estimable Member
Joined: 1 year ago
 

Awesome! That sounds like a pretty big bug to squash.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

@nu_ninja
So I've confirmed that I'm using my eGPU, but every game I run stutters like crazy every couple of seconds. Any suggestions for what I should look at to improve performance? I used your Xorg.config file to eliminate everything but the AMD GPU for Xorg.

Thanks for any suggestions!
Rob

(@timur_kristof)
Active Member
Joined: 1 year ago
 
Posted by: rstrube

@nu_ninja
So I've confirmed that I'm using my eGPU, but every game I run stutters like crazy every couple of seconds. Any suggestions for what I should look at to improve performance? I used your Xorg.config file to eliminate everything but the AMD GPU for Xorg.

Thanks for any suggestions!
Rob

I've got an XPS 13 9370 here, and I'm using an RX 570 with a Zotac AMP Box Mini. The hardware seems to be pretty similar to yours, the main difference of course being that I don't have the Vega GPU. However, since you already confirmed that isn't causing the problem, hope you don't mind my 20 cents.

This latest finding sounds to me more like a Thunderbolt-related bug that has nothing to do with AMD. I know this is trivial, but can you check whether you have the latest Thunderbolt firmware (also known as NVM) on both your laptop and your eGPU enclosure? I'm not sure if your device is supported by LVFS, so you may have to install Windows to update that firmware. While you're at it, also check whether your laptop has the latest BIOS. If all firmware is up to date and the issue is still there, I would suggest writing an email to the linux-usb mailing list (that seems to be where the Thunderbolt devs are); there are a bunch of helpful Intel folks there who may be able to help you out. This may very well be a bug in the Thunderbolt driver.

With regards to the stuttering and low performance: you didn't say it here, but you did mention in your freedesktop bug report that you are booting with amdgpu.dpm=0 amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0, which disables all power management features entirely. With amdgpu.dpm=0 the driver basically doesn't do any power management, effectively leaving your graphics card at the same clocks it had when it booted up. (That is 300 MHz on my RX 570, and should be in the same ballpark for your 580.) Did you try without that? You shouldn't need any of the other parameters either, by the way.
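One way to check the behaviour described above is amdgpu's pp_dpm_sclk file in sysfs, which lists the available shader clock levels and marks the active one with an asterisk. The parser below assumes lines of the form "0: 300Mhz *" and a card0 path; with an iGPU, a Vega M and an eGPU in this laptop, the card index is a guess, so adjust it. The sample input is illustrative, not captured from Rob's RX 580. If the active level never leaves the lowest entry while a game is running, that would be consistent with DPM being disabled.

```python
import re

# Matches lines such as "0: 300Mhz *" (the '*' marks the active level).
LEVEL_RE = re.compile(r"^(\d+):\s*(\d+)\s*mhz\s*(\*)?$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return (levels_mhz, active_index); active_index is None if unmarked."""
    levels, active = [], None
    for line in text.splitlines():
        m = LEVEL_RE.match(line.strip())
        if not m:
            continue
        levels.append(int(m.group(2)))
        if m.group(3):
            active = int(m.group(1))
    return levels, active


def read_sclk(card="card0"):
    # card0 is an assumption: with several GPUs the index varies.
    path = f"/sys/class/drm/{card}/device/pp_dpm_sclk"
    with open(path) as f:
        return parse_dpm_levels(f.read())


if __name__ == "__main__":
    # Illustrative input only; real level tables differ per card.
    sample = "0: 300Mhz *\n1: 600Mhz \n2: 900Mhz \n"
    print(parse_dpm_levels(sample))  # ([300, 600, 900], 0)
```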

Also, acpi=off is surely not a proper long term solution, because it has too many side effects.

While we're at it, does the eGPU setup work correctly on Windows?

(@timur_kristof)
Active Member
Joined: 1 year ago
 
Posted by: nu_ninja

I'd start by configuring x with just the external display active, something like just this in /etc/X11/xorg.conf.d/

     Option "AllowEmptyInitialConfiguration"

As far as I understand, "AllowExternalGpus" is specific to the NVIDIA proprietary driver and will not have any effect on an AMD card.

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Hi Guys,

So unfortunately this *does* appear to be a BIOS bug in the Dell XPS 9575 that prevents the Thunderbolt 3 PCI bridge from receiving the proper PCI resources. Adding acpi=off to the kernel boot parameters just creates a scenario where the Vega M's resources are somehow made available to the eGPU, allowing it to be initialized. Even so, one of the Thunderbolt 3 PCI bridges (specifically device 0000:05:02.0) still doesn't have the necessary PCI resources, which is probably causing the extreme performance problems I'm having.

I've opened up an official bug with the ACPI BIOS kernel developers here: https://bugzilla.kernel.org/show_bug.cgi?id=201527 but this really needs to be solved at the BIOS level.  Perhaps they have a direct line of communication to the Dell engineers?

Thanks for all the suggestions! For now it appears that eGPUs on the Dell XPS 9575 are a no-go on Linux, at least until the BIOS issues are fixed!

(@timur_kristof)
Active Member
Joined: 1 year ago
 
Posted by: rstrube

Thanks for all the suggestions! For now it appears that eGPUs on the Dell XPS 9575 are a no-go on Linux, at least until the BIOS issues are fixed!

Does that mean that you tested and the eGPU doesn't work on Windows either?

Even so one of the Thunderbolt 3 PCI bridges still doesn't have the necessary PCI resources (specifically device 0000:05:02.0), which is probably causing the extreme performance problems that I'm having.

I'm pretty sure that at least some of your performance problems came from those amdgpu kernel parameters where you disabled the power management entirely.

MacFreekDotKext
(@jatechnology)
Eminent Member
Joined: 2 years ago
 

Has anyone gotten a Thunderbolt 3 Radeon eGPU working on Linux yet? Very interested.

I have yet to list my system & eGPU hardware or link a build guide in my signature. I will do so soon to give context to my posts.


rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

Hi @timur_kristof,

Apologies for the late reply, I've been posting about this issue on several other forums, on reddit, on the kernel mailing lists, etc. but I neglected to check back on this thread...

To answer your questions, I did update my TB firmware to NVM 36. The entry for that device in the output of:

fwupdmgr get-devices

is:

XPS 9575 Thunderbolt Controller
  DeviceId:             069ac71f347e92d158f2c211cca10d52a19e2d41
  Guid:                 8926f505-8219-5d6c-969a-e927534113fb
  Summary:              Unmatched performance for high-speed I/O
  Plugin:               thunderbolt
  Flags:                internal|updatable|supported|registered
  Vendor:               Dell
  VendorId:             TBT:0x00D4
  Version:              36.00
  Icon:                 computer
  Created:              2018-11-04

It's an excellent suggestion - and I wanted to rule out the TB firmware causing the problem.  Unfortunately, this *did not* solve the problem.

I've been keeping this thread on the Manjaro Linux forums up to date with additional information: https://forum.manjaro.org/t/rx-580-in-a-thunderbolt-egpu-dock/58210

Here's a reddit post related to this issue: https://www.reddit.com/r/Dell/comments/9u61lm/question_for_dell_i_believe_ive_discovered_dell/

I've also opened up an ACPI kernel bug (although to be honest it might be a Dell BIOS bug) here: https://bugzilla.kernel.org/show_bug.cgi?id=201527

The rationale for the kernel bug report is that sometimes kernel developers can work around buggy ACPI BIOS implementations - even though if this really is caused by a BIOS bug, it should probably be solved upstream by Dell.

The current theory among some of the kernel developers is that some of the TB-to-PCI bridges are not receiving the necessary BAR resources, which causes the card to fail initialization. Disabling ACPI is really a hack: it bypasses some of the ACPI information the BIOS provides, allowing the TB-to-PCI bridges to get more resources. I'm not sure they get everything they need, but it's enough for the card to be initialized. Here are some of the relevant details from my dmesg logs that demonstrate the problem. Note that I'm currently on kernel 4.19.4, but I saw the same problems with 4.18.x kernels.

PCI resource allocation issues:

Note: devices 0000:04:00.0, 0000:05:00.0, 0000:05:01.0, 0000:05:02.0, and 0000:05:04.0 are all Thunderbolt PCI bridges, but device 0000:05:02.0 seems to be the problematic one.

[  152.673753] pci_bus 0000:05: Allocating resources
[  152.673792] pci 0000:05:01.0: bridge window [io  0x1000-0x0fff] to [bus 07-39] add_size 1000
[  152.673802] pci 0000:05:02.0: bridge window [io  0x1000-0x0fff] to [bus 3a] add_size 1000
[  152.673803] pci 0000:05:02.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 3a] add_size 200000 add_align 100000
[  152.673813] pci 0000:05:04.0: bridge window [io  0x1000-0x0fff] to [bus 3b-6e] add_size 1000
[  152.673823] pci 0000:04:00.0: bridge window [io  0x1000-0x0fff] to [bus 05-6e] add_size 3000
[  152.673825] pci 0000:04:00.0: BAR 13: assigned [io  0x2000-0x4fff]
[  152.673829] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  152.673830] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  152.673831] pci 0000:05:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[  152.673832] pci 0000:05:02.0: BAR 13: assigned [io  0x3000-0x3fff]
[  152.673832] pci 0000:05:04.0: BAR 13: assigned [io  0x4000-0x4fff]
[  152.673834] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  152.673835] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  152.673837] pci 0000:05:00.0: PCI bridge to [bus 06]
[  152.673842] pci 0000:05:00.0:   bridge window [mem 0xea000000-0xea0fffff]
[  152.673852] pci 0000:05:01.0: PCI bridge to [bus 07-39]
[  152.673854] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[  152.673859] pci 0000:05:01.0:   bridge window [mem 0xbc000000-0xd3efffff]
[  152.673863] pci 0000:05:01.0:   bridge window [mem 0x2fb0000000-0x2fcfffffff 64bit pref]
[  152.673870] pci 0000:05:02.0: PCI bridge to [bus 3a]
[  152.673872] pci 0000:05:02.0:   bridge window [io  0x3000-0x3fff]
[  152.673877] pci 0000:05:02.0:   bridge window [mem 0xd3f00000-0xd3ffffff]
[  152.673887] pci 0000:05:04.0: PCI bridge to [bus 3b-6e]
[  152.673889] pci 0000:05:04.0:   bridge window [io  0x4000-0x4fff]
[  152.673894] pci 0000:05:04.0:   bridge window [mem 0xd4000000-0xe9ffffff]
[  152.673898] pci 0000:05:04.0:   bridge window [mem 0x2fd0000000-0x2ff9ffffff 64bit pref]
[  152.673904] pci 0000:04:00.0: PCI bridge to [bus 05-6e]
[  152.673906] pci 0000:04:00.0:   bridge window [io  0x2000-0x4fff]
[  152.673912] pci 0000:04:00.0:   bridge window [mem 0xbc000000-0xea0fffff]
[  152.673915] pci 0000:04:00.0:   bridge window [mem 0x2fb0000000-0x2ff9ffffff 64bit pref]
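With this much log output it is easy to lose track of which bridge is actually short of resources. The sketch below collects BAR assignment failures per device and computes memory window sizes; its regexes are written against the 4.19 lines quoted in this thread, so treat them as assumptions for other kernels. In these logs BAR 13 is a bridge's I/O window and BAR 15 its 64-bit prefetchable memory window. A range printed as 0x00100000-0x000fffff has its end below its start, which appears to be how the kernel prints an empty (zero-size) window.

```python
import re
from collections import defaultdict

# Written against the 4.19 log lines quoted above; other kernels may differ.
DEV = r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
FAIL_RE = re.compile(r"(?:pci|pcieport) " + DEV +
                     r": BAR (?P<bar>\d+): failed to assign \[mem size (?P<size>0x[0-9a-f]+)")
WINDOW_RE = re.compile(r"(?:pci|pcieport) " + DEV +
                       r":\s+bridge window \[mem (?P<start>0x[0-9a-f]+)-(?P<end>0x[0-9a-f]+)")


def window_size(start_hex, end_hex):
    """Size in bytes of [start, end]; an end below start means an empty window."""
    start, end = int(start_hex, 16), int(end_hex, 16)
    return max(0, end - start + 1)


def failed_bars(log):
    """Map device -> set of (BAR number, requested size in bytes) that failed."""
    fails = defaultdict(set)
    for m in FAIL_RE.finditer(log):
        fails[m.group("dev")].add((int(m.group("bar")), int(m.group("size"), 16)))
    return dict(fails)


def mem_windows(log):
    """Yield (device, size in MiB) for every memory bridge window line."""
    for m in WINDOW_RE.finditer(log):
        yield m.group("dev"), window_size(m.group("start"), m.group("end")) / 2**20


if __name__ == "__main__":
    log = """\
[  152.673830] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  152.673859] pci 0000:05:01.0:   bridge window [mem 0xbc000000-0xd3efffff]
[  193.985462] pcieport 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
"""
    print(failed_bars(log))        # {'0000:05:02.0': {(15, 2097152)}}
    print(list(mem_windows(log)))  # [('0000:05:01.0', 383.0)]
```

Note that the 2 MiB being requested for 0000:05:02.0 matches the "add_size 200000" in the bridge sizing lines above.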

It also appears that pcieport has PCI resource allocation issues:

[  193.946376] thunderbolt 0000:06:00.0: stopping RX ring 0
[  193.946388] thunderbolt 0000:06:00.0: disabling interrupt at register 0x38200 bit 12 (0xffffffff -> 0xffffefff)
[  193.946404] thunderbolt 0000:06:00.0: stopping TX ring 0
[  193.946413] thunderbolt 0000:06:00.0: disabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xfffffffe)
[  193.946421] thunderbolt 0000:06:00.0: control channel stopped
[  193.946516] thunderbolt 0000:06:00.0: freeing RX ring 0
[  193.946527] thunderbolt 0000:06:00.0: freeing TX ring 0
[  193.946542] thunderbolt 0000:06:00.0: shutdown
[  193.985339] pci_bus 0000:05: Allocating resources
[  193.985415] pcieport 0000:05:02.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 3a] add_size 200000 add_align 100000
[  193.985458] pcieport 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  193.985462] pcieport 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  193.985470] pcieport 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  193.985473] pcieport 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  198.333956] pcieport 0000:05:00.0: Refused to change power state, currently in D3

I've reached out to Dell support, and they assigned somebody to help out, but their first question is whether the problem exists on Windows 10. In the meantime I've returned my RX 580 (I've kept the Akitio Node in the hope that these issues will get resolved one day). There's one other person (@adnans) on the Manjaro Linux forums who also has an XPS 9575 and an RX 580, so I'm hoping he can do some Windows testing and report back.

It's possible that Dell worked around some of the BIOS bugs in their Windows Thunderbolt drivers. It's also possible that this really is a Linux kernel Thunderbolt bug.

I'll try to do a better job keeping this thread up to date with additional information.  Thanks again for your reply, I really appreciate your suggestions!

Rob

rstrube
(@rstrube)
Active Member
Joined: 9 months ago
 

UPDATE:

An ACPI kernel developer got back to me and mentioned that the PCI resource allocation issues present in my dmesg are not actually a problem. This contradicts what the AMD amdgpu developers thought was the root cause of the problem with using the RX 580 as an eGPU.

For those of you that are interested, here's the kernel bug report: https://bugzilla.kernel.org/show_bug.cgi?id=201527

Thanks!
Rob
