AMD Radeon RX 5700 XT + Razer Core X Chroma + Dell XPS 13 9370 = GPU Detected, but fails to initialize
I'm struggling to get my eGPU to initialize in Ubuntu 18.04.4. I would really appreciate any insight.
Here is where I'm at:
- Clean install of Ubuntu 18.04.4
- Install latest updates
- Authorize Core X Chroma TB3 connection
- Install amdgpu drivers from AMD (I've tried both the open and pro versions)
No matter what I do, I always get the same error messages in dmesg.
$ dmesg | grep -e drm -e amdgpu
[ 2.656730] fb0: switching to inteldrmfb from EFI VGA
[ 2.661262] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.661264] [drm] Driver supports precise vblank timestamp query.
[ 2.664460] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[ 2.683112] [drm] Initialized i915 1.6.0 20190619 for 0000:00:02.0 on minor 0
[ 2.709320] fbcon: i915drmfb (fb0) is primary device
[ 2.709405] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[ 11.246638] [drm] amdgpu kernel modesetting enabled.
[ 11.246639] [drm] amdgpu version: 18.104.22.168.10
[ 11.246640] [drm] OS DRM version: 5.3.0
[ 11.248822] amdgpu 0000:3e:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x80000000 -> 0x8fffffff
[ 11.248823] amdgpu 0000:3e:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x90000000 -> 0x901fffff
[ 11.248824] amdgpu 0000:3e:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xc4100000 -> 0xc417ffff
[ 11.248840] amdgpu 0000:3e:00.0: enabling device (0006 -> 0007)
[ 11.249026] [drm] initializing kernel modesetting (NAVI10 0x1002:0x731F 0x1682:0x5701 0xC1).
[ 11.249883] [drm] register mmio base: 0xC4100000
[ 11.249883] [drm] register mmio size: 524288
[ 11.249898] [drm] PCIE atomic ops is not supported
[ 11.250007] [drm:amdgpu_discovery_init [amdgpu]] *ERROR* invalid ip discovery binary signature
[ 11.250102] amdgpu 0000:3e:00.0: amdgpu_discovery_init failed
[ 11.250162] amdgpu 0000:3e:00.0: Fatal error during GPU init
[ 11.250227] [drm] amdgpu: finishing device.
[ 11.250368] amdgpu: probe of 0000:3e:00.0 failed with error -22
I'm not sure if the "PCIE atomic ops is not supported" message is relevant, but I've found nothing searching on that or "invalid ip discovery binary signature".
Hoping someone here can shed some light on this.
Thanks in advance.
Is it possibly a firmware issue?
While the eGPU is connected, can you try reloading the driver:
sudo rmmod amdgpu
sudo modprobe amdgpu
And see if that loads the module correctly. If that doesn't work, you might need to try a patched kernel.
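To tell quickly whether the reload helped, something like the following could filter the kernel log for the amdgpu failure lines. This is only a sketch: the function name amdgpu_init_errors is made up here, and the pattern is based solely on the messages quoted in the first post, so other failure messages may slip through.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# amdgpu lines that look like init failures (pattern taken from the log
# quoted in this thread). Returns 0 if at least one such line was found.
amdgpu_init_errors() {
    grep -E 'amdgpu.*(\*ERROR\*|Fatal error|failed)'
}

# Typical use after reloading the module (not run automatically here):
#   sudo rmmod amdgpu
#   sudo modprobe amdgpu
#   dmesg | amdgpu_init_errors || echo "no amdgpu init errors found"
```

If the last command prints nothing but the "no amdgpu init errors found" message, the probe probably got past the point where it failed before.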
Hi All, I've learned several things since my original post. I'll share here with the hopes it will help others.
First of all, my original issue was in fact a BIOS issue. In the XPS 13's BIOS, there are three options that can be enabled for Thunderbolt:
- Enable Thunderbolt Technology Support
- Enable Thunderbolt Adapter Boot Support
- Enable Thunderbolt Adapter Pre-boot Modules
At this point I'm not sure whether it was the second or the third option, but one of those two was not checked. In that state, Windows could use the eGPU, but Linux could not. You can see above that Linux was at least partially seeing the card, but not enough to initialize it. Once all three options were enabled, the error "[drm:amdgpu_discovery_init [amdgpu]] *ERROR* invalid ip discovery binary signature" went away.
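The BIOS options themselves can't be read this way, but it is possible to check what Linux sees on the Thunderbolt side. A minimal sketch, assuming the kernel exposes Thunderbolt devices under /sys/bus/thunderbolt/devices with the authorized and device_name attributes; the function name tb_list_devices is made up for illustration:

```shell
# Hypothetical helper: list each Thunderbolt device with its authorization
# state. $1 is the sysfs device directory (defaults to the usual location).
tb_list_devices() {
    local root="${1:-/sys/bus/thunderbolt/devices}"
    local dev name auth
    for dev in "$root"/*; do
        # Entries without an "authorized" attribute (e.g. domain0) are skipped.
        [ -r "$dev/authorized" ] || continue
        name="unknown"
        [ -r "$dev/device_name" ] && name=$(cat "$dev/device_name")
        auth=$(cat "$dev/authorized")
        printf '%s\t%s\tauthorized=%s\n' "$(basename "$dev")" "$name" "$auth"
    done
}
```

Running tb_list_devices with no arguments should show the enclosure with a non-zero authorized value if the connection was approved. Note that in my case the enclosure was authorized and the card showed up in dmesg, yet it still failed to initialize until the BIOS options were fixed, so a good result here does not rule out the BIOS problem.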
I also had some kernel compatibility issues. While working with Ubuntu 20.04, I was still getting "[ 11.250368] amdgpu: probe of 0000:3e:00.0 failed with error -22". With the BIOS fix in place, I went back to Ubuntu 18.04.4 (HWE) and the eGPU worked great out of the box. When I installed Ubuntu 20.04 again, the "error -22" came back. I had to upgrade from the 5.4 kernel that ships with Ubuntu 20.04 to a 5.5 or newer kernel to fix the issue.
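A rough way to check the running kernel against that 5.5 threshold is sketched below. The function name kernel_at_least is made up, and the check only encodes what I saw on Ubuntu 20.04; the 18.04.4 HWE install worked with its own kernel, so treat the threshold as a hint rather than a rule.

```shell
# Hypothetical helper: succeed if the kernel release is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]
# RELEASE defaults to the running kernel as reported by `uname -r`.
kernel_at_least() {
    local want_major="$1"
    local want_minor="$2"
    local release="${3:-$(uname -r)}"
    local major minor
    major="${release%%.*}"        # "5.4.0-42-generic" -> "5"
    minor="${release#*.}"         # -> "4.0-42-generic"
    minor="${minor%%[!0-9]*}"     # -> "4"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, `kernel_at_least 5 5 && echo "5.5 or newer" || echo "older than 5.5"` reports on the running kernel.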
Hope this helps!