MacPro6,1 (Black Can), dual D700: Akitio w/ GTX 970 works with OS X 10.11.6; NVIDIA driver loads under Linux but nvidia-smi fails (kernel 4.13.0-16-generic, Ubuntu 17.10)

 

(@jerry_normandin)
October 20, 2017 9:52 pm  

Hi,

I am trying to configure my Akitio with a GTX 970 eGPU as a compute node. Here’s my GRUB command line:

linux /boot/vmlinuz-4.13.0-16-generic.efi.signed root=UUID=4b899c05-6c71-44ee-a6a1-a813bb0be34f ro intel_iommu=on pci=hpbussize=10,hpmemsize=2M,nocrs,realloc quiet splash $vt_handoff
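To confirm the kernel actually booted with these `pci=` options, one way is to read `/proc/cmdline` and split out the comma-separated sub-options. This is a minimal Python sketch, assuming a Linux system where `/proc/cmdline` exists; the function name `parse_pci_options` is hypothetical, not part of any tool.

```python
from pathlib import Path


def parse_pci_options(cmdline: str) -> dict:
    """Split every pci= kernel parameter into its comma-separated sub-options.

    Bare flags (e.g. nocrs, realloc) map to True; key=value sub-options
    (e.g. hpmemsize=2M) map to their string value. If pci= appears more
    than once, later sub-options overwrite earlier ones with the same key.
    """
    options = {}
    for token in cmdline.split():
        if not token.startswith("pci="):
            continue
        for sub in token[len("pci="):].split(","):
            if not sub:
                continue  # tolerate stray commas such as "pci=nocrs,"
            key, sep, value = sub.partition("=")
            options[key] = value if sep else True
    return options


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print(parse_pci_options(cmdline_path.read_text()))
```

If the GRUB entry was picked up, the printed dictionary should contain the same `hpbussize`, `hpmemsize`, `nocrs` and `realloc` entries as the line above.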

The GTX 970 has power... in my wife’s words, the coffee percolator is on. (That’s what it sounds like when the fans on my Zotac GTX 970 spin at low RPM.) The NVIDIA driver loads:

[ 7.234085] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017 (using threaded interrupts)
[ 7.246091] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 384.90 Tue Sep 19 17:05:19 PDT 2017
[ 7.736303] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:10:00.0/0000:11:02.0/0000:14:00.0/0000:15:03.0/0000:17:00.0/0000:18:01.0/0000:19:00.1/sound/card1/input28
[ 7.968418] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:10:00.0/0000:11:02.0/0000:14:00.0/0000:15:03.0/0000:17:00.0/0000:18:01.0/0000:19:00.1/sound/card1/input30
[ 7.968511] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:10:00.0/0000:11:02.0/0000:14:00.0/0000:15:03.0/0000:17:00.0/0000:18:01.0/0000:19:00.1/sound/card1/input31
[ 7.968548] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:10:00.0/0000:11:02.0/0000:14:00.0/0000:15:03.0/0000:17:00.0/0000:18:01.0/0000:19:00.1/sound/card1/input32
[ 8.445871] [drm] [nvidia-drm] [GPU ID 0x00001900] Loading driver
[ 8.445872] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:19:00.0 on minor 2

However, nvidia-smi fails. Running dmesg afterwards shows:

[ 947.752114] NVRM: RmInitAdapter failed! (0x26:0xffff:1113)
[ 947.752172] NVRM: rm_init_adapter failed for device bearing minor number 0

I’m pretty sure these PCI resource allocation failures are why the driver isn’t working:

[ 0.663140] pci 0000:a3:00.0: BAR 14: no space for [mem size 0x00400000]
[ 0.663140] pci 0000:a3:00.0: BAR 14: failed to assign [mem size 0x00400000]
[ 0.663141] pci 0000:a3:03.0: BAR 14: no space for [mem size 0x00c00000]
[ 0.663142] pci 0000:a3:03.0: BAR 14: failed to assign [mem size 0x00c00000]
[ 0.663143] pci 0000:a3:04.0: BAR 14: no space for [mem size 0x00400000]
[ 0.663143] pci 0000:a3:04.0: BAR 14: failed to assign [mem size 0x00400000]
[ 0.663144] pci 0000:a3:05.0: BAR 14: no space for [mem size 0x00c00000]
[ 0.663145] pci 0000:a3:05.0: BAR 14: failed to assign [mem size 0x00c00000]
[ 0.663146] pci 0000:a3:06.0: BAR 14: no space for [mem size 0x00400000]
[ 0.663146] pci 0000:a3:06.0: BAR 14: failed to assign [mem size 0x00400000]
[ 0.663147] pci 0000:a3:00.0: BAR 13: no space for [io size 0x1000]
[ 0.663148] pci 0000:a3:00.0: BAR 13: failed to assign [io size 0x1000]
[ 0.663149] pci 0000:a3:03.0: BAR 13: no space for [io size 0x8000]
[ 0.663149] pci 0000:a3:03.0: BAR 13: failed to assign [io size 0x8000]
[ 0.663150] pci 0000:a3:04.0: BAR 13: no space for [io size 0x1000]
[ 0.663151] pci 0000:a3:04.0: BAR 13: failed to assign [io size 0x1000]
[ 0.663152] pci 0000:a3:05.0: BAR 13: no space for [io size 0x7000]
[ 0.663152] pci 0000:a3:05.0: BAR 13: failed to assign [io size 0x7000]
[ 0.663153] pci 0000:a3:06.0: BAR 13: no space for [io size 0x1000]
[ 0.663154] pci 0000:a3:06.0: BAR 13: failed to assign [io size 0x1000]
[ 0.663155] pci 0000:a4:00.0: BAR 0: no space for [mem size 0x00040000]
[ 0.663156] pci 0000:a4:00.0: BAR 0: failed to assign [mem size 0x00040000]
[ 0.663157] pci 0000:a4:00.0: BAR 1: no space for [mem size 0x00001000]
[ 0.663157] pci 0000:a4:00.0: BAR 1: failed to assign [mem size 0x00001000]
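To get a quick picture of how much address space could not be placed, the "failed to assign" entries can be pulled out of saved dmesg text and totalled per resource type. This is a minimal Python sketch, assuming the log was saved with something like `dmesg > dmesg.txt`; the function names are hypothetical, and the regex only covers the `failed to assign [mem|io size 0x...]` form shown above.

```python
import re
from collections import defaultdict

# Matches kernel messages such as:
#   pci 0000:a3:03.0: BAR 14: failed to assign [mem size 0x00c00000]
# finditer() is used so it also works on logs flattened onto one line.
FAILED_BAR = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign "
    r"\[(?P<kind>mem|io) size (?P<size>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def failed_bar_assignments(dmesg_text: str) -> list:
    """Return (device, bar, kind, size_in_bytes) for each failed assignment."""
    return [
        (m["dev"], int(m["bar"]), m["kind"], int(m["size"], 16))
        for m in FAILED_BAR.finditer(dmesg_text)
    ]


def total_unassigned(dmesg_text: str) -> dict:
    """Sum the sizes that could not be assigned, grouped by 'mem' / 'io'."""
    totals = defaultdict(int)
    for _dev, _bar, kind, size in failed_bar_assignments(dmesg_text):
        totals[kind] += size
    return dict(totals)
```

For the log above this adds up to about 36.25 MiB of memory space and 0x12000 bytes of I/O space that the kernel could not assign.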

How do I fix this?

CudaSandbox
(@cudasandbox)
January 4, 2018 8:48 pm  

Hi Jerry,

I am experiencing the same issue with an external NVIDIA GTX 1070 (Aorus) on a 2013 MacBook Pro running Ubuntu 17.

I have tried a lot of fixes so far, including pci=noCRS, and even updating the DSDT tables and recompiling the kernel, but nothing worked…

Did you manage to find a solution to this problem?

Thanks,

M

