enable-baffin-CUs.sh script  

 

goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130
December 18, 2016 9:29 pm  

Thanks to the findings of okrasit and Fl0r!an, I was intrigued to write a small script that unleashes the full power of the R9 Nano and RX 480 for calculation tasks on macOS 10.12.2. It nearly doubled the clpeak single-precision compute results (GFLOPS) and the Indigo Renderer Benchmark score.

The script is written for experimental and educational purposes only. I don’t take any responsibility if something goes wrong, and since this is a binary hack, I don’t see any continuity for it.

A word of warning: please do not edit the PP_DisablePowerContainment key. It doubled float4-float16 OpenCL computing performance with the RX 480, but it can also fry your Akitio. The total power consumption of the RX 480 eGPU exceeded 190W! (I use custom boards).

 

http://www.insanelymac.com/forum/topic/313977-r9-nano/?p=2332854
https://www.tonymacx86.com/threads/enable-all-cores-r9-fury-cards.209892/#post-1393445


RX 480 (clpeak):

Device: AMD Radeon HD Baffin Unknown Prototype Compute Engine
    Driver version  : 1.2 (Dec  9 2016 21:43:55) (Macintosh)
    Compute units   : 36
    Clock frequency : 1266 MHz

    Global memory bandwidth (GBPS)
      float   : 202.40
      float2  : 212.20
      float4  : 215.84
      float8  : 129.97
      float16 : 60.01

    Single-precision compute (GFLOPS)
      float   : 5644.43
      float2  : 5634.59
      float4  : 5608.17
      float8  : 5574.38
      float16 : 5517.69

R9 Nano (clpeak):

Device: AMD Radeon HD Baffin Unknown Prototype Compute Engine
    Driver version  : 1.2 (Dec  9 2016 21:43:55) (Macintosh)
    Compute units   : 64
    Clock frequency : 1000 MHz

    Global memory bandwidth (GBPS)
      float   : 402.93
      float2  : 439.91
      float4  : 419.14
      float8  : 196.41
      float16 : 109.40

    Single-precision compute (GFLOPS)
      float   : 7180.09
      float2  : 6990.68
      float4  : 6820.10
      float8  : 6769.77
      float16 : 6657.53
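
For context, these figures can be set against the theoretical FP32 peak. The sketch below assumes the usual GCN layout of 64 lanes per CU (4 SIMDs of 16 lanes each) and 2 FLOPs per lane per clock for a fused multiply-add. The function names peak_gflops and efficiency_pct are made up for this example and are not part of clpeak.

```shell
#!/bin/sh
# Theoretical FP32 peak of a GCN GPU in GFLOPS.
# Assumption: 64 lanes per CU, 2 FLOPs per lane per clock (FMA).
# usage: peak_gflops <compute units> <clock in MHz>
peak_gflops() {
    awk -v cus="$1" -v mhz="$2" \
        'BEGIN { printf "%.2f\n", cus * 64 * 2 * mhz / 1000 }'
}

# Measured result as a percentage of the theoretical peak.
# usage: efficiency_pct <measured GFLOPS> <compute units> <clock in MHz>
efficiency_pct() {
    awk -v m="$1" -v cus="$2" -v mhz="$3" \
        'BEGIN { printf "%.1f\n", 100 * m / (cus * 64 * 2 * mhz / 1000) }'
}

peak_gflops 16 1266              # RX 480 limited to 16 CUs
peak_gflops 36 1266              # RX 480 with all 36 CUs
peak_gflops 64 1000              # R9 Nano with all 64 CUs
efficiency_pct 5644.43 36 1266   # RX 480 float result from above
efficiency_pct 7180.09 64 1000   # R9 Nano float result from above
```

Under these assumptions the peaks come out at roughly 2593, 5834 and 8192 GFLOPS, so the clpeak float results above sit at about 97% (RX 480) and 88% (R9 Nano) of the theoretical maximum.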

chmod +x enable-baffin-CUs.sh

For R9 Nano:

sudo ./enable-baffin-CUs.sh fiji 64 

For RX 480:

sudo ./enable-baffin-CUs.sh ellesmere 36

#!/bin/bash
#
# Script (enable-baffin-CUs.sh) by Goalque (goalque@gmail.com)
# Credit to okrasit and Fl0r!an:
# http://www.insanelymac.com/forum/topic/313977-r9-nano/?p=2332854
# https://www.tonymacx86.com/threads/enable-all-cores-r9-fury-cards.209892/#post-1393445
#
# Usage: sudo ./enable-baffin-CUs.sh <ellesmere|fiji|baffin> <36|64>
# The [[ ... ]] tests below are a bash feature, hence the bash shebang.

first_argument="$1"
second_argument="$2"
init_function=""
CU_count=""

# Search halves of three perl substitutions (byte patterns in the kext binary).
# The replacement halves are appended where perl is invoked.
pattern1="s/(\x48\xB8)(\x02)(\x00\x00\x00\x01\x00\x00\x00\x48\x89\x43\x54\xC7\x43\x7C)(\x08)(\x00\x00\x00)"

pattern2="s/(\x0F\x42\xC8)(\x89\x8B\x80\x00\x00\x00\x44\x88\xB3\x99\x00\x00\x00\x44\x88\x73\x20)"

pattern3="s/(\xE8)(\x49\x85\xFE\xFF)(\xBE\x48\x01\x00\x00\x4C\x89\xF7)"

# Bytes that replace capture group 2 of pattern3, chosen by GPU family.
if [[ "$first_argument" == "ellesmere" ]]
then
	init_function="\x46\xE4\x00\x00"
elif [[ "$first_argument" == "fiji" ]]
then
	init_function="\x73\x02\x01\x00"
elif [[ "$first_argument" == "baffin" ]]
then
	init_function="\x49\x85\xFE\xFF"
fi

# Bytes that replace capture groups 2 and 4 of pattern1; groups 3 and 5 are kept.
if [[ "$second_argument" == 36 ]]
then
	CU_count="\x04\3\x12\5"
elif [[ "$second_argument" == 64 ]]
then
	CU_count="\x04\3\x20\5"
fi

if [[ "$init_function" != "" ]] && [[ "$CU_count" != "" ]]
then
	# Work on a copy of the driver binary in /tmp.
	rsync -a /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100 /tmp/AMDRadeonX4100

	# pattern1 writes the new CU values, pattern2 overwrites the three
	# matched bytes \x0F\x42\xC8 with \x90 (NOP), pattern3 swaps the four
	# bytes after \xE8 for the selected init_function bytes.
	perl -pe "${pattern1}/\1${CU_count}/g" /tmp/AMDRadeonX4100 | perl -pe "${pattern2}/\x90\x90\x90\2/g" | perl -pe "${pattern3}/\1${init_function}\3/g" > /tmp/AMDRadeonX4100_modified

	# Put the patched binary back and restore ownership and permissions.
	rsync -a --delete /tmp/AMDRadeonX4100_modified /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100

	chown -R root:wheel /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100

	chmod -R 755 /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100

	# Remove the old caches so they are rebuilt with the patched kext.
	rm /Volumes/Macintosh\ HD/System/Library/PrelinkedKernels/prelinkedkernel 2>/dev/null

	rm /Volumes/Macintosh\ HD/System/Library/Caches/com.apple.kext.caches/Startup/kernelcache 2>/dev/null

	touch /System/Library/Extensions
	echo "Rebuilding caches..."
	kextcache -q -update-volume /Volumes/Macintosh\ HD
	echo "Ready."
else
	echo "Invalid parameters."
fi
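
Since the script overwrites a system kext, it may be worth checking first that each byte pattern actually occurs in the installed driver. If a pattern does not match, perl simply passes the data through unchanged. Below is a minimal read-only sketch; count_pattern is a hypothetical helper, not part of the script above.

```shell
#!/bin/sh
# Count how often a byte pattern occurs in a file, without modifying it.
# usage: count_pattern <file> <perl regex using \xNN escapes>
count_pattern() {
    # -0777 reads the whole file at once; the loop counts matches one by one,
    # so capture groups in the pattern do not inflate the result.
    PATTERN="$2" perl -0777 -ne \
        'my $n = 0; $n++ while /$ENV{PATTERN}/g; print "$n\n"' "$1"
}

# Example call with the search half of pattern1 (without the leading "s/"):
# count_pattern /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100 \
#     '\x48\xB8\x02\x00\x00\x00\x01\x00\x00\x00\x48\x89\x43\x54\xC7\x43\x7C\x08\x00\x00\x00'
```

A count of 0 means the substitution would be a no-op, for example on a different driver build or on a file that has already been patched.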
[Image: IndigoBench RX480 32CUs]
[Image: R9 Nano 64CUs]

automate-eGPU EFI | apple_set_os.efi
--
late-2016 13" Macbook Pro nTB + Vega64@32Gbps-TB3 (Netstor HL23T) + macOS & Win10
late-2016 13" Macbook Pro nTB + GTX980/RX580@32Gbps-TB3 (Netstor HL23T) + macOS10.13 & Win10


theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
December 18, 2016 9:32 pm  

Thank you Goalque! I assume this would work for Mac Pro tower and Hackintosh users as well?

Best ultrabooks for eGPU use

eGPU enclosure buying guide

66 external GPU build guides


goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130

theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
December 18, 2016 9:49 pm  

Thank you! If you don't mind, I'd like to share it with the Mac Pro community. I know a lot of Mac Pro tower users who have been sitting on the sideline waiting for full driver support for Polaris GPUs. With this script, it's as close to an official driver as one could hope for.

goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130
December 18, 2016 10:30 pm  

Yep, no problem. Remember that this script does the CU count patch only, nothing else. Who knows, maybe we’ll have a main course after an appetizer 😉

FricoRico
(@fricorico)
Eminent Member
Joined:2 years  ago
Posts: 40
December 19, 2016 4:07 pm  

What is up with the cores limit exactly? Is it a Mac OS generic lock, or only locked because the Kexts are intended for different GPUs?

I take it this does not improve Metal/OpenGL performance?


goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130
December 19, 2016 7:44 pm  

That’s a good question. There are a couple of values in hardware initialization functions. I guess the 16 CU limit is there because officially announced Polaris-based GPUs cannot utilize 64. Apple has quietly improved AMD drivers and added new device ids. Fiji is added purposely. This is a signal that they want to support a large variety of AMD GPUs.

I don’t know if this has an effect on shaders in OpenGL or Metal, probably not. Valley benchmark score did not seem to be affected.

FricoRico
(@fricorico)
Eminent Member
Joined:2 years  ago
Posts: 40
December 20, 2016 9:41 am  

As AnandTech pointed out, an AMD Compute Unit is made up of 4 SIMDs, and many CUs form the base of the GPU. Everything is then put through the pixel pipeline. I can't imagine those would be either limited or a bottleneck for the AMD RX 480. I would expect to see a graphics performance increase as well, but maybe the Thunderbolt 2 connection is limiting the GPU in heavy graphics computations.

I will run some benchmarks with both Metal and OpenGL, where Metal really has a huge performance increase for eGPUs (some games over 3 times the performance), probably because of a smaller bandwidth overhead.

Very nice find Goalque!


goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130
December 20, 2016 6:19 pm  

What is certain at present is that the hack has an influence on CL_DEVICE_MAX_COMPUTE_UNITS value in OpenCL’s clGetDeviceInfo call. The R9 Nano has 64 ROPs/64 CUs, and RX 480 has 32 ROPs/36 CUs.

https://www.techpowerup.com/gpudb/2735/radeon-r9-nano

https://www.techpowerup.com/gpudb/2848/radeon-rx-480

AFAIK, TB2 doesn’t bottleneck GPGPU tasks much. I’m looking forward to your results!

ikir
(@ikir)
Honorable Member
Joined:2 years  ago
Posts: 736
December 22, 2016 12:08 pm  
Posted by: FricoRico

I will run some benchmarks with both Metal and OpenGL, where Metal really has a huge performance increase for eGPUs (some games over 3 times the performance), probably because of a smaller bandwidth overhead.

   

Yeah, this is the kind of news I love to read!

eGPU.it | LG 34" 4K 34UC88 curved ultrawide display
MacBook Pro 2018 Touch Bar i7 quad-core 2.7Ghz - 16GB RAM - 512GB PCIe SSD --> my Mantiz Venus extreme mod with Sapphire Nitro+ RX Vega 64


FricoRico
(@fricorico)
Eminent Member
Joined:2 years  ago
Posts: 40
December 25, 2016 12:16 pm  

I took the time to benchmark some games with the unlocked CUs. It really made no difference at all, neither negative nor positive. I tested Metal and OpenGL games/benchmarks, so I think the CU limitation was only noticeable in compute performance itself. This is still a great mod for people doing GPU-based 3D rendering or other GPU-intensive computations.


theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
January 4, 2017 4:27 am  

Thank you again for the script! I was able to enable all 64 CUs on an R9 Fury X for my Mac Pro. This GPU is currently the most powerful AMD GPU for a Mac. Vega release next week may change it.

Oscar J
(@oscar-j)
Active Member
Joined:2 years  ago
Posts: 6
January 19, 2017 1:02 am  

Intriguing! Were your Indigo tests with the Nano or 480? (sorry for bumping)


goalque
(@goalque)
Noble Member Admin
Joined:2 years  ago
Posts: 1130

Menneisyys
(@menneisyys)
Eminent Member
Joined:2 years  ago
Posts: 35
February 24, 2017 1:22 pm  

I've executed the same test several times on my early 2013 15" rMBP + Node + Sapphire 4GB RX 480. The figures are largely the same:

- 2184...2247 (Bedroom) with the same eGPU also being the current display driver (in 4k60), 

- 1628, 1628, 1629 (still Bedroom; three tests with the eGPU being NOT the current one but driving the 4k monitor from the dGPU)

- 7136...7213 in five cases and 6172 in one case (and this was NOT a run with the card being non-current) (Supercar)

I've only tested the CPU once: 0.391 with Bedroom. The 650M tests weren't finished in over 30 minutes (they didn't even get to stage 1) so I stopped them (tried three times).

Incidentally, these figures are both somewhat larger than the ones (1.928, 6.261) at  https://www.indigorenderer.com/benchmark-results?page=1&sort_desc=result_bedroom&filter=highlighted . This means these are pretty impressive results and also show

- even the TB1 bandwidth poses no problems for strictly OpenCL usage,

- not even when the external display is also driven using the same card.


Mehenn
(@mehenn)
Estimable Member
Joined:2 years  ago
Posts: 103
March 6, 2017 6:38 pm  

Hi @goalque.

I plugged in my WX5100 and tried to enable its 28 CUs. Failed. LuxMark reports 14...

I started by editing the script and adding this option to it (ran it with sudo ./enable-baffin-CUs.sh ellesmere 36)

elif [[ "$second_argument" == 28 ]]
then
	CU_count="\x04\3\x0E\5"
fi

but no luck. Any suggestions?
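
A side note on the arithmetic: in both entries of the original script and in the attempt above, the third field of CU_count equals half the CU count in hex (\x12 = 18 for 36, \x20 = 32 for 64, \x0E = 14 for 28). Whether that is what the driver field really encodes is not confirmed anywhere in this thread, so the sketch below only illustrates the conversion. cu_half_hex is a hypothetical helper name.

```shell
#!/bin/sh
# Print half of a CU count as a two-digit uppercase hex byte.
# Assumption (unconfirmed): the third field of CU_count is CUs / 2.
# usage: cu_half_hex <compute units>
cu_half_hex() {
    printf '%02X\n' $(( $1 / 2 ))
}

cu_half_hex 36   # compare with \x12 in the script
cu_half_hex 64   # compare with \x20 in the script
cu_half_hex 28   # compare with \x0E in the attempt above
```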


Thanks!

 


late-2016 13" Macbook Pro + GTX980@16Gbps-TB2 (TB3 to TB2 adapter) (AKiTiO Thunder2) + macOS 10.12.3
late-2016 13" MBP + Quadro M2000/FirePro WX5100/GTX750-TB3 (AKiTiO Thunder3) + macOS 10.12.3
(Custom Thunder cables)


Mehenn
(@mehenn)
Estimable Member
Joined:2 years  ago
Posts: 103

Yukikaze
(@yukikaze)
Honorable Member
Joined:2 years  ago
Posts: 756
March 6, 2017 7:16 pm  

I can't help with the OS X related stuff, but how about giving it a shot in a different OS? I don't know if you have Win bootcamped, but you could try to boot Linux off a live-USB stick, load the driver and run some benchmark. Just to make sure your issue is indeed in the SW config of things.

EDIT: It looks to me that the WX 5100 gets identified as a RX 460, and seems to be used as such. 14 is the number of CUs in the cut-down Baffin dies used for RX 460s.

My eGPU Zoo - Link to my Implementations.
Want to output 4K@60Hz out of an old system on the cheap? Read here.
"Always listen to experts. They'll tell you what can't be done, and why. Then do it."- Robert A. Heinlein, "Time Enough for Love."


Billyjb
(@billyjb)
New Member
Joined:1 year  ago
Posts: 1
March 30, 2017 1:18 am  

The IT Sage referenced this script in his blog showing how to install a 470/480 in my 2010 Mac Pro 5,1. My question is: will this script work, or can it be modified, for the 470, which has 32 CUs?

tia

Bill


mistergabs
(@mistergabs)
New Member
Joined:1 year  ago
Posts: 1
May 12, 2017 4:54 pm  

Might be the dumb one in the room, but after enabling this, my Mac Pro will immediately power down whenever I try to run a Geek Bench or touch 4K footage in Premiere - was totally fine with both the above before this. Any ideas how to DISable the Baffin CU script? Or get it working?


feki
(@feki)
New Member
Joined:1 year  ago
Posts: 2
May 13, 2017 3:19 pm  

Wow thank you! That works great in my MacPro 5.1 and my RX480. I would like to put a RX460 which lies around in there too. But is that possible? Or does that Compute Unit Hack mean the RX460 would not work? Thanks


psy-ninja
(@psy-ninja)
New Member
Joined:1 year  ago
Posts: 2
May 14, 2017 4:32 pm  

This sounds like what I need. How do I execute this script? 


feki
(@feki)
New Member
Joined:1 year  ago
Posts: 2
May 16, 2017 6:55 pm  

RX480 and RX460 do work in parallel in the Mac Pro 5,1. DaVinci Resolve is faster. Final Cut Pro X is glitchy 🙁


casual_g
(@casual_g)
Eminent Member
Joined:1 year  ago
Posts: 20
June 1, 2017 7:09 pm  

Maybe this is a dumb question. Do I have to restart after running the script? My clpeak reports for my RX 480:

  Device: AMD Radeon HD Baffin Unknown Prototype Compute Engine
    Driver version  : 1.2 (Dec  9 2016 21:43:55) (Macintosh)
    Compute units   : 16
    Clock frequency : 1266 MHz 

You can see it says 16 CUs.

By the way, would this script improve both graphical and compute performance? Edit: I read through the thread more closely.


theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
June 1, 2017 7:22 pm  

Yes. Restart your Mac and you'll see 36 compute units.
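
To check the count after the restart without reading the whole clpeak report, the relevant line can be filtered out. This is a small sketch: compute_units is a hypothetical helper name, and it only assumes the "Compute units : N" line format shown in the clpeak output quoted in this thread.

```shell
#!/bin/sh
# Print the value of every "Compute units" line found on stdin.
# usage: clpeak | compute_units
compute_units() {
    awk -F: '/Compute units/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Example with a line copied from the report above:
printf '    Compute units   : 16\n' | compute_units
```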

casual_g
(@casual_g)
Eminent Member
Joined:1 year  ago
Posts: 20
June 2, 2017 2:50 am  

I find that performance of the GPU is much better in Windows. Games have higher FPS.

I tried searching about performance but didn't find much comment. Upon restarting, the number of CUs went up to 36, but as previously discussed, graphical performance didn't improve much.

Is it because the OSX drivers that Apple writes for the GPUs are not up to date?


raybanner
(@raybanner)
Eminent Member
Joined:1 year  ago
Posts: 22
July 17, 2017 8:12 am  

How do I actually make use of the hack? Just put the commands into Terminal? Sorry for the newbie questions.

 

I have a RX460 and also a RX580 I want to test this hack with.


theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
July 17, 2017 2:37 pm  
Posted by: raybanner

How do I actually make use of the hack? Just put the commands into Terminal? Sorry for the newbie questions.

I have a RX460 and also a RX580 I want to test this hack with.

You don't need this script for RX 460. It's beneficial for Ellesmere and Fiji GPUs such as the RX 580 and R9 Fury in macOS 10.12 only.

To make use of the enable-baffin-CUs.sh script, navigate to the file location (assuming you saved or downloaded it in ~/Downloads folder), bless it with executable permission then run with these commands in Terminal:

cd ~/Downloads
chmod +x enable-baffin-CUs.sh
sudo ./enable-baffin-CUs.sh ellesmere 36

theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
July 26, 2017 3:01 pm  
Posted by: mistergabs

Might be the dumb one in the room, but after enabling this, my Mac Pro will immediately power down whenever I try to run a Geek Bench or touch 4K footage in Premiere - was totally fine with both the above before this. Any ideas how to DISable the Baffin CU script? Or get it working?

This happens when the GPU doesn't get enough juice. It didn't crash your Mac Pro when only 16 CUs were running. When all 36 CUs are running, the card pulls a lot more power. I'd recommend using both PCIe booster power ports with a dual 6-pin to 8-pin adapter.
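
As for the "how to DISable" part of the question, the thread does not give a procedure, and the script itself keeps no copy of the original driver. One option, assuming a copy of the unmodified binary was saved before patching, is a plain backup and restore. The helper names and the idea of a separate backup file are assumptions for this sketch. After a restore, the ownership, permission and kextcache steps from the script would presumably have to be repeated, followed by a restart.

```shell
#!/bin/sh
# usage: backup_binary <original> <backup>
# Refuses to overwrite an existing backup, so a second run cannot replace
# the pristine copy with an already patched file.
backup_binary() {
    if [ -e "$2" ]; then
        echo "Backup already exists: $2" >&2
        return 1
    fi
    cp -p "$1" "$2"
}

# usage: restore_binary <backup> <original>
restore_binary() {
    if [ ! -f "$1" ]; then
        echo "No backup found: $1" >&2
        return 1
    fi
    cp -p "$1" "$2"
}

# Example (paths are placeholders, run with sudo before patching):
# backup_binary /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100 \
#     ~/AMDRadeonX4100.orig
```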

delving
(@delving)
New Member
Joined:1 year  ago
Posts: 3
July 28, 2017 1:22 am  

Is there any use in using this script in 10.13?


theitsage
(@itsage)
Famed Member Admin
Joined:2 years  ago
Posts: 2694
July 28, 2017 2:21 am  
Posted by: delving

Is there any use in using this script in 10.13?

There's no need to use this script in 10.12.6 and 10.13 if you have RX 470/480/570/580. The script is still useful for Fiji cards such as the R9 Fury/X and R9 Nano.

mac_editor
(@mac_editor)
Noble Member Moderator
Joined:1 year  ago
Posts: 1160
August 4, 2017 1:54 pm  
Posted by: theitsage
 
There's no need to use this script in 10.12.6 and 10.13 if you have RX 470/480/570/580. The script is still useful for Fiji cards such as the R9 Fury/X and R9 Nano.

Are you sure this script is not required on macOS 10.12.6? Maybe I don't remember right but when I had set up my eGPU on this release, I needed the patch for all CUs. Will test and check again.

purge-wrangler.sh | purge-nvda.sh | set-eGPU.sh
----
Troubleshooting eGPUs on macOS
Command Line Swiss Knife
----
3 Build Guides

