
enable-baffin-CUs.sh script  


goalque
(@goalque)
Noble Member Admin
Joined: 4 years ago
 

Thanks to the findings of okrasit and Fl0r!an, I was inspired to write a small script that unleashes the full power of the R9 Nano and RX 480 for compute tasks on macOS 10.12.2. It nearly doubled the clpeak single-precision compute results (GFLOPS) and the Indigo Renderer Benchmark score.

The script is written only for experimental and educational purposes. I don’t take any responsibility if something goes wrong, and since this is a binary hack, I don’t see any continuity.

A word of warning: please do not edit the PP_DisablePowerContainment key. It doubled float4-float16 OpenCL compute performance with the RX 480, but it can also fry your Akitio. The total power consumption of the RX 480 eGPU exceeded 190W! (I use custom boards.)

 

http://www.insanelymac.com/forum/topic/313977-r9-nano/?p=2332854
https://www.tonymacx86.com/threads/enable-all-cores-r9-fury-cards.209892/#post-1393445


RX 480 (clpeak):

Device: AMD Radeon HD Baffin Unknown Prototype Compute Engine
    Driver version  : 1.2 (Dec  9 2016 21:43:55) (Macintosh)
    Compute units   : 36
    Clock frequency : 1266 MHz

    Global memory bandwidth (GBPS)
      float   : 202.40
      float2  : 212.20
      float4  : 215.84
      float8  : 129.97
      float16 : 60.01

    Single-precision compute (GFLOPS)
      float   : 5644.43
      float2  : 5634.59
      float4  : 5608.17
      float8  : 5574.38
      float16 : 5517.69

R9 Nano (clpeak):

Device: AMD Radeon HD Baffin Unknown Prototype Compute Engine
    Driver version  : 1.2 (Dec  9 2016 21:43:55) (Macintosh)
    Compute units   : 64
    Clock frequency : 1000 MHz

    Global memory bandwidth (GBPS)
      float   : 402.93
      float2  : 439.91
      float4  : 419.14
      float8  : 196.41
      float16 : 109.40

    Single-precision compute (GFLOPS)
      float   : 7180.09
      float2  : 6990.68
      float4  : 6820.10
      float8  : 6769.77
      float16 : 6657.53
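
The clpeak figures above can be compared against a theoretical peak. This is only a rough sketch, assuming the usual GCN layout of 64 shader lanes per compute unit and one fused multiply-add (2 FLOPs) per lane per clock. `peak_gflops` is a name made up for illustration and is not part of the script in this thread.

```shell
#!/bin/bash
# Theoretical single-precision peak in GFLOPS (integer arithmetic, rounded down).
# Assumption: 64 lanes per compute unit, 2 FLOPs per lane per clock (one FMA).
# $1 = compute units, $2 = clock in MHz
peak_gflops() {
    echo $(( $1 * 64 * 2 * $2 / 1000 ))
}

peak_gflops 36 1266   # RX 480:  prints 5833
peak_gflops 64 1000   # R9 Nano: prints 8192
```

By this estimate, the RX 480 result of 5644 GFLOPS is about 97% of the theoretical peak, and the R9 Nano result of 7180 GFLOPS is about 88%.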

chmod +x enable-baffin-CUs.sh

For R9 Nano:

sudo ./enable-baffin-CUs.sh fiji 64 

For RX 480:

sudo ./enable-baffin-CUs.sh ellesmere 36

#!/bin/bash
#
# Script (enable-baffin-CUs.sh) by Goalque (goalque@gmail.com)
# Credit to okrasit and Fl0r!an:
# http://www.insanelymac.com/forum/topic/313977-r9-nano/?p=2332854
# https://www.tonymacx86.com/threads/enable-all-cores-r9-fury-cards.209892/#post-1393445
#
# Usage: sudo ./enable-baffin-CUs.sh <ellesmere|fiji|baffin> <36|64>
# bash is required because the script uses [[ ]] tests.

first_argument="$1"
second_argument="$2"
init_function=""
CU_count=""

kext_binary="/System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100"

# Search halves of the three perl substitutions; the replacement halves are
# appended when perl is invoked below.
pattern1="s/(\x48\xB8)(\x02)(\x00\x00\x00\x01\x00\x00\x00\x48\x89\x43\x54\xC7\x43\x7C)(\x08)(\x00\x00\x00)"

pattern2="s/(\x0F\x42\xC8)(\x89\x8B\x80\x00\x00\x00\x44\x88\xB3\x99\x00\x00\x00\x44\x88\x73\x20)"

pattern3="s/(\xE8)(\x49\x85\xFE\xFF)(\xBE\x48\x01\x00\x00\x4C\x89\xF7)"

# Four bytes that replace group 2 of pattern3 (the bytes after the 0xE8 opcode).
# "baffin" writes back the same bytes that pattern3 already matches.
if [[ "$first_argument" == "ellesmere" ]]; then
    init_function="\x46\xE4\x00\x00"
elif [[ "$first_argument" == "fiji" ]]; then
    init_function="\x73\x02\x01\x00"
elif [[ "$first_argument" == "baffin" ]]; then
    init_function="\x49\x85\xFE\xFF"
fi

# Replacement tail for pattern1: new bytes for groups 2 and 4, while the
# perl backreferences \3 and \5 keep the surrounding bytes unchanged.
if [[ "$second_argument" == 36 ]]; then
    CU_count="\x04\3\x12\5"
elif [[ "$second_argument" == 64 ]]; then
    CU_count="\x04\3\x20\5"
fi

if [[ -n "$init_function" && -n "$CU_count" ]]; then
    # Patch a copy in /tmp, then write the modified binary back.
    # pattern2's first three bytes are overwritten with x86 NOPs (0x90).
    rsync -a "$kext_binary" /tmp/AMDRadeonX4100
    perl -pe "${pattern1}/\1${CU_count}/g" /tmp/AMDRadeonX4100 \
        | perl -pe "${pattern2}/\x90\x90\x90\2/g" \
        | perl -pe "${pattern3}/\1${init_function}\3/g" > /tmp/AMDRadeonX4100_modified

    rsync -a --delete /tmp/AMDRadeonX4100_modified "$kext_binary"

    # Restore ownership and permissions on the patched binary.
    chown -R root:wheel "$kext_binary"
    chmod -R 755 "$kext_binary"

    # Remove the cached kernels so they are rebuilt with the patched kext.
    rm /Volumes/Macintosh\ HD/System/Library/PrelinkedKernels/prelinkedkernel 2>/dev/null
    rm /Volumes/Macintosh\ HD/System/Library/Caches/com.apple.kext.caches/Startup/kernelcache 2>/dev/null

    touch /System/Library/Extensions
    echo "Rebuilding caches..."
    kextcache -q -update-volume /Volumes/Macintosh\ HD
    echo "Ready."
else
    echo "Invalid parameters."
fi
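
Before running a binary patch like this one, it can help to confirm that the target bytes are actually present, since a driver update may change them. The following is a minimal sketch, assuming perl is available (the script already depends on it). `count_pattern` is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# Count how often a byte pattern (perl regex syntax) occurs in a file.
# $1 = file to inspect, $2 = pattern such as '\x48\xB8\x02'
count_pattern() {
    # -0777 slurps the whole file so a match cannot be split across lines.
    PATTERN="$2" perl -0777 -ne 'my $n = () = /$ENV{PATTERN}/g; print "$n\n"' "$1"
}

# Demo on a throwaway file (bytes written with portable octal escapes).
sample=$(mktemp)
printf '\000\110\270\002\000' > "$sample"
count_pattern "$sample" '\x48\xB8\x02'   # prints 1
rm -f "$sample"

# Against the real kext, the pattern1 bytes could be checked like this:
# count_pattern /System/Library/Extensions/AMDRadeonX4100.kext/Contents/MacOS/AMDRadeonX4100 \
#     '\x48\xB8\x02\x00\x00\x00\x01\x00\x00\x00\x48\x89\x43\x54\xC7\x43\x7C\x08\x00\x00\x00'
```

A count of 0 would suggest the binary differs from the one the patterns were written for, in which case the substitutions would silently do nothing.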
IndigoBench RX480 32CUs
R9 Nano 64CUs

automate-eGPU EFIapple_set_os.efi

Mid 2015 15-inch MacBook Pro eGPU Master Thread

 
2018 13" MacBook Pro [8th,4C,U] + Radeon VII @ 32Gbps-TB3 (ASUS XG Station Pro) + Win10 1809 [build link]  


itsage
(@itsage)
Illustrious Member Admin
Joined: 4 years ago
 

Thank you Goalque! I assume this would work for Mac Pro tower and Hackintosh users as well?

external graphics card builds
best laptops for external GPU
eGPU enclosure buyer's guide

 
2020 13" MacBook Pro [10th,4C,G] + RTX 2080 Ti @ 32Gbps-TB3 (AORUS Gaming Box) + Win10 2004 [build link]  


goalque
(@goalque)
Noble Member Admin
Joined: 4 years ago
 

@itsage: Not tied to eGPU use. Feel free to try it out :)

itsage
(@itsage)
Illustrious Member Admin
Joined: 4 years ago
 

Thank you! If you don't mind, I'd like to share it with the Mac Pro community. I know a lot of Mac Pro tower users who have been sitting on the sidelines waiting for full driver support for Polaris GPUs. With this script, it's as close to an official driver as one could hope for.

goalque
(@goalque)
Noble Member Admin
Joined: 4 years ago
 

Yep, no problem. Remember that this script does the CU count patch only, nothing else. Who knows, maybe we’ll have a main course after an appetizer 😉

FricoRico
(@fricorico)
Eminent Member
Joined: 4 years ago
 

What is up with the core limit exactly? Is it a generic macOS lock, or is it only locked because the kexts are intended for different GPUs?

I take it this does not improve Metal/OpenGL performance?

To do: Create my signature with system and expected eGPU configuration information to give context to my posts. I have no builds.

goalque
(@goalque)
Noble Member Admin
Joined: 4 years ago
 

That’s a good question. There are a couple of values in the hardware initialization functions. I guess the 16 CU limit is there because officially announced Polaris-based GPUs cannot utilize 64. Apple has quietly improved the AMD drivers and added new device IDs. Fiji was added on purpose. This is a signal that they want to support a large variety of AMD GPUs.

I don’t know if this has an effect on shaders in OpenGL or Metal, probably not. The Valley benchmark score did not seem to be affected.

FricoRico
(@fricorico)
Eminent Member
Joined: 4 years ago
 

As AnandTech pointed out, an AMD Compute Unit is made of 4 SIMDs, and many CUs make up the base of the GPU. Then everything is put through the pixel pipeline. I can't imagine those would be either limited or a bottleneck for the AMD RX 480. I would expect to see a graphics performance increase as well, but maybe the Thunderbolt 2 connection is limiting the GPU in heavy graphics computations.

I will run some benchmarks with both Metal and OpenGL, where Metal really has a huge performance increase for eGPUs (some games over 3 times the performance), probably because of a smaller bandwidth overhead.

Very nice find Goalque!

goalque
(@goalque)
Noble Member Admin
Joined: 4 years ago
 

What is certain at present is that the hack affects the CL_DEVICE_MAX_COMPUTE_UNITS value returned by OpenCL’s clGetDeviceInfo call. The R9 Nano has 64 ROPs/64 CUs, and the RX 480 has 32 ROPs/36 CUs.

https://www.techpowerup.com/gpudb/2735/radeon-r9-nano

https://www.techpowerup.com/gpudb/2848/radeon-rx-480
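
One way to confirm the patch took effect is to read the compute-unit count back from clpeak, which reports it on the "Compute units" line shown in the first post. This is a small sketch only: `compute_units` is a hypothetical helper, and it assumes clpeak's output format matches the listing in this thread.

```shell
#!/bin/bash
# Print the compute-unit count(s) found in clpeak-style output read from stdin.
compute_units() {
    awk -F: '/Compute units/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Demo with a line copied from the RX 480 listing in this thread.
printf '    Compute units   : 36\n' | compute_units   # prints 36

# With clpeak installed, the live value could be read with:
# clpeak | compute_units
```

If the patch worked, the number should match the CU count passed to the script (36 or 64).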

AFAIK, TB2 doesn’t bottleneck GPGPU tasks much. I’m looking forward to your results!

ikir
(@ikir)
Prominent Member
Joined: 4 years ago
 
Posted by: FricoRico

I will run some benchmarks with both Metal and OpenGL, where Metal really has a huge performance increase for eGPUs (some games over 3 times the performance), probably because of a smaller bandwidth overhead.

   

Yeah, this is the kind of news I love to read!


MacBook Pro 2018 Touch Bar i7 quad-core 2.7Ghz - 16GB RAM - 512GB PCIe SSD
my awesome Radeon VII eGPU
my Mantiz Venus extreme mod with Sapphire Nitro+ RX Vega 64

 
2018 13" MacBook Pro [8th,4C,U] + Radeon VII @ 32Gbps-TB3 (Mantiz Venus) + macOS 10.15 [build link]  

