[Solved] [HELP] Egpu Works Fine, Until It Crashes
Edit #2: I found the issue. Turns out the guy I bought the used GPU from on eBay had a defective card. I found out by replacing the GTX 770 with a GT 640. I'll get a refund or file a claim with eBay. I'll write more details in Update # 3 at the bottom of the post.
EDIT: Skip to the bottom to Update #2 to find out what's causing this issue. It doesn't hurt to read any of what I wrote but I found the error causing this.
I am unsure whether this issue is due to software or hardware; it could be both. Please read through, as I've been troubleshooting to no avail so far.
I'm currently running a Mac Mini Late 2012 i7 @ 2.3 GHz with an EVGA GTX 770 through an Akitio Thunder 2 dock with a 400 watt EVGA PSU. The setup uses a monitor which is connected directly to the GTX 770.
I got the setup working in Mac OS El Capitan, but I kept running into kernel panic error 193, which was due to the modified drivers from the eGPU script I found on this website (I think it was the automate-eGpu.sh v1.0.1).
Therefore I decided to set up the system in Windows.
I now have the system set up with the official NVIDIA 452.06 drivers and Apple Boot Camp driver support on Windows 10 LTSC, and it runs fine with decent FPS until the load increases. I have run several Unigine benchmarks on ultra for a few minutes and never encountered this issue, but whenever I play a game for more than 1 hour, the system will crash.
What I have not tested:
- Re-seat the RAM: My system reads 16 GB of RAM and I have even occasionally used all 16 GB, so I don't see why this would be an issue.
- Change PSU
- Test out dock on another PC
- Test out GPU on another PC
- Test out PSU on another PC
After reading through forums for 4 days on end I have tested/thought about several things:
- Disable iGPU: It's off when I boot up the computer to begin.
- Drivers: I have used DDU and reinstalled the official NVIDIA drivers. I would not normally do this, but I also used Driver Easy to see if any of my other drivers, such as audio, caused this.
- PSU wattage: I have thought about changing the PSU, but that wouldn't make a difference because the setup never even comes close to 400 watts.
- Overheating and Temperatures: I have monitored temps for the past week and neither the GPU nor the CPU is overheating.
- Custom Barrel Mod increased resistance: I have speculated that the soldering from the PSU to the dock for the barrel mod may have increased resistance, but that doesn't make sense because when I close the game the computer still has low FPS and even crashes.
- HDD issues: I ran chkdsk and this issue still continues.
- Windows re-installation: Doesn't help
- Underclocking and changing temperature limit of GPU & CPU: Does not help either.
- Still untested: switching out the 770 for a different GPU in the same dock (I have the hardware for this, unlike the other untested items).
I have not tested most of the untested items above because I don't have the hardware to do so.
- Whenever the crash happens it only happens after about 30 minutes to 1 hour and it happens randomly or not at all. One day I was able to get the PC running for 4 hours, the next day it only runs for 1 hour.
- When the PC reboots, the issue disappears.
- The GPU-Z logs show no consistent pattern. One time the system crashed with the TDP decreasing rapidly first and GPU load closely following to 0%, with clock speed only decreasing after the problem started. In every other instance the system ran perfectly until the FPS dropped (nothing else dropped besides FPS) and the system crashed in under a minute.
- The solder connection I made from the PSU to the dock was not pretty. I used a bit of heat from a fire and covered the wires in flux; there was carbon on the surface that I washed off, and the connection was strong and shiny. The system runs fine with this wire, and resistance is most likely not an issue IMO.
- (I know this is for Boot Camp, but this may be relevant.) In Mac OS the system ran fine until I hit kernel panics in about the same time frame (30 minutes to an hour), but the kernel panics were independent of whether or not I was running a game. It seemed to be a driver issue since I got the drivers from the script.
- When the system crashes fans do not speed up.
- The system does not show a boot screen because I am only connected through the eGPU, so I am also unable to see a BSOD.
- The one time I attempted to unplug the HDMI from the eGPU and plug it into the iGPU in order to see the BSOD, the system fans spun up to 100% and I never tried again. Usually during this crash/freeze the system fans don't spin up.
- I ran into the same issue even if I ran the egpu in debug mode or changed power settings in NVIDIA control panel.
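For anyone who wants to check their own GPU-Z sensor logs for the kind of load collapse described above, here is a rough Python sketch. It assumes the log is comma-separated with a header row padded by spaces and a column named `GPU Load [%]`; that column name, the thresholds, the function name `find_load_drops`, and the sample rows are all assumptions made up for illustration, so adjust them to match your actual log file.

```python
import csv
import io


def find_load_drops(log_text, load_col="GPU Load [%]", from_above=50.0, drop_to=5.0):
    """Return (row_index, first_field) for every sample where the load
    falls from above `from_above` to at or below `drop_to` in one step.

    The column name and thresholds are assumptions, not GPU-Z defaults.
    """
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]  # header cells are space-padded
    load_idx = header.index(load_col)

    drops = []
    prev = None
    for i, row in enumerate(reader):
        if len(row) <= load_idx:
            continue  # skip short or blank rows
        try:
            load = float(row[load_idx].strip())
        except ValueError:
            continue  # skip rows where the sensor value is missing
        if prev is not None and prev > from_above and load <= drop_to:
            drops.append((i, row[0].strip()))
        prev = load
    return drops


# Invented sample data, shaped like a sensor log; not taken from a real run.
SAMPLE = """Date , GPU Clock [MHz] , GPU Load [%] , Power Consumption [% TDP] ,
2020-09-01 20:00:01 , 1137.0 , 98 , 71.2 ,
2020-09-01 20:00:02 , 1137.0 , 97 , 70.8 ,
2020-09-01 20:00:03 , 1137.0 , 0 , 12.1 ,
2020-09-01 20:00:04 , 324.0 , 0 , 9.8 ,
"""

print(find_load_drops(SAMPLE))
```

Running it over a real log only tells you when the load collapsed; comparing the rows just before each hit (TDP, clocks, temperature) is still a manual job.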
Other important info:
- I applied the automate-egpu.efi and integrated.bat script when I first booted into Windows, starting the system from the USB, and the system worked fine. Then one time I just booted straight into Windows to see what would happen, and ever since then I have never needed to use the USB; the system works just fine without booting into the scripts.
- When I look into the Thunderbolt drivers, it won't show me the driver version or info, but in details it does say there is a driver???
- I mentioned this earlier, but I boot up with the system HDMI connected directly to the eGPU, which allows me to use the PC without having to switch the HDMI connection. I'm not sure if that may be causing the issue, as I haven't tested booting up plugged into the iGPU and then plugging into the eGPU.
- I didn't have this issue before installing the eGPU on Windows.
I have been working on this issue for a few days and I need help.
UPDATE: I booted up the computer with the EFI script on USB and plugged into the iGPU (I crossed that out above), then after booting into Windows I plugged the HDMI into the eGPU. I'm still getting the same problem, but this time when the system crashes I see a purple screen with no text before it goes down. This hasn't happened before either?
The problem still persists. 🙁
Update # 2:
Most of what I mentioned regarding hardware and scripts doesn't apply anymore. It's a software issue, not hardware, and I finally found the error. I uninstalled and reinstalled Windows LTSC 1809 several times, installed all drivers, and went through the dump files. This is what I've learned so far:
- It's caused by nvlddmkm.sys errors and dwm.exe app crash.
- Now I hypothesized that the error was due to drivers, but after testing drivers 334.89 all the way to 452.06 from NVIDIA I still run into crashes.
- If nothing else works I'll change from 1809 to something else.
- The crashing only happens when the setup is under load, such as in games; I can do work all day long on this without crashing.
UPDATE # 3: Turns out I was wrong in Update 2. While all the evidence pointed towards software, it was actually hardware. I replaced the GTX 770 with the GT 640 I had lying around, booted up a few times with the 640, and played a couple of games, with no crashing each time. Then I examined the PCB of the 770 and noticed marks of what looks to be a bad repair (somewhat shrunken plastic that lost its sheen). Most likely what happened is the guy fried the 770, did a bad "repair" and flipped it on eBay. My hypothesis is that when the shoddy repair was done, the PCB sank, which distorted the circuitry and therefore limits the GPU during intense load.
TL;DR: I have to buy a new GPU because the guy who sold the previous GPU to me screwed it up somehow. Gonna go file a claim or get my money back.
Once I get a new GPU I'll post the finished build. 😎
@jina_helms, Good to see you're getting close to figuring this issue out. I read your post a couple days ago and immediately thought it could be the power connection/solder joint. You mentioned you checked that thoroughly so hopefully it's not a hardware issue.
@itsage, Thanks for reading.
I swapped out the 770 for a GT 640 to experiment. Turned out the 770 was the culprit. Luckily it's still within the eBay policy time limit so I can dispute this.
Instead I'll probably get an RX 570, 1650, or a 1660 for my PC now. Going to wait for a price drop once the RTX 30 series is for sale in stores.