Cyberpunk 2077 Killed my eGPU? - Thoughts on what went wrong
Recently, while playing Cyberpunk, I experienced a crash (slight OC on the GPU). I hard powered the computer down after no signal returned to the external monitor, and upon rebooting, the system no longer detected the GPU connected via the ADT-Link. Running DDU and uninstalling/reinstalling the NVIDIA drivers made no difference.
I have disconnected and reconnected the M.2 connector, and swapped the connector and SSD between slots to confirm that both slots are still good (the SSD worked in both), but I can no longer get the GPU to show in Device Manager.
The GPU still powers on and spins up but is not recognized. I have also refreshed my PC (fresh install of Windows 10) and still no luck.
Do you guys and gals think the culprit is hardware or software? If hardware, is it likely the GPU is dead? This is a month-old EVGA RTX 3090 FTW3 with only EVGA Precision XOC software overclocks, so I would think that is unlikely. What about the ADT-Link itself? I see no physical signs of damage on the ADT-Link. I also haven't seen any posts about the board (ADT-Link) itself dying.
Seeking wisdom on where to look next.
Do you happen to have another PC that you could use to test the 3090? Throw it into a desktop and see if it is recognized. That would be the easiest way to see if the card is bad, and then you could check the other components.
@mbliss11 thank you for taking the time to respond! That would have been my go-to test as well! I wish I had a desktop, or even a neighbor with one, to do the test. Unfortunately, that's not an available option. However, I did see that a few people on the EVGA forums posted about the exact same thing happening with the exact same card in desktop PCs. The symptoms they experienced before and after the failure are all identical to mine.
I wish I had better news, but the way this is trending, it could be a dead GPU. For reference, if anyone wants to see the post on the EVGA forums:
Update: So in fact, Cyberpunk 2077 did kill my GPU.
On the manufacturer forums, I found multiple users who had the same symptoms before their GPUs (all 3000 series from Nvidia) failed. After calling the RMA center and explaining what had occurred, it seemed like this was a known issue, and the rep gave me no push-back and started a return. This is NOT a buyer-beware. I was just unlucky enough to have received a lemon. So the goal of this follow-up is simply to save another user the headache of troubleshooting / research should he or she experience the same thing. --> Go straight to RMA and Customer Service of your GPU manufacturer
If your GPU fails to POST after a hard crash, check if you have these symptoms:
-Right before or during the crash, the fans on the GPU suddenly spin to 100% (the GPU has failed, so the fan controller's default behavior is to go to 100% to try and protect components)
-Any strange smells around this time or anytime during initial operation (could indicate a weak or failed component)
-The GPU's LEDs and fans may light and spin, but trying different PCI-E slots does not work, nor does uninstalling/reinstalling the driver
-Some users also reported seeing a permanently lit red LED near the PCI-E slot after component failure
If any of the above is you (in particular points 1 and 2), start the RMA process, as you have experienced a known failure with the 3000 series.
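For anyone who likes it spelled out, the checklist above boils down to something like the following. This is just a rough sketch of my own reading of the list, not anything from EVGA; the names `Symptoms` and `triage` and the wording of the results are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    """The four checklist points, in order. All names here are hypothetical."""
    fans_jumped_to_full_speed: bool = False   # point 1
    strange_smell: bool = False               # point 2
    powers_on_but_not_detected: bool = False  # point 3
    red_led_stays_lit: bool = False           # point 4


def triage(s: Symptoms) -> str:
    """Suggest a next step based on the checklist in this thread."""
    # Points 1 and 2 are the ones the post singles out as the strongest signs.
    if s.fans_jumped_to_full_speed or s.strange_smell:
        return "start RMA (strong match)"
    # Any other point on the list is still reason enough to call support.
    if s.powers_on_but_not_detected or s.red_led_stays_lit:
        return "start RMA"
    # None of the listed symptoms: keep checking drivers, cables, slots.
    return "keep troubleshooting"


if __name__ == "__main__":
    mine = Symptoms(fans_jumped_to_full_speed=True,
                    powers_on_but_not_detected=True)
    print(triage(mine))
```

Treat the result as a nudge, nothing more. If none of the symptoms match, the usual driver, cable and slot checks still apply.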
I imagine your RMA must have gone through already, but I was wondering: before sending the GPU back, did you inspect it physically? Maybe there could've been signs of a blown capacitor or something?
If possible, give us an update on what went on afterwards. Hopefully the eGPU enclosure & the M.2 connector survived the whole ordeal.
Hey all, I have a follow-up now that I've received my RMA unit back. To quickly recap, I was fortunate enough to receive an EVGA FTW3 Ultra 3090, which I connected using an ADT-Link R43SG via the M.2 slot. After about 1.5 months, and somehow while playing Cyberpunk, the GPU died. @darkjz, I did not notice any physical damage or smell anything burnt after the initial failure. However, I did not remove the cooler to inspect the PCB since I was sending the broken unit back for RMA.
After receiving the RMA unit, here are my observations:
-No damage to the R43SG; the replacement GPU posted and worked fine
-No damage to either M.2 slot; I swapped my M.2 SSD and ADT-Link connector around in the laptop and both still work fine
-The RMA FTW3 is likely a refurb: it was not shipped in a retail box, and the serial number range is still within the Taiwan factory production. It could be a broken card that EVGA refurbed.
-EVGA does stand behind their products. After I described the problem, the tech advanced me to the RMA process right away rather than wasting time going through mandatory checklists. I'm amazed I received an RMA product within 10 days. I was prepared to wait months.
My speculation is that whatever component is failing, it's an easy enough repair that the techs at EVGA's US depot can swap out the faulty part. These refurbs then become RMA units that the company keeps on hand so they don't make customers wait months for the next availability.
I've since installed a Hybrid AIO kit by EVGA and reflashed the BIOS to the Hybrid one. Currently still running strong. Fingers crossed I don't get another unit on the tails of the QC distribution....
@ishikawa_goemon, Would love to see the Hybrid cooler with the ADT-Link arrangement!
So, I have a similar setup, and the other night I noticed that my 3090 (which is connected to an external PSU like yours) was really hot to the touch, even though my laptop and TB3 cable were disconnected from the Core X Chroma. The GPU fans weren't running and the GPU lights weren't on, so at first glance everything appeared to be off. Once I noticed how hot the 3090 was, I turned off the PSU, disconnected the 3090, and gave it a sniff test. It slightly smelled of.. burning electronics? Or maybe just hot electronics? (If you know the smell, you know the smell.) I didn't get the impression that anything was broken or dead just yet, but given how hot the graphics card was.. I would imagine that if it were left like that for longer, some parts could die prematurely.
If I had to guess: when the card is powered by the Core X Chroma's own PSU and the laptop is disconnected, everything turns off, including the eGPU PSU, so no power goes to the graphics card.. But because (in our external PSU setups) the external PSU isn't shut down by the eGPU enclosure, it keeps providing some level of power to the 3090, and that low level of power (slowly) heats up the graphics card over time. And maybe the graphics card fans/thermal controls are powered by the PCI-e power line? So the thermal controls (e.g. fans, power throttling, etc.) that would normally prevent damage aren't activated? Just guessing though.
RIP your 3090. 🙁
@NessLookAlike, that's a really important observation that will likely save some other member's GPU - thank you! So the key takeaway is that when powering down your system, you also need to disconnect the eGPU power source so there is no chance of this slow heating occurring?
So good that you were observant and caught this in time, before any damage occurred to your 3090.
In my situation, I always powered off the external PSU when shutting down the laptop, so unless there is some mechanism that would still allow the GPU to pull power even with the PSU's physical power switch moved to off, I don't think it was a slow death by overheating for me.
Interestingly, on the EVGA forums, some other members who reported similar failures (albeit with their GPU in a desktop) reported smelling that burnt/overheated electronics smell.