When Elon Musk said Tesla cars are computers on wheels, he forgot to mention they run on Linux. They also do a lot of logging. According to Jason Hughes, from 057 Technology, more than they should:
“The information logged here is pretty much useless on production vehicles. Unless a developer has a specific reason for enabling it, it does the customer no good. These logs are also rarely downloaded by Tesla.”
Why are we telling you this? Because you may end up paying more than $1,800 to repair your Tesla because of a cheap eMMC flash memory chip.
UPDATE: MCU stands for Media Control Unit, not Main Control Unit, as we mentioned before. The text has been corrected. Tesla CEO Elon Musk thinks this should be much less of a problem now, but Hughes has not seen any improvements.
The reports about the problem come from three different shops in three very different places. 057 Technology is based in Hickory, North Carolina. Robert Cotran and his partner Jean-Claude Thibault work on the issue at Cotran Consulting in Candiac, Québec. Cotran kindly provided the MCU pictures for this article. Pete Gruber, from Gruber Motors in Phoenix, Arizona, had his work featured in a whole series of videos from Out of Spec Motoring, which we have covered before.
All three helped us explain the failure. More than that, they aim to warn Tesla owners that the clock is ticking for all of them. Regardless of your car, the logging will require replacing your MCU sooner or later.
Hughes told InsideEVs:
“The main issue is that this excessive log file writing causes eMMC flash wear. Flash memory is generally only rated for some tens of thousands of write cycles. What happens is that the flash memory starts to fail when writings can no longer be completed. When one block fails, parts of the firmware may also become unreadable, leading to poor operation or failure of the MCU completely.”
Ask Cotran what the problem is, and you’ll get the same answer.
“The filesystem in MCUv1 is handled on a NAND-based eMMC flash chip. Although these are solid-state and great for automotive use, there is one pretty serious drawback. Each memory bit on a flash chip can only be written to a limited number of times before data gets corrupt – and that bit can no longer reliably store a 0 or a 1.”
Gruber is even more direct in his diagnosis:
“Tesla selected a flash chip that is unable to handle the constant read/write functions. These chips have since been replaced with a more robust version.”
In case it is still unclear what happens: every Tesla has an MCU, or media control unit. Version 1, also called MCUv1, was fitted to Tesla Model S and Model X units up to 2018. When it fails, the car loses important features controlled by the touchscreen. That may make driving it dangerous, as these threads on TeslaMotorsClub.com and on Tesla's official forum show.
Among its many components is an nVidia Tegra ARM-based CPU. Tesla soldered the 8GB eMMC flash memory chip to the same board as the CPU.
When the cars first went on sale, at the beginning of the 2010s, logging was not an issue. Hughes said:
“However, since the initial release, Tesla's firmware image size has gone from about 300MB to the full 1GB maximum size.”
In other words, the firmware is now competing with logs for space on the eMMC chip. To keep log writing from wearing out any single sector, the chip relies on a mechanism called wear leveling. Cotran explains the process:
“The eMMC flash chip architecture attempts to mitigate this problem using a wear-leveling technique. It spreads out write operations over the entire chip to ensure that specific bits are not written to very often, essentially avoiding the write limitation.”
Check what Hughes has to say about this:
“The flash controller transparently and seamlessly spreads the wear across the chip utilizing unused sections of the flash memory to extend the effective number of write cycles available. With Tesla utilizing near 100 percent of the flash memory today, there is no free space left for additional wear leveling to compensate for the excessive log writing.”
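Hughes's point about free space can be illustrated with a toy model. The sketch below is not how any real eMMC controller works: it assumes, purely for illustration, that writes rotate in round-robin fashion across the blocks not occupied by firmware and other static data, and it ignores techniques such as relocating static data. The function name and all block counts are hypothetical.

```python
def simulate_wear(total_blocks, static_blocks, writes):
    """Toy model: only blocks not pinned by static data share the write load.

    Returns the highest erase count reached by any block in the free pool
    after `writes` block-sized rewrites.
    """
    free_blocks = total_blocks - static_blocks
    if free_blocks <= 0:
        raise ValueError("no free blocks left to rotate writes across")
    erase_counts = [0] * free_blocks
    for i in range(writes):
        # Round-robin stands in for "pick the least-worn free block".
        erase_counts[i % free_blocks] += 1
    return max(erase_counts)


# Same write load, different amounts of free space (hypothetical numbers).
roomy = simulate_wear(total_blocks=1000, static_blocks=300, writes=70_000)
full = simulate_wear(total_blocks=1000, static_blocks=990, writes=70_000)
print(roomy, full)  # 100 7000
```

With the same number of writes, the nearly full chip in this model wears each free block 70 times faster than the roomy one. That is the effect Hughes describes when the firmware grows to fill the flash.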
Simply put, there comes a time when the eMMC flash memory fails. Cotran told InsideEVs:
“If data is changed on the chip too often and in large quantities, wear-leveling can only do so much and at one point data starts to get corrupt. You can either lose data or core functionality can start to fail depending on where the corruption occurs.”
When that happens, Tesla just replaces the whole MCU. If your car is still under warranty, the replacement is free. If coverage has expired, you will have to pay the bill.
Hughes told InsideEVs:
“At $1800 for a replacement, getting this fixed at a Tesla service center out of warranty isn't cheap. Given the nature of the failure, it usually does take years to happen... Although likely less on cars closer to the end of production of MCUv1 – around Q2'17 up to Q1'18. They would have started life with less wear-leveling ability, to begin with, due to 100 percent flash usage.”
That would be the price for a replacement in the US. We have asked Tesla if that is correct, but the company had not replied by the time we published this article. If it does, we will update the text. Cotran described the situation in Canada:
“In Canada, that can cost up to C$4000 for parts and labor. It sometimes requires a wait if the service center is busy or doesn't have spare MCU units to swap with.”
That corresponds to a little more than $3,000.
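Hughes charges a fraction of that to repair affected MCUs. More precisely, his fee is 13.3 percent of the roughly $3,000 a replacement can cost in Canada, or about 22 percent of the $1,800 he quotes for the US.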
“I currently charge $399 for the repair service, but I need the MCU or vehicle at my shop, 057 Technology, to do so. The turnaround can take some time as we've been super busy lately with this and other projects.”
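For readers who want to check the math, here is a minimal sketch using only the figures quoted in this article. The constant and function names are ours, and the $3,000 value is the rounded US-dollar equivalent of C$4000 mentioned above, not an official price.

```python
def share(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


HUGHES_REPAIR_USD = 399    # repair price quoted by Hughes
TESLA_US_USD = 1800        # US replacement price quoted by Hughes
TESLA_CANADA_USD = 3000    # rounded conversion of C$4000, as used in the text

print(share(HUGHES_REPAIR_USD, TESLA_CANADA_USD))  # 13.3
print(share(HUGHES_REPAIR_USD, TESLA_US_USD))      # 22.2
```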
Cotran says his price depends on many variables. Either way, his “repair method costs a lot less for the consumer.” He also points out another important aspect of how Tesla currently deals with the failure.
“They are replacing many units at service centers, and unfortunately, the computer goes to waste when only a replacement chip is necessary.”
He tried to prevent the automaker from turning its MCUs into electronic trash. To no avail.
“We approached Tesla to offer our services, but they were not interested and claimed that they do not outsource work.”
Gruber charges $165/hour.
“That’s for component-level diagnosis, troubleshooting, and repairs. We have sophisticated flash memory chip removal equipment, and ball soldering equipment, and commonly reverse engineer and perform component level repairs.”
All three of them replace the flash memory chip. Cotran told InsideEVs:
“We remove the MCU from the car and dismantle it completely. Then we are able to extract unique identifying authentication keys from the eMMC even though part of it is corrupt. These keys are necessary for the car to authenticate against the Tesla network and give the user access to firmware updates and the Tesla app.”
He and Thibault replace the original Hynix eMMC chip on the board with the Swissbit chip shown on its own in the image above and in the ones below.
Hughes gives us an idea of how complicated the repair is.
“Since this flash chip is a hard-connected unit, there's just no simple way to replace them. It involves specialized tools and equipment. In my case, I pre-repair several units to have on hand for customers. That means I mainly have to try to recover the car-specific information from their unit to copy to a replacement and match the firmware version. It's tedious but doable. In general, it can take several hours of work even with the ready-to-go replacements on hand.”
Gruber follows a slightly different procedure.
“We are working on installing flash memory chip sockets since the chips fail over time, instead of replacing an entire MCU for a failed flash memory chip.”
Gruber replaces it with a more robust chip, with a larger capacity. All of them do that, in fact. Cotran said:
“Once that is done, we de-solder the defective eMMC chip from the processing board. We then have some proprietary scripts which create the filesystem layout necessary for the proper operation of the MCU computer on a brand new eMMC chip. We use industrial-grade chips, and we double the memory capacity from 8GB to 16GB in order to give it more space to perform its wear-leveling operation.”
Gruber confesses he still needs to sort out one aspect of the issue: software.
“We are still unsuccessful reprogramming the chips once we replace with more robust technology. Our MCU repairs are confined to power supply issues, non-flash chip component failures, wiring failures.”
Perhaps Cotran and Hughes can help him with that. There is a lot of work for all of them related to this eMMC failure. Gruber told InsideEVs:
“We have helped dozens of customers with this problem.”
Cotran says he has seen a dozen clients with the issue since he started repairing it.
“Thibault has come on-board for the Tesla repair side of things.”
Hughes has fixed more than a dozen Teslas with MCU failure just in September.
Does replacing the failed hardware solve the problem? Only temporarily. Cotran told InsideEVs:
“This ensures that the eMMC chip will last much longer than the original. But we also plant scripts on the computer which write logs to a RAM drive instead of to the eMMC chip.”
Hughes also decided to move the logging to RAM.
“Tesla can do like I do on my cars and move the logging to RAM to trick the system, but this penalizes performance a little since RAM is limited.”
Cotran mentioned other pros and cons:
“This has one big benefit and one big drawback. The benefit is that the logs are no longer being written to the eMMC and instead to RAM. It does not suffer from any excessive write problems. The drawback is that logs are now in volatile memory. That means that, if the computer gets powered off or rebooted, the logs are lost. We feel that the trade-off is worth it because nobody wants this to happen again.”
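The trade-off Cotran describes can be shown with a small sketch. This is not the script either shop installs, which redirects the car's logs to a RAM drive. It is a generic illustration using Python's standard `logging` module, with a hypothetical handler class and logger name, of what it means to keep a bounded amount of log data in memory only.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent log lines in RAM; nothing touches storage."""

    def __init__(self, max_lines):
        super().__init__()
        # A deque with maxlen silently drops the oldest entry when full.
        self.lines = deque(maxlen=max_lines)

    def emit(self, record):
        self.lines.append(self.format(record))


logger = logging.getLogger("mcu_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False                # keep output out of the root logger
handler = RingBufferHandler(max_lines=3)
logger.addHandler(handler)

for n in range(5):
    logger.info("event %d", n)

print(list(handler.lines))  # ['event 2', 'event 3', 'event 4']
```

The buffer never grows past three lines, so it cannot wear out flash or exhaust memory. The cost is the one Cotran names: everything in it disappears when the process or the computer restarts.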
There is no escape regarding the root cause of the issue: excessive logging. And the only one able to address that is the manufacturer. Hughes told InsideEVs:
“Tesla needs to just disable syslog on all vehicles unless specifically required on a development car or to diagnose an infotainment issue on a specific car. There are absolutely zero reasons to log hundreds of MB per day to a small built-in flash chip.”
“In my opinion, this isn't necessarily a manufacturing defect. The various Tesla software teams may not have considered that other teams were logging as much when they were coding their logging functions. However, there was definitely some kind of oversight in general that was missed in the engineering of these units. I do think Tesla should find a way to mitigate this going forward – as we have – to prevent the unnecessary replacement of these computers.”
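To see why "hundreds of MB per day" matters, consider a back-of-the-envelope endurance estimate. Every input below is an assumption chosen for illustration, not a measurement from any Tesla: the size of the pool available for wear leveling, the rated cycles (taken from the low end of Hughes's "tens of thousands"), the daily log volume, and the write amplification factor, which stands for the extra physical writes flash performs for each logical write.

```python
def days_until_worn(pool_gb, rated_cycles, log_gb_per_day, write_amplification):
    """Rough days until the blocks in the wear-leveling pool hit their rating."""
    total_writable_gb = pool_gb * rated_cycles
    physical_gb_per_day = log_gb_per_day * write_amplification
    return total_writable_gb / physical_gb_per_day


# Illustrative assumptions only: early firmware leaves a large free pool,
# late firmware leaves very little.
early = days_until_worn(pool_gb=4.0, rated_cycles=10_000,
                        log_gb_per_day=0.3, write_amplification=10)
late = days_until_worn(pool_gb=0.5, rated_cycles=10_000,
                       log_gb_per_day=0.3, write_amplification=10)

print(f"early: {early / 365:.1f} years, late: {late / 365:.1f} years")
```

The absolute numbers depend entirely on the assumed inputs. The ratio is the useful part: shrinking the free pool by a factor of eight shortens the estimated life by the same factor, and cutting the daily log volume would stretch it proportionally.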
We have asked the company what it will do to prevent the problem but got no response so far. We did not have to ask if it was aware of it. Hughes said:
“Tesla has known about this issue for years now and has done nothing to mitigate it. I've personally reported it on multiple occasions, and I know others have as well. I've noted this to Tesla on several occasions, starting in late 2015, and several times since.”
Tesla needs to act. Not only because of the older units with MCUv1 but also because its newer cars are at higher risk of MCU problems than the older ones. Hughes told InsideEVs:
“Instead of mitigating the issue, it writes even more data to the logs today than ever before. Combined with the max-size firmware images, general caching – map tiles, Autopilot info, music, etc. – this makes every MCUv1 have a high probability of failure.”
Despite having a 32GB eMMC flash chip, MCUv2 deals with much larger software. Cotran said:
“There is a lot more room for wear-leveling. Keep in mind, though, that MCUv2 does have some advanced functions – like 3D gaming and Youtube, Netflix, etc. – which does take more room on the filesystem.”
Hughes thinks likewise.
“MCUv2 and Model 3 also have an issue with excessive logging. Fortunately, they have a larger flash memory size, which should mitigate the issue for the time being. Tesla will still have to eliminate or curb this logging significantly on these if they want them to last, though.”
For Cotran, having to deal with the MCUv2 is not a matter of "if" but "when".
“I haven't done a lot of work on MCUv2 yet simply because the fleet is mainly MCUv1. Those are the units that need repair now. We do have an MCUv2 here where we have file system-level access and will be looking around in there at one point soon.”
The sad part of the story is that Tesla has been aware of the issue since 2015. And it has apparently done nothing so far to correct it, hence the many cars now being repaired. Why? Hughes has a theory.
“The cynic in me looks at this as a planned obsolescence type of thing... However, the reality is probably a lot more benign: laziness.”
A successful Hanlon’s Razor example? Only Tesla can say.
If you have one of Tesla's EVs, knowing about the issue will help you try to avoid it. If your car was made before 2018 and is still under warranty, try to verify whether your MCU is working properly. You may still get a replacement for free.
If you no longer have coverage, talk to Cotran, Hughes, or Gruber. They may be able to apply a preventive fix before your MCU needs replacing, such as redirecting the logs to RAM before the eMMC chip fails.
All other Tesla owners should ask the company to get this solved. Perhaps a single update can avoid future problems. Burying your head in the sand and calling this article FUD (Fear, Uncertainty, Doubt) will only help until “the bell tolls for thee.” According to Hughes, Cotran, and Gruber, it will toll for every current or past Tesla MCU. It is just a matter of time.