What's Wrong with Facebook 2019
By
fardhan alief
—
Wednesday, September 18, 2019
—
The key flaw that made this outage so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing far more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
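The check-and-replace behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual system: the names `get_config`, `load_from_store`, `is_valid`, and `count_store_reads` are hypothetical, and a plain dictionary stands in for the cache.

```python
from typing import Callable, Dict


def get_config(key: str,
               cache: Dict[str, str],
               load_from_store: Callable[[str], str],
               is_valid: Callable[[str], bool]) -> str:
    """Return a config value, refreshing the cache when the cached copy is invalid."""
    value = cache.get(key)
    if value is not None and is_valid(value):
        return value  # cached copy is fine, no store access needed
    # Cached copy is missing or invalid: fall back to the persistent store.
    fresh = load_from_store(key)
    cache[key] = fresh
    return fresh


def count_store_reads(cached_value: str, store_value: str, calls: int) -> int:
    """Count how often the (hypothetical) persistent store is read over `calls` lookups."""
    cache = {"limit": cached_value}
    reads = 0

    def load(key: str) -> str:
        nonlocal reads
        reads += 1
        return store_value

    for _ in range(calls):
        # Assumption for the demo: a value is "valid" if it is all digits.
        get_config("limit", cache, load, str.isdigit)
    return reads
```

With a bad cached value and a good stored value, `count_store_reads("oops", "100", 5)` reads the store once and then serves from the cache. With a bad value in the store itself, `count_store_reads("oops", "oops", 5)` reads the store on every one of the five lookups, which is the case the system did not handle.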
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every client saw the invalid value and tried to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error trying to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
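A toy model can make this loop concrete. The sketch below is an assumption-laden simplification, not a reconstruction of the real system: it assumes a single cache key shared by all clients, a database that can serve a fixed number of queries per tick, and arbitrary numbers. The function name `simulate` and its parameters are hypothetical.

```python
from typing import List


def simulate(clients: int, db_capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued in each tick."""
    key_cached = False  # the shared key starts out missing
    history = []
    for _ in range(ticks):
        # Every client that finds the key missing queries the database.
        queries = 0 if key_cached else clients
        history.append(queries)
        served = min(queries, db_capacity)
        failed = queries - served
        if served > 0:
            key_cached = True   # a successful query repopulates the cache
        if failed > 0 and delete_on_error:
            key_cached = False  # an errored client deletes the key again
    return history
```

Under these assumptions, `simulate(1000, 100, 4, True)` gives `[1000, 1000, 1000, 1000]`: the load never drops, because each tick some queries fail and the failures wipe the key. With the same numbers but `delete_on_error=False`, the result is `[1000, 0, 0, 0]`, since a single successful query is enough to end the storm.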
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we've turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
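The post does not say what those new designs look like. As one commonly used pattern, and purely as an illustration, a client can treat a query error differently from an invalid value, keep its cached copy when the store cannot answer, and space out retries with jittered exponential backoff. All names below (`StoreUnavailable`, `backoff_delay`, `refresh`) are hypothetical.

```python
import random
from typing import Callable, Dict


class StoreUnavailable(Exception):
    """Hypothetical error raised when the persistent store cannot answer."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Random delay in seconds, between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def refresh(key: str,
            cache: Dict[str, str],
            load_from_store: Callable[[str], str],
            is_valid: Callable[[str], bool]) -> bool:
    """Try to refresh one key; return True only if the cache was updated."""
    try:
        fresh = load_from_store(key)
    except StoreUnavailable:
        # An error is not an invalid value: keep the cached copy untouched.
        return False
    if not is_valid(fresh):
        # The stored value itself is bad: do not overwrite the cache with it.
        return False
    cache[key] = fresh
    return True
```

A caller would wait `backoff_delay(attempt)` seconds before each retry after a failed `refresh`, so that many clients hitting the same failure spread their retries out instead of all querying again at once.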
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.