Facebook And Related Sites Go Down, Chaos Ensues
Facebook, along with WhatsApp, Instagram, and Facebook Messenger, has gone offline, and it appears to be more than just a glitch.
Facebook and many apps in its suite of social media and chat services went dark for hours Monday in a widespread outage that appeared to affect users globally.
Facebook, Instagram, WhatsApp and Messenger were unreachable for many users, who instead saw a spinning wheel on their apps that never loaded.
Facebook’s internal communication platform, Workplace, went down altogether, said a person familiar with the matter who spoke on condition of anonymity because they weren’t authorized to speak publicly. As employees turned to third-party tools such as Slack, many found themselves locked out of even those, because Facebook’s mechanism for signing on to them was not working, said another person familiar with the matter who spoke under the same conditions.
Facebook spokesman Andy Stone tweeted that the company was aware of the issues and was “working to get things back to normal as quickly as possible, and we apologize for any inconvenience.”
Reports on Downdetector suggest users across the United States, in Egypt, in Serbia and many other places were impacted. The issues began at about 11:39 a.m. Eastern time.
“Something happened internally at Facebook that messed with their network settings on how Facebook talks to the rest of the world and accesses the Internet,” said Courtney Nash, senior research analyst at security company Verica.
The issue seems to be with Facebook’s border gateway protocol routes, or paths that allow routers to exchange information, said Doug Madory, director of Internet analysis for Kentik, a network monitoring company. Madory calls them the “underpinnings of how the Internet operates.”
Brian Krebs offers a preliminary explainer:
Doug Madory is director of internet analysis at Kentik, a San Francisco-based network monitoring company. Madory said at approximately 11:39 a.m. ET today (15:39 UTC), someone at Facebook caused an update to be made to the company’s Border Gateway Protocol (BGP) records. BGP is a mechanism by which Internet service providers of the world share information about which providers are responsible for routing Internet traffic to which specific groups of Internet addresses.
In simpler terms, sometime this morning Facebook took away the map telling the world’s computers how to find its various online properties. As a result, when one types Facebook.com into a web browser, the browser has no idea where to find Facebook.com, and so returns an error page.
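To make the "took away the map" picture concrete, here is a toy sketch in Python. It is not how BGP or DNS is actually implemented: the `RoutingTable` class, the `resolve` helper, the `example-social.test` name, and the addresses (taken from the reserved documentation ranges) are all made up for illustration. It only shows the general idea that once the route to a name server is withdrawn, lookups that depend on that server stop getting answers, even though the web servers themselves are still listed.

```python
import ipaddress


class RoutingTable:
    """Toy table of announced prefixes (illustrative, not real BGP)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


# Hypothetical values from documentation ranges, not Facebook's real ones.
NAMESERVER_IP = "192.0.2.53"
ZONE = {"example-social.test": "198.51.100.10"}


def resolve(name, table):
    """Answer a lookup only if the name server itself is reachable."""
    if table.lookup(NAMESERVER_IP) is None:
        return None  # no route to the name server, so no answer
    return ZONE.get(name)


table = RoutingTable()
table.announce("192.0.2.0/24", "provider-A")     # prefix holding the name server
table.announce("198.51.100.0/24", "provider-A")  # prefix holding the web servers

before = resolve("example-social.test", table)
table.withdraw("192.0.2.0/24")                   # the "map" is taken away
after = resolve("example-social.test", table)
print(before, after)
```

In this sketch the web-server prefix is never withdrawn, yet the name still stops resolving, which is roughly the failure users saw as an error page.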
In addition to stranding billions of users, the Facebook outage also has stranded its employees from communicating with one another using their internal Facebook tools. That’s because Facebook’s email and tools are all managed in house and via the same domains that are now stranded.
“Not only are Facebook’s services and apps down for the public, its internal tools and communications platforms, including Workplace, are out as well,” New York Times tech reporter Ryan Mac tweeted. “No one can do any work. Several people I’ve talked to said this is the equivalent of a ‘snow day’ at the company.”
The mass outage comes just hours after CBS’s 60 Minutes aired a much-anticipated interview with Frances Haugen, the Facebook whistleblower who recently leaked a number of internal Facebook investigations showing the company knew its products were causing mass harm, and that it prioritized profits over taking bolder steps to curtail abuse on its platform — including disinformation and hate speech.
We don’t know how or why the outages persist at Facebook and its other properties, but the changes had to have come from inside the company, as Facebook manages those records internally. Whether the changes were made maliciously or by accident is anyone’s guess at this point.
Obviously, this is a developing story, and probably one that will have some twists and turns as information comes out. Until then, second look at MySpace?
Come now. Haven’t we all inadvertently crushed the corporate network at least once?
I definitely did *not* click that link.
I reply’d all. How often do you get the chance?
Oh man, FB hit with ransomware – that would be funny.
I’d be surprised if that wasn’t the case.
I once attended an 8:00am Monday morning emergency meeting called “Do not mount the user folder to /tmp”. This is because, of course, every Sunday night a garbage collection routine would empty out /tmp, as it’s just there to hold temporary files, and if it wasn’t cleaned regularly it’d bloat up.
Someone (who wasn’t at the meeting because he was in the middle of a flight overseas, but who was going to arrive to a TON of angry emails) had, in fact, mounted /user to /tmp and then didn’t dismount it. And then hopped a flight to Europe for a work trip.
So around 2:00am, the garbage collector cleaned out /tmp. And also /user, the entire user partition with all the useful data.
I wasn’t thrilled to be there at 8:00am for a problem I had nothing to do with, but it was called by the angry people who had gotten woken up at 2:00am as numerous critical processes started failing and automated “OH CRAP” alerts had fired off.
It took them about 12 hours to get everything back up.
Better to be there at 8 AM for a problem someone else caused than not to be there at 8 AM for a problem you caused.
I bet that garbage collector routine now looks for /user and dismounts it before going to work.
Well. I guess we’ll have to go back to using MySpace, Friendster, GeoCities, or even (horrors!) blogs!
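For the curious, one way a cleanup job could protect itself from the mounted-folder story above is to refuse to descend into anything that is a mount point. The sketch below is a guess at the general shape, not the routine from the story: `clean_tmp` and its `is_mount` parameter are hypothetical names, and the demo fakes the mount check so it can run in a throwaway directory without touching a real /tmp.

```python
import os
import tempfile


def clean_tmp(root, is_mount=os.path.ismount):
    """Delete files under root, never descending into mount points."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Prune mounted directories in place so os.walk skips them entirely.
        dirnames[:] = [d for d in dirnames
                       if not is_mount(os.path.join(dirpath, d))]
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.remove(path)
            removed.append(path)
    return removed


# Demo in a throwaway directory; "user" stands in for the mounted partition.
with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "scratch.txt")
    mounted = os.path.join(tmp, "user")
    os.mkdir(mounted)
    precious = os.path.join(mounted, "thesis.txt")
    for p in (scratch, precious):
        with open(p, "w") as f:
            f.write("data")

    # Pretend the "user" directory is a mount point (an assumption for the demo).
    removed = clean_tmp(tmp, is_mount=lambda p: p == mounted)
    scratch_gone = not os.path.exists(scratch)
    precious_kept = os.path.exists(precious)
```

With the default `os.path.ismount` check, the same function would skip a real partition mounted under the directory being cleaned.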
Former vice-presidential candidate points out that this looks exactly like an op:
Current nutbar posts unsubstantiated conspiracy theory without evidence to twitter because it is a bigger soapbox than the Kinko’s copier at 3:00 a.m. Wishthinker reposts to just ask questions. In other news, Franco is still dead.
Yes, FB has had a lot of bad press, and the whistleblower interview last night was horrible for them, but the idea that someone did something idiotic engineering-wise is still a better explanation for the shutdown.
My theory is that they got hacked. *BAD*.
The “something idiotic engineering-wise” involved ignoring security best practices, and they got hit by somebody or a group of somebodies like DarkSide.
Brian Krebs makes the most sensible argument. Someone made a silly mistake; even engineers are capable of this. The timing of the event is letting people fly their rejected Shadowrun games in the open.
Shadowrun? Where the hell did that come from?
Wait, have you been playing a Shadowrun game? Without telling us?
The simplest answer to what happened is that an engineer made a silly mistake and it took everything down for a few hours. The timing of the event is just a damned coincidence. It is not the sign of some cyberpunk dystopia adventure. A good chunk of humanity seems to hate the idea of coincidence though and is letting their freak flags fly. It isn’t harmless fun; it is conspiratorial nonsense. Spike Cohen has no authority just because he was a former candidate for VEEP.
I’m with Jaybird on this, Saul. I had no idea until this moment that you even knew what Shadowrun was.
And I think it’s awesome that you do, by the way. Also, that was an excellent use of it in conversation.
“The timing of the event is letting people fly their rejected Shadowrun games in the open.”
Like the troll with tailored pheromones, strength mods, skeletal mods, a combat computer, and a truly ridiculous amount of bio-ware due to blatant rules abuse?
He was a truly loved and incredibly likeable chap, mostly due to the pheromones, and quite capable of turning you to a fine chunky mist. And definitely not allowed to use him in campaigns because of “rules” and “balance” and “You can’t go around using a crew mounted weapon as a hand-gun when you can’t solve problems by mind-whamming people with your pheromones”…
Poor Hugbear. Strangled by the DM before he really got to fly free.
Maybe it’s just the tailored pheromones but I love, love, LOVE this comment! RIP Hugbear!
It was more about the inability of people to take coincidence as a thing and let their freak flags fly.
Pity it wasn’t Twitter, or better yet both.
The thing is that it wasn’t just FB, it was Instagram and WhatsApp as well. WhatsApp is an actually very useful communication tool used by billions of people across the world to maintain contact with friends and family at home or abroad. My partner uses it to call her family and friends in Singapore. A lot of small businesses depend on Instagram for sales and advertising.
*nods* Yes, it’s serious and troubling for those businesses and people and I feel bad for them.
… … …
Pity it wasn’t Twitter, or better yet both.
Count me among those who think it is utterly unrelated to whistleblower stuff, just a wild coincidence. I have read the “misconfigured BGP records” explanation elsewhere, and I believe it. The added tidbit was that it was definitely FB’s records misconfigured.
I do not know if this is the official explanation, but it is a good one and fits the “dumber than I imagined” requirement:
If a mouse deletes all of his cookies …
Good grief.
For context, BGP is the public internet routing protocol. Its purpose is so that separate companies and internet providers can locate their respective networks. It was never meant to route traffic within an organization.
In other words, if you use Xfinity and want to send internet packets to someone connected to AT&T, Xfinity will use BGP to figure out how to route the packets over the public internet. By contrast, if sending a packet to another Xfinity customer, or an Xfinity server, then no BGP is needed.
If FB managed to lock out their internal systems because of a BGP failure, oof.
I’m actually unfamiliar with how BGP interacts with DNS. I wonder if that is a newish feature.
Here is a post from Cloudflare describing how the outage affected them (to begin with, it made them wonder if their DNS server was broken). https://blog.cloudflare.com/october-2021-facebook-outage/
Theory: The C++ program that creates and maintains the BGP routing tables usually gets killed via SIGTERM, but someone added a signal handler to let it exit normally, not realizing what would happen when all the destructors ran.
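The mechanism behind that theory can be shown in miniature. The sketch below uses Python rather than C++, so `atexit` cleanup stands in for destructors, and nothing here is meant to describe Facebook's actual software. It assumes a POSIX system. The child script and its "routes withdrawn" message are invented for the demonstration: without a handler, SIGTERM kills the process before any cleanup runs; with a handler that exits normally, the cleanup code suddenly gets its turn.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical child program; it sends SIGTERM to itself to simulate the kill.
CHILD = textwrap.dedent("""
    import atexit, os, signal, sys, time

    # Stand-in for the destructors in the theory: cleanup that wipes state.
    atexit.register(lambda: print("cleanup ran: routes withdrawn"))

    if sys.argv[1] == "graceful":
        # The 'helpful' change: turn SIGTERM into a normal interpreter exit.
        signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))

    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(5)  # fallback wait; the signal normally ends the script first
""")


def run_child(mode):
    """Run the child script and return (return code, captured stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode, result.stdout.strip()


abrupt = run_child("abrupt")      # default SIGTERM action, no cleanup
graceful = run_child("graceful")  # handler installed, cleanup runs
print(abrupt, graceful)
```

On a POSIX system the first run should report a negative return code (killed by the signal) with no output, while the second exits cleanly after printing the cleanup message.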
I’m coding in Python now. You’re making me nostalgic…
I have a love-hate relationship with Python. I’ve written several small things with it, and I get something reasonable up and running as quickly as, and usually more quickly than, any language I’ve used. Then I try something more complicated and something bites me: scoping, some bizarre library interface, something.
The fact that Python doesn’t catch most typos until it stumbles over them makes it highly unproductive for me.
I include that under the broad topic of scoping.
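A small illustration of the typo complaint, with a made-up `report` function: the misspelled name sits in a branch that is rarely taken, so the function works fine until someone finally exercises that branch. Static checkers such as pyflakes can flag this kind of undefined name before the code runs, but the interpreter on its own will not.

```python
def report(total_count, verbose=False):
    """Hypothetical function with a typo hidden in a rarely used branch."""
    if verbose:
        # Deliberate typo: 'totl_count' is never defined anywhere.
        return f"total: {totl_count}"
    return "ok"


quiet = report(3)  # works, because the typo'd line is never executed

try:
    report(3, verbose=True)  # only now does Python stumble over the typo
    error = None
except NameError as exc:
    error = type(exc).__name__

print(quiet, error)
```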
I heard that the Post-it with the password to their server was inadvertently thrown out.
I feel like this is a “never ascribe to malice that which can be explained by stupidity” moment. The conspiracies were fun for a while but it looks like someone just did something really boneheaded.
But yeah, this raises questions about the integration of everything and how few platforms run it. Can you *imagine* if something like smart-home tech went down for 8 hours and no one could adjust their thermostats or turn on lights?
One of the many many reasons I have refused to be part of the Internet of Things. The very slight bump in convenience does not outweigh the huge vulnerabilities.
Yeah, there’s no way in heck I’d ever put most “stuff” on the internet. Fish that.