Published on April 11, 2025 By Frogboy In GalCiv IV Dev Journals

You might’ve noticed we had a serious outage that took down our website, forums—everything.

Some people have speculated: what could be so catastrophic as to take us down for WEEKS?

For legal reasons, I can't get into the details other than to say it wasn't ransomware but it was a total data loss. This catastrophe wiped out everything at our data center, including the on-site backups, so we lost over three decades of data in one hit.

Fortunately, we run nightly offsite backups, but they’re enormous—about 34 terabytes. That’s 34,000 gigabytes, all of which has to be downloaded, scanned, extracted, and then reuploaded to new servers. Just the download alone took over a week. Then came the challenge of figuring out which parts needed immediate restoring, in which order, and whether we should rebuild them piece by piece, create entirely new services, or move to a cloud-based infrastructure to avoid having a single colocation ever again.
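To put that transfer in perspective, here's a rough back-of-the-envelope estimate; the 500 Mbps sustained throughput is purely an illustrative assumption, not a disclosed figure:

```python
# Back-of-the-envelope: how long does 34 TB take to download?
# The 500 Mbps sustained throughput is an illustrative assumption.
size_bits = 34 * 8 * 10**12          # 34 TB in bits (decimal units)
rate_bps = 500 * 10**6               # assumed 500 Mbps sustained
days = size_bits / rate_bps / 86400
print(f"~{days:.1f} days")           # ~6.3 days, before scanning/extracting
```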

We’re talking about a giant library of websites, databases, skins, themes, icons, wallpapers, videos, and more. Some of it’s ancient, from when we started well before Google or Facebook existed. Imagine sifting through tens of thousands of gigabytes to find a single legacy web service, built decades ago, that needs to run on a specific OS. It’s a painstaking process.

This outage has been extremely difficult. Everything from old box art for our products to OS/2 programs I wrote in my college days—gone, at least until offsite backups did their job. We had fallback backups on hard drives, DVDs, tapes, and so on, but for a while, it wasn’t entirely clear how much of that would be usable.

A fun fact some may not know: we have one of the oldest continuously used forums around, migrated from Usenet eons ago. That entire environment was wiped, so we’re rebuilding it from offsite storage. Not everything is back yet, and it looks like a few forum user accounts will be lost. That’s not related to customer data, but still worth noting.

We appreciate everyone’s patience. Getting services running again has been the top priority. It’s been a monumental effort, but we’re seeing real progress each day, and the community’s understanding means a lot.

Thanks for sticking with us,

-Brad (Founder & CEO)


Comments
on May 04, 2025

Yeah, there are always "issues". 
Some common backup failures I've seen are places that only keep one or two copies, or keep reusing the same tapes over and over, not realizing those things are only reliable for a couple of reuses, and never seem to test the backups.
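For the non-techies: "test the backups" means actually restoring them somewhere and comparing against the live data. A minimal sketch of that idea; the paths and layout here are hypothetical:

```python
# Minimal backup verification sketch: restore to a scratch directory
# and compare checksums against the live data. Paths are hypothetical.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ after restore."""
    bad = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.exists() or sha256(src) != sha256(restored):
            bad.append(str(rel))
    return bad

# mismatches = verify(Path("/data/live"), Path("/scratch/restore-test"))
```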
The worst situation I ran into was 9/11.  A good 2/3rds of our support people just didn't come in for the week after that, and those of us who did had a really weird week. 
Turns out a LOT of companies had their servers in those two buildings, and their IT guys, the ones who came in like we did, were trying to rebuild their companies' entire networks. Everybody on both sides was rather subdued, and we didn't get any of the usual "consultants" or secretaries calling.
Yes, plenty of "bosses" will screw stuff up and then have the secretary try to work with tech support to fix it, all the while demanding unreasonable conditions or they'll be fired. Most common is being unreachable while never having given the secretary the password to the laptop, the key to the server room, or whatever else they need as a minimum to do anything with it. To make matters worse, most of those secretaries know nothing about computers. Being able to use the suite of office tools is great for their job, but if you need file properties checked, access rights verified, or worse yet, someone to go into a partition table or the like, they are NOT skilled in that, nor is it needed for their job; that's what IT is for. After all, you don't expect the bus driver to be able to overhaul the engine of the bus, and threatening them with "do it or else" does not help anything.

Ok, I've babbled a wall of text more than enough. Glad people have been reading, that other techies can sympathize/empathize, and that the non-techies can get an inkling of the invisible work that goes on every day, where IT fixes things without others knowing and generally keeps things running smoothly, only getting noticed when something too big or too bizarre not to disrupt the rest of the company happens unexpectedly.

Oh, as to the various malwares that encrypt stuff so they can demand ransoms: yeah, they're getting better at being aholes, but there are some really good devs on our side. If it was ever intended to be decryptable, there's a key somewhere, and the devs can eventually figure out how to get their digital hands on it to save the data. But with each new one, it's a race to find it before somebody gives up and pays off the scum in the hope they'll actually hand over the decryption key. (I hear it's about 50/50 if you pay them off.) As to their trashing the backups first, that's a big reason keeping older backups around is a good idea. Sure, those backup services or tapes aren't exactly cheap, and neither is losing six months of data or whatever, but it's cheaper than losing all your data.
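That "keep older backups around" advice is usually formalized as a rotation scheme. Here's a sketch of grandfather-father-son style retention; the exact counts and cutoffs are illustrative assumptions, not anything specific to this thread:

```python
# Sketch of grandfather-father-son retention: decide which dated
# backups to keep. The windows (7 daily, 4 weekly, 6 monthly) are
# illustrative assumptions.
from datetime import date, timedelta

def keep(backup_dates: list[date], today: date) -> set[date]:
    daily = {d for d in backup_dates if (today - d).days < 7}
    weekly = {d for d in backup_dates
              if d.weekday() == 6 and (today - d).days < 28}   # Sundays
    monthly = {d for d in backup_dates
               if d.day == 1 and (today - d).days < 183}       # 1st of month
    return daily | weekly | monthly

dates = [date(2025, 5, 5) - timedelta(days=i) for i in range(200)]
print(sorted(keep(dates, date(2025, 5, 5))))
```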

Sorry, slightly more gab. As to the public impression that the creeps making the malware are talented, it's completely false. Destroying things is easy; just ask a car mechanic how mechanically talented a vandal with a sledgehammer is. Most of the malware producers are just script kiddies or slightly better. Some aren't even that, as there are "tools" to make your own malware, where you select options and it compiles it for you. Most of the malware out there is also just a modified copy of another one. Unfortunately, there is a small handful of skilled programmers who write malware; some of those are simply researchers whose malware got loose one way or another, but some have malicious intent. Those are the ones the antivirus/antimalware people hate and have difficulty with. We still win in the end; it's just a question of how much time and effort it takes before we do. I know one dev who can usually crack in an hour or two something that probably took the malware creator at least six months to write. One piece by a skilled malware writer, something that changed the malware paradigm again, apparently took around 4-6 years to build, but our dev cracked it in a month and a half. Of course, there are hundreds of people working to defeat malware, but there are thousands upon thousands of people writing that trash to start with.
If you're curious: yes, the company where I worked fighting malware, among other tasks, had an "antivirus lab". We would test ways to prevent and defeat malware in that lab. One thing outsiders don't seem to understand is that it was an electronic black hole. If anything that could store data went in, no matter what it was or whose it was, it stayed in there forever. More than a few people were upset when they didn't pay attention and lost their thumb drives and other devices, even laptops. Though admittedly, the lab was happy to get more device "donations".

Ok. I'm shutting up now.

on May 05, 2025

All this made me wonder, so I Googled it.

Google has an estimated storage capacity in the range of 10-15 exabytes, which is equivalent to 10 to 15 million terabytes.
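For reference, that conversion is straight decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes); a quick sanity check:

```python
# 1 exabyte = 10**18 bytes; 1 terabyte = 10**12 bytes (decimal units)
tb_per_eb = 10**18 // 10**12           # 1,000,000 TB per EB
print(10 * tb_per_eb, 15 * tb_per_eb)  # 10-15 million TB, as quoted
```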

I was sad to see the sites go down.


on May 05, 2025


"Google has an estimated storage capacity in the range of 10-15 exabytes, which is equivalent to 10 to 15 million terabytes."

I think they have more than one data center going, though. In fact, they have regional data centers and clouds. https://datacenters.google/

Otherwise, trying to restore anything that size would be an incredible undertaking.
