Published on April 11, 2025 by Frogboy in GalCiv IV Dev Journals

You might’ve noticed we had a serious outage that took down our website, forums—everything.

Some people have speculated: what could possibly be so catastrophic as to take us down for WEEKS?

For legal reasons, I can't get into the details other than to say it wasn't ransomware but it was a total data loss. This catastrophe wiped out everything at our data center, including the on-site backups, so we lost over three decades of data in one hit.

Fortunately, we run nightly offsite backups, but they’re enormous—about 34 terabytes. That’s 34,000 gigabytes, all of which has to be downloaded, scanned, extracted, and then reuploaded to new servers. Just the download alone took over a week. Then came the challenge of figuring out which parts needed immediate restoring, in which order, and whether we should rebuild them piece by piece, create entirely new services, or move to a cloud-based infrastructure to avoid having a single colocation ever again.
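For a rough sense of scale, here's a back-of-the-envelope sketch in Python (the link speeds are illustrative assumptions, not our actual connection):

    # Rough time to move 34 TB at various sustained link speeds.
    TOTAL_BITS = 34 * 10**12 * 8  # 34 TB (decimal) in bits

    for mbps in (100, 300, 1_000, 10_000):
        seconds = TOTAL_BITS / (mbps * 10**6)
        print(f"{mbps:>6} Mbps: {seconds / 86_400:5.1f} days")

Even at a sustained gigabit, that's about three days of nothing but downloading, before any scanning, extracting, or reuploading.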

We’re talking about a giant library of websites, databases, skins, themes, icons, wallpapers, videos, and more. Some of it’s ancient, from when we started well before Google or Facebook existed. Imagine sifting through tens of thousands of gigabytes to find a single legacy web service, built decades ago, that needs to run on a specific OS. It’s a painstaking process.

This outage has been extremely difficult. Everything from old box art for our products to OS/2 programs I wrote in my college days—gone, at least until offsite backups did their job. We had fallback backups on hard drives, DVDs, tapes, and so on, but for a while, it wasn’t entirely clear how much of that would be usable.

A fun fact some may not know: we have one of the oldest continuously used forums around, migrated from Usenet eons ago. That entire environment was wiped, so we’re rebuilding it from offsite storage. Not everything is back yet, and it looks like a few forum user accounts will be lost. That’s not related to customer data, but still worth noting.

We appreciate everyone’s patience. Getting services running again has been the top priority. It’s been a monumental effort, but we’re seeing real progress each day, and the community’s understanding means a lot.

Thanks for sticking with us,

-Brad (Founder & CEO)


Comments (Page 2)
on Apr 21, 2025

Sadly, that's not the whole box; just the top of it has survived, propped up on a bookshelf all this time...

on Apr 21, 2025

Knobula

Sadly, that's not the whole box; just the top of it has survived, propped up on a bookshelf all this time...

Not only do I have a complete OS/2 in the box... I also have a boxed WindowBlinds ... and a GalCiv ...

...and XTree Gold too ...

on Apr 21, 2025

twhiting9275


Quoting Frogboy,

If anyone wants to do the math on how long it takes to download 34,000 gigabytes at various speeds.....



If you're "downloading" this, you're doing it wrong

This only goes to show how backwards the mindset of SD is. While customers cannot access software because of pathetic choices, SD tries to make themselves look like the 'good guy'. You're not. You're not the victim. You are the big corporation that refused to properly back stuff up, and got taken down because of it.

 

Ah, a small person living a small life with a small brain hating on everyone who's better than him.  Which is probably everyone.  

on Apr 21, 2025

Stardock, I can imagine exactly what you are going through, and while I don't envy you, I do sympathize. A massive challenge, and you seem to be doing great so far.

It's a horrible world out there, with some vile people doing vile things, and while we don't know exactly what happened, we can imagine it was pretty nasty. Keep up the good work and we'll remain patient!

on Apr 21, 2025

Wait, is the OS/2 *running* in a box?

on Apr 21, 2025

Knobula

Wait, is the OS/2 *running* in a box?

No, its 'box' is the retail packaging. The last time it ran anywhere was quite some time ago.

A bit like having DOS 6.22 on 5 1/4" floppies ... in a box ... [I have them too]

on Apr 22, 2025

Now that Stardock has recovered from their total system outage, I decided to see if I could still sign in -- and I could -- even after not signing in for over 15 years!

So, with Stardock dredging up its history, I have a question. Which version of Galactic Civilizations started with a blonde female newscaster whose hair would change to black, depending upon your playstyle?

on Apr 22, 2025

Backups are good.  Testing restore procedures is even better...
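In that spirit, a minimal restore-test sketch in Python (the archive path and the "sha256  relative/path" manifest format are hypothetical): extract the backup into a scratch directory and verify checksums against a stored manifest.

    # Restore test: extract a backup archive into a scratch directory and
    # verify every file's SHA-256 against a stored manifest.
    import hashlib, tarfile, tempfile
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def test_restore(archive: Path, manifest: Path) -> None:
        with tempfile.TemporaryDirectory() as scratch:
            with tarfile.open(archive) as tar:
                tar.extractall(scratch)  # archive is assumed trusted
            for line in manifest.read_text().splitlines():
                digest, rel = line.split(maxsplit=1)
                assert sha256(Path(scratch) / rel) == digest, f"mismatch: {rel}"
        print("restore test passed")

    test_restore(Path("nightly.tar.gz"), Path("nightly.manifest"))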

Glad to hear that things are on the uptick, but man, that sucked.  I'm dying to know what happened.

on Apr 22, 2025

RAUBRY

Which version of Galactic Civilizations started with a blonde female newscaster whose hair would change to black, depending upon your playstyle?

I remember that. I think it was the OS/2 version. IIRC, when you got really evil she also got an eye-patch.

on Apr 22, 2025

Publius of NV


Quoting RAUBRY,

Which version of Galactic Civilizations started with a blonde female newscaster whose hair would change to black, depending upon your playstyle?



I remember that. I think it was the OS/2 version. IIRC, when you got really evil she also got an eye-patch.

That would be a feature to bring back. Maybe tie it to your warmonger rating, since Four doesn't really have good vs. evil.

on Apr 23, 2025

Did a lot of tech support on various levels, and this is a perfect example of the first three rules of computing. 

Rule 1: Backup
Rule 2: BACKUP
Rule 3: See Rules 1 & 2

Additional advice: have multiple backups, including something offsite. You never know what will happen, and things can happen to entire buildings or more...
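As a minimal sketch of the offsite piece (assuming rsync over SSH; the host and paths are made up):

    # Nightly offsite push: mirror the local backup directory to a remote
    # host over SSH. Host and paths are hypothetical.
    import subprocess

    subprocess.run(
        ["rsync", "-a", "--partial", "/backups/",
         "backup@offsite.example.com:/srv/backups/"],
        check=True,
    )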

Now for the stupid story from when I worked tech support for a particular company:

Took the next call, and the caller informed me that live update wasn't working. I mentioned that it had been working that morning and there hadn't been any notices of an outage yet, but if he could give me a moment, I'd check. Sure enough, it was out, so I informed him.
He asked if I knew what was wrong, to which I replied, "No, I haven't even gotten internal notice of it yet. For all I know right now, the servers are flooded, but as they're on the second floor of the building next to me, I really don't think that's likely."
Sent off the internal queries/notices, and about an hour later we were informed that a water pipe on the 3rd floor had broken and flooded the servers out...
Ummm... Talk about an unexpected, mind-blowing incident.

on Apr 28, 2025

I feel your pain

I've had to restore from backups, and it's generally horribly painful. I'm already doing restores from disk, and our existing 13 TB database restore takes... way too long. We even have to give reports on annual tests of how long the restore actually takes.

If it makes you feel better, my friend used to work at IBM, and while he was there, a telco company lost the ENTIRE CUSTOMER DATABASE. All of it. It took them like an entire month to rebuild that thing from bubble gum and duct tape, with a dash of hopes and dreams. I'm pretty sure they sacrificed some goats halfway through. They were losing a LOT of money every single day that database was down.

on Apr 28, 2025

barasawa9144

Did a lot of tech support on various levels, and this is a perfect example of the first three rules of computing. 

Rule 1: Backup
Rule 2: BACKUP
Rule 3: See Rules 1 & 2

It's kind of getting better these days; before, it was very, very hard to justify the very, very expensive backup costs. Especially now that data is exploding, backup infrastructure is getting more insane to even maintain. It becomes a cost analysis of "how badly do you want your data back, and under what circumstances?" It's sort of funny: when we were looking at colo sites, our old CIO looked me dead in the eyes and asked, "So what if someone takes an RPG to the back of this datacenter?" I was thinking, "I'm getting the hell out of here, because I don't get paid enough to handle a guy with an RPG." Then, like two years later, it wasn't a guy with an RPG; it was an electrician who took out both of our redundant, independent power company lines, the backup diesel generators, and then the battery backup, and then power-cycled the power company lines over and over until the datacenter looked like it was hosting a rave. I would have preferred the guy with the RPG...

But anyway, it's more about how much you want to pay for the super ultra edge cases. People really don't want to pay for something with a very low probability. Well, it's low probability until it's not. You can do everything right, up until the electrician starts a rave in your datacenter and shorts everything.

One thing we're seeing now is that ransomware's first target isn't actually the data, but the backups. Attackers hit the backup infrastructure first and lock it out, then lock the real data. In essence, they encrypt or corrupt the backups first, then ransomware the production data, so you can't restore from backup once you find out production is locked. As a result, we're getting a lot more requirements for immutable backups. It's sort of interesting to see the backup guys getting way more budget and clout than they're used to.
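To make that concrete, here's a minimal sketch of an immutable backup write using S3 Object Lock via boto3 (the bucket, key, and retention window are hypothetical, and the bucket has to be created with Object Lock enabled):

    # Upload a backup with S3 Object Lock in COMPLIANCE mode, so the object
    # cannot be deleted or overwritten until the retention date passes.
    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client("s3")
    retain_until = datetime.now(timezone.utc) + timedelta(days=30)

    with open("nightly.tar.gz", "rb") as body:
        s3.put_object(
            Bucket="example-immutable-backups",  # hypothetical bucket
            Key="backups/2025-04-28.tar.gz",     # hypothetical key
            Body=body,
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=retain_until,
        )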

on May 01, 2025

barasawa9144

Did a lot of tech support on various levels, and this is a perfect example of the first three rules of computing. 

Rule 1: Backup
Rule 2: BACKUP
Rule 3: See Rules 1 & 2

Additional advice: have multiple backups, including something offsite. You never know what will happen, and things can happen to entire buildings or more...

Now for the stupid story from when I worked tech support for a particular company:

Took the next call, and the caller informed me that live update wasn't working. I mentioned that it had been working that morning and there hadn't been any notices of an outage yet, but if he could give me a moment, I'd check. Sure enough, it was out, so I informed him.
He asked if I knew what was wrong, to which I replied, "No, I haven't even gotten internal notice of it yet. For all I know right now, the servers are flooded, but as they're on the second floor of the building next to me, I really don't think that's likely."
Sent off the internal queries/notices, and about an hour later we were informed that a water pipe on the 3rd floor had broken and flooded the servers out...
Ummm... Talk about an unexpected, mind-blowing incident.

 

Yep.  We had backups on site.  We had backups off site.  It was restoring from the offsite backups that took so long.  Just so big. 34TB is no joke.

on May 04, 2025

Frogboy
Yep.  We had backups on site.  We had backups off site.  It was restoring from the offsite backups that took so long.  Just so big. 34TB is no joke.

Most of our offsite backups are in Iron Mountain. I think the last time anyone actually, truly needed something from there, it took like 2-3 days just to find the damn tape to restore from.
