Why Servers Crashing is a Good Thing

If you've been playing (or trying to play) AirMech over the past week and weekend, you've probably noticed we've had a lot of server downtime and restarts. As a gamer, nothing upsets me more than when that happens with a game I want to play. So I totally understand why people are mad or frustrated with the game lately. But I want to take a minute to explain why this is actually a good thing.

As a super small development team, we built the backend systems for AirMech "as needed". Our programmers have experience working on everything from small projects to massive MMOs, and the plan from the start was to build things in a way that gets them up and working, even if we know they might fail as they scale up. In fact, everything fails as it scales up, but the grander your plans, the longer they take to build, and as devs we've had the experience of building grand things that never ended up getting used. So AirMech was purpose-built from the start, which is why we released very early versions of it and involved the community directly in the development process.

The server structure we have been using all this time has been great because we can actually replace and restart servers without people noticing. That's great for active development, but it has also masked some problems. Ever seen the bug in chat where you can't see what you type, but everyone else can? Little things like that are known issues, and they are not simple fixes without rebuilding everything. In reality, we've been restarting the servers quite a bit; most of the time players just don't notice. But that can't continue forever.

With AirMech having grown as big as it has, we could see points of failure coming up on the horizon. Combined with our existing bugs, we knew it was time to restructure things. Scary words like "sharding" came up, and grand plans to take AirMech to the next level were made. It's been months of work to get to this point, and we have now started rolling out the new servers.

Then things started crashing.

Now you might say "just put the old servers back", but that doesn't help us move forward. Even with the new servers, we could just restart them, but that fixes no problems. So when something crashes or locks, we need to actively debug what caused the problem as quickly as possible, then restart the server and work on a fix. When the fix has been made, we push out new servers, restart everything, and wait for it to crash again.

The good news is we are isolating and fixing a ton of bugs that have been around for a long time and that are more likely to be triggered by the new server structure. So things are actually getting fixed! The bad news is we are not at the bottom of the barrel of bugs yet. We're working around the clock to fix them as we catch them.

I hope this explains a bit more about what is going on, and why extended downtime is actually better than quick blips and restarts. This is why we still have the Beta label--if possible, we wouldn't want to do this to the game post-Beta. We really appreciate your support and patience, and if we have caused you a lot of problems, we sincerely apologize. Please reach out to us at support@carbongames.com or on the forums if there is something we can do to help.

Come yell at the devs in the forums! Yes, it's ironic that if the login server is down you can't log into the forums, but we promise things will be up as much as we can manage, and these issues will be solved soon.

Nov 25, 2013. James Green