Launch-week catastrophe, the impact, and the recovery

The issue

This is not the kind of launch-week post I’d like to make. I received reports this morning (Friday, September 27th, 2019) that two users were having difficulties logging in. I asked for details. User A was logging in as User B somehow. This is obviously an incredibly severe issue whose bounds I needed to understand immediately.

Background / investigation

On Tuesday, September 24th, Bot Land launched. I had written a script to soft-reset people’s accounts, meaning they could keep their username, email, password, and superficial details, but not their items, in-game progress, etc. I tested this multiple times on PII-sanitized copies of the production database and everything went well.

On Launch Day though, everything did not go well. The script took ~10 times longer than it should have before I assumed that there was an unrecoverable error (I never actually verified whether this was true because I was deploying while live-streaming and I was already frantic to get the game online).

Instead, I rolled the database back about 3 days to when everything was working, reran my scripts, and Bot Land was up and running. Unfortunately for me, a catastrophic issue lurked beneath this rollback.

In Bot Land, users type their username and password and the client sends them over HTTPS to the server. The server salts and hashes your password, verifies it against what’s in the database, then returns an authentication token so that your username and password don’t have to be stored on the client. This token is a JWT, meaning it can be examined by anybody but only changed by someone with the password (i.e. me). The contents of a Bot Land token look roughly like this:

{
  id: 1234
}

That’s it. It’s just a user ID. Remember though, users can’t tamper with the payload without knowing the secret, so there’s no risk of a user specifying someone else’s ID in here.

User IDs in Bot Land are incrementing integers, meaning they will always be assigned in the order 1, 2, 3, and so on. For a game, it’s fine if you know that user ID #61306 exists; it doesn’t provide any personal information about that user.

So far, everything I’ve described is sound. Sure, you may argue that user IDs should have been UUIDs by default, but there isn’t a major problem yet.

The issue came with the database rollback. Suppose we had 5000 users before the rollback. Imagine a user whose ID is, say, 4993. Let’s say their username is Bob. Bob sets up autologin before the rollback, meaning when they get their token with {id: 4993} in it, they’ll store it locally for use later. After the rollback, an account with the ID 4993 may no longer exist, but Bob’s token representing that account is still in his browser somewhere.

When Bob loads Bot Land after launch, that token will automatically be provided to the server, only now it won’t refer to Bob. Bob has now logged into Carol’s account.

Impact

First of all, in this scenario, Bob would only be able to log in as Carol, a seemingly randomly selected user, but one that is never selected a second time, meaning Bob can’t log in as David afterward.

What information can Bob get about Carol? It’s only the information that Bot Land exposes. Ignoring the game information (like your blueprints, scripts, items, etc.), the only personal information that they’d have access to is Carol’s email address—not her password, payment information (which we never store anyway), or any other identifying information. Furthermore, Bob would only have access to that if he sought it out in the interface, e.g. via the “change email address” function in the game or attempting to verify the email address of the account.

Next, in order for Bob to do this, they needed to have autologin enabled during the 3-day period between the rollback and launch, then they needed to try logging in from the same device/setup during the 3-day period between launch and now. There are 106 accounts that had logins during both time periods, but autologin is a client-side setting, so we can’t tell how many accounts that number is narrowed down to. However, 36 of those were guest accounts, meaning there was no personal information on them whatsoever, so the maximum number of potentially leaked email addresses is 70. Removing spam accounts with obviously fake email addresses (like “asdhjkwhejkwhetkj@ewuitwetyuiw.com”), this brings it down to 59.

Furthermore, while the email address would have to be sent from the server to the client, we can’t know whether the client actually viewed the email address from any of the 59 potentially affected accounts.

Thus, it’s impossible to know exactly who was affected, but again, at most, your email address could have been seen by exactly one user. I will be reaching out to all 59 users with these details and an offer to restore some of their game progress.

Recovery

  • I deleted all 106 potentially affected accounts, including guests and spam accounts, meaning that there is no longer an issue with anyone’s email address leaking.
  • I updated the signing key for all JWTs so that any already-issued tokens will no longer be considered valid. This will require all users to type their credentials again the next time they log in.
  • I will update the signing key again in the future for any rollbacks that have to be done. Since Beta launched on November 7th, 2017, there has been one database reset in late 2018*, then the rollback and soft-reset that we did on Launch Day, September 24th, 2019. Rollbacks are almost always the result of some other catastrophic issue, meaning I can’t foresee future rollbacks, but they could happen.

*This particular reset was not susceptible to the issue described herein because autologin didn’t exist until 3 months later.

Thoughts

Bot Land has a total of two developers. It has automated tests. It has nearly 100k lines of code. We do our best to catch everything, but when something like this is found, I want to be as transparent as possible about what happened and what was done to fix this.

I apologize for the downtime, the lost game progress, and any other repercussions of leaking email addresses.

One thought on “Launch-week catastrophe, the impact, and the recovery”

Comments are closed.