Downtime

Network unavailable

Feb 17 at 05:30am CET
Affected services
API v2
Homepage

Resolved
Feb 17 at 09:15am CET

Today, at around 04:30 UTC, the server stopped allowing players to join. The issue persisted until around 08:15 UTC.

The problem began at around 23:25 UTC on 16/02/2023, when one server started writing hundreds of gigabytes of logs. On its own this wouldn't have been an issue, but it started late at night, and the alerts for high disk usage didn't fire until 03:40 UTC.
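One way to catch this kind of runaway growth earlier than a fixed usage threshold is to look at how fast free space is shrinking. The following is only a minimal sketch of that idea, not the monitoring that is actually in use; the function names, the path and the sampling interval are all assumptions made for illustration.

```python
import shutil
import time
from typing import Optional


def seconds_until_full(free_before: int, free_after: int, interval_s: float) -> Optional[float]:
    """Estimate the seconds left until the disk is full from two free-space samples.

    Returns None when free space is not shrinking (nothing to warn about).
    """
    consumed = free_before - free_after
    if consumed <= 0 or interval_s <= 0:
        return None
    rate = consumed / interval_s  # bytes written per second
    return free_after / rate


def sample_and_estimate(path: str = "/", interval_s: float = 60.0) -> Optional[float]:
    """Take two free-space samples of `path` (hypothetical mount point) and estimate time to full."""
    before = shutil.disk_usage(path).free
    time.sleep(interval_s)
    after = shutil.disk_usage(path).free
    return seconds_until_full(before, after, interval_s)
```

An alert could then fire whenever the estimate drops below a few hours, regardless of how full the disk currently is.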

A full disk can be a big issue for a server, especially once every last byte is used. So what exactly happened on Greev that turned a full disk into the whole network being unavailable?
The disk filled up at around 04:00 UTC. This produced some additional errors in the console but didn't cause many problems at first. The real problems started at 04:30 UTC, the time at which many of our non-dynamic servers, like KnockPVP or FastBridge, do their daily restarts.
During these daily restarts, the servers also reset their worlds and some plugins and, most importantly, Spigot tries to save its current settings again.
With no storage left on the disk, the settings couldn't be saved, which resulted in corrupted files.
In addition, plugins could no longer connect to the database; this was also caused by the file system being full, with no bytes to spare.
As connections on Linux are handled as "files", no new connections / "files" could be created, causing even more issues.
The corrupted files, together with plugins failing to load once they couldn't connect to the database, meant the servers no longer started up, and players were unable to play any games. And with the lobby restarting at 04:30 as well, the whole network went dark.
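The file corruption described above is typical of saving a file in place: the old content is truncated first, and the new content then fails to fit. A common general technique to avoid this is to write to a temporary file and rename it over the original only after the write succeeded. The sketch below illustrates that technique in Python; it is not how Spigot or any of the plugins save their files, and `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write `text` to `path` so that a failed write leaves the old file untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # errors such as "no space left" surface here at the latest
        os.replace(tmp_path, path)  # swap in the new file only after a complete write
    except BaseException:
        # Remove the partial temporary file; the original file is still intact.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach a full disk still makes the save fail, but the previous version of the file survives and the server can start from it.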

I found out about the issue at around 07:00 UTC, just after I woke up. When I started to investigate, I quickly noticed that simply restarting the affected servers didn't fix anything.
So I dug deeper and found that no disk space was left, and quickly located the log file, which held over 500 GB of logs.
After deleting the file, I discovered that there were still issues on some servers.
These servers were affected by corrupted plugin and server files. I had to restore some files from backups, which I could do quite quickly. At first I thought only some plugin files were affected, but after finding that some servers still had issues after booting up, I had to do some more digging.
On some of the servers, I didn't have a rank, and all my stats were gone. So I also had to check the integrity of the database; fortunately, no issues were found on the database side. The cause was a file that the server software reset after rebooting, probably with the intent of updating it; with no space left, it couldn't do so, so it just reset the file.
With all of that out of the way, some servers were still stuck in a reboot loop, even with all plugins and configurations fixed. This last issue was caused by the so-called "usercache" file the server creates. It should be a JSON file that caches some data about the players. But because the server was unable to write the whole file, it created an empty file without valid JSON syntax. With such a broken usercache file, the server tries to load the data from it but keeps crashing because the file can't be parsed. Simply deleting the file fixed this.
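The manual fix for the broken cache file could be turned into a small check that runs before a server starts. The following is a rough sketch under assumptions: the helper names are hypothetical, the path to the cache file has to be supplied by the caller, and instead of deleting the file it is renamed so it can still be inspected later.

```python
import json
import os


def is_valid_json_file(path: str) -> bool:
    """Return True if `path` can be read and parsed as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
        return True
    except (OSError, ValueError):
        # json.JSONDecodeError is a ValueError; an empty file ends up here too.
        return False


def quarantine_if_broken(path: str) -> bool:
    """Move an unparsable JSON file aside. Returns True if the file was moved."""
    if not os.path.exists(path) or is_valid_json_file(path):
        return False
    os.replace(path, path + ".broken")
    return True
```

Since the file is only a cache, moving it out of the way has the same effect as deleting it: the server creates a fresh one on the next start.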

Now that everything has been restored to its original working state, we expect no further complications.

Updated
Feb 17 at 08:25am CET

The issue has been found and we are fixing it at the moment.

Updated
Feb 17 at 08:00am CET

We were notified by users who could no longer join the server and started to investigate the issue immediately.

Created
Feb 17 at 05:30am CET

Users are no longer able to join the server.