On June 28th, my VPS started to experience major packet loss.  So I did what anyone would do and opened a support ticket with my service provider.

A few days later, I still had not received any replies, all while my server was experiencing anywhere from 5% to 95% packet loss.

Finally, on July 3rd, I received a friendly update advising me that my issue had been corrected.  To my surprise, I was not able to SSH into my machine.  After seeing a weird error message on my console regarding CPU LOCKS and read-only file systems, I rebooted my VM per my provider’s posted recommendations.

Packet Loss Graph



Upon reboot, my server was dead, and it’s been that way ever since.  Over the next few days, I will add to my story so everyone can understand how important it is to host your own things.  Luckily for me, this is just a personal server used by my friends and family.

Currently, I’ve been waiting 7 days for support to pick up and/or answer my ticket.  I’ve made multiple attempts to contact the company over the phone, through social media, and even via their own support ticketing process.  So far, I’ve yet to get any reply beyond “your ticket is in the queue”.  I know I’m only one of many, so I’ve created this page in the hopes that others may share their stories and be forewarned of what kind of outages to expect.


I encourage you to scan social media websites if you need any extra confirmation.



UPDATE:  07/11/2014 @ 6:00PM

I’ve been experiencing full system outages for many days now.  One system has been down for 8 full days and the other for 4 full days.  It’s very discouraging that no one from either Fibernetics or Cloud At Cost seems to want to fix my issue.  I guess I’m just a low guy on the totem pole.  In fact, Cloud At Cost has banned me from messaging them on Facebook.


Load Averages

Luckily, I found some screenshots from the console of one of my servers.  Before I rebooted, it was responding to ICMP.  However, after a reboot, I started getting the same console connect error message.

Story9 Story10


I know it’s basically pointless to expect an update over a weekend when they couldn’t bother to update me all week long.  However, I’ve got my fingers crossed that someone somewhere will put out a public update.  I’m starting to think there are thousands of VMs down.  It’s funny that the $35 VM isn’t having the trouble, just my higher-end ones.  Oh, and in case you’re wondering, those NAGIOS graphs you’re seeing are coming from Cloud At Cost’s (CloudAtCost) own network.  Funny how things work!

I’ve also started to get some fan art, so I’ve created a new section to showcase it.  If you create something or are interested in getting in touch with me, you can find me on Facebook, Twitter, or email me at xphox (at) xphox (dot) net.


UPDATE:  07/13/2014 @ 7:30PM

One of my VMs came back online late Friday.  Sorry I wasn’t able to update this page sooner.  It came up one hour after my last update.  The “good?” news at this point in time is that I only have 1 more VM down.  It’s a DEV2 package.  I’m assuming they are working through the servers in order from higher spec to lower spec.  I still have not received any update in my actual support ticket; however, the company has finally sent out an official email.  It’s a shame they waited over a week to send out a communication to their customers.  It’s a little too late in my opinion, but I’m glad they are starting to talk to us customers.  Let’s hope this isn’t the last email they send out and that they get the drift that we’d like to be kept in the loop.

For those interested, here is the update they sent out earlier this afternoon.

Story18

They are stating that they had 2000 VMs impacted.  Restoring a server can take up to 20 minutes while it is migrated onto their new disk array.  This basically confirms everyone’s assumption that they completely oversold what their support staff could handle.  They have their work cut out for them.  I’m hoping my last VM doesn’t take as long to come back online.


By the way… we’ve had over 10,000 unique visitors to this webpage in the last 48 hours.  Thanks to everyone for your suggestions and comments down below.  Please keep your hosting provider suggestions coming.  Once I get some extra time, I’ll compile a list of my top few and post them on a different page.


UPDATE:  07/14/2014 @ 5:30PM

My server that was back online has crashed twice since 4:45PM today.  So now I’m back to 2 of my 3 VMs being down.  The good news is that today I received an update in my actual ticket, though it was just the bulletin they posted publicly.  I also received a direct Twitter message from Fibernetics pointing me to the same webpage.  An A for effort!


So they are starting to answer people, good!  However, something tells me we are still a few days or a week away from getting some system stability.  My system load average is spiking through the roof and I’m not even running anything on the box.

We should start a pool to take bets on when this will actually get resolved.


UPDATE:  07/15/2014 @ 7:30PM

So far, after two reboots, my server has remained stable.  I even got another update in my support ticket.  This time it wasn’t from the same guy, Antonio, but from James (L3 Support Engineer).  James is suggesting I might be impacted by a known bug (https://bugzilla.redhat.com/show_bug.cgi?id=979901).  I’ll have to investigate this a bit more.  I just know that this never happened prior to the outages.

All in all, ticket updates seem to be coming in every 12 to 15 hours on average for me now.

For those still following, here are some updated graphs.  The 1st one shows the stability of my VM which has been recovered.  The 2nd shows my VM which is still currently impacted.

RECOVERED VM:

Story20

IMPACTED VM:

Story21

My 1st VM took almost 8 full days to recover.  Let’s see how long this 2nd VM takes.  I’m betting the difference is that this one is a DEV2 server whereas the other one is a DEV3.


UPDATE:  07/15/2014 @ 7:30PM

My last VM came back online yesterday.  I was alerted by the technician closing my ticket.  Call me crazy, but you’d think that after a support ticket reaches 14 days old, they might actually let you confirm the issue was properly resolved first.

Regardless, the total outage durations for 2 of my 3 VMs are as follows:

VM1 (DEV3):  Approximately 8 Days, 6 Hours

VM2 (DEV2):  Approximately 10 Days, 14 Hours

** VMs crashed days apart.

Not to mention that this company has unfortunately caused a bit more drama with claims that they have ripped off Digital Ocean’s help articles.  Add to that the continued outage, and it’s a surprise users are still signing up for new service.

Hopefully this is the last you’ll hear from me on this subject.
