The not-so-nice story of a VPS.net outage
- September 9th, 2011
This post is about a bad experience I had recently with my previous cloud provider, VPS.net. I want to make this very clear: this is NOT meant to complain or whine, but rather to tell this funny story and make people who are thinking of using this (or another?) service provider aware of what they are accepting.
Truth be told, I’ve been a happy customer of VPS.net for almost two years: cheap service, acceptable availability, and quick, responsive support were the key ingredients of my illusory satisfaction.
My resource was a small virtual private server, where this blog was hosted along with a few websites I maintain and my private stuff (mail, svn, backups, small services, things I need to access from everywhere, etc.). Nothing critical, no big deal. Actually pretty cool. Unfortunately, as someone says, good things rarely last.
For those who won’t have time to read below, long story short: one or more of their SAN units failed, they had a service outage for 52h+, and in the restoration process they screwed up my server’s filesystem data (and, I bet, that of many other customers). This post walks through the timeline, showing my communications with the VPS.net support service and with vpsnet on Twitter.
I know this whole cloud thing is all about money: if it comes cheap, you cannot expect the moon. What I did expect, instead, was a minimum level of quality of service. But let me tell you the story through a few facts, tweets and emails exchanged with VPS.net customer service. Times refer to my local timezone (GMT+1).
Sep, 1st – 12:11 – I found my server down. I sent an email to the support service, because I was not able to reboot the machine from the control panel (which, by the way, is something horrible that I tried to use as rarely as possible).
Sep, 1st – 12:12 – Quick reply from VPS.net support. ONE freaking minute.
Awesome, isn’t it? Here it is:

Sep, 1st – 12:12 – I immediately tweeted about that

Sep, 1st – 12:25 – A status update is sent via email by VPS.net to customers with resources hosted in their London “Zone C”, with subject “LON-C SAN Update”
Dear Customer, We are very much aware of the downtime on your VPS currently. We do apologise for this and we are aware that is it during core business hours. Our Level 3 technicians are currently working on restoring your VPS as quickly as possible. We do appreciate your patient and will be sending out a full RFO later on today once everyone is restored.
Sep, 1st – 16:30 (almost) – At some point my machine was running again, but with its filesystem mounted read-only. After 4 hours, I replied to another customer’s tweet, which said his machine was back online but restored from a 48h+ old snapshot. A few minutes later, VPS.net replied. (The tweets below are in reverse chronological order.)

Sep, 1st – 22:02 – Another email from VPS.net
Dear Customer, We have noticed another outage with the SAN that you are hosted on. As such we are powering down all the VM's attached to this storage array so we can do indepth diagnostics. Please can we ask that you see http://status.vps.net/2011/09/london-c-san5/ During this time all VM's will be down. We will provide updated once we have resolved the hardware issue
“Everything will be OK, they’re working on it…” < famous last words.
Sep, 2nd – ~12:00 – Another tweet to ask for a restoration estimate:

No reply. Last status update on their “status” blog was really old.
Sep, 2nd – 13:28 – I decided to write to support again.
Emails are managed by the support team through a ticket system, which you can also access from a web interface, from which I took the pretty screenshot below.

Sep, 2nd – 13:32 – Another pointless status update via email from VPS.net
Dear Customer, There currently is an issue with the SAN that hosts your VPS. Please see http://status.vps.net/2011/09/london-san5-issue/ for updates.
Sep, 2nd – 13:40 – Someone from VPS.net support replied to me.

Sep, 2nd – 13:44 – Followed by…

I also asked for an estimate of the time required to solve those issues:

…and I was told:

I thought that was ok, I mean, shit happens… it might take time to fix things up.
Sep, 3rd – 16:12 – 26 hours later I sent another email to support to get updates (of course, there were no useful status updates on the suggested page).

VPS.net support replied:

Sep, 3rd – 16:33 – A few minutes later, another email from VPS.net support:

Thank ghosts in the machines, I thought… but…
Sep, 3rd – A few minutes later I tweeted this:

What was that about? Well, Apache was reporting “permission denied” on an empty folder with 777 permissions, dynamic libraries were corrupted, the dpkg status file contained strange chunks of binary data, etc.
And these were the first clues…
Shall I add something else? Well, maybe this:
root@~:/$ find /lost+found/ | wc -l
10730
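For anyone who wants to run the same two checks from a script, here is a minimal sketch, assuming Python 3 on a Linux machine. The function names `read_only_mounts` and `count_entries` are my own hypothetical helpers, not anything provided by VPS.net: the first looks for filesystems mounted read-only in text formatted like /proc/mounts, the second counts entries the way `find DIR | wc -l` does.

```python
import os


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        options = fields[3].split(",")
        if "ro" in options:
            result.append(fields[1])
    return result


def count_entries(directory):
    """Count the directory itself plus everything below it.

    This mirrors `find DIR | wc -l` for ordinary file names
    (names containing newlines would make wc count differently).
    """
    total = 1  # find prints the starting directory too
    for _root, dirs, files in os.walk(directory):
        total += len(dirs) + len(files)
    return total
```

On the affected server one would feed the first function the contents of /proc/mounts and point the second at /lost+found; a root filesystem showing up as read-only, or a count in the thousands, are the same bad signs described above.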
Sep, 3rd – 17:11 – I reported the damage to support, telling them I was going to investigate a little further.

And they replied with something really useful:

Sep, 3rd – 19:00 – Again:

Sep, 3rd – 20:07 – And again ..

Sep, 3rd – 20:28 – And again …

Got it, right? My system was completely screwed.
And here we go:
Sep, 3rd – 20:29 – Finally VPS.net support replied:

“let us know the ssh port and root password” ?! This is when I lost my cool: are you f$@#*n’ kidding me?
Sep, 3rd – 20:29 and few seconds…

Sep, 3rd – 20:44 – My reply to the support

While writing, I was thinking “Keep calm, it’s their business, they will certainly have a backup, their SANs will be redundant”… < no way
Sep, 3rd – 20:55

I should not even comment on this one… “I see you have not used the some kind of our backup service” < WHAT THE HELL! YOU are the ones who had a fault!
Sep, 3rd – 21:17

At this point I really was expecting something. I mean, I’ve been using this company’s services for two years, and they always tried to satisfy customers with small things which are more or less irrelevant but still good to have for free, and so on. And now, with this relatively serious issue, they’re certainly going to provide a way out of this mess. I didn’t know what: maybe some free temporary nodes to restore a working system, some incentives, or whatever. Instead…
Sep, 3rd – 21:30

“because you did not have any enabled backups with us” ?! no comment…
Sep, 3rd – 21:57

You get where this is going, right?
Sep, 3rd – 22:05

And that’s it. My happiness with VPS.net was over.
Now let’s have a look at VPS.net’s terms of service (ToS)…

Focus on “DOES NOT WARRANT THAT THE SERVICES WILL BE … ERROR-FREE”. So basically they are saying that they can screw up anything in ANY way, and that would still be OK. Everything clear now? Lesson learned.
I switched to another provider, which turned out to be cheaper and to have a nicer ToS.
I didn’t write anything about this for a week. VPS.net served me quite well in my experience, but this has been unacceptable. I’ve been with VPS.net for 2 years, and honestly this incident has been really disappointing. At the very least, I would expect an apology and some kind of compensation offered to customers for the effort it will cost them to clean up this mess.
Still, there was some follow-up today on Twitter, when Yoast published this post. I could not hold back a reply, and this is what followed.
(Remember to read backwards: the last post is the first in chronological order.)


First of all, WOW!, thanks a lot, a credit, I’m the luckiest guy in the world. That means $20, which is the monthly price of their smallest node. Which is definitely an order of magnitude lower than the value of the time I need to fix their mess.
Ok, ok, that would still not be that bad as a symbolic gesture… but why now? Because I said something on Twitter about you not giving a shit about your customers? That sounds like “take this cash, and plz shut up”. I don’t think so.
You know what I think you should do?
1) Apologize to your customers. Sincerely. I saw your ToS, I know you don’t *have to*, but it’s really lame to have a SAN that can fail without having a backup, and I think (hope?) you know that. Paid backup services are meant for when customers themselves screw things up. Not for when you do that as a service provider.
2) Yes, the free credits are ok, but don’t give them to those who complain, give them to everyone who experienced the outage, e.g., for more than 24h. And give more to those who had nodes restored from older snapshots. And even more to those who had their data corrupted and their systems gone. That might cost you something, but I think it would show that at least you care, and in the long term it’s better than seeing your customers flee.
3) Find a way to guarantee that this kind of incident will NEVER happen again. Add redundancy. It’s not THAT expensive: hire some skilled engineers and do that. And put that in your ToS. They’re kind of a shame right now. Do not provide backups to those who do not pay for backups. But keep them anyway, so if YOU screw things up, you can still save your customers.
I will think more about this, but I don’t see how I can use your service “as is” for the use I planned. I remember a cool thing from the first time I saw your homepage, with all the fancy little robots and something like “self-healing-infrastructure”. Yeah, where’s that “self-healing” thing now?
One last thing. I don’t know if you make money on a lot of small customers or if most of your profits come from a few big customers. What I can say, however, is that if I had considered using your services for a serious business (and I actually did), that chance is now gone, because you don’t look that serious anymore. And I think seriousness is one of the most important values for a service provider.
After I wrote this post, I received a number of requests from other (ex-) VPS.net customers asking which provider I switched to. At the same time, other providers presented their offers for a better service. Well, long story short, I found out the following:
Nice providers in terms of pricing/offer:
Linode.com
Dediserve.com
iCloudHosting
I did have a look at their ToS and unfortunately, it seems that the “I am not responsible if my own service fails” attitude is widespread. E.g., from the ToS of dediserve.com:
“ Disclaimer of Warranties
DEDISERVE DOES NOT WARRANT OR REPRESENT THAT THE SERVICES WILL BE UNINTERRUPTED, ERROR-FREE, OR COMPLETELY SECURE. TO THE EXTENT PERMITTED BY APPLICABLE LAW DEDISERVE DISCLAIMS ANY AND ALL WARRANTIES INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. TO THE EXTENT PERMITTED BY APPLICABLE LAW, ALL SERVICES ARE PROVIDED ON AN ‘AS IS’ BASIS. ”
Did you notice anything? It is EXACTLY the same paragraph as in the VPS.net ToS. Well, I guess that’s just copy ’n paste from a template…
The Linode.com ToS are slightly better: they do provide guarantees on network connectivity and hardware uptime, and the ToS grant credits if something goes wrong. Still, “ All services provided by Linode.com are available as is, without warranty.”. At least they did put some effort into writing their ToS. In any case, I asked some customers about linode.com services, and they all seemed really satisfied with them.
Another mention, as I said, goes to iCloudHosting, which today contacted many of the people complaining about crappy VPS services via Twitter. I had never heard of iCloudHosting, but it seems to have nice pricing. I had a look at their ToS and the following discussion was the result.

Basically, they inserted the following in their Terms of service:
“ Keeping Your Data Safe
We will keep your data safe by providing a high quality infrastructure. We will store cloud server data on a SAN, which will have RAID protection. Additionally, we will ensure this SAN is replicated in real time to another SAN. For clients who have a backup service from us, we will also take a further copy of their data and store it on a RAID-protected NAS. In the event that any storage component of our infrastructure fails, we will endeavour to repair or replace it as quickly as possible without causing data loss. ”
I am not a legal expert, and I don’t think it is really clear what will actually happen if a failure occurs. By this I just mean that I see no warranties on this cool RAID + additional replication (+ paid optional additional backup) system.
However, although deploying applications with strict requirements would call for more digging into the implications of this statement, I think its inclusion in their ToS is a remarkable sign that they actually thought about the problem and do have a reasonably safe procedure to prevent the loss of customers’ data.
In conclusion, I don’t have enough experience with any of these providers to tell which one is better than the other, but I feel like giving Linode.com and iCloudHosting a chance as soon as I need a new cloud-based VPS for something.



pff That took a lot of time and effort to compile…I’m not sure I have the stamina or the time to do the same with my own experience with them…shows me once more that @Yoast should be more careful who he condones…
Seems like mismanagement to me. Of course one can hide behind its TOS, that’s easy, but customers don’t come cheap, so from a commercial point of view one should do everything reasonable to help and keep their customers. Now they don’t only lose lots of customers but things become even worse because of posts like this.
I’ve been lucky that I only have been a customer for two days: http://sangatpedas.com/vps-net-review/ For me the main reason was the (lack of) communication, over promising and not being able to execute a simple wordpress migration.
@Remco indeed. I’m going to wait for @vpsnet and @yoast comments by the way.
I agree on your last 3 points, but, from your end, not enabling backups though is not smart, to be honest. I will never trust any single host with my backups, so I always have offsite backups too…
Yoast’s interview showed they are learning and going to make things better. Management team is posting comments to talk to them. But, looks like Management folks don’t communicate to support folks.
ATL-D scheduled down time 2 Hours. Now is 4 hours. Reply from Support
“We are going to start all affected VPSes, your VPS will be started as soon as possible.”
And talking about their backup services
http://share.weppos.net/snaps/skitched-20110910-124105.png
http://share.weppos.net/snaps/skitched-20110910-124209.png
It should be a daily backup. The latest backup is August 30.
When I checked yesterday, it was full of failed backups. They should have cleaned the stale records this morning during the Atlanta maintenance.
PS. I’m on the Atlanta-D cloud. Issues started before London and they are still going on. My servers have been unresponsive for 3 days, and I cannot even run a database export to another provider because the file system access is extremely slow…
@Joost de Valk I just don’t get your point.
I never said I’d not suggest backups. Backups are always critical. I just don’t want the paid backup service from the same provider which hosts my VPS.
I also don’t trust hosts that much, that’s why I DO also have offsite backups (and by the way, with “offsite” I mean not in any of the same provider infrastructure).
So why would that be “not smart” ?
That service is useful for the customer.
I mean, if as a customer I screw up things myself, or I misconfigure something, or I have a breach, then it’s my fault and I could benefit from those backups.
But otherwise, what’s the point?
Should I pay my own provider to keep backups in order to be safe in case the same provider is not capable of keeping my resource intact or safe?
I think this just doesn’t make sense, I hope I misunderstood your comment.
http://www.nuorderwebs.co.uk/blog/general-info/recent-vps-net-problems-and-my-email-to-management/
My similar experiences, I am in the process of moving my important sites
@Joost de Valk
Always?
Good post, must have taken a lot of time to compile. We’ve been going through the same thing. The vps.net guys are a joke. I’ll be so glad when we have everything moved off their platform.
Something similar is happening to me over the last few days. My disk has been mounting in read-only mode, and I now have to reboot the VPS every day to keep Apache going.
Basically instead of wasting time dealing with their tech support, I’ve already signed up for hosting elsewhere.
Like many here I went to VPS based on the reco from Yoast.
I wish i could take a contrary view and talk about how great VPS has been. Sadly no, Poor performance, poor support, poor everything.
Another dissatisfied VPS customer. I would love to find a good webhost but I don’t think there is such a thing, so I’ll settle for decent if anyone can find them.