This post is about a bad experience I had recently with my previous cloud provider, VPS.net. I want to make this very clear: this is NOT meant to complain or whine, but rather to tell this funny story and make people who are thinking of using this (and other?) service providers aware of what they are accepting.

Truth be told, I had been a happy customer of VPS.net for almost two years: cheap service, acceptable availability, and quick, responsive support were the key ingredients of my illusory satisfaction.

My resource was a small virtual private server, where this blog was hosted along with a few websites I maintain and my private stuff (mail, svn, backups, small services, things I need to access from everywhere, etc.). Nothing critical, no big deal. Actually pretty cool. Unfortunately, as they say, good things rarely last.

For those who don’t have time to read on, long story short: one or more of their SAN units failed, they had a service outage of 52h+, and in the restoration process they screwed up my server’s filesystem data (and, I bet, that of many other customers). This post lays out the timeline through my communications with VPS.net support and with vpsnet on Twitter.

I know this whole cloud thing is all about money: if it comes cheap, you cannot expect the moon. What I did expect, though, was a minimum level of quality of service. But let me tell you the story through a few facts, tweets, and emails exchanged with VPS.net customer service. Times refer to my local timezone (GMT+1).

Once upon a time

Sep, 1st – 12:11 – I found my server down. I sent an email to support, because I was not able to reboot the machine from the control panel (which, by the way, is something horrible that I tried to use as rarely as possible).

Sep, 1st – 12:12 – Quick reply from VPS.net support. ONE freaking minute.
Awesome, isn’t it? Here it is:

We apologize for the inconveniences, but we had SAN issue in thia cloud, the issue has been resolved and your VM has been placed to startup queue. Please wait a little, your server will be started soon. -- Best regards, Pavel Voronov - Support engineer - VPS.NET

Sep, 1st – 12:12 – I immediately tweeted about that

vpsnet support just replied to problem-reporting email in about one minute: cool.. let's see how long will it take to fix the problem now

Sep, 1st – 12:25 – A status update is sent via email by VPS.net to customers with resources hosted in their London “Zone C”, with subject “LON-C SAN Update”

Dear Customer,
 We are very much aware of the downtime on your VPS currently. We do
apologise for this and we are aware that is it during core business
hours. Our Level 3 technicians are currently working on restoring your
VPS as quickly as possible. We do appreciate your patient and will be
sending out a full RFO later on today once everyone is restored.

Sep, 1st – 16:30 (almost) – At some point my machine was running again, but with its filesystem mounted read-only. After 4 hours, I replied to another customer’s tweet, which said his machine was back online but restored from a 48h+ old snapshot. A few minutes later, VPS.net replied. (The tweets below are in reverse chronological order.)

Sep, 1st – 22:02 – Another email from VPS.net

Dear Customer,
We have noticed another outage with the SAN that you
are hosted on. As such we are powering down all the
VM's attached to this storage array so we can do indepth
diagnostics.
Please can we ask that you see 

http://status.vps.net/2011/09/london-c-san5/

During this time all VM's will be down.
We will provide updated once we have resolved the hardware issue

“Everything will be ok, they’re working on it…” < famous last words.

Sep, 2nd – ~12:00 – Another tweet, asking for a restoration forecast:

vpsnet it's 24h, any forecast for the reestablish? I know it takes time, but we might have customers to inform

No reply. The last status update on their “status” blog was really old.

Sep, 2nd – 13:28 – I decided to write yet another email to support.
Emails are handled by support through a ticket system, which you can also access from a web interface; that is where I took the pretty screenshot below.

Sep, 2nd – 13:32 – Another pointless status update via email from VPS.net

Dear Customer,
 There currently is an issue with the SAN that hosts your VPS.
Please see http://status.vps.net/2011/09/london-san5-issue/ for updates.

Sep, 2nd – 13:40 – Someone from VPS.net support replied to me.

Sep, 2nd – 13:44 – Followed by…

I also asked for an estimate of the time required for those issues to be solved:

…and I’ve been told:

I thought that was ok, I mean, shit happens… it might take time to fix things up.

Sep, 3rd – 16:12 – 26 hours later I sent another email to support to get updates (of course, there were no useful status updates on the suggested page).

VPS.net support replied:

Sep, 3rd – 16:33 – A few minutes later, another email from VPS.net support:

Thank ghosts in the machines, I thought… but…

Sep, 3rd – A few minutes later I tweeted this:

vpsnet after the outage, my VPS is finally back, but the filesystem is really screwed up.
And I mean, seriously. I hope you have a backup.

What was that about? Well, Apache was complaining about “permission denied” on an empty folder with 777 permissions,
dynamic libraries were corrupted, the dpkg status file contained strange chunks of binary data, and so on.
And these were just the first clues…

Shall I add something else? Well, maybe this:

root@~:/$ find /lost+found/ | wc -l
10730
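For anyone who wants to hunt for the same symptom (text files such as the dpkg status file suddenly containing binary chunks), here is a minimal sketch of how such a scan could look. It is not what I ran at the time; the names `looks_binary` and `find_suspect_files` are made up for illustration, and the check is only a heuristic: it assumes that NUL bytes and most other control bytes never appear in a healthy text or configuration file.

```python
import os

# Bytes considered normal in a text file: a few control characters
# (bell, backspace, tab, newline, form feed, carriage return, escape)
# plus everything from the space character upward, so UTF-8 text passes.
TEXT_BYTES = bytes([7, 8, 9, 10, 12, 13, 27]) + bytes(range(0x20, 0x100))


def looks_binary(data: bytes) -> bool:
    """Heuristic (hypothetical helper): True if data holds non-text bytes."""
    if b"\x00" in data:
        return True
    # translate(None, TEXT_BYTES) deletes every "normal" byte;
    # anything left over is an unexpected control byte.
    return bool(data.translate(None, TEXT_BYTES))


def find_suspect_files(root: str, chunk_size: int = 1 << 20):
    """Yield regular files under root that contain binary-looking data."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and the like.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        if looks_binary(chunk):
                            yield path
                            break
            except OSError:
                # Unreadable files are skipped here; on a damaged
                # filesystem they deserve a look by hand.
                continue
```

Pointing it at a directory that should hold only text, e.g. `list(find_suspect_files("/etc"))`, gives a list of files to inspect manually. It will not catch the other kind of damage I found (valid text that simply belongs to a different file), so treat an empty result as a hint, not as proof that the filesystem is fine.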

Sep, 3rd – 17:11 – I reported the damage to support, saying I was going to investigate a little further.

And they replied with something really useful:

Sep, 3rd – 19:00 – Again:
It seems most of the problems come from missing or corrupted files. Is there anything you can do about that? I mean, do you have any previous snapshot or backup which include my vps? I am currently making a backup of anything I am able to retrieve from the vps. Please let me know. Regards, Alessandro

Sep, 3rd – 20:07 – And again ..

Dear vpsnet support, I continued the investigation of the problems and it turned out that some files on my system are definitively corrupted. This is pretty clear since some system text files appears to contain pieces of binary data. Do you have a previous or different backup with a copy of the image of my VPS for restore? Please provide me with an answer as soon as possible. Regards, Alessandro

Sep, 3rd – 20:28 – And again …

FYI, checking for corrupted files in my VPS filesystem, I just found some OS configuration files containing pieces of source code (length: 12k), which before the outage were inside scripts for web pages in one of my webroots. I think this indicates that your storage probably failed much more than you imagine. I just hope my data didn't also end up on someone else's machine. That said, do you have a backup for my VPS or not? Please provide me an answer asap. Regards, Alessandro

Got it, right? My system was completely screwed.
And here we go:

Sep, 3rd – 20:29 – Finally VPS.net support replied:

Hello, Our engineers are migrating the VM from old crashed SAN (about this action you can read on the http://status.vps.net/) may be this migration caused the following issue, please let us know the ssh port and root password for your server and we will check it. Thank you for request.

“let us know the ssh port and root password” ?! This is when I lost my cool: are you f$@#*n’ kidding me?

Sep, 3rd – 20:29 and few seconds…

vpsnet

Sep, 3rd – 20:44 – My reply to the support

I am sorry, but I cannot provide you with credentials to login into my VPS. Even if I could (and again, this is NOT going to happen), you will not be able to fix anything, since as I said, some files are missing and other files are irreparably corrupted. Unless you have a backup, and in that case, you just have to restore it. Do you have or do you have not a backup with another or previous snapshot of my VPS? Regards, Alessandro

While writing, I was thinking “Keep calm, it’s their business, they will certainly have a backup, their SANs will be redundant”… < no way

Sep, 3rd – 20:55

Hello, We don't have any backups  of your server all backups, and as I see you have not used the some kind of our backup service, so unfortunately I can't help you with another backup

I should not even comment on this one… “I see you have not used the some kind of our backup service” < WHAT THE HELL! YOU are the ones at fault!

Sep, 3rd – 21:17

That's a really bad news. So which solution do you suggest? As I mentioned, I cannot provide you with login credentials. If you think there's any

At this point I really was expecting something. I mean, I’d been using this company’s services for two years, and they had always tried to please customers with small things that are more or less irrelevant but still nice to have for free. And now, with this relatively serious issue, they were certainly going to provide a way out of this mess. I didn’t know what: maybe some free temporary nodes to restore a working system, some incentives, or whatever. Instead…

Sep, 3rd – 21:30

Hello Alessandro, If you do not have backups of your site(s), DB(s) on your side, then we cannot do anything to restore your data, because you did not have any enabled backups with us. Thank you.

“because you did not have any enabled backups with us” ?! no comment…

Sep, 3rd – 21:57

I do have regular backups on my side for my stuff, but system files are also corrupted because of your fault, meaning I will have to configure a new system from scratch now. And add insult to injury, I'll have to pay for that! Honestly I've always appreciated your services and the quality of your support, but it sucks that you screw up your customers' resources and just don't care. I did suggested your company as service provider many times, but after this experience I definitively changed my mind. Are you really doing absolutely nothing about this?

You get where this is going, right?

Sep, 3rd – 22:05

No, we cannot do anything without backups. You need to reinstall clear OS on your VPS and restore your own backups then there. Thank you.

And that’s it. My happiness with VPS.net was over.

Now let’s have a look at the terms of service (ToS) of VPS.net:

Disclaimer of Warranties VPS.NET DOES NOT WARRANT OR REPRESENT THAT THE SERVICES WILL BE UNINTERRUPTED, ERROR-FREE, OR COMPLETELY SECURE. TO THE EXTENT PERMITTED BY APPLICABLE LAW VPS.NET DISCLAIMS ANY AND ALL WARRANTIES INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. TO THE EXTENT PERMITTED BY APPLICABLE LAW, ALL SERVICES ARE PROVIDED ON AN 'AS IS' BASIS.

Focus on “DOES NOT WARRANT THAT THE SERVICES WILL BE … ERROR-FREE”. So basically they are saying that they can screw up anything in ANY way, and that would still be OK. Everything clear now? Lesson learned.

I switched to another provider, which turned out to be actually cheaper and to have nicer ToS.
I didn’t write anything about this for a week. VPS.net had served me quite well, but this was unacceptable. I’d been with VPS.net for 2 years, and honestly this incident was really disappointing. At the very least, I would expect an apology and some kind of compensation offered to customers for the effort it will cost them to clean up this mess.

Still, there was some follow-up today on Twitter, when yoast published this post. I could not hold back a reply, and this is what followed
(remember to read backwards: the last post is the first in chronological order).




First of all, WOW!, thanks a lot, a credit, I’m the luckiest guy in the world. That means $20, which is the monthly price of their smallest node, and definitely an order of magnitude lower than the value of the time I need to fix their mess.

Ok, ok, that would still not be that bad as a symbolic gesture… but why now? Because I said something on Twitter about you not giving a shit about your customers? That sounds like “take this cash, and plz shut up”. I don’t think so.

You know what I think you should do?

1) Apologize to your customers. Sincerely. I saw your ToS, I know you don’t *have to*, but it’s really lame to have a SAN that can fail without having a backup, and I think (hope?) you know that. Paid backup services are meant for when customers themselves screw things up. Not for when you do that as a service provider.

2) Yes, the free credits are ok, but don’t give them only to those who complain: give them to everyone who experienced the outage, e.g., for more than 24h. And give more to those who had nodes restored from older snapshots. And even more to those who had their data corrupted and their systems gone. That might cost you something, but I think it would show that at least you care, and in the long term it’s better than watching your customers flee.

3) Find a way to guarantee that this kind of incident will NEVER happen again. Add redundancy. It’s not THAT expensive: hire some skilled engineers and do it. And put that in your ToS. They’re kind of a shame right now. Do not provide backups to those who do not pay for backups. But keep them anyway, so if YOU screw things up, you can still save your customers.

I will think more about this but I don’t see how I can use your service “as-it-is” for the use I planned. I remember a cool thing from the first time I saw your homepage, with all the fancy little robots and something like “self-healing-infrastructure”. Yeah, where’s that “self-healing” thing now?

One last thing. I don’t know if you make money from a lot of small customers or if most of your profits come from a few big customers. What I can say, however, is that if I ever considered using your services for a serious business (and I actually did), that chance is now gone, because you don’t look that serious anymore. And I think seriousness is one of the most important values for a service provider.




After I wrote this post, I received a number of requests from other (ex-) VPS.net customers asking which provider I switched to. At the same time, other providers presented their offers for a better service. Well, long story short, I found out the following:

Nice providers in terms of pricing/offer:

Linode.com
Dediserve.com
iCloudHosting

I did have a look at their ToS and, unfortunately, it seems that the “I am not responsible if my own service fails” attitude is widespread there. E.g., from the ToS of dediserve.com:

Disclaimer of Warranties
DEDISERVE DOES NOT WARRANT OR REPRESENT THAT THE SERVICES WILL BE UNINTERRUPTED, ERROR-FREE, OR COMPLETELY SECURE. TO THE EXTENT PERMITTED BY APPLICABLE LAW DEDISERVE DISCLAIMS ANY AND ALL WARRANTIES INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. TO THE EXTENT PERMITTED BY APPLICABLE LAW, ALL SERVICES ARE PROVIDED ON AN ‘AS IS’ BASIS.

Did you notice anything? It is EXACTLY the same paragraph that appears in the VPS.net ToS. Well, I guess that’s just copy ’n paste from a template…

Linode.com’s ToS are slightly better: they do provide guarantees on network connectivity and hardware uptime, and they commit to credits if something goes wrong. Still: “All services provided by Linode.com are available as is, without warranty.” At least they did put some effort into writing their ToS. Also, I asked some customers about linode.com services, and they all seemed really satisfied with them.

Another mention, as I said, goes to iCloudHosting, which today contacted many of the people complaining about crappy VPS services via Twitter. I had never heard of iCloudHosting, but it seems to have nice pricing. I had a look at their ToS and the following discussion was the result.



Basically, they inserted the following in their Terms of service:

Keeping Your Data Safe
We will keep your data safe by providing a high quality infrastructure. We will store cloud server data on a SAN, which will have RAID protection. Additionally, we will ensure this SAN is replicated in real time to another SAN. For clients who have a backup service from us, we will also take a further copy of their data and store it on a RAID-protected NAS. In the event that any storage component of our infrastructure fails, we will endeavour to repair or replace it as quickly as possible without causing data loss.

I am not a legal expert, and I don’t think it is really clear what will actually happen in case of a failure. By this I just mean that I see no warranties attached to this cool RAID + additional replication (+ paid optional additional backup) system.

However, although deploying applications with strong requirements would call for more digging into the implications of this statement, I think its inclusion in their ToS is a remarkable sign that they actually thought about the problem and do have a reasonably safe procedure to prevent customers’ data losses.

In conclusion, I don’t have enough experience with any of these providers to tell which one is better than the others, but I feel like giving Linode.com and iCloudHosting a chance as soon as I need a new cloud-based VPS for something.






