> As an experiment, I decided to migrate two hosts (each with about 10 VMs) of a client — where I had full control—without telling them, over a weekend.
Yeah. That guy should not be allowed anywhere near the production workloads. "I solve problems", my ass.
I've a contract with those clients, and I can move the VMs, change the services, etc. freely as long as it doesn't cost more than the amount we've previously set.
Otherwise, I'd never dare to do something like that.
And I'm not so crazy as to do such an operation without the appropriate tests and foundations. Of course, when I started, I had all the conditions to be able to do it, and I had already conducted all possible tests. :-)
The client is paying for the VM. The underlying system is an abstraction. As long as service agreements weren’t interrupted I don’t see the problem. It sounds shady to say “without telling them,” because the phrase implies the client should have been told. I do a lot of optimizations for my customers without telling them; it’s not usually worth mentioning. I assume what the author intended to convey was that this change caused no interruption of service, so there was no need to contact or warn the customer.
This is similar to AWS S3 object storage -- AWS has changed how it stores S3 data over the years, but as long as the API responds the same way every time, it's all good. Personally I would probably do some A/B testing -- migrate half the workload and compare A to B to see if the new system is performing better before migrating the other half.
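For what it's worth, the "compare A to B" step could look roughly like the sketch below. It assumes you can collect request latencies (in milliseconds) from the VMs still on the old hosts and from the half already migrated. The function names, the 5% tolerance, and the choice of median and p95 as the metrics are all made up for illustration; nothing here comes from the article.

```python
from statistics import median, quantiles


def p95(samples):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20)[-1]


def safe_to_continue(old_ms, new_ms, tolerance=0.05):
    """Return True if the migrated half (new_ms) is no worse than the
    unmigrated half (old_ms), within `tolerance`, on both median and p95.

    `tolerance` is a hypothetical knob: 0.05 allows a 5% regression.
    """
    limit = 1 + tolerance
    return (
        median(new_ms) <= median(old_ms) * limit
        and p95(new_ms) <= p95(old_ms) * limit
    )


# Hypothetical latency samples, for illustration only.
old_half = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
new_half = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

if safe_to_continue(old_half, new_half):
    print("new hosts look no worse; migrate the other half")
else:
    print("regression detected; stop and investigate")
```

Checking a tail percentile as well as the median matters here, because a migration can improve the typical request while making the slowest ones worse.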
No, it's not. S3 has a very well-defined API with easily measurable performance parameters. So AWS updates can make sure they don't make things worse.
This is not possible with a client's workload unless you can actually test it. That's why AWS will warn you multiple times if they need to migrate your EC2 instance onto a different hardware node. Even if it is technically "better".
Of course, the fact that clients trust their workloads to this guy probably means that there was nothing important there.
The author also converted some of these VMs to jails, so I assume they have root on the VMs (and the customers want them to admin the host). That means they should be able to see the application level performance metrics.
The workload got several times faster. The customer’s only concern was that they might be accidentally running on a more expensive instance.
In every system I’ve worked on, the agreement is in terms of an SLO. We never gave our customers any sort of expectation (or guarantee) that we wouldn’t suddenly wildly beat our SLO targets (and, in fact, we often did, due to routine upgrades).
Having said that, certain customers dictate production freezes during launches, or only want to run stuff that’s been baked in production elsewhere for 3-6 months. Upgrading those customers behind their backs would be unacceptable, especially because they pay extra for a crappier but more stable setup.