> As an experiment, I decided to migrate two hosts (each with about 10 VMs) of a...

draga79 · on Oct 4, 2024

I've a contract with those clients, and I can move the VMs, change the services, etc. freely as long as it doesn't cost more than the amount we've previously set.

Otherwise, I'd never dare to do something like that.

And I'm not so crazy as to do such an operation without the appropriate tests and foundations. Of course, when I started, I had all the conditions to be able to do it, and I had already conducted all possible tests. :-)

codezero · on Oct 4, 2024

The client is paying for the VM. The underlying system is an abstraction. As long as service agreements weren’t interrupted I don’t see the problem. It sounds shady to say “without telling them,” because saying so implies they should have been told. I do a lot of optimizations for my customers without telling them; it’s not usually worth mentioning. I assume what they intended to convey was that this change caused no interruption of service so there was no need to contact or warn the customer.

kiwijamo · on Oct 4, 2024

This is similar to AWS S3 object storage -- AWS has over the years changed how they store their S3 data -- however as long as the API responds the same way every time it's all good. Personally I would probably do some A/B testing -- migrate half the workload and compare A to B to see if the new system is performing better before migrating the other half.
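A minimal sketch of that comparison, assuming you can collect wall-clock runtimes for the same task from both halves. The function names, the threshold, and the sample numbers are all hypothetical and only there for illustration:

```python
from statistics import median


def speedup(baseline_secs, candidate_secs):
    """Return median(baseline) / median(candidate); > 1 means the candidate is faster."""
    if not baseline_secs or not candidate_secs:
        raise ValueError("both groups need at least one sample")
    return median(baseline_secs) / median(candidate_secs)


def safe_to_migrate_rest(baseline_secs, candidate_secs, min_speedup=1.0):
    """True if the migrated half (B) is at least as fast as the untouched half (A)."""
    return speedup(baseline_secs, candidate_secs) >= min_speedup


# Hypothetical runtimes in seconds for the same job on each half.
group_a = [120.0, 118.5, 125.2, 119.9]  # still on the old system
group_b = [41.0, 39.5, 44.1, 40.2]      # already migrated

if safe_to_migrate_rest(group_a, group_b):
    print(f"B is {speedup(group_a, group_b):.2f}x faster; migrate the other half")
else:
    print("B is slower; hold the migration and investigate")
```

Medians are used rather than means so a single slow run doesn't skew the result. With real workloads you would want more samples and probably more than one metric.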

cyberax · on Oct 4, 2024

No, it's not. S3 has a very well defined API with easily measurable performance parameters. So AWS updates can make sure they don't make things worse.

This is not possible with a client's workload unless you can actually test it. That's why AWS will warn you multiple times if they need to migrate your EC2 instance onto a different hardware node. Even if it is technically "better".

Of course, the fact that clients trust their workloads to this guy probably means that there was nothing important there.

hedora · on Oct 4, 2024

The author also converted some of these VMs to jails, so I assume they have root on the VMs (and the customers want them to admin the host). That means they should be able to see the application level performance metrics.

cyberax · on Oct 4, 2024

Yeah, so he got surprised when his customer mentioned how their workloads became faster.

cyberax · on Oct 4, 2024

> The client is paying for the VM. The underlying system is an abstraction.

The VM change was significant enough to alter the runtime of a task several times over. This is NOT a small inconsequential change.

You _have_ to warn your clients when you do stuff like this.

hedora · on Oct 4, 2024

The workload got several times faster. The customer’s only concern was that they might be accidentally running on a more expensive instance.

In every system I’ve worked on, the agreement is in terms of an SLO. We never gave our customers any sort of expectation (or guarantee) that we wouldn’t suddenly wildly beat our SLO targets (and, in fact, we often did, due to routine upgrades).

Having said that, certain customers dictate production freezes during launches, or only want to run stuff that’s been baked in production elsewhere for 3-6 months. Upgrading those customers behind their backs would be unacceptable, especially because they pay extra for a crappier but more stable setup.

cyberax · on Oct 4, 2024

> The workload got several times faster.

And he found it out only by talking to the customer, indicating approximately zero testing from his side.

It might have gone the other way easily.

tbrownaw · on Oct 4, 2024

If your customer panics at you over something you did, you might have done goofed.

codezero · on Oct 4, 2024

They also made it clear they tested it by deploying to a system not hosting clients' VMs.