
That's why Hadoop, for example, moves the calculations onto the storage nodes, if possible

You make a good point about I/O and I actually wanted to comment something along the lines of "why not Hadoop?" since the programming model looks very similar but with less mature tooling.

However, now that I think about it, the big win of serverless is that it is not always on. With Hadoop, you build and administer a cluster that is only efficient if you use it constantly. A serverless setup would suit jobs that only run occasionally.
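The trade-off can be sketched with a back-of-envelope break-even calculation. All the numbers below (flat cluster price, per-core-hour rate, core count) are illustrative assumptions, not real quotes:

```python
CLUSTER_PER_MONTH = 2000.0        # flat cost, busy or idle (illustrative)
SERVERLESS_PER_CORE_HOUR = 0.05   # billed only while running (illustrative)
CORES = 100

def monthly_serverless_cost(busy_hours: float) -> float:
    """Cost of running CORES cores on demand for busy_hours per month."""
    return busy_hours * CORES * SERVERLESS_PER_CORE_HOUR

# Busy hours per month below which pay-per-use beats the flat-rate cluster
break_even_hours = CLUSTER_PER_MONTH / (CORES * SERVERLESS_PER_CORE_HOUR)
print(break_even_hours)  # 400.0, i.e. roughly 55% of a 730-hour month
```

With these made-up rates, serverless only loses once the cluster is busy more than about half the time, which is exactly the "only run occasionally" case.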



In my experience, the cloud is so slow and expensive for these tasks that even if your job only runs once per day, you're better off getting a few affordable bare metal servers.

Plus, most tasks that only run occasionally tend not to be urgent, so instead of parallelizing to 3000 concurrent executions, as the article suggests, you could just wait an hour.

Serverless is only useful if you have high load spikes that are rare but super urgent. In my opinion, that combination almost never happens.


This is us exactly. We pay around $2k/month for one of our analytics clusters: hundreds of cores, over 1 TB of RAM, many TB of NVMe. Some days, when the data science team is letting a model train (on a different GPU cluster) or doing something else, the cluster sits there with a load of zero. But it's still an order of magnitude cheaper than anything else we've spec'd out.


Are you potentially interested in renting out idle capacity for batch jobs? If so, what kind of interconnect do you have? Feel free to contact me (info in my profile).


Sadly, given how cheap the infra is, it's not worth it to us to share it with someone else. Say we could cost-share 50%, saving us $12k/yr: we would spend a lot more than $12k setting up a system, plus all the headaches that arise from sharing the infra.

But thanks for the offer! The natural market forces will drive cloud computing prices down the same way they've driven everything else down. But until then, roll-your-own can save loads.


I figured it might be unreasonable, but thanks for responding.

Yeah, I was particularly curious because I was unable to find better public offers than AWS (whose homebrew 100 Gbit/s MPI drops Infiniband's hardware-guaranteed delivery to avoid statically-allocated-buffer issues in many-node setups, allowing quite impressive scalability) or Azure (with their 200 Gbit/s Infiniband clusters), at least for occasional batch jobs.

I wouldn't ask if I could DIY for less than AWS costs, but owning RAM is expensive. And for development it would be quite enticing to co-locate storage with compute and rent space on those NVMe drives for the hours or days you're running, e.g., individual iterations on a large dataset for accurate profile-guided optimizations (by hand or by compiler). Iterations only take a few minutes each, but loading what is essentially a good fraction of the RAM (minus scratch space, and some compression is typically possible) over the network makes setup take quite a long time compared to a single iteration.


I think you are missing the point of the article. I read it as "this desktop software could use serverless to hand me a re-encoded 4 GB video file in seconds by running 3000 tasks" (provided my bandwidth could handle that). My gripe with that is privacy: I do not want my data processed elsewhere.

Still, I would not be opposed to such a client-server(less) architecture where my slower devices seamlessly integrate with my personal server for faster processing of compute-heavy tasks.

It's not that this hasn't been done before (thin clients, anyone? Even the X server model is exactly like that), but a similar approach could make a comeback at some point.


No, your example actually solidifies my argument.

For most people, uploading that 4 GB file for cloud processing will take an hour. But re-encoding 2 h of video with GPU acceleration only takes 15-20 minutes. So no matter how fast serverless is, it will always have to wait for the upload and download, which may take longer than all the computation combined.
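Rough arithmetic backs this up. Assuming a ~10 Mbps residential uplink (an assumed but typical figure), moving 4 GB eats the better part of an hour before any compute even starts:

```python
def transfer_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to push size_gb gigabytes over a link_mbps uplink."""
    bits = size_gb * 1e9 * 8
    return bits / (link_mbps * 1e6) / 60

# 4 GB over a 10 Mbps uplink
print(round(transfer_minutes(4, 10)))  # 53 minutes, before the download back
```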

As for X, using it over the internet is a pain. It is optimized for low-latency connections, which is the opposite of putting the computation in a cloud hundreds of milliseconds of ping away.


Oh, I am not saying we are there yet as far as bandwidth is concerned, but even a 4G connection from a slow device like a phone looks much better today. Uploading a 4 GiB file at 50 Mbps takes less than 15 minutes, or about 5 minutes at 150 Mbps, and no phone would re-encode it in less time than that. 5G goes up to 1 Gbps, or 32 s for a 4 GB file, and there's your case. Wouldn't you find it nice if your phone could do this in 30 s without burning like a stove as CPU usage spikes?
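Sanity-checking those figures (in GiB throughout; at 1 Gbps the GiB figure is ~34 s, versus 32 s for decimal GB):

```python
GIB = 1024**3

def transfer_seconds(size_bytes: float, link_bps: float) -> float:
    """Seconds to move size_bytes over a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps

for label, bps in [("50 Mbps", 50e6), ("150 Mbps", 150e6), ("1 Gbps", 1e9)]:
    print(label, round(transfer_seconds(4 * GIB, bps)), "s")
# 50 Mbps → 687 s (~11.5 min), 150 Mbps → 229 s, 1 Gbps → 34 s
```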

Again, we are not there yet, but we are not that far off either.

My mention of X was to highlight that this is just old technology under new constraints (move things that do not need low latency from the thin client onto the fast server), but how it's applied will make or break it.


In that case, I agree. If we had WiFi-speed internet, Lambda would be amazing for mobile apps.

But my lived reality is that I have to go to the upper floor of my parents' house to get 2G reception. And they only live a 10-minute drive from the town hall.



