
Also, Google's infrastructure is so vertically integrated that it's not obvious how to offer it to others. E.g. Borg assigns 1 IP to each machine, which means every pod-equivalent has to listen on a dynamically allocated port, so Google eschewed DNS in favor of a system that more easily assigned dynamic IP:port pairs without relying on TTLs (BNS/Chubby). Google also decoupled storage from compute and had a bunch of custom databases built to run in that paradigm (GFS/CFS/BigTable).
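
Roughly, and purely as an illustration (hypothetical names, paths, and addresses - not the actual BNS/Chubby API), the difference is that DNS hands back host IPs only, while a BNS-style registry hands back complete ip:port endpoints that each task registers at startup:

    # Illustrative sketch only; every name and address below is made up.
    dns = {
        # A DNS A record maps a hostname to IPs only; the caller still has to
        # learn the dynamically allocated port some other way.
        "search-backend.example.com": ["10.0.0.5", "10.0.0.6"],
    }

    bns = {
        # A BNS-style registry maps a job name to (ip, port) pairs that each
        # task registers when it starts, so changes don't wait on DNS TTLs.
        "/bns/cell-a/search-backend": [("10.0.0.5", 31827), ("10.0.0.6", 29411)],
    }

    def resolve_dns(host):
        return dns[host]    # IPs only; the port is out of band

    def resolve_bns(job):
        return bns[job]     # full endpoints, one per task

    print(resolve_dns("search-backend.example.com"))
    print(resolve_bns("/bns/cell-a/search-backend"))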


I’ve heard Google considers (considered?) its infrastructure prowess a secret sauce. Could it be that they didn’t want anyone else to benefit from it?

They had a years-long head start on most of the infrastructure technology that’s ubiquitous now, but didn’t bother to offer it to others. Perhaps they thought only Google should benefit from it, while Amazon saw a business opportunity with AWS.


I'm also curious about this.

The empirical evidence seems to point that way. A lot of people noticed that much of Google's internal tooling became open source, or got open-source equivalents from Google (Borg/Kubernetes, Blaze/Bazel, etc.), only after a critical mass of former Google employees had left and, since they liked the tooling, started replicating it as open source software outside of Google. In some cases that open source software got traction and Google seemed to be falling behind in some areas.

Which basically tells me that in the software world, you can't really have trade secrets. I could be wrong, though.


One of the stories I provided for the documentary that didn’t make it (but is one of my favorites) relates to an early 2015 discussion I had with Brian Grant (googler / architect / worked on Omega / also in the documentary).

Brian would write very succinct, clear descriptions of problems encountered inside Google, break them down into “what worked, what didn’t work”, and then summarize a design seed in a GitHub comment (several such subsystems were subsequently implemented and have been fairly successful).

I would keep a mental tally of some of these comments and assign a score based on how much of Google's R&D investment was compressed into a 200-300 word comment. I guesstimated some at upwards of $100k a word, and jokingly asked Brian whether he was authorized to share $10M worth of Google's secret sauce. He laughed, but reminded me of some of the same things mentioned in the documentary (which we all knew on the project at the time) - paraphrased:

“If you can seed a community project that includes some of these patterns built on that research, and that enables $10 billion of application-team value, and Google can capture $1B of that in Google Cloud, that’s a pretty good return on investment.”

And among the key contributors I worked with from Google there was definitely the very human and very relatable (for HN) desire to build something that shows those internal ideas off and “do a successor to Borg that really takes what we’ve learned and doesn’t just copy it”.

Even if it was a business strategy to help Google Cloud catch up… the beauty of open source is that you can benefit from it regardless, and it moves the industry forward in a meaningful way.


> Brian would write very succinct, clear descriptions of problems encountered inside Google, break them down into “what worked, what didn’t work”, and then summarize a design seed in a GitHub comment (several such subsystems were subsequently implemented and have been fairly successful). I would keep a mental tally of some of these comments and assign a score based on how much of Google's R&D investment was compressed into a 200-300 word comment

In most places I have seen, architects seem to spend most of their time writing overly verbose documents and detailed specifications, making presentations, etc. There is pressure to show you are in charge of the project and accountable for it, so you end up with all these ceremonies where you create heaps of content and dump it onto the teams.

Whereas the sort of situation you described is really where the role of an architect shines: making sense of the chaos and lighting the way for the teams. You often don't need more than a short message or a brief meeting to convey these ideas, but such things don't lend upward visibility to what you're doing. I've always wondered whether the life of an architect at Google involves these same ceremonies or is quite different from most organizations.


If you look at the development of a lot of "Big Data" stuff (Hadoop is a good example), Google shipped papers while FB shipped code (not entirely true for Hadoop, but much more true for some of Google's later papers).


This was, literally, one of the arguments for building and releasing Kubernetes. The rise of Hadoop made it much harder to justify MapReduce being different.

If we had just talked about Borg but hadn't shipped code, someone else might have set the agenda rather than Kube.


Don't forget Prometheus (a clone of Borgmon by an ex-SRE).


There is no way to perceive Prometheus as Google falling behind, though. It's more of a lesson about the perverse willingness of the public to subject themselves to terrible technologies that Google began deprecating years ago.


Why is Prometheus terrible? It's pretty cool.


At small scale it's fine. Its architecture totally forecloses any possibility of distributed evaluation, so you eventually hit a brick wall on scalability. Its human-readable-text-over-HTTP scraping scheme is also hilariously inefficient. It is not, in short, "planet scale".

https://www.vldb.org/pvldb/vol13/p3181-adams.pdf
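
To make the "human-readable-text-over-HTTP" point concrete: a scrape is plain text in which every sample repeats the full metric name and label set - easy to read and debug, but verbose on the wire. A rough sketch of what an exporter emits (illustrative metric and values, not taken from any real system):

    # Builds a tiny /metrics payload in the Prometheus text exposition format.
    samples = [
        ("http_requests_total", {"method": "get", "code": "200"}, 1027),
        ("http_requests_total", {"method": "post", "code": "200"}, 3),
    ]

    def expose(samples):
        lines = ["# TYPE http_requests_total counter"]
        for name, labels, value in samples:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
        return "\n".join(lines) + "\n"

    print(expose(samples))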


Well, I get the point of Google not falling behind with their tech, but even most Google competitors aren't working at Google scale.

For Prometheus they could have just chosen to aim at smaller businesses, since that's 99% of the businesses out there.

So it could just be a tradeoff.


Well, to be fair, they are offering Prometheus as a service, and they built it as a façade upon Monarch, which pretty convincingly demonstrates the superiority of Monarch.

I just find a kind of perverse humor in the fact that outside Google Prometheus is viewed as alien technology - it even has that condescending name - but inside Google Borgmon was considered the punch line of memes and/or a hazing ritual.


IIRC, a core point in the early days of Prometheus was a greatly simplified system that built on some of the same ideas, but without the parts that made it the punch line of memes or a hazing ritual for newcomers.

While the text-based metrics format isn't the best, it's a reasonable compromise, and I've found it provides a good enough common interface that, if needed, I can change what collects from it later on.

Are there better options? SURE! I can, however, usually get 80% of the problem solved with a Prometheus integration and worry about scaling later.
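
For example, with the official Python client the whole integration is a few lines, and whatever scrapes the endpoint can be swapped out later without touching the application (a minimal sketch; the metric name and port are made up):

    # Minimal exporter using prometheus_client; serves the Prometheus text
    # format on http://localhost:8000/metrics for any compatible scraper.
    import random
    import time

    from prometheus_client import Counter, start_http_server

    REQUESTS = Counter("app_requests_total", "Requests handled", ["status"])

    if __name__ == "__main__":
        start_http_server(8000)
        while True:
            REQUESTS.labels(status=random.choice(["ok", "error"])).inc()
            time.sleep(1)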


human-readable-text-over-HTTP is planet scalable


Google and FB (and Apple, but less so?) have been making their own servers and hardware for years...

They protect this aggressively, and the tech is impressive.

I remember the first time I accidentally saw a custom Google node. I had gone there to pick up a friend, and at his desk I saw a really nice-looking PCB he had forgotten about. I exclaimed "WHAT's that?" - I was pretty familiar with a wide range of server boards, and I knew immediately that it was custom...

He panicked and covered it, and we didn't really talk about it much, but it was known really early on that they were building a lot of custom kit.

~2005 maybe?


The mistake also reveals a constrained imagination: Google could only conceive of offering others what it had internally, struggled, and continues to struggle.

Amazon didn't try - they just knew that they were good at building high scale distributed systems, and thought they could build services for others. That's what S3 and SQS and EC2 were - built from scratch, not trying to figure out how to externalize the internals.


GCP was built from scratch as well.


It's more complicated, but I think everything is at least a substantial layer on top of an internal offering. E.g. GCE runs VMs in Borg using the same physical infrastructure used for other services, but GCE has its own API and virtualized networking on top that hides Borg and was not (at the time) used for any internal services.



