I find the entire history of improvements to Java’s String class enjoyable to read about.
Over the years, the implementation of Java’s String class has been improved again and again, offering performance improvements and memory usage reduction. And us Java developers get these improvements with no work required other than updating the JRE we use.
All the low-hanging fruit was taken years ago, of course. These days, I’m sure most Java apps would barely get any noticeable improvement from further String improvements, such as the one in the article we’re discussing.
When I started my career in software development, SDE, and soon advanced to SRE, I hated Java. The extreme OOP paradigm made enterprise class situations impossible to understand. But after a few short years, I began to appreciate it as a real, battle-hardened ecosystem. Now, I consider it much better than modern trends such as Rust and Python.
These kinds of niche optimizations are still significant. The OOP model allows them to be implemented with much less fanfare. This is in the context of billion-dollar platforms. With some basic performance testing and API replays, we're saving thousands of dollars a day. Nobody gets a pat on the back. Maybe some pizza on Friday.
The mind-blowing moment for me with Java came about 5 years into using it, when I encountered the idea - via some smart colleagues - that none of the extra 'stuff' is intrinsic to the language but rather is self-imposed.
Turns out you can write Java without the stuff. No getters and setters, no interfaces or dependency injection, no separate application server (just embed one in your jar). No inheritance. Indeed no OOP (just data classes and static methods).
Just simple, c-like code with the amazing ecosystem of libraries and the incredibly fast marvel that is the JVM and (though this is less of a deal now with LLM autocomplete) a simple built in type system that makes the code practically write itself with autocomplete.
It's truly an awesome dev experience if you just have the power / culture to ignore the forces pressuring you to use the 'stuff'.
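To make the "data classes and static methods" style concrete, here is a minimal sketch. All names (`LineItem`, `Order`, `Orders`) are made up for illustration, and it assumes Java 16 or newer for records. Records do generate accessor methods, but nobody has to write or maintain them.

```java
import java.util.List;

// Hypothetical example types: plain immutable data, nothing else.
record LineItem(String sku, int quantity, long unitPriceCents) {}

record Order(String id, List<LineItem> items) {}

// Behaviour lives in static functions that take data and return data.
final class Orders {
    private Orders() {}

    // Sum of quantity * unit price over all items, in cents.
    static long totalCents(Order order) {
        long total = 0;
        for (LineItem item : order.items()) {
            total += item.quantity() * item.unitPriceCents();
        }
        return total;
    }

    public static void main(String[] args) {
        Order order = new Order("o-1", List.of(
                new LineItem("apple", 3, 50),
                new LineItem("pear", 1, 120)));
        System.out.println(totalCents(order));
    }
}
```

No interfaces, no injection container, no inheritance: just a function you can call and test directly.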
The over engineering creeps in anywhere there is collaboration. It’s not a Java thing, it’s a corporate thing. The new teammate who refactors the perfectly working sql pipeline; the teammate who makes story points a required field on your trouble ticket system, the teammate who just saw a conference talk on rust and wants to apply it etc. Most engineers are not zen masters seeking out simplicity; they are lost with poor grasp of desired business outcomes so they procrastinate by adding complexity and indirection and focusing on tech as if it were the destination not the journey.
> lost with a poor grasp of desired business outcomes so they procrastinate by adding complexity
I have come to see this as a mix of business people and developers not doing their jobs to protect their paycheck. Business people, if they want to succeed, need to be converging on a strategy that makes money. Developers need to have a strategy for removing technical barriers to realizing that strategy. The lack of a business strategy often makes an overly general technical platform look attractive.
> focusing on the tech as if it were the destination
So common. Complexity should be considered the enemy, not an old friend.
True, but there's also the bored engineers who just can't force themselves to write enterprise code unless they make it fun for themselves. I'm absolutely convinced this is why Clojure even exists and is so widely used in fintech.
The extreme focus on multiple layers of patterns, where actually a simple function would have sufficed IS a Java ecosystem and culture thing. Just way too many people doing Java, who have never learned another language or ecosystem, let alone paradigm, thinking "I learned Java, Java can do everything, I don't need to learn anything else!" and now feeling they need to solve every problem by applying design patterns mindlessly.
Well, not by talking about AbstractFactoryProxy, that much is for sure. Rather by talking about which parts of the system are modular and what kind of flexibility the system allows for, what capabilities it has. Nowhere in that picture does a low level implementation detail like an AbstractBlaBlubFooBar enter the conversation.
There is more to computer programming than the OOP clutter.
Yes and: Mocks (mocking?) is a code stench. Just flip the ownership relationship(s). As Riel, and surely many others, have laboriously explained, to a mostly uncaring world.
Prefer inheritance if the first thing you're gonna have to do under composition is to crank out a bunch of delegate methods from the parent object to the contained child, to make the parent substitutable for the child, and the child won't be replaced at run time.
Probably. I personally haven't done anything like that. Though I'm not sure I follow.
I've mostly done ASTs and (scene) graphs. I prefer dumb objects, stuffing most behavior and logic in an Interpreter (design pattern) implementation. Being a very simple bear, my working memory can't keep track of all the smarts scattered all about.
My current example is processing SQL statements; think expression evaluator (subset) for a very specific use case. I tend to use so-called "External" Iterators for walking trees, to keep all the logic in one place, versus Visitors, Listeners, or even Active Objects. That is feasible for this use case because it's bounded and unlikely to change (e.g. extension points).
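A rough sketch of what "dumb nodes plus an external iterator plus one interpreter" could look like. This is an illustration only, not the commenter's code: the node types (`Num`, `Add`, `Mul`), `PostOrder`, and `Interpreter` are invented names, and it assumes Java 17 for sealed interfaces and records.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Dumb AST nodes: data only, no eval() or accept() methods on them.
sealed interface Expr permits Num, Add, Mul {}
record Num(long value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

// External iterator: the caller pulls nodes in post-order (children before parent).
final class PostOrder implements Iterator<Expr> {
    private final Deque<Expr> output = new ArrayDeque<>();

    PostOrder(Expr root) {
        // Classic two-stack traversal; 'output' ends up holding the post-order sequence.
        Deque<Expr> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Expr e = pending.pop();
            output.push(e);
            if (e instanceof Add a) {
                pending.push(a.left());
                pending.push(a.right());
            } else if (e instanceof Mul m) {
                pending.push(m.left());
                pending.push(m.right());
            }
        }
    }

    @Override public boolean hasNext() { return !output.isEmpty(); }

    @Override public Expr next() {
        if (output.isEmpty()) throw new NoSuchElementException();
        return output.pop();
    }
}

// All evaluation logic sits in this one place, driven by the iterator.
final class Interpreter {
    private Interpreter() {}

    static long eval(Expr root) {
        Deque<Long> values = new ArrayDeque<>();
        Iterator<Expr> it = new PostOrder(root);
        while (it.hasNext()) {
            Expr e = it.next();
            if (e instanceof Num n) {
                values.push(n.value());
            } else if (e instanceof Add) {
                long r = values.pop(), l = values.pop();
                values.push(l + r);
            } else if (e instanceof Mul) {
                long r = values.pop(), l = values.pop();
                values.push(l * r);
            }
        }
        return values.pop();
    }
}
```

The trade-off is the one described above: adding a node type means touching the iterator and the interpreter, which is fine when the set of node types is bounded.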
YMMV of course. Now I'm just babbling, apologies, and probably went too far off topic.
I've seen Java described as made for companies to be able to rotate out mediocre programmers as efficiently as possible without letting them mess things up easily, and it makes a lot of sense from that perspective. Barebones semantics to the point of being Spartan (can't even define your own operator overloads), easy to take another class and copy it with small modifications but not mess it up for anyone else (inheritance)..
Then there's C# which most anyone who's enthusiastic about software dev will find far nicer to work with, but it's probably harder for bargain basement offshore sweatshops to bang their head against.
I really don't think this stance aged well, even if it was closer to true way back when. IMO the spartan language is now Go, and Java has ended up the boring open source workhorse. The JVM is very performant compared to many stacks these days (Python, Ruby, Node) while still having a very compelling concurrent programming story, and it has gained a lot of nice language features from 8 onwards. Lambdas and streams were the big ones in 8, but I think virtual threads maturing and even newer things like scoped values are really compelling reasons to build a new thing in Java right now.
You need just the right amount of expressivity in a language, so that it is hard to abuse, but still allows writing easy to use libraries.
Java has gone through this evolution, implementing generics, lambdas, etc., and I believe it strikes a very good balance in not being overly complex (just look at the spec - it's still a very small language, compared to its age, unlike C++ or C#).
Go tried to re-invent this evolution, without having learnt Java's lessons. They will add more and more features until their "simple" will stop applying (though I personally believe that their simple was always just simplistic), simply because you need some expressivity for better libraries, which will later on actually simplify user code.
The problem with the JVM, compared to Go, is the GC; it requires a lot of reserved memory. Go programs use far less.
And the SDK is bulky, which can be a problem for container images - although arguably it should be considered irrelevant, as you only download base images once, if done correctly.
You're not supposed to use the runtime directly these days. jlink allows you to strip unnecessary things (like documentation for the runtime itself), extract only those parts of the runtime you need (though your project must use modules to support that), and then aggressively compress it all getting a pretty small package that runs on an empty OS with no dependencies other than libc. It's still a bunch of files, so for good user experience you would have to ship it as a container (or something like .exe or appimage), but it's really close to Go in terms of size.
It's a configurable property, and Java has a bunch of GCs to begin with.
Also, not using as much memory in these types of GCs is a direct hit to performance. And this actually shows splendidly on GC-heavy applications/benchmarks.
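For anyone who wants to see what their JVM actually picked, a small sketch using the standard `java.lang.management` API. The class name `GcInfo` is made up; the collector names printed depend on the JVM and the flags it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcInfo {
    private GcInfo() {}

    // Names of the collectors the running JVM registered.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Upper bound on heap size the JVM will try to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap (MiB): " + maxHeapBytes() / (1024 * 1024));
    }
}
```

Running it with different options, for example `-Xmx256m` together with `-XX:+UseSerialGC` or `-XX:+UseZGC`, should change both lines of output, which is the "configurable property" point in practice.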
We were paying a million a month for a custom high performance GC for a little bit but we were able to get off that with a lot of development effort and get our five 9's latency under control.
I tried and gave up on getting Keycloak to use less memory. 500-1500 MB for a server with less than 10 concurrent users is ridiculous. And that's even using an external database.
Much less of a problem in .NET (its GC tuning sits somewhere in between the two, especially when SRV GC + DATAS is in use, like in container scenarios, where Go is funnily unaware of limits set by cgroups and needs an external package to fix it). It does pre-allocate more memory than Go per se but in return yields much, much higher allocation throughput out of box. Java allows for even higher allocation throughput, having multiple more sophisticated GC implementations but as you said is not very good at reducing sustained RSS used by an application.
Off the top of my head? Bazel is the Java program I use the most. Hadoop/hive and similar stuff also heavily Java but I'm not sure how much that's in use anymore
I'm not saying there's no Java in open source. And I'm aware of the projects you mention. I don't run them though. And they definitely don't qualify as "the boring open source workhorse".
There are a couple of Java projects, and even one or two kind of successful ones. But Java in open source is very rare, not the boring workhorse.
If I worked on a project that used Bazel, then sure, I'd use Bazel every day.
But which is "the boring workhorse" of open source, if I gave you the option of Java, Make, Linux, gcc, llvm, .deb, would Java really be "the" one?
Sure, maybe you could exclude most of those as not being "boring", like llvm. But "make" wins by any measure. And of course, it's almost by definition hard to think about the boring workhorse, because the nature of it is that you don't think about it.
Checking now, the only reason I can find java even being installed on my dev machines is for Arduino IDE and my Android development environment. Pretty niche stuff, in the open source space.
Most Java applications nowadays are based 100% on open source stack with hundreds of libraries and frameworks and Java dominates enterprise space, so it is a huge open source workhorse, just more obscure than Linux, gcc etc.
Ok, we clearly have an extremely different definition of the word "the" workhorse of open source.
It doesn't mean "more than zero projects are Java based". Nor does it mean "most (opensource?) Java applications are based on open source". That latter is borderline circular, only Oracle legal shenanigans makes it not circular.
> and Java dominates enterprise space
I said nothing about enterprise. Clearly Java is HUGE in enterprise.
> so it is a huge open source workhorse
That sentence took a strange turn. Enterprise, and then back to open source?
> just more obscure than Linux, gcc etc.
Obscure? I'd expect Java to be about as strong a brand as Linux. Among developers in general I'd expect gcc to be orders of magnitude more obscure. There's no programmer out there who has not heard of Java, but many have never heard of gcc.
> Ok, we clearly have an extremely different definition of the word "the" workhorse of open source.
You said what it is not, but forgot to share your own definition.
> That sentence took a strange turn. Enterprise, and then back to open source?
What makes you so surprised? One does not exclude the other; enterprise users are users too. Most things in the Java world aren’t client-side, so many users won’t observe them directly, but open source Java technology is doing a lot of work for them, constituting a significant share of the code base.
Half of the internet is literally running that.. like unless you deliberately avoid Java stacks, you will come across it. It's one of the top 3 ecosystems in size, with JS and python being the other 2 contenders.
The lack of operator overloading is a bit annoying but in practice seldom a real problem. An operator is just a funny looking method. So what.
There are worse fundamental problems in Java. For example the lack of a proper numeric tower. Or the need to rely on annotations to indicate something as basic as nullability.
It’s a massive annoyance when working with any sort of numeric code. Or custom collections. Or whatever else the standard library enjoys that nobody else gets to use.
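To show both sides of this, a tiny sketch with `java.math.BigDecimal`, where arithmetic has to go through methods. The `Pricing` class and its parameters are hypothetical.

```java
import java.math.BigDecimal;

final class Pricing {
    private Pricing() {}

    // With operator overloading this could read: price * quantity + shipping.
    // In Java it is spelled with method calls instead.
    static BigDecimal total(BigDecimal price, int quantity, BigDecimal shipping) {
        return price.multiply(BigDecimal.valueOf(quantity)).add(shipping);
    }

    public static void main(String[] args) {
        System.out.println(total(new BigDecimal("19.99"), 3, new BigDecimal("4.50")));
    }
}
```

For one expression it is "just a funny looking method"; for a page of matrix or money math the method-call form gets noticeably harder to read.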
I remember the times on one of the professional forums when there were lots of questions about architecture in the C# sections and almost none in the Java section. An abundance of tools creates an abundance of possibilities to get confused about what’s right. In Java, many design decisions converged on some dominant design a long time ago, so you no longer think about it and focus on business. It’s sometimes as bad as getter verbosity (thankfully record style is gaining traction), but in most cases it’s just fine.
Did you actually start to appreciate the same OOP that made class situations impossible to understand or did you gradually switch to a simpler OOP, made up of mostly interfaces and classes that implement them (as opposed to extending other classes)?
In my experience OOP is actually pretty pleasant to work with if you avoid extending classes as much as possible.
> These kinds of niche optimizations are still significant. The OOP model allows them to be implemented with much less fanfare.
If you're referring to the optimization in the article posted then I would argue an OOP model is not needed for it, just having encapsulation is enough.
Also note that the OP said “avoid extending classes”, but didn’t say “avoid implementing interfaces”, so they don’t disallow inheritance in a wide sense of the word.
I think “avoid extending classes” is there because it is as good as impossible to design classes that can be extended easily in ways you do not foresee, and if you do foresee how your classes could be extended, it often is easier for your users if you made your classes more flexible, to start with.
You can get Encapsulation, abstraction, and polymorphism any number of other ways. Inheritance is the only defining property of OOP.
If you removed all the stuff related to inheritance and trying to fix the leaky abstraction that is objects, the language would be a fraction of the size (compare with Go or StandardML for how small a language without inheritance can be).
I'd say that the 'poor man's closures' aspect of OOP - that is, being able to package some context along with behaviour is the most useful part for day to day code. Only occasionally is inheritance of anything other than an interface valuable.
Whether or not this is an endorsement of OOP or a criticism is open to interpretation.
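The "poor man's closures" point in code, as a sketch with invented names (`Scaler`, `ClosureDemo`): an object that stores context in a field and a lambda that captures the same context do the same job.

```java
import java.util.function.IntUnaryOperator;

// Object version: the context (factor) lives in a field, the behaviour in a method.
final class Scaler implements IntUnaryOperator {
    private final int factor;

    Scaler(int factor) {
        this.factor = factor;
    }

    @Override
    public int applyAsInt(int x) {
        return x * factor;
    }
}

final class ClosureDemo {
    private ClosureDemo() {}

    // Closure version: the lambda captures 'factor' from the enclosing scope.
    static IntUnaryOperator scaler(int factor) {
        return x -> x * factor;
    }

    public static void main(String[] args) {
        IntUnaryOperator asObject = new Scaler(3);
        IntUnaryOperator asClosure = scaler(3);
        System.out.println(asObject.applyAsInt(7) == asClosure.applyAsInt(7));
    }
}
```

Note that the only inheritance involved is implementing an interface, which matches the observation above.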
> Did you actually start to appreciate the same OOP that made class situations impossible to understand or did you gradually switch to a simpler OOP, made up of mostly interfaces and classes that implement them (as opposed to extending other classes)?
My thoughts exactly. Give me more classes with shallower inheritance hierarchies. Here is where I think Go’s approach makes sense.
Yes. I moved a few repositories from Java 8 up to Java 21.
Java 8 -> 9 is the largest source of annoyances, past that it's essentially painless.
You just change a line (the version of the JRE) and you get a faster JVM with better GC.
And with ZGC nowadays garbage collection is essentially a solved problem.
I worked on a piece of software serving almost 5 million requests per second on a single (albeit fairly large) box off a single JVM and I was still seeing GC pauses below the single millisecond (~800 usec p99 stop the world pauses) despite the very high allocation rate (~60gb/sec).
I'd love to hear more about this, especially the historical context, but I don't have any good Java writeups/articles on it. Would you mind sharing some suggestions/pointers? I'd very much appreciate it.
A good starting point is Joshua Bloch’s Effective Java. He shares some stories there from Java’s early days, and - at least in passing - mentions some aspects of the String class’s history.
Ah, I certainly remember these anecdotes! What other resources (even tidbits) would you recommend for more modern Java? Original articles like this one should be treasured.
String compression was one. tl;dr: the JVM supports Unicode for strings, but since JDK 9 it stores a string with one byte per character where possible (Latin-1) and falls back to UTF-16 otherwise (previously it was always UTF-16). It's not actually doing UTF-8.
Depending on what sort of document you're looking for, you might like either the JEP: https://openjdk.org/jeps/254
(I think I saw a feature article about the implementation of the string compression feature, but I'm not sure who wrote it or where it was, or if I'm thinking about something else. Actually I think it might've been https://shipilev.net/blog/2015/black-magic-method-dispatch/, despite the title.)
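The idea behind the JEP can be sketched in a few lines. This is not the JDK's code, just an illustration of the rule "one byte per char if every char fits in Latin-1, otherwise two bytes per char"; `CompactSketch` and its methods are made-up names.

```java
final class CompactSketch {
    private CompactSketch() {}

    // True if every UTF-16 code unit fits in one byte (the Latin-1 range 0x00..0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Approximate size of the character payload under the compact scheme,
    // ignoring object headers and other per-string overhead.
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));
        System.out.println(payloadBytes("h\u20acllo")); // contains the euro sign, U+20AC
    }
}
```

A single character outside Latin-1 switches the whole string to two bytes per char, which is why mostly-ASCII workloads benefit the most.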
Absolutely love it. Thanks a lot. A fancy hit me yesterday and I've been looking through JDK's String commit history to see little tidbits that I could grab.
Shipilev's website looks like a fascinating resource. I appreciate the pointer!