> When performing string concatenation operations, it is more advantageous in terms of performance to use std::ostringstream rather than std::string. This approach is also used elsewhere, such as debug_utils and node_errors.
(From the Node.js GitHub issue.) Sounds like this guy is mixing up his Java knowledge with C++ knowledge.
C++ streams are frankly insane: Loads of implicit state, needing to set about a half dozen flags to do any nontrivial formatting, running the risk of accidentally "poisoning" all downstream operations if you forget to reset any of that state, the useless callbacks API [1], obfuscated function names (xsgetn, epptr, egptr), a ridiculously convoluted inheritance hierarchy that includes virtual/diamond inheritance [2], and use of virtual functions for simple buffer manipulation. These were all bad decisions even at the time.
C++ dev for 20+ years. I refused to use them; they encapsulated everything I hated about C++: a ton of implicit actions and gotchas. It's a gun in hand, foot in target.
Streams are the best evidence that C++ was an experiment: a sandbox for trying out a bunch of different language ideas.
Overloading the shift operators for this purpose is prima facie insane, and anyone who has single-stepped through a C++ "hello world" program can figure out it isn't remotely efficient, but it was certainly creative.
But here `<<` doesn't really feel like an actual operator, just an ordinary separator like `,`. (Of course, `,` is actually an operator in C++. But anyway.) It is completely alien to every other C++ convention, before and since.
That is more justifiable, as each delimited range is unlikely to be more complex than a call expression. Operator overloading is okay, as long as you understand how the operators will be used.
> The shift operators are literally the nicest and most innocuous design decision of C++ streams
Which just shows how bad streams are. Overloading the shift operators was a terrible decision on multiple levels, but you're right -- that's not the worst streams sin.
> "I recently learned that some Node.js engineers prefer stream classes when building strings, for performance reasons." Pretty much tells you everything you need to know about node js, I guess.
Google Closure Library includes a StringBuffer class. [1]
I recall it having explanatory notes, but I don't see them in the code now. JavaScript engines can optimize string concatenation into an in-place edit if there is only one reference to the first string. The StringBuffer class keeps that reference count at one, guaranteeing the optimization is available even if the StringBuffer itself is shared.
For what it's worth, even in Java the compiler is often smart enough to replace naive string concatenation with equivalent StringBuilder usage (although I don't know if it is smart enough to do that in a for loop like this)
It’s not, and they basically gave up on it. Historically the OpenJDK would compile direct concatenation to StringBuilder so e.g.
a += b + c;
Would become something like
StringBuilder sb = new StringBuilder();
sb.append(a);
sb.append(b);
sb.append(c);
a = sb.toString();
However it was never capable of doing such an optimisation across loops.
In Java 9, they gave up on such static optimisations; instead the compiler now emits an invokedynamic-based string concatenation (see JEP 280), which the runtime (and JIT) can then hook into.
Does that handle loops? Unclear. Benchmarks / testimonies from back then (java9/java10 days) hint that no, direct concatenation remains much slower than StringBuilder. But I didn’t find anything super recent so maybe they improved the behaviour in the meantime.
To less informed readers, "smart enough" here means:
Imagine you have two Java Strings: a and b
If you write the Java code:
String c = a + b;
The Java compiler will roughly replace this code with:
String c = new StringBuilder(a).append(b).toString();
I'm not sure I would say this is "smart". Rather, it is just a hack to let String have an overloaded plus (+) operator. Normally, Java does not allow operator overloading the way C++ does.
First, it has nothing to do with overriding the + operator. The operation is statically decidable, so the compiler is perfectly aware that this is a string concatenation (not a numerical addition); String is a special builtin, and the compiler can generate anything it wants. Which is exactly what it does: they just originally decided to implement string concatenation as StringBuffer/StringBuilder ops, while the addition of integers, longs, floats, and doubles are (or were, anyway) distinct bytecode ops. None of this requires general-purpose operator overloading support; builtins are not limited to the language's user-level semantics.
And second OpenJDK has not done that since Java 9 (JEP 280), the compiler now emits generic “string concatenation” bytecode for the runtime / jit to deal with.
A bigger problem is that iostream is still the only C++ way to read and write files. Yes, you can still use `std::fopen` and so on, but modern C++ strives to minimize type-ignorant C functions, right? The introduction of `std::format` made the formatting aspect of iostream obsolete, but iostream still has no standard alternative for its other aspects.
std::print is coming with C++23. In the meantime, there's std::format_to. You still have to dump the output into an std::ostream, but at least you don't have to use the disgusting ostream interface directly.
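A minimal sketch of that workaround, assuming a C++20 compiler with <format>:

    #include <format>
    #include <iostream>
    #include <iterator>

    int main() {
        // Format straight into the stream's buffer, bypassing
        // operator<< and its sticky formatting flags.
        std::format_to(std::ostreambuf_iterator<char>(std::cout),
                       "pi is roughly {:.2f}\n", 3.14159);
    }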
But iostream looks like the work of someone who's crazy about using every single language feature, as if there were some ulterior motive behind creating all these crazy inheritance levels, etc.
What's the alternative to string streams for building strings piece by piece in C++? Plain old string concatenation? Asking for a friend... I should run benchmarks, I guess.
Maybe not. The problem is that C++ (before C++20) has no normal print or format function; you are supposed to do everything with streams. To switch to two decimal places you would first output some magic value that sets an internal flag on the stream. Then you need to remember to restore it again.
Of course you could just use good old C printf to get some work done. But if you did that the "real" C++ programmers would sneer at you.
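To make the flag problem concrete, a small sketch of the stickiness being described:

    #include <iomanip>
    #include <iostream>

    int main() {
        std::cout << std::fixed << std::setprecision(2) << 3.14159 << '\n'; // 3.14
        // The flags persist: this later, unrelated print is also truncated.
        std::cout << 2.718281828 << '\n';                                   // 2.72
        // You have to remember to undo the damage yourself.
        std::cout.unsetf(std::ios::floatfield);
        std::cout << std::setprecision(6);
    }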
The fun thing with printf is that variadic functions cannot handle C++ classes. In the past this was doubly fun because compilers happily compiled the error into the binary instead of raising it at compile time. So any forgotten c_str() became a crash waiting to happen.
C Printf never was, never could be and never will be a suitable way to output data from C++. Now excuse me while I go through the list of thousands of predefined format macros to find out which I need to use to output a uint_fast16_t without making the compiler vomit nonsense.
> C Printf never was, never could be and never will be a suitable way to output data from C++. Now excuse me while I go through the list of thousands of predefined format macros to find out which I need to use to output a uint_fast16_t without making the compiler vomit nonsense.
printf("%d\n", (int) myfast16_t);
Not that terrible for a type that I've never used, nor seen used.
uint_fast16_t is the fastest type (for some undefined meaning of "fast") that holds at least 16 bits. I don't think the standard guarantees that it even has to fit into an int, so we are already silently discarding data. Worse, if it uses an unsigned int then you are casting from unsigned to signed with a potential undefined signed overflow involved.
I would have assumed that you don't want to use uint_fast16_t to store anything that doesn't fit in 16 bits.
If you cast to a signed type that can't represent the complete range, that still wouldn't be called "overflow" AFAIK, and no matter what you call it, it is not "undefined". And assuming 32-bit ints there is no loss of information given 16-bit values.
You can also just cast to unsigned or whatever type you think is enough (you should know). The point is, use a conversion, cast to a simple type, make your code compatible.
> I would have assumed that you don't want to use uint_fast16_t to store anything that doesn't fit in 16 bits.
I could have a 16-bit-wide bitmask in it; let's flip it: ~myMaskFast16 leaves the higher-order bits set to 0xFF..., whether I care about them or not.
> The point is, use a conversion, cast to a simple type, make your code compatible.
Casting to a type that, depending on the platform, may or may not have enough space to represent the value is not "making it compatible"; it makes it non-portable.
That's really not a valid assumption. There are 16 bit int platforms, I've worked on them. It's not even that uncommon.
Casting in this scenario is simply incorrect. The correct thing to do is to use the formatting macros, e.g. PRIuFAST16. Which no one ever does, because it's gross and most developers don't actually care about portability.
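For reference, the portable spelling looks like this (a sketch; the same macro comes from <inttypes.h> in C):

    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint_fast16_t v = 40000;
        // PRIuFAST16 expands to the right length modifier for whatever
        // width uint_fast16_t actually has on this platform.
        std::printf("%" PRIuFAST16 "\n", v);
    }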
> Of course you could just use good old C printf to get some work done. But if you did that the "real" C++ programmers would sneer at you.
As someone who has been programming in C++ since before there were C++ compilers (back in the day, you had to run it through a translator to make it into C code, then use a C compiler), I think I'm as real of a C++ programmer as anybody.
C++ programmers who sneer at you for this are fully worthy of being ignored.
Of course, printf() has its own set of issues as well.
printf is fast, but has safety problems. Also, frankly, I don't like that the format string and the variables the data comes from are separated, so you have to count out the argument order to figure out which is which. Python's f-strings get this right.
<iostreams> are a good example of a pure product of the 1990s/2000s, when the hype around Object Oriented programming was at its peak.
Almost everything related to C++ iostreams has this code smell of OOP pushed too far:
- Usage of runtime virtual dispatch where it was not necessary, causing an unavoidable negative impact on performance.
- Heavy overloading of the "<<" operator, leading to pages-long compilation errors when an overload fails.
- Hidden states everywhere with the usage of state formatters and globals in the background.
- Unnecessary complexity with std::locale which is almost entirely useless for proper internationalisation.
- Bloat. Any statically compiled binary inherits around 100 KB of dead weight when using iostream.
- Useless encapsulation with error reports done as abstracted bit flags. Which is absolutely horrendous when dealing with file I/O: It hides away the underlying error with no proper way to access it.
- Deep class hierarchy making the entire thing look like spaghetti.
- Useless abstraction with stringstream that hides the underlying buffer away, making it close to unusable on embedded safety critical systems where memory allocations are forbidden.
All of that has made <iostreams> age pretty badly, and for good reasons.
Fortunately there is a way out, thanks to the work of Victor Zverovich on std::format and libfmt [1].
I hear you; however, something like streambuf is kind of necessary as a type-erased interface for input/output of trivial objects. The C alternative is FILE*, which isn't much better and isn't as customizable either.
I agree that the formatting could have been done better, and that part is indeed handled much better in fmt, although personally I dislike format strings. It's much better than printf, granted.
Avoiding dependencies is a good thing, especially for C++, which doesn't have a widely used centralized repository and dependency manager like npm, cargo, or CPAN. For better or for worse.
And pulling Boost, let alone Qt just to avoid the occasional use of iostreams (or printf) is a bit much IMHO. I usually try to avoid Boost, as I feel it is more of a sort of beta/preview for the standard library. Don't get me wrong, it is production-worthy, but it can lead to awkward things when some boost feature ends up in the standard libraries and the project ends up with bits of both.
std::format is great because at last, we can use it without dependencies.
Yes. Several alternatives have been available for a while.
Victor's achievement has been to make the C++ committee accept the idea that a new formatter was necessary, and to bring <format> into the STL.
This was not a small task: the committee has its fair share of dinosaur gatekeepers and windmills [1]. For the best and the worst.
At least we now have a way forward to evolve away from <iostream> if we want to, with the hope of one day getting something that can entirely replace it.
[1]: Windmills: people who displace a lot of air, but not much more than air.
Is it a problem of gatekeeping, or just that std::format is actually rather complicated under the hood?
For all their flaws, iostreams have the advantage of being simple to implement: they are just overloads of the << and >> operators. std::format and the like require a lot of metaprogramming magic to work correctly. That means longer compile times, less tolerance for broken compilers, and possibly weird edge cases, which are important considerations when designing a standard that will be used everywhere. And once it's there, it is there for good, so I understand the committee being careful.
From ANSI C++ to C++20, a lot of work has gone into making metaprogramming more sensible, and computers have become more powerful, which makes the language ready for something like std::format.
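For instance, with C++20 that machinery buys compile-time checking of the format string (a minimal sketch):

    #include <format>
    #include <string>

    int main() {
        // OK: checked at compile time against the argument types.
        std::string s = std::format("{} has {} items", "cart", 3);
        // Something like std::format("{:d}", "cart") would be rejected
        // by the compiler rather than discovered at runtime.
    }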
The same thing struck me as well. This is one of the best optimization professionals on the planet, showing up with a huge improvement, and receiving some misplaced arrogance.
The lesson here is to always, always watch your own review tone, and not make this mistake.
The other lesson is that when a PR shows up with this kind of technical information attached to it, spend the 60 seconds it takes to Google for "lemire".
If I'm being super pedantic, I would argue that while `string::push_back` must take amortized constant time, `string::append` has no such guarantee [1]. So it is technically possible for `my_string += "a";` (equivalent to `string::append`) to reallocate every time. Very pedantic indeed, but I have seen a C++ implementation where `std::vector<T>` was an alias for `std::deque<T>`, so...
One thing I don't like about Lemire's phrasing is that he only looks at the current, often most widely available, implementations, and doesn't make this point explicit in most cases.
EDIT: Thankfully he does acknowledge that in a later post [2].
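A quick way to observe the amortized behaviour on a given implementation (a sketch; the growth factor is implementation-defined):

    #include <iostream>
    #include <string>

    int main() {
        std::string s;
        auto cap = s.capacity();
        for (int i = 0; i < 1'000'000; ++i) {
            s += 'a';
            if (s.capacity() != cap) {  // a reallocation happened
                cap = s.capacity();
                std::cout << "size " << s.size() << " -> capacity " << cap << '\n';
            }
        }
        // Geometric growth means O(log n) reallocations in total,
        // i.e. amortized O(1) per appended character.
    }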
I am not at all surprised. Kids these days have no idea what CPUs can do. ;)
I periodically have interview candidates work through problems involving binary search, then switch to a bounded version and ask them how to make it go faster over N elements, where N < 1e3. The answer is "just linear search, because CPUs really like doing that".
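The shape of the answer, as a sketch (real code would of course benchmark both on the target CPU):

    #include <cstddef>

    // For small, bounded N a linear scan over contiguous memory often
    // beats binary search: the access pattern is sequential, so the
    // prefetcher helps and there are no hard-to-predict branches.
    int linear_find(const int* a, std::size_t n, int key) {
        for (std::size_t i = 0; i < n; ++i)
            if (a[i] == key) return static_cast<int>(i);
        return -1;
    }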
This feels like a conversation where it would have been useful for the participants to be very explicit about the points they were trying to convey: the reviewer could have said "Isn't this a quadratic algorithm, because each call to `+=` reallocates `escaped_file_path`?" (or whatever their specific concern was; I may have misunderstood), and the author's initial response could have been "No, because the capacity of the string is doubled when necessary."
My impression is that C++ streams are on their way out -- unlikely to be deprecated (too much existing code) but also unlikely to receive any more attention. They are old enough to likely not have any actual implementation bugs at this point, but in retrospect the design bugs from the 1980s are pretty serious.
The rapid incorporation of the excellent `format` package for printing points to a future falling back at least to ANSI buffered IO and possibly raw POSIX IO.
I like C much more than C++, but even with that must say that https://github.com/fmtlib/fmt is pretty nice (which is the base for std::format). Together with pystring (https://github.com/imageworks/pystring) it makes string processing in C++ somewhat bearable (pystring is slow though because it still uses the std::string type which excessively allocates, but at least it's convenient compared to 'raw' C++ string functionality).
This being about C++ and RAII lifetimes, the problem doesn't apply to C. The only similar thing in C would be compound literals. Those have a lifetime like automatic variables (i.e. to the end of the block).
It always surprises me how slow streams are. fscanf should be relatively slow because it has to parse the format string at runtime, so the new C++ format should be (and I believe is) much faster.
That was once true, but not on most modern systems with SSDs. Cheap NVMe SSDs can sustain over 1 GB/s, which only a highly optimized SIMD parsing library can exceed.
Ah, I thought you meant the speed of parsing the payload didn't matter (scanf performance was also discussed).
Yes, parsing the format string in printf or scanf isn't a big deal.
fscanf() is also pretty slow because of thread safety: each call involves a mutex (which goes unused 99% of the time). I wonder whether the new C++ libraries have faster non-threadsafe options?
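For parsing specifically, C++17's std::from_chars is one answer: it is locale-independent and takes no locks. A minimal sketch:

    #include <charconv>
    #include <string_view>

    // Parse an int with no locale lookups, no allocation, no mutex.
    // On failure the value is left at the fallback.
    int parse_int(std::string_view s, int fallback = 0) {
        int value = fallback;
        std::from_chars(s.data(), s.data() + s.size(), value);
        return value;
    }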
In the little competitive programming I did, input parsing time was negligible compared to the allowed runtime for solving the problem. Inputs were designed so that if you had the right algorithm, you could pass easily even with terrible optimization. Fast code could be an advantage in the algorithm (but not in parsing), as it could help you "cheat" and, for example, solve a problem designed for N² with an N³ algorithm. Personally, I used iostreams, just because I found them a bit easier to type.
But then, different competitions have different rules, and maybe there are some where fscanf really is an advantage.
What is the effect of turning off synchronization with the legacy C functions? When C++ is used for I/O and no C I/O is mixed in, this should be a habit. I have the impression that most C++ books don't mention it (e.g. Primer), or mention it only late.
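Concretely, the habit in question (a sketch; note that after this, mixing printf with cout on the same stream is no longer safe):

    #include <iostream>

    int main() {
        std::ios::sync_with_stdio(false); // decouple C++ streams from C stdio
        std::cin.tie(nullptr);            // don't flush cout before every cin read
        int n;
        std::cin >> n;
        std::cout << n << '\n';
    }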
It is similar to String and StringBuilder from Java. You need to know it, remember it and use it by habit. And again, books often mention it only late (e.g. Head First).
By the way, I like the plain things from <iostream>, especially the shift operators << and >> and the ease of concatenating and handling strings. But as others mentioned, the implementation (e.g. the inheritance) looks complicated.
The whole C++ stream API is wretched and seems designed to make things clunky, slow, and unnecessarily stateful. Not to mention the wanton abuse of shift operators.
If performance matters most, write your own specialized string processing code which works on "raw" data in memory buffers. If convenience matters look at https://github.com/fmtlib/fmt (or std::format which AFAIK is less feature rich) and https://github.com/imageworks/pystring.
Anything object oriented is slower than anything procedural. Boxing and unboxing is expensive, running constructors and destructors is expensive, copying objects is expensive, virtual tables are expensive. And stuff isn't aligned greatly in memory, we can expect lots of cache misses and branch mispredictions.
The fastest possible way is to have C strings in an array and run a function over it.
If there are advantages to OOP, speed is not one of them.
> Anything object oriented is slower than anything procedural.
Well, that's not always true; it's better to use a profiler.
In many cases, it's trivial for the compiler to inline objects (e.g., used as predicates for <algorithm> routines), resulting in better performance compared to the equivalent procedural code.
> running constructors and destructors is expensive
If the object is not polymorphic (and sometimes even if it is), the compiler can inline both the constructor and destructor, resulting in exactly the same assembly output as the procedural code.
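For instance, a function object handed to an <algorithm> routine typically compiles down to the same code as a hand-written loop (a sketch):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    struct LessByAbs {
        // No virtual dispatch: the call is resolved statically and is
        // trivially inlined into std::sort's comparison sites.
        bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); }
    };

    void sort_by_abs(std::vector<int>& v) {
        std::sort(v.begin(), v.end(), LessByAbs{});
    }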
These criticisms may apply to Java, but are not so clear cut with C++. Boxing/unboxing doesn’t exist, constructors/destructors have only the overhead you’ve put into them (and as bad as iostreams is, C++’s RAII is as glorious), vtables are only built if you’re using inheritance, and std::vector stores objects linearly in memory, yielding terrific cache and prefetch performance (in contrast with Java and its pointer chasing).
If you aren't using OOP features (such as inheritance), you're not really doing OOP, despite using C++.
In the case of C++ I'd put it something like this: you can use free or costly abstractions, and OOP in general has a preference for the costly ones.
Also vector is a weird point to make, it's been some time I had to deal with Java (luckily) but arrays there are also linear AFAIK. And there are GCs that have a bump allocator for new objects (not sure if Java fits here), so cache would benefit more than in sparse malloc allocations in C/C++.
> Also vector is a weird point to make, it's been some time I had to deal with Java (luckily) but arrays there are also linear AFAIK. And there are GCs that have a bump allocator for new objects (not sure if Java fits here), so cache would benefit more than in sparse malloc allocations in C/C++.
I think the point is that Object[] in Java is a linear block of pointers to objects, whereas vector<Object> in C++ is a linear block of the objects themselves.
Fair enough. But it's worth pointing out that there's a catch: in order to use dynamic dispatch (subclasses, interfaces, ...) you still need to use pointers in C++.
Deep down, the problem could be rephrased as "there are no structs in Java". In C#, for example, you could have a vector of structs and enjoy linear memory access.
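In C++ terms, the two layouts look roughly like this (a sketch):

    #include <memory>
    #include <vector>

    struct Point { double x, y; };

    // C++ default: the Points themselves sit contiguously in one block.
    std::vector<Point> flat(1000);

    // Java-like layout: a contiguous block of pointers, one separate
    // allocation per element; traversal chases a pointer each time.
    std::vector<std::unique_ptr<Point>> boxed;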
That's true. It's also important to note that this feature of C++ actually breaks encapsulation - the size of an object, including all of its private fields, private parent classes etc, is part of the public interface of the object in C++. So whenever you add a new private field to an object, you need to recompile all uses of your class, even if your class is used through a DLL.
As they say, closures are the poor man's objects, and vice versa.
That said, most usage of closures in practice tends to involve very little state manipulation -- a closure with 5 mutable fields is weird, while an object with 5 mutable fields is "clean code" approved. Also, closures have only one entry point, which makes writing complex ones much more difficult (you have to implement a state machine or dispatch or whatever).
Standard "design patterns" over-the-top OOP translated to FP would be like passing around a collection of closures (one per method) that all alter a big shared mutable state. At that point OOP is definitely going to be faster, but the FP code would be so ugly you wouldn't write it like that to start with.
(As with OO) it depends heavily on the implementation, but my 2 cents is that functional programming doesn't impose that many constraints on optimization.
If a compiler is sophisticated enough a functional program should perform as well as a procedural one that uses a comparable garbage collector. But of course real compilers have shortcomings.
I recommend reading the Haskell wiki's performance article [1] for an understanding of the shortcomings that are specific to Haskell.
[1] https://en.cppreference.com/w/cpp/io/ios_base/register_callb...
[2] https://i.stack.imgur.com/dXhXP.png