That isn't every real application. There are lots of use cases where an iPod-shuffle-level interface is all you need, and this would seem to be just fine for those.
Looking at this reminded me of two projects I saw several years ago. One used a wearable camera, colored markers on the fingers, and some computing to recognize gestures; e.g., making a square with your fingers would command the camera to take a "screenshot". The other used gyro sensors placed on the fingers to recognize finger and hand movements. Both looked interesting, but I haven't heard about either of them since.
I think you're referring to Pranav Mistry's SixthSense (https://www.pranavmistry.com/archived/projects/sixthsense/). It was part of his research at the MIT Media Lab. I don't think he took it any further, though maybe it was incorporated in some form at Samsung, where he headed research.
Looks nice, but it should really say "single RGB camera AND a boatload of processing power".
TL;DR: the camera feeds 1280x720 images at 120 fps to a PC (with a graphics card). While you could shrink this down to an on-chip processing block, it would also have to handle people looking away halfway through typing, etc., which adds to the processing overhead and requires integration with the headset (6-axis sensors, etc.).
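For a rough sense of scale, here is the back-of-the-envelope arithmetic for that stream. It assumes uncompressed 24-bit RGB (3 bytes per pixel), which is my assumption; the thread doesn't say what pixel format or compression is actually used. At 120 fps, whatever does the processing also has only about 8.3 ms per frame before the next one arrives.

```python
# Back-of-the-envelope numbers for a 1280x720 @ 120 fps camera stream.
# Assumption: uncompressed 24-bit RGB (3 bytes per pixel); the real
# pixel format used by the project is not stated.

WIDTH, HEIGHT = 1280, 720
FPS = 120
BYTES_PER_PIXEL = 3  # assumed 8-bit R, G, B


def raw_bandwidth_bytes_per_s(width: int, height: int, fps: int, bpp: int) -> int:
    """Raw (uncompressed) video data rate in bytes per second."""
    return width * height * bpp * fps


def frame_budget_ms(fps: int) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1000.0 / fps


if __name__ == "__main__":
    rate = raw_bandwidth_bytes_per_s(WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL)
    print(f"{rate / 1e6:.1f} MB/s raw")                # 331.8 MB/s raw
    print(f"{frame_budget_ms(FPS):.2f} ms per frame")  # 8.33 ms per frame
```

That's roughly a third of a gigabyte per second of pixels before any hand tracking happens, which is why "single RGB camera" undersells the hardware involved.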
IRL, something like this has been tried, more or less, with the Humane AI Pin. With gestures, any less-than-instantaneous response is noticeable, so low latency has to be the top priority.
I don't mean to rain on the parade, as it's a nice piece of work, but comparing it to, say, chording (low CPU requirements, better accuracy, faster, but you have to learn how to chord) shows the different trade-offs.
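To illustrate why chording is so cheap on the CPU side: once you know which keys are held down together, decoding the chord is a single table lookup, with no image processing at all. Five keys give 2^5 - 1 = 31 possible chords. The finger-to-character table below is entirely made up for illustration; real chording keyboards define their own layouts.

```python
from typing import Optional

# One bit per key/finger: THUMB=1, INDEX=2, MIDDLE=4, RING=8, PINKY=16.
THUMB, INDEX, MIDDLE, RING, PINKY = (1 << i for i in range(5))

# Hypothetical chord table: bitmask of keys pressed together -> character.
CHORDS = {
    INDEX: "e",
    MIDDLE: "t",
    RING: "a",
    PINKY: "o",
    INDEX | MIDDLE: "i",
    MIDDLE | RING: "n",
    RING | PINKY: "s",
    THUMB: " ",
    THUMB | INDEX: "h",
}


def mask_from_fingers(*fingers: int) -> int:
    """Combine the individual key bits into one chord bitmask."""
    mask = 0
    for f in fingers:
        mask |= f
    return mask


def decode(chord_mask: int) -> Optional[str]:
    """Map a chord to its character; None if the chord is unassigned."""
    return CHORDS.get(chord_mask)


if __name__ == "__main__":
    print(decode(mask_from_fingers(INDEX, MIDDLE)))  # i
```

The learning curve is the cost: the user has to memorize the table, whereas a camera-based virtual keyboard pays in compute instead.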
It's extremely difficult to accurately hit buttons floating in free space with any type of speed or consistency.
Anchoring virtual buttons to your physical hand lets your proprioception kick in, so you can hit keys rapidly and accurately. It's pretty much trivial for most people to tap the same spot on their palm without even looking, but most people can't do this with an arbitrary spot in free air.
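A minimal sketch of what "anchoring to the hand" means computationally. It assumes a hand tracker already reports the palm's position and orientation (as a rotation matrix mapping palm-local vectors to world space) plus a fingertip position in world space. The key positions are stored relative to the palm, so the fingertip is transformed into the palm's frame before hit-testing, and the keys move with the hand. The layout, hit radius, and function names are all hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # rows of a rotation matrix (palm -> world)

# Hypothetical key layout in palm-local coordinates (meters):
# x across the palm, y along the fingers, z out of the palm.
PALM_KEYS = {
    "A": (-0.02, 0.01, 0.0),
    "B": (0.00, 0.01, 0.0),
    "C": (0.02, 0.01, 0.0),
    "D": (0.00, -0.02, 0.0),
}


def world_to_palm(point: Vec3, palm_origin: Vec3, palm_rot: Mat3) -> Vec3:
    """Express a world-space point in the palm's local frame.

    palm_rot maps palm-local vectors to world space; for a rotation the
    inverse is the transpose, so local = R^T (point - origin).
    """
    d = [p - o for p, o in zip(point, palm_origin)]
    # (R^T d)_i = sum_j R[j][i] * d[j]
    return tuple(sum(palm_rot[j][i] * d[j] for j in range(3)) for i in range(3))


def hit_key(fingertip: Vec3, palm_origin: Vec3, palm_rot: Mat3,
            radius: float = 0.008) -> Optional[str]:
    """Return the nearest key within `radius` of the fingertip, or None."""
    local = world_to_palm(fingertip, palm_origin, palm_rot)
    best, best_dist = None, radius
    for name, center in PALM_KEYS.items():
        dist = math.dist(local, center)
        if dist <= best_dist:
            best, best_dist = name, dist
    return best
```

Because the lookup happens in the palm's frame, the same tap lands on the same key wherever the hand happens to be, which is the software counterpart of the proprioception argument above.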
The VR/XR industry has tried floating virtual keyboards many times, in basically every form you can imagine. All attempts have been abandoned because they suck badly. The idea is fundamentally incompatible with human psychology and physiology.
But it's completely unusable for any real application where you'd do a lot of typing.