The repo contains some details about how to run it in WASM, which is quite interesting for embedding it in pages. I've been playing around with using WASM to capture speech to text (https://github.com/ccoreilly/vosk-browser) and automatically translating it using Bergamot.
Results have been OK. I don't think the tech is quite there yet, and the speech to text obviously struggles with multiple speakers.
Whoa - an EU Horizon thing that actually produced something kinda useful to citizens rather than just being a vehicle to shovel taxpayers' money into whoever can produce the biggest pile of paperwork? Wow!
Of course, a lot of those projects will go nowhere (and will therefore be seen as "waste" by some); that's bound to happen with any large scale program like this. But unless you make bets on a lot of projects, you won't get results like this submission.
I know a lot of people who apply for them... For the vast majority of the projects, everyone involved knows they will shut up shop mere days after the funding ends. People buy equipment simply because they know they'll get to keep it after the project has folded. The whole team then pivots to the next project. None of them has any chance of success - it's always more profitable to close down and use a different idea for the next round of funding.
Maybe some of the projects will go somewhere, but there is a good chunk where even the project leads set off from the outset with the goal of getting as much EU money as possible and closing down as quickly as possible when the money stops.
If there are on average 20 people involved with each project that received funding, that would be 340,000 people in total. If you know 1000 people who are doing this scam, then you know about 0.3% of the people involved in these projects.
To extrapolate that all the projects are similar scams to what your acquaintances are doing seems a bit irrational.
Even so, it seems unlikely that you know 1000 people, 100 people or even 10 people performing identical scams on government funds. And if you really do know that many scammers, that seems to say more about you than about the Horizon 2020 programme.
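For what it's worth, the back-of-the-envelope numbers are easy to check. A quick Python sketch, assuming the 340,000 total and the average of 20 people per project from the estimate above (the function name is made up for illustration):

```python
def share_known(known: int, total_people: int = 340_000) -> float:
    """Fraction of all participants that `known` acquaintances represent."""
    return known / total_people


PEOPLE_PER_PROJECT = 20   # assumed average from the estimate above
TOTAL_PEOPLE = 340_000    # total from the estimate above

# Number of funded projects implied by those two figures
implied_projects = TOTAL_PEOPLE // PEOPLE_PER_PROJECT

print(f"implied projects: {implied_projects}")
for known in (1000, 100, 10):
    print(f"{known} acquaintances = {share_known(known):.3%} of participants")
```

Even the most generous case, 1000 acquaintances, comes out just under 0.3% of participants, so it is a very thin sample to generalise from.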
Abuse exists in pretty much any aspect of procurement, from building roads with inflated contractor prices to producing "analysis" from consultants who happen to be friends with the relevant minister. The private sector is not immune either, with plenty of shareholders' money being funneled to the sales guy with the best golf routine.
Part of the solution is better law enforcement, which would include whistleblowing on your corrupt friends. Part is better oversight - and here you should bring it up with your national government, since EU authorities typically have little power to impose that once money has been disbursed to local entities.
Yeah I have that impression also of EU grants. There seems to be no touch point "on the ground" to see if people actually need what's being subsidized.
Where I lived in Ireland there was a "park" with some huge signs about how the EU was so generous to fund it. But they failed to make an opening in the perimeter fence for people to actually enter. The paths just ended at a fence.
It was also totally useless as a park because it was a thin strip right next to a busy 4-lane road. You could hop the fence but it's useless trying to relax right beside a main road. Total cash grab from some real estate developer clearly.
This is really exciting. I absolutely hate the rare occurrences where I have to use Chrome / Google Translate just to buy something from an Amazon store that does not have an English version, like most EU Amazons or country-specific stores.
Having the ability to harness all this power locally would be awesome as both a developer and user. Big thanks to the team, whoever you are. I like the choice of languages. Not quite the usual suspects hehe
Most of Amazon’s regional websites offer machine-translation (although the language selection varies by region). Click the Flag next to the search field on the front page.
"Most" is not the case in Europe. AFAIK the only storefront in a non-English-speaking country that has an English version is Amazon DE. I usually end up searching for products there, then copying the ASIN and searching on Amazon in my neighbouring country (as they aren't available here [0]).
I really don't understand why e-commerce services do this, especially when the UI behind the scenes supports i18n and has translations. Amazon do have country specific listings, so maybe they just don't want to show two languages, but that is often what happens on Amazon DE if you choose English. If you choose to ship to a different country they will change prices to account for different taxes, so it's not that.
Other e-commerce services are even worse, e.g. Zalando has a single app for all their countries, and the listings are the same, but they don't let you choose the language at all. Just because I am living in a country doesn't mean I speak the language of that country.
[0] Which is another weird thing, as most of the time they do ship here. I'm somewhat surprised they haven't merged into a single "Amazon Europe" storefront; instead they are still opening new country-specific storefronts.
i hate the english version of amazon.de, for some reason amazon keeps activating this for me, and now whenever i type in an english book title it "helpfully" translates it to german
I have this issue with Amazon Japan; it's frequently related to clicking links from friends or Google that have the language setting encoded in the URL, which changes my personal settings.
My browser is in English, with an accept header preference for English. But that does not mean that I want shitty machine translated versions.
After all, German is also in my accept header, so if that’s the only real version, please give me that one.
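What I'd want from the server is roughly this: walk my Accept-Language list in order of preference, consider only the human-written versions first, and fall back to machine translation last. A rough Python sketch; `pick_language` and the `native` / `machine_translated` sets are hypothetical names, not anything Amazon actually exposes, and the header parsing is simplified (it only looks at the `q` parameter and the base language).

```python
from typing import Optional


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language tag, q-value) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without an explicit q-value defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def pick_language(header: str, native: set[str],
                  machine_translated: set[str]) -> Optional[str]:
    """Prefer any accepted human-written version over a machine translation."""
    accepted = [tag.split("-")[0]
                for tag, q in parse_accept_language(header) if q > 0]
    # First pass: human-written versions only
    for lang in accepted:
        if lang in native:
            return lang
    # Second pass: machine translations as a last resort
    for lang in accepted:
        if lang in machine_translated:
            return lang
    return None


# English preferred, German accepted; only German is human-written
print(pick_language("en-US,en;q=0.9,de;q=0.8", {"de"}, {"en"}))  # de
```

The point of the two passes is that a lower-ranked language I actually listed beats a machine translation into my top-ranked one.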
Amazon France annoyingly does not offer an English UI. I’m so looking forward to Firefox Translations adding support for French-to-English translations.
I am also excited about this, although I am usually not translating full pages so I still mostly use Google Translate even for supported languages. It sounds like they are working on ways to translate sections of a page. I use the Right Click Search addon and select text to translate then "search" with Google Translate. There is a word limit but Google even helpfully provides a link to the next chunk if you exceed it.
The few times I've tried translating news articles in supported languages with the addon it seems to do really well, although it is understandably (and like Google Translate) more hit and miss for less formal stuff and song lyrics (which can be nearly untranslatable considering how little sense they often make even if you do understand the language :/). This is translating to English though; it sounds like translating between other languages goes through English currently.
I just downloaded it to try it out (I live abroad, so I rely heavily on these extensions). Works fantastic and the UI is nice. I appreciate that you can choose to translate just one tab as you browse.
But on some websites it starts duplicating words over and over. So, seems like there are some rough edges to work out. But I'm definitely keeping it installed for when it is ready to go!
It's unfortunate that it's not available for Firefox mobile. For me at least, mobile is where I want translation, for example when traveling and viewing the sites of businesses.
However, on the plus side, I'm not seeing a complete inability to translate in Nightly, since the context menu lets me translate using Google Translate. (I realise many won't want to send their text off to G, but this does at least show up one of the claimed limitations in the article, that it simply wasn't possible at all.)
And even if it wasn't architecture-agnostic, it's pretty easy to translate restricted and well-behaved machine code between architectures. Much easier than dealing with javascript itself.
There is none. One of the main objectives of WASM was to be a machine-agnostic bytecode, similar to JVM bytecode for example. People have even built WASM VMs on FPGAs.
On a similar note, how hard is it to bring a grammar checker offline? Today most folks rely on Grammarly or similar services, which are basically keyloggers.
Is there an open source initiative aimed at bringing grammar checking to the edge?
Thanks for the pointer. This is an impressive job - reducing a grammar correcting model down to 20MB. Theoretically this could even be shipped to browsers, and if we are able to wrap it in an extension that works everywhere, this could seriously compete with Grammarly.
I could understand why Google wouldn't open-source this tech, but the blog pretty much covers how to build one. I'm surprised there isn't any open source project that took this direction to bring a privacy-focused grammar checker.
And it's pretty bad. It works half decently on English, because the English grammar is rather inflexible, but for languages with more variation in word order the Office grammar checkers were laughably bad. They could only spot a few errors, and those were not even relevant to bad writers.
I'm not blaming anyone, the constraints made it nearly impossible, and at the time there wasn't enough properly tagged corpus material to work with, not even for English.
"LanguageTool is an Open Source proofreading software for English, French, German, Polish, Russian, and more than 20 other languages. It finds many errors that a simple spell checker cannot detect."
After Grammarly stopped the service for users from Russia, I switched to LanguageTool, but found the experience much more lacking. It often suggests weird "fixes", for example replacing "look" with "onion", apparently because that's how the Russian word for "onion" sounds.
Hemingway Editor is one such app that may provide what you seek. [1] It's not Open Source, but it isn't expensive for what it does. It's a fixed price for purchase, not an ongoing subscription.
The alternative to what are basically keyloggers does not necessarily have to be open source or offline. See other comments on Hemingway. Microsoft also has modern grammar checks built into Word 365 / Outlook 365 now. They work as well as Grammarly without increasing the trust surface. If you are already trusting Microsoft with the documents, you might as well allow them to grammar check.
Once, when I saw someone using Grammarly during a screen presentation, I made an instant mental note to self-censor when chatting with that person in the future, because this person does not care about digital privacy like I do and will broadcast at least parts of our conversations.
That's what I was wondering as well. If I had to guess, the models that focus on tons of languages are often less good at, say, English->Spanish than a dedicated model, or a model that focuses on only a few high-resource languages.
Glad to see more offline-translation options though. Would be nice to have a benchmark for them soon to compare more easily.
It's a very odd set of languages that they have. French should have orders of magnitude more resources available for developing a translation engine than e.g. Bulgarian would, so why do the more obscure languages first?
Maybe they just happened to be able to get volunteers/partners who know those languages.
Can't be language families—they've got Spanish, Italian, and Portuguese, so French would be an obvious choice to include.
The inclusion of Estonian is particularly odd. It's a very small language (1.1m native speakers) and is the only non-Indo-European language in production or development.
Reminds me of the language learning app Lingvist, which for a while had only German, French, and Estonian for a similar reason: the company is Estonian.
Er, I don't think that's it. French is pretty far from being polysynthetic and is usually classified as analytic. Why do you say it's polysynthetic? Because of the few personal pronoun clitics it has?
I'd imagine the 14 grammatical cases of Estonian would be harder to handle than French grammar, and yet they have Estonian.
Maybe I'm a horrible person for wanting this, but how hard would it be to repurpose this as a device-local translation extension for Chrome/Chromium/Brave?
Nor does it include African or native South American languages, right?
Even if this had not been sponsored by the EU, it wouldn't be surprising. This kind of product relies on decades of development. The smaller and poorer a language, the less research will have been done. Chinese and Japanese often have material available, but NLP research in other countries is still behind.
It's funded by the EU and it doesn't even cover all EU languages. This is just a small scale pilot project, it's unreasonable to expect them to cover 3000 languages.
Agreed. All of the languages I use semi-regularly are in SEA. All of the open source translation options tend to neglect Asian languages.
Google Translate seems to require an account or something on my microG'd phone and refused to work after an update, which meant I actually switched to Yandex, purely because I have no other services with them, so translations are somewhat siloed away from other applications and services.
You're not wrong. Firefox really is always in last place for features, and Mozilla is entirely bankrolled by Google. So sending donations to try to fund the browser is pointless, as the money is not going to that.
Mozilla is not leading in standards; it is always following big tech. It is now on life support, as users do not care about using Firefox and have already declared Chrome (and the other derivatives) the winners. It's not even the first loser, which is obviously Safari.
So I'm afraid that ship has sailed when it comes to the browser wars. It just went from one behemoth (Microsoft's Internet Explorer) to another (Google's Chrome), and Mozilla did nothing as everyone celebrated Google Chrome's dominance.
The key premise is that this is an important feature. I don't think it is that important to many users. Browsing poorly translated websites just isn't a thing, especially not with Google Translate, which in my experience almost but never quite makes sense with many languages.
About the only time it matters to me is when dealing with stuff I have to do (e.g. bureaucratic or legal things). In which case I prefer using deepl.com, which seems to do a better job of e.g. handling German legal texts.
I guess in a pinch it's handy to be able to translate a bit of text or a web page but that hardly is a regular thing. Browser extensions exist for that. Also for Firefox. I don't tend to use those though because I just don't really need or want that.
The real feature here is "on device", not the translation itself. A lot of people don't want to send all their foreign language browsing to third parties for translation.
It is absolutely essential when traveling to countries whose primary languages you don't understand. Mobile Firefox doesn't support any translation extensions, which means I use Chrome when traveling.
I am a hobbyist and frequently browse foreign language sites to engage with my hobbies beyond the Anglosphere. It’s remarkable how much more is out there when you start looking.
The GitHub repo for it is here: https://github.com/browsermt/bergamot-translator