Use automatic language detection for the machine translation feature
Is your feature request related to a problem? A clear and concise description of what the problem is.
The futureu.europa.eu platform is one of the first Decidim instances to use the machine translation feature.
Based on the proposal “Machine translation enhancement for source language detection edge cases” we’d like to offer a more complete solution to the source language problem.
As described in the additional context, adding a language selection dropdown to UGC (user-generated content) forms might not be enough, as some users won’t notice it or won’t take the time to select the right language.
Describe the solution you'd like
We propose to leverage the automatic language detection feature that some machine translation services offer. This way we’d be able to select the source language on the user’s behalf.
The issue with automatic language detection is that it doesn’t work well on very short texts (e.g. “OK”, “Da”). Also, some languages are very close (e.g. Romanian and Moldavian), so detection errors can occur.
Given that I’m logged in
When I am writing a comment / proposal / meeting in the same language as the one I selected to browse the website
Then the form displays the language selection dropdown with my preferred language preselected.
Given that I’m logged in
When I am writing a comment / proposal / meeting in a language other than the one I selected to browse the website
If the language detection is at a high confidence level
Then the language of my contribution is automatically selected, and a message lets me click to display the language selection dropdown so I can correct any mistake.
If the language detection is at a low confidence level
Then a message appears explaining the situation and showing the detected language and its confidence level (%).
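The two scenarios above could be sketched as a small decision function. This is only an illustration under stated assumptions: the `Detection` and `FormState` types, the `choose_language_ui` name, and the 0.8 threshold are hypothetical and not part of Decidim or of any translation service. The proposal also does not say which language stays selected in the low-confidence case; the sketch assumes the preferred language is kept and the dropdown is shown.

```python
from dataclasses import dataclass

# Hypothetical threshold: the proposal does not define "high confidence".
HIGH_CONFIDENCE = 0.8


@dataclass
class Detection:
    language: str      # detected language code, e.g. "ro"
    confidence: float  # 0.0 to 1.0, as reported by a detection service


@dataclass
class FormState:
    selected_language: str  # language preselected for the contribution
    show_dropdown: bool     # whether the dropdown is visible right away
    message: str            # text shown to the user ("" for none)


def choose_language_ui(preferred: str, detection: Detection) -> FormState:
    """Map a detection result to the form behaviour described in the scenarios."""
    if detection.language == preferred:
        # Same language as the browsing locale: dropdown preset to it.
        return FormState(preferred, True, "")
    if detection.confidence >= HIGH_CONFIDENCE:
        # High confidence: select the detected language, offer a way to correct it.
        return FormState(
            detection.language,
            False,
            f"Detected language: {detection.language}. Click to change it.",
        )
    # Low confidence (assumption: keep the preferred language, show the dropdown).
    pct = round(detection.confidence * 100)
    return FormState(
        preferred,
        True,
        f"Detected language: {detection.language} ({pct}% confidence). "
        "Please check the selected language.",
    )
```

For example, `choose_language_ui("en", Detection("ro", 0.95))` selects `"ro"` and hides the dropdown behind the message, while a confidence of 0.4 keeps `"en"` selected and shows the dropdown together with the detected language and its percentage.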
Given I am a logged in user
When I create an event or proposal
Then the language detection should be performed on one field only: the body of the added content.
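As a rough sketch of this last scenario, the helper below picks only the body field as input for detection and skips detection when the body is too short to be reliable (the short-text caveat mentioned earlier). The function name, the `"body"` key, and the 20-character minimum are assumptions made for illustration, not existing Decidim behaviour.

```python
from typing import Optional

# Hypothetical minimum length below which detection is considered unreliable.
MIN_CHARS = 20


def text_for_detection(fields: dict[str, str], body_key: str = "body") -> Optional[str]:
    """Return the text to send to language detection, or None to skip it.

    Only the body field is considered; the title and other fields are ignored.
    """
    body = fields.get(body_key, "").strip()
    if len(body) < MIN_CHARS:
        # Too short (e.g. "OK", "Da"): fall back to the manual dropdown.
        return None
    return body
```

When the helper returns `None`, the form would simply fall back to the dropdown preset to the user's preferred language.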
Describe alternatives you've considered
This builds upon the meta proposal “Machine translation enhancement for source language detection edge cases”
Resources: libraries for automatic language detection
https://github.com/ankane/fastText < Ruby fastText implementation
https://github.com/bung87/whatlangid < Python library
https://github.com/tremend-cofe/language-detection < Python implementation of the language detection application (based on whatlangid)
Could this issue impact users' private data?