
Propose new functionalities for Decidim software

#DecidimRoadmap Designing Decidim together

Phase 1 of 1
Open 2019-01-01 - 2030-12-31

Intelligent recommendations

DataForGoodBCN
30/06/2020 18:12  
Accepted / In progress

When someone publishes a new proposal, a list of similar entries is displayed to avoid duplicates. The current recommendation algorithm calculates the similarity of each pair of proposals by comparing trigrams (sequences of three characters). This method, however, does not take the semantics of the text into account, and it can be improved with simple machine learning techniques.
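
To make the limitation concrete, here is a minimal sketch of trigram comparison. It assumes word padding in the style of PostgreSQL's pg_trgm extension and a Jaccard ratio as the score; the function names are hypothetical and this is not Decidim's actual code.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of each word in the text."""
    grams = set()
    for word in text.lower().split():
        # Assumption: pad each word with two leading spaces and one trailing space.
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard ratio: shared trigrams divided by all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Shared spelling scores high; shared meaning with different spelling scores low.
print(trigram_similarity("more bike lanes", "more bicycle lanes"))
print(trigram_similarity("more bike lanes", "safer cycling routes"))
```

The second pair is about the same topic but shares almost no character sequences, so a purely trigram-based score would probably not flag it as a duplicate.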

We suggest using a technique called word embeddings, which consists of assigning each proposal a multi-dimensional vector in such a way that semantically similar proposals end up with vectors that are close together. The recommendations for a given proposal would then be the proposals whose vectors lie at the smallest distance from its own.

To calculate the vector associated with each proposal, we suggest using pre-calculated embeddings for each word (restricted to the most frequent words in the Decidim vocabulary) and then averaging the vectors of all the words that appear in the proposal. The word vectors could be pre-calculated offline by anyone with intermediate knowledge of NLP (DataForGoodBCN, the community that created this proposal, could provide these calculations).
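
The approach described above could be sketched roughly as follows. The three-dimensional word vectors are invented toy values, the function names are hypothetical, and cosine distance is an assumption (the proposal only speaks of "smallest distances").

```python
import math

# Toy word vectors, invented for illustration only; real embeddings
# would be pre-calculated offline and have many more dimensions.
WORD_VECTORS = {
    "bike":    [0.9, 0.1, 0.0],
    "bicycle": [0.8, 0.2, 0.0],
    "lanes":   [0.7, 0.0, 0.3],
    "paths":   [0.6, 0.1, 0.3],
    "school":  [0.0, 0.9, 0.1],
    "meals":   [0.1, 0.8, 0.3],
}


def embed(text):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def recommend(new_text, existing, top_n=3):
    """Return existing proposals ordered by increasing distance to new_text."""
    query = embed(new_text)
    if query is None:
        return []
    scored = []
    for text in existing:
        vec = embed(text)
        if vec is not None:
            scored.append((cosine_distance(query, vec), text))
    return [text for _, text in sorted(scored)[:top_n]]


proposals = ["more bicycle paths", "free school meals"]
print(recommend("bike lanes", proposals))
```

Words missing from the vocabulary are simply skipped here; how to handle them in practice is left open by the proposal.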



Category: Proposals

This proposal has been accepted and is under development

List of Endorsements

  • Platoniq
  • Decidim Product
  • Laura Portell
  • Didac Fortuny
  • Pauline Bessoles
  • Antoine Gaboriau
  • Xavi Ros Roca
  • Pierre Mesure
  • Felipe Álvarez
  • txema
  • Arnau
  • Pau Parals
  • Quentin Lp
  • Pablo Aragón
  • Carol Romero
  • Oliver Azevedo Barnes
  • Ivan Vergés
and 14 more people
Endorsements count: 17

Reference: MDC-PROP-2020-06-15589
Version number 2 (of 2) see other versions

Fingerprint

The piece of text below is a shortened, hashed representation of this content. It's useful to ensure the content hasn't been tampered with, as a single modification would result in a totally different value.

Value: 412e0540f6d1108f9a743cfdbac0bcf75062cae6ab878f1d4b64939f7356cb5d

Source: {"body":{"en":"<p>When someone publishes a new proposal, a list of similar entries is displayed to avoid duplicates. The current recommendation algorithm calculates the similarity of each pair of proposals based on trigram (sets of 3-characters) comparison. This method, however, does not take into account the semantic aspects of the text and can be easily improved using simple Machine Learning techniques.</p><p>We suggest using a technique called <strong>word embeddings</strong> which consists of assigning to each proposal a multi-dimensional vector, in such a way that similar proposals (in terms of semantics) end up having close vectors. Therefore, the recommendations for a given proposal would be the proposals with the&nbsp;smallest distances between the vectors.</p><p>To calculate the vectors associated with each proposal, we suggest using pre-calculated vector embeddings for each word (of those more frequent in the Decidim vocabulary) and then calculating the average of all words appearing in the proposal. The pre-calculation of word vectors could be done offline by any person with medium knowledge of NLP (DataForGoodBCN, the community that has created this proposal, could provide these calculations).</p><p><br></p><p><br></p>"},"title":{"en":"Intelligent recommendations"}}

This fingerprint is calculated using the SHA256 hashing algorithm. To replicate it yourself, you can use an online SHA256 calculator and copy-paste the source data.
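
The same check can also be done locally. This is a minimal sketch using Python's standard `hashlib` module; the function name is hypothetical, and UTF-8 encoding of the source string is an assumption.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return the SHA-256 hex digest of the source string (UTF-8 assumed)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Paste the exact "Source" string here, character for character; any change
# in whitespace or escaping produces a completely different digest.
source = '{"title":{"en":"Example"}}'  # placeholder, not the real source data
print(fingerprint(source))
```

If the printed digest matches the "Value" shown above for the real source data, the content has not been modified.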


6 comments

Antoine Gaboriau
30/06/2020 19:15
In favor

Hi, I absolutely love this :)
We at Open Source Politics also had the idea of using NLP to enhance Decidim's proposal comparison tool. I wonder if using word embeddings is the right way to do this though, and I'd love to hear your point of view.
Indeed, and I may be wrong about this, but I think word embeddings keep the semantic structure but do not scan the subjects covered in the proposal. I guess using classification tools could help us analyse what people say and not just how they say it. Classifying proposals would (again, not an NLP expert) link proposals according to the vocabulary used.

Maybe a mix of these two methods could be relevant? Would it be too heavy on Decidim, just for the sake of a comparator?

Felipe Álvarez
09/11/2020 14:56

Hey there! Loving it too!
I'm working on proposal recommendations in the AhoraNosTocaParticipar version of Decidim.
I was thinking that a couple of precomputed word embeddings already exist for Spanish (I know of this one in Chile). As I understand it, there are a few techniques for finding phrase similarities (https://medium.com/@adriensieg/text-similarities-da019229c894).
What do you guys think of creating a separate engine for this? Currently, Postgres acts as the text-similarity engine, so I think creating a separate engine and simply replacing the calls in Decidim could work.
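
Editorial illustration of the separate-engine idea: a hypothetical sketch, written in Python for brevity rather than in Decidim's own stack. The interface, the class names, and the threshold are all assumptions. The point is only that the calling code depends on an interface, so the back end (Postgres trigrams, word embeddings, or something else) can be swapped.

```python
from typing import Protocol


class SimilarityEngine(Protocol):
    """Hypothetical interface: any back end that scores two texts in [0, 1]."""

    def similarity(self, a: str, b: str) -> float: ...


class WordOverlapEngine:
    """Stand-in back end: shared words divided by all distinct words."""

    def similarity(self, a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def similar_proposals(engine: SimilarityEngine, new_text: str,
                      existing: list[str], threshold: float = 0.25) -> list[str]:
    """Return existing proposals scoring at least `threshold`, best first."""
    scored = [(engine.similarity(new_text, text), text) for text in existing]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]


existing = ["plant more trees in parks", "extend library opening hours"]
print(similar_proposals(WordOverlapEngine(), "plant trees in every park", existing))
```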

DataForGoodBCN
13/11/2020 15:49

Hi. First of all, thank you, Antoine and Felipe, for your valuable comments.
Regarding the last comment, we would need to identify the languages of the proposals in order to use the appropriate word embeddings. For the languages that already have embeddings, we would need to evaluate whether the existing ones are good enough, but using them would be an option. As Felipe says, our idea was to use these embeddings to create a new method for calculating the distances, replacing the current Postgres methodology.

Quentin Lp
16/11/2020 10:04
In favor

Hi! Very fond of it too!
I've already done some experiments (at OSP), and I chose to work with CamemBERT alongside SBERT (a French version of BERT and a sentence encoder, respectively) to avoid huge training costs and capture very precise semantic information at the sentence level (not only at the word level).
For the time being, I have not used any fine-tuning and simply relied on the knowledge from CamemBERT's pre-training.

The final list of semantically related sentences is often linked to the initial proposal by a common theme. Yet this system disregards argumentative structures, and some of the closest pairings rely on a theme that we would not necessarily have chosen as the focus of the comparison. I think th

I'd be glad to discuss it in more detail if anyone is facing similar challenges.
I just have one question left: is there any particular reason to rely on extracted word embeddings rather than a Transformer architecture?

Virgile Deville
17/11/2020 13:06

Hello @DataForGoodBCN, as my colleagues (Quentin and Antoine) mentioned, we are also working on this.
How about we sync up in a call and see if we can join forces on this?
I'm sending you my contact details via DM.

Decidim Product
18/01/2021 15:48

Related to Improve automatic comparison algorithm when submiting a proposal
