This proposal has been accepted for implementation in the main repository. Check the comments for updates.

Intelligent recommendations

Main repo (accepted)

DataForGoodBCN Official participant 30/06/2020 18:12

When someone publishes a new proposal, a list of similar entries is displayed to help avoid duplicates. The current recommendation algorithm calculates the similarity of each pair of proposals by comparing their trigrams (sequences of three characters). This method, however, does not take the semantics of the text into account, and it can be improved with simple machine learning techniques.
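To illustrate the limitation, here is a minimal sketch of a trigram comparison. It is a simplified illustration, not Decidim's actual code: the function names `trigrams` and `trigram_similarity` are hypothetical, and using the Jaccard index of the two trigram sets as the score is an assumption made for this example.

```python
def trigrams(text: str) -> set:
    """Return the set of 3-character substrings of the normalized text."""
    # Lowercase and collapse whitespace so formatting does not affect the score.
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard index of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Same meaning, different wording: the texts share almost no trigrams.
paraphrase = trigram_similarity("more bike lanes", "additional cycling paths")
# Different meaning, similar wording: the texts share many trigrams.
lookalike = trigram_similarity("more bike lanes", "more bus lanes")
```

In this sketch the look-alike pair scores higher than the paraphrase pair, which is the behaviour a semantic method would be expected to correct.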

We suggest using a technique called word embeddings, which makes it possible to assign each proposal a multi-dimensional vector in such a way that semantically similar proposals end up with vectors that are close together. The recommendations for a given proposal would then be the proposals whose vectors are at the smallest distance from its own.

To calculate the vector associated with each proposal, we suggest using pre-calculated embedding vectors for each word (at least for the most frequent words in the Decidim vocabulary) and then averaging the vectors of all the words that appear in the proposal. The word vectors could be pre-calculated offline by anyone with a working knowledge of NLP (DataForGoodBCN, the community that created this proposal, could provide these calculations).
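The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the word vectors are toy, hand-made 3-dimensional values (real pre-calculated embeddings typically have hundreds of dimensions), the names `WORD_VECTORS`, `proposal_vector`, `cosine_distance` and `recommend` are hypothetical, and cosine distance is one possible choice of distance that the proposal itself does not specify.

```python
import math

# Toy, hand-made word vectors (hypothetical values, for illustration only).
WORD_VECTORS = {
    "bike": [0.9, 0.1, 0.0],
    "cycling": [0.8, 0.2, 0.0],
    "lanes": [0.7, 0.0, 0.3],
    "paths": [0.6, 0.1, 0.3],
    "library": [0.0, 0.9, 0.1],
    "books": [0.1, 0.8, 0.1],
}


def proposal_vector(text):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine_distance(u, v):
    """1 minus the cosine similarity; smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def recommend(new_text, existing, limit=3):
    """Return up to `limit` existing proposals, closest vector first."""
    target = proposal_vector(new_text)
    if target is None:
        return []
    scored = []
    for text in existing:
        vector = proposal_vector(text)
        if vector is not None:
            scored.append((cosine_distance(target, vector), text))
    scored.sort()
    return [text for _, text in scored[:limit]]


existing = ["cycling paths", "more books for the library"]
ranked = recommend("bike lanes", existing)
```

Words without a vector are simply skipped in this sketch, so a proposal made up entirely of unknown words gets no recommendations; a real implementation would need to decide how to handle that case.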

Liked by Carol Romero and 16 more

DataForGoodBCN Official participant

13/11/2020 15:49

Hi. First of all, thank you Antoine and Felipe for your valuable comments.
Regarding the last comment, we would need to determine the languages of the proposals in order to use the appropriate word embeddings. For the languages for which embeddings already exist, we would need to evaluate whether they are good enough, but using them would be an option. As Felipe says, our idea was to use these embeddings to create a new method for calculating the distances, replacing the current Postgres methodology.
