Re – representing the dict

I continued the project of building a visual representation of word2vec data. After my first experiment, I wondered how to make it work on a bigger dataset, so I made another attempt. Instead of plotting the whole dataset, the algorithm builds a cluster around a specific word. It shows how far each word is from that central word by fading and shrinking the words that are farther away.
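To make the fading and scaling idea concrete, here is a minimal sketch. It is not the code behind the picture: it assumes cosine similarity as the measure, and the function names, the size range, and the tiny 3-dimensional vectors are all made up for illustration. Real word2vec vectors would have a few hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def style_neighbours(center, vectors, top_n=3, min_size=10.0, max_size=30.0):
    """Return (word, similarity, font_size, opacity) for the top_n words
    most similar to `center`, most similar first."""
    scored = [
        (word, cosine_similarity(vectors[center], vec))
        for word, vec in vectors.items()
        if word != center
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    styled = []
    for word, sim in scored[:top_n]:
        weight = max(0.0, min(1.0, sim))  # clamp similarity to [0, 1]
        font_size = min_size + weight * (max_size - min_size)
        opacity = weight  # 1.0 = fully visible, 0.0 = completely faded
        styled.append((word, sim, font_size, opacity))
    return styled


# Hypothetical toy vectors, invented for this example only.
TOY_VECTORS = {
    "gauffre": [1.0, 0.0, 0.0],
    "crepe": [0.9, 0.1, 0.0],
    "pancake": [0.7, 0.7, 0.0],
    "car": [0.0, 0.0, 1.0],
}

for word, sim, size, opacity in style_neighbours("gauffre", TOY_VECTORS):
    print(f"{word:8s} similarity={sim:.2f} size={size:.1f} opacity={opacity:.2f}")
```

The clamp only matters because cosine similarity can be negative. Any other monotonic mapping from similarity to size and opacity would work just as well.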

Similar to “Gauffre”

It is important to note that there is a correlation between similarity to the word and distance in the space, as intended, but it does not hold every time: some words that are farther away are not necessarily darker. This is because the PCA algorithm reduced the number of dimensions and, with it, lost part of each word's meaning, whereas the similarity, rendered as size and color, comes from the original word2vec model.
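The mismatch can be reproduced with a small sketch that ranks the same words two ways: by cosine similarity in the full space, and by Euclidean distance after a 2D PCA projection. Everything here is an assumption made for illustration: the PCA is done with a plain NumPy SVD, the helper names are hypothetical, and random vectors stand in for real word2vec embeddings.

```python
import numpy as np


def pca_2d(points):
    """Project the rows of `points` onto their first two principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def rank_comparison(words, vectors, center):
    """Return two orderings of the other words, closest to `center` first:
    by cosine similarity in the full space, and by distance in the 2D plot."""
    idx = words.index(center)

    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[idx]  # cosine similarity to the central word

    coords = pca_2d(vectors)
    dists = np.linalg.norm(coords - coords[idx], axis=1)  # distance in the plot

    others = [i for i in range(len(words)) if i != idx]
    by_similarity = [words[i] for i in sorted(others, key=lambda i: -sims[i])]
    by_distance = [words[i] for i in sorted(others, key=lambda i: dists[i])]
    return by_similarity, by_distance


# Random stand-in data: 8 placeholder "words" in 50 dimensions.
rng = np.random.default_rng(0)
demo_words = [f"word{i}" for i in range(8)]
demo_vectors = rng.normal(size=(8, 50))

by_similarity, by_distance = rank_comparison(demo_words, demo_vectors, "word0")
print("ranked by similarity:  ", by_similarity)
print("ranked by 2D distance: ", by_distance)
```

When the vectors already live in two dimensions and have unit length, the two orderings agree, because the projection loses nothing. With many dimensions squeezed into two, they can diverge, which is the effect visible in the picture.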