*2vec
word2vec
word2vec is one of many ways of assigning vectors to words. It takes a distributional-hypothesis approach, so it ends up encoding some degree of semantic relation. The name refers to two related techniques, using either continuous bag-of-words or skip-gram as the input to a specific learner, as described in T. Mikolov et al. (2013), "Efficient Estimation of Word Representations in Vector Space", probably the paper that kicked off this particular dense-vector idea.
Word2vec could be seen as building a classifier that predicts which word appears in a context,
and/or which context appears around a word; the learned weights happen to make decent vector representations of those words.
That paper mentions
- its continuous bag-of-words (CBOW) variant predicts the current word based on the words directly around it (ignoring their order, hence bag-of-words (verify))
- its continuous skip-gram variant predicts surrounding words given the current word.
- skip-grams are used as a concept/building block. Some people refer to this technique as just 'skip-gram', without the 'continuous',
but that may come from not really reading the paper the diagram was copy-pasted from?
- skip-gram seems to do better on less-common words, but is slower to train
(an NN over a vocabulary implies one-hot coding, so the input is not small, but training turns out to be moderately efficient (verify))
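The difference between the two variants is mostly in what the training pairs look like. A minimal sketch (pure Python, no actual training; a real implementation like gensim adds negative sampling / hierarchical softmax and much more):

```python
# What CBOW and skip-gram are each asked to predict, for a toy
# tokenized sentence and a symmetric context window.

def cbow_pairs(tokens, window=2):
    """CBOW: (context words -> center word); context order is ignored."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((tuple(context), center))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: (center word -> one surrounding word), one pair per neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence)[:2])
print(skipgram_pairs(sentence)[:4])
```

Note that skip-gram produces one training pair per (center, neighbor) combination, which is part of why it is slower than CBOW's one prediction per position.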
doc2vec
doc2vec can be seen as an adaptation of word2vec that deals with larger units like sentences, paragraphs, or entire documents, roughly by considering more context.
Compared to word2vec:
- Distributed Memory Model of Paragraph Vectors (PV-DM) is analogous to word2vec's cbow
- Paragraph Vector - Distributed Bag of Words (PV-DBOW) is analogous to word2vec's skip-gram
...with the largest differences lying in the way that they consider further context.
The fuzzy way that a word's meaning varies with its context also means these larger-unit vectors
end up capturing some amount of that context's meaning.
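Continuing the pair-generation sketch above: the main mechanical difference is that each document gets a tag that is trained like an extra word. A rough sketch (the `DOC_0`-style tag names are just for illustration):

```python
# What doc2vec's two variants are each asked to predict.

def pv_dm_pairs(doc_id, tokens, window=2):
    """PV-DM (like CBOW): the doc tag plus context words predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append(((doc_id, *context), center))
    return pairs

def pv_dbow_pairs(doc_id, tokens):
    """PV-DBOW (like skip-gram): the doc tag alone predicts each word in the document."""
    return [(doc_id, w) for w in tokens]

tokens = "cats chase mice".split()
print(pv_dm_pairs("DOC_0", tokens, window=1))
print(pv_dbow_pairs("DOC_0", tokens))
```

Because the tag participates in every prediction for its document, its vector ends up summarizing that document, which is the part that gets used afterwards.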
top2vec
top2vec is a topic modelling method, so it has its own idea about context.
It can be considered an extension of word2vec and doc2vec, in that it embeds words and documents into the same vector space and uses both to estimate the distribution of topics in a set of documents.
This is one way to do topic modelling, among others.
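Very roughly, the idea is: cluster the document vectors, take a cluster's centroid as a topic vector, and describe the topic by the nearest word vectors. A toy sketch with made-up 2-d vectors (real top2vec uses UMAP plus HDBSCAN for the clustering step, omitted here):

```python
import numpy as np

# Hypothetical word and document embeddings in the SAME 2-d space.
word_vecs = {
    "cat":  np.array([0.9, 0.1]),
    "dog":  np.array([0.8, 0.2]),
    "bond": np.array([0.1, 0.9]),
}
doc_vecs = np.array([[0.85, 0.15],   # two pet-ish documents
                     [0.95, 0.05],
                     [0.05, 0.95]])  # one finance-ish document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend a clustering step grouped documents 0 and 1 together;
# the topic vector is their centroid.
topic_vec = doc_vecs[[0, 1]].mean(axis=0)

# Describe the topic by the words closest to that centroid.
topic_words = sorted(word_vecs, key=lambda w: -cosine(word_vecs[w], topic_vec))
print(topic_words[:2])  # the pet-related words rank highest
```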
tok2vec
The name is used e.g. for spaCy's Tok2Vec pipeline component, which produces per-token vectors that downstream components can share.
sense2vec
sense2vec (Trask et al. 2015) learns embeddings for word senses rather than bare words, by training on tokens annotated with e.g. their POS or entity tag, so that 'duck' the noun and 'duck' the verb get separate vectors.
Also
Wikipedia2Vec
https://wikipedia2vec.github.io/wikipedia2vec/
https://wikipedia2vec.github.io/demo/