*2vec


word2vec

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

word2vec is one of many ways of assigning vectors to words. It takes a distributional-hypothesis approach, so it ends up encoding some degree of semantic relatedness, and refers to two related techniques, continuous bag-of-words (CBOW) and continuous skip-gram, as described in T. Mikolov et al. (2013), "Efficient Estimation of Word Representations in Vector Space", probably the paper that kicked off this particular dense-vector idea.


Word2vec could be seen as building a classifier that predicts which word appears in a given context (CBOW), and/or which context words appear around a given word (skip-gram). The classification task itself is mostly a means to an end: the vectors learned along the way happen to be a decent characterization of that word.
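To make that concrete, here is a minimal Python sketch (toy code, not the actual network or training loop) of the (input, target) pairs the two variants are trained to predict:

```python
# Toy illustration: the training pairs behind skip-gram and CBOW.
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

skipgram_pairs = []   # skip-gram: predict each context word from the center word
cbow_pairs = []       # CBOW: predict the center word from its bag of context words

for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    skipgram_pairs.extend((center, c) for c in context)
    cbow_pairs.append((context, center))

print(skipgram_pairs[:4])   # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
print(cbow_pairs[0])        # (['quick', 'brown'], 'the')
```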


That paper mentions, about the continuous skip-gram model:

It uses skip-grams as a concept/building block. Some people refer to the technique as just 'skip-gram' without the 'continuous', but that may come from not really reading the paper they are copy-pasting the image from.

It seems to do better on less-common words, but is slower to train.

(A neural network over words implies one-hot coding of the vocabulary, so the input is not small, but training turns out to be moderately efficient(verify))
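In practice you would rarely build this from scratch; libraries such as gensim implement both variants. A minimal usage sketch, assuming gensim 4.x (where the dimensionality parameter is named vector_size; older versions call it size):

```python
# Minimal sketch using gensim (assumes gensim >= 4.x is installed).
from gensim.models import Word2Vec

sentences = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog", "sleeps"],
    # ... in practice, many more tokenized sentences
]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the word vectors
    window=5,          # context window size
    min_count=1,       # keep even rare words for this toy example
    sg=1,              # 1 = skip-gram, 0 = CBOW
)

vec = model.wv["fox"]                  # the learned 100-dimensional vector
print(model.wv.most_similar("fox"))    # nearest words by cosine similarity
```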


doc2vec

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

doc2vec can be seen as an adaptation of word2vec that deals with larger units such as sentences, paragraphs, or entire documents, roughly by considering more context.

Compared to word2vec:

the Distributed Memory Model of Paragraph Vectors (PV-DM) is analogous to word2vec's CBOW
the Paragraph Vector - Distributed Bag of Words (PV-DBOW) is analogous to word2vec's skip-gram

...with the largest differences lying in the way that they consider further context.
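As a rough sketch of how this looks in practice, gensim's Doc2Vec exposes both variants via its dm parameter (assuming gensim 4.x; attribute and parameter names differ in older versions):

```python
# Minimal sketch using gensim's Doc2Vec (assumes gensim >= 4.x).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["the", "quick", "brown", "fox"], tags=["doc0"]),
    TaggedDocument(words=["the", "lazy", "dog", "sleeps"], tags=["doc1"]),
    # ... in practice, many more tagged documents
]

model = Doc2Vec(
    docs,
    vector_size=50,
    window=5,
    min_count=1,
    dm=1,          # 1 = PV-DM (CBOW-like), 0 = PV-DBOW (skip-gram-like)
    epochs=20,
)

doc_vec = model.dv["doc0"]                            # vector of a training document
new_vec = model.infer_vector(["a", "sleepy", "fox"])  # vector for unseen text
```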


Because a word's meaning varies (fuzzily) with its context, the resulting document vector also ends up capturing some of the meaning carried by that context.

top2vec

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

top2vec is a topic modelling method, so it has its own idea of what counts as context.

It can be considered an extension to word2vec and doc2vec, in that it uses both word and document vectors to estimate the distribution of topics in a set of documents.

This is one way to do topic modelling, among others.
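For a concrete starting point, the top2vec Python package wraps the whole pipeline; a minimal sketch (parameter and method names as I recall them, so worth checking against the package's documentation):

```python
# Minimal sketch using the top2vec package (pip install top2vec).
from top2vec import Top2Vec

documents = [
    "word2vec learns dense vectors for words",
    "doc2vec extends this to whole documents",
    "topic modelling groups documents by theme",
    # ... it needs a reasonably large corpus (hundreds+ of documents)
    #     to find meaningful topics; this tiny list is only to show the shape
]

model = Top2Vec(documents, speed="learn", workers=4)

print(model.get_num_topics())                    # how many topics it found
topic_words, word_scores, topic_nums = model.get_topics()
print(topic_words[0][:10])                       # top words for the first topic
```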


tok2vec

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


sense2vec

Also

wikipedia2Vec

https://wikipedia2vec.github.io/wikipedia2vec/

https://wikipedia2vec.github.io/demo/

RDF2vec