where $ \mathbf{v}_w $ and $ \mathbf{v}'_w $ are the input and output embedding vectors. Computing the full softmax over the vocabulary is expensive, so two approximations are commonly used: