Translations:Word Embeddings/23/en

    where $ \mathbf{v}_w $ and $ \mathbf{v}'_w $ are the input and output embedding vectors of word $ w $. Computing the full softmax is expensive because its normalizing term sums over every word in the vocabulary for each training example, so two approximations are commonly used: