Translations:Efficient Estimation of Word Representations/19/en

Computing the full softmax over a large vocabulary is prohibitively expensive, because every training step must score all $V$ words in the vocabulary. The paper used hierarchical softmax with a Huffman tree to reduce the cost of each output computation from $O(V)$ to $O(\log V)$. A follow-up paper introduced negative sampling as a simpler and often more effective alternative.
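
To make the $O(\log V)$ claim concrete, the following is a minimal sketch of hierarchical softmax over a Huffman tree in plain Python. It is not the paper's implementation: the function names (`build_huffman_codes`, `hs_probability`), the toy word frequencies, and the random vectors are all assumptions chosen for illustration. Each word's probability is a product of sigmoids along its path from the root, so only the inner nodes on that path are evaluated rather than all $V$ outputs.

```python
import heapq
import itertools
import math
import random


def build_huffman_codes(freqs):
    """Map each word to (inner_node_ids_on_path, branch_bits) in a Huffman tree."""
    tie = itertools.count()       # unique tie-breaker so the heap never compares nodes
    inner_id = itertools.count()  # ids for inner nodes: 0 .. V-2
    heap = [(f, next(tie), ("leaf", w)) for w, f in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two least frequent nodes into a new inner node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        node = ("inner", next(inner_id), left, right)
        heapq.heappush(heap, (f1 + f2, next(tie), node))

    codes = {}

    def walk(node, path, bits):
        if node[0] == "leaf":
            codes[node[1]] = (path, bits)
            return
        _, nid, left, right = node
        walk(left, path + [nid], bits + [0])
        walk(right, path + [nid], bits + [1])

    walk(heap[0][2], [], [])
    return codes


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(word, hidden, inner_vectors, codes):
    """P(word | hidden) as a product of binary decisions along the word's path."""
    path, bits = codes[word]
    prob = 1.0
    for nid, bit in zip(path, bits):
        score = sum(h * v for h, v in zip(hidden, inner_vectors[nid]))
        # bit 0 = go left with probability sigmoid(score), bit 1 = go right.
        prob *= sigmoid(score) if bit == 0 else sigmoid(-score)
    return prob


# Toy vocabulary with made-up frequencies (illustrative only).
freqs = {"the": 50, "of": 30, "cat": 10, "sat": 6, "mat": 4}
codes = build_huffman_codes(freqs)

rng = random.Random(0)
dim = 4
# One vector per inner node; a tree with V leaves has V - 1 inner nodes.
inner_vectors = [[rng.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(len(freqs) - 1)]
hidden = [rng.uniform(-1, 1) for _ in range(dim)]

total = sum(hs_probability(w, hidden, inner_vectors, codes) for w in freqs)
for w in freqs:
    print(w, len(codes[w][1]), hs_probability(w, hidden, inner_vectors, codes))
print("sum of probabilities:", total)
```

Because the two branches at every inner node have probabilities that sum to one, the word probabilities form a valid distribution without any normalization over the whole vocabulary. The Huffman construction also gives frequent words shorter paths, so the average number of sigmoid evaluations per word is lower than in a balanced tree.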