Computing the full softmax over a large vocabulary is prohibitively expensive, because normalizing the output requires a score for each of the $ V $ words on every training example. The paper used hierarchical softmax with a Huffman tree, which reduces the per-example cost of the output layer from $ O(V) $ to roughly $ O(\log V) $. A follow-up paper introduced negative sampling as a simpler and often more effective alternative.
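The idea behind hierarchical softmax can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the toy word counts, the vector dimension, the random inner-node vectors, and the helper names `build_huffman` and `hs_probability` are all assumptions made for illustration. Each word's probability is a product of binary decisions along its root-to-leaf path, so only the inner nodes on that path are evaluated instead of all $ V $ outputs.

```python
import heapq
import math
import random


def build_huffman(counts):
    """Build a Huffman tree from word counts.

    Returns (paths, n_inner), where paths[word] is a list of
    (inner_node_id, bit) pairs from the root down to the word's leaf.
    """
    # Heap entries are (count, tie_breaker, node). A leaf node is the word
    # itself; an inner node is a tuple (node_id, left_child, right_child).
    heap = [(c, i, w) for i, (w, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = 0
    tie = len(heap)  # unique tie-breakers so nodes are never compared
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (next_id, n1, n2)))
        next_id += 1
        tie += 1

    paths = {}

    def walk(node, path):
        if isinstance(node, tuple):  # inner node
            node_id, left, right = node
            walk(left, path + [(node_id, 0)])
            walk(right, path + [(node_id, 1)])
        else:  # leaf: a word
            paths[node] = path

    walk(heap[0][2], [])
    return paths, next_id


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(word, hidden, paths, node_vectors):
    """Probability of `word` as a product of binary decisions on its path."""
    p = 1.0
    for node_id, bit in paths[word]:
        score = sum(h * v for h, v in zip(hidden, node_vectors[node_id]))
        # bit 0: branch taken with probability sigmoid(score);
        # bit 1: the other branch, with probability 1 - sigmoid(score).
        p *= sigmoid(score) if bit == 0 else 1.0 - sigmoid(score)
    return p


# Toy data (assumed for illustration only).
counts = {"the": 50, "of": 30, "cat": 10, "sat": 6, "mat": 4}
paths, n_inner = build_huffman(counts)

rng = random.Random(0)
dim = 4
node_vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_inner)]
hidden = [rng.uniform(-1, 1) for _ in range(dim)]

probs = {w: hs_probability(w, hidden, paths, node_vectors) for w in counts}
for w in counts:
    print(w, len(paths[w]), round(probs[w], 4))
```

Because the two branches at every inner node have probabilities that sum to one, the word probabilities form a valid distribution without an explicit normalization over the vocabulary. Frequent words such as "the" receive short Huffman codes, so they need fewer sigmoid evaluations than rare words.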