Translations:Efficient Estimation of Word Representations/19/en


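    Computing the full softmax requires normalizing over every word in the vocabulary, so the output-layer cost for each training example grows linearly with the vocabulary size $ V $. For large vocabularies this is prohibitively expensive. The paper used hierarchical softmax with the vocabulary arranged as a Huffman binary tree, which reduces the per-example cost from $ O(V) $ to roughly $ O(\log V) $, since only the nodes on the path from the root to the target word are evaluated. A follow-up paper introduced negative sampling as a simpler and often more effective alternative.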
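
To make the cost argument concrete, the following is a minimal sketch of hierarchical softmax over a Huffman tree, written in plain Python. It is not the paper's implementation. The function names (`build_huffman_paths`, `hs_probability`), the Zipf-like toy counts, and the random vectors are all assumptions made for illustration. Each word's probability is a product of binary sigmoid decisions along its root-to-leaf path, so evaluating one word touches only the inner nodes on that path rather than all $ V $ outputs.

```python
import heapq
import itertools
import math
import random


def build_huffman_paths(counts):
    """Map each word to its root-to-leaf path as a list of (inner_node_id, bit)."""
    tie = itertools.count()  # unique tie-breaker so the heap never compares nodes
    heap = [(c, next(tie), ("leaf", w)) for w, c in counts.items()]
    heapq.heapify(heap)
    next_inner = 0
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new inner node.
        c0, _, n0 = heapq.heappop(heap)
        c1, _, n1 = heapq.heappop(heap)
        heapq.heappush(heap, (c0 + c1, next(tie), ("inner", next_inner, n0, n1)))
        next_inner += 1
    paths = {}
    stack = [(heap[0][2], [])]
    while stack:
        node, path = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, nid, left, right = node
            stack.append((left, path + [(nid, 0)]))
            stack.append((right, path + [(nid, 1)]))
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hs_probability(word, hidden, paths, node_vectors):
    """P(word | hidden) as a product of binary decisions along the word's path."""
    p = 1.0
    for nid, bit in paths[word]:
        s = sigmoid(sum(h * v for h, v in zip(hidden, node_vectors[nid])))
        p *= s if bit == 0 else 1.0 - s
    return p


# Toy setup (assumed): Zipf-like counts, random hidden state and node vectors.
V, dim = 1000, 8
rng = random.Random(0)
counts = {f"w{r}": 1_000_000 // r for r in range(1, V + 1)}
paths = build_huffman_paths(counts)
node_vectors = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V - 1)]
hidden = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

total = sum(counts.values())
avg_len = sum(counts[w] * len(paths[w]) for w in counts) / total
prob_sum = sum(hs_probability(w, hidden, paths, node_vectors) for w in counts)
print(f"frequency-weighted path length: {avg_len:.2f}, log2(V): {math.log2(V):.2f}")
print(f"sum of word probabilities: {prob_sum:.6f}")
```

Because the two branch probabilities at every inner node sum to one, the word probabilities form a valid distribution without any explicit normalization over the vocabulary. The Huffman construction places frequent words near the root, so the frequency-weighted path length in this toy setup stays below $ \log_2 V $, although rare words can sit deeper than that.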