Translations:BERT Pre-training of Deep Bidirectional Transformers/14/en
    BERT's input representation is the element-wise sum of three embeddings: token embeddings from a WordPiece vocabulary of 30,000 tokens, segment embeddings indicating whether a token belongs to sentence A or sentence B, and learned position embeddings.
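The summation above can be sketched in pure Python. This is a minimal illustration only: the table sizes (a 16-token vocabulary, 8 positions, 4-dimensional vectors) and the random initialization are made-up stand-ins, not BERT's actual 30,000-token vocabulary or 768-dimensional hidden size.

```python
import random

# Illustrative sizes only -- real BERT uses a 30,000-token WordPiece
# vocabulary, up to 512 positions, and 768-dimensional embeddings.
VOCAB_SIZE, MAX_POS, HIDDEN = 16, 8, 4
random.seed(0)

def make_table(rows, dim):
    """A lookup table of `rows` randomly initialized vectors of length `dim`."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

token_emb = make_table(VOCAB_SIZE, HIDDEN)
segment_emb = make_table(2, HIDDEN)         # row 0 = sentence A, row 1 = sentence B
position_emb = make_table(MAX_POS, HIDDEN)  # learned, one vector per position

def input_representation(token_ids, segment_ids):
    """Sum token, segment, and position embeddings element-wise per token."""
    return [
        [t + s + p for t, s, p in zip(token_emb[tok],
                                      segment_emb[seg],
                                      position_emb[pos])]
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]

# Two tokens from sentence A followed by one token from sentence B.
reps = input_representation([3, 7, 11], [0, 0, 1])
```

Each output vector depends on what the token is, which sentence it came from, and where it sits in the sequence, which is all the information the Transformer layers receive about the input.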