The input representation is the element-wise sum of three embeddings: token embeddings, segment embeddings (indicating whether a token belongs to sentence A or sentence B), and learned positional embeddings. BERT uses WordPiece tokenization with a 30,000-token vocabulary.
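A minimal PyTorch sketch of this summing scheme, using BERT-base dimensions (768 hidden size, 512 maximum sequence length) from the paper; the class and parameter names here are illustrative, not the reference implementation:

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sketch of BERT's input representation: the element-wise sum of
    token, segment, and position embeddings. Names are illustrative."""

    def __init__(self, vocab_size=30000, hidden_size=768, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_size)   # WordPiece ids
        self.segment_emb = nn.Embedding(2, hidden_size)          # sentence A=0, B=1
        self.position_emb = nn.Embedding(max_len, hidden_size)   # learned positions

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.token_emb(token_ids)
                + self.segment_emb(segment_ids)
                + self.position_emb(positions))   # broadcast over the batch
```

Because the three embeddings share the same hidden dimension, summing (rather than concatenating) keeps the model width fixed while letting each embedding contribute token identity, sentence membership, and position.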