Vaswani et al. (2017) introduced the scaled dot-product formulation of attention used in the Transformer. Given matrices of queries $ Q $, keys $ K $, and values $ V $: