Vaswani et al. (2017) introduced the scaled dot-product attention used in the Transformer. Given matrices of queries $Q$, keys $K$, and values $V$: