After the paper's release, Swish was added to mainstream frameworks (e.g. tf.nn.swish) and adopted in production architectures such as EfficientNet. MobileNetV3 introduced the variant Hard Swish, $ x \cdot \mathrm{ReLU6}(x + 3)/6 $, which replaces the sigmoid gate with a piecewise-linear approximation to retain Swish's accuracy gains while remaining cheap on mobile hardware. GELU itself was later popularized by BERT and the GPT family, where it became the default activation in Transformer feed-forward blocks, vindicating the broader class of smooth, self-gated activations that Swish helped make mainstream.
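A minimal NumPy sketch of the two definitions may make the approximation concrete; this is illustrative rather than taken from either paper, and the function names are our own:

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 is the common default
    # (the SiLU, as exposed by tf.nn.swish).
    return x / (1.0 + np.exp(-beta * x))

def relu6(x):
    # ReLU6: min(max(x, 0), 6) -- only comparisons, no exponential.
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    # Hard Swish: x * ReLU6(x + 3) / 6, i.e. the sigmoid gate is
    # replaced by the piecewise-linear "hard sigmoid" ReLU6(x + 3) / 6.
    return x * relu6(x + 3.0) / 6.0

x = np.linspace(-6.0, 6.0, 1201)
# The approximation error is small everywhere, peaking near |x| = 3.
print(np.max(np.abs(swish(x) - hard_swish(x))))  # ~0.14
```

The appeal on mobile hardware is that the hard-sigmoid gate needs only an add, a clamp, and a multiply, whereas the true sigmoid requires evaluating an exponential.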