A common heuristic is to use a learning rate 10–100x smaller for pretrained layers than for the new classification head, preventing catastrophic forgetting of learned representations.