The landmark paper introducing the Transformer architecture.
The original ResNet paper, whose residual (skip) connections made it practical to train very deep neural networks.