Apple interview question

Easy-to-medium LeetCode problems. Explain Vision Transformers and autoencoders.
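
For the Vision Transformer part, one possible way to anchor an answer is the patching step: the image is cut into fixed-size, non-overlapping patches, and each patch is flattened into a vector that plays the role of a token. The sketch below is only an illustration in plain Python, not something from the interview itself; the names `patchify`, `image`, and `tokens` are hypothetical. A real model would then linearly project each patch, add position embeddings, and pass the sequence to a transformer encoder.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of equal-length rows) into flattened,
    non-overlapping square patches, in row-major patch order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 single-channel "image" holding the values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)  # 4 patches, each flattened to length 4
```

With a 4x4 input and a patch size of 2, this gives a sequence of 4 tokens, which is the kind of sequence length arithmetic an interviewer may ask about.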