Avanade interview question

How do you identify duplicate records in a dataset using Spark SQL?
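
One common way to approach this is to group on the columns that define a duplicate and keep the groups with `COUNT(*) > 1`. To list the extra copies themselves, a `ROW_NUMBER()` window over the same columns works. The sketch below is only an illustration: the table name `customers` and the columns `id`, `name`, and `email` are hypothetical. The queries are standard SQL and are run here through Python's built-in `sqlite3` so the example works without a Spark installation. In Spark, the same query strings would normally be passed to `spark.sql(...)` after registering the DataFrame with `df.createOrReplaceTempView("customers")`.

```python
import sqlite3

# Hypothetical sample data: (id, name, email). Names are for illustration only.
ROWS = [
    (1, "Ann", "ann@example.com"),
    (2, "Bob", "bob@example.com"),
    (3, "Ann", "ann@example.com"),
    (4, "Cy", "cy@example.com"),
    (5, "Ann", "ann@example.com"),
]

# Keys that occur more than once, together with how often they occur.
DUPLICATE_KEYS_SQL = """
SELECT name, email, COUNT(*) AS cnt
FROM customers
GROUP BY name, email
HAVING COUNT(*) > 1
"""

# Every row beyond the first one within each (name, email) group.
EXTRA_ROWS_SQL = """
SELECT id, name, email
FROM (
    SELECT id, name, email,
           ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn
    FROM customers
) AS ranked
WHERE rn > 1
ORDER BY id
"""


def run_queries():
    # sqlite3 stands in for Spark here; with Spark, the assumption is that
    # spark.sql(DUPLICATE_KEYS_SQL) runs against a temp view named "customers".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)
    duplicate_keys = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
    extra_rows = conn.execute(EXTRA_ROWS_SQL).fetchall()
    conn.close()
    return duplicate_keys, extra_rows


duplicate_keys, extra_rows = run_queries()
print(duplicate_keys)
print(extra_rows)
```

The first query answers "which values are duplicated", while the second identifies the specific rows you would drop to deduplicate. Which columns go into `GROUP BY` and `PARTITION BY` depends on what counts as a duplicate in the dataset at hand.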