Privacy-Preserving Synthetic Data Generation with LLMs
Master's Thesis
This broad topic aims to develop a privacy-preserving strategy for generating synthetic text data with LLMs, such that the synthetic data resembles the distribution of the underlying sensitive data. The thesis touches on memorization, text anonymization, and differential privacy. Ideally, the student starts by benchmarking existing synthetic data generation techniques. The work then involves implementing differential privacy mechanisms for LLMs and evaluating the resulting privacy-utility trade-off. More specific research questions (RQs) can be defined depending on the student's interests.