An Evaluation of Synthetic Data Generators Implemented in the Python Library Synthcity
Abstract
"Generating synthetic data has never been so easy. With the increasing popularity of the approach more and more R packages and Python libraries offer ready-made synthesizers that promise generating synthetic data with almost no effort. These synthetic data generators rely on various modeling strategies, such as generative adversarial networks, Bayesian networks or variational autoencoders. Given the plethora of methods, users new to the approach have an increasingly hard time to decide where to even start when exploring the possibilities of synthetic data. This paper aims at offering some guidance by empirically evaluating the analytical validity of 12 different synthesizers available in the Python library synthcity. While this comparison study offers only a small glimpse into the world of synthetic data (many more synthetic data generators exist and we also only rely on the default settings when training the various models), we still hope the evaluations offer some useful insights regarding the performance of the different synthesis strategies." (Author's abstract, IAB-Doku, © Springer)
Cite article
Fössing, E. & Drechsler, J. (2024): An Evaluation of Synthetic Data Generators Implemented in the Python Library Synthcity. In: J. Domingo-Ferrer & M. Önen (Eds.) (2024): Privacy in Statistical Databases 2024, pp. 178-193, accepted on June 21, 2024. DOI:10.1007/978-3-031-69651-0_12