Synthetic Data Generation for Machine Learning

Special seminar of the Visgraf Lab

Training machine learning models often requires very large amounts of data. Although millions of terabytes of data are generated every day, there still isn’t enough data to address many tasks accurately. Collecting and organizing large volumes of data for each type of application is already a considerable challenge for many individuals and organizations. Moreover, for both technical and ethical reasons, building datasets for research and artificial intelligence applications requires extra care, such as categorizing and balancing the samples and respecting people’s privacy.

Using synthetic data for model training can help overcome several of the problems involved in constructing datasets. Recent studies suggest that models trained on synthetic data can perform well when applied to real data. In this series of three seminars, we will present the advantages and limitations of this approach, particularly in computer vision and computer graphics; how synthetic data can be generated at scale using the computer graphics tools available today; and how this idea extends to machine learning within simulations.

Why does synthetic data matter?
May 26, 2021

In this seminar, we will present some problems that motivate the use of synthetic data and discuss technical issues for...

How to Generate Synthetic Data and Train a Model with it?
June 2, 2021

In this presentation, we will discuss how to run experiments using the Unity platform to generate synthetic data to train...

Do They Live in a Simulation? Training Models for Dynamic Environments
June 9, 2021

In this third and final presentation, we will see that synthetic data applications are not limited to static situations.