Synthetic Data Pilot

Do you use proprietary and/or personal data in your research and therefore cannot share it? We may be able to help you with this by creating a synthetic dataset.

Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. This method also greatly reduces disclosure risk, because no record in the synthetic dataset corresponds to a real individual. By sharing synthetic datasets in place of original datasets that could not otherwise be made open, researchers can support the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
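To make the idea of "preserving statistical properties" concrete, here is a minimal sketch in Python. It is not how Syntho works internally; it assumes a toy dataset with two numeric variables and a simple bivariate normal model. The variable names (age, income) and the helper functions pearson and synthesize are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def synthesize(real_rows, n, rng):
    """Fit a bivariate normal to (x, y) rows and draw n new rows from it."""
    xs = [row[0] for row in real_rows]
    ys = [row[1] for row in real_rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    rho = pearson(xs, ys)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 and z2 so that the new (x, y) pairs have correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Hypothetical "real" data: age and a correlated income-like variable.
real = []
for _ in range(2000):
    age = rng.gauss(45, 12)
    income = 1000 + 40 * age + rng.gauss(0, 300)
    real.append((age, income))

synthetic = synthesize(real, 2000, rng)
real_corr = pearson(*zip(*real))
synth_corr = pearson(*zip(*synthetic))
print(f"correlation real={real_corr:.2f} synthetic={synth_corr:.2f}")
```

The synthetic rows are freshly sampled rather than copied, yet their means, spreads, and correlation stay close to those of the source data. Real platforms use far richer models, but the goal is the same.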

To test the platform, we will run a pilot for four months, from June to September 2023 (according to our current plan). During this period, we encourage all EUR researchers to use the platform as much as possible.

To use the platform, please register below so that we can set up an environment for you and give you access to it. Only you can access the environment, and no one else will be able to access your data.

We are also looking for researchers interested in comparing this platform with other open-source solutions, so if you have experience using other tools, please let us know. We would really appreciate your contribution to this pilot!

*Registration for Syntho Pilot has been closed as of October 2023. Thank you for participating!

Using Syntho

The following sections of the product documentation can help you become familiar with the software:

The Get Started section provides some illustrations and relevant links to other documentation sections on how to get started with the Syntho platform.

For the best possible data utility, we recommend preparing your data as a single entity table. If you must synthesize multiple tables, however, there are two options:
1. Automatic key matching: By default, Syntho generates new keys that match other tables’ keys to preserve referential integrity. However, this only ensures matching keys at the key-column level; relationships between key and non-key columns are not preserved.
2. Entity-table ranking feature: If you want to preserve intrinsic relationships across any two related tables, including relationships between key and non-key columns, you can use Syntho’s entity-table ranking feature. This feature is especially valuable if you must synthesize longitudinal (e.g., time series) information. Note that this is still a beta feature, so it has a few limitations, as listed under the section.
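The difference between the two options is easier to see with a small example. The sketch below is written in Python and illustrates only the first option in spirit. It is an assumption-laden toy, not Syntho's implementation: the tables (patients, visits), the column names, and the function synthesize_with_key_matching are all hypothetical, and shuffling stands in for a real synthesis model.

```python
import random


def synthesize_with_key_matching(parents, children, rng):
    """Toy key matching: fresh keys that line up across tables, nothing more."""
    # Fresh surrogate keys for the parent table.
    new_ids = list(range(1, len(parents) + 1))

    # Stand-in for synthesizing non-key columns independently of the keys.
    ages = [p["age"] for p in parents]
    rng.shuffle(ages)
    synth_parents = [{"patient_id": k, "age": a} for k, a in zip(new_ids, ages)]

    costs = [c["cost"] for c in children]
    rng.shuffle(costs)
    # Every foreign key is drawn from the new parent keys, so referential
    # integrity holds, but which visit belongs to which kind of patient is lost.
    synth_children = [
        {"visit_id": i + 1, "patient_id": rng.choice(new_ids), "cost": cost}
        for i, cost in enumerate(costs)
    ]
    return synth_parents, synth_children


# Hypothetical source tables: one parent (patients) and one child (visits).
patients = [
    {"patient_id": 101, "age": 34},
    {"patient_id": 102, "age": 71},
    {"patient_id": 103, "age": 52},
]
visits = [
    {"visit_id": 1, "patient_id": 101, "cost": 120},
    {"visit_id": 2, "patient_id": 102, "cost": 900},
    {"visit_id": 3, "patient_id": 102, "cost": 850},
    {"visit_id": 4, "patient_id": 103, "cost": 300},
]

synth_patients, synth_visits = synthesize_with_key_matching(
    patients, visits, random.Random(1)
)
parent_keys = {p["patient_id"] for p in synth_patients}
orphans = [v for v in synth_visits if v["patient_id"] not in parent_keys]
print(f"orphaned visits: {len(orphans)}")
```

Every synthetic visit points to an existing synthetic patient, which is what referential integrity means here. Any pattern linking a patient's attributes to their visits (for example, older patients having costlier visits) is not carried over, which is the gap the entity-table ranking feature is described as addressing.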

Take a visual deep dive to learn how to use Syntho

1. How to configure your table before generating your synthetic dataset

2. Personal Identifiers Identification and configuration; how to use mockers

3. How to generate a synthetic dataset from relational tables

4. Full demo of how to use Syntho, a synthetic data platform (includes previous 3 steps)

More details about the Syntho Platform can be found on the Syntho website here and the SAS Analytics blog here.
