Evaluation without the truth: Using synthetic data to provide ‘gold-standard’ data for comprehensive evaluation

Mr Tom Dalton – 17 July 2017


In data-intensive research areas, we are working with larger datasets than ever before, and this has made it necessary to develop new approaches and methodologies. However, these large datasets rarely contain ground-truth data against which we can evaluate our new methodologies. This talk explores the creation and use of synthetic data as a basis for evaluating them.

The talk will discuss the general requirements of evaluation and how these affect the characteristics required of the synthetic data. As an example, we will present our work on creating synthetic genealogical populations that provide ‘gold-standard’ data for evaluating record linkage in population reconstruction.