What Is Synthetic Data? Generated Data to Help Your AI Strategy

Synthetic data, which is artificially generated, can be used in place of historical data in certain use cases. Principal Data Scientist John Blankenbaker was featured in the CIO article “What Is Synthetic Data? Generated Data to Help Your AI Strategy,” where he discusses the ways synthetic data can be used.

John notes that synthetic data can be used to fill in gaps in existing data: “Another common problem is to balance out a data set. For example, a historic data set might be composed of 99% non-fraudulent transactions and less than 1% fraudulent ones. Many models will decide that the most successful policy will be to label every transaction as non-fraudulent.”

He goes on to explain the limits of using synthetic data for this purpose: “It will only be useful if the synthesis process captures whatever it is about a transaction that indicates fraud. Which is unlikely to be obvious because then we’d use that as our fraud detector.”
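To make the imbalance problem concrete, here is a minimal Python sketch using made-up records (990 non-fraudulent, 10 fraudulent). The record layout, the `always_non_fraud` stand-in model, and the `oversample_minority` helper are all hypothetical names chosen for illustration; they are not taken from the article. The sketch shows that a model labeling everything as non-fraudulent scores 99% accuracy while catching no fraud, and that naively duplicating fraud records balances the counts without adding any new information about what fraud looks like.

```python
import random

# Made-up records for illustration only: 990 non-fraudulent, 10 fraudulent.
records = [{"id": i, "is_fraud": i >= 990} for i in range(1000)]


def always_non_fraud(record):
    """Hypothetical degenerate model: labels every transaction as non-fraudulent."""
    return False


predictions = [always_non_fraud(r) for r in records]
accuracy = sum(p == r["is_fraud"] for p, r in zip(predictions, records)) / len(records)
fraud_caught = sum(p and r["is_fraud"] for p, r in zip(predictions, records))


def oversample_minority(records, seed=0):
    """Naive balancing: duplicate randomly chosen fraud records until both
    classes are the same size. This copies existing records; it does not
    synthesize new ones, so it teaches a model nothing new about fraud."""
    rng = random.Random(seed)
    fraud = [r for r in records if r["is_fraud"]]
    non_fraud = [r for r in records if not r["is_fraud"]]
    extra = [rng.choice(fraud) for _ in range(len(non_fraud) - len(fraud))]
    return non_fraud + fraud + extra


balanced = oversample_minority(records)
fraud_share = sum(r["is_fraud"] for r in balanced) / len(balanced)

print(f"accuracy of always-non-fraud model: {accuracy:.2%}")
print(f"fraudulent transactions caught: {fraud_caught}")
print(f"fraud share after naive oversampling: {fraud_share:.2%}")
```

A real synthesis process would have to generate new fraudulent records that preserve whatever signals fraud, which is the hard part John describes.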

John adds that synthetic data could also be useful for testing new software: “If we want to see how our infrastructure handles a large number of user accounts, it is easy to write a program that connects to our website and signs up synthetic users.”
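As a rough illustration of the synthetic-user idea, the Python sketch below generates signup payloads for made-up accounts. The field names (`username`, `email`, `password`) and the helper functions are assumptions, since the article does not describe any particular site or signup form; the step that would actually submit each payload to a website is deliberately left out.

```python
import random
import string


def make_synthetic_user(index, rng):
    """Build one made-up signup payload. Field names are hypothetical."""
    suffix = "".join(rng.choices(string.ascii_lowercase, k=6))
    username = f"synthetic_{index}_{suffix}"
    password = "".join(rng.choices(string.ascii_letters + string.digits, k=16))
    return {
        "username": username,
        # The .test top-level domain is reserved, so no real mailbox is used.
        "email": f"{username}@example.test",
        "password": password,
    }


def make_synthetic_users(count, seed=0):
    """Generate `count` payloads; the index in each username keeps them unique."""
    rng = random.Random(seed)
    return [make_synthetic_user(i, rng) for i in range(count)]


users = make_synthetic_users(1000)
print(len(users), "synthetic users generated")
```

In a load test, each payload would be sent to the site's signup form by whatever HTTP client the team already uses, ideally against a test environment rather than production.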

John Blankenbaker serves as the Principal Data Scientist at SSA & Company.

