MIT startup DataCebo offers tool to evaluate synthetic data

INDONESIAKININEWS.COM - MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) spin-off DataCebo is offering a new tool, dubbed Synthetic Data (SD) Metrics, to help enterprises assess the quality of machine-generated synthetic data by pitting it against real data sets.

The tool, an open-source Python library for model-agnostic evaluation of tabular synthetic data, defines metrics for the statistics, efficiency and privacy of data, according to Kalyan Veeramachaneni, principal research scientist at MIT and co-founder of DataCebo.

“For tabular synthetic data, it's necessary to create metrics that quantify how the synthetic data compares to the real data. Each metric measures a particular aspect of the data—such as coverage or correlation—allowing you to identify which specific elements have been preserved or forgotten during the synthetic data process,” said Neha Patki, co-founder of DataCebo.

Metrics such as CategoryCoverage and RangeCoverage can quantify whether an enterprise’s synthetic data covers the same range of possible values as the real data, Patki added.
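To make the coverage idea concrete, here is a minimal plain-Python sketch of what such metrics could compute. The function names, sample columns and scoring rules are assumptions chosen for illustration; this is not the SDMetrics code itself, and the library's own scoring may differ.

```python
# Illustrative sketch only: plain-Python versions of the two coverage ideas.
# Function names and data are hypothetical; this is not the SDMetrics implementation.

def category_coverage(real, synthetic):
    """Fraction of the real column's distinct categories found in the synthetic column."""
    real_categories = set(real)
    if not real_categories:
        raise ValueError("real column is empty")
    return len(real_categories & set(synthetic)) / len(real_categories)


def range_coverage(real, synthetic):
    """Share of the real [min, max] interval reached by the synthetic values (0 to 1)."""
    real_min, real_max = min(real), max(real)
    width = real_max - real_min
    if width == 0:
        raise ValueError("real column is constant, so its range is undefined")
    # Penalise only the part of the real range that the synthetic data fails to reach.
    missing_low = max((min(synthetic) - real_min) / width, 0.0)
    missing_high = max((real_max - max(synthetic)) / width, 0.0)
    return max(1.0 - (missing_low + missing_high), 0.0)


# Made-up columns: a categorical "plan" and a numeric "age".
real_plan = ["basic", "plus", "premium", "basic"]
synthetic_plan = ["basic", "plus", "plus"]
real_age = [20, 35, 50, 60]
synthetic_age = [25, 40, 55]

plan_score = category_coverage(real_plan, synthetic_plan)  # "premium" never appears
age_score = range_coverage(real_age, synthetic_age)        # both ends of the range are missed
print(plan_score, age_score)
```

A score of 1.0 means the synthetic column reaches every category, or the full numeric range, seen in the real column; lower scores flag values the generator never produced.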

“To compare correlations, the software developer or data scientist downloading SDMetrics can use the CorrelationSimilarity metric. There are a total of over 30 metrics and more are still in development,” said Veeramachaneni.
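As a rough illustration of comparing correlations, the sketch below computes the Pearson correlation of a column pair in the real data and in the synthetic data, then maps the gap onto a 0-to-1 score. The helper names, the sample values and the exact scoring rule (one minus half the absolute difference) are assumptions for illustration, not a statement of how the library implements CorrelationSimilarity.

```python
import math

# Illustrative sketch only: names, data and scoring rule are assumptions,
# not the SDMetrics implementation.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_similarity(real_a, real_b, synth_a, synth_b):
    """1.0 when both data sets share the same correlation, 0.0 when they are opposite."""
    gap = abs(pearson(synth_a, synth_b) - pearson(real_a, real_b))
    return 1.0 - gap / 2.0  # correlations lie in [-1, 1], so the gap is at most 2


# Made-up column pair: positively correlated in the real data.
real_income = [1, 2, 3, 4]
real_spend = [2, 4, 6, 8]

# One synthetic table preserves the relationship, the other reverses it.
good_score = correlation_similarity(real_income, real_spend, [1, 2, 3, 4], [3, 5, 7, 9])
bad_score = correlation_similarity(real_income, real_spend, [1, 2, 3, 4], [8, 6, 4, 2])
print(good_score, bad_score)
```

Under this scoring, a generator that reproduces the real relationship scores close to 1, while one that inverts it scores close to 0.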

Synthetic Data Vault generates synthetic data

The SDMetrics library, according to Veeramachaneni, is part of the Synthetic Data Vault (SDV) Project, which was started at MIT's Data to AI Lab in 2016. Since 2020, DataCebo has owned and developed all aspects of the SDV.

The Vault, an ecosystem of libraries for synthetic data generation, was started to help enterprises create data models for developing new software and applications in-house.

“While there is a lot of work going around in the area of synthetic data, especially in autonomous driving cars or images, little is being done to help enterprises take advantage of it,” Veeramachaneni said.

“The SDV was developed to ensure that enterprises can download the packages for generating synthetic data in cases where no data was available or there was a chance of putting data privacy at risk,” Veeramachaneni added.

Under the hood, the company claims to use several graphical modeling and deep learning techniques, such as Copulas, CTGAN and DeepEcho, among others.

Copulas, according to Veeramachaneni, has been downloaded over a million times, and models using the technique are being used by large banks, insurance firms and companies focused on clinical trials.

CTGAN, a neural network-based model, has been downloaded over 500,000 times.

Data sets with multiple tables or time-series data are also supported, the DataCebo founders said.

Source: InfoWorld

