Interactive Large Scale Data Profiling

Mikhail Gubanov
Data profiling is a set of statistical data analysis activities and processes that determine the properties of a given dataset. When a dataset contains millions of tables, their metadata (i.e., titles, attribute names, and types) becomes as abundant as the data instances themselves and likewise calls for profiling. WebLens is an interactive, scalable metadata profiler for large-scale structured data. It introduces a new data structure, the metadata profile, coupled with machine learning and deep learning models trained to construct it. A metadata profile is a metadata summary of a specific real-world object, collected over millions of data sources. These profiles significantly simplify access to large-scale structured datasets for scientists and end users.
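
To make the notion of a metadata profile concrete, the following is a minimal Python sketch of what such a summary could look like for a single real-world object. It is not the WebLens implementation: the class `MetadataProfile`, the function `build_profile`, the table dictionary layout, and the sample tables are all hypothetical, and a plain keyword match on the table title stands in for the trained machine learning and deep learning models.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MetadataProfile:
    """Hypothetical metadata summary for one real-world object."""
    object_name: str
    table_count: int = 0
    # How many matching tables contain each (normalized) attribute name.
    attribute_counts: Counter = field(default_factory=Counter)
    # For each attribute name, how often each declared type was seen.
    type_counts: dict = field(default_factory=dict)


def build_profile(object_name, tables):
    """Aggregate titles, attribute names, and types over many tables."""
    profile = MetadataProfile(object_name)
    for table in tables:
        # Assumption: a keyword match replaces the trained classifier
        # that would decide whether a table describes the object.
        if object_name.lower() not in table["title"].lower():
            continue
        profile.table_count += 1
        for attr, attr_type in table["attributes"]:
            key = attr.strip().lower()
            profile.attribute_counts[key] += 1
            profile.type_counts.setdefault(key, Counter())[attr_type] += 1
    return profile


# Made-up sample tables, for illustration only.
tables = [
    {"title": "Top Songs 2015",
     "attributes": [("Title", "string"), ("Artist", "string")]},
    {"title": "Songs by year",
     "attributes": [("title", "string"), ("Year", "int")]},
    {"title": "City populations",
     "attributes": [("City", "string"), ("Population", "int")]},
]

profile = build_profile("song", tables)
print(profile.table_count)
print(profile.attribute_counts.most_common(1))
```

In this sketch the profile for "song" records how many tables describe the object and which attribute names and types recur across them, which is the kind of summary a user could browse instead of inspecting each table individually.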