Open Data Integration

Renée J. Miller


Open Data plays a major role in open government initiatives. Governments around the world are adopting Open Data Principles promising to make their Open Data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However scientists generally do not have a priori knowledge about what data is available (its schema or content), but will want to be able to use Open Data and integrate it with other public or private data they are studying. Traditionally, data integration is done using a framework called “query discovery” where the main task is to discover a query (or transformation script) that transforms data from one form into another. The goal is to find the right operators to join, nest, group, link, and twist data into a desired form. In this talk, I introduce a new paradigm for thinking about Open Data Integration where the focus is on “data discovery”, but highly efficient internet-scale discovery that is heavily query-aware. As an example, a join-aware discovery algorithm finds datasets, within a massive data lake, that join (in a precise sense of having high containment) with a known dataset. I describe a research agenda and recent progress in developing scalable query-aware data discovery algorithms.


Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada, Canada’s National Academy of Science, Engineering and the Humanities. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is a fellow of the ACM. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her co-authors (Fagin, Kolaitis and Popa) received the (10 Year) ICDT Test-of-Time Award for their influential 2003 paper establishing the foundations of data exchange. Professor Miller has led the NSERC Business Intelligence Strategic Network and was elected president of the non-profit Very Large Data Base Foundation. She is an Editor-in-Chief of the VLDB Journal. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.

Healthcare Transformation from Data and System Perspectives

Beng Chin Ooi(National University of Singapore)


While AI and data-driven approaches are still evolving, they are likely to surpass current medical practices in the healthcare domain soon. The potential advantages are not only faster and more accurate analysis, but also the democratization of healthcare services. Notwithstanding, there are some common challenges when applying existing approaches onto the healthcare domain, due to the noise and bias of electronic health records (EHR), complex and heterogeneous feature relations, access control and data privacy and etc. In this talk, I discuss our design and implementation strategies: solve common challenges, instill domain knowledge, automate knowledge extraction, and enable system-based global optimization. I discuss our rationale on building a general analytics stack instead of solving individual problems, and explain how these challenges are being addressed. Several detailed technologies from both system and algorithm perspectives in our healthcare data management and analytics framework are also described. Finally, we introduce our healthcare analytics stack and MediLOT blockchain system, with the hope of playing a role in healthcare transformation.


Beng Chin is a Distinguished Professor of Computer Science, NGS faculty member and Director of Smart Systems Institute (SSI@NUS) at the National University of Singapore (NUS), an adjunct Chang Jiang Professor at Zhejiang University, China, and the director of NUS AI Innovation and Commercialization Centre at Suzhou, China. He obtained his BSc (1st Class Honors) and PhD from Monash University, Australia, in 1985 and 1989 respectively.

Beng Chin's research interests include database systems, distributed and blockchain systems, and large scale analytics, in the aspects of system architectures, performance issues, security, accuracy and correctness. He works closely with the industry (eg. NUHS, Jurong Health, Tan Tok Seng Hospital, Singapore General Hospital, KK Hospital on healthcare analytics and prediebetes prevention), and exploits IT for efficiency in various appplication domains, including healthcare, finance and smart city. He is a co-founder of yzBigData(2012) for Big Data Management and analytics, and Shentilium Technologies(2016) for AI- and data-driven Financial data analytics, MediLot Technologies(2018) for blockchain based healthcare data management and analytics, an advisor of a RegTech company, Cynopsis Solutions, and an advisor to blockchain based KYC ICO. Beng Chin serves as a non-executive and independent director of ComfortDelgro, a transportation company, and a member of Hangzhou Government AI Development Committee (AI TOP 30)

Beng Chin is a fellow of the ACM , IEEE, and Singapore National Academy of Science (SNAS).

Data for Journalism

Cong Yu and Jure Leskovec