Abstract: Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk we describe four scientific directions we are investigating in building and using such a graph, namely, harvesting product knowledge from the web, hands-off-the-wheel knowledge integration and cleaning, human-in-the-loop knowledge learning, and graph mining and graph-enhanced search. This talk will present our progress toward near-term goals in each direction and show the many research opportunities on the way to our moon-shot goals.
Bio: Xin Luna Dong is a Principal Scientist at Amazon, leading the effort to construct the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project and led the Knowledge-based Trust project, which The Washington Post called the “Google Truth Machine”. She has co-authored the book “Big Data Integration”, published 70+ papers in top conferences and journals, and given 30+ keynotes, invited talks, and tutorials. She received the VLDB Early Career Research Contribution Award for advancing the state of the art of knowledge fusion, and the Best Demo award at SIGMOD 2005. She serves on the VLDB Endowment and the PVLDB advisory committee, is the PC co-chair for SIGMOD 2018 and WAIM 2015, and serves as an area chair for SIGMOD 2017, CIKM 2017, SIGMOD 2015, ICDE 2013, and CIKM 2011.
Integrating Analytics with Relational Databases - Mark Raasveldt (Centrum Wiskunde & Informatica)
Capitalizing on Hierarchical Graph Decomposition for Scalable Network Analysis - Rakhi Saxena (Deshbandhu College, University of Delhi)
Detecting Spatial Clusters of Infection Risk with Geo-Located Social Media Data - Roberto Souza (Universidade Federal de Minas Gerais)
Workload-Aware Discovery of Integrity Constraints for Data Cleaning - Eduardo Pena (Universidade Federal do Paraná)
Unsupervised Ensembles for Outlier Detection - Guilherme Campos (Universidade Federal de Minas Gerais)
Marta Mattoso (COPPE/UFRJ)
M. Tamer Özsu (University of Waterloo)
Sihem Amer-Yahia (Laboratoire d’Informatique de Grenoble)
Senjuti Basu Roy (New Jersey Institute of Technology) - Moderator
Vecstra: An Efficient and Scalable Geospatial In-Memory Cache - Yiwen Wang (University of Copenhagen)
Dynamic Database Operator Scheduling for Processing-in-Memory - Tiago Kepe (Universidade Federal do Paraná)
Multi-Core Allocation Model for Database Systems - Simone Dominico (Universidade Federal do Paraná)
Spatial Indexing on Flash-Based Solid State Drives - Anderson Chaves Carniel (University of São Paulo)
Reducing the Footprint of Main Memory HTAP Systems: Removing, Compressing, Tiering, and Ignoring Data - Martin Boissier (Hasso Plattner Institute)
Progressive Indices: Indexing Without Prejudice - Pedro Holanda (Centrum Wiskunde & Informatica)
Auditing DBMSes through Forensic Analysis - James Wagner (DePaul University)
Relating educational materials via extraction of their topics - Márcio Saraiva (University of Campinas)
Parameter Curation and Data Generation for Benchmarking Multi-model Queries - Chao Zhang (University of Helsinki)
Self-Driving: From General Purpose to Specialized DBMSs - Jan Kossmann (Hasso Plattner Institute)