Through the growing use of interconnected sensors, industrial operations and physical systems are generating ever-increasing volumes of data of many different types. Opportunities to apply analytics over these rich types of data abound in many industries (e.g., manufacturing, power distribution, oil and gas exploration and production, telecommunication, healthcare, agriculture, mining, smart cities, public transportation).
In developing several such applications over the years, we have come to realize that existing benchmarks for decision support, streaming data, event processing, or distributed processing are not adequate for industrial big data applications. One primary reason is that these benchmarks individually address a narrow range of data and analytics processing needs. In this talk, we outline an approach we are taking to defining a benchmark that is motivated by typical industrial operational scenarios. We describe the main issues we are considering for the benchmark, including the typical data and processing requirements; representative queries and analytics operations over streaming, stored, structured and unstructured data; and some early experimental results in implementing the benchmark on different application architectures.
Umesh has been with the Research and Development Division of Hitachi America, Ltd. since 2013. He is responsible for research and technology innovation in big data, advanced analytics, and AI, leading to the creation of novel solutions in various industries, including manufacturing, energy, natural resources, healthcare, and transportation. Prior to joining Hitachi America, Umesh was an HP Fellow and Director of the Information Analytics Lab at Hewlett-Packard Labs, Palo Alto, California. He received his Ph.D. in Applied Mathematics from Harvard University. He is an ACM Fellow, and he received the Edgar F. Codd Award in 2010 from the ACM Special Interest Group on Data Management (SIGMOD). He has published over 240 research papers and holds over 60 patents.
In this talk, we will review the landscape of data analytics platforms in the cloud. In particular, we will describe Microsoft’s cloud services in this segment, including their support for Machine Learning and AI. In the second part of the talk we will reflect on some of the open challenges where we need more innovations in our quest to accelerate the exploration of insights in data. Specifically, we will describe the roles Self-Service Data Transformations and Approximate Query Processing techniques can play in this regard.
Surajit Chaudhuri is a Distinguished Scientist at Microsoft Research in Redmond (USA) and leads the Data Management, Exploration and Mining group. He also works closely with Microsoft’s Azure Data division. Surajit’s current areas of interest are Big Data platforms, self-manageability, and cloud database services. Working with his colleagues in Microsoft Research, he helped incorporate the Database Engine Tuning Advisor and Data Cleaning technology in Microsoft SQL Server. Surajit is an ACM Fellow, a recipient of the ACM SIGMOD Edgar F. Codd Innovations Award, ACM SIGMOD Contributions Award, a VLDB 10-year Best Paper Award, and an IEEE Data Engineering Influential Paper Award.
In 1987, Jim Gray and Gianfranco Putzolu introduced the five-minute rule for trading memory to reduce disk I/O using the then-current price-performance characteristics of DRAM and Hard Disk Drives (HDD).
Since then, the five-minute rule has gained widespread acceptance as an important rule of thumb in data engineering.
In this talk, I will revisit the five-minute rule three decades after its introduction and use it to identify impending changes in today's multi-tier storage hierarchy given recent trends in the storage hardware landscape.
I will investigate the impact of the five-minute rule -- explicit or implicit -- on the way we perform analytics.
We will see that the rule applies not only in the bottom tiers of the hierarchy, which are based on new Cold Storage Devices (CSD), but also in main-memory databases, where researchers have been working on hot-cold data separation and on heterogeneity-aware caching techniques.
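For readers unfamiliar with the rule, the break-even calculation behind it can be sketched as follows. The formula follows the commonly cited formulation from Gray and Graefe's 1997 revisit of the rule; the function name, the parameter names, and the input values are illustrative assumptions, not figures from this talk.

```python
def break_even_interval_seconds(
    pages_per_mb_ram: float,
    accesses_per_second_per_disk: float,
    price_per_disk_drive: float,
    price_per_mb_ram: float,
) -> float:
    """Return the break-even reference interval in seconds.

    A page that is re-referenced more often than this interval is cheaper
    to keep in memory than to re-read from disk on every access.
    """
    # Technology ratio: how many pages fit in 1 MB of RAM versus how many
    # random accesses per second one disk can serve.
    technology_ratio = pages_per_mb_ram / accesses_per_second_per_disk
    # Economic ratio: price of one disk drive versus price of 1 MB of RAM.
    economic_ratio = price_per_disk_drive / price_per_mb_ram
    return technology_ratio * economic_ratio


if __name__ == "__main__":
    # Illustrative placeholder inputs (assumed, not measured):
    # 128 pages per MB (8 KB pages), 64 accesses/s per disk,
    # a 2000-dollar drive, and RAM at 15 dollars per MB.
    seconds = break_even_interval_seconds(128, 64, 2000.0, 15.0)
    print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 267 s (~4.4 min)
```

Plugging in current prices and access rates for a different pair of adjacent storage tiers (for example, DRAM and flash, or disk and a cold storage device) gives the corresponding break-even interval for that pair.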
Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland and the co-founder of RAW Labs SA, a Swiss company developing real-time analytics infrastructures for heterogeneous big data. Her research interests are in data-intensive systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating data management to support computationally demanding, data-intensive scientific applications. She has received an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), nine best-paper awards in database, storage, and computer architecture conferences, and an NSF CAREER award (2002). She received her Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is an ACM Fellow, an IEEE Fellow, and an elected member of the Swiss National Research Council. She has served as a CRA-W mentor, and is a member of the Expert Network of the World Economic Forum.
Digital business is built on new computing infrastructure (the pillars of mobile, cloud, big data, and analytics), accelerated by the Internet of Things (IoT), advances in machine learning, and innovations like blockchain, VR/AR, and 3D printing. These technologies enable companies to transform business models and create new services. The new digital business needs to be agile, yet robust. Distributed, while interconnected. Open, yet secure. Simple, but intelligent. Among these emerging technologies, real-time data processing with intelligence is one of the central components of the modern enterprise in the new digital era. This talk will start with the driving forces behind digital transformation and list the industries that will benefit from it, followed by how customer requirements will impact the technical design of cloud applications. Finally, the talk will present the trends in AI and machine learning and how they can drive digital transformation toward the intelligent enterprise.
Dr. Wen-Syan Li is Senior Vice President of SAP SE. He is responsible for developing new cloud applications for digital supply chain and strategic engagements with key accounts such as Huawei, NTT, and Intel. He was responsible for building Predictive Analytics capabilities in SAP’s in-memory database HANA. He received a Ph.D. degree in Computer Science from Northwestern University (USA). He also has an MBA degree in Finance. Dr. Li’s research interests include databases, distributed computing, data mining, machine learning, optimization and scheduling, and IoT. Before joining SAP, he was with IBM Almaden Research Center, NEC Research, and NEC Venture Capital in the USA. He has co-edited three books published by Springer, co-authored more than 100 journal articles and conference papers in various areas, and co-invented more than 100 granted or pending US patents.
Online advertising powers most of the free web experiences we take for granted, ranging from search engines to social networks to news to YouTube. Online advertising is in turn powered by machine learning and data mining algorithms. In this talk, I will provide a real world perspective on applying machine learning algorithms, including deep learning, to online advertising. I will also present examples that illustrate the importance of striving for a deeper understanding beyond model accuracy, and thinking about the context and ecosystem in which the models are used.
Enterprises are increasingly adopting APIs as the interfaces through which applications leverage their capabilities. On the one hand, this opens up demand for their capabilities, but on the other, it also opens up new interfaces for attacks. In this talk we will present some of the technical details of the data and ML pipeline for Apigee Sense, a bot detection engine that we have built to reduce this attack surface.
We will also talk about a couple of other places where we are leveraging ML to improve the API experience.
The talk will be of interest to people who want to see data management and ML techniques applied to APIs.
Anant Jhingran (PhD Berkeley) leads Products for API Management at Google. In his role, he looks after technologies to build API-driven ecosystems and platforms. Prior to this role, he was the CTO at Apigee, where he helped build out the leading API Management platform used by hundreds of enterprise customers. Apigee was acquired by Google in November 2016.
He joined Apigee from IBM, where he was an IBM Fellow, VP and CTO for IBM's Information Management Division, responsible for the technology in databases, information integration, big data, content management and analytics. In his earlier roles at IBM, he was the head of Computer Science at IBM's Almaden Research Center, and helped build out several of IBM's data technologies, including its data warehousing capabilities.
His honors include IBM Fellow, the IIT Delhi Distinguished Alumnus Award, the President's Gold Medal for highest GPA at IIT Delhi, and membership in the IBM Academy of Technology. He has authored over a dozen patents and over 20 technical papers, and he is a frequent keynote speaker at industry and academic conferences.
Data from the World Health Organization reinforces the fact that lifestyle has an important role to play in wellness and good health. Medical practices are focused on treating symptoms of diseases and the incentive structures are not set up to manage healthy communities.
Over the last two years, we have worked closely with medical practitioners and care providers who are incentivized to manage population groups using data and collaboration. To assist with their efforts, we have built a set of tools, Vega and ShareInsights, that leverage modern database technology to help manage healthy communities. I propose to share what we are building and what we have learnt in the process.
In working with medical providers in US-like systems, we have observed the following:
Dr. Anand Deshpande has been the Founder, Chairman and Managing Director of Persistent Systems since its inception and is responsible for the overall leadership, strategy and management of the Company.
Anand holds a B. Tech. (Hons.) in Computer Science and Engineering from the Indian Institute of Technology (IIT), Kharagpur, and an M.S. and Ph.D. in Computer Science from Indiana University, Bloomington, Indiana, USA. He was recognized by his alma mater, IIT Kharagpur, as a Distinguished Alumnus in 2012 and by the School of Informatics of Indiana University with the Career Achievement Award in 2007.
Prior to founding Persistent Systems, Anand began his professional career at Hewlett-Packard Laboratories in Palo Alto, California, where he worked as a Member of Technical Staff from May 1989 to October 1990.
Anand is married to Sonali and they have a daughter and a son. With members of his family, he has established DeAsra Foundation (http://www.deasra.in), a non-profit entity which focuses on creating self-employment at scale.
As the world’s largest e-commerce platform, Alibaba relies heavily on massive data analysis of many kinds to collect data insights and drive business decisions in real time.
The underlying computing infrastructure unifies different computation paradigms such as batch, interactive, progressive, and stream computation, and supports a large variety of workloads, including various kinds of big data analysis, large-scale machine learning, and complex computation on massive heterogeneous information networks (or graphs). The platform not only supports Alibaba’s internal businesses but also provides solid services to enterprise customers via Alibaba Cloud.
Jingren Zhou is Vice President at Alibaba Group. He is responsible for driving Big Data and AI infrastructure development at Alibaba. Specifically, he leads work to develop cloud-scale distributed computing platforms, data analytic products, and various business solutions. He is also leading the search engineering team to develop advanced techniques for personalized search and recommendation on Alibaba's e-commerce platforms, including Taobao and Tmall. His research interests include cloud computing, distributed systems, databases, and large-scale machine learning. He received his PhD in Computer Science from Columbia University.