The data age has arrived. From crowd-sourced product reviews to real-time traffic alerts, “Big Data” has become a regular part of our daily lives … the vast majority of existing data has been generated in the past few years, and today’s explosive pace of data growth is set to continue. In this setting, “data science” -- the ability to extract knowledge and insights from large and complex data sets -- is fundamentally important.
- DJ Patil, Former US Chief Data Scientist
Historically, the two dominant paradigms for scientific discovery have been theory and experiment, with large-scale computer simulation emerging as the third paradigm in the 20th century. Over the past decade, a new paradigm for scientific discovery has been emerging, driven by the availability of exponentially increasing volumes of data from a variety of sources, including people (e.g., online social networks, mobile devices, and the Internet of Things), businesses (e.g., business analytics tools), and large scientific instruments (e.g., high-throughput DNA sequencing devices, telescopes, and colliders). This trend is popularly referred to as “Big Data”. However, generating data is of little value by itself unless the data also lead to knowledge and actionable insights. Thus, the fourth paradigm, dubbed “Data Science”, which seeks to exploit information buried in massive data sets to drive scientific discovery, has emerged as an essential complement to the three existing paradigms, with a widespread and growing set of applications in almost all disciplines, including health informatics, personalized medicine, computational biology, intelligent edge computing and IoT, intelligent transportation, urban and geospatial analytics, and business analytics, to name a few. The complexity and challenge of the fourth paradigm arise from the increasing velocity, variety, and volume of data. Data science is considered a new science, still in development, that addresses such Big Data challenges; while it borrows principles from computer science, mathematics, and other disciplines, it has its own independent principles and novel methodologies.
In this talk, I will first review some basics of data science. Thereafter, I will provide an overview of a number of data science research projects at my laboratory, Big Data Management and Mining (BDLab).
Farnoush Banaei-Kashani is currently an associate professor at the Department of Computer Science and Engineering, University of Colorado Denver, where he directs three US DoEd GAANN PhD Fellowship Programs in “Big Data Science and Engineering”, “Data-Driven Cyber Security”, and “Infrastructure Informatics”, as well as an MS track in “Data Science in Biomedicine”. Dr. Banaei-Kashani is passionate about performing fundamental research toward building practical, large-scale data-intensive systems, with particular interest in Data-driven Decision-making Systems (DDSs), i.e., systems that automate the process of decision-making by applying data scientific solutions to (big) data. Toward this end, he has organized his research and education activities around two tracks: a Data Science track and a Data Management, Mining and Modeling (i.e., machine learning/AI) track. With the Data Science track, his team engages with real-world problems that can benefit from data scientific solutions spanning all data life-cycle components, from data collection and extraction, to management and querying, to learning and mining, to visualization and storytelling, all under various combinations of the V3 (volume, velocity, and variety) Big Data challenges. In particular, his lab has experience with a number of DDSs from various application areas, such as health informatics, personalized medicine, computational biology, intelligent edge computing and IoT, intelligent transportation, smart infrastructures, and geospatial analysis. The Data Science track complements the Data Management, Mining and Modeling track by providing practical real-world problems, which Dr. Banaei-Kashani's team generalizes, formalizes, and rigorously studies as novel data management and mining as well as machine learning/AI problems. Dr. Banaei-Kashani has published more than 70 refereed papers, and his research has been supported by grants from both governmental agencies (NIH, NSF, DOE, DOD, DOT, DoEd, DOJ, and NASA) and industry (Google, IBM, Chevron, Intel, and UnitedHealth).