The Center for Complex Engineering Systems (CCES) is a multidisciplinary research center where data scientists investigate problems from different domains. The problem-solving process can be divided into three phases: data exploration, data analytics and simulation, and data visualization. Each phase poses challenges that data scientists must overcome to perform the desired analysis task.
First, many of today’s datasets are large and come in different formats, such as structured, semi-structured, geospatial, and graph data. Second, answering complex computational questions may require bringing together simulation and analytical models from across domains. These models are usually built independently, with no intention of making them part of a bigger whole. To combine them, data scientists have to manage data formatting issues, understand the data context, resolve dependencies among the data and the tools, and finally make sure the parts form a coherent whole. Third, developers and even novice users should be able to interact intuitively with our data and models. This requires data scientists, researchers, and designers to visualize the data, or the answers to complex computational questions, using visualizations such as charts, geospatial maps, tree maps, and networks.
In this project, we are developing a platform that incorporates a set of data management and analytics tools to make it easier to join the independent work of data scientists, addressing both usability and performance issues. The goal of the project is twofold. First, the platform aims to bridge the gaps between the three phases of the workflow typically used in data analytics: managing the data, analyzing it, and visualizing the analysis results. We plan to do that by streamlining those three phases to ease the burden on data scientists, and by using parallel processing on a shared-nothing architecture to enhance performance. Second, the platform aims to provide standards and infrastructure that simplify the process of combining different computational models to answer complex questions.
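To make the second goal more concrete, the following is a minimal sketch of how independently built models might be combined once each one declares its inputs and its output. It is an illustration only, not the platform's actual design: the `Model` wrapper, the `run_models` function, and the two toy models are hypothetical names invented for this example. The sketch assumes Python 3.9 or later, where the standard library provides `graphlib.TopologicalSorter`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Model:
    """Hypothetical wrapper around one independently built model."""
    name: str
    inputs: Tuple[str, ...]     # names of the datasets the model reads
    output: str                 # name of the dataset the model produces
    run: Callable[..., object]  # the model itself, called with its inputs in order


def run_models(models: List[Model], data: Dict[str, object]) -> Dict[str, object]:
    """Run the models in dependency order and return every dataset by name."""
    producer = {m.output: m for m in models}
    # A model depends on whichever models produce its inputs;
    # inputs with no producer are expected to be present in `data`.
    graph = {
        m.name: {producer[i].name for i in m.inputs if i in producer}
        for m in models
    }
    by_name = {m.name: m for m in models}
    results = dict(data)
    for name in TopologicalSorter(graph).static_order():
        m = by_name[name]
        results[m.output] = m.run(*(results[i] for i in m.inputs))
    return results


# Two toy models from different "domains", invented purely for illustration.
demand = Model("demand", ("population",), "demand_mw", lambda pop: pop // 500)
cost = Model("cost", ("demand_mw", "price_per_mw"), "total_cost", lambda d, p: d * p)

# The models are listed out of order on purpose; the runner sorts them.
results = run_models([cost, demand], {"population": 1_000_000, "price_per_mw": 50})
```

In this sketch the only shared convention is that every model names its inputs and its output. That is enough for the runner to resolve dependencies automatically, and a circular dependency is reported by `TopologicalSorter` as a `CycleError` rather than silently producing a wrong order.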
Complex systems are inherently multidisciplinary: interested entities approach problems differently and look at the same data from different perspectives. More often than not, these entities also manage and format their data differently. We are therefore building a general-purpose computation platform that draws on ideas from existing systems to meet the needs of researchers and data scientists. The platform can be used to build models that are both computationally intensive and data intensive. We also plan to develop an application on top of the platform that makes it easy to answer complex questions that may involve many complex models.