The core objective of the DataWay project is to develop generic principles, methods and algorithms, encapsulated in a real-time processing platform for Big Data, to support Smart City applications whose data will be gathered, routed and stored via mobile devices and processed and diffused via a more standard Cloud. The main purpose of this platform is to extract valuable data for use in real-time decision making. These data are obtained through different techniques of data reduction, data cleaning, and data aggregation. To validate the proposed processing platform, we will use applications that assist transportation systems in cities (e.g. traffic optimization) and crowd management (reliable communication, multimedia data sharing, path formation, message delivery, etc.).

The project is expected to bring substantial innovative contributions with respect to the following specific objectives:

  • O1: Real-Time Analytics and Ad-hoc Analytics based on context data collected from Smart City applications, by exploring large-scale stream data processing in Cloud-based system architectures able to scale up to large volumes of data;
  • O2: Pattern recognition and pattern extraction considering different aspects of the produced data: data locality, data context, data value computed by different ranking methods, user feedback, the relevance of data to specific processing requests, volatile data that must be processed as they are produced, and aggregation of the extracted knowledge;
  • O3: Analytics for Big Data focusing on reduction and cleaning methods. The main techniques are based on: statistical models used in data analysis (e.g. kernel estimation); machine learning techniques (supervised and unsupervised learning, classification and clustering, k-means, multi-dimensional scaling); ranking techniques (PageRank, recursive queries, etc.); latent semantic analysis; filtering techniques (collaborative filtering, multi-objective filtering); self-* techniques (self-tuning, self-configuring, self-adaptive, etc.); and data mining techniques;
  • O4: A data management and sharing architecture enabling high-throughput data processing, highly concurrent access, mobility support, real-time access guarantees and persistence for collaborative, context-aware Smart City applications on large-scale infrastructures;
  • O5: A Cloud-based storage layer for data collected from a variety of sources. In the context of DataWay, the Cloud will be considered as a platform for collecting and storing data directly from user devices. We will investigate which specific Cloud platform is a suitable solution.
  • O6: Specific mechanisms for self-optimization of data management based on topology awareness. We will consider a decentralized Peer-to-Peer network model, structured by overlays, to support data distribution and data processing in dynamic environments.
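
As a rough illustration of the cleaning, reduction and aggregation steps named in O1 and O3, the Python sketch below filters invalid readings and then reduces the remainder to one mean value per sensor per time window. The record layout (sensor id, timestamp in seconds, speed), the 60-second window, and the valid-range thresholds are assumptions made for this example only; they are not part of the DataWay design, and a small in-memory list stands in for a real data stream.

```python
from collections import defaultdict
from statistics import mean


def clean(readings, low=0.0, high=200.0):
    """Cleaning: drop readings with missing or out-of-range values.

    The thresholds are illustrative assumptions.
    """
    for sensor_id, ts, speed in readings:
        if speed is not None and low <= speed <= high:
            yield sensor_id, ts, speed


def aggregate(readings, window=60):
    """Reduction and aggregation: one mean per sensor per fixed window."""
    buckets = defaultdict(list)
    for sensor_id, ts, speed in readings:
        # Integer division assigns each timestamp to a window index.
        buckets[(sensor_id, ts // window)].append(speed)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical traffic readings: (sensor id, timestamp, speed in km/h).
stream = [
    ("s1", 0, 40.0), ("s1", 30, 50.0), ("s1", 45, None),
    ("s1", 70, 60.0), ("s2", 10, 999.0), ("s2", 20, 30.0),
]
summary = aggregate(clean(stream))
```

Here six raw readings are reduced to three summary values, keyed by sensor and window index. A production platform would apply the same idea incrementally over unbounded streams rather than over a finite list.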

To achieve these objectives, the DataWay platform will be connected to multiple streams of data, analyze data as flows, and deliver data to a Cloud infrastructure to be processed and stored (Hadoop, NoSQL, dashboards, etc.). The following scientific aspects will be considered:

  • (i) new data management methods supported by elastic data processing;
  • (ii) decomposition of processing queries, offering solutions for many-task computing: workload scheduling, load balancing, data distribution and, after processing, data assembly;
  • (iii) dynamic resource provisioning;
  • (iv) machine learning techniques that will assist decision making.
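
The query decomposition described in aspect (ii) can be sketched, under simplifying assumptions, as a split / process / assemble pattern. In the Python example below, the function names, the threshold query, and the use of a local thread pool in place of distributed Cloud workers are all hypothetical choices made for illustration; equal-sized chunks serve as a very basic form of load balancing, and dynamic resource provisioning is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Data distribution: cut the input into n_parts near-equal chunks."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_count(chunk, threshold):
    """One task: count the values in a chunk above the threshold."""
    return sum(1 for value in chunk if value > threshold)


def run_query(data, threshold, workers=4):
    """Decompose the query, run the tasks in parallel, assemble the result."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_count, chunks, [threshold] * len(chunks))
        # Data assembly: combine the partial results into the final answer.
        return sum(partials)


# Hypothetical vehicle speeds in km/h; count those above 50.
speeds = [10, 55, 70, 30, 90, 45, 65]
fast_vehicles = run_query(speeds, threshold=50)
```

The query here is decomposable because counting is associative: partial counts can be summed in any order. Queries without that property would need a different assembly step.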