Distributed Object Management

Managing objects from the datacenter to the edge to facilitate application development and improve performance.

Summary

Traditionally, persistent storage in datacenters has been placed behind a storage area network (SAN) for scaling and management purposes. However, to curb the growing cost of data transfers between compute and storage resources, a current trend is to bring computation closer to the storage nodes, improving performance and reducing the energy footprint.

Object stores are considered the future of storage systems for exascale deployments, since conventional file systems run up against severe scalability limitations. From this perspective, the goal is to extend the functional features of existing systems by giving objects semantics and enabling the addition of associated user-defined behavior to support in-situ data transformations (e.g. decrypt on read, filtering, ...), reducing data movement and improving application performance. By encapsulating data into objects, which can range from blobs to complex data structures with relationships and methods, applications can manage data as if it were in memory. Under this abstraction, a single platform can seamlessly manage both persistent and volatile data, and applications can transparently access and manipulate data regardless of its location, remaining agnostic of data distribution and communication details while exploiting data locality.
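As a minimal sketch of this idea, the class below (all names are illustrative, not an actual object-store API) encapsulates a dataset together with a user-defined method, so that an in-situ transformation such as filtering can run where the data lives and only the (smaller) result needs to travel:

```python
# Hypothetical sketch: an object that bundles data with behavior, so a
# filter can execute on the node holding the data instead of shipping
# the whole dataset to the client.

class SensorReadings:
    """Encapsulates a dataset plus a user-defined, in-situ operation."""

    def __init__(self, values):
        self._values = list(values)

    def filter_above(self, threshold):
        # In a real object store this would run next to the stored data;
        # only the filtered subset would cross the network.
        return [v for v in self._values if v > threshold]


readings = SensorReadings([3.1, 7.4, 0.5, 9.9, 4.2])
hot = readings.filter_above(5.0)
print(hot)  # [7.4, 9.9]
```

The same pattern generalizes to operations such as decrypt-on-read: the method attached to the object decides how the raw bytes are transformed before the application ever sees them.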

This idea of bringing computation and data closer together within a datacenter can be extended to the edge-to-cloud continuum by treating the whole set of resources, from edge devices to cloud nodes, as a single infrastructure, so that techniques proven within the bounds of a datacenter apply across it. Processing data near its source becomes even more important in this setting, since it reduces communication and data transfers in an environment whose channels are far more unstable and slower than those inside a datacenter.

In this context, the goal is to provide a data management solution adapted to edge-to-cloud requirements (unstable and slow communications, heterogeneous and constrained devices, ...) and fully integrated with the programming model, offering applications a homogeneous interface to heterogeneous data (from different sources and in different formats) that is agnostic of the state of the data (persistent, volatile, streaming) and of its current location (from a device at the edge to a node in a datacenter).

Objectives

  • Facilitate application development by providing transparent access to data, regardless of its location (local or remote) and its state (persistent or in-memory).
  • Improve performance of applications accessing large datasets by working in combination with parallel programming models such as (Py)COMPSs, and further facilitating programmability with extended features such as smart iterators or HDF5 interfaces.
  • Exploit the benefits of new storage technologies such as NVRAMs, leveraging object stores to provide a friendly interface to applications and to transparently handle in-memory compute capabilities across the various layers of the memory and storage hierarchy.
  • Take advantage of concepts and techniques traditionally used in HPC to leverage the vast amount of computing and storage resources potentially available in edge-to-cloud ecosystems.
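The "smart iterators" mentioned above can be sketched as follows (an illustrative generator, not an actual library API): rather than materializing a large dataset at once, the application iterates over it chunk by chunk, so each chunk can be fetched from, or processed near, the node that stores it.

```python
# Hypothetical sketch of a "smart iterator" over a large dataset:
# yields one chunk at a time so computation can follow the data.

def chunked_iterator(dataset, chunk_size):
    """Yield successive chunks of `dataset`; in a real object store each
    chunk would be requested from the node that holds it."""
    for start in range(0, len(dataset), chunk_size):
        yield dataset[start:start + chunk_size]


total = 0
for chunk in chunked_iterator(list(range(10)), chunk_size=4):
    total += sum(chunk)   # work proceeds chunk by chunk, never all at once
print(total)  # 45
```

Combined with a parallel programming model such as (Py)COMPSs, each chunk could become an independent task scheduled close to the data it touches.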