The large amount of data collected by the stations has to be stored and analyzed by researchers. Cloud resources make this possible.
The cloud is an IT infrastructure designed to store data and to host the software that processes and analyzes it.
From an IT point of view, the Cagliari2020 project can be described as a social application: it produces a large volume of raw data that must be stored and analyzed over a long period, which in turn generates a large amount of derived information.
To manage this type of problem, the final infrastructure should provide the following features:
– scalability: the infrastructure must keep up as data volumes and user numbers grow and change;
– elasticity: the infrastructure should adapt to temporary peaks in user requests for data management and analysis;
– portability: applications must be able to interoperate and/or be easily moved between different cloud infrastructures.
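In Kubernetes terms, elasticity is typically obtained with horizontal autoscaling. As a minimal illustration (the thresholds and replica limits below are hypothetical, not taken from the project), the usual scaling rule computes the desired number of replicas from the ratio between observed and target load:

```python
import math

def desired_replicas(current_replicas: int, current_load: float,
                     target_load: float, max_replicas: int = 10) -> int:
    """Horizontal-scaling rule (as used e.g. by the Kubernetes
    HorizontalPodAutoscaler): scale the replica count proportionally
    to the ratio between observed and target load."""
    if current_load <= 0:
        return 1  # keep at least one replica alive
    desired = math.ceil(current_replicas * current_load / target_load)
    return max(1, min(desired, max_replicas))

# Example: 2 replicas at 90% CPU with a 60% target -> scale up to 3.
print(desired_replicas(2, 90.0, 60.0))
```

The same rule scales back down when the load drops, so resources are only held while users actually request them.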
The greatest effort is concentrated on developing specific tools for integrating and managing the data stored and sent by the stations, built on top of cloud infrastructures.
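As a sketch of what such an integration tool might look like (the field names and JSON payload format are assumptions for illustration, not the project's actual schema), a small validator could check raw station messages before they enter the storage pipeline:

```python
import json

# Hypothetical required fields for a raw station message.
REQUIRED_FIELDS = {"station_id", "timestamp", "measurements"}

def validate_station_message(raw: str) -> dict:
    """Parse a raw JSON message from a station and check that the
    fields needed by the storage/analysis pipeline are present."""
    msg = json.loads(raw)
    missing = REQUIRED_FIELDS - msg.keys()
    if missing:
        raise ValueError(f"message missing fields: {sorted(missing)}")
    if not isinstance(msg["measurements"], dict):
        raise ValueError("'measurements' must map sensor names to values")
    return msg

# Example payload (invented for illustration).
raw = ('{"station_id": "st-01", "timestamp": "2019-06-01T12:00:00Z",'
       ' "measurements": {"pm10": 21.5}}')
print(validate_station_message(raw)["station_id"])
```

Rejecting malformed messages at the ingestion boundary keeps the downstream storage and analysis services simpler.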
Cloud resources are not free and can be quite expensive, especially for storage. For this reason, their usage has to be planned carefully, both to guarantee functionality and to preserve the confidentiality of personal data, which must be stored and managed in private infrastructures.
In the case of the Cagliari2020 project, a mixed cloud is used: personal data are stored on a private cloud, whereas analysis and data fusion run on a public cloud. The data acquisition and processing code for the project (see image on the left) is based on a microservice-oriented architecture, with applications deployed in containers and interoperability and orchestration handled by Kubernetes.
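The private/public split can be illustrated with a simple routing sketch (the classification of fields as personal is hypothetical, not the project's actual policy): personal fields stay on the private cloud, while the remaining measurements are forwarded to the public cloud for analysis and fusion:

```python
# Fields treated as personal data (hypothetical classification).
PERSONAL_FIELDS = {"user_id", "name", "location"}

def route_record(record: dict) -> tuple:
    """Split a record into a personal part (kept on the private
    cloud) and a non-personal part (sent to the public cloud)."""
    private_part = {k: v for k, v in record.items() if k in PERSONAL_FIELDS}
    public_part = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    return private_part, public_part

record = {"user_id": "u42", "name": "Mario", "pm10": 21.5, "temperature": 28.0}
private_part, public_part = route_record(record)
print(sorted(public_part))  # ['pm10', 'temperature']
```

Splitting records at acquisition time means the public-cloud analysis services never receive personal data at all, rather than having to filter it out later.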
The private cloud infrastructure is built with OpenStack on INFN servers, while the public infrastructure relies on both Amazon Web Services and Google Cloud platforms.
For interoperability and orchestration of all the operations required of the software, a Kubernetes cluster has been set up with resources distributed across OpenStack, Amazon Web Services, and bare-metal machines. In parallel, the containers for raw data storage, end-user data storage, and sensorgw have been built and tested.
At the moment, the testbed and the data acquisition system are operational.