[ONLINE] PATC: Introduction to Big Data Analytics

Date: 07/Feb/2022 Time: 09:30 - 11/Feb/2022 Time: 13:30


Online via Zoom

Target group: Trainees with some theoretical and practical knowledge.

Cost: There is no registration fee. The course is free of charge.

Day 1 (Feb 7th)

9:30 – 13:00 Introduction to Big Data (Josep Lluis Berral, Computer Sciences - Data Centric Computing, BSC)
In this session we will introduce students to the technologies associated with Big Data: data challenges, cloud computing, processing, and the Internet of Things. An overview of the technologies will be provided, from both a technical and a business-model point of view.

11:00 - 11:30 Coffee break
13:00 – 14:00 Lunch Break

14:00 – 16:00 Practical Data Analytics for Solving Real World Problems (Patricio Reyes, Researcher, BSC; Maria Teresa Grifa, Data Scientist, Bridgestone EMA)
Data analytics has changed the way we make decisions. The integration of advanced data analytics has brought benefits and advances in many fields, ranging from financial to medical and industrial applications. In this course we will share practical tips gained through our experience in data analytics projects at BSC. We will also see how to overcome some of the most challenging tasks in practical data analytics.

16:00 – 16:30 Coffee break

16:30 – 18:00 Hands-on (Patricio Reyes, Researcher, BSC; Maria Teresa Grifa, Data Scientist, Bridgestone EMA)
In this session you will learn how to structure a data analytics project, by following the methodology and the concepts introduced in the previous session. We will guide you through a step-by-step process to set up data science projects and start collaborating with the members of a team.

Day 2 (Feb 8th)

9:30 – 13:00 Big Data Management (Albert Abelló, UPC, inLab FIB and Petar Jovanovic, UPC)
Big Data has many definitions and facets; we will pay attention to the problems we face in storing it and to how we can process it. More specifically, we will focus on the Apache Hadoop ecosystem and two of its basic components, namely HBase and the MapReduce engine.
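As an illustration of the MapReduce processing model mentioned above, here is a minimal sketch in plain Python. It is not Hadoop code and does not come from the course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are assumptions chosen for this example. It only shows the three conceptual steps (map, group by key, reduce) on a word count.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three phases on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


# Hypothetical input, for illustration only.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different machines and the grouping step moves data over the network; this single-process version only mirrors the data flow.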

11:00 - 11:30 Coffee break

Hands-on exercise
13:00 – 14:00 Lunch Break

14:00 - 16:00 NoSQL databases (Oscar Romero, Dept. of Service and Information System Engineering, UPC-BarcelonaTech)
The relational model has dominated data storage systems since the mid-1970s. However, changing storage needs over the past decade have given rise to new models for storing data, collectively known as NoSQL. In this presentation, we will focus on two of the most common types of NoSQL databases, document-oriented databases and graph databases, and explain the use cases suitable for each of them.
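To make the contrast between the two models concrete, the following sketch uses plain Python data structures rather than any real database API. The records, field names and helper functions (find_by_city, reachable) are hypothetical. A document-style query filters self-contained records by a field, while a graph-style query follows relationships between records over several hops.

```python
from collections import deque

# Hypothetical records: each one is a self-contained, nested document.
documents = {
    "alice": {"name": "Alice", "city": "Barcelona", "follows": ["bob"]},
    "bob": {"name": "Bob", "city": "Madrid", "follows": ["carol"]},
    "carol": {"name": "Carol", "city": "Barcelona", "follows": []},
}


def find_by_city(docs, city):
    """Document-style query: filter records on one of their own fields."""
    return sorted(key for key, doc in docs.items() if doc["city"] == city)


def reachable(docs, start, max_hops):
    """Graph-style query: breadth-first traversal of 'follows' edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in docs[node]["follows"]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return sorted(seen)


in_barcelona = find_by_city(documents, "Barcelona")
within_two_hops = reachable(documents, "alice", 2)
print(in_barcelona, within_two_hops)
```

The first query touches each record independently, which suits document stores; the second depends on following links between records, which is the kind of workload graph databases are designed for.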

16:00 - 16:30 Coffee break

16:30 - 18:00 Multidisciplinary research and data analytics: Cultural Heritage (Maria Cristina Marinescu / Joaquim More / Artem Rashetnikov, Computer Applications in Science & Engineering, BSC)
This session will focus on Cultural Heritage as an example of a field that can really take advantage of integrating, analyzing, and reasoning with large amounts of data from many heterogeneous sources. We will explain how to improve the quality and quantity of open metadata associated with European Cultural Heritage (CH) imagery, starting (mostly) from images of paintings and text. Our ultimate goal is to transcribe insights about culture, symbols and traditions in a knowledge representation accessible to machine learning and artificial intelligence.

Day 3 (Feb 9th)

9:30 – 13:00 Data Analytics with Apache Spark. Part 1 (Josep Lluis Berral, Computer Sciences - Data Centric Computing, BSC)
Apache Spark has become a consolidated technology for fast, general-purpose large-scale processing, with “programmer-friendly” interfaces and official bindings for many of the most widely used languages (Java, Scala, Python and R), extensive documentation and development tools. This course introduces Apache Spark, as well as some of its core libraries for data manipulation, machine learning, data streams and graph analytics.

11:00 - 11:30 Coffee break
13:00 – 14:00 Lunch Break

14:00 – 15:30 Data Analytics with Apache Spark. Part 2 (Josep Lluis Berral, Computer Sciences - Data Centric Computing, BSC)

15:30 – 17:00 HiFi-TURB: High-Fidelity LES/DNS Data for Innovative Turbulence Models (an H2020 European project) (Oriol Lehmkuhl and Arnau Miró, CASE - Large-scale Computational Fluid Dynamics)
The presentation will cover BSC's experience in the H2020 project HiFi-TURB (High-Fidelity LES/DNS Data for Innovative Turbulence Models), dealing with the exploration of big data sets, data reduction and the use of novel ML algorithms for turbulence modelling. Modelling turbulent flows using computational fluid dynamics (CFD) has progressed rapidly over the last decades and given rise to significant changes in the design processes of aircraft, cars and ships. The EU-funded HiFi-TURB project is using high-fidelity CFD together with new artificial intelligence and machine learning algorithms to identify important correlations between turbulent quantities, with the aim of proposing novel turbulence models. Improved models for complex fluid flows offer the potential to further reduce the energy consumption, emissions and noise of aircraft, ships and cars.

Day 4 (Feb 10th)

9:30 – 13:00 Practical Introduction to programming Artificial Intelligence (Jordi Torres, Emerging Technologies for Artificial Intelligence Group Manager - Computer Sciences, BSC)

The next generation of Artificial Intelligence applications imposes new and demanding requirements on computing infrastructures. What do the computer systems that support artificial intelligence look like? How are they programmed?


Artificial Intelligence is a Supercomputing Problem

Programming Artificial Intelligence

  • Getting Started with Deep Learning
  • Deep Learning basic concepts
  • Learning Process of a Deep Neural Network

Scaling Artificial Intelligence applications

  • Scalable AI on Parallel and Distributed Infrastructures
  • Training on Multiple GPUs

(*) Essential prerequisites to enroll in this course: It is assumed that the student has a basic knowledge of Python and Linux before starting the course.


Day 5 (Feb 11th)

9:30 – 13:00 Data Visualization Theory (Luz Calvo, User Experience And Interaction Designer, BSC and Juan Felipe Gomez Celis, FrontEnd Developer, BSC)

Data Visualization Theory (1h 30m)

  1. Basic concepts
  2. Human perception
  3. Design
  4. Colour
  5. Audience / Validation / Bad practices
  6. Visualisation design process

11:00 - 11:30 Coffee break

Tools for data visualization (30m)

  1. Tableau
  2. Datawrapper
  3. RawGraphs
  4. Flourish

D3.js (1h 30m)

  1. D3.js Basics (Theory)
  2. Case studies