Getting Started with High-Performance Data Analytics

Inter-company training

Who is the training for?

Anyone interested in understanding the basics of HPDA.

Level reached

Specialisation

Duration

12 hours

Language(s) of service

EN

Next session

14.05.2024

Location

Esch-sur-Alzette

Price

1360,00€

Goals

Bringing Data Analytics to the Next Level with the MeluXina Supercomputer

This course provides working knowledge of some of the core Python libraries used for proof-of-concept work and prototyping, and an understanding of how a data science project is structured. It also gives hands-on experience with the TensorFlow library for machine learning and deep learning, and with Seaborn for statistical visualization. Finally, attendees become familiar with distributed computing and Big Data concepts and their implementation using Horovod.

At the end of this course, the successful attendee will:

  • Have knowledge about:
    • Python notebooks and how the computation is mapped onto hardware infrastructure
    • Effective data-science project workflows
    • Common data analytics Python libraries and their strengths and weaknesses
    • Big Data problems using distributed computing
  • and be able to:
    • Work with Python notebooks on MeluXina
    • Read in a data set from file or object storage for analysis
    • Perform statistical analysis on data in a NumPy array or a pandas DataFrame
    • Make visualizations of data using modern libraries
    • Define, train and evaluate simple machine learning models with TensorFlow
    • Choose the suitable data analytics library for the job to be done
  • in order to:
    • Independently analyze and visualize data sets of any size on MeluXina

Contents

  • Introduction to Data Analytics
    • Intro to Jupyter Notebooks
    • Load data with Pandas
    • Clean data and automate web download
    • Separate datasets into train-validation-test
    • Visualize variables
    • Run linear regression on data
    • Visualize and interpret results
  • Machine Learning
    • Intro to ML regression algorithms (linear, SVM, regularization, random forest)
    • Perform PCA to reduce dimensionality
    • Run SVM regression on data
    • Run the model in inference-mode
    • Create Python scripts and execute them from the terminal
  • Distributed Computing
    • Intro to distributed computing
    • Delayed computations and computing graphs
    • Setting up Dask client
    • Distributed load of large dataset
    • Principal Component Analysis for dimensionality reduction
  • Accelerated Machine Learning
    • Read with CuDF and comparison with Pandas
    • ML algorithms with CuML and comparison with sklearn
    • Display html with Plotly
  • Deep Learning
    • Load and preprocess dataset
    • Construct DL model using the Sequential API in TF2
    • Compile model
    • Define EarlyStop and CheckPoint callbacks
    • Train model
    • Evaluate model
  • GPU-Accelerated Deep Learning
    • Intro to distributed DL (Sharing gradients)
    • Initialize Horovod
    • Pin processes to (available) GPU
    • Use distributed optimizer and broadcasting
    • Call script using the "horovod" MPI-wrapper
    • Deploy the TF model with TensorRT
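
To give a flavour of the first module (separate a dataset into train-validation-test, run linear regression, interpret the results), here is a minimal sketch. It is not course material: it uses only the Python standard library so that it runs anywhere, whereas the course itself works with Pandas and other libraries. The function names, the 70/15/15 split and the synthetic data are all assumptions chosen for illustration.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows and split them into train, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(points, slope, intercept):
    """Average squared residual of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


# Hypothetical synthetic data: y = 2x + 1 plus a little Gaussian noise
rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in range(100)]

train, val, test = split_dataset(data)
slope, intercept = fit_line(train)  # fit on the training split only

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"validation MSE={mean_squared_error(val, slope, intercept):.4f}")
print(f"test MSE={mean_squared_error(test, slope, intercept):.4f}")
```

The model is fitted on the training split only; the validation split would guide modelling choices, and the test split is kept aside for a final, unbiased error estimate.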

Next session

Date: 14.05.2024 - 15.05.2024
City: Esch-sur-Alzette
Language and price: EN, 1360,00€
