Data Analysis with PySpark Course

Target Group: Data Analysis with PySpark Course

The Data Analysis with PySpark course is intended for developers and aspiring data analysts who want to learn how to use Apache Spark from Python.

Prior Knowledge: Data Analysis with PySpark Course

Some programming experience is helpful for following this course. Prior knowledge of Python or of big data processing with Apache Spark is not required.

Course Format: Data Analysis with PySpark Course

The theory is presented in lectures, and illustrative demos clarify the concepts discussed. Theory and practice alternate, with ample opportunity for hands-on exercises. Course hours are 9:30 am to 4:30 pm.

Certification: Data Analysis with PySpark Course

Participants receive an official Data Analysis with PySpark certificate after successfully completing the course.

In the Data Analysis with PySpark course, participants learn to use Apache Spark from Python. Apache Spark is a framework for parallel processing of big data, and PySpark makes it available from the Python language. The course starts with the architecture of Spark, the Spark cluster manager, and the difference between batch and stream processing. It then covers the Hadoop Distributed File System, parallel operations, and working with RDDs (Resilient Distributed Datasets). The configuration of PySpark applications via SparkConf and SparkContext is also covered. Extensive attention is given to the operations available on RDDs, including map and reduce. Further topics are the use of SQL in Spark, DataFrames, the GraphX library, and iterative algorithms. Finally, the course covers machine learning with the MLlib library.
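To give a feel for the map and reduce operations mentioned above, here is a minimal sketch in plain Python. It is not Spark code and not taken from the course material: the helper names `split_into_partitions` and `map_reduce` are hypothetical, and the partitioning only imitates the way an RDD spreads its data over workers. In PySpark itself, the corresponding steps would be calls such as `rdd.map(f)` and `rdd.reduce(f)` on an RDD created with `SparkContext.parallelize`.

```python
from functools import reduce


def split_into_partitions(data, num_partitions):
    """Split a list into chunks, loosely imitating RDD partitions (hypothetical helper)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def map_reduce(data, map_fn, reduce_fn, num_partitions=3):
    """Apply map_fn to every element, then combine the results with reduce_fn.

    As with a distributed reduce, reduce_fn is assumed to be associative and
    commutative, because elements are combined per partition in no fixed order.
    """
    partitions = split_into_partitions(data, num_partitions)
    # Map step: transform every element of each partition independently.
    mapped = [[map_fn(x) for x in part] for part in partitions]
    # Reduce step 1: combine the elements inside each non-empty partition.
    partials = [reduce(reduce_fn, part) for part in mapped if part]
    # Reduce step 2: combine the per-partition results into one value.
    return reduce(reduce_fn, partials)


numbers = [1, 2, 3, 4, 5]
sum_of_squares = map_reduce(numbers, lambda x: x * x, lambda a, b: a + b)
print(sum_of_squares)  # 55
```

The two-stage reduce is the point of the sketch: each partition can be processed on its own, and only the small partial results need to be brought together at the end.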
