|
Apache Spark Course Curriculum Walkthrough
|
|
|
|
PySpark Course Pre-Intro ppt
|
|
|
|
Apache Spark Course Syllabus
(11 pages)
|
|
|
|
PySpark Pre Intro Session (Part-1)
32:00
|
|
|
|
PySpark Pre Intro Live Q&A Session (Part-2)
24:00
|
|
|
|
Join the WA Community
|
|
|
[01 WEEK] PySpark Session
|
|
|
|
Agenda
4:00
|
|
|
|
History of Big Data (Part-1)
14:00
|
|
|
|
Monolithic Vs Distributed Systems (Part-2)
8:00
|
|
|
|
What is Hadoop and Its History (Part-3)
9:00
|
|
|
|
Hadoop Advantages and Disadvantages (Part-4)
6:00
|
|
|
|
What is Apache Spark and its features (Part-5)
7:00
|
|
|
|
Why Apache Spark is faster than MapReduce (Part-6)
3:00
|
|
|
|
Is Apache Spark a Replacement for Hadoop? (Part-7)
2:00
|
|
|
|
Apache Spark Ecosystem (Part-8)
9:00
|
|
|
|
Spark Context Vs Spark Session (Part-9)
10:00
|
|
|
|
Spark Worker Node-Container-Executor (Part-10)
26:00
|
|
|
|
Spark Worker Node-Container-Executor (Part-11)
8:00
|
|
|
|
What is On Heap Memory (Part-12)
6:00
|
|
|
|
What is Off Heap Memory (Part-13)
9:00
|
|
|
|
What is Garbage Collector (Part-14)
3:00
|
|
|
|
Bonus Q&A
3:00
|
|
|
|
Apache Spark Week-1 Notes
|
|
|
[02 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
12:00
|
|
|
|
Agenda
3:00
|
|
|
|
Types of Spark Execution Modes (Part-1)
7:00
|
|
|
|
Different Types of Cluster Managers (Part-2)
3:00
|
|
|
|
Spark Execution-Standalone Mode (Part-3)
2:00
|
|
|
|
Spark Cluster-Key Components (Part-4)
5:00
|
|
|
|
Spark Runtime Architecture-Cluster Mode (Part-5)
15:00
|
|
|
|
Spark Runtime Architecture-Client Mode (Part-6)
6:00
|
|
|
|
Spark Runtime Architecture-Cluster Mode Vs Client Mode (Part-7)
2:00
|
|
|
|
Spark Driver and Executor Overhead Memory Calculations (Part-8)
19:00
|
|
|
|
Spark Executor Memory Deep Dive (Part-9)
55:00
|
|
|
|
Spark Submit - Different Scenarios (Part-10)
8:00
|
|
|
|
Spark Executor Cores and Memory Calculations (Part-11)
17:00
|
|
|
|
Bonus
1:00
|
|
|
|
Apache Spark Week-2 Notes pdf
|
|
|
[03 WEEK] PySpark Session
|
|
|
|
Previous Week Recap
9:00
|
|
|
|
Agenda
2:00
|
|
|
|
Fat Executors in Spark (Part-1)
14:00
|
|
|
|
Thin Executors in Spark (Part-2)
9:00
|
|
|
|
Optimally Sized Executors in Spark (Part-3)
12:00
|
|
|
|
Spark Transformations and Actions (Part-4)
6:00
|
|
|
|
Spark Narrow and Wide Transformations (Part-5)
8:00
|
|
|
|
Spark Application-Jobs-Stages-Tasks (Part-6)
10:00
|
|
|
|
Spark Application-Jobs-Stages-Tasks - Practical Example (Part-7)
9:00
|
|
|
|
History of Apache Spark APIs (Part-8)
11:00
|
|
|
|
Differences Between Spark RDD Vs DataFrame Vs DataSet (Part-9)
4:00
|
|
|
|
Databricks Introduction and its Key features (Part-10)
10:00
|
|
|
|
Bonus Q&A
4:00
|
|
|
|
Apache Spark Week-3 Notes pdf
|
|
|
[04 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
10:00
|
|
|
|
Agenda
3:00
|
|
|
|
Databricks Workspace Walkthrough
|
|
|
|
Databricks Notebook Magic Commands Overview (Part-1)
18:00
|
|
|
|
DBUtils File System Commands (Part-2)
6:00
|
|
|
|
DBUtils Widget Commands (Part-3)
7:00
|
|
|
|
DBUtils Notebook Commands (Part-4)
10:00
|
|
|
|
Ways of Creating RDDs (Part-5)
11:00
|
|
|
|
Ways of Creating RDDs (Part-6)
4:00
|
|
|
|
Ways of Creating DataFrames (Part-7)
23:00
|
|
|
|
Spark SQL Operations (Part-8)
11:00
|
|
|
|
Temp Vs Global Views in Spark (Part-9)
6:00
|
|
|
|
Data Analysis Using Spark SQL (Part-10)
9:00
|
|
|
|
Managed Tables Vs External Tables (Part-11)
16:00
|
|
|
|
Lineage Vs DAG in Spark (Part-12)
10:00
|
|
|
|
04 WEEK Bonus
1:00
|
|
|
|
Apache Spark Week-4 PPT pdf
|
|
|
|
Databricks Notebooks
|
|
|
[05 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
3:00
|
|
|
|
Agenda
3:00
|
|
|
|
Delta Tables (Part-1)
11:00
|
|
|
|
Data Warehouse (Part-2)
10:00
|
|
|
|
Data Lake (Part-3)
7:00
|
|
|
|
Data Lakehouse (Part-4)
9:00
|
|
|
|
Data Warehouse Vs Data Lake Vs Data Lakehouse (Part-5)
4:00
|
|
|
|
Different Ways of Selecting DataFrame Columns (Part-6)
20:00
|
|
|
|
DataFrame Nested Columns Selection (Part-7)
13:00
|
|
|
|
DataFrame Basic Operations (Part-8)
31:00
|
|
|
|
Regular Expressions and String Functions (Part-9)
20:00
|
|
|
|
Date and Time Functions (Part-10)
6:00
|
|
|
|
Pivot and UnPivot (Part-11)
10:00
|
|
|
|
Bad Records Handling (Part-12)
8:00
|
|
|
|
Repartition Vs Coalesce (Part-13)
15:00
|
|
|
|
Apache Spark Week-5 PPT pdf
|
|
|
|
Databricks Notebooks
|
|
|
[06 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
3:00
|
|
|
|
Agenda
2:00
|
|
|
|
Joins in PySpark Theory (Part-1)
14:00
|
|
|
|
Joins in PySpark Practicals (Part-2)
|
|
|
|
xxhash64 Function Theory and Practicals (Part-3)
16:00
|
|
|
|
Union and Union All (Part-4)
5:00
|
|
|
|
RANK-DENSE_RANK-ROW_NUMBER Theory (Part-5)
5:00
|
|
|
|
RANK-DENSE_RANK-ROW_NUMBER Practicals (Part-6)
6:00
|
|
|
|
Lead and Lag Functions - Theory (Part-7)
6:00
|
|
|
|
Lead and Lag Functions - Practicals (Part-8)
6:00
|
|
|
|
Aggregate Functions (Part-9)
3:00
|
|
|
|
Data Masking Scenario-1 (Part-10)
6:00
|
|
|
|
UDF and Data Masking Scenario-2 (Part-11)
5:00
|
|
|
|
UDF and Data Masking Scenario-3 (Part-12)
10:00
|
|
|
|
Apache Spark Week-6 PPT pdf
|
|
|
|
06 WEEK Databricks Notebooks
|
|
|
[07 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
2:00
|
|
|
|
07 WEEK - Agenda
1:00
|
|
|
|
Cache and Persist in Spark - Theory (Part-1)
7:00
|
|
|
|
Cache in Spark - Practicals (Part-2)
28:00
|
|
|
|
Persist in Spark - Practicals (Part-3)
8:00
|
|
|
|
Partitioning - Practicals (Part-4)
10:00
|
|
|
|
Broadcast Hash Join - Theory and Practicals (Part-5)
33:00
|
|
|
|
Sort-Merge Join - Theory and Practicals (Part-6)
10:00
|
|
|
|
Full Load Implementation - Practicals (Part-7)
5:00
|
|
|
|
Incremental Load SCD Type-1 - Practicals (Part-8)
7:00
|
|
|
|
Incremental Load SCD Type-2 Implementation Steps (Part-9)
28:00
|
|
|
|
Incremental Load SCD Type-2 Implementation - Practicals (Part-10)
9:00
|
|
|
|
Bonus
1:00
|
|
|
|
PYSPARK_07WEEK_RESOURCES
|
|
|
[08 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
2:00
|
|
|
|
PySpark Originals - Agenda
3:00
|
|
|
|
Predicate Pushdown in Spark (Part-1)
12:00
|
|
|
|
Projection Pushdown in Spark (Part-2)
3:00
|
|
|
|
Dynamic Partition Pruning (DPP) in Spark (Part-3)
16:00
|
|
|
|
Static Partition Vs DPP in Spark (Part-4)
8:00
|
|
|
|
Catalyst Optimizer in Spark (Part-5)
17:00
|
|
|
|
Salting Technique in Spark (Part-6)
23:00
|
|
|
|
Adaptive Query Execution (AQE) in Spark - Theory (Part-7)
14:00
|
|
|
|
Adaptive Query Execution (AQE) in Spark - Practical (Part-8)
5:00
|
|
|
|
Accumulators in Spark - Theory and Practical (Part-9)
16:00
|
|
|
|
Broadcast Variables in Spark - Theory and Practical (Part-10)
5:00
|
|
|
|
Bonus Q&A
6:00
|
|
|
|
PYSPARK_08WEEK_RESOURCES
|
|
|
[09 WEEK] PySpark Session
|
|
|
|
Previous Session Recap
1:00
|
|
|
|
Agenda
3:00
|
|
|
|
Spark Structured Streaming With Manual Schema Evaluation - Theory
8:00
|
|
|
|
Spark Structured Streaming Manual Schema Evaluation - Practical
15:00
|
|
|
|
Spark Structured Streaming Automatic Schema Evaluation - Practical
7:00
|
|
|
|
[Mini Project] Introduction (Part-1)
15:00
|
|
|
|
[Mini Project] Implementation Steps (Part-2)
5:00
|
|
|
|
[Mini Project] Infra Setup-Creation of Resource Group and SQL Database (Part-3)
9:00
|
|
|
|
[Mini Project] Infra Setup-Key Vault-Secret Scope-Secrets Creation (Part-4A)
9:00
|
|
|
|
[Mini Project]-Creating Azure Databricks Workspace (Part-4B)
3:00
|
|
|
|
[Mini Project] Setting Up Source Tables in SQL Database (Part-5)
6:00
|
|
|
|
[Mini Project]-Metadata Table Setup in Databricks (Part-6)
13:00
|
|
|
|
[Mini Project]-Establish JDBC Connection from Databricks to SQL Server (Part-7)
17:00
|
|
|
|
[Mini Project]-Data Full Load Implementation (Part-8)
10:00
|
|
|
|
[Mini Project]-Delta Load SCD Type-1 Implementation (Part-9)
13:00
|
|
|
|
[Mini Project]-Delta Load SCD Type-2 Implementation (Part-10)
7:00
|
|
|
|
[Mini Project]-View Creation in Gold Schema (Part-11)
2:00
|
|
|
|
[Mini Project]-Bonus
1:00
|
|
|
|
PYSPARK_09WEEK_RESOURCES
|
|
|
Clever Gift
|
|
|
|
Goodwill Gesture
|
|