Apache Spark Course Curriculum Walkthrough
PySpark Course Pre-Intro ppt
Apache Spark Course Syllabus
PySpark Pre Intro Session (Part-1)
PySpark Pre Intro Live Q&A Session (Part-2)
Join the WA Community
[01 WEEK] PySpark Session
Agenda
History of Bigdata (Part-1)
Monolithic Vs Distributed Systems (Part-2)
What is Hadoop and Its History (Part-3)
Hadoop Advantages and Disadvantages (Part-4)
What is Apache Spark and its features (Part-5)
Why Apache Spark is faster than MapReduce (Part-6)
Is Apache Spark a Replacement for Hadoop (Part-7)
Apache Spark Ecosystem (Part-8)
Spark Context Vs Spark Session (Part-9)
Spark Worker Node-Container-Executor (Part-10)
Spark Worker Node-Container-Executor (Part-11)
What is On Heap Memory (Part-12)
What is Off Heap Memory (Part-13)
What is Garbage Collector (Part-14)
Bonus Q&A
Apache Spark Week-1 Notes
[02 WEEK] PySpark Session
Previous Session Recap
Agenda
Types of Spark Execution Modes (Part-1)
Different Types of Cluster Managers (Part-2)
Spark Execution-Standalone Mode (Part-3)
Spark Cluster-Key Components (Part-4)
Spark Runtime Architecture-Cluster Mode (Part-5)
Spark Runtime Architecture-Client Mode (Part-6)
Spark Runtime Architecture-Cluster Mode Vs Client Mode (Part-7)
Spark Driver and Executor Overhead Memory Calculations (Part-8)
Spark Executor Memory Deep Dive (Part-9)
Spark Submit - Different Scenarios (Part-10)
Spark Executor Cores and Memory Calculations (Part-11)
Bonus
Apache Spark Week-2 notes pdf
[03 WEEK] PySpark Session
Previous Week Recap
Agenda
Fat Executors in Spark (Part-1)
Thin Executors in Spark (Part-2)
Optimally Sized Executors in Spark (Part-3)
Spark Transformations and Actions (Part-4)
Spark Narrow and Wide Transformations (Part-5)
Spark Application-Jobs-Stages-Tasks (Part-6)
Spark Application-Jobs-Stages-Tasks -Practical Example (Part-7)
History of Apache Spark APIs (Part-8)
Differences Between Spark RDD Vs DataFrame Vs DataSet (Part-9)
Databricks Introduction and its Key features (Part-10)
Bonus Q&A
Apache Spark Week-3 Notes pdf
[04 WEEK] PySpark Session
Previous Session Recap
Agenda
Databricks Workspace Walkthrough
Databricks Notebook Magic Commands Overview (Part-1)
DBUtils File System Commands (Part-2)
DBUtils Widget Commands (Part-3)
DBUtils Notebook Commands (Part-4)
Ways of Creating RDDs (Part-5)
Ways of Creating RDDs (Part-6)
Ways of Creating DataFrames (Part-7)
Spark SQL Operations (Part-8)
Temp Vs Global Views in Spark (Part-9)
Data Analysis Using Spark SQL (Part-10)
Managed Tables Vs External Tables (Part-11)
Lineage Vs DAG in Spark (Part-12)
04 WEEK Bonus
Apache Spark Week-4 PPT pdf
Databricks Notebooks
[05 WEEK] PySpark Session
Previous Session Recap
Agenda
Delta Tables (Part-1)
Data Warehouse (Part-2)
Data Lake (Part-3)
Data Lakehouse (Part-4)
Data Warehouse VS Data Lake Vs Data Lakehouse (Part-5)
Different Ways of Selecting DataFrame Columns (Part-6)
DataFrame Nested Columns Selection (Part-7)
DataFrame Basic Operations (Part-8)
Regular Expressions and String Functions (Part-9)
Date and Time Functions (Part-10)
Pivot and UnPivot (Part-11)
Bad Records Handling (Part-12)
Repartition Vs Coalesce (Part-13)
Apache Spark Week-5 PPT pdf
Databricks Notebooks
[06 WEEK] PySpark Session
Previous Session Recap
Agenda
Joins in PySpark Theory (Part-1)
Joins in PySpark Practicals (Part-2)
xxhash64 function theory and practicals (Part-3)
union and union all (Part-4)
RANK-DENSE_RANK-ROW_NUMBER Theory (Part-5)
RANK-DENSE_RANK-ROW_NUMBER Practicals (Part-6)
Lead and Lag Functions - Theory (Part-7)
Lead and Lag Functions - Practicals (Part-8)
Aggregate Functions (Part-9)
Data Masking Scenario-1 (Part-10)
UDF and Data Masking Scenario-2 (Part-11)
UDF and Data Masking Scenario-3 (Part-12)
Apache Spark Week-6 PPT pdf
06WEEK databricks notebooks
[07 WEEK] PySpark Session
Previous Session Recap
07 WEEK - Agenda
Cache and Persist in Spark - Theory (Part-1)
Cache in Spark - Practicals (Part-2)
Persist in Spark - Practicals (Part-3)
Partitioning - Practicals (Part-4)
Broadcast Hash Join - Theory and Practicals (Part-5)
Sort-Merge Join - Theory and Practicals (Part-6)
Full Load Implementation - Practicals (Part-7)
Incremental Load SCD Type-1 - Practicals (Part-8)
Incremental Load SCD Type-2 Implementation Steps (Part-9)
Incremental Load SCD Type-2 Implementation - Practicals (Part-10)
Bonus
PYSPARK_07WEEK_RESOURCES
[08 WEEK] PySpark Session
Previous Session Recap
PySpark Originals - Agenda
Predicate Pushdown in Spark (Part-1)
Projection Pushdown in Spark (Part-2)
Dynamic Partition Pruning (DPP) in Spark (Part-3)
Static Partition Vs DPP in Spark (Part-4)
Catalyst Optimizer in Spark (Part-5)
Salting Technique in Spark (Part-6)
Adaptive Query Execution (AQE) in Spark - Theory (Part-7)
Adaptive Query Execution (AQE) in Spark - Practical (Part-8)
Accumulators in Spark - Theory and Practical (Part-9)
Broadcast Variables in Spark - Theory and Practical (Part-10)
Bonus Q&A
PYSPARK_08WEEK_RESOURCES
[09 WEEK] PySpark Session
Previous Session Recap
Agenda
Spark Structured Streaming With Manual Schema Evaluation - Theory
Spark Structured Streaming Manual Schema Evaluation - Practical
Spark Structured Streaming Automatic Schema Evaluation - Practical
[Mini Project] Introduction (Part-1)
[Mini Project] Implementation Steps (Part-2)
[Mini Project] Infra Setup-Creation of Resource Group and SQL database (Part-3)
[Mini Project] Infra Setup-Key Vault-Secret Scope-Secrets Creation (Part-4A)
[Mini Project]-Creating Azure Databricks Workspace (Part-4B)
[Mini Project] Setting up source tables in SQL database (Part-5)
[Mini Project]-Metadata Table Setup in Databricks (Part-6)
[Mini Project]-Establish JDBC Connection from Databricks to SQL Server (Part-7)
[Mini Project]-Data Full Load Implementation (Part-8)
[Mini Project]-Delta Load SCD Type-1 Implementation (Part-9)
[Mini Project]-Delta Load SCD Type-2 Implementation (Part-10)
[Mini Project]-View Creation in Gold Schema (Part-11)
[Mini Project]-Bonus
PYSPARK_09WEEK_RESOURCES
Clever Gift
Goodwill Gesture
Preview - Apache Spark - Tutorial