Important Resources and Concepts to Prepare
Apache Spark is the most widely used parallel processing engine in data engineering, and Databricks is arguably the best platform for it, providing an optimized version of Apache Spark with extra features. So this certification will help you a lot as a data engineer, both in cracking interviews and in your career.
In this blog, I have prepared a complete guide covering the topics to prepare for the exam and the resources I followed to crack it!
Important Note:
First of all, I want to mention that it’s not an easy exam to crack: you need to learn many concepts in Databricks and Apache Spark, and I would highly suggest completing the Databricks Certified Data Engineer Associate certification first.
The Associate certification is not a mandatory prerequisite, but you do need all of its skills. Personally, I did not take the Associate exam, but I had learned all of its skills before I started learning the ones required for the Professional exam.
Let’s break down all the important Spark and Databricks concepts you need to learn to crack this certification.
Topics to prepare:
1. Databricks Tooling — 20%:
- Databricks Jobs
- Advanced Job Configurations
- Troubleshooting Job Failures
- REST API (configure and trigger production pipelines)
- Databricks CLI (deploying notebook-based workflows)
- Databricks SQL
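For the REST API topic, it helps to know what triggering a production job actually looks like. Below is a minimal Python sketch that only builds (does not send) a request to the Jobs API 2.1 `run-now` endpoint. The workspace URL, token, and job ID are hypothetical placeholders, and you should check the current Jobs API reference for your workspace before relying on it.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers an existing job."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL, token, and job ID, for illustration only.
req = build_run_now_request("https://example.cloud.databricks.com/", "dapi-EXAMPLE", 123)
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) would return JSON containing a run_id.
```

The job ID identifies the job definition, while the returned run ID identifies this particular run; the run ID is what you would then pass to the `runs/get` endpoint to poll its status.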
2. Data Processing — 30%:
- Change Data Capture (CDC)
- Processing CDC Feed
- Delta Lake Change Data Feed (CDF)
- Stream-Stream Joins
- Stream-Static Joins
- Building batch-processed ETL pipelines
- Building incrementally processed ETL pipelines
- Deduplicating data
- Using Change Data Capture (CDC) to propagate changes
- Optimizing workloads
- Structured Streaming
- Incremental Data Ingestion
- Auto Loader
- Databricks SQL
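To make the CDC and deduplication topics concrete, here is a plain-Python sketch (no Spark) of the logic you would typically express with `MERGE INTO`, or `APPLY CHANGES INTO` in Delta Live Tables, when processing a change feed: keep only the latest change per key, skip `update_preimage` rows, then upsert or delete in the target. The `_change_type` and `_commit_version` names mirror the Delta CDF metadata columns; the `id`/`name` data and the in-memory dict standing in for a table are assumptions made purely for illustration.

```python
from typing import Any

Row = dict[str, Any]


def latest_change_per_key(feed: list[Row], key: str) -> dict[Any, Row]:
    """Deduplicate a CDC feed so only the newest change per key survives."""
    latest: dict[Any, Row] = {}
    for row in feed:
        # Pre-images hold the old values of an update, so they are never applied.
        if row["_change_type"] == "update_preimage":
            continue
        k = row[key]
        if k not in latest or row["_commit_version"] > latest[k]["_commit_version"]:
            latest[k] = row
    return latest


def apply_changes(target: dict[Any, Row], feed: list[Row], key: str) -> dict[Any, Row]:
    """Apply the deduplicated changes to a target table held as {key: row}."""
    result = dict(target)
    for k, change in latest_change_per_key(feed, key).items():
        if change["_change_type"] == "delete":
            result.pop(k, None)
        else:
            # insert and update_postimage are both upserts of the data columns.
            result[k] = {c: v for c, v in change.items() if not c.startswith("_")}
    return result


target = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
feed = [
    {"id": 1, "name": "a", "_change_type": "update_preimage", "_commit_version": 5},
    {"id": 1, "name": "a2", "_change_type": "update_postimage", "_commit_version": 5},
    {"id": 2, "name": "b", "_change_type": "delete", "_commit_version": 6},
    {"id": 3, "name": "c", "_change_type": "insert", "_commit_version": 6},
    {"id": 3, "name": "c2", "_change_type": "update_postimage", "_commit_version": 7},
]
print(apply_changes(target, feed, "id"))
```

Deduplicating before applying matters because a single micro-batch can contain several changes for the same key, and a merge needs exactly one source row per target key.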
3. Data Modeling — 20%:
- Medallion/Multi-hop Architecture
- Bronze, Silver, and Gold Layers of the Medallion Architecture
- Slowly Changing Dimensions (SCD)
- Constraints
- Lookup Tables
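Slowly Changing Dimensions come up a lot, especially Type 2, where history is kept by expiring the old row instead of overwriting it. The plain-Python sketch below shows that rule. The `DimRow` fields (`start_date`, `end_date`, `is_current`) are hypothetical column names, and on Databricks you would typically express the same logic with `MERGE INTO` on a Delta table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int
    city: str
    start_date: date
    end_date: Optional[date] = None  # None means the version is still open
    is_current: bool = True


def scd2_upsert(dim: list[DimRow], customer_id: int, city: str, as_of: date) -> list[DimRow]:
    """Apply one incoming record to a Type 2 dimension and return the new rows."""
    out: list[DimRow] = []
    needs_new_version = True
    for row in dim:
        if row.customer_id == customer_id and row.is_current:
            if row.city == city:
                needs_new_version = False  # attribute unchanged: nothing to do
                out.append(row)
            else:
                # Expire the old version instead of overwriting it.
                out.append(replace(row, end_date=as_of, is_current=False))
        else:
            out.append(row)
    if needs_new_version:
        out.append(DimRow(customer_id, city, start_date=as_of))
    return out


dim = scd2_upsert([], 1, "Paris", date(2024, 1, 1))
dim = scd2_upsert(dim, 1, "Lyon", date(2024, 6, 1))
dim = scd2_upsert(dim, 1, "Lyon", date(2024, 7, 1))  # no change, no new row
for row in dim:
    print(row)
```

A Type 1 dimension would simply overwrite `city` in place; Type 2 trades extra rows for the ability to query the value that was valid at any point in time.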
4. Security and Governance — 10%:
- Dynamic Views
- Propagating Deletes
- Managing cluster and job permissions with ACLs
- Creating row- and column-oriented dynamic views to control user/group access
- Securely deleting data on request, as required by GDPR & CCPA
- Unity Catalog
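Dynamic views are SQL views whose output depends on who is querying, typically using functions such as `current_user()` or `is_member()` inside `CASE` expressions and `WHERE` clauses. The plain-Python function below only mimics that decision logic for one user so the row-level and column-level rules are easy to see. The group names (`admins`, `auditors`), the `region` rule, and the `email` column are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def dynamic_view(rows: list[Row], user_groups: set[str]) -> list[Row]:
    """Return what one user would see through a row- and column-filtering view."""
    visible: list[Row] = []
    for row in rows:
        # Row-level rule: admins see every region, everyone else only EU rows.
        if "admins" not in user_groups and row["region"] != "EU":
            continue
        out = dict(row)
        # Column-level rule: only auditors see the raw email address.
        if "auditors" not in user_groups:
            out["email"] = "REDACTED"
        visible.append(out)
    return visible


rows = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
print(dynamic_view(rows, {"analysts"}))
print(dynamic_view(rows, {"admins", "auditors"}))
```

Note that the two rules are independent: group membership decides which rows appear and, separately, which columns are masked.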
5. Monitoring and Logging — 10%:
- Managing Clusters
- Recording logged metrics
- Debugging errors
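For recording logged metrics, one simple habit is to emit one structured log line per pipeline step so the numbers are easy to find and parse later, for example in the cluster’s driver logs. This is a generic Python `logging` sketch, not a Databricks-specific API; the logger name, step name, and metric fields are assumptions.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline.metrics")  # hypothetical logger name


def log_step_metrics(step: str, rows_in: int, rows_out: int, started: float) -> dict:
    """Log one JSON line of metrics for a pipeline step and return the metrics."""
    metrics = {
        "step": step,
        "rows_in": rows_in,
        "rows_out": rows_out,
        "rows_dropped": rows_in - rows_out,
        "duration_s": round(time.monotonic() - started, 3),
    }
    logger.info(json.dumps(metrics))
    return metrics


start = time.monotonic()
# ... run the hypothetical "silver_clean" step here ...
metrics = log_step_metrics("silver_clean", rows_in=1000, rows_out=987, started=start)
```

Logging JSON rather than free text keeps each metric machine-readable, which makes debugging a failed run (for example, a sudden jump in `rows_dropped`) much quicker.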
6. Testing and Deployment — 10%:
- Data Pipeline Testing
- Relative Imports
- Scheduling Jobs
- Orchestrating Jobs
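For data pipeline testing, the easiest win is to keep transformations as plain functions and unit-test them with small, hand-built inputs. The sketch below uses plain Python and a pytest-style test function; the `to_silver_orders` transformation and its column names are hypothetical, and in a real Spark pipeline the same pattern applies to functions that take and return DataFrames.

```python
from typing import Any

Row = dict[str, Any]


def to_silver_orders(rows: list[Row]) -> list[Row]:
    """Hypothetical silver-layer step: drop rows without a customer and compute totals."""
    cleaned: list[Row] = []
    for row in rows:
        if row.get("customer_id") is None:
            continue  # invalid record; could be routed to a quarantine table instead
        cleaned.append({**row, "total": row["quantity"] * row["unit_price"]})
    return cleaned


def test_to_silver_orders() -> None:
    rows = [
        {"order_id": 1, "customer_id": 10, "quantity": 2, "unit_price": 5},
        {"order_id": 2, "customer_id": None, "quantity": 1, "unit_price": 9},
    ]
    result = to_silver_orders(rows)
    assert [r["order_id"] for r in result] == [1]
    assert result[0]["total"] == 10


test_to_silver_orders()  # pytest would discover and run this automatically
```

Keeping the transformation in its own module is also what makes relative imports useful: the notebook or job only orchestrates, while the logic stays importable and testable.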
7. Performance Tuning:
- Partitioning Delta Lake Tables
- Delta Lake Transaction Log
- Auto Optimize Feature
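Partitioning and the transaction log are easier to reason about together: each commit in `_delta_log` is a JSON file whose lines are actions such as `add` and `remove`, and only the files whose `partitionValues` match a partition filter need to be read by a query. The plain-Python sketch below replays a few hypothetical commit lines and prunes by partition. It deliberately ignores checkpoints, file statistics, and the other action types the real Delta protocol uses.

```python
import json

# Hypothetical log lines shaped like Delta "add"/"remove" actions.
LOG_LINES = [
    '{"add": {"path": "event_date=2024-01-01/part-0000.parquet", "partitionValues": {"event_date": "2024-01-01"}}}',
    '{"add": {"path": "event_date=2024-01-02/part-0001.parquet", "partitionValues": {"event_date": "2024-01-02"}}}',
    '{"add": {"path": "event_date=2024-01-01/part-0002.parquet", "partitionValues": {"event_date": "2024-01-01"}}}',
    # A later compaction (e.g. OPTIMIZE) rewrites the two small files into one.
    '{"remove": {"path": "event_date=2024-01-01/part-0000.parquet"}}',
    '{"remove": {"path": "event_date=2024-01-01/part-0002.parquet"}}',
    '{"add": {"path": "event_date=2024-01-01/part-0003.parquet", "partitionValues": {"event_date": "2024-01-01"}}}',
]


def active_files(log_lines: list[str]) -> dict[str, dict]:
    """Replay add/remove actions in order to get the table's current files."""
    files: dict[str, dict] = {}
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files[action["add"]["path"]] = action["add"]
        elif "remove" in action:
            files.pop(action["remove"]["path"], None)
    return files


def files_for_partition(log_lines: list[str], column: str, value: str) -> list[str]:
    """Partition pruning: keep only current files whose partition value matches."""
    return sorted(
        path
        for path, add in active_files(log_lines).items()
        if add["partitionValues"].get(column) == value
    )


print(files_for_partition(LOG_LINES, "event_date", "2024-01-01"))
```

This also shows why compaction helps: after the rewrite, a query on one date reads a single larger file instead of several small ones, while the removed files remain available for time travel until they are vacuumed.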
Resources I followed
Very Important Note: Before providing you with resources, let me mention that they assume you already have a good understanding of all the Databricks concepts required to crack the Associate certification.
Paid Resources (Personally Recommended):
- To prepare for this certification, I followed the Udemy course by Derar Alhussein: Course Link
- I practiced with these exam sets before attempting the certification: Course Link
Free Resources:
You can refer to Databricks’ public documentation to learn all of the concepts below:
- Spark Performance Tuning
- Delta Lake Best Practices
- Spark Streaming Joins
- Spark Streaming Windowing
- Delta Lake CDC
- Delta Live Tables CDC
- Orchestration of Jobs/Workflows, Job ID, Run ID
- Access Control for Jobs
- Secret Access Control
- Data Object Privileges
- Databricks SQL: Dashboard Alerts
- Git Integration with Repos: Link 1, Link 2
- Unity Catalog
Exam Details:
- Total number of questions: 60
- Time limit: 120 minutes
- Registration fee: $200 USD
- Question types: Multiple choice
- Test aids: None allowed
- Languages: English
- Delivery method: Online proctored
- Prerequisites: None, but related training highly recommended
- Recommended experience: 1+ years of hands-on experience performing the data engineering tasks outlined in the exam guide
- Validity period: 2 years
- Recertification: Required to maintain your certification status, since Databricks certifications are valid for two years from the issue date.
I hope this blog helps!
Thanks for reading!!!
Best of luck with your journey!!!
Follow for more content like this on data analytics, data engineering, and data science.
Resources used to write this blog:
- YouTube channels
- Udemy
- Databricks
- I used Google to research and resolve my doubts
- My own experience
- I used Grammarly to check my grammar and use the right words