Thursday, October 26, 2023

Convert your PySpark code to Snowpark code using SnowConvert!

Are you a data practitioner working with Spark?

  • How fun is it to calculate the executor memory, driver memory, number of executors, and degree of parallelism for every Spark job you run?
  • Do you enjoy running into Out of Memory errors?
  • How thrilling is it to debug your failed Spark job?
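To make the first pain point concrete, here is a sketch of the sizing arithmetic Spark users end up doing by hand for every cluster. The rule-of-thumb numbers (about 5 cores per executor, 1 core reserved per node, roughly 7% memory set aside for overhead) are common community guidance, not official Spark defaults; your cluster may need different values.

```python
# Back-of-the-envelope executor sizing for a Spark cluster.
# Rule-of-thumb assumptions: ~5 cores per executor, 1 core reserved
# per node for the OS, 1 executor reserved for the driver/AM, and
# ~7% of executor memory left for off-heap overhead.

def size_executors(nodes, cores_per_node, mem_per_node_gb):
    cores_available = cores_per_node - 1           # reserve 1 core per node
    executors_per_node = cores_available // 5      # ~5 cores per executor
    num_executors = nodes * executors_per_node - 1 # reserve 1 for driver/AM
    mem_per_executor_gb = mem_per_node_gb // executors_per_node
    heap_gb = int(mem_per_executor_gb * 0.93)      # leave ~7% for overhead
    return {
        "spark.executor.instances": num_executors,
        "spark.executor.cores": 5,
        "spark.executor.memory": f"{heap_gb}g",
        "spark.default.parallelism": num_executors * 5 * 2,
    }

# Example: a 10-node cluster with 16 cores and 64 GB per node.
print(size_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64))
```

And this is before accounting for skew, shuffle partitions, or broadcast thresholds, which is exactly the burden the rest of this post is about.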

Challenges working with Spark

Debugging a multi-page stack trace and troubleshooting why a job failed are hard. Especially because Spark jobs are memory-resident, a job failure makes the evidence disappear.

As we discussed above, the number of configurations we need to understand and set for a job to run optimally is a never-ending challenge. Overall, capacity management and resource sizing are difficult.

Managing the infrastructure, keeping up with Spark version upgrades, and handling dependency management are hard. This pulls data engineers' focus away from business problems and toward underlying infrastructure challenges.

While some of us enjoy tinkering with and engineering a Spark job, it gives most of us nightmares.

Enter Snowpark!

Snowpark is the savior we didn’t know we needed. It is a set of libraries and runtimes that allows us to run Python, Java, and Scala code within Snowflake.

With Snowpark, we don’t have to deal with hundreds of configs or infrastructure setup. No capacity planning or resource sizing is needed.

And the best part? We write Python code and work with DataFrames. Simplicity and ease of use at its best.
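To show what that looks like, here is a minimal Snowpark sketch. The table and column names are hypothetical, and it assumes the snowflake-snowpark-python package and a Snowflake account; treat it as an illustration of the DataFrame style, not a runnable pipeline for your data.

```python
# A minimal Snowpark DataFrame pipeline (hypothetical ORDERS table).
# Transformations are lazy: Snowpark builds a query plan and pushes it
# down to Snowflake as SQL when an action (e.g. .show()) runs.

def top_customers(session):
    """Customers whose total spend exceeds 1000, highest first."""
    from snowflake.snowpark.functions import col, sum as sum_
    return (
        session.table("ORDERS")
        .group_by("CUSTOMER_ID")
        .agg(sum_(col("AMOUNT")).alias("TOTAL_SPEND"))
        .filter(col("TOTAL_SPEND") > 1000)
        .sort(col("TOTAL_SPEND").desc())
    )

# To run against a real account:
#   from snowflake.snowpark import Session
#   session = Session.builder.configs(connection_parameters).create()
#   top_customers(session).show()
```

Notice there is not a single executor, memory, or parallelism setting in sight; the warehouse handles the compute.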

If you are a Spark user, check out how to get started with Snowpark.

What is SnowConvert?

If you are further along the journey and are looking to migrate to Snowpark, you should check out SnowConvert.

SnowConvert is a tool that understands your source code (Python) by parsing it and building a semantic model of your code’s behavior.

SnowConvert is not a find-and-replace or regex matching tool.

For Spark, SnowConvert identifies the usages of the Spark API, inventories them, and finally converts them to their functional equivalent in Snowpark.

IQVIA Spark to Snowpark Migration Case Study

Check out this detailed video to learn how IQVIA migrated from Spark to Snowpark, how to use SnowConvert, and more.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices, and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.

