How a custom-developed Python utility simplified and automated the ingestion of data from replicated SAP tables into Snowflake.
A significant scenario emerged during one of our data integration projects. An existing Enterprise Data Warehouse, built on Snowflake with transformations handled using dbt, required the integration of a new data source—SAP. This inclusion was essential to enhance our analytics capabilities and to ensure the development of a comprehensive data landscape.
The initial phase involved replicating SAP table objects in Snowflake and performing basic transformations, which required creating over 75 tables in Snowflake (DDL), along with their corresponding dbt models, to move raw data into a more operationally refined layer. Doing this manually would have been slow and labour-intensive, so automation became an immediate necessity.
To tackle this challenge efficiently, we developed a Python-based utility capable of creating all of these objects in one go. The utility also established a process to fast-track future requests, such as the inclusion of new tables or even an entirely new data source.
Solution and Use
The primary input to the utility is the table metadata*, containing information such as table names, associated column names, essential table keys, and the data types that define each table's structure. The first and most important step, therefore, is to obtain this metadata. In the case of SAP, this information is frequently retrievable from SAP's own metadata tables. Below is an example for one of the SAP tables – MARA.
*An indigoChart-developed capability for generating SAP-specific metadata, which will be discussed in a future blog.
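To make the input concrete, here is an illustrative sketch of what the retrieved metadata might look like for a few MARA fields, loaded into a Pandas DataFrame. The column names and field selection here are our own illustration, not a fixed standard from the utility:

```python
import pandas as pd

# Illustrative metadata rows for three MARA fields, shaped like the
# information one would pull from SAP's data dictionary: the owning
# table, the field name, whether it is a key, and the SAP data type.
metadata = pd.DataFrame({
    "table_name":  ["MARA", "MARA", "MARA"],
    "column_name": ["MATNR", "MTART", "ERSDA"],
    "is_key":      [True, False, False],
    "sap_type":    ["CHAR", "CHAR", "DATS"],
    "length":      [18, 4, 8],
})

print(metadata.to_string(index=False))
```

In practice this DataFrame would be populated from the replicated SAP metadata tables rather than hand-written, but the shape is the same: one row per column, one group of rows per table.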
The retrieved metadata is stored in a Pandas DataFrame. Once stored, a few operations and data manipulations are performed on the DataFrame to prepare the metadata for further processing. The utility notebook contains dedicated cells for each requirement, and each cell consumes the final metadata DataFrame to generate the required objects in bulk. For each table in the metadata DataFrame, the following objects are generated:
DDLs: CREATE TABLE SQL files,
dbt: transformation SQL models,
a schema.yml: a dbt config block for a group of models.
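The bulk-generation step boils down to grouping the metadata by table and rendering a file per group. A minimal sketch of the DDL case, assuming the metadata already carries a mapped Snowflake type per column (the function and column names are illustrative, not the utility's exact code):

```python
import pandas as pd

def generate_ddl(table: str, cols: pd.DataFrame) -> str:
    """Render a CREATE TABLE statement from one table's metadata rows."""
    col_lines = ",\n    ".join(
        f"{r.column_name} {r.snowflake_type}" for r in cols.itertuples()
    )
    return f"CREATE TABLE IF NOT EXISTS {table} (\n    {col_lines}\n);"

# A tiny illustrative metadata DataFrame with types already mapped.
metadata = pd.DataFrame({
    "table_name":     ["MARA", "MARA"],
    "column_name":    ["MATNR", "ERSDA"],
    "snowflake_type": ["VARCHAR(18)", "DATE"],
})

# One pass over the metadata produces every DDL file in bulk.
ddls = {
    table: generate_ddl(table, cols)
    for table, cols in metadata.groupby("table_name")
}
print(ddls["MARA"])
```

The dbt model and schema.yml cells follow the same pattern: iterate over table groups, render text from the rows, and write one file per table.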
How indigoChart's custom Python Utility can help:
Ease of use:
The Python utility that we created is intended to be user-friendly. We've encapsulated its functionality in a Python Notebook (.ipynb) format, making it easy to use and adaptable. For example, if the user only needs to generate dbt models, they can easily navigate to the dedicated cell for dbt model generation and execute it. This modular approach streamlines the process, saving time and effort.
Tailoring to your needs:
The heart of this utility lies in its flexibility. We've designed a set of database object templates, including dbt models, table DDLs, and schema.yml configurations. The core structures of these objects are pre-defined and stored within functions as string variables. These structures serve as templates that, when invoked, readily accept column values and table names as inputs to generate the required files. As a result, the utility adapts to specific project needs with ease.
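As a sketch of the templating idea, here is how a dbt model template might be held as a string variable and filled in per table. The template text and function name are illustrative; note the doubled braces needed so Python's `str.format` leaves dbt's Jinja `{{ source(...) }}` call intact:

```python
# Core structure of a dbt model, stored as a string template.
# {{{{ ... }}}} survives .format() as the literal Jinja {{ ... }}.
DBT_MODEL_TEMPLATE = """select
    {columns}
from {{{{ source('{source}', '{table}') }}}}
"""

def render_dbt_model(source: str, table: str, columns: list[str]) -> str:
    """Fill the template with a table name and its column list."""
    return DBT_MODEL_TEMPLATE.format(
        source=source,
        table=table,
        columns=",\n    ".join(c.lower() for c in columns),
    )

model = render_dbt_model("sap", "MARA", ["MATNR", "MTART"])
print(model)
```

Because the structure lives in one place, changing a convention (say, adding a standard audit column to every model) is a one-line template edit rather than 75 file edits.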
Data Type Harmony:
SAP, like many other systems, comes with its own set of data types, some of which may not align with the data types used in Snowflake. To address this, we've incorporated a Snowflake SAP data type mapping configuration. As the utility generates objects, it smartly replaces SAP data types with their corresponding Snowflake equivalents. This automated mapping ensures seamless integration and accurate data representation.
Fine-Tuning at the Column Level:
The utility's data transformation capabilities extend to the granular level of individual columns. As we traverse the input metadata, the iterative process runs at both the table and the column level. This empowers us to apply custom operations per column. For instance, consider a common date column shared across multiple tables, requiring a date transformation. This task is handled by incorporating a straightforward "IF" code block within the column iteration. The result is fine-grained control and adaptability to address unique data requirements.
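A minimal sketch of such a column-level rule, assuming SAP date (DATS) fields arrive as 'YYYYMMDD' strings and using Snowflake's `try_to_date` for the cast (the function name and rule are illustrative):

```python
def render_column(column_name: str, sap_type: str) -> str:
    """Render one select-list expression, with per-column overrides."""
    expr = column_name.lower()
    # Column-level rule: SAP DATS fields are 'YYYYMMDD' strings,
    # so cast them to proper dates while generating the dbt model.
    if sap_type == "DATS":
        expr = f"try_to_date({expr}, 'YYYYMMDD') as {expr}"
    return expr

print(render_column("ERSDA", "DATS"))  # date column gets the cast
print(render_column("MATNR", "CHAR"))  # ordinary column passes through
```

Because the rule keys off the column's metadata rather than a hard-coded table name, the same "IF" block fixes that date column in every table it appears in.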
In summation, the Python utility offers a comprehensive solution that not only streamlines the task of replicating bulk data objects but also empowers users to customize the process to meet their specific project requirements. It isn't limited to SAP data alone; it can integrate data from various source systems, making it a tool for a wide range of data integration and analytics projects. It significantly contributes to enhanced productivity and resource allocation, freeing up human resources for more strategic data tasks.
With our extensive knowledge of SAP and Snowflake, we can assist your business in developing and implementing SAP-specific analytical solutions. We execute on areas of Data and Analytics by embracing and organically expanding your current skills, whether that means developing new solutions or supporting your operations. Get in touch and we'll take it from there.