Continuous Integration and Continuous Delivery, collectively referred to as CI/CD remains one of the most popular and widely used practices for code versioning, management, and deployment to Production instances on a continuous and consistent basis.
To ensure effectiveness of User Acceptance Testing (UAT) under a CI/CD practice, it is preferable that validation of test cases occur in a “Production like” instance. In Data projects, this proves challenging as compared to Application development projects, with database objects being tightly coupled with data itself, requiring data within Development/Test instances to be always in constant sync with Production instances.
Part 1 of this write up describes how Snowflake’s features and functionality can be leveraged to implement an effective strategy for setup of Data warehouse environments for Development (D), Acceptance (A), and Production (P) under two different scenarios – first one under a single Snowflake account with all environments (D, A, and P) as different databases under this account and second in which each environment is a separate Snowflake account, but all accounts under the same organization*.
Both scenarios are addressed by using Snowflake Data Cloning feature across databases within the same Snowflake account and using Snowflake Data Replication functionality across separate Snowflake accounts, respectively. Azure Europe West region is used as an example for creating snowflake accounts.
This part of the write up also compares Data Cloning vs Data Replication approaches.
Scenario 1: Single Account/Snowflake Subscription for Development, Acceptance, and Production Environments
With the Snowflake Data Clone feature, ACCEPTANCE and DEVELOPMENT databases are created from PRODUCTION database in matter of seconds as only metadata is created for new databases. Any change to data in ACCEPTANCE database or DEVELOPMENT database does not have any effect on PRODUCTION data. The changes are recorded separately and available only to those Clone databases.
Database cloning can be done only within same account. PRODUCTION, ACCEPTANCE and DEVELOPMENT databases will reside in same account and are not physically separated. ACCEPTANCE and DEVELOPMENT database must be updated with new RBAC (Role Based Access Control) and Virtual Warehouses so as not to interfere with PRODUCTION database.
Scenario 2: Different Accounts for Development, Acceptance, and Production Environments
In this scenario, data warehouse environments will be setup for efficient Business Continuity and IT Operations when enterprises strictly require Physical and Logical separation of data from Production database to Acceptance and or Development database.
PRODUCTION snowflake data warehouse will be setup in Azure West Europe region environment will have separate Snowflake subscription/account.
ACCEPTANCE environment will have separate Snowflake subscription/account.
DEVELOPMENT/TEST environments will be hosted on same snowflake.
Note that all three environments are separated logically and physically with separate Snowflake subscriptions. The Production environment is not accessible to Acceptance and Development and same with Acceptance and Development environments.
Data copy from PRODUCTION to ACCEPTANCE and DEVELOPMENT environment is managed via Snowflake Replication features. To understand replication of PRODUCTION data to other environments, Data Replication approach will be used to setup ACCEPTANCE and DEVELOPMENT environments.
Acceptance/Development Database Data ingestion approach
Production databases can be replicated using one of the following approaches:
Top-Down Approach: In this approach, ACCEPTANCE and DEVELOPMENT databases are created from PRODUCTION database with data in that point of time. Once the data is copied, the sharing link is stopped. This approach is advisable when entire PRODUCTION data is required in Acceptance or Development database.
Bottom-Up Approach: In this approach, the sources feeding PRODUCTION data warehouse should also feed data into ACCEPTANCE or DEVELOPMENT environment. ACCEPTANCE/DEVELOPMENT data warehouse will not have all data available from PRODUCTION in this approach. This approach is advisable for faster testing and for smaller data warehouses.
Compare Data Clone vs Data Replication
Part 2 of this article details a Step-by-Step process for Data Replication approach, Replication Checklist, and deciphering charges for Replication.
We can assist with your Snowflake implementation using a Single vs. Distributed Account approach and assist in setting up an ideal DevOps strategy. Visit us at www.indigoChart.com or drop us a line at email@example.com
* Snowflake Documentation on Organization and Accounts https://docs.snowflake.com/en/user-guide-organizations.html
Snowflake Zero Copy Cloning https://docs.snowflake.com/en/sql-reference/sql/create-clone.html