Part 1 of this write up described how Snowflake’s features and functionality can be leveraged to implement an effective strategy for setup of Data warehouse environments for Development (D), Acceptance (A), and Production (P) under two different scenarios. Let's look at a Step-by-Step process for Data Replication approach across Development, Acceptance, and Production instances, Replication Checklist, and deciphering charges for Replication.
STEP 1: CREATE PRODUCTION REPLICATION ON ACCEPTANCE ACCOUNT
Assumption: To ensure data concurrency, it is recommended that replication be done at time when there no data load activities occur in Production database. During the replication time window, all data loading batch jobs must be stopped and all Snowpipe auto ingestions must be paused.
PRODUCTION database current state replication can be conducted as per Option 1 below.
It is also possible to create replication from CLONE, per Option 2 at “Point In Time” (up to 90 Days) of PRODUCTION database as per specific requirements and then use that clone to Replicate into ACCEPTANCE database.
Option 1: Current PRODUCTION state replication. Direct replication to Acceptance
Option 2: Point in Time PRODUCTION replication. Cloned replication to Acceptance
Create CLONE database from PRODUCTION with following options for Point in time copy
a. TIMESTAMP AT/BEFORE – mention timestamp at which data state is required
create table restored_table clone my_table
at(timestamp => 'Mon, 09 May 2015 01:01:00 +0300'::timestamp);
b. OFFSET AT/BEFORE – mention offset in negative (-60*5 is 5 min before, -60*60 is 1 hour before)
create schema restored_schema clone my_schema at(offset => -3600);
c. STATEMENT At/BEFORE – mentioned statement/sql id at or before which data state is required
create database restored_db clone my_db
before(statement => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');
Replicate Production to Acceptance
Data replication from Production to Acceptance is done via snowflake linked accounts for organization. Snowflake accounts are not linked by default and must be linked by creating service request with Snowflake support team.
Once the accounts are linked, the data replication link can be setup. Replication can be enabled for any existing permanent or transient database. Enabling replication designates the database as a primary database (Production). Any number of databases in an account can be designated a primary database. Likewise, a primary database can be replicated to any number of accounts in your organization. This involves creating a secondary database (Acceptance) as a replica of a specified primary database in each of the target accounts.
These accounts are typically located in other regions, on the same or a different cloud platform (AWS or Microsoft Azure). All DML/DDL operations are executed on the primary database. Each read-only, secondary database can be refreshed periodically with a snapshot of the primary database, replicating all data as well as DDL operations on database objects (i.e. schemas, tables, views, etc.).
STEP 2: CREATE CLONE OF REPLICATED PRODUCTION DATABASE
Replicated databases cannot be used as they are always in READ mode and thus it is required to create CLONE of REPLICATED PRODUCTION database in Acceptance account.
RBAC for Cloned ACCEPTANCE database to be updated per requirements.
Option 1: Direct Replication of Production database
Option 2: Cloned replication of Production
STEP 3: CREATE ACCEPTANCE REPLICATION ON DEVELOPMENT ACCOUNT
Replicate Acceptance to Development
After completion of Data replication from Production to Acceptance, the Acceptance environment can be used for creating Pre-Development Database in same account after masking sensitive data as per defined standards. The masked data warehouse then can be used for Development DWH for replication.
STEP 4: CREATE CLONE OF REPLICATED ACCEPTANCE DATABASE
Replicated databases cannot be used as they are always in READ mode and thus hence it is required to create CLONE of REPLICATED ACCEPTANCE database in Development account. RBAC for Cloned DEVELOPMENT database to be updated per requirements.
Post cloning, data masking rules per enterprise security and privacy standards on selected tables, fields, or rows will be applied on Development database.
Post masking updates, the database will be available for Developers, and Dev business users to connect and use.
Use the checklist below to ensure which objects get replicated vs. not replicated.
What is REPLICATED ?
What is NOT REPLICATED
Other types of objects in an account cannot be replicated. This list includes:
4. Resource monitors
Charges based on database replication are divided into two categories: data transfer and compute resources. Both categories are billed on the target account (i.e., the account that stores the secondary database that is refreshed).
Data transfer: The initial database replication and subsequent synchronization operations transfer data between regions. Cloud providers charge for data transferred from one region to another within their own network. The data transfer rate is determined by the location of the source account (i.e., the account that stores the primary database).
Compute resources: Replication operations use Snowflake-provided compute resources to copy data between accounts across regions.
Acceptance usage can be found in Acceptance Account – Billing and Usage with REPLICATION virtual warehouse as below.
Similarly, when replicating from Acceptance to Development, the usage of compute is charged to Development account. The credits used for replication can be found in Account – Billing and Usage with Warehouse named as REPLICATION.
We can assist with your Snowflake implementation using a Single vs. Distributed Account approach and assist in setting up an ideal DevOps strategy. Visit us at www.indigoChart.com or drop us a line at email@example.com
* Snowflake Documentation on Organization and Accounts https://docs.snowflake.com/en/user-guide-organizations.html
* Snowflake DB Replication https://docs.snowflake.com/en/user-guide/database-replication-failover.html