Teradata Migration into Amazon Redshift

Abstract

In recent months, the advantages of moving big data and databases to the cloud have become obvious. Demand for cloud services is strong because many specialists working remotely need to access data from anywhere, and a cloud warehouse can make accurate data and the required workloads available in real time.

Cloud analytics has become an essential service model for analyzing big data and delivering the insights organizations need to adapt to changing business requirements. Organizations can run big data analysis in the cloud without significant up-front expenditure and gain the agility to respond quickly. It's no surprise that the cloud analytics market is projected to grow at a 24.3% CAGR through 2026.

You have chosen one of the most talked-about cloud systems – Amazon Redshift. Redshift offers a set of advantages, e.g. easy data management, horizontal scalability, and up to 10x better performance than other data warehouses.

Let's take a quick look at our approach to migration from Teradata to Redshift. The source Teradata logic is accurately migrated to the Redshift cloud data warehouse, making the blend of features it offers available for your business intelligence needs.

Strategy

Once the business decision to move business-critical data to the cloud has been taken, it's time to determine the best migration solution. Without a well-arranged migration strategy, moving indispensable enterprise data can be risky. The many benefits of cloud technology create demand for migration projects to Amazon Web Services, Microsoft Azure, and other cloud environments, and we are eager to share our experience. One of the most frequently requested migration directions is the transfer of a Teradata database to Amazon Redshift. Beginning with an insightful analysis of the source database features, we map out a course of action.

Data and data types

Many Redshift SQL elements have different performance characteristics and use syntax and semantics that differ from their Teradata equivalents. The BYTE, INTERVAL, and PERIOD data types are not supported in Amazon Redshift; by default, all of them are converted to VARCHAR.

TIME can be replaced by TIMESTAMP.

TO_TIMESTAMP converts a TIMESTAMP string to TIMESTAMPTZ.
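
As a sketch of these type mappings, a Teradata table definition might be converted as shown below. The table name, column names, and target VARCHAR lengths are hypothetical, chosen only to illustrate the direction of the conversion:

```sql
-- Teradata (hypothetical source table):
-- CREATE TABLE events (
--     id        INTEGER,
--     raw_bytes BYTE(8),       -- byte data type
--     validity  PERIOD(DATE),  -- period data type
--     start_tm  TIME(0)
-- );

-- A possible Amazon Redshift equivalent after conversion:
CREATE TABLE events (
    id        INTEGER,
    raw_bytes VARCHAR(16),   -- BYTE(8) converted to VARCHAR by default
    validity  VARCHAR(58),   -- PERIOD(DATE) stored as text by default
    start_tm  TIMESTAMP      -- TIME replaced by TIMESTAMP
);
```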

Source queries can be case-sensitive and quite sophisticated; they deserve careful attention, especially inside stored procedures.

Migration of the GRAPHIC character set should be performed manually because of its specifics.

As for data migration, Teradata supports various character sets, for example LATIN, UNICODE, KANJISJIS, and KANJI1.

In Redshift SQL, the VARCHAR data type supports UTF-8 multibyte characters up to a maximum of four bytes, which is sufficient to hold the source character sets listed above: once the data is loaded as UTF-8, all characters are preserved. Note, however, that VARCHAR does not accept invalid UTF-8 code points. Amazon Redshift provides several ways of loading data from various sources.

Note that data cannot be loaded directly from a local machine.

  1. Redshift bulk loading using the COPY command
  2. Redshift data loading using INSERT INTO command
  3. Redshift data loading using AWS Services
  4. Using fully-managed data integration platforms

The COPY command is the most efficient way to load a table. You can also add data to your tables using INSERT commands, though this is much less efficient than COPY. The COPY command can read from multiple data files or multiple data streams simultaneously: Amazon Redshift distributes the workload across the cluster nodes and performs the load operations in parallel, including sorting the rows and spreading data across node slices.

The files with the conversion results can be uploaded manually into an Amazon S3 bucket and then imported.
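
For instance, once the exported files are in S3, the bulk load boils down to a single COPY statement. This is a minimal sketch; the table name, bucket path, and IAM role ARN are placeholders you would replace with your own:

```sql
-- Bulk-load CSV files from S3 into a Redshift table in parallel.
COPY tb_name
FROM 's3://my-migration-bucket/teradata-export/tb_name/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;
```

COPY reads every file under the given prefix, so splitting the export into multiple files lets the cluster load them in parallel across node slices.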

Database objects and business logic

Tables, views, stored procedures, functions, and macros can be converted with the help of a migration tool.

Conversion of database objects depends on the data and data types involved. Ispirer Toolkit automatically checks what data is being processed and selects the appropriate built-in functions or, if necessary, adds type casts.

Each Teradata database object is unique and specific, and many source objects lack equivalents in Redshift SQL. For example, Redshift supports neither the MERGE statement nor macros.

Migration tools may be capable of analyzing the structure of source statements and separating them into several equivalent commands in Redshift.

Below you can see how we update rows using the condition WHERE a1 = a2 and insert the rows whose values are not matched, detecting them with a LEFT OUTER JOIN.

Teradata:
MERGE INTO merged_table USING source_table ON a1 = a2
    WHEN MATCHED THEN UPDATE SET b1 = b2
    WHEN NOT MATCHED THEN INSERT (a2, b2, c2);
Amazon Redshift:
UPDATE merged_table SET b1 = b2
    FROM source_table WHERE a1 = a2;
INSERT INTO merged_table
    SELECT a2, b2, c2 FROM source_table
    LEFT OUTER JOIN merged_table ON a1 = a2
    WHERE a1 IS NULL;

Macros are replaced with stored procedures in the following manner:

Teradata:
CREATE MACRO macro_example (ins_val VARCHAR(30)) AS (
    INSERT INTO table_name VALUES (1, :ins_val);
);
Amazon Redshift:
CREATE OR REPLACE PROCEDURE macro_example (ins_val VARCHAR(30)) AS $$
BEGIN
    INSERT INTO table_name VALUES (1, ins_val);
END;
$$ LANGUAGE plpgsql;

The Teradata system contains many built-in functions, and it can be hard to find a match in the target Redshift SQL. For example, the equivalent of a case-insensitive REGEXP_SIMILAR call is a CASE expression combined with the LOWER function.

Teradata:
SELECT REGEXP_SIMILAR('MIke Bird',
    '(Mike B(i|y)rd)| (Michael B(i|y)rd)', 'i');
Amazon Redshift:
SELECT CASE
    WHEN LOWER('MIke Bird') ~ LOWER('(Mike B(i|y)rd)| (Michael B(i|y)rd)')
    THEN 1 ELSE 0
END AS REGEXP_SIMILAR;

A considerable number of Teradata database objects have no match in the target SQL. Triggers, indexes, check constraints, and some Teradata-specific statements are not supported in Amazon Redshift. The Ispirer migration tool can automatically detect such challenging cases and comment them out. See below:

Teradata:
INSERT INTO tb_name (c1) VALUES (12);
COLLECT STATISTICS tb_name COLUMN (c1);
Amazon Redshift:
INSERT INTO tb_name (c1) VALUES (12);
/* Ispirer Comment - Collect Statistics not supported: */
----COLLECT STATISTICS tb_name COLUMN (c1);

We successfully transfer dynamic SQL, data type conversion functions, datetime and interval functions, conditional statements, and other objects.
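
As an illustration of dynamic SQL on the target side, Redshift stored procedures support EXECUTE over a statement string built at run time. This is a minimal sketch, not a converted customer object; the procedure name, parameter, and table name are hypothetical:

```sql
-- Build and run a statement dynamically inside a Redshift stored procedure.
CREATE OR REPLACE PROCEDURE truncate_stage_table(tb_name VARCHAR(128)) AS $$
BEGIN
    -- The table name is supplied by the caller and concatenated into the SQL text.
    EXECUTE 'TRUNCATE TABLE ' || tb_name;
END;
$$ LANGUAGE plpgsql;

CALL truncate_stage_table('staging_orders');
```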

Analyze. Automate. Take advantage of cloud technology.

Tech teams should analyze their workloads and find a balance between the tasks that can be automated and those that need human intervention. Seamlessly transfer data and database objects from your existing data sources into Amazon Redshift using tools designed by Ispirer migration professionals.

Thanks to customization in line with your project needs, the tool can reach a very high automation rate. Use intelligent software and get all the advantages of cloud technology.

Have a project you are interested in discussing with us? Drop us a line!

Get in touch