Advanced Big Data Testing using Hive and HQL
Course Summary
This one-day course of lectures and hands-on training is designed to provide students with the advanced techniques necessary for testing big data environments. The course covers advanced HQL transformations and the challenges they pose when testing big data scenarios.
Intended Audience
- Data Quality Teams
- Data Warehouse Analysts
- Automation Engineers
- Quality Assurance Analysts
- Project Managers
- Anyone responsible for software quality on big data projects
Course Objectives
At the end of the course, you will be able to:
- understand big data structures and architectures
- implement a successful process for big data testing
- create and execute more sophisticated transformation tests
- utilize regular expressions for data comparisons
- create and utilize subqueries
- work with derived tables and inline views
- apply advanced techniques for big data testing
- create tests for unstructured or semi-structured data
Course Outline
Big Data Overview
- Understanding Big Data Architecture
- Understanding the Challenges of Big Data Testing
- Understanding ETL Mapping Documents
- Overview of Transformation Types
- Big Data Comparison Methods
Calculated Fields Transformation Test
- Aggregate Functions with the GROUP BY Clause
- Compare grouped, calculated source fields to the target field
Derived Fields Transformation Test
- Discuss the differences between calculated and derived fields
- Implement variations of subqueries (nested, scalar, correlated, non-correlated, inline)
- Compare a target field to a field derived from the source data
Field Length Limits Transformation Test
- Get table information using the DESCRIBE command
- Calculate the maximum size of merged fields
- Calculate the maximum size of split fields
- Validate the maximum size of source data that is split into separate fields in the target database
Field Padding Transformation Test
- String Padding Functions
- SQL Regular Expression Functions
- Verify that erroneous source data has been padded correctly in the target table
XML Transformation Test
- Usage of the Extract function
- Discuss the relevance of XPath
- Database-specific casting functions
- Utilizing XML functions to form a result set from XML content
- Compare source tables to XML content in a target table
Transpose Transformation Test
- Utilization of Self Joins
- Compare transposed source data to a target table
Match and Merge Transformation Test
- Utilization of Unions
- Compare multiple source records that need to be matched and then merged into a target table
Prerequisites
- Understanding of basic ETL testing processes
- Basic HQL knowledge, or completion of Introduction to Big Data Testing using HQL