OLAP Cube and ETL framework with AWS Managed Services
In this post, we'll go through how to create a cloud-based OLAP cube and ETL architecture that produces faster results at cheaper prices without losing performance by:
- Using the cloud to connect your on-premises database for relevant to the development, transformation and discovery.
- OLAP workloads may be run without the requirement for expensive third-party software licencing, specific infrastructure, or data migration.
- Machine learning is used to categorise and display data using AWS Glue Data Catalog, Amazon QuickSight, Amazon Athena, and Amazon SageMaker (ML).
AWS Managed Services data analytics pipeline
AWS Managed Services are used in the suggested architecture
in Figure 1. AWS Glue DataBrew is a no-code data transformation solution that
lets you create transformation tasks easily. AWS Glue crawlers extract metadata
from converted data and categorise it for Athena and QuickSight analytics and
visualisation. SageMaker will create, train, and deploy machine learning models.
This strategy will assist you in getting answers from your
collection to your consumers as quickly as possible without requiring you to
convert your data to AWS. Because no code is required, you can easily use data
transformation, cataloguing, analytics, and machine learning.
AWS Managed Services Benefits for Data Analytics
On-demand access to on-premises database
The online transaction processing (OLTP) database in your
company data centre is the starting point for the sample architecture in Figure
1. Figure 2 illustrates how to connect an OLTP database to DataBrew on AWS to
perform OLAP workloads using a Java database connectivity (JDBC) connection.
For typical data stores including MySQL, Microsoft SQL
Server, Oracle, and DataBrew supports JDBC data sources, PostgreSQL.
Data discovery is done automatically.
DataBrew summarises your data for discovery in Figures 3
through 6. You may profile your data to see if there are any trends or
abnormalities. You may also use over 250 built-in transforms to perform transformations
called "jobs" in DataBrew without writing any code.
Transformation of data and categorising without the use of codes
You may build jobs based on the transformation stages
illustrated in Figure 6 to conduct OLAP-type transactions. DataBrew recipe refer
to all of these procedures taken together. These recipe results can be saved to
an Amazon Simple Storage Services (Amazon S3) bucket by running them as a
job.
Scheduled DataBrew tasks work in a similar way as OLAP's
scheduled ETL pipelines. DataBrew may conduct a task on a recurrent basis based
on data refresh and business requirements (for example, every 11 hours). This
can be scheduled at a certain time of day or according to a valid
CRONs phrase. This makes it easier to automate your transformation
processes.
The OLAP catalogue is a collection of data that stands
between the OLAP data and the programs that use it. You may use AWS Glue
crawlers to automatically categorise your data and establish its format,
schema, and related characteristics to generate a Data Catalog. Figure 7
depicts the results of a crawler's findings published to Data Catalog as
metadata to assist data consumers in finding the information they require.
Data management without the use of software from third party
By using Athena to refer to the metadata definitions in the
Data Catalog as references to the real data in Amazon S3, you may conduct
analytics on your data. Without having to transfer data around, Athena is
ideally suited for executing one-time queries using conventional SQL to query
the converted data immediately in Amazon S3. Because Athena is serverless, you
don't have to worry about maintaining infrastructure, and you just pay for the
queries you execute.
Separate visualisation and management intelligence (MI)
technologies are frequently used to enhance OLAP workloads. These programmes
frequently have their own licencing, server administration, and security
requirements.
A scalable, QuickSight, virtualization,
embeddable, ML-powered of ML BI solution, may be used to view curated
data. As demonstrated in Figure 8, QuickSight makes it simple to design and
publish immersive BI dashboards that feature ML-powered insights. These
dashboards may be shared with several other users and integrated into your own
software.
Conclusion
In this article, we'll teach you how to use a JDBC
connection to connect your on-premises database to DataBrew for data profiling,
discovery, and transformation. We looked at how DataBrew recipes and processes
may be used to execute OLAP workloads without the need for expensive
third-party software licencing, specialised infrastructure, or data migration.
Without needing to maintain any servers, we looked at AWS capabilities in data
cataloguing, visualisation, and Data Catalog through machine
learning, Athena, QuickSight, and SageMaker.
Author: BIKKI SHAW
0 Comments
If you have any doubt, Please let me know.