
Microsoft Azure Data Engineering Associate (DP-203) Study Guide


Author: Benjamin Goodwin

The Ingestion of Data into a Pipeline – Data Sources and Ingestion

Posted on 2022-10-29 (updated 2024-08-05) by Benjamin Goodwin

The Big Data pipeline has been touched on numerous times; if you need a refresher, refer to Figure 2.30. Data ingestion is the first phase of that pipeline (refer to Figure 3.1). The pipeline itself is a series of purposefully sequenced activities. The activities can and often do span all the Big Data stages, from…


Design Analytical Stores – Data Sources and Ingestion

Posted on 2022-07-08 (updated 2024-08-05) by Benjamin Goodwin

Before reading any further, ask yourself what an analytical store is. If you struggle to answer, refer to Table 3.2, which lists the analytical datastores available in Azure, along with the data model that works best with each product. Table 3.1 provides a list of different ingestion types mapped to the most suitable…


Azure Databricks – Data Sources and Ingestion

Posted on 2022-05-05 (updated 2024-08-05) by Benjamin Goodwin

Azure Databricks generates a metastore by default when a cluster is provisioned. After creating a cluster, you can query the metastore to view any table it contains. Figure 3.22 illustrates retrieving table information from the metastore using the show tables in <databaseName> command. Executing the command lists all the tables in the targeted…


Manage – Data Sources and Ingestion

Posted on 2022-03-23 (updated 2024-08-05) by Benjamin Goodwin

The Manage hub provides the means to provision and configure many Azure Synapse Analytics features and to connect them to other products. From creating SQL pools to configuring GitHub, read on to learn about each option.
Analytics Pools
When you run a SQL script in Azure Synapse Analytics Studio, some form of compute machine…


DATA EXPLORER POOLS – Data Sources and Ingestion

Posted on 2022-02-13 (updated 2024-08-05) by Benjamin Goodwin

This feature is in preview at the time of writing, but it is worth mentioning for future reference. A Data Explorer pool is a data analytics service for analyzing real‐time, large‐scale data. When you select the Data Explorer Pools link, the page shown in Figure 3.31 appears.
FIGURE 3.31 Azure Synapse Analytics Data Explorer pool…


AZURE PURVIEW – Data Sources and Ingestion

Posted on 2022-01-25 (updated 2024-08-05) by Benjamin Goodwin

Azure Purview was introduced in Chapter 1. Azure Purview is very useful for governance, data discovery, and exploration. Keeping tabs on the data sources you have and what they contain is key to being able to securely manage them. There will be more on this in Chapter 8, which discusses data security and governance. When…


Design for Incremental Loading – Data Sources and Ingestion

Posted on 2021-12-13 (updated 2024-08-05) by Benjamin Goodwin

Data is ingested in many forms and from a variety of different sources. From a streaming perspective, the data is being generated, sent to a subscriber, and ingested in real time. There is no incremental loading context in the streaming scenario when the data is coming directly from the data producer. Data that is captured…


Design a Dimensional Hierarchy – Data Sources and Ingestion

Posted on 2021-11-14 (updated 2024-08-05) by Benjamin Goodwin

A dimensional hierarchy is a set of related data tables that align to different levels (see Figure 3.19). The tables commonly have one‐to‐many or many‐to‐one relationships with each other.
FIGURE 3.19 A dimensional hierarchy
The hierarchy shown in Figure 3.19 could be named the “brainjammer dimension.” The frequencies roll up to electrodes, and the electrodes roll…
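Rolling one level of such a hierarchy up to the next can be sketched in Python as a many‐to‐one aggregation from child rows to their parent. This is a minimal illustration under stated assumptions: the frequency labels F1 and F2, the electrode labels E1 and E2, the value measure, and the roll_up helper are all hypothetical and are not the actual brainjammer data.

```python
from collections import defaultdict

# Hypothetical child-level rows: each frequency reading belongs to exactly
# one electrode, so frequency -> electrode is a many-to-one relationship.
frequency_rows = [
    {"frequency": "F1", "electrode": "E1", "value": 2.0},
    {"frequency": "F2", "electrode": "E1", "value": 1.5},
    {"frequency": "F1", "electrode": "E2", "value": 4.0},
    {"frequency": "F2", "electrode": "E2", "value": 0.5},
]


def roll_up(rows, parent_key, measure):
    """Sum a measure from child rows up to their parent level (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[parent_key]] += row[measure]
    return dict(totals)


print(roll_up(frequency_rows, "electrode", "value"))  # {'E1': 3.5, 'E2': 4.5}
```

Summing is only one choice of aggregate; an average or a count would follow the same child‐to‐parent pattern.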


Type 2 SCD – Data Sources and Ingestion

Posted on 2021-10-06 (updated 2024-08-05) by Benjamin Goodwin

This SCD type provides a means for viewing historical records of the value prior to an update. This is commonly called versioning. Notice the columns in the top dimension table of Figure 3.16. The SK_ID column is a surrogate key, which is an internal key that identifies the version history of a unique dimensional member….
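The versioning idea can be sketched in Python. Only the SK_ID surrogate key comes from the text; the member_id, value, start_date, end_date, and is_current columns and the apply_type2_update helper are hypothetical, because the full column list of Figure 3.16 is not shown here. Instead of overwriting a changed value, the sketch closes out the current row and appends a new row carrying the next surrogate key, so earlier versions remain queryable.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    SK_ID: int                # surrogate key: one per version of a member
    member_id: str            # hypothetical business (natural) key
    value: str                # hypothetical tracked attribute
    start_date: date          # hypothetical: when this version became valid
    end_date: Optional[date]  # hypothetical: None marks the open version
    is_current: bool          # hypothetical current-version flag


def apply_type2_update(rows: List[DimRow], member_id: str,
                       new_value: str, as_of: date) -> List[DimRow]:
    """Return a new row list with a Type 2 change applied (hypothetical helper)."""
    current = next(
        (r for r in rows if r.member_id == member_id and r.is_current), None
    )
    if current is not None and current.value == new_value:
        return list(rows)  # value unchanged: no new version is needed
    next_sk = max((r.SK_ID for r in rows), default=0) + 1
    updated = []
    for r in rows:
        if r is current:
            # Close out the old version instead of overwriting it.
            updated.append(replace(r, end_date=as_of, is_current=False))
        else:
            updated.append(r)
    # Append the new version with the next surrogate key.
    updated.append(DimRow(next_sk, member_id, new_value, as_of, None, True))
    return updated


dim = [DimRow(1, "M1", "Blue", date(2021, 1, 1), None, True)]
dim = apply_type2_update(dim, "M1", "Green", date(2021, 10, 6))
for row in dim:
    print(row.SK_ID, row.value, row.is_current)
```

Running it prints `1 Blue False` followed by `2 Green True`: both versions of member M1 survive, and the surrogate key distinguishes them.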


Design Slowly Changing Dimensions – Data Sources and Ingestion

Posted on 2021-09-22 (updated 2024-08-05) by Benjamin Goodwin

Managing changes over time to the data stored in dimension tables is an important consideration, and it is handled by designing slowly changing dimensions (SCDs). Recognize that the fact table will contain a large amount of data. Making an update to data on that scale is not a prudent approach to…


© 2025 Microsoft Azure Data Engineering Associate (DP-203) Study Guide All Rights Reserved