
Microsoft Azure Data Engineering Associate (DP-203) Study Guide


Design Analytical Stores – Data Sources and Ingestion

Posted on 2022-07-08 (updated 2024-08-05) by Benjamin Goodwin

Before reading any further, ask yourself what an analytical store is. If you struggle for the answer, refer to Table 3.2, which lists the analytical datastores available in Azure along with the data model that works best with each product. Table 3.1 maps different ingestion types to the most suitable Azure data analytics product. Table 3.1 is worth a second look because many of the analytical datastores in Table 3.2 are also the best fit for certain types of ingestion. Combine those two tables with Table 3.3, and you can narrow down which Azure data analytics product to use in your scenario.

Consider a few additional scenarios using the information you have learned in this chapter. For example, does your data analytics solution need to supply data to a hot path serving layer? Or does your solution require massively parallel processing (MPP)? Table 3.6 shows which products support each option. Used together with the other tables, it should help you pick the analytical store your solution needs.

TABLE 3.6 Hot path serving layer and MPP products

Product                     Hot path serving    Massively parallel processing (MPP)
Azure Synapse SQL pool      Yes                 Yes
Azure Synapse Spark pool    Yes                 Yes
Azure Data Explorer         Yes                 Yes
Azure Cosmos DB             Yes                 No
Azure Analysis Services     No                  Yes
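
As a rough illustration of how Table 3.6 can be used to narrow down candidates, the sketch below encodes the table as a Python dictionary and filters it by requirement. The name candidate_stores and the dictionary layout are hypothetical and exist only for this example. The Yes/No values are copied from the table.

```python
# Table 3.6 encoded as a lookup: product -> (hot path serving, MPP).
TABLE_3_6 = {
    "Azure Synapse SQL pool": (True, True),
    "Azure Synapse Spark pool": (True, True),
    "Azure Data Explorer": (True, True),
    "Azure Cosmos DB": (True, False),
    "Azure Analysis Services": (False, True),
}


def candidate_stores(need_hot_path, need_mpp):
    """Return the products that satisfy every requirement that is set to True."""
    return [
        product
        for product, (hot_path, mpp) in TABLE_3_6.items()
        if (hot_path or not need_hot_path) and (mpp or not need_mpp)
    ]


# Products that can serve a hot path AND offer MPP.
print(candidate_stores(need_hot_path=True, need_mpp=True))
```

A requirement passed as False is treated as "do not care", so candidate_stores(False, False) returns every product in the table.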

An analytical store is a place where you store data used by your data analytics solutions, from end to end. An analytical store houses data regardless of the Big Data stage (as illustrated by Figure 2.30) and regardless of the data landing zone (as identified in Table 3.4).

Design Metastores in Azure Synapse Analytics and Azure Databricks

A metastore is a place to store metadata. Earlier in this chapter you learned about the metadata associated with files, such as creation date, size, and update date. Metadata is also available for the objects stored within a database, for example, views, relationships, schemas, and tables. You can access the metadata for your database objects in the metastore. Numerous tools will graphically illustrate the objects in a database, including Azure Data Studio and Microsoft SQL Server Management Studio. However, not all DBMSs or analytical datastores have such features. In those cases, querying the metastore is the only means of discovering the existence and structure of tables. Metastores also help you visualize what the database contains and how the data can be used. Remember that data is simply organized characters in a flat file, managed by a DBMS. Without any means of analyzing the data and its structure, it would be hard to get value from it. Queries such as select * from sys.tables (a catalog view in SQL Server and Azure Synapse SQL pools) or describe formatted <tableName> (Hive and Spark SQL) return table and schema information.
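
To make the idea of querying metadata concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a DBMS. This is an analogy, not the Azure syntax: SQLite exposes its catalog through sqlite_master and pragma_table_info, which play roughly the role that sys.tables and describe formatted play in the products discussed above. The table names devices and readings are hypothetical.

```python
import sqlite3

# An in-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE readings (id INTEGER, device_id INTEGER, value REAL)"
)


def list_tables(conn):
    """Discover which tables exist by querying the catalog, not the data."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Return (column name, declared type) pairs for one table."""
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table,)
    ).fetchall()
    return [(row[0], row[1]) for row in rows]


print(list_tables(conn))
print(describe_table(conn, "readings"))
```

Neither function reads a single row of user data. Everything comes from the catalog, which is the point of a metastore.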

Since Apache Spark workloads are most often concentrated on data files, there needs to be some kind of mechanism for creating virtual databases and tables on top of your file data. The product that stores metadata and creates this virtualization is called Apache Hive. The database and table listings shown in Figure 3.21 are made possible by the metadata stored in Hive. Azure Databricks, for example, uses Hive to build the content for the Data menu item, as shown in Figure 3.21.

FIGURE 3.21 A Hive metadata metastore

The list of databases and tables is retrieved from the default file‐based Hive metastore loaded on a data share. See the next section for more about this Hive metastore using Azure Databricks.
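
The following toy sketch illustrates the general idea of a metastore that layers virtual databases and tables on top of plain files. It is not how Apache Hive is implemented, and none of the names in it (metastore, register_table, show_tables, read_table) come from Hive or Azure Databricks. They are hypothetical helpers written for this example, which assumes a headerless CSV file as the underlying data.

```python
import csv
import os
import tempfile

# Toy metastore: database -> table -> metadata about the backing file.
metastore = {}


def register_table(database, table, path, columns):
    """Record where a table's file lives and what its columns are called."""
    metastore.setdefault(database, {})[table] = {
        "path": path,
        "columns": columns,
    }


def show_tables(database):
    """List table names from metadata alone, without opening any data file."""
    return sorted(metastore.get(database, {}))


def read_table(database, table):
    """Use the stored metadata to interpret the raw file as named columns."""
    meta = metastore[database][table]
    with open(meta["path"], newline="") as f:
        return [dict(zip(meta["columns"], row)) for row in csv.reader(f)]


# Create a small headerless CSV file to act as the underlying data file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "readings.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["1", "0.42"], ["2", "0.57"]])

register_table("default", "readings", path, ["id", "value"])
print(show_tables("default"))
print(read_table("default", "readings"))
```

The file itself carries no column names or table name. Those exist only in the metadata, which is why a listing like the one in Figure 3.21 depends on the metastore rather than on the files.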
