Skip to content

Microsoft Azure Data Engineering Associate (DP-203) Study Guide

Menu
  • Contact Us
Menu

Azure Databricks – Data Sources and Ingestion

Posted on 2022-05-052024-08-05 by Benjamin Goodwin

Azure Databricks generates a metastore by default with the provisioning of a cluster. After creating a cluster, you can query the metastore. Doing so produces a visualization of any table. Figure 3.22 illustrates the retrieval from the metastore using the show tables in <databaseName> command. Executing the command lists all the tables in the targeted database.

FIGUER 3.22 A Hive metadata metastore database

To show the details of a table, use the command describe formatted <databaseName>.<tableName>, as shown in Figure 3.23. Notice that the table column names and data types are rendered as well as other metadata, such as the location, owner, and provider.

The metastore is file‐based by default and accessible during and from your Azure Databricks session. Other individuals who use the same workspace will not have access by default. The default Hive metastore is not optimal for team or enterprise Big Data analytics. Instead, you should create an external metastore that runs in a database such as Azure SQL or an Azure Database for MySQL. This way, other team members can discover these tables and use them in their analytics efforts. Creating an external Apache Hive metastore makes its contents more discoverable and scalable for large data analytics workloads.

FIGUER 3.23 A Hive metadata metastore table

Azure Synapse Analytics

Azure Synapse Analytics includes support for accessing an Apache Hive metastore. You can access a Hive metastore by using a Spark pool.

Spark Pool

After learning in Exercise 3.4 how to configure a linked service, you will be able to execute the query shown in Figure 3.24.

The spark.sql queries are similar to those shown in the previous figures concerning metastores. The output of the queries renders the list of databases found in the namespace and the tables within the targeted database.

FIGUER 3.24 A Hive metadata metastore spark pool

 The metastore in your chosen external database must exist before you can perform queries on it. Create the external metastore first, and then you can access it from the Spark pool.

The metadata is retrieved from the same data source when running on an Azure Synapse Analytics Spark pool as when running on an Azure Databricks cluster. This is true when both the Azure Databricks cluster and the Spark pool have been configured to do so. This will become clearer after you complete Exercise 3.4 and Exercise 3.14.

SQL Pool

You also can query metadata in a SQL database, as shown in Figure 3.25.

You can discover which tables are available on a targeted database by selecting the list of databases and tables from the sys schema. It is possible to query files on a SQL pool; however, this most likely will not be on the exam, because that kind of analytics is mostly approached using Spark pools.

FIGUER 3.25 Querying metadata in a SQL database

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • July 2022
  • May 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • May 2021

Categories

  • ARM TEMPLATE
  • Create an Azure Data Factory
  • DATA EXPLORER POOLS
  • Design Analytical Stores
  • MANAGED PRIVATE ENDPOINTS
  • Microsoft DP-203
© 2025 Microsoft Azure Data Engineering Associate (DP-203) Study Guide All Rights Reserved