
How to use Get Metadata Activity in ADF?


The Get Metadata activity in Azure Data Factory (ADF) retrieves metadata about data stored in a supported data store. Before running data integration tasks, you can use it to find out, for example, whether a file exists, how large it is, or which files a folder contains. This makes it useful for conditional operations, such as checking that a file exists before attempting to copy it, and for iterating over the files in a folder.

ADF – What is the Get Metadata activity?

Definition:

The Get Metadata activity is an Azure Data Factory activity that retrieves the metadata of data in any supported data store.

  • You can use the output of the Get Metadata activity in subsequent activities or conditional expressions.
  • To use the Get Metadata activity on a folder, you need LIST/EXECUTE permission on that folder.
  • Wildcard filters are not supported for files or folders in the Get Metadata activity.
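To make the definition more concrete, here is a minimal Python sketch that builds the JSON ADF stores for a Get Metadata activity. It assumes the usual pipeline JSON shape (activity type GetMetadata, the dataset passed as a DatasetReference, and the requested fields in fieldList); the dataset name SourceFolderDataset and the helper names are hypothetical. The output_ref helper shows the expression a subsequent activity uses to read one field of the output.

```python
import json

def get_metadata_activity(name, dataset_name, fields):
    """Build a minimal Get Metadata activity definition as a dict.

    Assumed shape: type "GetMetadata", the dataset passed as a
    DatasetReference, and the requested fields listed in "fieldList".
    """
    return {
        "name": name,
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
            },
            "fieldList": list(fields),
        },
    }

def output_ref(activity_name, field):
    """Return the expression a later activity uses to read one output field."""
    return f"@activity('{activity_name}').output.{field}"

# "SourceFolderDataset" is a hypothetical dataset name.
activity = get_metadata_activity("Get Metadata1", "SourceFolderDataset", ["exists", "childItems"])
print(json.dumps(activity, indent=2))
print(output_ref("Get Metadata1", "exists"))  # @activity('Get Metadata1').output.exists
```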

ADF – Get Metadata activity implementation

Requirement:

- Create a Get Metadata activity in ADF

Implementation:

  • In Azure Data Factory, open a pipeline and add a new Get Metadata activity from the Activities pane.
  • Configure the following settings:
    • General: Assign a name and description to the activity.
    • Dataset: Choose or create a dataset that points to the data store you wish to get metadata from.
    • Field List: Specify the metadata fields you want to retrieve. Options include:
      • Child Items: Retrieves the list of subfolders and files in the specified folder (applies only to folders).
      • Exists: Checks whether the file, folder, or table exists. To validate that an object exists, specify "exists" in the field list, then check the true/false value of exists in the activity output. If exists is not in the field list, the Get Metadata activity fails when the object is not found.
      • Size: Obtains the size of the file, in bytes.
      • Last modified: Gets the last modified timestamp of the file/folder.
      • ContentMD5: Provides the MD5 hash of the file content for Azure Blob Storage.
      • Structure: Returns the column structure for tabular datasets.
    • Filter by last modified: Only files whose last modified time falls in the range [Start time, End time) are returned for further processing; the start time is inclusive and the end time is exclusive. Both properties can be left empty, in which case no last-modified filter is applied.
      • Note: When this filter is set, only the child items of the selected path are returned; metadata for items inside subfolders is not included.
  • Use the output in subsequent activities: Reference the output of the Get Metadata activity in later activities to make the pipeline dynamic, using an expression such as @activity('Get Metadata1').output.exists. For example, an If Condition activity can run different actions depending on whether a file exists.
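The JSON side of the Field List and Filter by last modified settings can be sketched in Python as follows. The values in FIELD_NAMES are the names ADF uses in fieldList and in the activity output. The storeSettings keys modifiedDatetimeStart and modifiedDatetimeEnd carry the last-modified range; the read-settings type AzureBlobStorageReadSettings is an assumption that fits an Azure Blob Storage dataset and differs for other stores. The function in_last_modified_range is not part of ADF; it is a local, hypothetical illustration of the half-open [start, end) rule.

```python
from datetime import datetime, timezone

# Field List options as labelled in the UI, mapped to the names used in
# the activity's JSON "fieldList" and as keys in its output.
FIELD_NAMES = {
    "Child Items": "childItems",
    "Exists": "exists",
    "Size": "size",
    "Last modified": "lastModified",
    "ContentMD5": "contentMD5",
    "Structure": "structure",
}

def to_utc_string(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def last_modified_store_settings(start=None, end=None,
                                 settings_type="AzureBlobStorageReadSettings"):
    """Build storeSettings for 'Filter by last modified'.

    Assumes an Azure Blob Storage dataset (settings_type). A bound left as
    None is omitted, so that side of the range is not filtered.
    """
    settings = {"type": settings_type}
    if start is not None:
        settings["modifiedDatetimeStart"] = to_utc_string(start)
    if end is not None:
        settings["modifiedDatetimeEnd"] = to_utc_string(end)
    return settings

def in_last_modified_range(last_modified, start=None, end=None):
    """Local illustration of the [start, end) rule: keep a file when
    start <= last_modified < end; a missing bound is not checked."""
    if start is not None and last_modified < start:
        return False
    if end is not None and last_modified >= end:
        return False
    return True

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, tzinfo=timezone.utc)
print([FIELD_NAMES[option] for option in ("Exists", "Size")])  # ['exists', 'size']
print(last_modified_store_settings(start, end))
print(in_last_modified_range(start, start, end))  # True: start is inclusive
print(in_last_modified_range(end, start, end))    # False: end is exclusive
```

Keeping the bounds optional mirrors the behavior described above: leaving both empty means no last-modified filter is applied.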
ADF – Get Metadata activity use cases

  • Example 1: Check if a file exists before copying. Use the Get Metadata activity to check whether a file exists in a source location. If it does (exists = true), proceed with a Copy activity that copies the file to a destination.
  • Example 2: Get a list of files for processing. To process multiple files in a folder, first use the Get Metadata activity to get the list of files (Child Items). Then use a ForEach activity to iterate over them and process each file individually.
  • Example 3: Conditional execution based on file size. Retrieve the size of a file with the Get Metadata activity, then use an If Condition activity to run different branches of the pipeline based on that size (for example, process only files larger than a certain threshold).
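The three use cases can be sketched as the pipeline JSON of the activities that consume the Get Metadata output, built here as Python dicts. All activity names, including the upstream "Get Metadata1", are hypothetical, and the branch and loop bodies are left empty in this sketch. Each example assumes the Get Metadata activity requests the field it reads (exists, childItems, or size).

```python
import json

# Hypothetical name of the Get Metadata activity the examples depend on.
GET_MD = "Get Metadata1"

def expr(value):
    """Wrap an ADF expression string the way pipeline JSON stores it."""
    return {"value": value, "type": "Expression"}

def after_get_metadata():
    """Run only after the Get Metadata activity has succeeded."""
    return [{"activity": GET_MD, "dependencyConditions": ["Succeeded"]}]

# Example 1: requires "exists" in the field list. The Copy activity
# would go in ifTrueActivities; it is left empty here.
if_file_exists = {
    "name": "If File Exists",
    "type": "IfCondition",
    "dependsOn": after_get_metadata(),
    "typeProperties": {
        "expression": expr(f"@activity('{GET_MD}').output.exists"),
        "ifTrueActivities": [],
        "ifFalseActivities": [],
    },
}

# Example 2: requires "childItems" and a dataset that points at a folder.
# Inner activities would read the current entry with @item().name.
for_each_file = {
    "name": "For Each File",
    "type": "ForEach",
    "dependsOn": after_get_metadata(),
    "typeProperties": {
        "items": expr(f"@activity('{GET_MD}').output.childItems"),
        "activities": [],
    },
}

# Example 3: requires "size" (in bytes). The 1 MiB threshold is arbitrary.
THRESHOLD_BYTES = 1024 * 1024
if_file_is_large = {
    "name": "If File Is Large",
    "type": "IfCondition",
    "dependsOn": after_get_metadata(),
    "typeProperties": {
        "expression": expr(
            f"@greater(activity('{GET_MD}').output.size, {THRESHOLD_BYTES})"
        ),
        "ifTrueActivities": [],
        "ifFalseActivities": [],
    },
}

for act in (if_file_exists, for_each_file, if_file_is_large):
    print(json.dumps(act, indent=2))
```

Because Child Items also returns subfolders (each entry has a name and a type of File or Folder), a Filter activity with the condition @equals(item().type, 'File') is a common step before the ForEach when only files should be processed.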

© 2024 Code SharePoint