How to use Delete Activity in ADF?


In Azure Data Factory (ADF), a Delete Activity is used to remove files or folders from a file-based store (like Azure Blob Storage, Azure Data Lake Storage, Amazon S3, SFTP, etc.) based on specified conditions. This activity is useful for cleaning up data that is no longer needed as part of a data processing workflow.

ADF – Important Considerations for Delete Activity

Definition:

The Delete activity in Azure Data Factory deletes files and folders from on-premises or cloud data stores. In short, it is the activity to use for cleanup.

  • Deleted files cannot be restored unless soft delete is enabled on the storage account.
  • Back up your files before running this activity. Before applying it in production, test the Delete Activity in a development environment to confirm that it deletes only the intended files or folders and does not affect other data.
  • Make sure a file is not being created or written by another process at the same time it is being deleted.
  • Make sure the ADF managed identity or the service principal you are using has the necessary permissions to delete files in the storage account.
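
One low-risk way to follow the testing advice above is to preview which file names a wildcard pattern would select before using it in the activity. The sketch below is not ADF code. It uses Python's `fnmatchcase` as an approximation of ADF wildcard matching for `*` and `?` only, and the function name `preview_matches` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def preview_matches(file_names, pattern):
    """Return the names a wildcard pattern would select. Nothing is deleted."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical folder listing used only for this dry run.
candidates = ["account_2024.csv", "account_2024.json", "orders.csv"]

print(preview_matches(candidates, "*.csv"))
print(preview_matches(candidates, "account_*"))
```

If the preview lists a file you did not expect, tighten the pattern before running the real activity.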

ADF – Delete activity implementation

Requirement:

- Delete files from Azure Blob Storage.

Implementation:

  • Create a dataset. This is the location from which the files or folders will be deleted. The dataset can point to either a single file or a folder.
  • Add a Delete activity.
  • Go to the Source tab. There are a few properties you need to set.
    • Dataset: The dataset that points to the files or folders to delete, in whole or in part.
    • File path type:
      • File path in dataset: Use the path configured in the dataset as the location of the files to delete.
      • Wildcard file path: Select specific files under the dataset location using wildcards (e.g., *.csv selects all CSV files in the selected path).
      • Prefix: Select files or folders whose names start with a given prefix (e.g., files starting with account_).
      • List of files: Point to a text file that lists each file you want to delete, one per line, as a path relative to the path configured in the dataset.
    • Filter by last modified: Only files whose last-modified time falls in the range [Start time, End time) are selected for deletion. The start is inclusive and the end is exclusive.
    • Recursively: Choose whether to delete files in the input folder and all of its subfolders, or only the files directly in the selected folder. This setting is disabled when a single file is selected.
    • Max concurrent connections: The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections.
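    • Retention period: To delete only files older than a retention period, set End time under Filter by last modified to the cutoff, for example the dynamic expression @adddays(utcnow(), -30) for a 30-day retention.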
  • Publish Changes: After configuring the Delete Activity, remember to save and publish your changes to make the activity part of your ADF pipeline.
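
The settings above end up in the pipeline's JSON definition. As a minimal sketch, the Python below builds a dictionary in the shape of a Delete activity and prints it as JSON. The property names (`storeSettings`, `wildcardFileName`, `modifiedDatetimeEnd`, and so on) follow the Delete activity schema as I understand it, so compare them with the JSON code view of your own pipeline. The function name, activity name, and dataset name are hypothetical.

```python
import json


def build_delete_activity(dataset_name, wildcard_file_name, modified_end_utc, recursive=True):
    """Build a dict shaped like an ADF Delete activity (assumed schema, verify in ADF)."""
    return {
        "name": "DeleteOldCsvFiles",  # hypothetical activity name
        "type": "Delete",
        "typeProperties": {
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": recursive,
                "wildcardFileName": wildcard_file_name,
                # Exclusive upper bound of the last-modified filter.
                "modifiedDatetimeEnd": modified_end_utc,
            },
            "enableLogging": False,
        },
    }


# "BlobCleanupDataset" is a placeholder for the dataset created in the first step.
activity = build_delete_activity("BlobCleanupDataset", "*.csv", "2024-01-01T00:00:00Z")
print(json.dumps(activity, indent=2))
```

Generating the definition from code like this is optional. The same values can be entered in the Source tab as described in the steps.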

ADF Delete Activity - Advanced Options

  • Dynamic Content: You can use dynamic content in the Delete Activity to make its operation more flexible. For instance, you can dynamically set the folder path or file name to be deleted based on pipeline parameters or activity outputs. This is particularly useful in scenarios where the data to be deleted varies from execution to execution.
  • Dependency Conditions: In a complex pipeline, the Delete Activity might depend on the successful completion of previous activities. ADF allows you to configure dependency conditions to ensure the Delete Activity only runs after certain conditions are met, enhancing your pipeline's robustness.
  • Logging and Monitoring: Configure diagnostic settings for your Data Factory to capture detailed logs of your Delete Activity executions. Monitoring these logs can help you audit data deletions and troubleshoot issues if the activity does not behave as expected.
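
To illustrate the first two options together, the sketch below builds a Delete activity definition that runs only after an upstream activity succeeds and that receives its folder path from a pipeline parameter. It assumes a dataset that declares a `folderPath` parameter. The activity names, dataset name, and parameter names are hypothetical, and the JSON shape should be checked against your pipeline's code view.

```python
import json


def build_conditional_delete(upstream_activity, dataset_name, folder_expression):
    """Delete activity dict that depends on `upstream_activity` succeeding (assumed schema)."""
    return {
        "name": "DeleteProcessedFolder",  # hypothetical activity name
        "type": "Delete",
        # Dependency condition: run only when the upstream activity succeeded.
        "dependsOn": [
            {"activity": upstream_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "dataset": {
                "referenceName": dataset_name,
                "type": "DatasetReference",
                # Dynamic content: the folder is resolved at run time.
                "parameters": {
                    "folderPath": {"value": folder_expression, "type": "Expression"}
                },
            },
            "storeSettings": {"type": "AzureBlobStorageReadSettings", "recursive": True},
        },
    }


activity = build_conditional_delete(
    "CopyToArchive",                         # hypothetical upstream activity
    "ParameterizedBlobDataset",              # hypothetical parameterized dataset
    "@pipeline().parameters.cleanupFolder",  # hypothetical pipeline parameter
)
print(json.dumps(activity, indent=2))
```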

Usage Scenarios of ADF Delete Activity

  • Cleanup Temporary Files: After processing data, use the Delete Activity to remove temporary files that were generated during the process.
  • Maintain Folder Size: Regularly delete old files based on the retention policy to prevent storage from growing uncontrollably.
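
For the retention scenario, it can help to reason about which files a cutoff would select before scheduling the cleanup. The sketch below is plain Python, not ADF code. It treats the cutoff as the exclusive end of a last-modified window, so a file modified exactly at the cutoff is kept. The function name and sample files are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def files_past_retention(files, now, retention_days):
    """Return names whose last-modified time is strictly earlier than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [name for name, last_modified in files if last_modified < cutoff]


# Hypothetical listing of (file name, last-modified time in UTC).
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
files = [
    ("temp_a.csv", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    ("temp_b.csv", datetime(2024, 5, 31, tzinfo=timezone.utc)),  # exactly at the cutoff
    ("temp_c.csv", datetime(2024, 6, 29, tzinfo=timezone.utc)),
]

print(files_past_retention(files, now, retention_days=30))  # prints ['temp_a.csv']
```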


© 2024 Code SharePoint