Azure Data Factory (ADF) is a data integration service from Microsoft Azure that simplifies the ETL process. You do not need to write code for data integration and data transformation: ADF offers a code-free UI that supports both business-led and IT-led analytics/BI.
In this article, we demonstrate how to integrate GitHub with Azure Data Factory, and then how to restore an ADF instance using GitHub.
You might ask: why integrate with GitHub? When ADF is connected to a GitHub account, every change we publish in ADF is also backed up on GitHub. GitHub also keeps a version history of those changes.
Integrating GitHub with Azure Data Factory enables version control, and its biggest advantage is that it backs up our Data Factory instance even when the changes have not been validated. To integrate GitHub with Azure Data Factory, follow these steps.
Step 1. Log in to your GitHub account. Click on “New.”
Step 2. In the repository name field, type “ADFBackup.” Then select private or public visibility, as you prefer.
Step 3. Click on “Create Repository.”
Step 4. Log in to your Azure account. Open your Azure Data Factory instance. Click on “Launch Studio.”
Step 5. From the left side vertical menu, click on “Manage.”
Step 6. Click on “Git configuration” and then “Configure.”
Step 7. Open the drop-down menu under the heading “Repository type” and select “GitHub.”
Step 8. In the GitHub repository owner field, enter your GitHub username and click on “Continue.”
Step 9. Click on the drop-down menu under Repository Name and select the repository we created in Step 3.
Step 10. Set the Collaboration branch to “main,” set “Import resource into this branch” to “main,” and click on “Apply.”
Step 11. In the next step, select the Working branch as “main” and click on “Save”. You can create a new Working branch here or later on as well.
Step 12. Then, head back to the Author mode from the left-hand side vertical menu. You now have two options: click on either “Save all” or “Publish.”
Note: Clicking on “Save all” saves the pipelines, datasets, etc. to the GitHub repository without validating them. Clicking on “Publish” pushes all the changes to the live Data Factory, but everything must pass validation first (that is, contain no errors). Publishing also creates a new branch named adf_publish on the GitHub repository, which contains a JSON file that can be used to restore our current Data Factory instance.
Step 13. Use the drop-down menu at the top left to switch between GitHub repository mode and live Data Factory mode.
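The choices made in Steps 7 to 10 can also be written down as data. The sketch below is an illustration rather than the method used in this article: it builds the kind of repoConfiguration block that a Data Factory's ARM definition uses for a GitHub-backed factory. The function name and the “my-github-user” owner are hypothetical placeholders, and the root folder of “/” is an assumed default, so check the property names against the current Azure documentation before relying on them.

```python
import json


def build_github_repo_configuration(
    account_name: str,
    repository_name: str,
    collaboration_branch: str = "main",
    root_folder: str = "/",
) -> dict:
    """Return a repoConfiguration-style dict for a GitHub-backed data factory.

    The keys mirror the fields filled in through the Git configuration
    screen: repository owner, repository name and collaboration branch.
    """
    if not account_name or not repository_name:
        raise ValueError("account_name and repository_name are required")
    return {
        "type": "FactoryGitHubConfiguration",  # repository type chosen in Step 7
        "accountName": account_name,  # GitHub repository owner (Step 8)
        "repositoryName": repository_name,  # repository created in Steps 2 and 3
        "collaborationBranch": collaboration_branch,  # Step 10
        "rootFolder": root_folder,  # assumed default: repository root
    }


if __name__ == "__main__":
    # "my-github-user" is a placeholder; "ADFBackup" is the name from Step 2.
    config = build_github_repo_configuration("my-github-user", "ADFBackup")
    print(json.dumps(config, indent=2))
```

Keeping this block next to your notes makes it easier to reproduce the same Git configuration on another factory, even if you only ever apply it through the UI.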
After the integration, it is important to understand how it helps us restore our Azure Data Factory: the adf_publish branch on GitHub holds the JSON file needed to recreate the instance.
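The restore relies on the JSON file that publishing writes to the adf_publish branch. Before redeploying it, it can help to check what the file actually contains. The following is a minimal sketch, assuming the file is an ARM template with a top-level “resources” list whose entries have types such as Microsoft.DataFactory/factories/pipelines. The file name ARMTemplateForFactory.json in the comment, the function names, and the sample template are illustrative assumptions, not part of the original walkthrough.

```python
import json
from collections import Counter


def load_template(path: str) -> dict:
    """Load an ARM template JSON file from a local checkout of adf_publish."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_arm_template(template: dict) -> dict:
    """Count the template's resources by the last segment of their type."""
    counts = Counter()
    for resource in template.get("resources", []):
        resource_type = resource.get("type", "unknown")
        # "Microsoft.DataFactory/factories/pipelines" is counted as "pipelines".
        counts[resource_type.rsplit("/", 1)[-1]] += 1
    return dict(counts)


# Hypothetical, heavily trimmed template used only to demonstrate the function.
SAMPLE_TEMPLATE = {
    "resources": [
        {
            "name": "[concat(parameters('factoryName'), '/CopyPipeline')]",
            "type": "Microsoft.DataFactory/factories/pipelines",
        },
        {
            "name": "[concat(parameters('factoryName'), '/SourceDataset')]",
            "type": "Microsoft.DataFactory/factories/datasets",
        },
        {
            "name": "[concat(parameters('factoryName'), '/SinkDataset')]",
            "type": "Microsoft.DataFactory/factories/datasets",
        },
    ]
}


if __name__ == "__main__":
    # For a real check, replace SAMPLE_TEMPLATE with something like
    # load_template("ARMTemplateForFactory.json") (assumed file name).
    for kind, count in summarize_arm_template(SAMPLE_TEMPLATE).items():
        print(f"{kind}: {count}")
```

If the counts match the pipelines and datasets you expect, the backup is a reasonable candidate for restoring the instance.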
We have now learned how to integrate GitHub with Azure Data Factory and how that integration helps restore the Azure Data Factory instance. The integration brings many advantages, such as version control, collaboration, continuous integration and continuous deployment, branching strategies, code reuse, and, most importantly, disaster recovery.