How to Integrate and Restore Azure Data Factory via GitHub

Azure Data Factory (ADF) is a Microsoft Azure service that simplifies the ETL process. You do not need to write code for data integration and data transformation: Azure Data Factory offers a code-free UI that lets users drive business-led and IT-led analytics and BI.

In this article, we demonstrate how to integrate GitHub with Azure Data Factory, and then how to restore an ADF instance using GitHub.

A natural question is: why integrate with GitHub? Because when ADF is connected to a GitHub account, every change we publish in ADF is also backed up on GitHub. GitHub also keeps a version history of those changes.

Steps to Integrate GitHub With Data Factory

Integrating GitHub with Azure Data Factory enables version control. Its biggest plus point is that it backs up our Data Factory instance even when the resources have not been validated. To integrate GitHub with Azure Data Factory, follow these steps.

Step 1. Log in to your GitHub account. Click on “New.”

Step 2. In the repository name, type “ADFBackup.” Then select private or public as per your preference.

Step 3. Click on “Create Repository.”

Step 4. Log in to your Azure account. Open your Azure Data Factory instance. Click on “Launch Studio.”

Step 5. From the left-hand side vertical menu, click on “Manage.”

Step 6. Click on “Git configuration” and then “Configure.”

Step 7. Open the drop-down menu under the heading “Repository type” and select “GitHub.”

Step 8. In the GitHub repository owner field, enter your GitHub username and click on “Continue.”

Step 9. Click on the drop-down menu under Repository Name and select the repository we created in Step 3.

Step 10. Set the Collaboration branch to “main”, set “Import resource into this branch” to “main”, and click on “Apply.”

Step 11. In the next dialog, select “main” as the Working branch and click on “Save.” You can create a new working branch here or later.

Step 12. Then head back to Author mode from the left-hand side vertical menu. You have two options: click on either “Save all” or “Publish.”

Note: Clicking on “Save all” saves the pipelines, datasets, etc. to the GitHub repository without validating them. Clicking on “Publish” pushes all the changes to the live Data Factory, but everything must pass validation (that is, contain no errors) before it can be published. Publishing also creates a new branch on the GitHub repository named adf_publish, which contains the JSON files needed to restore our current Data Factory instance.

Step 13. Using the drop-down menu at the top left, you can switch between GitHub repository mode and live Data Factory mode.
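
The choices made in Steps 7 to 10 end up stored on the factory as a Git repository configuration. As a rough illustration only, the Python sketch below builds such a configuration as a plain dictionary. The field names follow the FactoryGitHubConfiguration shape used by the Azure Data Factory REST API as we understand it, so treat them as an assumption and check the current API reference before relying on them. The function name build_repo_configuration and the account name “my-github-user” are hypothetical, and the code makes no network calls.

```python
import json


def build_repo_configuration(account_name, repository_name,
                             collaboration_branch="main", root_folder="/"):
    """Return a GitHub repository configuration for a Data Factory.

    Field names are assumed from the FactoryGitHubConfiguration shape;
    verify them against the official REST API reference.
    """
    if not account_name or not repository_name:
        raise ValueError("account_name and repository_name are required")
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,                  # Step 8: repository owner
        "repositoryName": repository_name,            # Step 9: repository name
        "collaborationBranch": collaboration_branch,  # Step 10: "main"
        "rootFolder": root_folder,                    # where ADF JSON is stored
    }


if __name__ == "__main__":
    # "my-github-user" is a placeholder, not a real account.
    config = build_repo_configuration("my-github-user", "ADFBackup")
    print(json.dumps(config, indent=2))
```

Keeping the configuration in a small function like this makes it easy to review the branch and repository names before applying them, whether through the Studio UI as described above or through automation.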

Steps To Restore Data Factory Instance

After the integration, it's important to understand how this integration helps us restore our Azure Data Factory. Follow the steps below:

  • Open the Data Factory instance. Click on “Launch Studio”.

  • From the left-hand side vertical menu, click on “Manage”, then “ARM template”.

  • Click on “Import on Azure portal.”

  • Tap on “Build your own template in the editor.”

  • Open a browser and log in to GitHub. Click on the repository we selected for our data factory, open the branch drop-down, and select adf_publish.

  • Click on the folder that has the same name as your Data Factory.

  • Click on “ARMTemplateForFactory.json”.

  • Click on the three dots on the right-hand side. Then tap on “Download.”

  • Go back to the browser tab where the Data Factory is open, which should still show the template editor opened earlier.

  • Click on “Load file”. Browse to the Downloads folder and select “ARMTemplateForFactory.json”. Click on “Open”.

  • Click on “Save”.

  • Then head back to the browser tab where GitHub is open and download “ARMTemplateParametersForFactory.json” in the same way.

  • Return to the Data Factory browser tab. Click on “Edit parameters”.

  • Click on “Load file”. Browse to the Downloads folder and select “ARMTemplateParametersForFactory.json”. Click on “Open”.

  • Click on Save.

  • For “Resource group”, select the same resource group in which the Data Factory was created.

  • Now, for the connection string, open the storage account that the linked services connect to. All the resources should be in the same resource group. Scroll down the left-hand side menu and select “Access keys.”

  • Click on “Show” next to the Connection string under key1, then click on “Copy.”

  • Paste this connection string into the data factory’s linked service connection string field. Click on “Review + Create.”

  • Click on “Create.”

  • When the deployment finishes, the message “Deployment succeeded” appears at the top right of the page.
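
The file handling in the restore steps above can also be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the portal workflow: it assumes the parameters file follows the usual ARM layout of a top-level "parameters" object whose entries each hold a "value", and the parameter name “AzureBlobStorage1_connectionString” used in the usage example is hypothetical (yours depends on the names of your linked services). The helper names are our own.

```python
import copy


def publish_branch_paths(factory_name):
    """Paths of the two files inside the adf_publish branch.

    The folder carries the same name as the Data Factory.
    """
    return {
        "template": f"{factory_name}/ARMTemplateForFactory.json",
        "parameters": f"{factory_name}/ARMTemplateParametersForFactory.json",
    }


def missing_values(parameters_doc):
    """List parameters whose value is empty, e.g. connection strings."""
    params = parameters_doc.get("parameters", {})
    return sorted(
        name for name, entry in params.items()
        if entry.get("value") in ("", None)
    )


def fill_connection_strings(parameters_doc, connection_strings):
    """Return a copy of the parameters document with the given values set."""
    result = copy.deepcopy(parameters_doc)
    params = result.setdefault("parameters", {})
    for name, value in connection_strings.items():
        if name not in params:
            raise KeyError(f"parameter not found in file: {name}")
        params[name] = {"value": value}
    return result


if __name__ == "__main__":
    # Hypothetical parameters document, shaped like a typical ARM file.
    doc = {
        "parameters": {
            "factoryName": {"value": "demo-adf"},
            "AzureBlobStorage1_connectionString": {"value": ""},
        }
    }
    print(publish_branch_paths("demo-adf"))
    print("Still empty:", missing_values(doc))
    filled = fill_connection_strings(
        doc,
        {"AzureBlobStorage1_connectionString": "<paste key1 connection string>"},
    )
    print("Still empty after filling:", missing_values(filled))
```

Checking for empty values before deploying mirrors the manual step of pasting the storage account connection string: secrets are typically left blank in the exported parameters file, so they have to be supplied again at restore time.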

Conclusion

We have now learned how to integrate GitHub with Azure Data Factory and how to use that integration to restore an Azure Data Factory instance. The integration brings many further advantages, such as version control, collaboration, continuous integration and continuous deployment, branching strategies, code reuse, and, biggest of all, disaster recovery.