Calling Logic Apps from Data Factory securely

This post is about a setup I recently tested with a customer: an Azure Data Factory pipeline needs to send email notifications via Azure Logic Apps, and the network path between the Integration Runtime and the Logic App that sends the email has to be private.

The challenge here is that Azure Data Factory doesn’t natively offer managed private endpoints to Logic Apps. Instead we will use the template for Azure Functions endpoints, and we will see that it works fine. I am not sure whether this is officially supported by Microsoft though (trying to find out). We will use some CLI commands (no blog post without Azure CLI) to verify that the DNS information supplied by the managed private endpoint is enough to make our setup work.

Using public endpoints

The initial architecture we are starting from is the following:

A Microsoft-managed integration runtime (IR) does some work with data stored in Azure Blob Storage (I used this tutorial), and when it is done it triggers a Logic App via HTTP to send an email notification.

This design might be perfectly fine for many customers, but others might want to reduce the exposure surface of the Logic App and the Blob Storage:

  • Both PaaS services offer firewalling functionality on their public endpoints, to limit network access.
  • However, given the fact that the default Microsoft-managed Integration Runtimes are multitenant, other IRs might have network access to those services (of course, they wouldn’t be able to authenticate).

The pipeline

To give you a better idea of what we are doing here, here is a screenshot of the very basic pipeline of my test. It consists of two steps:

  • Do some operations on blob storage
  • Invoke the Logic App via an HTTP call. Notice the URL of the HTTP call, adftest12345.azurewebsites.net: it will be important later.

Using IRs in managed VNets

To satisfy these more stringent requirements without having to deploy our own Self-Hosted Integration Runtimes (that would mean deploying VMs and installing software on them), Azure Data Factory and Synapse offer a step in between: a single-tenant, Microsoft-managed Integration Runtime that is deployed in a Microsoft-managed VNet. To put it on a diagram, it would look like this:

The following changes will have to be made to the design:

  • The Integration Runtimes should be provisioned in a managed VNet, and are single-tenant.
  • The managed VNet is not visible in the user’s subscription, but you still have some control over it. For example, you can create so-called “managed private endpoints”, which are exactly like the standard private endpoints in your own VNets, except that you cannot see them either. There are some workarounds for this lack of visibility, as we will see later.
  • The Azure Storage account and the Logic App will have their public endpoints disabled

After deploying a new Integration Runtime in a managed VNet, we have two IRs at our disposal:

Disabling Public Endpoint in Azure Storage

We start by associating the storage account linked service to the new Integration Runtime, to make sure that traffic to and from the storage account is going to be sourced in the managed VNet. Now we will disable the public endpoint of our Azure Storage account:
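If you prefer the CLI over the portal for this step, the following is a minimal sketch. The function name and the storage account name you pass in are my own placeholders and not part of the test setup; `--public-network-access` is a flag of `az storage account update` in recent CLI versions, so check `az storage account update --help` if yours rejects it.

```shell
# Sketch: turn off the public endpoint of a storage account.
# $1 = resource group, $2 = storage account name (both placeholders).
disable_storage_public_access() {
  az storage account update -g "$1" -n "$2" --public-network-access Disabled
}
```

It would be called as `disable_storage_public_access $rg mystorageaccount`, where `mystorageaccount` is a hypothetical name.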

The pipeline will now fail, and the error message lets us identify what went wrong:

❯ az datafactory pipeline-run show -g $rg --factory-name $adf_name --run-id $run_id
{
  [...]
  "message": "Operation on target CopyFromBlobToBlob failed: ErrorCode=AzureBlobOperationFailed,
  'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Blob operation Failed.
  [...]
  Message=The remote server returned an error: (403) Forbidden.
  [...]
  "status": "Failed"
}

Obviously, even though the traffic is now coming from the managed VNet, it is still trying to reach the storage account over the Internet. We need to create a managed private endpoint to the Azure Storage account:
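I created the endpoint in ADF Studio, but the same thing can presumably be done with the `datafactory` CLI extension. Here is a minimal sketch: the function name and all argument values are placeholders of mine, and `blob` is the private link group ID for the blob sub-resource of a storage account.

```shell
# Sketch: create a managed private endpoint from the ADF managed VNet to a storage account.
# $1 = resource group, $2 = factory name, $3 = managed VNet name,
# $4 = storage account name, $5 = name for the new endpoint (all placeholders).
# The body runs in a subshell "( ... )" so its variables do not leak into your session.
create_blob_mpe() (
  # Look up the storage account resource ID, which is the private link target
  storage_id=$(az storage account show -g "$1" -n "$4" --query id -o tsv) || exit 1
  az datafactory managed-private-endpoint create \
    -g "$1" --factory-name "$2" \
    --managed-virtual-network-name "$3" \
    -n "$5" \
    --group-id blob \
    --private-link-resource-id "$storage_id"
)
```

With the variables used elsewhere in this post, that would be something like `create_blob_mpe $rg $adf_name $managed_vnet_name mystorageaccount AzureBlobStorage682`, where `mystorageaccount` is hypothetical.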

Now we get through the first pipeline task!

Disabling Public Endpoint in the Logic App

Let’s now disable the Logic App public endpoint:

Now the second step of the Azure Data Factory pipeline will fail, since it cannot reach the Logic App endpoint:

❯ az datafactory pipeline-run show -g $rg --factory-name $adf_name --run-id $run_id
{
  [...]
  "message": "Operation on target Web1 failed:
  [...]
  Error 403 - Forbidden
  The web app you have attempted to reach has blocked your access.
  [...]
  "status": "Failed"
}
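Both failures in this post surface as a 403 buried in the `message` field of the pipeline run. A small helper can make that easier to spot. This is only a sketch: the function name and the two labels it prints are my own invention, and a 403 can of course have causes other than a disabled public endpoint.

```shell
# Sketch: read a pipeline run message on stdin and flag the two 403 variants seen above.
# Prints "blocked (403)" if either pattern is found, "other" otherwise.
classify_failure() {
  if grep -Eq '\(403\) Forbidden|Error 403'; then
    echo "blocked (403)"
  else
    echo "other"
  fi
}
```

It can be fed from the CLI with `az datafactory pipeline-run show -g $rg --factory-name $adf_name --run-id $run_id --query message -o tsv | classify_failure`.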

We need to create a managed private endpoint for our Integration Runtime. However, the portal only offers the option for Azure Functions!

If you continue with the wizard though, you will be able to select your Logic App and create the endpoint:
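My understanding of why this works is that a Logic App with an azurewebsites.net hostname is a Microsoft.Web/sites resource, just like a Function App, so a private endpoint to it uses the same `sites` group ID. Under that assumption, a CLI equivalent might look like the sketch below. I have not confirmed that the portal's Azure Function template does exactly this under the hood, and the function names and arguments are placeholders of mine.

```shell
# Sketch: build the ARM resource ID of a site (Function App or Logic App on App Service).
# $1 = subscription ID, $2 = resource group, $3 = site name.
logic_app_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Web/sites/%s\n' "$1" "$2" "$3"
}

# Sketch: create a managed private endpoint to that site.
# $1 = resource group, $2 = factory name, $3 = managed VNet name,
# $4 = name for the new endpoint, $5 = site resource ID.
create_site_mpe() {
  az datafactory managed-private-endpoint create \
    -g "$1" --factory-name "$2" \
    --managed-virtual-network-name "$3" \
    -n "$4" \
    --group-id sites \
    --private-link-resource-id "$5"
}
```

The resource ID from the first function is what you would pass as the last argument of the second.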

Inspecting the managed private endpoint

As you might know, DNS plays a very important role in Azure Private Link. In essence, our Integration Runtime needs to be able to resolve the Logic App FQDN adftest12345.azurewebsites.net to a private IP address, and that depends on the DNS configuration in the managed VNet, which we can’t see. However, Azure CLI tells us some interesting stuff about the endpoint:

❯ managed_vnet_name=$(az datafactory managed-virtual-network list -g $rg --factory-name $adf_name --query '[0].name' -o tsv)

❯ az datafactory managed-private-endpoint list -g $rg --factory-name $adf_name --managed-virtual-network-name $managed_vnet_name -o table
Reference and support levels: https://aka.ms/CLI_refstatus
Name                 ResourceGroup
-------------------  ---------------
AzureBlobStorage682  adf
AzureFunction544     adf

❯ az datafactory managed-private-endpoint show -g $rg --factory-name $adf_name --managed-virtual-network-name $managed_vnet_name -n AzureFunction544
{
  [...]
    "fqdns": [
      "adftest12345.azurewebsites.net",
      "adftest12345.scm.azurewebsites.net"
    ],
  [...]
}

The most important piece of information here is that the managed private endpoint maps exactly the FQDN that we need to call our app (adftest12345.azurewebsites.net) into the managed VNet. So if we now run the pipeline again, it should be able to call the Logic App and complete successfully.
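To avoid eyeballing the JSON, the check can be scripted. This is a minimal sketch with a helper name of my own; it only tests whether a hostname appears in a list, it does not prove that DNS resolution inside the managed VNet works.

```shell
# Sketch: succeed if the hostname in $1 appears on stdin as an exact entry.
# Accepts one FQDN per line or tab-separated FQDNs, since tsv output can be either.
fqdn_is_mapped() {
  tr '\t' '\n' | grep -Fxq -- "$1"
}
```

I am assuming the list sits under `properties.fqdns`, as the indentation of the output above suggests, so adjust the query if your CLI version nests it differently: `az datafactory managed-private-endpoint show -g $rg --factory-name $adf_name --managed-virtual-network-name $managed_vnet_name -n AzureFunction544 --query 'properties.fqdns' -o tsv | fqdn_is_mapped adftest12345.azurewebsites.net && echo mapped`.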

So what?

Hopefully this blog post gave you some hints on how to troubleshoot managed private endpoints, and a way to connect Azure Data Factory and Azure Logic Apps over private connectivity. A last goodie is the CLI commands to retrieve additional information about the managed private endpoint that is not available in the ADF Studio portal, such as the FQDNs.

Did I forget anything important? Let me know!
