Posts on Journey through Cloud & Code

HandsOn — Building Hybrid Cloud Environment — Part 5— Connectivity— Site-to-Site VPN establishing…

Sat, 06 Jun 2026 00:00:00 +0000

In the previous part, we established an on-premises identity foundation. The on-premises setup consists of a virtual network with Windows and Linux VMs joined to an on-premises Active Directory domain hosted on two domain controllers. In this part, we will create a VPN Gateway in Azure and a StrongSwan IPsec gateway on-premises and establish the Site-to-Site VPN tunnel — the foundation of our hybrid lab.

Implementing a Site-to-Site (S2S) tunnel is simple — so rather than walking through the steps procedurally, I want to focus on what each component is actually doing.

A Site-to-Site VPN connects two networks over the public internet using an encrypted IPsec tunnel. Each end has a gateway that authenticates the other using a Pre-Shared Key (PSK). Only traffic destined for the remote subnet goes through the tunnel — everything else uses the normal internet route. The tunnel is managed by IKE (Internet Key Exchange) which negotiates the Security Association (SA) — the agreed encryption parameters — before any traffic flows.

Before walking through the steps, here are the key addresses we’ll reference throughout:

192.168.122.0/24 - On-premises network hosting KVM VMs (virbr0)
192.168.1.106 - On-premises VPN gateway (StrongSwan)
192.168.1.0/24 - On-premises Wi-Fi LAN
61.69.136.49 - **On-premises public IP**
20.219.67.227 - **Azure VPN Gateway public IP**
10.66.0.0/16 - Azure VNet
10.66.0.0/24 - Azure GatewaySubnet
10.66.5.0/24 - Azure WorkloadSubnet

Some of these will be created in the steps below; others are already in place from earlier parts.
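Before creating anything, it can help to sanity-check the address plan. The sketch below uses only the ranges listed above and Python's standard ipaddress module. It checks that both Azure subnets sit inside the VNet and that no on-premises range overlaps the VNet, since overlapping address spaces on the two sides make routing through the tunnel ambiguous. The function name check_address_plan is my own, not part of any tool.

```python
import ipaddress

# Address plan copied from the list above.
ON_PREM = [
    ipaddress.ip_network("192.168.122.0/24"),  # libvirt network (virbr0)
    ipaddress.ip_network("192.168.1.0/24"),    # Wi-Fi LAN
]
AZURE_VNET = ipaddress.ip_network("10.66.0.0/16")
AZURE_SUBNETS = {
    "GatewaySubnet": ipaddress.ip_network("10.66.0.0/24"),
    "WorkloadSubnet": ipaddress.ip_network("10.66.5.0/24"),
}


def check_address_plan():
    """Return a list of problems found in the address plan (empty if none)."""
    problems = []
    # Every Azure subnet must be carved out of the VNet range.
    for name, subnet in AZURE_SUBNETS.items():
        if not subnet.subnet_of(AZURE_VNET):
            problems.append(f"{name} {subnet} is outside the VNet {AZURE_VNET}")
    # On-premises ranges must not collide with the VNet.
    for net in ON_PREM:
        if net.overlaps(AZURE_VNET):
            problems.append(f"on-premises {net} overlaps the VNet {AZURE_VNET}")
    # Azure subnets must not collide with each other.
    subnets = list(AZURE_SUBNETS.values())
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


print(check_address_plan() or "address plan looks consistent")
```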

Azure Configurations

1. Create a virtual network in Azure

While creating a VNet in the Azure portal, decide on an address range and create two subnets named GatewaySubnet and WorkloadSubnet. WorkloadSubnet will host the VMs that need to talk to on-premises.

Once the VNet is provisioned, we need three components at the Azure end to enable connectivity with on-premises: a VPN Gateway, a Local Network Gateway, and a Connection that links the two.

2. Azure VPN Gateway

This is the component responsible for establishing a secure tunnel with on-premises. The VPN Gateway is deployed in the virtual network you have chosen to pair with your on-premises network, and it must be deployed in a subnet named GatewaySubnet. This is a hard requirement: Azure reserves this subnet name specifically for gateway infrastructure and rejects deployment attempts to any other subnet name. All traffic between the VNet and on-premises passes through this gateway, and if forced tunneling is enabled, internet-bound traffic from the VNet is sent through it as well. Incoming traffic lands in the GatewaySubnet and is routed from there to its destination within the VNet.

For our purpose, a Basic tier VPN gateway would suffice. The Azure portal no longer shows a VPN type selector; all new gateways are route-based by default, and route-based gateways support IKEv2. This is what we will use.

Remember that on top of the fixed monthly charge, there are costs associated with traffic entering and leaving the network via the VPN Gateway. See https://azure.microsoft.com/en-us/pricing/details/vpn-gateway/

Once the VPN Gateway is created, make a note of the public IP assigned to it. In my case it is 20.219.67.227.

3. Local Network Gateway

Now that we have a gateway in our Azure VNet, we need a way to identify the on-premises network. Azure uses a resource called a Local Network Gateway for this; it is the representation of the on-premises network. When you create a Local Network Gateway, provide the public IP of your on-premises site as the IP address, and the network ranges that you want to include in the tunnel as the address space.

IP address: 61.69.136.49 (the public IP of your Wi-Fi router). You can confirm this by running the command below:

curl -s ifconfig.me # 61.69.136.49

Note: This address can change when your ISP reassigns it, which typically happens on router restart or DHCP lease expiry. For a home lab this is fine, but for a serious setup you should get a static public IP.

Address Space(s):

**192.168.122.0/24** (the libvirt virtual network) and **192.168.1.0/24** (your Wi-Fi network)

Note: You can skip the Wi-Fi range if you do not intend other devices on your Wi-Fi network to participate in the S2S tunnel.

4. Connection

And the final bit of the Azure end of configuration is a Connection. A connection is the link between the VPN Gateway and the Local Network Gateway.

From VPN Gateway, create a connection of type “Site-to-Site (IPSec)” and choose IKEv2, provide the shared key (PSK), connection mode and leave the rest as it is.

On-premises configuration

The on-premises side needs a VPN gateway configuration that mirrors the Azure side, and it is simpler by comparison. We will use StrongSwan as the VPN gateway, and the following sections walk through the configuration needed to enable a site-to-site tunnel.

5. Install / Configure StrongSwan

StrongSwan is an open-source IPsec implementation for Linux. It runs as a daemon on the on-premises host and is responsible for IKE negotiation, SA establishment, and installing the resulting XFRM policies and keys into the Linux kernel.

Install StrongSwan and ensure it is running:

sudo apt install strongswan
sudo systemctl enable strongswan-starter
sudo systemctl start strongswan-starter
sudo systemctl status strongswan-starter

6. ipsec.conf

This is StrongSwan’s main configuration file — it defines the tunnel connection parameters including peer identities, the subnets to advertise on each side, the encryption proposals, and the connection behaviour on startup and failure.

sudo nano /etc/ipsec.conf

Minimal config:

config setup
    charondebug="ike 2, knl 2, cfg 2"

conn azure-s2s
    keyexchange=ikev2
    left=<strongswan-lan-ip>          # Linux box with StrongSwan
    leftid=<strongswan-lan-ip>        # Linux box with StrongSwan
    leftsubnet=<on-prem-subnet-1>,<on-prem-subnet-2>
    right=<azure-vpn-gateway-public-ip>
    rightid=<azure-vpn-gateway-public-ip>
    rightsubnet=<azure-workload-subnet>
    authby=secret
    auto=start
    ike=aes256-sha256-modp1024!       # modp1024 is acceptable for lab environments
    esp=aes256-sha256!
    dpdaction=restart
    dpddelay=30s
    dpdtimeout=120s

Each attribute controls a specific aspect of how StrongSwan negotiates and maintains the tunnel:

**keyexchange=ikev2** — specifies IKEv2 as the key exchange protocol. IKEv2 is more efficient than IKEv1 (fewer round trips to establish the SA) and handles NAT traversal natively, which matters here since the on-premises side is behind a home router.

**left** / **leftid** — identifies the local end of the tunnel. left is the IP StrongSwan binds to; leftid is how it identifies itself to the remote peer during IKE negotiation. Both are set to the StrongSwan host’s LAN IP here.

**leftsubnet** — defines what on-premises ranges StrongSwan advertises through the tunnel. These must match the address spaces configured in the Azure Local Network Gateway — Azure uses the LNG configuration to inject routes into the VNet.

**right** / **rightid** — the mirror of left, identifying the remote peer — in this case the Azure VPN Gateway’s public IP.

**rightsubnet** — the network ranges behind the Azure VPN Gateway that on-premises should route through the tunnel. Traffic destined for these ranges will be intercepted by XFRM and encrypted.

**authby=secret** — use a Pre-Shared Key for authentication, as configured in ipsec.secrets.

**auto=start** — bring the tunnel up automatically when StrongSwan starts. Setting this to add instead would make StrongSwan a passive responder only.

**ike=aes256-sha256-modp1024!** — the Phase 1 (IKE SA) proposal: AES-256 encryption, SHA-256 integrity, and Diffie-Hellman group 2 (modp1024 — acceptable for lab environments). The trailing ! means this is the only proposal offered — StrongSwan will not fall back to weaker algorithms. Azure must match this exactly.

**esp=aes256-sha256!** — the Phase 2 (ESP) proposal governing how actual data packets are encrypted inside the tunnel. Same strict-match semantics as the ike line.

**dpdaction=restart** — Dead Peer Detection behaviour. If the remote peer goes silent, StrongSwan will attempt to re-establish the tunnel rather than leave a stale SA. dpddelay and dpdtimeout control how long it waits before declaring the peer dead.

Example —

config setup
    charondebug="ike 2, knl 2, cfg 2"

conn azure-s2s-manual
    keyexchange=ikev2
    left=192.168.1.106
    leftid=192.168.1.106
    leftsubnet=192.168.122.0/24,192.168.1.0/24
    right=20.219.67.227      # (hce-d01-vpngw-pip)
    rightid=20.219.67.227    # (hce-d01-vpngw-pip)
    rightsubnet=10.66.5.0/24
    authby=secret
    auto=start
    ike=aes256-sha256-modp1024!
    esp=aes256-sha256!
    dpdaction=restart
    dpddelay=30s
    dpdtimeout=120s

**_rightsubnet_** = where your workloads live = what you want to reach.

_GatewaySubnet_ is infrastructure — it’s where Azure’s VPN Gateway itself runs. You never deploy VMs there, and you never put it in _rightsubnet_. It’s not a destination, it’s a transit point.

So the mental model:

rightsubnet = subnets behind the remote gateway
= where the actual VMs/services are
= NOT the gateway’s own subnet

Same logic applies symmetrically to _leftsubnet_ on your side — it’s the subnets behind your StrongSwan (your VM network, your LAN), not StrongSwan’s own IP.

The gateway subnet on both sides is implied — both ends know the gateways exist because they’re talking to each other. What they need to tell each other is “what’s behind me that you can reach.”
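To make the leftsubnet/rightsubnet idea concrete, here is a minimal Python sketch of the matching rule: a packet is a candidate for the tunnel only when its source falls in a leftsubnet range and its destination falls in a rightsubnet range. It uses the ranges from the example config; the individual host addresses and the function name uses_tunnel are made up for illustration. This is a model of the selection logic only, not how the kernel's XFRM layer is implemented.

```python
import ipaddress

# Ranges taken from the example ipsec.conf above.
LEFT_SUBNETS = [ipaddress.ip_network(n) for n in ("192.168.122.0/24", "192.168.1.0/24")]
RIGHT_SUBNETS = [ipaddress.ip_network("10.66.5.0/24")]


def uses_tunnel(src: str, dst: str) -> bool:
    """True if a packet from src to dst matches the tunnel's traffic selectors."""
    source = ipaddress.ip_address(src)
    destination = ipaddress.ip_address(dst)
    src_matches = any(source in net for net in LEFT_SUBNETS)
    dst_matches = any(destination in net for net in RIGHT_SUBNETS)
    return src_matches and dst_matches


# Hypothetical hosts: a KVM guest on-premises and a VM in WorkloadSubnet.
print(uses_tunnel("192.168.122.50", "10.66.5.4"))  # workload VM
print(uses_tunnel("192.168.122.50", "10.66.0.4"))  # GatewaySubnet address
print(uses_tunnel("192.168.122.50", "8.8.8.8"))    # ordinary internet traffic
```

Only the first call matches. The GatewaySubnet address is not in rightsubnet, and internet traffic takes the normal route.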

7. ipsec.secrets

The ipsec.secrets file is read by charon, StrongSwan’s IKEv2 keying daemon, at startup and whenever you run ipsec rereadsecrets. It holds the Pre-Shared Key used to authenticate both peers during IKE Phase 1. The format is:

<local-id> <remote-id> : PSK "shared-secret"

The two IPs identify the tunnel endpoints — they must match the leftid and rightid values in ipsec.conf exactly, because charon looks up the secret by matching the peer identities presented during IKE negotiation. The PSK itself must match what was configured in the Azure Connection resource, character for character.

This file does not change structure over the lifetime of the tunnel. The only reason to update it is if you rotate the PSK: in Azure you set a new shared key on the Connection, then update the value here and run sudo ipsec rereadsecrets to pick it up without restarting the daemon or dropping the tunnel.

One important operational note: this file contains a plaintext secret and should be owned by root with permissions 600. StrongSwan will warn if it is world-readable.

sudo nano /etc/ipsec.secrets

192.168.1.106 20.219.67.227 : PSK "your_shared_key"
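The permission requirement is easy to verify in a script. The sketch below is a small helper of my own (not part of StrongSwan) that checks whether a file grants any access to group or others, which is what mode 600 rules out. It checks the mode only; ownership by root would be a separate check on st_uid.

```python
import os
import stat


def secrets_file_is_private(path: str) -> bool:
    """True if the file grants no permissions to group or others (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a throwaway file rather than the real /etc/ipsec.secrets.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        demo_path = handle.name
    os.chmod(demo_path, 0o644)
    print("644 private?", secrets_file_is_private(demo_path))
    os.chmod(demo_path, 0o600)
    print("600 private?", secrets_file_is_private(demo_path))
    os.remove(demo_path)
```

On the StrongSwan host you would call secrets_file_is_private("/etc/ipsec.secrets") instead of using the temporary file.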

We have now set up all the infrastructure needed to bring up the tunnel. Before doing that, let us look at how tunnels are established.

Under the Hood: The Mechanics of Tunnel Initiation

The S2S tunnel is initiated from on-premises:
When StrongSwan sends the initial IKE packet outbound (src: 192.168.1.106:500 → dst: 20.219.67.227:500), the Wi-Fi router performs source NAT — replacing 192.168.1.106 with the public IP 61.69.136.49 — and records this translation in its conntrack table. When Azure replies, the router matches the inbound packet against that entry and reverses the translation, forwarding the packet to 192.168.1.106.

The S2S tunnel is initiated from Azure:
In this scenario, the Azure VPN Gateway is the initiator of the tunnel. The tunnel will fail to establish unless port forwarding is configured on the Wi-Fi router. If you are curious, here are the steps to have Azure initiate the tunnel.
1. Set auto=add in ipsec.conf. This tells StrongSwan to act only as a responder: it won’t initiate the tunnel on startup and won’t re-initiate if it drops.

2. Set Connection Mode to _InitiatorOnly_ in Azure Local Network Gateway » Connection » [Your Connection] » Configuration blade properties.

3. Enable port forwarding on your Wi-Fi router with these values:
_InternalIP: 192.168.1.106, InternalPort: 500, Protocol: UDP, ExternalPort: 500_
_InternalIP: 192.168.1.106, InternalPort: 4500, Protocol: UDP, ExternalPort: 4500_
Port 500 is used for the initial IKE handshake. Port 4500 is used for NAT Traversal (NAT-T): once both sides detect a NAT device on the path, all subsequent IKE and ESP traffic moves to UDP:4500.

These rules tell the Wi-Fi router to forward any inbound UDP:500 and UDP:4500 traffic arriving on the WAN interface to _192.168.1.106_, regardless of the source. Without these rules, the router has no NAT mapping for unsolicited inbound IKE packets and drops them.
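The difference between the two initiation directions comes down to what the router knows about a flow when a packet arrives. Here is a toy Python model of that behaviour. The class HomeRouter and its methods are invented for this illustration, and the model is heavily simplified: a real router tracks the full 5-tuple, translated ports, and timeouts.

```python
class HomeRouter:
    """Toy NAT router: outbound flows create mappings, inbound packets need one."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.conntrack = {}      # (remote_ip, udp_port) -> internal_ip
        self.port_forwards = {}  # udp_port -> internal_ip

    def outbound(self, src_ip, dst_ip, port):
        """Source-NAT an outbound UDP packet and remember the flow."""
        self.conntrack[(dst_ip, port)] = src_ip
        return (self.public_ip, dst_ip, port)

    def inbound(self, remote_ip, port):
        """Return the internal IP the packet is forwarded to, or None if dropped."""
        if (remote_ip, port) in self.conntrack:
            return self.conntrack[(remote_ip, port)]
        # No existing flow: only a static port-forward rule can deliver it.
        return self.port_forwards.get(port)


router = HomeRouter("61.69.136.49")

# Azure initiates first: no mapping and no port-forward, so the packet is dropped.
print(router.inbound("20.219.67.227", 500))

# StrongSwan initiates: the outbound packet creates the mapping for the reply.
print(router.outbound("192.168.1.106", "20.219.67.227", 500))
print(router.inbound("20.219.67.227", 500))
```

Adding router.port_forwards[500] = "192.168.1.106" (and the same for 4500) is the model's equivalent of step 3: unsolicited inbound IKE packets then reach StrongSwan even without a prior outbound flow.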

IMPORTANT — Who initiates the tunnel has no bearing on data packet routing between the sites as long as the tunnel is UP. Once the IPsec SA is established, the tunnel is a symmetric pipe — packets flow freely in both directions regardless of which side initiated IKE. Once the tunnel is up, the success and failure points are identical in both cases.

Bring the tunnel up

With both sides configured, bring the tunnel up from the StrongSwan host using the steps below.

# Check current state
sysctl net.ipv4.ip_forward

# Enable if 0
sudo sysctl -w net.ipv4.ip_forward=1

ip_forward must be enabled for the StrongSwan host to forward packets between interfaces — we will cover exactly why in Part 6.

# Start StrongSwan if it was just installed or is not running.
# If it is already running, run "sudo ipsec restart" instead and skip the next step.
sudo ipsec start

# Bring the connection up
sudo ipsec up azure-s2s-manual

# Verify
sudo ipsec status

The ipsec status output should show the tunnel as ESTABLISHED:

Security Associations (1 up, 0 connecting):
azure-s2s-manual[1]: ESTABLISHED 13 minutes ago, 192.168.1.106[192.168.1.106]…20.219.67.227[20.219.67.227]
azure-s2s-manual{1}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c0a6aaf5_i 90cbbb64_o
azure-s2s-manual{1}: 192.168.1.0/24 192.168.122.0/24 === 10.66.5.0/24
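If you want to check this from a script rather than by eye, the output is simple enough to parse. The sketch below is my own helper, written against the sample output shown above; the output format of other StrongSwan versions may differ, so treat the regular expression as an assumption. It extracts the names of connections whose IKE SA line says ESTABLISHED.

```python
import re

# Sample text copied from the ipsec status output above.
SAMPLE_STATUS = """Security Associations (1 up, 0 connecting):
azure-s2s-manual[1]: ESTABLISHED 13 minutes ago, 192.168.1.106[192.168.1.106]…20.219.67.227[20.219.67.227]
azure-s2s-manual{1}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c0a6aaf5_i 90cbbb64_o
azure-s2s-manual{1}: 192.168.1.0/24 192.168.122.0/24 === 10.66.5.0/24
"""


def established_connections(status_output: str) -> set:
    """Return names of connections whose IKE SA line reports ESTABLISHED."""
    # IKE SA lines look like "name[1]: ESTABLISHED ..."; CHILD SA lines use {1}.
    pattern = re.compile(r"^\s*([\w.-]+)\[\d+\]:\s+ESTABLISHED\b", re.MULTILINE)
    return set(pattern.findall(status_output))


print(established_connections(SAMPLE_STATUS))
```

In practice the input would come from running sudo ipsec status (for example via subprocess.run with capture_output=True) instead of the embedded sample.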

Summary

Now that the tunnel is established, it’s worth mapping out the traffic flows it enables — and one it doesn’t, yet. There are 3 scenarios of bi-directional traffic flow in our setup between —

Azure Virtual Machines and On-premises Virtual Machines
Azure Virtual Machines and On-premises VPN Gateway
On-premises VPN Gateway and the On-premises Virtual Machines

All of the above will work without any additional configuration, except for packets from Azure virtual machines destined for the on-premises KVM virtual machines (which sit behind StrongSwan on the libvirt network).

In the next part of the series we will discuss why they work and why one of them doesn’t.

API Specification and Policy Updates in Azure APIM Are Zero Downtime

Fri, 05 Jun 2026 00:00:00 +0000

Does APIM support zero downtime deployment? To answer this question, several factors need to be established first: What is the SKU? Have you opted for availability zones? In fact, the question itself needs to be qualified further: what do you mean by zero downtime deployment?

In the case of APIM, there are infrastructure changes, and then there are gateway configuration changes such as API specifications and policies. So the answer depends on the SKU, the availability zone configuration, and what kind of change is being made.

From the official documentation:

“When you change availability zone configuration, the changes can take 15 to 45 minutes or more to apply. The API Management gateway can continue to handle API requests during this time.”

Gateway configuration, such as APIs and policy definitions, regularly synchronizes between the availability zones that you select for the instance. Propagation of updates between the availability zones normally takes less than 10 seconds.

Active requests: When an availability zone is unavailable, any requests in progress that are connected to an API Management unit in the faulty availability zone are terminated and need to be retried.

Automatic: You can expect instances that use automatic availability zone support to have no downtime during an availability zone outage. Units in the unaffected zone or zones continue to work.

“You can also expect instances that use automatic availability zone support, but have a single unit, to have no downtime.” In this case, API Management distributes the unit’s underlying compute resources to two zones. The resource in the unaffected zone continues to work.

Zone-redundant: You can expect zone-redundant instances to have no downtime during an availability zone outage.

My personal view based on this: API specification and policy updates won’t cause any non-recoverable failures for consumers, provided a retry strategy is in place.
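What a retry strategy looks like on the consumer side is worth a quick illustration. The following is a minimal sketch of retry with exponential backoff and jitter. Everything in it is an assumption for illustration: TransientError is a hypothetical marker for a retryable failure (a dropped connection or an HTTP 5xx), and the attempt count and delays are arbitrary. Real clients usually get this from their HTTP library or a resilience package.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (dropped connection, HTTP 5xx)."""


def call_with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Wait 0.5s, 1s, 2s, ... plus a little jitter to avoid retry storms.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Example: a call that fails twice (say, during a config propagation) and then succeeds.
state = {"calls": 0}


def flaky_api_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientError("request terminated, retry")
    return "200 OK"


print(call_with_retry(flaky_api_call, sleep=lambda seconds: None))
```

The sleep parameter exists only so the example can run without actually waiting.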

Is it zero downtime? Zero downtime need not mean every request succeeds on the first attempt. If the system remains available and failures are recoverable, it meets the zero-downtime requirement. So — Yes.

Confirmation from Microsoft Question and Answer Forum

To validate my understanding, I reached out to the MS Q&A forum and got a response consistent with the above understanding.

Here is the link to the question in the forum that has the official response.

Bottom line — API specification and Policy updates are zero downtime.

Choosing the Right TokenCredential and How AZURE CLIENT ID Influences Identity Selection — A…

Wed, 20 May 2026 00:00:00 +0000

Introduction

I have been using the DefaultAzureCredential class for a long time without understanding how it works. So, I jotted down my notes and learnings in this write-up for future me — and maybe you will find it useful too.

TokenCredential

TokenCredential is the abstract base class representing a source of authentication tokens for Azure services. Many classes derive from TokenCredential but the most interesting ones are DefaultAzureCredential and ChainedTokenCredential.

I’m using package version: Azure.Identity v1.20.0

DefaultAzureCredential

This class is a pre-built chain covering the most common authentication methods. When using DefaultAzureCredential to acquire a token, the class attempts to acquire a token via each of the below credentials, in the following order, stopping when one provides a token:

EnvironmentCredential
WorkloadIdentityCredential
ManagedIdentityCredential
VisualStudioCredential
VisualStudioCodeCredential (enabled by default for SSO with VS Code on supported platforms when Azure.Identity.Broker is installed)
AzureCliCredential
AzurePowerShellCredential
AzureDeveloperCliCredential
InteractiveBrowserCredential (not included by default; can use brokered authentication if Azure.Identity.Broker is installed)

source: DefaultAzureCredential Class (Azure.Identity) — Azure for .NET Developers | Microsoft Learn

DefaultAzureCredential with Exclusion

DefaultAzureCredential also supports options that allow you to exclude credentials from evaluation. This is useful if you don’t want certain credentials to be tried. For example, when running my function locally, I don’t want to use the Visual Studio or Visual Studio Code credentials because I prefer AzureCliCredential.

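using Azure.Identity;

var credential = new DefaultAzureCredential(
    new DefaultAzureCredentialOptions
    {
        // Skip the IDE credentials so the chain reaches AzureCliCredential sooner
        ExcludeVisualStudioCodeCredential = true,
        ExcludeVisualStudioCredential = true
    });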

ChainedTokenCredential

In some cases, you will know exactly which credentials you want to use. ChainedTokenCredential is very useful in such cases. It evaluates only the credentials you explicitly specify, in the order provided. I use this locally. For example, below I choose to use only CLI and VS credentials when my function is running locally.

var credential = new ChainedTokenCredential(
    new AzureCliCredential(),
    new VisualStudioCredential()
);

Why Managed Identity fails locally but works in Azure

DefaultAzureCredential attempts ManagedIdentityCredential (which is unavailable locally because the IMDS request times out) and falls through to developer credentials such as Visual Studio and the Azure CLI (refer to the list above). The first credential in the chain that can successfully acquire a token is used.

Note: DefaultAzureCredential evaluates EnvironmentCredential, then WorkloadIdentityCredential, followed by ManagedIdentityCredential.

There is no native way to emulate or impersonate a managed identity locally. IMDS (169.254.169.254) is a hypervisor-level endpoint that only exists on Azure compute. It is physically not present on your laptop. The common alternative would be to use a service principal with similar privileges as the UAMI to test your function.

In Azure: The chain gets to ManagedIdentityCredential, IMDS responds, token acquired. Everything below it never runs.

Locally: IMDS doesn’t exist, so ManagedIdentityCredential times out and falls through.

When DefaultAzureCredential is used, the evaluation would look like this (assuming none of the credentials are able to provide a token):

//When running locally there is no IMDS to supply managed identity token
//Assuming VS and other credentials don’t have access to the resource.
//This is how DefaultAzureCredential evaluates the chain
EnvironmentCredential → skipped (env vars not set)
WorkloadIdentityCredential → skipped (not configured)
ManagedIdentityCredential → unavailable/failure (no IMDS endpoint locally)
VisualStudioCredential → failed
VisualStudioCodeCredential → failed
AzureCliCredential → failed
AzurePowerShellCredential → failed
AzureDeveloperCliCredential → failed
InteractiveBrowserCredential → Not included by default (must be explicitly enabled)
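The "try each credential in order, stop at the first token" behaviour is easy to model. The sketch below is plain Python, not the Azure SDK: CredentialUnavailable, unavailable, and first_token are names I made up, and the chain is abbreviated. It shows the happier local case where the developer has signed in with the Azure CLI, so the chain stops there and never reaches the credentials after it.

```python
class CredentialUnavailable(Exception):
    """Raised by a credential that cannot supply a token in this environment."""


def unavailable(reason):
    """Build a token getter that always fails, standing in for a missing credential."""
    def _get_token():
        raise CredentialUnavailable(reason)
    return _get_token


def first_token(chain):
    """Try each (name, get_token) pair in order and return the first success."""
    errors = []
    for name, get_token in chain:
        try:
            return name, get_token()
        except CredentialUnavailable as exc:
            errors.append(f"{name}: {exc}")
    # Nothing in the chain produced a token.
    raise CredentialUnavailable("; ".join(errors))


# Local machine: no IMDS, but the developer has run `az login`.
local_chain = [
    ("EnvironmentCredential", unavailable("env vars not set")),
    ("WorkloadIdentityCredential", unavailable("not configured")),
    ("ManagedIdentityCredential", unavailable("no IMDS endpoint locally")),
    ("VisualStudioCredential", unavailable("not signed in")),
    ("AzureCliCredential", lambda: "token-from-az-cli"),
    ("AzurePowerShellCredential", lambda: "never reached"),
]

print(first_token(local_chain))
```

If every entry raised CredentialUnavailable, first_token would raise with the collected reasons, which corresponds to the all-failed listing above.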

How to configure for local debugging

One approach is to use a factory that returns different credential implementations depending on the execution environment.

For example, when the environment is local, a factory can return a DefaultAzureCredential where you can exclude Visual Studio and Visual Studio Code credentials if you favour AzureCliCredential. Or, better still, if you want to use only Azure CLI or VS credentials, it can return a ChainedTokenCredential with just those two credentials, as shown below

new ChainedTokenCredential(
new AzureCliCredential(),
new VisualStudioCredential()
)

When the environment is Azure, it can just return a DefaultAzureCredential instance or a ChainedTokenCredential as discussed earlier if you are sure about the credential you want to use. If you want to use a specific credential, it can be used directly without DefaultAzureCredential or ChainedTokenCredential. For example, here I’m using a specific credential class —

new ManagedIdentityCredential(
    ManagedIdentityId.FromUserAssignedClientId("<your-uami-ClientId>"))

A sample flow will look as below when you use a ChainedTokenCredential as shown previously —

AzureCliCredential → acquire token - SUCCESS
VisualStudioCredential → skipped

Notice that only the two credentials mentioned in the ChainedTokenCredential chain are evaluated.

AZURE_CLIENT_ID influence in identity selection

Many Azure resources can have both System Assigned Managed Identity (SAMI) and User Assigned Managed Identity (UAMI). It is crucial to understand how the AZURE_CLIENT_ID environment variable influences how Azure SDK authentication selects a managed identity. This is not always obvious, and I could not find it clearly documented anywhere

AZURE_CLIENT_ID set?
│
├── YES
│   │
│   └── Which credential in code?
│       │
│       ├── DefaultAzureCredential()
│       │   └── ✅ UAMI (via AZURE_CLIENT_ID)
│       │
│       ├── ManagedIdentityCredential(id)
│       │   └── ✅ UAMI (via explicit id, ignores AZURE_CLIENT_ID)
│       │
│       └── ManagedIdentityCredential()
│           └── ⚠️ SAMI (ignores AZURE_CLIENT_ID)
│
└── NO
    │
    └── Which credential in code?
        │
        ├── DefaultAzureCredential()
        │   └── ⚠️ SAMI
        │
        ├── ManagedIdentityCredential(id)
        │   └── ✅ UAMI (via explicit id)
        │
        └── ManagedIdentityCredential()
            └── ⚠️ SAMI (no id provided)

Key rule: ManagedIdentityCredential() does not use AZURE_CLIENT_ID to select a user-assigned managed identity. Only DefaultAzureCredential does.

Note: The flowchart assumes a System Assigned Managed Identity is present. In cases where SAMI is absent and no UAMI is explicitly provided, token acquisition will fail.
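The same decision logic can be written as a small function, which makes the six branches easy to check. This is a plain-Python restatement of the flowchart and the note above, not SDK code; the function name selected_identity and its parameters are my own.

```python
def selected_identity(credential, azure_client_id_set, explicit_id=False, sami_present=True):
    """Return 'UAMI', 'SAMI' or 'FAIL' for a credential type, per the flowchart.

    credential: 'DefaultAzureCredential' or 'ManagedIdentityCredential'
    azure_client_id_set: whether the AZURE_CLIENT_ID environment variable is set
    explicit_id: whether a client id was passed to ManagedIdentityCredential in code
    sami_present: whether the resource has a system-assigned identity
    """
    if credential == "ManagedIdentityCredential":
        if explicit_id:
            return "UAMI"  # the id given in code wins; AZURE_CLIENT_ID is ignored
        return "SAMI" if sami_present else "FAIL"
    if credential == "DefaultAzureCredential":
        if azure_client_id_set:
            return "UAMI"  # picked up from AZURE_CLIENT_ID
        return "SAMI" if sami_present else "FAIL"
    raise ValueError(f"unknown credential type: {credential}")


# The surprising branch: AZURE_CLIENT_ID is set, but the parameterless
# ManagedIdentityCredential still resolves to the system-assigned identity.
print(selected_identity("ManagedIdentityCredential", azure_client_id_set=True))
```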

Authentication between a Function App and its AzureWebJobsStorage is an independent flow not covered by the flowchart above. See [Using Managed Identity for Function App Authentication with its Storage Account] for a detailed walkthrough.

Summary

DefaultAzureCredential is environment-aware by design — the same code uses managed identity in Azure and falls through to developer credentials locally. This means local failures don’t always predict Azure failures, and the identity that succeeds locally may be in a different tenant than your Azure resources. For local testing against tenant-specific resources, ensure az login --tenant is used explicitly, not just any az login. To test managed identity behaviour you must deploy, or substitute a service principal with matching roles via EnvironmentCredential.

Reference — Authentication best practices with the Azure Identity library for .NET — .NET | Microsoft Learn

Using Managed Identity for Function App Authentication with its Storage account

Tue, 19 May 2026 00:00:00 +0000

Recently, while setting up a Function App to use a User Assigned Managed Identity (UAMI) to authenticate to its AzureWebJobsStorage, I encountered a SyncTrigger failure.

I checked whether the UAMI had necessary RBAC roles to work on AzureWebJobsStorage — it had. So, I wasn’t sure what the issue was.

Analyzing further, I realized I had skipped a few mandatory settings needed to enable UAMI-based authentication to AzureWebJobsStorage (setting the environment variable AzureWebJobsStorage__accountName alone does not suffice).

Steps to enable UAMI access to AzureWebJobsStorage

Enabling UAMI access to AzureWebJobsStorage involves changes in Terraform (when the Function App is created), the app settings (environment variables), and finally role-based access.

Terraform

If you want to use UAMI to authenticate with AzureWebJobsStorage, the Terraform block **functionAppConfig.deployment.storage.authentication** should look like this:

Note: I am using Flex Consumption tier

authentication = {
  type = "userassignedidentity"
  userAssignedIdentityResourceId = "<uami-resource-id>"
}

This tells the platform to use UAMI for the deployment package blob container — the part that isn’t controlled by app settings.

App settings

Once the Function App is deployed with userassignedidentity as the authentication type (Terraform), ensure the variables below are set in the Function App’s environment variables:

AzureWebJobsStorage__accountName = <storage-account-name>
AzureWebJobsStorage__credential = managedidentity
AzureWebJobsStorage__clientId = <uami-client-id>

All three settings are mandatory.
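Since a missing setting only shows up later as a runtime failure, a pre-deployment check can save some time. Below is a minimal sketch of one: a helper of my own (not part of any Azure tooling) that takes the app settings as a dictionary and reports which of the three are missing or empty. The storage account name and client id in the example are placeholders.

```python
REQUIRED_UAMI_SETTINGS = (
    "AzureWebJobsStorage__accountName",
    "AzureWebJobsStorage__credential",
    "AzureWebJobsStorage__clientId",
)


def missing_uami_settings(app_settings: dict) -> list:
    """Return the required AzureWebJobsStorage__* settings that are absent or empty."""
    return [key for key in REQUIRED_UAMI_SETTINGS if not app_settings.get(key)]


# Placeholder values; only accountName has been set here.
settings = {"AzureWebJobsStorage__accountName": "examplestorageaccount"}
print(missing_uami_settings(settings))
```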

RBAC

This is the final bit. With the Function App deployed and the environment variables set, the UAMI now needs permission to access the storage account.

Assign the Storage Blob Data Owner role to the UAMI on the storage account.

With these three changes, your Function App will authenticate with its AzureWebJobsStorage using UAMI.

Caveat: Although this works, the issue with this approach is all services that are assigned this UAMI will gain access to the function’s storage account. This is not ideal if many services share the same UAMI. The better option will be to use System Assigned Managed Identity (SAMI) for authentication between Function App and its storage account. For the rest of the outbound calls that the functions might make, use UAMI.

Using System Assigned Managed Identity

To use SAMI, just set AzureWebJobsStorage__accountName. SAMI is the default, so no additional settings are needed. Next, give the SAMI the Storage Blob Data Owner role on the storage account. If you are using Terraform to deploy, the authentication block of the Function App will look like this:

authentication = {
  type = "systemassignedidentity"
}

SAMI is my preferred method for authentication with the AzureWebJobsStorage for the reasons already discussed in the caveat section.

Summary

Configuring a Function App to authenticate with its AzureWebJobsStorage using managed identity requires changes at three levels — Terraform, app settings, and RBAC — and all three must be consistent with each other. For UAMI, all three AzureWebJobsStorage__* settings are mandatory; omitting any one of them will cause the runtime to fail. However, personally I feel UAMI for AzureWebJobsStorage is rarely the right choice — since UAMI is a shared identity, every service assigned to it inherits access to the storage account. SAMI, which requires only AzureWebJobsStorage__accountName and a single role assignment, is the simpler and safer default for this use case.

Reference: Use User managed identity to replace connection string in "AzureWebJobsStorage" for function apps | Microsoft Community Hub

HandsOn — Building Hybrid Cloud Environment — Part 4— Identity — Domain-Joining a Linux VM and…

Sat, 02 May 2026 00:00:00 +0000

In the previous parts, we created a primary and secondary domain controller and tested the domain join from Windows client VM. In this part, we will domain-join a Linux VM to the domain controllers we created. The main purpose is to introduce a non-Windows system into the domain to test Kerberos authentication against Active Directory. We will —

Provision a new Linux VM
Assign the DC IP
Install Linux Kerberos client tool
Join the domain
Validation

The fundamental domain join mechanics are the same for Windows and Linux. The underlying authentication protocol (Kerberos) is identical for both — the difference is integration depth. Windows has native AD support built in, whereas Linux requires explicit configuration via tools like realmd, SSSD, and the Kerberos client utilities. So, let’s get started with a new VM.

Provision a new Ubuntu VM

We will create a VM based on Ubuntu 22.04 LTS for our virtual network. Download the Ubuntu ISO and create a VM using virt-manager, following the regular VM creation process. Once the VM is up and running, let’s start with some connectivity checks.

Connectivity Checks:

Ping DCs:

ping 192.168.122.10
ping 192.168.122.11

This will work because the Linux VM is created in the same network range as the Windows client and the DCs. If this is not the case, ensure you map the VM to the relevant network using virt-manager (this can happen if you have multiple virtual networks running on your system).

Add the DC IP to resolved.conf

This step is like what we do for a Windows client, just done a bit differently. The Linux VM should use the Domain Controller’s IP as its DNS server. A fresh Linux VM will have its DNS pointing to the local stub resolver at 127.0.0.53.

If you remember, we used the GUI to change the preferred DNS for the VM in Windows. In Linux, the DNS server details reside in /etc/systemd/resolved.conf

On modern Ubuntu systems, the file /etc/resolv.conf is not meant to be edited directly because it is automatically generated and managed by the systemd-resolved service. Any manual changes you make will be overwritten. Instead, you should configure DNS settings in the source of truth, typically /etc/systemd/resolved.conf (or via Netplan/NetworkManager depending on your setup), and then restart the service.

Note — Even if /etc/resolv.conf appears stable after manual edits, it is still managed by the system in most modern Ubuntu setups and may be overwritten on reboot or network changes. Always configure DNS through systemd-resolved or Netplan for reliability

Edit resolved.conf to set the DCs’ IPs as the DNS servers for the Linux VM:

sudo nano /etc/systemd/resolved.conf

[Resolve]
# Some examples of DNS servers which may be used for DNS= and FallbackDNS=:
# Cloudflare: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4>
# Google: 8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.go>
# Quad9: 9.9.9.9#dns.quad9.net 149.112.112.112#dns.quad9.net 2620:fe::fe#d>
DNS=192.168.122.10 192.168.122.11
#FallbackDNS=
Domains=hybrid.local
#DNSSEC=no
#DNSOverTLS=no
#MulticastDNS=no
#LLMNR=no
#Cache=no-negative
#CacheFromLocalhost=no
#DNSStubListener=yes
#DNSStubListenerExtra=
#ReadEtcHosts=yes
#ResolveUnicastSingleLabel=no
#StaleRetentionSec=0

Restart the service and check that the value has persisted.

sudo systemctl restart systemd-resolved

Verify DNS resolution:

Now that the DNS has been updated, try nslookup hybrid.local. The expected output is:

Server: 127.0.0.53
Address: 127.0.0.53#53

Name: hybrid.local
Address: 192.168.122.10
Name: hybrid.local
Address: 192.168.122.11

The server shown as 127.0.0.53 is the systemd-resolved stub. This is expected: systemd-resolved intercepts all DNS queries locally and forwards them to the configured upstream DNS servers (your DCs). The returned addresses confirm that the DCs’ DNS is now answering for hybrid.local.

Install Kerberos Client Tools

From this step onwards, the domain join process differs from Windows. A Windows client ships with the necessary Kerberos components, but a Linux client has to be set up to communicate with Active Directory using Kerberos. To do that, install the Kerberos client tools, [krb5-user](https://web.mit.edu/kerberos/krb5-1.4/krb5-1.4/doc/krb5-user.html).

What does krb5-user really do? It installs client-side tools that enable communication with a server using Kerberos. In Kerberos, the password is never sent over the wire. Instead, it is converted into a cryptographic key, which the client uses to encrypt a timestamp during pre-authentication. The server validates the request by decrypting it with its own stored copy of the key and then checking the timestamp. That second check is why clock skew breaks Kerberos: if the timestamp is outside the allowed window, the DC rejects the request even when decryption succeeded.

Notes:

When the DC was promoted, the admin password you provided was hashed using DC’s preferred ‘method(s)’ (etype) and stored in ntds.dit (e.g. AES256 key stored against account)

On the Linux VM, the krb5.conf file (together with the Kerberos library defaults) defines the supported etypes

When the Linux VM wants to authenticate with the Windows AD, you initiate the Kerberos flow by running kinit

Client VM’s kinit sends → AS-REQ to DC saying, “I am administrator, The Client VM supports AES256, AES128, RC4”

DC responds saying “I need pre-auth, and for this account I use AES256”

The client then derives a key from the password (using the required encryption type, e.g., AES256) and uses this key to encrypt a timestamp on the client VM. This is sent as a second AS-REQ (with pre-auth): username + AES256-encrypted timestamp

The DC decrypts this AS-REQ with the stored AES256 key against this user, validates the timestamp is within the allowed clock skew window (default 5 minutes) → issues TGT

The default TTL of a Kerberos TGT is 10 hours
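
To make the "key check first, clock check second" idea concrete, here is a toy model of the pre-authentication step. It is not real Kerberos: the key derivation is only loosely modelled on the AES etypes (PBKDF2 with a realm + user salt), an HMAC tag stands in for the AES-encrypted timestamp because the Python standard library has no AES, and every function name here is my own. Treat it as a sketch of the logic, not of the wire format.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 5 * 60  # the default clock skew window mentioned above
ITERATIONS = 4096          # illustrative iteration count (assumption)


def derive_key(password, realm, user):
    """Turn a password into a 32-byte key; the password itself is never sent."""
    salt = (realm + user).encode()
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt, ITERATIONS, dklen=32)


def make_preauth(key, client_time):
    """Client side: bind the current timestamp to the key (HMAC stands in for AES)."""
    stamp = str(int(client_time)).encode()
    tag = hmac.new(key, stamp, hashlib.sha256).digest()
    return stamp, tag


def kdc_accepts(stored_key, stamp, tag, kdc_time):
    """KDC side: check the key first, then the clock skew window."""
    expected = hmac.new(stored_key, stamp, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False  # wrong password means a different key
    return abs(int(kdc_time) - int(stamp)) <= MAX_SKEW_SECONDS


if __name__ == "__main__":
    key = derive_key("example-password", "HYBRID.LOCAL", "administrator")
    stamp, tag = make_preauth(key, client_time=1_000_000)
    print(kdc_accepts(key, stamp, tag, kdc_time=1_000_060))  # clocks 1 minute apart
    print(kdc_accepts(key, stamp, tag, kdc_time=1_000_600))  # clocks 10 minutes apart
```

The second call fails even though the key is correct, which is the behaviour you hit when the VM clock drifts away from the DC.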

Let’s proceed with the setup.

sudo apt update
sudo apt install krb5-user -y

Prompt: Enter default realm → HYBRID.LOCAL (uppercase).

In the next prompt, provide the FQDNs of your DCs separated by spaces.

Finally, when asked for the administrative server, provide your primary DC’s FQDN.

Run dpkg -l | grep krb5-user. It should list krb5-user as installed.

Configure /etc/krb5.conf

We have installed the necessary client tools to enable Kerberos-based exchange from the Linux VM. Next, the Linux VM’s Kerberos client must point to the Key Distribution Centers in Active Directory. Open the file with sudo nano /etc/krb5.conf and update it as shown below:

[libdefaults]
default_realm = HYBRID.LOCAL
dns_lookup_realm = false
dns_lookup_kdc = true
[realms]
HYBRID.LOCAL = {
kdc = 192.168.122.10
kdc = 192.168.122.11
admin_server = 192.168.122.10
}
[domain_realm]
.hybrid.local = HYBRID.LOCAL
hybrid.local = HYBRID.LOCAL

Confirm your changes have persisted with cat /etc/krb5.conf.

Next, we will validate that AD issues Kerberos tickets to us by requesting a Ticket Granting Ticket from the AD KDC. Run kinit Administrator@HYBRID.LOCAL and enter the Administrator password.

Next run klist

You should be seeing a ticket as below:

Default principal: Administrator@HYBRID.LOCAL
Valid starting Expires Service principal
03/06/26 19:11:11 03/07/26 05:11:11 krbtgt/HYBRID.LOCAL@HYBRID.LOCAL
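
The two timestamps on the ticket line can be checked against the 10-hour TGT lifetime mentioned earlier. A small sketch, assuming klist printed the dates as MM/DD/YY (the format depends on your locale); ticket_lifetime_hours is my own helper name.

```python
from datetime import datetime

# The ticket line from the klist output above.
KLIST_LINE = "03/06/26 19:11:11 03/07/26 05:11:11 krbtgt/HYBRID.LOCAL@HYBRID.LOCAL"


def ticket_lifetime_hours(line, fmt="%m/%d/%y %H:%M:%S"):
    """Return the hours between 'Valid starting' and 'Expires' on a klist line."""
    parts = line.split()
    start = datetime.strptime(" ".join(parts[0:2]), fmt)
    end = datetime.strptime(" ".join(parts[2:4]), fmt)
    return (end - start).total_seconds() / 3600


if __name__ == "__main__":
    print(ticket_lifetime_hours(KLIST_LINE))  # prints 10.0
```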

If you are curious about the attributes of the ticket, run klist -f to show flags such as forwardable and renewable, or klist -e to show encryption types.

Join Linux to Domain

We have the foundation required for a domain join. Now, we will install the required packages for the domain join

sudo apt install realmd sssd sssd-tools adcli samba-common-bin oddjob oddjob-mkhomedir -y

realmd — discovers which domains or realms it can use or configure. It can discover and identify Active Directory domains by looking up the appropriate DNS SRV records.

sssd — System Security Services Daemon. After the join, this is what runs continuously to handle authentication requests — it talks to the DC for login, group membership, sudo rules etc. The long-running engine.

sssd-tools — CLI utilities for sssd (sssctl, sss_override etc.) — useful for cache flushing and diagnostics.

adcli — Active Directory CLI. realmd uses this under the hood to perform the low-level AD join operations (creating the computer object in AD, setting up the machine account).

samba-common-bin — provides tools like net and wbinfo that realmd/sssd lean on for certain AD operations.

oddjob — a D-Bus service that runs privileged helper tasks on behalf of other services. sssd uses it to do things it can’t do as its own user.

oddjob-mkhomedir — the specific oddjob helper that automatically creates a home directory the first time a domain user logs into the Linux machine. Without this, a domain user authenticates successfully but lands with no home directory.

realmd + adcli → join-time (one-off operation)
sssd + sssd-tools → runtime (ongoing authentication)
oddjob + mkhomedir → login-time helper (home dir creation)
samba-common-bin → shared plumbing both layers use

Verify the configuration and connectivity to the domain controller

sudo realm discover hybrid.local

hybrid.local
type: kerberos
realm-name: HYBRID.LOCAL
domain-name: hybrid.local
configured: no
server-software: active-directory
client-software: sssd
required-package: sssd-tools
required-package: sssd
required-package: libnss-sss
required-package: libpam-sss
required-package: adcli
required-package: samba-common-bin

The above proves network connectivity to the DC, correct DNS resolution, and that AD is responding to discovery queries.

Join the domain as Administrator

sudo realm join --user=Administrator hybrid.local

Post domain join, verify the configuration and connectivity to the domain controller

sudo realm list

hybrid.local
type: kerberos
realm-name: HYBRID.LOCAL
domain-name: hybrid.local
configured: kerberos-member
server-software: active-directory
client-software: sssd
required-package: sssd-tools
required-package: sssd
required-package: libnss-sss
required-package: libpam-sss
required-package: adcli
required-package: samba-common-bin
login-formats: %U@hybrid.local
login-policy: allow-realm-logins

Run id testuser1@HYBRID.LOCAL. You will see that testuser1 is looked up from the Domain Controller by the Linux client.
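
The same lookup can be done from Python, which is handy if you later script health checks for the VM. Python’s pwd module asks the system’s name service (NSS) for the user, and after the join NSS consults sssd for domain accounts. This is a minimal sketch; lookup_user is my own helper name.

```python
import pwd


def lookup_user(name):
    """Resolve a user through NSS; returns None if no source knows the name."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    return {
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }


if __name__ == "__main__":
    # On the joined VM, try lookup_user("testuser1@hybrid.local") as well.
    print(lookup_user("root"))
```

If the domain user comes back as None while root resolves fine, the problem sits between NSS and sssd rather than in Kerberos.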

Enable automatic home directory creation:

sudo pam-auth-update
# Enable “Create home directory on login”


Verification

As a final test, you should be able to log in successfully as one of the test users you created and used for the Windows client (testuser1@HYBRID.LOCAL). Notice that the home directory for testuser1 is created.

Also, log in to the DC and confirm that the new Linux VM appears under hybrid.local > Computers.

Summary

This concludes the part of the series where we established an on-premises identity foundation. So far in this series, we have established:

A functioning Active Directory forest (hybrid.local) with two domain controllers
Multi-master replication verified across both DCs — SYSVOL, NETLOGON, and directory objects
FSMO roles identified and accounted for
A Windows client and a Linux VM both domain-joined and authenticated via Kerberos
DNS working end-to-end: internal resolution via the DC, external resolution via the forwarder

Up next: an S2S VPN tunnel with Azure, which will complete the hybrid connectivity foundation.

HandsOn — Building Hybrid Cloud Environment — Part 3— Identity — Additional DC and Replication

Fri, 24 Apr 2026 00:00:00 +0000

Previously, we created a domain controller (DC), joined a test virtual machine to the newly created domain and verified the authentication of a test user from client VM. In this part, we will build redundancy into our environment by introducing a second domain controller.

Active Directory (AD) is designed for multi-master replication, meaning multiple domain controllers hold a copy of the directory database.

Adding a second DC provides:

High Availability — authentication continues if one DC fails
Load Distribution — clients can authenticate against different DCs
Replication Redundancy — AD database changes replicate automatically

In this part, we will:

Provision a secondary domain controller
Assign a static IP
Join the domain
Install AD DS and promote the server
Verify replication and health

Let’s get started and add the second domain controller.

Provision Secondary domain controller

For the second DC, create another Windows Server 2022 VM using the same process outlined in the first part of this series.

Ensure that the newly created VM is attached to the same libvirt virtual network so it can reach the primary DC

Run ipconfig /all to confirm that the network range and gateway IP are the same as the primary DC’s. If you are following along, the gateway should be 192.168.122.1.

Assign Static IP

Running ipconfig /all, you will notice a preferred IPv4 address for this VM. It is 192.168.122.145 in my case. This IP was handed out by DHCP (dnsmasq) when the VM was created and attached to the virtual network. It can change the next time you restart the virtual network, and if a DC’s address changes, the VMs that have joined the domain will not be able to reach it. To avoid this, we will assign a static IP to this VM.

1. Open Server Manager
2. Click Local Server
3. Before setting the static IP, rename the computer to HCE-DC02 and restart.
4. After the restart, go to Local Server and click the link next to Ethernet
5. In the window that opens, right-click your Ethernet adapter → Properties
6. Double-click Internet Protocol Version 4 (TCP/IPv4)
7. Select the Use the following IP address option
8. Enter the below values:

IP address: 192.168.122.11 - The static IP we have chosen for secondary DC
Subnet mask: 255.255.255.0
Default gateway: 192.168.122.1 - Bridge’s IP

9. Select Use the following DNS server addresses

Note: Before executing this step, do a small test. Run nslookup for hybrid.local and you will notice the DNS query is sent to the gateway _192.168.122.1_. dnsmasq, which runs on the gateway, cannot answer a query about hybrid.local and forwards it up the chain: to your Wi-Fi router, then to your ISP’s DNS. None of them have ever heard of _hybrid.local_ because it is not a public domain; it exists only inside the primary DC’s DNS. The query times out somewhere in that chain and returns nothing useful.

Now, the secondary DC needs to rely on primary DC for any name resolution for domains managed by primary DC. In our case, hybrid.local is visible only in the context of primary DC and for secondary DC to reach other virtual machines in hybrid.local domain, it must consult primary DC’s DNS. That’s why you must set the Preferred DNS server as 192.168.122.10

Set Preferred DNS server: 192.168.122.10

Once the above steps are completed, check if the static IP is updated by running ipconfig

Note around VM rename — Rename the server before promotion — renaming a DC after the fact is not recommended.

Now, we can verify domain controller discovery. Run nslookup in interactive mode

nslookup
set type=SRV
_ldap._tcp.dc._msdcs.hybrid.local

The SRV record should return the hostname of the primary DC (HCE-DC01).

Join the Domain

At this point, the VM is able to resolve the primary DC, and it has a static IP assigned. We just need one more step before promoting this VM as secondary DC. It must first be a domain member server.

Unlike the primary DC which created the domain during promotion, the secondary DC is joining a domain that already exists. It needs a computer object, and a secure channel established before the promotion wizard can authenticate against the existing domain and begin replication.

Run PowerShell as Administrator:

Add-Computer -DomainName hybrid.local -Credential HYBRID\Administrator -Restart

After reboot, log in as hybrid\Administrator

To verify domain membership, run whoami. The current domain\user should be displayed as hybrid\administrator.

Install Active Directory Domain Services Role

Now that the VM (it’s not a DC yet) has joined the domain, the next step is to make it a domain controller.

Open Server Manager > Click Manage (top right) > Click Add Roles and Features > Click Next until you reach Server Roles > Check Active Directory Domain Services > When prompted, click Add Features > Click Next until Install, then click Install

Promote domain controller

After installation completes, click the notification flag. Select: Promote this server to a domain controller and follow the wizard. The server will reboot after installation automatically.

Note on deployment configuration —

While installing, pay attention to these attributes:

When the wizard asks for the deployment operation, select Add a domain controller to an existing domain and provide the domain name as hybrid.local
Domain controller options:
a. Check Domain Name System (DNS) server and Global Catalog
b. Uncheck Read only domain controller (RODC)
c. Set a Directory Services Restore Mode (DSRM) password
Ignore the DNS delegation warning.
In Additional Options, set Replicate from: to HCE-DC01.hybrid.local

Verify Both Domain Controllers Exist

The secondary domain controller is set up now. Run the following validation to confirm the promotion succeeded.

On both DCs, open Active Directory Users and Computers and navigate to Domain Controllers. You should now see HCE-DC01 and HCE-DC02.

This confirms both DCs are part of the domain. We have successfully set up High availability for the domain controllers.

Verify Active Directory Replication

Active Directory replicates directory changes between the two controllers automatically. You can check that by running repadmin /replsummary

Example output:

Beginning data collection for replication summary, this may take a while:
…..

Source DSA largest delta fails/total %% error
HCE-DC01 14m:44s 0 / 5 0
HCE-DC02 02m:40s 0 / 5 0

Destination DSA largest delta fails/total %% error
HCE-DC01 02m:40s 0 / 5 0
HCE-DC02 14m:44s 0 / 5 0

Interpretation:

largest delta → how long since last replication
fails/total → replication failures

A healthy environment shows 0 failures. This proves multi-master replication is working.
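
If you save the summary to a file, the "0 failures" check is easy to automate. The sketch below is an assumption-heavy helper of my own (failing_dcs is not part of repadmin): it expects rows shaped like the example output above, with the DC name, the largest delta, and then fails / total.

```python
import re

# Rows copied from the example output above.
SAMPLE = """\
Source DSA largest delta fails/total %% error
HCE-DC01 14m:44s 0 / 5 0
HCE-DC02 02m:40s 0 / 5 0
"""

# DC name, largest delta, then "fails / total"; header lines do not match.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s*/\s*(\d+)")


def failing_dcs(summary):
    """Return {dc_name: (fails, total)} for every row with at least one failure."""
    failures = {}
    for line in summary.splitlines():
        match = ROW.match(line.strip())
        if match and int(match.group(3)) > 0:
            failures[match.group(1)] = (int(match.group(3)), int(match.group(4)))
    return failures


if __name__ == "__main__":
    print(failing_dcs(SAMPLE))  # an empty dict means no replication failures
```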

After promotion and replication stabilizes, update DNS settings so each DC points to itself as primary and the other DC as secondary.

Verify SYSVOL Replication

Group Policies live in the SYSVOL folder and must replicate between DCs. As a quick sanity check, confirm the share exists by running net share.

The SYSVOL and NETLOGON shares are only published by Windows when the DC considers itself healthy and SYSVOL replication is complete. Their presence confirms that DFS-R has done its job and this DC is ready to serve Group Policy to domain members.

Look for SYSVOL and NETLOGON in the output. If they are present, GPO replication is working.

Identify FSMO Role Holders

Even though AD supports multi-master writes, some operations must be handled by a single role owner to avoid conflicts. These are the FSMO roles (Flexible Single Master Operations).

Check which server holds them by running netdom query fsmo

Example output:

Schema master HCE-DC01
Domain naming master HCE-DC01
PDC HCE-DC01
RID pool manager HCE-DC01
Infrastructure master HCE-DC01

In small environments like this, all roles may remain on the first DC.

Note: multi-master covers most directory writes; FSMO roles are for the specific operations where a single authority is required to avoid conflicts.

Why Multiple Domain Controllers Matter

With two DCs:

Both hold a replicated copy of the Active Directory database
Clients discover them through DNS SRV records
Clients choose a DC based on site proximity and priority
If one DC is offline, clients automatically fail over

Authentication flow now becomes:

Client
↓
DNS SRV query
↓
List of Domain Controllers
↓
Client selects reachable DC
↓
Kerberos authentication

This is why clients never hardcode a Domain Controller IP. DNS provides the dynamic discovery layer.
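
To illustrate the "client selects a DC" step, here is a rough sketch of the generic SRV ordering rule: try the lowest priority value first and, within the same priority, pick at random in proportion to weight. Real Windows clients layer AD site awareness and reachability checks on top of this, so treat it as a simplification. order_srv_targets and the backup-dc.example record are hypothetical names of mine.

```python
import random


def order_srv_targets(records, rng=random):
    """records: list of (priority, weight, target). Returns targets in try-order."""
    ordered = []
    for priority in sorted({record[0] for record in records}):
        group = [record for record in records if record[0] == priority]
        while group:
            total = sum(record[1] for record in group)
            pick = group[-1]
            if total == 0:
                pick = rng.choice(group)  # all weights zero: plain random choice
            else:
                point = rng.uniform(0, total)
                running = 0
                for record in group:
                    running += record[1]
                    if point <= running:
                        pick = record
                        break
            group.remove(pick)
            ordered.append(pick[2])
    return ordered


if __name__ == "__main__":
    records = [
        (0, 100, "hce-dc01.hybrid.local"),
        (0, 100, "hce-dc02.hybrid.local"),
        (10, 100, "backup-dc.example"),  # hypothetical lower-preference record
    ]
    print(order_srv_targets(records))
```

With equal priority and weight on both DCs, either one can come first, which is how the load ends up spread across them.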

Summary

In this part, we introduced a second domain controller and validated replication, establishing high availability for Active Directory.

Current lab state:

HCE-DC01 → First domain controller
HCE-DC02 → Additional domain controller
Client VM → Domain joined

This completes the Active Directory redundancy layer making it resilient.

In the next part, we will integrate a Linux VM and validate Kerberos-based authentication, extending identity beyond Windows systems.

HandsOn — Building Hybrid Cloud Environment — Part 2— Identity — On-Premises Domain Controller

Sat, 18 Apr 2026 00:00:00 +0000

In the first part, we laid the foundation for the hybrid cloud environment. Now we have a virtual network with a VM running Windows Server 2022 Evaluation. In this part, we will focus on adding the identity plane to the hybrid cloud environment by introducing a domain controller and creating an Active Directory structure. We will create a client VM, join it to the domain, and make sure a domain user is able to log in.

We will be following the below sequence.

Promote the virtual machine hce-dc01, created in part 1, to primary domain controller and create the domain, forest, and OU
Create user accounts
Domain join a Windows client

Primary Domain Controller Configuration

The official Microsoft documentation defines a domain controller as —
“A domain controller is a server that is running a version of the Windows Server® operating system and has Active Directory® Domain Services installed.”

For a simple hybrid cloud environment, a domain controller is not mandatory. So, why do we need this? Some hybrid scenarios, AD Connect for example, depend heavily on the on-premises side having an identity plane. To introduce the identity plane in our virtual network, we need a server to manage the domain, forest, OU, users, policies, user authentication, and policy enforcement. This will be our domain controller, and the VM we created in part 1 will be used for this purpose.

Before starting this configuration, run ip addr and note down the IP ranges of the virtual bridge and the Wi-Fi network. The output lists all the interfaces on your box with their addresses. Notice that the virtual bridge, virbr0 (created when we set up the virtual network; refer to part 1), has the address 192.168.122.1/24 in my case, so the virtual network range is 192.168.122.0/24.

virbr0: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:d9:a6:2b brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever

Look for your Wi-Fi interface in the output. wlp58s0 is my Wi-Fi interface; its address is 192.168.1.106/24, so the Wi-Fi LAN range is 192.168.1.0/24.

wlp58s0: mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 04:ed:33:e3:9d:f1 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.106/24 brd 192.168.1.255 scope global dynamic noprefixroute wlp58s0
valid_lft 84447sec preferred_lft 84447sec
inet6 fe80::d291:8b3e:4e0:cc87/64 scope link noprefixroute
valid_lft forever preferred_lft forever

virbr0 gateway is 192.168.122.1
wifi router gateway is 192.168.1.1
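
The inet line shows an interface address with a prefix length, not the network itself. If you want to derive the network range from it, Python’s ipaddress module does the arithmetic. A minimal sketch; describe is my own helper name.

```python
import ipaddress


def describe(cidr):
    """Split an 'inet' value such as 192.168.1.106/24 into (address, network)."""
    interface = ipaddress.ip_interface(cidr)
    return str(interface.ip), str(interface.network)


if __name__ == "__main__":
    print(describe("192.168.122.1/24"))  # virbr0
    print(describe("192.168.1.106/24"))  # wlp58s0
```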

I will pick an IP from the virbr0 range and assign it to the newly created VM, which is going to be our primary domain controller.

Static IP for Domain Controller

The reason we need a static IP for the domain controller is to ensure that it stays reachable at the same address even if the virtual network restarts. dnsmasq, which we briefly touched on in part 1, is responsible for assigning IPs to the virtual machines. When the virtual network restarts, it acts as the DHCP server and hands out IPs to all the VMs connected to virbr0. If the domain controller gets a different IP after a restart, the VMs joined to the domain will no longer be able to find it. To avoid this, we will assign static IPs to the domain controllers.

Now that we know the IP range of virtual network, I will pick an IP, say 192.168.122.10 as my primary domain controller’s IP.
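
Before committing to an address, it is worth a quick sanity check that the candidate sits inside the bridge network and does not collide with the gateway, network, or broadcast address. The sketch below does only that; it does not know which addresses dnsmasq has already leased, and is_usable_static_ip is my own helper name.

```python
import ipaddress


def is_usable_static_ip(candidate, bridge_cidr, gateway):
    """True if candidate is inside the bridge network and not a reserved address."""
    network = ipaddress.ip_network(bridge_cidr, strict=False)
    address = ipaddress.ip_address(candidate)
    reserved = {
        network.network_address,
        network.broadcast_address,
        ipaddress.ip_address(gateway),
    }
    return address in network and address not in reserved


if __name__ == "__main__":
    print(is_usable_static_ip("192.168.122.10", "192.168.122.1/24", "192.168.122.1"))
    print(is_usable_static_ip("192.168.1.10", "192.168.122.1/24", "192.168.122.1"))
```

The first call checks the address chosen above; the second shows an address from the Wi-Fi range being rejected for the bridge network.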

Before setting the static IP, rename the computer to HCE-DC01 if you haven’t done it already and restart.

1. Open Server Manager
2. Click **Local Server** -> Click the link next to **Ethernet**
3. In the window that opens, right-click your Ethernet adapter → **Properties**
4. Double-click Internet Protocol Version 4 (TCP/IPv4)
5. Select the **Use the following IP address** option
6. Enter the below values:

IP address: 192.168.122.10 - The static IP we chose
Subnet mask: 255.255.255.0
Default gateway: 192.168.122.1 - Bridge’s IP

7. Select **Use the following DNS server addresses**

8. Set Preferred DNS server: 192.168.122.10

9. Click OK → Close all windows

Check if the static IP is updated by running ipconfig

When you set up the first domain controller, it must use itself as its DNS server. Once the secondary domain controller is up, they should ideally cross-reference each other.

Note on VM rename: Rename the server before promotion; renaming a domain controller after the fact is painful.

Install Active Directory Domain Services Role

For a server to perform the role of a domain controller, it needs certain capabilities. These include storage to persist the objects (forest, users, computers, groups, and policies), an authentication layer, and policy enforcement mechanisms. This is not an exhaustive list; I have listed only the capabilities relevant to our hybrid environment right now.

Active Directory Domain Services is a feature that you install on your Windows Server to make it a domain controller. Here is the official definition of AD DS — “A directory is a hierarchical structure that stores information about objects on a network. A directory service, such as Active Directory Domain Services (AD DS), provides methods for storing directory data and making this data available to network users and administrators. For example, AD DS stores information about user accounts, such as names, passwords, phone numbers, and so on. AD DS also provides a way for authorized users on the same network to access this information.”

Rough steps to install AD DS

Open Server Manager > Click Manage (top right) > Click Add Roles and Features > Click Next until you reach Server Roles > Check Active Directory Domain Services > When prompted, click Add Features > Click Next until Install, then click Install

I’m not going into the details of the installation as many resources document the process in detail.

With this, the virtual machine has the feature it needs to perform the role of a domain controller.

Promote This Server to a Domain Controller

Installing the AD DS role and promoting the server are two distinct steps, and it’s easy to miss why. Here is the distinction that matters. What makes a server a domain controller is not what’s installed — it’s whether a valid, initialised NTDS.dit exists, the NTDS service is running against it, and the network knows where to find it via SRV records. Promotion is the act of going from capable to instantiated.

Open Server Manager > You should see a yellow triangle notification at top right. Click it. > Click: Promote this server to a domain controller

Note on deployment configuration

While installing, pay attention to these attributes

When the wizard prompts for a forest, select Add a new forest. The root domain name can be given as hybrid.local
Domain controller options —
a. Forest functional level: leave default
b. Domain functional level: leave default
c. DNS Server should already be checked
d. Global Catalog should be checked
e. Do NOT check Read-Only DC
f. Set a Directory Services Restore Mode (DSRM) password (Write this down somewhere safe.)
Ignore the DNS delegation warning.
When creating a new Active Directory forest, the setup process also asks for a NetBIOS name for the domain. NetBIOS is a legacy naming system that predates modern DNS-based Active Directory environments and was widely used in older Windows networks for computer and resource identification. The wizard will auto-fill HYBRID; leave it as is.

Again, I’m not providing detailed steps on every screen of the wizard; this process is well documented. Apart from the attributes I have mentioned above, the rest can be left at their default values. If you see any warnings when checking the prerequisites, you can ignore them. They will not have any effect on the environment we are building. We will revisit them in future if needed.

⚠ After the installation, the server will reboot automatically.

After the reboot you will be able to login as Administrator to the new domain

Now we have the domain controller up and ready. As a first test, try nslookup google.com. You will see that the domain controller fails to resolve this query. Let’s fix this next.

Adding a DNS Forwarder

After installing Active Directory Domain Services, the Domain Controller also becomes the authoritative DNS server for the new domain (hybrid.local). At this point the DNS server knows how to resolve internal Active Directory records such as domain controllers, LDAP services, and domain-joined machines. However, it has no knowledge of external internet domains like google.com or microsoft.com. When a domain-joined machine sends a DNS query for an external address, the request reaches the domain controller but cannot be resolved. Configuring a DNS forwarder solves this by instructing the DNS server to pass any unknown queries to an upstream resolver (for example, the home router or a public DNS server such as 8.8.8.8). The domain controller therefore resolves internal names itself and forwards everything else, allowing domain clients to access the internet while still using the domain controller as their primary DNS server.

To set up the forwarder, open Server Manager > Tools → DNS > Double-click your server name > Right-click → Properties > Go to the Forwarders tab > Click Edit > Add the virbr0 gateway IP 192.168.122.1

If you try the lookup again with nslookup google.com, it will resolve.

Without a forwarder, the DNS server attempts recursive resolution using root hints, which can introduce delays or timeouts in lab environments behind NAT. Configuring a forwarder provides a faster and more predictable path for resolving external names.
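
The decision the DNS server makes for each query can be modelled in a few lines. This is a toy sketch of the logic only, not how Windows DNS is implemented: resolve, the records dictionary, and the forwarder callable are all hypothetical names of mine.

```python
def resolve(name, zone, records, forwarder):
    """Answer from the local zone when the name belongs to it, otherwise forward."""
    qname = name.rstrip(".").lower()
    if qname == zone or qname.endswith("." + zone):
        # Authoritative: answer locally, even if the answer is "no such record".
        return ("local", records.get(qname))
    # Not our zone: hand the query to the upstream resolver.
    return ("forwarded", forwarder(qname))


if __name__ == "__main__":
    zone_records = {"hce-dc01.hybrid.local": "192.168.122.10"}

    def fake_upstream(qname):
        # Stand-in for the forwarder at 192.168.122.1.
        return "answer from upstream for " + qname

    print(resolve("hce-dc01.hybrid.local", "hybrid.local", zone_records, fake_upstream))
    print(resolve("google.com", "hybrid.local", zone_records, fake_upstream))
```

Internal names never leave the domain controller, while everything else takes the forwarder path.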

Creating User Accounts

Next, create normal domain users. On the domain controller, launch Active Directory Users and Computers, go to the Users folder and create two test users, testuser1 and testuser2. Set a password for each and enable the accounts.

Create a client VM to join the domain

Now that the domain controller is ready, we can test if client VMs are able to join the new domain we created. To test the domain join, spin up a new Windows VM, our client VM. I created another instance of Windows Server 2022 as I did not want to download another ISO just for the testing.

Join client VM to Domain

When you spin up a new client VM, its preferred DNS will be the gateway 192.168.122.1 (in line with the IP range of the virtual network).

Configure the Client VM’s DNS

Before joining the domain, the client’s DNS server must point to the domain controller’s IP. The reason is that the domain hybrid.local is not publicly resolvable like google.com; it exists only in the domain controller’s DNS. If the client VM uses any other DNS server, it will fail to locate the domain controller and the domain join will not proceed.

Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses 192.168.122.10

If the alias “Ethernet” does not exist on your VM, use the below command to look up the actual interface alias

Get-NetAdapter | select Name, InterfaceAlias, Status

Testing the Client

Step 1 — Check the DNS server (from Client VM)

On the client VM, open PowerShell as Administrator and run:

ipconfig /all

DNS Server should be 192.168.122.10

Step 2 — Verify if nslookup resolves (from Client VM)

On the client VM, test domain controller discovery via DNS.

nslookup _ldap._tcp.dc._msdcs.hybrid.local

The Server and Address lines at the top of the response should show the domain controller (192.168.122.10). This shows that the test VM now uses the domain controller as its DNS server and that the DNS server is responding. We are good to join the domain.

Step 3 — Check if SRV record is correct (from Client VM)

Run nslookup in interactive mode: Inside the prompt, query the SRV record

set type=SRV
_ldap._tcp.dc._msdcs.hybrid.local

If you see your domain controller hostname listed under svr hostname, DNS and SRV discovery are working.

Step 4 — Test LDAP connectivity (from Client VM)

From the client, test LDAP connectivity with Test-NetConnection 192.168.122.10 -Port 389. The response should say TcpTestSucceeded : True
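
If Python happens to be installed on a client (for example on the Linux VM in a later part), the same TCP reachability check can be sketched with the standard socket module. It only proves that the port accepts connections, not that LDAP itself is healthy; tcp_port_open is my own helper name.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling tcp_port_open("192.168.122.10", 389) from a machine on the virtual network is the rough equivalent of the Test-NetConnection command above.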

Join the domain (from Client VM)

We are all set to join this VM to the domain. Run the below PowerShell command to join the domain

Add-Computer -DomainName hybrid.local -Credential HYBRID\Administrator -Restart

What happens internally:

Client queries DNS for domain controller (SRV record)
Client contacts domain controller via LDAP
Admin credentials authenticated (Kerberos/NTLM) to authorize the join
Domain controller creates a Computer Object in AD
Machine account password established — this becomes the secure channel secret
Client reboots as domain member

Verification

Domain membership verification: Log in as HYBRID\testuser1, the user you created earlier. Run whoami; it should return hybrid\testuser1. Running echo %logonserver% in a Command Prompt should show your domain controller hostname, \\HCE-DC01.

This proves that Kerberos + secure channel + domain controller communication is working.

Validate AD Object Creation — On the domain controller, if you open **Active Directory Users and Computers** and go to **Computers** you should see your client machine listed. This proves that AD object lifecycle works.

Summary

In this part we set up a domain controller, joined a VM to the domain and tested the connectivity. Although a single domain controller setup works, it is a single point of failure. If the only domain controller fails:

Authentication stops
Kerberos tickets cannot be issued
Group Policy stops applying
New logons fail

In the next part, we will build redundancy for the domain controller by adding a secondary domain controller.

HandsOn — Building Hybrid Cloud Environment — Part 1 — Identity & Connectivity Foundation

Sun, 12 Apr 2026 00:00:00 +0000

Introduction

In this series, I will take you through building an on-premises / Azure hybrid environment, with the on-premises network running entirely on a single machine. We will set up an on-premises Active Directory forest, create OUs and users, deploy domain controllers, join Windows and Linux VMs to the domain, and establish hybrid connectivity to Azure using an S2S VPN tunnel.

I want to clarify right at the outset that on-premises identity is not a mandatory starting point for a hybrid cloud environment. But I have chosen to build it from the ground up, starting with the identity plane (on-premises Active Directory).

To follow along you do not require deep networking or Linux expertise, but comfort with basic bash and networking concepts will help — and we will build the required knowledge as we go. Where appropriate, I will reference official documentation rather than re-explaining well-documented concepts.

At the end of this series, I will share my GitHub repo with the automation scripts for the infrastructure.

When I started exploring hybrid environments, I assumed it would require multiple machines, dedicated networking, and possibly additional hardware. That made it feel like something I couldn’t easily experiment with on my own setup. As I explored further, I realized those assumptions weren’t entirely true. I was able to build a working hybrid environment on my laptop, keeping everything contained and manageable. This series documents how I put it together. Here’s the setup I’m using: a laptop running Linux Mint 22.1 (Xia) with 16 GB RAM, a regular home Wi-Fi router, and an Azure subscription (PAYG). Let’s get started.

Building an On-premises virtual network

In this first part of the series, I will set up the virtual network on my Linux laptop. By the end of this article, we will have the foundation for the hybrid environment in place: a virtual network, which represents our on-premises network, and a virtual machine within it.

The above diagram shows what we will have in place by the end of this article.

Software prerequisites and why you need them

The on-premises network is virtual. We will need libraries and applications that enable the creation and management of virtual networks and virtual machines. Follow the below steps to prepare the environment.

Installation steps for KVM, QEMU, libvirt and virt-manager vary by Linux distribution and version. Refer to your distribution’s documentation for the correct package names and commands.

Verify virtualization is enabled: run egrep -c '(vmx|svm)' /proc/cpuinfo and check the output. A result greater than 0 means your CPU supports hardware virtualization and you’re good to go.
Install the KVM hypervisor and QEMU: these two work as a pair. KVM is the Linux kernel module that provides hardware virtualization, allowing the guest OS to execute instructions directly on the host CPU at near-native speed. QEMU handles the emulated hardware that the guest OS interacts with, such as the virtual disk drives, the network card, and the VGA BIOS that Windows thinks it’s seeing.
Install libvirt: this is the management layer I will use for the virtual network and the VMs. It acts as the control layer, translating GUI actions into XML definitions and command-line instructions for the hypervisor, and it manages storage pools, virtual networks, and the VM lifecycle. For example, when you add new virtual hardware such as a CD-ROM, libvirt generates the XML configuration for it, which is then read and processed by QEMU.
Install Virtual Machine Manager (virt-manager): a handy GUI for libvirt. You launch it with virt-manager, and it lets you create and manage VMs but does not run them.
Download the Windows Server 2022 Evaluation ISO. The OS must be a server OS that supports Active Directory Domain Services, which is why I’ve chosen Windows Server 2022.
Download the VirtIO driver ISO from https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso

Once the above steps are completed, run the command virsh net-list --all, which should show the default network that libvirt just created.

At this stage, the virtual network is up and all the tools needed to create and manage VMs are in place.
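
As a rough illustration of what the egrep check in the prerequisites does, here is a minimal Python sketch that counts the lines mentioning the vmx (Intel VT-x) or svm (AMD-V) flags. The function name and the sample text are my own assumptions for illustration; on a real host you would pass in the contents of /proc/cpuinfo.

```python
import re

def count_virt_flags(cpuinfo_text: str) -> int:
    """Count lines that mention vmx or svm, mirroring egrep -c '(vmx|svm)'."""
    pattern = re.compile(r"(vmx|svm)")
    return sum(1 for line in cpuinfo_text.splitlines() if pattern.search(line))

# Hypothetical excerpt of /proc/cpuinfo for a two-core Intel CPU.
sample = "flags : fpu vme vmx sse\nmodel name : Example CPU\nflags : fpu vme vmx sse"
supported = count_virt_flags(sample) > 0
print("hardware virtualization supported:", supported)
```

A count of 0 usually means the CPU lacks the feature or it is disabled in the firmware settings.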

Role of libvirt, virt-manager, KVM and QEMU

It helps to build a simple mental model of how these components fit together — specifically, how virt-manager interacts with the underlying hypervisor.

The Management Flow (Control Plane) — when you create or configure a VM:

virt-manager → libvirt → QEMU/KVM

virt-manager provides the GUI
libvirt acts as the control layer, translating actions into configurations
QEMU/KVM executes those configurations

The Execution Flow (Data Plane) — when something runs inside the VM:

User → Guest OS (Windows) → QEMU → KVM → Hardware

QEMU handles device emulation (disk, NIC, etc.)
KVM provides direct access to CPU virtualization features

This separation helps explain why VM configuration and VM execution are two different layers.

Virtual Networking

Before we create the virtual machine, a few networking concepts around virtual networking and libvirt are worth clarifying.

Virtual network: the private address space where your VMs live. It exists only inside the host and is not visible on the Wi-Fi network that connects your laptop to phones and other devices.

Bridge (virbr0), a Layer 2 construct: the virtual switch that connects your VMs to each other and to the host, like a physical switch in a rack. Every VM connected to the default network plugs into this switch. When VM1 sends a packet to VM2, the bridge forwards it based on MAC address learning, just as a Layer 2 switch would.

Gateway (192.168.122.1), a Layer 3 construct: the door out of the virtual network. Packets destined for anywhere outside the virtual network go here first and are then routed based on the defined route table.

Gateway vs bridge: at first, they looked similar to me, and it took me some time to understand the difference. The bridge connects devices at Layer 2 by MAC address. If two VMs in the same virtual network want to talk, the bridge connects them and they bypass the gateway. The gateway is the IP address assigned to the bridge interface, acting as the Layer 3 entry/exit point for the virtual network. It is the address VMs use when sending traffic outside the virtual network.

VMs talk to each other through the bridge, they talk to the outside world through the gateway, and the virtual network is the address space that gives them all a place to live.
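
The bridge-versus-gateway decision can be sketched in a few lines of Python. This is only a simplified model of what a VM's IP stack does, using the default libvirt addresses from this article; the function name is an assumption of mine.

```python
import ipaddress

# Addresses of the default libvirt network used in this article.
VIRTUAL_NETWORK = ipaddress.ip_network("192.168.122.0/24")
GATEWAY = ipaddress.ip_address("192.168.122.1")

def next_hop(destination: str) -> str:
    """Return how a VM would reach the destination: on-link or via the gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in VIRTUAL_NETWORK:
        # Same subnet: the frame goes straight across the bridge (Layer 2).
        return f"direct via bridge to {dest}"
    # Different subnet: the packet is handed to the gateway (Layer 3).
    return f"via gateway {GATEWAY}"

print(next_hop("192.168.122.50"))
print(next_hop("10.66.5.4"))
```

The first call models VM-to-VM traffic, the second models traffic to the Azure workload subnet that will later travel through the VPN tunnel.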

When libvirt is first installed, it automatically creates a default virtual network with a virtual switch called virbr0 — visible via ip a. Behind this bridge, libvirt configures dnsmasq for DHCP and DNS, and uses iptables/nftables on the host to provide NAT routing. Any VM you create is connected to this network unless you specify otherwise.

libvirt modes

libvirt offers several network modes; the three relevant here are NAT, bridged and isolated (internal). For our hybrid environment, the virtual network uses NAT mode, described in https://wiki.libvirt.org/VirtualNetworking.html, meaning that the Wi-Fi router sees all packets from the virtual network as originating from the Linux host and has no notion of the virtual network. This approach requires no changes to the home network and keeps the virtual environment isolated, while still allowing outbound internet access.

Use bridged networking if you want your virtual machines to obtain an IP address from your LAN. Or use Internal Network if you want a fully isolated lab.

Traffic flow in default (NAT) mode looks like this:

VM →
virtual NIC →
virbr0 (virtual switch) →
NAT (iptables/nftables on host) →
Linux host →
physical network →
internet

If you use bridged mode, the VM connects directly to your physical network through a Linux bridge (e.g., br0). In that case:

VM →
virtual NIC →
Linux bridge →
physical NIC →
real LAN

No NAT. The VM behaves like a real machine on your network.
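
To make the NAT flow more concrete, here is a toy Python model of source NAT (masquerading). It is a conceptual sketch only, not how iptables/nftables is implemented; the class names, the host address 192.168.1.50 and the port range are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class MasqueradeTable:
    """Toy source-NAT table: rewrites VM addresses to the host address."""

    def __init__(self, host_ip: str, first_port: int = 40000):
        self.host_ip = host_ip
        self.next_port = first_port
        self.mappings = {}  # host port -> (VM IP, VM port)

    def outbound(self, pkt: Packet) -> Packet:
        # Replace the VM's source address with the host's and remember the mapping.
        port = self.next_port
        self.next_port += 1
        self.mappings[port] = (pkt.src_ip, pkt.src_port)
        return Packet(self.host_ip, port, pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt: Packet) -> Packet:
        # Translate a reply back to the VM that opened the connection.
        vm_ip, vm_port = self.mappings[pkt.dst_port]
        return Packet(pkt.src_ip, pkt.src_port, vm_ip, vm_port)

nat = MasqueradeTable(host_ip="192.168.1.50")  # hypothetical host LAN address
sent = nat.outbound(Packet("192.168.122.10", 51000, "93.184.216.34", 443))
reply = nat.inbound(Packet("93.184.216.34", 443, sent.src_ip, sent.src_port))
```

After the outbound rewrite the router only ever sees the host address, which is why the virtual network stays invisible to the home LAN.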

Creating Virtual Machines

Now we can create a virtual machine. This VM will become the primary domain controller (covered in Part 2), so let’s name it accordingly: I will call it hce-dc01. Give it at least 4 GB RAM, 2 vCPUs and a 60 GB disk. This sizing is sufficient for a lightweight domain controller while keeping resource usage manageable on a single host machine.

Launch virt-manager from the terminal to open the Virtual Machine Manager GUI. Select the option to create a new VM and follow the wizard, providing the configuration outlined above. Use the downloaded ISO image to install Windows Server. The installation is quite straightforward.

Important — Make sure you install Windows Server 2022 Desktop Experience.

Once the Windows Server 2022 OS is installed, the VM will reboot allowing you to set the Administrator password.

Once the VM is ready, go to the details view and verify the below settings.

For the virtual NIC, choose e1000e. This is an emulated Intel NIC that Windows recognizes out of the box, so you get network access during installation before the VirtIO drivers (discussed in the next section) are in place.

For video, select QXL. This is a paravirtualized display adapter that gives you a responsive desktop with better resolution support than the default VGA.

Add VirtIO ISO as CD-ROM in virt-manager

What and Why? — [VirtIO](https://wiki.libvirt.org/Virtio.html) is a paravirtualized device interface used between Windows and QEMU. Instead of emulating physical hardware like SATA or Intel NICs, VirtIO provides purpose-built virtual drivers that both the guest and hypervisor understand directly — reducing CPU overhead and improving throughput.

Follow the below steps to install VirtIO

Open virt-manager → select your VM → Open → Show virtual hardware details
Click Add Hardware → Storage → CD-ROM
Choose Select or create custom storage → point to ~/ISOs/virtio-win.iso
Boot (or reboot) the VM

Now inside the VM you will see two CD-ROMs:

SATA CDROM1 → Windows Server ISO
VirtIO CD-ROM → drivers

Installing VirtIO
Now that the VirtIO ISO is attached to the VM as a CD-ROM drive, navigate to that drive and run the installer (**virtio-win-gt-x64.exe**). It installs all the necessary VirtIO drivers for disk, network, and optional devices automatically. Once the drivers are installed, shut down the VM, change the NIC type to ‘virtio’ in virt-manager, and then start it back up for better throughput.

This step completes the configuration of the Windows virtual machine. When hce-dc01 was created, virt-manager automatically connected it to the default virtual network created by libvirt. Let’s verify that now.

Verification

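To confirm the virtual network is configured correctly, run virsh net-dumpxml default. The output (abridged) looks like this:

<network>
  <name>default</name>
  <uuid>73e80935-c747-4ad7-88a1-5417707abc02</uuid>
  <forward mode='nat'/>
  <bridge name='virbr0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>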

In the above snippet you will notice that bridge virbr0 is configured with a DHCP range from 192.168.122.2 to 192.168.122.254.

Also, remember that this range must not overlap with the range your Wi-Fi router provides (it usually doesn’t, but it is worth checking).
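
If you want to check the overlap rather than eyeball it, a small Python sketch with the standard ipaddress module does the job. The two ranges are the ones used in this series; the helper name is my own.

```python
import ipaddress

def networks_overlap(first: str, second: str) -> bool:
    """Return True when the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(first).overlaps(ipaddress.ip_network(second))

LIBVIRT_DEFAULT = "192.168.122.0/24"  # libvirt default network
WIFI_LAN = "192.168.1.0/24"           # home Wi-Fi LAN in this series

conflict = networks_overlap(LIBVIRT_DEFAULT, WIFI_LAN)
print("overlap detected:", conflict)
```

The same check is worth repeating later against the Azure VNet range, since overlapping address spaces would break routing across the VPN tunnel.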

Summary

With these steps, we have a working virtual network and a Windows Server 2022 VM ready to be configured. In the next part, we will turn this VM into a fully functional Active Directory domain controller — laying the groundwork for identity in our hybrid environment.

Adding application roles to Managed Identity

Fri, 27 Feb 2026 00:00:00 +0000

This guide outlines the process for assigning application roles to a Managed Identity (MI) in Entra ID. It covers observed behaviors, inherent limitations, and the necessary steps required when an MI must authenticate with another application (such as an API in APIM) using role-based access control (RBAC).

Scenario

In a typical architecture, a Logic App utilizes a Managed Identity (either System-Assigned or User-Assigned) to communicate with downstream resources. When that Logic App needs to call an API exposed via APIM, the following requirements usually apply:

The API is protected by its own App Registration in Entra ID.
The API expects the caller to possess specific app roles (e.g., API.Read or API.ReadWrite).
The Logic App must obtain an OAuth token containing these roles to successfully authorize against the API.

The Challenge

Managed Identities are automatically created Service Principals. A common point of confusion is that they do not appear in the App Registration section of the Azure portal; they are found exclusively under Enterprise Applications.

Because the Azure portal does not currently provide a UI for assigning app roles to Enterprise Applications directly, it is not possible to assign roles like API.Read through the standard “API Permissions” blade used for traditional App Registrations.

The Workaround — Assigning App Roles via PowerShell / Microsoft Graph

You can use the PowerShell script below to assign roles to your Managed Identity.

# Install-Module Microsoft.Graph -Scope CurrentUser (If not done already)

# Your tenant ID (in the Azure portal, under Azure Active Directory > Overview).
$tenantId = '{tenantId}'

# The name of the server app that exposes the app roles.
$serverApplicationName = '{serverApplicationName}'

# The name of the app role that the managed identity should be assigned to.
$appRoleName = '{appRoleName}' # For example, Api.Read

# Look up the Logic App / Function (Client application) managed identity's object ID.
$managedIdentityObjectId = '{managedIdentityObjectId}'

# Connect-MgGraph -TenantId $tenantId -Scopes 'Application.ReadWrite.All','Directory.Read.All'
# or a more restricted set of permissions (recommended):
Connect-MgGraph -TenantId $tenantId -Scopes 'Application.Read.All','AppRoleAssignment.ReadWrite.All'

# Look up the details about the server app's service principal and app role.
$serverServicePrincipal = (Get-MgServicePrincipal -Filter "DisplayName eq '$serverApplicationName'")
$serverServicePrincipalObjectId = $serverServicePrincipal.Id
$appRoleId = ($serverServicePrincipal.AppRoles | Where-Object {$_.Value -eq $appRoleName }).Id

Write-Host '$serverServicePrincipal ' $serverServicePrincipal
Write-Host '$managedIdentityObjectId ' $managedIdentityObjectId
Write-Host '$serverServicePrincipalObjectId ' $serverServicePrincipalObjectId
Write-Host 'AppRoleId >' $appRoleId

# Assign the managed identity access to the app role.
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $serverServicePrincipalObjectId -PrincipalId $managedIdentityObjectId -ResourceId $serverServicePrincipalObjectId -AppRoleId $appRoleId

PrincipalId → Managed Identity object ID (Logic App)
ResourceId → API service principal object ID
AppRoleId → GUID of the role defined in the API registration

After this assignment, tokens requested by the Managed Identity will include the required roles claim, allowing successful authorization against the API.
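
One quick way to confirm the assignment worked is to decode a token issued to the Managed Identity and look at its roles claim. The Python sketch below only decodes the payload and does not validate the signature, so it is a debugging aid rather than a substitute for real token validation. The helper names and the demo token are assumptions for illustration.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def token_roles(jwt: str) -> list:
    """Return the 'roles' claim of a JWT payload (no signature validation)."""
    _header, payload, _signature = jwt.split(".")
    claims = json.loads(b64url_decode(payload))
    return claims.get("roles", [])

def make_demo_token(claims: dict) -> str:
    """Build an unsigned, demo-only token so the example runs offline."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{encode({'alg': 'none'})}.{encode(claims)}."

demo = make_demo_token({"aud": "api://example-api", "roles": ["API.Read"]})
print(token_roles(demo))
```

If the list comes back empty for a real token, the app role assignment is the first thing to re-check.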

Note

For a client ID to be usable as the token audience, the corresponding app registration must define ("own") the app roles, and the consuming identity must have been assigned those roles in Entra ID. I believe you can further check these claims in the Authentication section.

Key Takeaways

Managed Identities always appear as Enterprise Applications in Entra ID.
App roles cannot be assigned via the portal for Enterprise Apps; Graph / PowerShell is required.
Token validation depends on correct Issuer, Audience, and presence of role claims.
Explicit role assignment ensures tokens carry the required roles for API authorization.

Troubleshooting notes — Azure Table Storage 403 Authentication

Sun, 22 Feb 2026 00:00:00 +0000

Symptom

Calling Azure Table Storage REST API returns:

403 Server failed to authenticate the request.
Make sure the value of Authorization header is formed correctly including the signature.

This happens even though the Authorization header looks valid.

Root Cause

The request is missing the x-ms-version header.

Azure Storage requires this header to determine the API version used for request validation. Without it, the service may reject the request with a misleading authentication error.

Fix

Add header

x-ms-version: 2020-08-04

Example minimal headers:

x-ms-version: 2020-08-04
Accept: application/json;odata=nometadata
Content-Type: application/json
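
For a manually signed request, the headers can be assembled along these lines. This is a sketch that assumes the Shared Key Lite scheme for the Table service, where the string to sign is the date followed by the canonicalized resource; the function name, account name and key are placeholders, so check the scheme against the Azure Storage documentation before relying on it.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate

def build_table_headers(account, key_b64, resource_path,
                        api_version="2020-08-04", date=None):
    """Build headers for a Table REST call signed with Shared Key Lite (assumed scheme)."""
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    string_to_sign = f"{date}\n/{account}/{resource_path}"
    digest = hmac.new(base64.b64decode(key_b64),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {
        "x-ms-date": date,
        "x-ms-version": api_version,  # the header this post is about
        "Authorization": f"SharedKeyLite {account}:{signature}",
        "Accept": "application/json;odata=nometadata",
        "Content-Type": "application/json",
    }

# Placeholder account and key, for illustration only.
demo_key = base64.b64encode(b"not-a-real-key").decode("ascii")
headers = build_table_headers("myaccount", demo_key, "mytable",
                              date="Sun, 22 Feb 2026 00:00:00 GMT")
```

Putting x-ms-version into the same helper that builds the signature makes it hard to send one without the other.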

Lesson learned

If Azure Storage returns a 403 authentication error for a manually signed REST request, check for missing x-ms-version before debugging the signature.

Access AppConfiguration from Function App using Managed Identity

Sat, 21 Feb 2026 00:00:00 +0000

Accessing Azure App Configuration using Managed Identity in Azure Functions is slightly different from accessing other Azure services.

For most Azure services (Storage, Service Bus, Key Vault), you typically:

Enable Managed Identity on the Function
Grant RBAC access to the resource
Create the SDK client using DefaultAzureCredential

However, App Configuration is usually loaded as part of the application configuration pipeline at startup, so it must be added via the host builder.

Prerequisites

Enable Managed Identity on the Function App
Grant the identity: App Configuration Data Reader on the App Configuration resource

Sample code as shown below

using System;
using Azure.Identity;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

var host = new HostBuilder()
    .ConfigureAppConfiguration(builder =>
    {
        // Authenticate with the Function App's managed identity;
        // only the store endpoint is needed, not a connection string.
        builder.AddAzureAppConfiguration(options =>
            options.Connect(new Uri("https://appconfiguri.azconfig.io"), new ManagedIdentityCredential()));
    })
    .ConfigureFunctionsWebApplication()
    .Build();

host.Run();

Note: I’m using ManagedIdentityCredential here, but the recommended class is DefaultAzureCredential.

Key Insight

Other Azure services → authenticated when creating the client
App Configuration → authenticated when building the configuration provider. That’s why it must be configured inside ConfigureAppConfiguration().

Coding with Integrity

Fri, 20 Feb 2026 00:00:00 +0000


The real measure of a software engineer is simple — how you code when no one is watching.

We often associate strong engineering with technical brilliance — mastering languages, designing scalable systems, or solving complex problems.

But beyond skill, the most valuable attribute a software engineer can bring to the table is integrity.

“Coding with integrity is how you code when you know that no one is going to review your code.”

Coding with integrity is about the choices you make in the quiet moments of development — when there’s no reviewer, no deadline pressure, and no immediate accountability except your own standards.

Many common engineering issues don’t come from lack of knowledge.
They come from small decisions made in those unseen moments.

Design decisions

When implementing a feature, it’s easy to think only about the immediate ask: Does it work? Does it avoid breaking anything?

Integrity pushes the thinking further: Is this the right approach? Is it maintainable? Should I pause and rethink this before moving forward?

Unit tests

It’s possible to reach high coverage while knowing the tests don’t really validate behaviour.

Integrity asks: Do these tests genuinely protect the system? Would I trust them if something broke tomorrow?

Technical debt

Sometimes we clearly see duplication, fragile logic, or missed refactoring opportunities.

Integrity isn’t about always fixing everything immediately. It’s about being honest: acknowledging the debt, documenting it, not pretending the shortcut is a solution, and ensuring the debt is eventually addressed.

Documentation and clarity

After spending days or weeks on a module, everything feels obvious.

Integrity means writing code and comments for the next reader — even if that reader is your future self, months later.

Maybe integrity in coding isn’t something we formally learn or measure.

Maybe it’s simply the voice that nudges us toward clarity, correctness, and responsibility. Whether we follow that voice or ignore it is what ultimately shows up in our code.

Cheers!

Managing Azure APIM Operation Policies in Terraform by Importing OpenAPI Specification

Fri, 20 Feb 2026 00:00:00 +0000

When using Terraform to import an OpenAPI/Swagger definition into Azure API Management (APIM), the API and its operations are created successfully. However, one subtle behavior can cause confusion when trying to manage operation-level policies declaratively.

This post explains the issue and a simple workaround.

The Scenario

I was importing my API using Terraform:

Swagger/OpenAPI definition imported into APIM
API created successfully
All operations appeared correctly in Azure

Later, I wanted to attach operation-level policies with Terraform, using azurerm_api_management_api_operation_policy.

At this point I ran into a problem: Terraform had no record of the operations in its state file.

Why This Happens

This behavior is expected once you understand how Terraform works. Terraform only tracks resources explicitly declared in configuration or resources manually imported into state.

When Swagger is imported via azurerm_api_management_api, the operations are created inside Azure, but they are not separate Terraform-managed resources unless you explicitly declare them using azurerm_api_management_api_operation.

Effectively, the API is created in Azure and tracked in Terraform, while the API operations (created via the Swagger import) exist in Azure but are NOT tracked in Terraform.

This makes it unclear how to attach policies to those operations without creating the operations explicitly, which is a nightmare if you have hundreds of operations.

The Simple Workaround

You do not need a Terraform resource reference to the operation in order to create an operation policy and attach it. Instead, you can attach the policy directly using the azurerm_api_management_api_operation_policy resource and referencing the Swagger operationId.

Example:

resource "azurerm_api_management_api_operation_policy" "my_op_policy" {
  provider            = «provider»
  api_name            = ""
  api_management_name = data.azurerm_api_management.apim.name
  resource_group_name = data.azurerm_api_management.apim.resource_group_name
  operation_id        = ""
  xml_content = templatefile("", {
    backend_name = ""
    method       = ""
  })
}

As long as the API exists in APIM, the operation exists, and operation_id exactly matches the Swagger operationId, Terraform can apply and update the policy successfully. No explicit Terraform operation resource is required.

Notes

1. Use the Swagger operationId, not the display name. Terraform identifies the operation strictly by operationId.

2. Treat operationId as a stable contract. If you later rename the operationId, remove an endpoint, or restructure the Swagger, Terraform may fail because the referenced operation no longer exists.

3. Importing operations individually is possible but rarely worth it. You can define azurerm_api_management_api_operation and import each operation manually into Terraform state. However, this requires one resource per operation, and manual imports are tedious and scale poorly for large APIs, which defeats the benefit of a Swagger-driven API definition.

For most setups, referencing operationId directly in the policy resource is simpler.
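
Because the policy resources depend on operationId values staying stable, it can help to check them before running Terraform. The Python sketch below lists the operationIds in an OpenAPI document and reports any referenced IDs that are missing. The function names and the sample spec are hypothetical; in practice you would load your real Swagger file with json.load.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

def operation_ids(spec: dict) -> list:
    """Collect every operationId declared under the spec's paths."""
    found = []
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method.lower() in HTTP_METHODS and isinstance(operation, dict):
                if "operationId" in operation:
                    found.append(operation["operationId"])
    return sorted(found)

def missing_operations(spec: dict, referenced: list) -> list:
    """Return referenced operation IDs that the spec no longer defines."""
    return sorted(set(referenced) - set(operation_ids(spec)))

# Hypothetical spec and the IDs used by the Terraform policy resources.
sample_spec = {
    "paths": {
        "/customers": {"get": {"operationId": "getCustomers"},
                       "post": {"operationId": "createCustomer"}},
        "/orders": {"get": {"operationId": "getOrders"}},
    }
}
stale = missing_operations(sample_spec, ["getCustomers", "deleteCustomer"])
```

Running a check like this in CI would surface a renamed operationId before terraform apply fails on it.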

Takeaway

When importing Swagger into APIM using Terraform:

Operations are created in Azure
Terraform does not automatically track them
Operation policies can still be managed declaratively by simply referencing the Swagger/OpenAPI Spec operationId

Understanding this distinction can save significant time when automating API Management deployments.

Demo workflow for Minikube

Sat, 31 Jan 2026 00:00:00 +0000

Photo by Shubham Dhage on Unsplash

Here’s a ready-to-run “one-shot” demo workflow for Minikube that sets up a webserver deployment, exposes it, configures HPA, and generates load so you can see autoscaling in action immediately.

You can copy-paste these commands one after the other in your terminal.

Step 0: (Optional) Clean up old resources

kubectl delete deployment webserver --ignore-not-found
kubectl delete svc webserver --ignore-not-found
kubectl delete hpa webserver --ignore-not-found
kubectl delete pod load-generator --ignore-not-found

Step 1: Create the webserver deployment

kubectl create deployment webserver --image=gcr.io/google_containers/echoserver:1.10

Step 2: Expose the deployment as a NodePort service

kubectl expose deployment webserver --type=NodePort --port=8080

Check service:

kubectl get svc webserver

Step 3: Enable metrics-server if not already

minikube addons enable metrics-server

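Step 4: Create Horizontal Pod Autoscaler

The HPA measures CPU utilization as a percentage of each pod's CPU request, so set a request on the deployment first (100m is just an example value):

kubectl set resources deployment webserver --requests=cpu=100m
kubectl autoscale deployment webserver --cpu=20% --min=1 --max=5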

Check HPA:

kubectl get hpa

Step 5: Launch load-generator pod

kubectl run -i --tty load-generator --image=busybox -- /bin/sh

Inside the pod, generate heavy load:

while true; do wget -q -O- http://webserver:8080 & done

The & ensures requests run in parallel for higher CPU usage.
This will trigger the HPA to scale the webserver pods.

Step 6: Watch autoscaling in another terminal

kubectl get hpa -w
kubectl get pods -w

You will see replicas increase as CPU usage rises.
When you stop the load (Ctrl+C in the BusyBox pod), HPA will scale pods back down.
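
The scaling decision follows the formula given in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the min and max. Here is a minimal Python sketch of that calculation using the 20% target and 1 to 5 replica bounds from this demo. It deliberately ignores the controller's tolerance and stabilization window, and the function name is my own.

```python
import math

def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float = 20,
                     min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Simplified HPA calculation: scale in proportion to the metric ratio."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))

under_load = desired_replicas(current_replicas=1, current_cpu_percent=90)
after_load = desired_replicas(current_replicas=5, current_cpu_percent=2)
print(under_load, after_load)
```

With one pod at 90% against a 20% target the raw result is ceil(4.5) = 5, which is also the configured maximum; once the load stops, five pods at 2% collapse back to the minimum of one.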

Step 7: Optional — test webserver from host

minikube service webserver

Opens your webserver in the browser.

Using Model Overlays using .modelfile

Sat, 31 Jan 2026 00:00:00 +0000

Photo by Steve Johnson on Unsplash

When you want to use a model but don’t want to keep initializing it with a specific persona, temperature, and other attributes, you can use the .modelfile Customization Approach.

Step 1: Create a .modelfile as shown below (sys_admin.modelfile)

# 1. THE BASE (Required)
FROM llama3

# 2. BRAIN PHYSICS (Parameters)
# Creativity (0.0 to 1.0+)
PARAMETER temperature 0.7
# How many "tokens" of memory it has
PARAMETER num_ctx 4096
# Limits the "vocabulary" pool for each word
PARAMETER top_k 40
# Probability threshold for word choice
PARAMETER top_p 0.9
# Prevents the AI from getting stuck in a loop
PARAMETER repeat_penalty 1.1
# Tells the AI exactly when to stop talking
PARAMETER stop "User:"
PARAMETER stop "---"

# 3. THE TEMPLATE (The “Skeleton” of a conversation)
# This defines how the model sees the Turn-taking between User and AI.
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# 4. (System Instructions)
SYSTEM """
You are a specialized Azure Networking Assistant and System Administrator with plenty of experience.
You provide CLI commands for Linux Mint and PowerShell for Windows.
Constraints:
1. If a config is insecure, call it out immediately.
"""

# 5. PRE-LOADING (The “Conversation Starter”)
# You can bake in a “fake” memory so the model thinks it’s already talking to you.
# [OPTIONAL] ADAPTER ~/models/my-adapter # (for actual fine-tuned weights)
MESSAGE user "Check the S2S status."
MESSAGE assistant "Checking the IPsec tunnels now. One moment."

Step 2: Create an overlay on top of existing model

Once the .modelfile is ready, pick one of your existing models and create a new overlay like so:

ollama create my-new-overlay-sysadmin -f ./sys_admin.modelfile

Step 3: Create an alias for easy use

To make it “instant” so you don’t have to type long commands, you add an alias to your .bashrc file. This is the bridge between your OS and the AI.

Open your config: nano ~/.bashrc
Add this line at the bottom: alias summon-admin='ollama run my-new-overlay-sysadmin'
Save and refresh: source ~/.bashrc

How it works in practice

Now, whenever you are looking at a messy config file on your machine, you just pipe the text to your new friend:

cat /etc/ssh/sshd_config | summon-admin

The model will wake up, read the file, and start grumbling about your security choices.

How is this different from prompt engineering

1. Hardware & Environment Parameters

Prompt engineering cannot change how the computer actually runs the model. A .modelfile can.

Parameter Tuning: You set things like PARAMETER temperature 0.2 (for consistency) or PARAMETER num_ctx 4096 (how much “memory” it has for your config files).
Stop Sequences: You can tell the model exactly when to stop talking (e.g., PARAMETER stop "User:"), preventing it from rambling.
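
To see why a lower temperature gives more consistent output, here is a small Python sketch of temperature-scaled softmax, the general idea behind the parameter. It illustrates the concept only and is not Ollama's internal sampling code; the logits are made-up numbers.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)
creative = softmax_with_temperature(logits, 1.0)
print(round(focused[0], 3), round(creative[0], 3))
```

At temperature 0.2 almost all of the probability lands on the top token, while at 1.0 the alternatives keep a meaningful share, which is where the extra variety comes from.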

2. The “Persona” vs. The “Ask”

Prompt Engineering: You have to tell the model every time: “Act like a sys admin and check this file…”
Modelfile (The Base-Overlay): The persona is “baked in” when you launch your “SysAdmin” model.

3. Layered Inheritance (The “FROM” command)

This is the part that is impossible with just prompting.

In a .modelfile, the first line is usually FROM llama3 (or any model that you use). This is Inheritance.

Extracting Swagger definition for Azure Logic App and importing to Azure APIM

Mon, 10 Jun 2024 00:00:00 +0000

Use case — I want to import a Logic App as an API within my APIM instance.

There is no direct way to get the Swagger file of a Logic App using the CLI (at least, I could not find one). So here are the steps to extract the Swagger definition of a Logic App. I then use the generated Swagger file to import the Logic App as an API within APIM using the Azure CLI.

1. Grant the service principal the Contributor role on the Logic App

Get the resource ID of the Logic App:

$logicAppResourceId = (az logic workflow show --resource-group "{resourcegroup-name}" --name "{logicAppName}" --query id --output tsv)

Assign the Contributor role to the service principal:

az role assignment create --assignee "{sp-id}" --role Contributor --scope $logicAppResourceId

2. Get the swagger file from the Logic App

Generate a JWT for the service principal from https://login.microsoftonline.com/{tenantId}/oauth2/token, with the resource set to "https://management.core.windows.net/":

$tenantId = "11111111-1111-1111-1111-111111111111"
$clientId = "00000000-0000-0000-0000-000000000000"
$clientSecret = "your-client-secret"
$resource = "https://management.core.windows.net/"

$body = @{
    grant_type    = "client_credentials"
    client_id     = $clientId
    client_secret = $clientSecret
    resource      = $resource
}

$response = Invoke-RestMethod -Method Post -Uri "https://login.microsoftonline.com/$tenantId/oauth2/token" -ContentType "application/x-www-form-urlencoded" -Body $body

$accessToken = $response.access_token
$accessToken

Construct the Swagger URL for the Logic App:

$swaggerUrl = "https://management.azure.com" + (az logic workflow show --resource-group "{resourcegroup-name}" --name "{logicapp-name}" --query id --output tsv) + "/listSwagger?api-version=2016-06-01"

Issue a POST request to $swaggerUrl to get the swagger definition of the LogicApp using Postman (or any other option you prefer)
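
If you would rather script this call than use Postman, the request can be put together roughly as follows. This Python sketch only builds the request object; the function name, resource ID and token are placeholders, and actually sending it would be a matter of passing the object to urllib.request.urlopen.

```python
import urllib.request

def build_list_swagger_request(resource_id: str, access_token: str,
                               api_version: str = "2016-06-01") -> urllib.request.Request:
    """Build (but do not send) the POST request for the listSwagger action."""
    url = f"https://management.azure.com{resource_id}/listSwagger?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=b"",  # listSwagger is a POST with an empty body
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )

# Placeholder resource ID and token, for illustration only.
demo_id = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg"
           "/providers/Microsoft.Logic/workflows/my-logic-app")
request = build_list_swagger_request(demo_id, "token-from-previous-step")
print(request.get_method(), request.full_url)
```

The response body is the Swagger JSON, which you can save as the file used by the import command in the next step.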

3. Import into APIM

Run the command below to import the Swagger file into APIM:

az apim api import --resource-group "{resourcegroup-name}" --service-name "{apim-instance-name}" --path "/v1" --api-id myapi --specification-path ".\logicapp.backend.swagger.json" --specification-format Swagger

4. Remove the contributor role for the service principal

az role assignment delete --assignee 00000000-0000-0000-0000-000000000000 --role "Contributor" --scope $logicAppResourceId

Calling a Logic App from APIM

Wed, 22 May 2024 00:00:00 +0000

There are a couple of ways to integrate APIM with a Logic App. The most common use case, as far as I know, is exposing the Logic App as an API in APIM. The other scenario is calling a Logic App from APIM.

I will provide the APIM policy snippet to call a Logic App. If you are using Managed Identity to authenticate to Logic App (will cover in a separate article), you can skip sending the bearer token.

A few steps are needed in the Logic App:

Enable authentication on the Logic App.
Make sure the Logic App URL does not contain the SAS token.
Make sure the Logic App has the following in its trigger section. The condition ensures that the Logic App expects a Bearer token, and “IncludeAuthorizationHeadersInOutputs” makes the Authorization header available for further processing within the Logic App.

"triggers": {
    "manual": {
        "conditions": [
            {
                "expression": "@startsWith(triggerOutputs()?['headers']?['Authorization'], 'Bearer')"
            }
        ],
        "inputs": {
            "schema": {}
        },
        "kind": "Http",
        "operationOptions": "IncludeAuthorizationHeadersInOutputs",
        "type": "Request"
    }
}

APIM Policy to call the Logic App

We issue a call to the Logic App from within the send-request policy. The response from the Logic App is captured in the variable named by response-variable-name="responsela".

      
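<send-request response-variable-name="responsela">
    <set-url>https://xxxxxxx.com:443/workflows/xxxxxxxxxxxx/triggers/manual/paths/invoke?api-version=2016-10-01</set-url>
    <set-method>POST</set-method>
    <set-header name="Content-Type">
        <value>application/json</value>
    </set-header>
    <set-header name="Authorization">
        <value>Bearer ****</value>
    </set-header>
</send-request>

<return-response>
    <set-body>@(((IResponse)context.Variables["responsela"]).Body.As<string>(preserveContent: true).ToString())</set-body>
</return-response>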

All the tags are quite self-explanatory and there is plenty of documentation available about them. return-response is a very useful policy: it stops further policy pipeline execution and returns a response to the caller.

Hope this helps.

Cheers!

Reducing Data Transfer Objects using Tuples in C#

Thu, 25 Nov 2021 00:00:00 +0000

I have come across quite a few ASP.NET Core WebAPI solutions with an inordinate number of Data Transfer Object (DTO) classes. This results in a kind of class explosion which I think can be avoided. Yes, DTOs have their utility, no doubt. But as the application evolves and grows, we often end up with numerous DTOs that differ by just a handful of attributes or, in some cases, are a simple composition of multiple entities or DTOs.

One of the reasons we have so many such DTO classes is the need to pass data to and from repository and service layer ( between different layers of the application for that matter). In order to find a way around creating yet another DTO, I was exploring some options and realized that Tuples can be used to minimize the creation of DTOs

Tuples have been around in C# for quite sometime now. I am not sure if it is a common knowledge but I recently figured out that you could eliminate quite a few DTOs that we use to ferry data between the layers by leveraging Tuples

Let us take the following scenario of instance.

We have an API, say, GetCustomers which, of course, will return me the list of customers . And, we have an entity, Customer as defined below.

class Customer
{
public int customerId {get; set;}
public string firstname {get; set;}
public string lastname {get; set;}
}

The API response for our GetCustomers API is as below

You would have noticed that the attribute count is expected in the response, and it is not present in our Customer class. The repository layer would just return a List<Customer>, but the service layer needs to pass it along with the count attribute to the controller. This is usually where we tend to create a DTO as below.

class CustomerDTO
{
public int count {get; set;}
public List<Customer> customers {get; set;}
}

The only reason for the above class to exist is to ferry the data from the repository in a format that the controller is expecting. We can eliminate this class altogether by returning a Tuple as below

return new Tuple<int, List<Customer>>(result.Count, result);

Granted, this is a very trivial scenario and you can add the count attribute in the controller and return an anonymous type also.
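To make the flow concrete, here is a minimal sketch of the service returning a Tuple and the caller shaping the response with an anonymous type. The CustomerService class, the in-memory data and the Program class are assumptions made up for illustration; only Customer and the Tuple shape come from the scenario above.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int customerId { get; set; }
    public string firstname { get; set; }
    public string lastname { get; set; }
}

// Hypothetical service; the name and the sample data are illustrative only.
public static class CustomerService
{
    // Stand-in for the repository call.
    private static List<Customer> GetCustomersFromRepository() => new List<Customer>
    {
        new Customer { customerId = 1, firstname = "Asha", lastname = "Rao" },
        new Customer { customerId = 2, firstname = "Liam", lastname = "Chen" },
    };

    // Returns the count and the list together, with no CustomerDTO class.
    public static Tuple<int, List<Customer>> GetCustomers()
    {
        List<Customer> result = GetCustomersFromRepository();
        return new Tuple<int, List<Customer>>(result.Count, result);
    }
}

public static class Program
{
    public static void Main()
    {
        Tuple<int, List<Customer>> response = CustomerService.GetCustomers();

        // What a controller action could return: an anonymous type built from the Tuple.
        var payload = new { count = response.Item1, customers = response.Item2 };
        Console.WriteLine($"{payload.count} customers, first: {payload.customers[0].firstname}");
    }
}
```

Item1 and Item2 are the price you pay for System.Tuple; nothing in the type says which member is the count.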

Now consider the cases where you need a response that is an aggregation of multiple custom types. For instance, if we have two APIs, one to get customer details and another to get order details, we would have created two DTOs, one for Customer and one for Order. If a new API is required that returns the details of a particular Customer along with all related Orders, you might have to create yet another DTO, as below.

public class newDTO {
public int orderCount {get; set;}
public int customerId {get; set;}
public List<Order> orders {get; set;}
}

The expected response is

This is exactly what we can avoid by using Tuples like below in the service and repository layer.

RepositoryLayer.cs

var repoResponse = new Tuple<int, Customer, List<Order>>(count, custResult, orderResult);
return repoResponse;

custResult holds a particular customer’s data and orderResult will be a List<Order>.

ServiceLayer.cs

.
.
.
//Create a Tuple with three members, count, customer and orders
//repoResponse is the response from your repository(a Tuple)
(int count, Customer customer, List<Order> orderList) result = (repoResponse.Item1, repoResponse.Item2, repoResponse.Item3);

return result;
}

From the above service response, the controller can create an anonymous type as below and return the response without ever creating a DTO.

return new { orderCount = serviceResponse.count, customerId = serviceResponse.customer.customerId, orders = serviceResponse.orderList };
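The three layers can be sketched end to end as below. This is only an illustration under a few assumptions: the Order class, the Repository and Service class names and the sample data are hypothetical, and the repository here returns a named value tuple rather than System.Tuple, so that no ItemN access is needed anywhere.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int customerId { get; set; }
    public string firstname { get; set; }
    public string lastname { get; set; }
}

// Hypothetical Order entity; the article does not define its shape.
public class Order
{
    public int orderId { get; set; }
    public int customerId { get; set; }
}

public static class Repository
{
    // In-memory data standing in for a database query.
    public static (int count, Customer customer, List<Order> orders) GetCustomerWithOrders(int customerId)
    {
        var customer = new Customer { customerId = customerId, firstname = "Asha", lastname = "Rao" };
        var orders = new List<Order>
        {
            new Order { orderId = 101, customerId = customerId },
            new Order { orderId = 102, customerId = customerId },
        };
        return (orders.Count, customer, orders);
    }
}

public static class Service
{
    // The element names are part of the signature, so callers read count/customer/orderList.
    public static (int count, Customer customer, List<Order> orderList) GetCustomerOrders(int customerId)
    {
        var (count, customer, orders) = Repository.GetCustomerWithOrders(customerId);
        return (count, customer, orders);
    }
}

public static class Program
{
    public static void Main()
    {
        var serviceResponse = Service.GetCustomerOrders(7);

        // The controller shapes the response without a DTO class.
        var payload = new
        {
            orderCount = serviceResponse.count,
            customerId = serviceResponse.customer.customerId,
            orders = serviceResponse.orderList
        };
        Console.WriteLine($"customer {payload.customerId} has {payload.orderCount} orders");
    }
}
```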

It should be noted that, although this approach eliminates the need to create DTO classes, it comes at the cost of readability. Your method signatures may not be very elegant or readable. While DTOs will still be the right way to go in some scenarios, for others, Tuples can help.

Hope this helps in reducing a few DTOs at your end.

Cheers!

How to get rid of a rogue instance in Azure App Service Plan

Thu, 22 Apr 2021 00:00:00 +0000

If you have been using Azure App Services for a while to host your API, there is a small chance that you have encountered a faulty instance: your API just doesn’t respond, or keeps crashing, on one particular instance. And if ARR Affinity is enabled, your problems are only exacerbated, because some users will always be routed to the faulty instance.

AFAIK, there is no straightforward way to release an instance that Azure has allotted to you for a given App Service Plan. Adding and removing instances (scale out / in) does not guarantee that the rogue instance will be released. I will share the approach I took to get rid of the rogue instance. Note that the approach below needs your app service to be out of rotation, not serving incoming requests.

Assume that you suspect that a given instance in your App Service Plan has issues and is crashing frequently, and you wish to remove it. As of today, there is no way to select an instance and remove it via the Azure Portal (yes, you can stop an instance from Process Explorer, but that would still not get rid of the instance). One way to achieve this is to use vertical scaling (up/down). When you scale up/down, Azure allocates the necessary hardware based on the target pricing tier you have chosen. The infrastructure differs significantly across tiers, and moving across tiers will almost always result in a different infrastructure allocation. We will use this to get rid of the rogue instance.

Start by scaling down to a lower tier (moving laterally within the same tier may not help). For instance, if you are operating on a Premium tier, move to Standard. This makes Azure allocate new instances in the lower tier that you have chosen. After scaling down, scale up again to your target pricing tier. When you do this, you will be allocated fresh instances (or at least not the old rogue one). This is how I got rid of one of the instances that was bothering me.

Hope this helps.

How I ended up writing cleaner PATCH calls using JSON Patch

Mon, 06 Jul 2020 00:00:00 +0000

I have written my fair share of RESTful APIs but am no expert by any measure. I had never given enough thought to the HTTP verbs I should be using (like PUT, POST, PATCH) while writing an API. If a resource had to be created, I would automatically go for POST (never considered the idempotency angle at all), and if a resource had to be modified, I would go for PATCH.

On one of my API assignments, I decided to make a conscious and deliberate choice of the verbs I would be using, and one API in particular got me thinking.

I had to write an API to modify a resource, and this resource happened to have nested resources and numerous attributes. I soon realized that it wasn’t so straightforward to create an elegant PATCH API, due to the sheer number of attributes on this resource that could potentially get modified. (Why did you design a resource with so many attributes in the first place? you might ask. But that is a topic for another day.)

So, coming back to the task at hand, I was aware of only two ways to go about writing a PATCH call: either send the entire resource to the API after making changes to the necessary attributes at the consumer’s end (Approach 1), or send just the attributes requiring modification to the API (Approach 2). We will examine both approaches in the context of the two classes below (Employee and Address).

    public class Employee
    {
        public int EmployeeId {get; set;}
        public string EmployeeName {get; set;}
        public Address EmployeeAddress {get; set;}
        public string WorkLocation {get; set;}
        public List<string> PreferredWorkLocations {get; set;}
    }

    public class Address
    {
        public string HouseNumber {get; set;}
        public string AddressLine1 {get; set;}
        public string AddressLine2 {get; set;}
    }

Approach 1: The consumer of the PATCH API sends the entire resource, after changing the few select attributes that need to be modified. At the API end, the PATCH payload is handed over to the repository layer, which then writes the entire resource to the database. Other than some basic validations, no additional work is needed at the API end. A sample PATCH call (almost a PUT) would look like below

    api/ModifyEmployee/{empId}
    {
        "EmployeeName" : "ename",
        "EmployeeAddress" : {
            "HouseNumber" : "F32",
            "AddressLine1" : "Addr1",
            "AddressLine2" : "Addr2"
        },
        "WorkLocation" : "Brazil",
        "PreferredWorkLocations" : ["Brazil","France"]
    }

Downside: the consumer has to build the entire object even if only a single attribute requires modification. Also note that there is no way of knowing whether just the “HouseNumber” has changed or any other (or all) of the attributes of Employee have changed. So all the attribute values need to be copied to a new object to be persisted in the database.

Approach 2: The consumer of the API sends only the attribute that has to be modified. A sample PATCH for modifying the work location will look like below:

    api/ModifyEmployeeWorkLocation/{empId}
    {
        "WorkLocation" : "Brazil"
    }

Downside: the onus of constructing an object that can be handed over to the repository layer falls on the API service layer (an object mapper needs to be used here).

Further, this approach might necessitate a new API for each combination of possible modifications to the resource attributes. If I have to update the Employee address, I will need another method like api/ModifyEmployeeAddress/{empId}. If the class has many attributes that can be modified, this can lead to an explosion of PATCH methods.

JSON Patch

Neither of the approaches appealed to me. This is when I stumbled upon JSON Patch. Honestly, I had never heard of JSON Patch before and wanted to give it a shot, as I thought it would address the downsides mentioned above.

What I like most about JSON Patch is that, as a consumer, I don’t have to send the entire object as the payload of the PATCH call; I can just state which operation (add / remove / replace / copy) I want to perform on which resource attribute or subset of attributes. Also, at the API end, there is no need to have a separate method for each type of modification, and there is no need to manually copy the incoming values to a new object that the repository will understand and persist.

Using JSON Patch, these calls can be as simple as below

    [
        {
            "value": "address line one",
            "path": "/employeeAddress/addressLine1",
            "op": "replace"
        }
    ]

The advantage of using JSON Patch is that you don’t have to reconstruct the object at your API end. You can use the Newtonsoft-based JSON Patch support in ASP.NET Core (the Microsoft.AspNetCore.Mvc.NewtonsoftJson package) and simply call the ApplyTo method to construct the object for persistence / further processing.

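    public async Task<IActionResult> UpdateEmployee([FromBody] JsonPatchDocument<Employee> patchDoc, int empId)
    {
        if (patchDoc != null)
        {
            // yourCache and yourService are placeholders for your own cache client and service
            var emp = await yourCache.GetAsync("cacheKey");

            if (emp == null)
            {
                emp = await yourService.GetEmployeeData(empId);
            }

            // Apply the patch operations; any errors are recorded in ModelState
            patchDoc.ApplyTo(emp, ModelState);

            if (!ModelState.IsValid)
            {
                return BadRequest(ModelState);
            }

            // Call the repository to update.
            _ = await yourService.UpdateAsync(emp);

            return new ObjectResult(emp);
        }
        else
        {
            return BadRequest(ModelState);
        }
    }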

The ApplyTo method will take care of copying (or performing any operation based on the value supplied in “op” attribute of the PATCH call) the new values to the existing object. This eliminates the need to do this mapping and copying manually using a mapper.

Another plus is that you don’t need a separate PATCH call for each attribute; you can club multiple modification requests into the same PATCH call, like below. Please note that you have to use the /- notation to append to a list.

    [
        {
            "value": "new work Location",
            "path": "/preferredWorkLocations/-",
            "op": "add"
        },
        {
            "value": "address line one",
            "path": "/employeeAddress/addressLine1",
            "op": "replace"
        }
    ]
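To get a feel for what applying such a document involves, here is a rough, hand-rolled sketch using System.Text.Json.Nodes (.NET 6 or later). It is not how the ASP.NET Core ApplyTo method is implemented. The JsonPatchSketch class is a hypothetical helper that handles only "replace" on an object member and "add" (to an object member, or appending to an array with /-). It assumes non-null values and ignores JSON Pointer escapes and array indexes in the path. It applies two operations of the same shape as above to a small camelCase document.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration; supports only a subset of JSON Patch.
public static class JsonPatchSketch
{
    public static JsonNode Apply(JsonNode document, JsonArray operations)
    {
        foreach (JsonNode? operation in operations)
        {
            string op = operation!["op"]!.GetValue<string>();
            string path = operation["path"]!.GetValue<string>();
            // Re-parse the value so the new node is not still attached to the patch document.
            JsonNode? value = JsonNode.Parse(operation["value"]!.ToJsonString());

            // Walk down to the parent of the target member.
            string[] segments = path.TrimStart('/').Split('/');
            JsonNode parent = document;
            for (int i = 0; i < segments.Length - 1; i++)
            {
                parent = parent[segments[i]]
                    ?? throw new InvalidOperationException($"Path not found: {path}");
            }
            string target = segments[segments.Length - 1];

            if (op == "add" && target == "-" && parent is JsonArray array)
            {
                // "/-" means: append to the end of the array.
                array.Add(value);
            }
            else if (parent is JsonObject obj
                     && (op == "add" || (op == "replace" && obj.ContainsKey(target))))
            {
                obj[target] = value;
            }
            else
            {
                throw new NotSupportedException($"Cannot apply '{op}' at '{path}'");
            }
        }
        return document;
    }
}

public static class Program
{
    public static void Main()
    {
        JsonNode employee = JsonNode.Parse(@"{
            ""employeeAddress"": { ""addressLine1"": ""Addr1"" },
            ""preferredWorkLocations"": [""Brazil"", ""France""]
        }")!;

        JsonArray patch = JsonNode.Parse(@"[
            { ""op"": ""add"", ""path"": ""/preferredWorkLocations/-"", ""value"": ""new work Location"" },
            { ""op"": ""replace"", ""path"": ""/employeeAddress/addressLine1"", ""value"": ""address line one"" }
        ]")!.AsArray();

        JsonPatchSketch.Apply(employee, patch);
        Console.WriteLine(employee.ToJsonString());
    }
}
```

In a real API you would let the framework do this against your typed Employee object; the sketch is only meant to show that each operation is a small, targeted change rather than a full object replacement.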

You may refer to the below link for detailed information on how to use JSON Patch in ASP.NET Core.

JsonPatch in ASP.NET Core web API, by Tom Dykstra and Kirk Larkin (docs.microsoft.com)

So, that’s how I embraced JSON Patch.

Cheers!

Using Azure function proxies for mocking API

Sun, 14 Jun 2020 00:00:00 +0000

There are many options available when it comes to mocking API responses, such as JSON Server or even a response JSON file added to your solution, to cite a few. In this article we will see how Azure Function proxies can be used to mock API responses.

Azure Functions provides an elegant option to mock API responses using proxies. Using an Azure Function proxy, you can provide a mock endpoint that your team can use to continue their work until your actual API is ready for integration.

Let us go ahead, create a simple proxy and see how the mock response is served.

We will be creating a proxy endpoint which will serve a GET call, say, getCustomer. Our getCustomer API method is expected to provide a response in the below format. So, until getCustomer is up and ready for consumption, our proxy can be used to get the below JSON as a mock response.

Below are the steps to create a Function proxy.

Step 1: We will create an Azure function app which will host the proxy. (If there is already a general purpose / maintenance Function App present, we can use that.)

Step 2: Now that we have created the function app to host our proxy, let us create the proxy itself. Choose the “Proxies” item in the Azure function blade as shown below. Click on “Add” to create a new proxy. We will call this MockCustomerAPI.

We will provide the route /api/getcustomer. In the HTTP Method section, we select “GET”. Please note that we can choose to mock other HTTP methods, like POST, as well.

Step 3: This is the step where we provide the response we want the proxy to send back. We will override the response as shown below by expanding the “Response override” link and pasting our mock response in the space provided in the Body section.

We can provide the status code and status message as per our use case and click on “Create”. Once the proxy is created successfully, we will be provided with a link to access it, as shown below

Step 4: Now that we are done with creating the proxy, let us test it. To test our proxy, copy the generated proxy URL and open it in the browser. We will see the response as below

Thus, we have created a proxy for the getCustomer API which can be used by the UX team or other API teams to integrate during the early development cycles, when our API is not ready yet. Please do note that mocks are not just for the GET method; you can mock other HTTP methods as well.

Some of the advantages of this approach are

Mock API responses unblock collaborating teams, like the UX team, as they can work against the mock endpoint until the actual API is ready.
Testing is easier and more thorough if your API relies on a partner API. Creating a mock endpoint gives you the flexibility to change the response and test your code against all possible allowed values from the partner API.
If there is a dependency on a partner API which is not available in your lower environments, you can resort to creating a proxy in the lower environment.
As it is hosted at a common URL, the same contract is used consistently across all crews. Any change made to the contract is immediately visible to all the consuming developers.
It eliminates the need for a separate JSON response file or JSON server on the local dev box, and thus ensures you are developing against the latest contract.

Before we conclude, a note about CORS: if you are hitting the proxy from your front-end web application, please ensure you tweak the CORS settings of the mock function app accordingly, as shown below

Conclusion

There are many useful features of Azure Function proxies, like redirection and route template parameters. You can read more about Azure Function proxies in the official Microsoft documentation.

Scheduling vertical scaling using Microsoft Azure Automation Accounts

Sat, 25 Apr 2020 00:00:00 +0000

Scaling cloud resources dynamically is a fascinating topic. Microsoft Azure provides quite a few ways to dynamically scale resources. This article focuses on scheduled vertical scaling (scale up/down) of App Services. The approach outlined here can be used for other Azure resources like SQL Databases or Redis Cache, and in fact for pretty much any Azure resource that supports scaling. Just to clarify right at the outset, we are talking about vertical scaling (between pricing tiers) and not horizontal scaling (scale in/out), wherein we deal with the number of instances at our disposal.

It is common knowledge that Azure provides out-of-the-box options to scale an App Service plan out / in based on scaling rules, but there is no way to scale up / down as per a schedule.

For instance, there is no direct way to say that between 10 AM and 12 noon my App Service plan should run on P2V2 and come back to P1V2 thereafter, or that my SQL database should move up to P6 for a few hours before coming back to S2. In other words, there is no option to scale up / down based on a schedule.

Note on Serverless Azure SQL Database: Serverless Azure SQL Databases have two key capabilities which make them attractive in terms of cost: (1) the option to auto-scale up / down between a minimum and a maximum threshold, and (2) auto-pause, wherein the database is paused after a predefined period of inactivity until some activity is detected again. You don’t get charged for compute during the period of inactivity. The downside is that it takes some time for the database to warm up and be available for the next use after a period of inactivity. Serverless is best suited for test / dev environments where you can tolerate a brief period of connection unavailability during the warm-up. There could also be slight performance degradation for some time as the cache memory is gradually reclaimed. Serverless is not for your use case if these limitations are not acceptable. Furthermore, there are cases where, although your usage is limited for an interval, you cannot afford to pause the database using auto-pause. Without auto-pause you will be charged for the minimum number of vCores and minimum memory configured. For more details, refer to the Microsoft documentation on Azure SQL Database serverless.

So, if you want to use DTU-based provisioning and still want scaling based on a schedule because you have predictable utilisation, you can use the approach outlined in this article. One example that comes to mind is bumping up your DTUs for a few hours while you run a performance test, or scaling down during a seasonal / weekend period of low utilisation to save costs.

Alright, now that we have the context set, let us see how we can achieve this scheduled vertical scaling for Azure resources.

To start with, create an Automation account. Details on how to create an Automation account are available in the Azure documentation. Azure Automation allows us to invoke runbooks as per a schedule, and we will leverage this capability for our purpose. Please remember to select the option to create a Run As account while creating the Automation account, as shown below. This is the principal under which the runbooks execute.

After the Automation account is created, create two runbooks (one for scale up and another for scale down) which will be invoked by a scheduler to perform the scaling operation automatically, without our intervention. These runbooks will contain PowerShell scripts that perform the scaling operation based on your need. The fact that we use PowerShell gives us the option to scale pretty much any resource for which you can get hold of a PowerShell script to scale it, and the best source for PowerShell reference is Microsoft’s official documentation.

Now that you have the Automation account and the runbooks that you need, create a schedule and link these runbooks to it as per your need. (I won’t go into details on creating a schedule as it is simple and very well documented. Refer to Microsoft’s documentation on how to create a schedule.)

So, for instance, the below schedule will automatically call the “ScaleDown” runbook at 5:10 AM on 7th Feb

The below PowerShell script can be used to scale up / down an App Service Plan

Write-Output "API Scale"

$connectionName = "AzureRunAsConnection"
try {
    # Get the Run As connection and sign in as its service principal
    $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

    Add-AzureRmAccount -ServicePrincipal -TenantId $servicePrincipalConnection.TenantId -ApplicationId $servicePrincipalConnection.ApplicationId -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint

    # Scale the App Service plan to the target tier and size
    Set-AzureRmAppServicePlan -ResourceGroupName "<yourresourcegroup>" -Name "<yourappserviceplanname>" -Tier PremiumV2 -NumberofWorkers 2 -WorkerSize "Medium"
}
catch {
    if (!$servicePrincipalConnection) {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    }
    else {
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}

In case you receive an error like the one below, update the PowerShell modules in your Automation account. That should fix the issue.

The term ‘Set-AzureRmAppServicePlan’ is not recognized as the name of a cmdlet, function, script file, or operable program.

Below, I have provided a sample PowerShell script to scale a SQL database; it scales the database to the P1 tier. This script uses a credential to perform the DB scaling, as opposed to the Run As account used in the previous script.

To create a new credential, navigate to your Automation account and select the “Credentials” option in the “Shared Resources” section. Refer to the below screenshot showing the credential creation

param([parameter(Mandatory=$true)] [PSCredential] $Credential)

# Name of the Azure SQL Database server
[string] $SqlServerName = "yourserver.database.windows.net"

$ServerCredential = New-Object System.Management.Automation.PSCredential($Credential.UserName, (($Credential).GetNetworkCredential().Password | ConvertTo-SecureString -AsPlainText -Force))

$CTX = New-AzureSqlDatabaseServerContext -ServerName $SqlServerName -Credential $ServerCredential

[string] $DatabaseName = "yourdb"
[string] $Edition = "Premium"
[string] $PerfLevel = "P1"
$Db = Get-AzureSqlDatabase $CTX -DatabaseName $DatabaseName

Write-Output "Database scale state - $($Db.ServiceObjectiveAssignmentStateDescription)"

# Scale only if the database is not already at the target level and no scale operation is pending
if ($Db.ServiceObjectiveName -ne $PerfLevel -and $Db.ServiceObjectiveAssignmentStateDescription -ne "Pending") {
    $ServiceObjective = Get-AzureSqlDatabaseServiceObjective $CTX -ServiceObjectiveName $PerfLevel

    # Set the new edition/performance level
    # Valid editions: None, Business, Web, Premium, Basic, Standard

    Write-Output "Trigger the scale operation"
    Set-AzureSqlDatabase $CTX -Database $Db -ServiceObjective $ServiceObjective -Edition $Edition -Force

    Write-Output "Completed vertical scale"
}
else {
    Write-Output "The DB is already in the target pricing tier, or the DB is currently being scaled up / down"
}

This same approach can be applied for other Azure resources also. Go on and try it out!

Happy Scaling!