From Zero to Stateful: Building a PowerShell Chat Completion with Microsoft Foundry

Did you know that large language models don’t actually remember your conversations?
Every time you chat with an LLM, the conversation feels continuous and stateful, but under the hood each response is generated from scratch. Any “memory” you experience is carefully reconstructed by the application.
In this post, I’ll walk you through building a PowerShell chat completion using Microsoft Foundry, and we’ll unpack exactly how chat history works, where it’s stored, and why the LLM remains fundamentally stateless.
What Are We Building?
We’ll use an LLM deployed in Microsoft Foundry and interact with it through a PowerShell script to create a simple chat experience. This example will clearly demonstrate how conversation history is managed outside the model and passed explicitly with each request.
Here is what the end result will look like:
Let’s dive into the architecture.
Architecture

Here is the flow of operations:
- The user requests an authentication token from Entra ID (see the sketch after this list)
- The user sends the prompt to the Foundry LLM endpoint
- Intelligent routing (model): the model deployment acts as a router, forwarding your prompt to the least busy available model fleet in Microsoft's cluster
- Model fleet (Standard/Global/etc.): this is where the magic happens. The model fleet runs inference on the prompt and returns the response to you
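Step 1 of that flow, for example, boils down to a single Azure CLI call; the bearer token it returns is what authorizes the REST request in step 2. A minimal sketch, using the same call the Phase 2 script makes:
$token = (az account get-access-token --resource https://cognitiveservices.azure.com/ | ConvertFrom-Json).accessToken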
Prerequisites
- An Azure subscription with appropriate permissions
The Stack
- Microsoft Foundry
Phase 1: Microsoft Foundry Deployment
In this phase, we will provision the "brain" (Microsoft Foundry) and configure chat completion authentication.
1. Create Foundry Resource (The Backend)
- Create a new Resource Group called rg-foundry-xxxxx. East US or East US 2 is the recommended location, because that's where model fleet capacity is most widely available.
- Create a new Microsoft Foundry resource:
Resource name: foundry-eus-xxxxx
Region: East US 2, or the same as the Resource Group location
Default project name: proj-xxxxx

2. Role Assignment
- Go to the Foundry resource you just created > Access control (IAM) > Role assignments
- Click +Add > Add role assignment
- Search for Cognitive Services OpenAI User > Click Next
- Click +Select members > Select your user account > Click Review + assign
Note:
Now your user account has permission to create a completion for chat messages from a chosen model.
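If you prefer scripting over the portal, the same role can be granted with the Azure CLI. A minimal sketch, assuming you substitute your own resource group, Foundry resource name, and user principal (all placeholders below are hypothetical):
# Placeholders: <foundry-resource-name>, <rg-name>, <user@tenant.com> are yours to fill in
$scope = az cognitiveservices account show --name "<foundry-resource-name>" --resource-group "<rg-name>" --query id -o tsv
az role assignment create --assignee "<user@tenant.com>" --role "Cognitive Services OpenAI User" --scope $scope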
3. Deploy LLM (Large Language Model)
- Go to the Foundry resource > Overview > Go to Foundry portal

- Make sure the New Foundry toggle is on. This will bring you to the new Foundry portal.
- Click Build.
- On the left panel, select Models > Deploy a base model.
- You will see that Foundry has more than 11K models available. Search for gpt-4o-mini and select it.
- Click Deploy > Custom settings > Deploy.
Deployment type:
There are several deployment types; by default, Foundry creates a Global Standard deployment. To learn more about the other options, see Deployment types for Microsoft Foundry Models.
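The deployment itself can also be created from the CLI instead of the portal. A sketch assuming the default Global Standard SKU; the model version and capacity values are placeholders, so check them against what the portal offers:
# Placeholders: resource/group names, <model-version>, and capacity are assumptions
az cognitiveservices account deployment create `
    --name "<foundry-resource-name>" `
    --resource-group "<rg-name>" `
    --deployment-name "gpt-4o-mini" `
    --model-name "gpt-4o-mini" `
    --model-format "OpenAI" `
    --model-version "<model-version>" `
    --sku-name "GlobalStandard" `
    --sku-capacity 1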
4. Endpoint
- On the left panel, select Models.
- Click the Details tab next to the Playground tab.
- Copy the Target URI.
Endpoint & Key:
This endpoint is the target URI we will call via REST. There are two options for authenticating to the API: a key or RBAC. In this post we will use RBAC, so we won't do anything with the key.
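For an Azure OpenAI-style deployment, the Target URI you copy typically has roughly this shape; the host suffix and api-version vary by resource and over time, so always copy the real value from the portal rather than constructing it by hand:
https://<foundry-resource-name>.cognitiveservices.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=<api-version>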
In Phase 1, you created a Microsoft Foundry resource, configured permissions for your account, and deployed a language model. With this in place, you are now ready to begin building and testing chat interactions.
Phase 2: PowerShell Chat Completion (The Frontend)
In Phase 2, we’ll use PowerShell to interact with the language model deployed earlier.
1. Basic Chat Completion API Call
Here is the plan:
- First, we create a PowerShell script that sends a prompt to the LLM. The prompt will contain two messages in JSON format:
- System message [pre-defined in the script]
- User message [user input]
- In the script, we use the Azure CLI to authenticate to Azure and acquire the authentication token.
- Prepare the API payload.
- Call the Foundry model endpoint.
GPTCompletion.ps1
# 1. Foundry Endpoint and model
$endpoint = "[TARGET URI FROM PREVIOUS STEP]"
# 2. Initialize System Prompt
$messages = @(
@{ role = "system"; content = "You are a helpful assistant." }
)
# 3. Get User Input
$userInput = Read-Host "`nYou"
if ([string]::IsNullOrWhiteSpace($userInput)) { return }
# Append User Message
$messages += @{ role = "user"; content = $userInput }
# 4. Get Auth Token (Using Azure CLI)
$tokenResponse = az account get-access-token --resource https://cognitiveservices.azure.com/ | ConvertFrom-Json
$headers = @{
"Authorization" = "Bearer $($tokenResponse.accessToken)"
"Content-Type" = "application/json"
}
# 5. Prepare Payload
$body = @{
messages = $messages
max_tokens = 800
temperature = 0.7
} | ConvertTo-Json -Depth 10
# 6. Call API
try {
$response = Invoke-RestMethod -Uri $endpoint -Method Post -Headers $headers -Body $body
$GPTReply = $response.choices[0].message.content
# Display Reply
Write-Host "`nGPT: " -NoNewline -ForegroundColor Green
Write-Host $GPTReply
}
catch {
Write-Error "Request failed: $_"
}
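Before running the script, sign in with the Azure CLI so the token acquisition in step 4 succeeds:
# One-time sign-in; az account get-access-token reuses the cached credentials
az login
.\GPTCompletion.ps1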
This is what it looks like when you run it:

What we learn:
An LLM has no memory of past conversations. Each response is generated fresh, based only on the text you send in that request. If you don’t send it again, it doesn’t “remember” it.
2. Chat Completion with memory (Session-Based Memory)
Basically, LLMs don't remember conversations; applications remember them and remind the LLM every time. So we will build a conversation memory and send it to the LLM.

Let's build the memory.
Modify the # 2. Initialize System Prompt section to the following:
############################
# 2. Initialize or Load History
if (Test-Path $historyFile) {
Write-Host "Loading existing conversation history..." -ForegroundColor Cyan
# -Raw reads the file as a single string so ConvertFrom-Json parses the whole array (required on Windows PowerShell 5.1)
$messages = Get-Content $historyFile -Raw | ConvertFrom-Json
} else {
Write-Host "Starting a new conversation..." -ForegroundColor Cyan
$messages = @(
@{ role = "system"; content = "You are a helpful assistant." }
)
}
############################
Add this section just after the Show GPT Reply step:
############################
# 7. Update History and Save to File
$messages += @{ role = "assistant"; content = $GPTReply }
$messages | ConvertTo-Json -Depth 10 | Set-Content $historyFile
Write-Host "`n[History saved to $historyFile]" -ForegroundColor Gray
############################
GPTCompletionWithHistory.ps1 (COMPLETE SCRIPT)
# 1. Configuration & Persistence
$historyFile = "chat_history.json"
$endpoint = "[TARGET URI FROM PREVIOUS STEP]"
############################
# 2. Initialize or Load History
if (Test-Path $historyFile) {
Write-Host "Loading existing conversation history..." -ForegroundColor Cyan
# -Raw reads the file as a single string so ConvertFrom-Json parses the whole array (required on Windows PowerShell 5.1)
$messages = Get-Content $historyFile -Raw | ConvertFrom-Json
} else {
Write-Host "Starting a new conversation..." -ForegroundColor Cyan
$messages = @(
@{ role = "system"; content = "You are a helpful assistant." }
)
}
############################
# 3. Get User Input
$userInput = Read-Host "`nYou"
if ([string]::IsNullOrWhiteSpace($userInput)) { return }
# Append User Message
$messages += @{ role = "user"; content = $userInput }
# 4. Get Auth Token (Using Azure CLI)
$tokenResponse = az account get-access-token --resource https://cognitiveservices.azure.com/ | ConvertFrom-Json
$headers = @{
"Authorization" = "Bearer $($tokenResponse.accessToken)"
"Content-Type" = "application/json"
}
# 5. Prepare Payload
$body = @{
messages = $messages
max_tokens = 800
temperature = 0.7
} | ConvertTo-Json -Depth 10
# 6. Call API
try {
$response = Invoke-RestMethod -Uri $endpoint -Method Post -Headers $headers -Body $body
$GPTReply = $response.choices[0].message.content
# Show GPT Reply
Write-Host "`nGPT: " -NoNewline -ForegroundColor Green
Write-Host $GPTReply
############################
# 7. Update History and Save to File
$messages += @{ role = "assistant"; content = $GPTReply }
$messages | ConvertTo-Json -Depth 10 | Set-Content $historyFile
Write-Host "`n[History saved to $historyFile]" -ForegroundColor Gray
############################
}
catch {
Write-Error "Request failed: $_"
}
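One practical note: because the history lives in a plain JSON file, resetting the conversation is simply a matter of deleting that file:
# Start a brand-new conversation by removing the saved history
Remove-Item .\chat_history.json -ErrorAction SilentlyContinue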
Now try to run the script again and ask the same questions.

You can see that the LLM is now able to carry on a continuous conversation with us.
Let's look at what's in the chat_history.json file we created:
[
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "Do you know Google Gemini ?",
"role": "user"
},
{
"content": "Yes, Google Gemini is an AI model developed by Google that is designed for various applications in natural language processing, image understanding, and other AI-related tasks. It was introduced as part of Google's efforts to enhance its AI capabilities and compete with other advanced models in the industry. Gemini aims to integrate multimodal capabilities, meaning it can process and understand both text and images, making it versatile for a range of applications.\n\nIf you have specific questions about Google Gemini or its functionalities, feel free to ask!",
"role": "assistant"
},
{
"content": "How much is it ?",
"role": "user"
},
{
"content": "As of my last update in October 2023, Google had not publicly disclosed specific pricing details for using Google Gemini. The cost for access to AI models like Gemini can vary based on factors such as the level of service, usage volume, and whether it is integrated into existing Google Cloud services.\n\nFor the most accurate and up-to-date information on pricing, I recommend checking Google's official website or contacting their sales team directly. They can provide details tailored to your needs and usage scenarios.",
"role": "assistant"
}
]
Gotcha! We send the entire conversation history to the LLM in JSON format with every request, which creates the illusion that the model has memory. In reality, the model only sees the text included in the prompt for that single inference.
As the conversation grows longer, the prompt grows with it. This directly increases token consumption (a quick way to estimate the growth is sketched after this list), which leads to:
- Higher costs
- Slower response times
- Hitting context window limits
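You can watch that growth directly from the history file. A rough heuristic, assuming about 4 characters per token for English text (real tokenizers vary, so treat this strictly as an estimate):
$raw = Get-Content .\chat_history.json -Raw
# ~4 characters per token is a common rule of thumb for English prose
$approxTokens = [math]::Ceiling($raw.Length / 4)
Write-Host "History: $($raw.Length) characters, roughly $approxTokens tokens sent with every request"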
So the natural question is: what do we do about it?
3. Common Strategies to Control Token Usage
- Truncate Old Messages - Keep only the most recent messages in the prompt
- Summarize Conversation History - Replace older messages with a concise summary
- Store Long-Term Context Outside the Prompt - Persist important information externally (DB, cache, vector store)
In this post, we will implement the first one: Truncate Old Messages.
We need to cap how much chat history we send while always keeping the system prompt, so the model's behavior stays consistent.
How are we going to do that? We trim the chat history to the last 10 messages, with logic that always preserves the system prompt at index [0]. Alongside the system prompt, we then grab only the latest 9 messages.
############################
# 2.1 Trim History to Last 10 Messages
$maxHistory = 10
if ($messages.Count -gt $maxHistory) {
# Keep the System Prompt [0] It is important for conversation moving forward
# Then grab only the latest 9 messages
$messages = @($messages[0]) + @($messages[-($maxHistory - 1)..-1])
}
############################
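The slice leans on PowerShell's negative range indexing: $array[-9..-1] returns the last nine elements in their original order. A quick console demo:
$a = 1..12
@($a[0]) + @($a[-9..-1])
# Output: 1 4 5 6 7 8 9 10 11 12  (the first element, then the last nine)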
GPTCompletionWithMaxHistory.ps1 (COMPLETE SCRIPT)
# 1. Configuration & Persistence
$historyFile = "chat_history.json"
$endpoint = "[TARGET URI FROM PREVIOUS STEP]"
# 2. Initialize or Load History
if (Test-Path $historyFile) {
Write-Host "Loading existing conversation history..." -ForegroundColor Cyan
# -Raw reads the file as a single string so ConvertFrom-Json parses the whole array (required on Windows PowerShell 5.1)
$messages = Get-Content $historyFile -Raw | ConvertFrom-Json
} else {
Write-Host "Starting a new conversation..." -ForegroundColor Cyan
$messages = @(
@{ role = "system"; content = "You are a helpful assistant." }
)
}
############################
# 2.1 Trim History to Last 10 Messages
$maxHistory = 10
if ($messages.Count -gt $maxHistory) {
# Keep the System Prompt [0] It is important for conversation moving forward
# Then grab only the latest 9 messages
$messages = @($messages[0]) + @($messages[-($maxHistory - 1)..-1])
}
############################
# 3. Get User Input
$userInput = Read-Host "`nYou"
if ([string]::IsNullOrWhiteSpace($userInput)) { return }
# Append User Message
$messages += @{ role = "user"; content = $userInput }
# 4. Get Auth Token (Using Azure CLI)
$tokenResponse = az account get-access-token --resource https://cognitiveservices.azure.com/ | ConvertFrom-Json
$headers = @{
"Authorization" = "Bearer $($tokenResponse.accessToken)"
"Content-Type" = "application/json"
}
# 5. Prepare Payload
$body = @{
messages = $messages
max_tokens = 800
temperature = 0.7
} | ConvertTo-Json -Depth 10
# 6. Call API
try {
$response = Invoke-RestMethod -Uri $endpoint -Method Post -Headers $headers -Body $body
$GPTReply = $response.choices[0].message.content
# Show Assistant Reply
Write-Host "`nGPT: " -NoNewline -ForegroundColor Green
Write-Host $GPTReply
# 7. Update History and Save to File
$messages += @{ role = "assistant"; content = $GPTReply }
$messages | ConvertTo-Json -Depth 10 | Set-Content $historyFile
Write-Host "`n[History saved to $historyFile]" -ForegroundColor Gray
}
catch {
Write-Error "Request failed: $_"
}
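After a few more turns, you can confirm the cap is working by counting what's persisted. With $maxHistory = 10, the file should never hold more than 12 entries: the 10 trimmed messages plus the new user and assistant messages appended after trimming.
# Count the messages currently saved to disk
(Get-Content .\chat_history.json -Raw | ConvertFrom-Json).Count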
Conclusion
LLMs don't remember conversations; applications do.
By understanding how chat history is constructed and sent to the model, you can build AI solutions that are predictable, efficient, and cost-aware. With PowerShell and Microsoft Foundry, even simple scripts can demonstrate the core architectural patterns behind modern AI chat systems.