Using Model Overlays with a .modelfile


When you want to use a model without re-specifying its persona, temperature, and other attributes every time you start it, you can use the .modelfile customization approach.

Step 1: Create a .modelfile as shown below (sys_admin.modelfile)

# 1. THE BASE (Required)
FROM llama3

# 2. BRAIN PHYSICS (Parameters)
# Creativity (0.0 to 1.0+)
PARAMETER temperature 0.7
# How many tokens of context ("memory") the model has
PARAMETER num_ctx 4096
# Limits the candidate pool for each next token
PARAMETER top_k 40
# Cumulative probability threshold for token choice
PARAMETER top_p 0.9
# Discourages the model from getting stuck in a loop
PARAMETER repeat_penalty 1.1
# Tells the model exactly when to stop talking
PARAMETER stop "User:"
PARAMETER stop "---"

# 3. THE TEMPLATE (The “Skeleton” of a conversation)
# This defines how the model sees the Turn-taking between User and AI.
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# 4. THE PERSONA (System Instructions)
SYSTEM """
You are a specialized Azure Networking Assistant and System Administrator with plenty of experience.
You provide CLI commands for Linux Mint and PowerShell for Windows.
Constraints:
1. If a config is insecure, call it out immediately.
"""

# 5. PRE-LOADING (The “Conversation Starter”)
# You can bake in a “fake” memory so the model thinks it’s already talking to you.
# [OPTIONAL] ADAPTER ~/models/my-adapter # (for actual fine-tuned weights)
MESSAGE user "Check the S2S status."
MESSAGE assistant "Checking the IPsec tunnels now. One moment."

Step 2: Create an overlay on top of an existing model

Once the .modelfile is ready, create a new overlay on top of one of your existing models (the one named in the FROM line):

ollama create my-new-overlay-sysadmin -f ./sys_admin.modelfile
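After the create step, it can be worth confirming that the overlay was actually registered. The snippet below is a minimal sketch, not part of the original workflow: the helper name overlay_exists is made up for illustration, and it assumes `ollama list` prints a header row followed by one model per line with the name (plus a tag such as :latest) in the first column.

```shell
# Hypothetical helper: succeeds if a model with the given name is listed locally.
# Assumption: `ollama list` prints a header row, then one model per line,
# with NAME (including its tag, e.g. ":latest") in the first column.
overlay_exists() {
  name="$1"
  ollama list | awk -v n="$name" '
    NR > 1 { split($1, parts, ":"); if (parts[1] == n) found = 1 }
    END    { exit !found }
  '
}

# Usage (needs a working Ollama install):
#   overlay_exists my-new-overlay-sysadmin && \
#     ollama show my-new-overlay-sysadmin --modelfile
```

If the check passes, `ollama show my-new-overlay-sysadmin --modelfile` prints the Modelfile that Ollama stored, which is a quick way to see whether your parameters and system prompt made it in.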

Step 3: Create an alias for easy use

To avoid typing the full command every time, add an alias to your .bashrc file. This is the bridge between your shell and the model.

Open your config: nano ~/.bashrc
Add this line at the bottom: alias summon-admin='ollama run my-new-overlay-sysadmin'
Save and refresh: source ~/.bashrc
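Bash only expands aliases in interactive shells by default, so the alias will not be available inside scripts. If you want the same shortcut there too, a shell function is one option. This is a sketch under that assumption; the name summon_admin (with an underscore) is an illustrative choice, not something Ollama defines.

```shell
# Function alternative to the alias; it can live in ~/.bashrc as well.
# The name summon_admin is an illustrative choice.
summon_admin() {
  # "$@" forwards an optional one-shot prompt to the overlay.
  # Anything piped into the function is passed through on stdin unchanged.
  ollama run my-new-overlay-sysadmin "$@"
}
```

With this in place, `summon_admin "Why is my VPN gateway down?"` sends a single prompt, and running `summon_admin` with no arguments opens the usual interactive session.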

How it works in practice

Now, whenever you are looking at a messy config file on your machine, you just pipe the text to your new friend:

cat /etc/ssh/sshd_config | summon-admin

The model will wake up, read the file, and start grumbling about your security choices.
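Piping the raw file works, but the model then has to guess what you want done with it. A small wrapper that adds an explicit instruction is one way to make the request unambiguous. The sketch below is hypothetical: review_config and the wording of the instruction are my own choices, and it passes the file contents as a one-shot prompt argument to `ollama run`.

```shell
# Hypothetical wrapper: send a config file plus an explicit instruction to the overlay.
review_config() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "review_config: cannot read $file" >&2
    return 1
  fi
  # The file contents are embedded in a single prompt argument,
  # prefixed by the instruction (the wording is just an example).
  ollama run my-new-overlay-sysadmin "Review this config for insecure settings:

$(cat "$file")"
}
```

For example, `review_config /etc/ssh/sshd_config`. For very large files the argument may exceed the shell's limits, in which case plain piping as shown above is the safer route.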

How is this different from prompt engineering?

1. Hardware & Environment Parameters

Prompt engineering only changes the text the model reads; it cannot change the runtime settings the model is run with. A .modelfile can.

Parameter Tuning: You set things like PARAMETER temperature 0.2 (for consistency) or PARAMETER num_ctx 4096 (how much “memory” it has for your config files).
Stop Sequences: You can tell the model exactly when to stop talking (e.g., PARAMETER stop "User:"), preventing it from rambling.
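Because these settings are stored with the overlay rather than typed into a prompt, you can read them back at any time. The following is a minimal sketch using the `--parameters` and `--system` flags of `ollama show`; the helper name overlay_settings and its default argument are assumptions made for illustration.

```shell
# Hypothetical helper: print the parameters and system prompt baked into an overlay.
# Defaults to the overlay created in Step 2 when no model name is given.
overlay_settings() {
  model="${1:-my-new-overlay-sysadmin}"
  echo "== parameters =="
  ollama show "$model" --parameters
  echo "== system prompt =="
  ollama show "$model" --system
}
```

Running `overlay_settings` should list values such as temperature, num_ctx, and the stop sequences exactly as the overlay stores them.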

2. The “Persona” vs. The “Ask”

Prompt Engineering: You have to tell the model every time: “Act like a sys admin and check this file…”
Modelfile (The Base-Overlay): The persona is “baked in,” so it is already active whenever you launch your “SysAdmin” model.

3. Layered Inheritance (The “FROM” command)

This is the part that is impossible with just prompting.

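In a .modelfile, the first line is usually FROM llama3 (or whichever model you use). This is inheritance: the overlay starts from everything the base model already provides and layers your parameters, template, and system prompt on top.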
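Layering can also go one level further, since FROM can name a model you created yourself. The sketch below derives a stricter variant from the overlay built in Step 2. The function name, the file name sys_admin_strict.modelfile, and the new model name are all hypothetical, and the expectation that the child keeps the parent's system prompt and template is an assumption worth verifying with `ollama show` on your own install.

```shell
# Hypothetical: derive a lower-temperature variant from the Step 2 overlay.
# Writes a two-line Modelfile into the given directory (default: current one),
# then asks Ollama to build a new model from it.
make_strict_overlay() {
  dir="${1:-.}"
  cat > "$dir/sys_admin_strict.modelfile" <<'EOF'
FROM my-new-overlay-sysadmin
PARAMETER temperature 0.2
EOF
  ollama create my-new-overlay-sysadmin-strict -f "$dir/sys_admin_strict.modelfile"
}
```

Only the temperature is restated here; everything else is meant to come from the parent overlay, which in turn builds on llama3.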
