Simple Decision Making
Intelligent Decision Systems
Create AI that makes smart choices! Learn utility-based AI, decision trees, fuzzy logic, and goal-oriented action planning (GOAP) to build NPCs that adapt to changing situations! 🧠💡🎯
Understanding Decision Making
🤔 The Decision Process Analogy
Think of AI decision-making like choosing what to have for lunch:
- Options: Available choices (pizza, salad, sandwich)
- Criteria: What matters (taste, health, cost, time)
- Weights: How much each criterion matters
- Scoring: Evaluate each option
- Selection: Pick the best option
- Adaptation: Learn from outcomes
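The first five steps of the lunch analogy can be sketched as a weighted sum. This is only an illustration: the ratings and weights below are made-up values, `score` and `choose` are hypothetical helper names, and the adaptation step (learning from outcomes) is left out.

```python
# Hypothetical ratings on a 0..1 scale; higher is better for every criterion
# (so "cost" here means affordability and "time" means speed).
OPTIONS = {
    "pizza":    {"taste": 0.9, "health": 0.3, "cost": 0.6, "time": 0.8},
    "salad":    {"taste": 0.5, "health": 0.9, "cost": 0.7, "time": 0.9},
    "sandwich": {"taste": 0.7, "health": 0.6, "cost": 0.8, "time": 1.0},
}

# Weights: how much each criterion matters (assumed values that sum to 1).
WEIGHTS = {"taste": 0.4, "health": 0.3, "cost": 0.2, "time": 0.1}


def score(ratings, weights):
    """Scoring: weighted sum of one option's ratings."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def choose(options, weights):
    """Selection: return the name of the highest-scoring option."""
    return max(options, key=lambda name: score(options[name], weights))


print(choose(OPTIONS, WEIGHTS))

# Shifting the weights toward health changes the winner without touching the ratings.
HEALTHY = {"taste": 0.1, "health": 0.6, "cost": 0.2, "time": 0.1}
print(choose(OPTIONS, HEALTHY))
```

With these numbers the default weights favor the sandwich and the health-heavy weights favor the salad, which is the same "same options, different priorities" effect the utility AI below relies on.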
Decision Making Implementation
import math
from typing import Any


class UtilityAI:
    """Utility-based AI decision system"""

    def __init__(self, agent: Any) -> None:
        self.agent: Any = agent
        self.actions: list[dict[str, Any]] = []
        self.setup_actions()

    def setup_actions(self) -> None:
        """Define available actions with scoring functions"""
        self.actions = [
            {
                'name': 'eat',
                'score_func': self.score_eat,
                'requirements': lambda: self.agent.find_food() is not None
            },
            {
                'name': 'fight',
                'score_func': self.score_fight,
                'requirements': lambda: self.agent.find_enemy() is not None
            },
            {
                'name': 'flee',
                'score_func': self.score_flee,
                'requirements': lambda: self.agent.find_threat() is not None
            },
            {
                'name': 'explore',
                'score_func': self.score_explore,
                'requirements': lambda: True
            }
        ]

    def decide(self) -> str:
        """Select best action based on utility scores"""
        best_action = None
        best_score = float('-inf')
        for action in self.actions:
            # Check if action is possible
            if not action['requirements']():
                continue
            # Calculate score
            score = action['score_func']()
            # Apply personality modifiers
            score *= self.get_personality_modifier(action['name'])
            if score > best_score:
                best_score = score
                best_action = action['name']
        return best_action or 'idle'

    # Response curves
    def linear(self, value: float, min_val: float = 0, max_val: float = 1) -> float:
        """Linear response curve (clamped to [min_val, max_val])"""
        return max(min_val, min(max_val, value))

    def quadratic(self, value: float) -> float:
        """Quadratic response curve"""
        return value * value

    def exponential(self, value: float, power: float = 2) -> float:
        """Exponential (power) response curve: value ** power"""
        return math.pow(value, power)

    def sigmoid(self, value: float, steepness: float = 10, center: float = 0.5) -> float:
        """Sigmoid response curve"""
        return 1 / (1 + math.exp(-steepness * (value - center)))

    def gaussian(self, value: float, center: float = 0.5, width: float = 0.2) -> float:
        """Gaussian (bell curve) response"""
        return math.exp(-math.pow(value - center, 2) / (2 * width * width))

    # Scoring functions
    def score_eat(self) -> float:
        """Score eating action"""
        hunger = self.agent.needs['hunger']
        food = self.agent.find_food()
        if not food:
            return 0
        distance = self.agent.distance_to(food)
        # Closer food scores higher; clamp so food beyond 500 units scores 0
        distance_score = max(0.0, 1 - (distance / 500))
        # High hunger = high score
        hunger_score = self.exponential(hunger, 3)
        return hunger_score * distance_score

    def score_fight(self) -> float:
        """Score fighting action"""
        enemy = self.agent.find_enemy()
        if not enemy:
            return 0
        health = self.agent.needs['health']
        strength = self.agent.attributes['strength']
        # Consider health and strength
        combat_score = self.sigmoid(health) * strength
        # Consider enemy threat level
        threat_score = enemy.threat_level * 0.5
        return combat_score - threat_score

    def score_flee(self) -> float:
        """Score fleeing action"""
        threat = self.agent.find_threat()
        if not threat:
            return 0
        health = self.agent.needs['health']
        fear = 1 - self.agent.needs['safety']
        # Low health + high fear = flee
        flee_score = self.exponential(fear) * (1 - health)
        # Closer threats are more urgent; clamp so threats beyond 200 units score 0
        distance = self.agent.distance_to(threat)
        urgency = max(0.0, 1 - (distance / 200))
        return flee_score * urgency

    def score_explore(self) -> float:
        """Score exploration action"""
        curiosity = self.agent.personality['curiosity']
        # Assumes needs['stimulation'] measures the unmet need (higher = more bored)
        boredom = self.agent.needs['stimulation']
        safety = self.agent.needs['safety']
        # High curiosity + boredom + safety = explore
        return curiosity * boredom * safety * 0.5

    def get_personality_modifier(self, action: str) -> float:
        """Apply personality-based modifiers"""
        personality = self.agent.personality
        modifiers = {
            'fight': personality.get('aggression', 0.5),
            'flee': personality.get('caution', 0.5),
            'explore': personality.get('curiosity', 0.5),
            'eat': 1.0  # No personality modifier
        }
        return modifiers.get(action, 1.0)
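To see how much the choice of response curve matters, the curves can be evaluated side by side at a single input. The sketch below restates them as standalone functions with the same default parameters as the class, except that `exponential` uses power 3 (the value `score_eat` passes); the `compare` helper is a hypothetical name introduced only for this illustration.

```python
import math


def linear(v):
    return max(0.0, min(1.0, v))


def quadratic(v):
    return v * v


def exponential(v, p=3):
    # Power curve; p=3 matches the call made from score_eat.
    return math.pow(v, p)


def sigmoid(v, k=10, c=0.5):
    return 1 / (1 + math.exp(-k * (v - c)))


def gaussian(v, c=0.5, w=0.2):
    return math.exp(-((v - c) ** 2) / (2 * w * w))


CURVES = {
    "linear": linear,
    "quadratic": quadratic,
    "exponential": exponential,
    "sigmoid": sigmoid,
    "gaussian": gaussian,
}


def compare(value):
    """Return each curve's output for the same input value."""
    return {name: fn(value) for name, fn in CURVES.items()}


for name, out in compare(0.7).items():
    print(f"{name:12s} {out:.2f}")
# linear       0.70
# quadratic    0.49
# exponential  0.34
# sigmoid      0.88
# gaussian     0.61
```

The same need level of 0.7 therefore lands anywhere between roughly 0.34 and 0.88 depending on the curve, before distance or personality are even applied.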
[Figure: response curves used by the score_* functions]
The dashed line at input = 0.7 shows the same input value producing very different output scores depending on the curve: sigmoid behaves like a threshold (snapping to high near the center), exponential delays the response (high values dominate late), and gaussian peaks at the center then falls off on both sides. Tuning a utility AI is largely about picking the curve whose shape matches how the agent should react to that need.
Best Practices
⚡ Decision Making Tips
- Response Curves: Use different curves for different behaviors
- Weight Tuning: Expose weights for designer control
- Context Awareness: Consider environment in decisions
- Personality Traits: Add variety to NPCs
- Action Validation: Check if actions are possible
- Hysteresis: Prevent rapid switching between actions
- Debugging Tools: Visualize decision scores
- Performance: Cache calculations when possible
- Emergent Behavior: Simple rules create complex behaviors
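The hysteresis tip can be sketched as a small bonus for whichever action is already running, so a challenger has to win clearly rather than by a hair. The function name `decide_with_hysteresis`, the `scores` dict shape (action name to score, with `None` for actions whose requirements failed), and the default bonus of 0.05 are assumptions made for this example, not part of the lesson's class.

```python
def decide_with_hysteresis(scores, last_action, bonus=0.05):
    """Pick the highest-scoring action, favoring the one already in progress.

    scores: dict of action name -> float, or None if the action is not possible.
    last_action: name of the action chosen on the previous tick (or None).
    """
    best_action, best_score = None, float("-inf")
    for name, score in scores.items():
        if score is None:
            continue  # gated out by requirements, never compared
        if name == last_action:
            score += bonus  # the current action must be clearly beaten to lose
        if score > best_score:
            best_action, best_score = name, score
    return best_action


near_tie = {"eat": 0.42, "explore": 0.41}
print(decide_with_hysteresis(near_tie, last_action=None))       # eat
print(decide_with_hysteresis(near_tie, last_action="explore"))  # explore (0.46 beats 0.42)

clear_win = {"eat": 0.50, "explore": 0.41}
print(decide_with_hysteresis(clear_win, last_action="explore"))  # eat (0.50 beats 0.46)
```

A larger bonus makes the agent more stubborn; a bonus of zero reproduces the plain highest-score selection.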
Key Takeaways
- 🧠 Utility AI provides flexible, designer-friendly systems
- 📈 Response curves shape behavior naturally
- 🎯 GOAP creates goal-driven intelligent agents
- 🔀 Fuzzy logic handles uncertainty gracefully
- 🌳 Decision trees offer clear, debuggable logic
- ⚖️ Weighted scoring balances multiple concerns
- 🎭 Personality traits create unique NPCs
- 📊 Visualization helps debug and tune AI
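For the visualization point, even a text-only bar chart in the console can make score tuning easier. This is a minimal sketch: `format_scores` is a hypothetical helper, and it assumes scores are stored in a dict that maps each action name to a float, or to `None` when the action's requirements were not met.

```python
def format_scores(scores, chosen, width=20):
    """Return one text line per action: marker, name, bar, and numeric score."""
    lines = []
    for name, score in scores.items():
        if score is None:
            bar = "(requirements not met)"
        else:
            # Clamp only the bar length; the printed number stays unclamped.
            clamped = max(0.0, min(1.0, score))
            bar = "#" * round(clamped * width) + f" {score:.2f}"
        marker = "*" if name == chosen else " "
        lines.append(f"{marker} {name:<8} {bar}")
    return lines


scores = {"eat": 0.5, "fight": None, "flee": 0.1, "explore": 0.25}
for line in format_scores(scores, chosen="eat"):
    print(line)
```

Printing this once per decision (or every few ticks) shows at a glance which action won, how close the runners-up were, and which actions never entered the comparison.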
🏋️‍♂️ Practice Exercise
🏋️‍♂️ Exercise 1: Utility AI Agent with Response Curves + Personality Modifiers + Requirements Gating in One Pygame Window
Objective: Build a runnable pygame window (under 150 lines) that makes three orthogonal utility-AI disciplines visible every frame, on a 768×480 play area plus a 320px score-bar sidebar (1088×480 total). A single agent with four needs (hunger, health, safety, stim) and three personality traits (aggression, caution, curiosity) considers four candidate actions each tick (eat / fight / flee / explore) and picks the one with the highest score via a single decide() loop: if not action.requirements(): continue; score = action.score_func(agent) * personality_modifier(action); if score > best: best = action.
(a) Utility-scored runtime weighted choice: unlike a behavior tree whose Selector tries children in a fixed top-down order, decide() ranks ALL viable options by a numeric score every tick, so the same agent in the same world picks differently as its needs drift. An agent at hunger=0.3 might explore, while at hunger=0.8 the same agent prefers eat even with food much further away, because eat’s score now exceeds explore’s.
(b) Response curves shape behavior: keys 1/2/3/4/5 swap the curve used by score_eat (linear / quadratic / exponential / sigmoid / gaussian), so the SAME hunger value (e.g., 0.7) produces visibly different eat scores per curve. Sigmoid behaves like a soft threshold (snapping to high near center=0.5), exponential delays the response (high values dominate late), and gaussian peaks at the center then falls off on both sides. The sidebar bar for eat changes length without any other state changing.
(c) Personality modifiers as multiplicative weights: keys A/Z adjust aggression and C/X adjust caution. The same world state and the same scoring functions produce a different chosen action per personality (a high-aggression agent fights an enemy that a low-aggression agent flees from), which shows that personality lives entirely in the data dict and never touches the scoring code.
Keys F/E/T spawn food (gold) / enemy (red) / threat (orange) at random positions. The sidebar shows a live bar chart of all four scores per tick, with the chosen action highlighted at full color and the rest dimmed. Needs drift over time (hunger climbs +0.0008/tick, stim falls −0.0005/tick, safety drops while threats are present). The HUD shows the tick count, chosen action, all four needs, active hunger curve, and aggression value, so each of the three disciplines shows up as concrete bar lengths and score values.
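The contrast in (a) can be checked without pygame. The sketch below reuses the exercise's sigmoid hunger curve, its distance-decay factor, and its explore formula with the default curiosity of 0.5; the helper names `eat_score`, `explore_score`, `utility_pick`, and `selector_pick`, and the specific distances, are assumptions chosen for illustration.

```python
import math


def sigmoid(v, k=10, c=0.5):
    return 1 / (1 + math.exp(-k * (v - c)))


def eat_score(hunger, dist_to_food):
    # Sigmoid hunger curve times the distance-decay factor from the exercise.
    return sigmoid(hunger) * (1 - min(1.0, dist_to_food / 500))


def explore_score(stim, safety, curiosity=0.5):
    # Raw explore score times the curiosity personality modifier.
    return (1 - stim) * safety * 0.5 * curiosity


def utility_pick(scores):
    """Utility AI: rank every viable action and take the highest score."""
    return max(scores, key=scores.get)


def selector_pick(order, viable):
    """Behavior-tree-style Selector: first viable child in a fixed order."""
    for name in order:
        if name in viable:
            return name
    return None


# Mildly hungry, food nearby (50 px) vs. very hungry, food far away (400 px).
calm = {"eat": eat_score(0.3, 50), "explore": explore_score(0.5, 1.0)}
starving = {"eat": eat_score(0.8, 400), "explore": explore_score(0.5, 1.0)}

print(utility_pick(calm))      # explore (eat is about 0.11, explore is 0.125)
print(utility_pick(starving))  # eat (eat is about 0.19, explore is 0.125)

# A fixed-order Selector returns the same child in both situations.
print(selector_pick(["eat", "explore"], calm))      # eat
print(selector_pick(["eat", "explore"], starving))  # eat
```

The Selector's answer depends only on child order and viability, while the utility pick changes as the numbers change, which is the behavior the sidebar bars are meant to expose.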
Instructions:
- Initialize pygame with a 768×480 play area + 320px sidebar (1088×480 total). Create an agent object at center (W/2, H/2) with needs = {'hunger': 0.4, 'health': 1.0, 'safety': 1.0, 'stim': 0.5} and personality = {'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5}; create a world object with empty foods, enemies, threats lists.
- Define five response-curve functions as standalone defs: linear(v) = clamp 0..1; quadratic(v) = v*v; exponential(v, p=3) = v**p; sigmoid(v, k=10, c=0.5) = 1/(1+exp(−k*(v−c))); gaussian(v, c=0.5, w=0.2) = exp(−(v−c)**2/(2*w*w)). Store them in a list CURVES = [('linear', linear), ('quadratic', quadratic), ...] indexed by hunger_curve_i (default 3 = sigmoid). Same input value, different output shapes: that’s the entire point.
- Define four can_X(agent) requirement functions returning True/False (can_eat needs food in world; can_fight needs enemy; can_flee needs threat; can_explore always returns True) and four score_X(agent) functions returning a float in roughly 0..1. score_eat applies CURVES[hunger_curve_i][1](agent.needs['hunger']) times a distance-decay factor 1 − min(1.0, dist_to_food / 500); score_fight = sigmoid(agent.needs['health']); score_flee = exponential(1 − agent.needs['safety']) * (1 − agent.needs['health'] * 0.5); score_explore = (1 − agent.needs['stim']) * agent.needs['safety'] * 0.5.
- Define decide(agent) as the single utility-AI loop. Iterate ACTIONS = [(name, req_fn, score_fn), ...]: if req_fn(agent) returns False, store scores[name] = None and continue (Layer-1 binary gate; this is what prevents the agent from even considering eat when there’s no food); otherwise compute s = score_fn(agent) * personality_mod(name), where personality_mod looks up agent.personality[trait] per action via a PMOD = {'eat': None, 'fight': 'aggression', 'flee': 'caution', 'explore': 'curiosity'} table (Layer-2 continuous score). Track best_score and best_action; return both the chosen action and the full scores dict so the sidebar can render the bar chart for ALL viable actions, not just the winner.
- Define take_action(agent, name): eat moves toward nearest food at 3 px/tick and consumes when within 12 px (sets hunger=0.0); fight moves toward nearest enemy at 2.5 px/tick and removes when within 20 px (decrements health by 0.005/tick while engaged); flee moves AWAY from nearest threat at 4 px/tick (regenerates safety +0.01/tick while moving); explore jitters the agent ±2 px on each axis (regenerates stim +0.005/tick).
- Define tick_needs(agent): hunger climbs +0.0008/tick (slow drift toward starvation), stim falls −0.0005/tick (boredom builds), safety drops −0.002/tick while threats exist in the world (otherwise stable). Clamp agent.pos to play-area bounds. This is what makes the chosen action change over time even with no key presses: needs drift, scores drift, the max-score winner shifts.
- Main loop: keys F/E/T spawn food/enemy/threat at random play-area positions; keys 1/2/3/4/5 switch hunger_curve_i; A/Z adjust aggression by ±0.1 (clamped 0..1); C/X adjust caution by ±0.1; R resets agent + world; SPACE pauses. Each unpaused frame call action, scores = decide(agent), then take_action(agent, action), then tick_needs(agent), then tick_n += 1.
- Render play area: food as gold 8-px circles, enemies as red 10-px circles, threats as orange 12-px circles, agent as blue 12-px circle. Render the sidebar with one row per action: action name + a horizontal bar whose width equals int(score * 200), with the chosen action’s bar at FULL color and non-chosen bars at half-brightness; show the numeric score to the right of each bar; if the action is gated out by requirements (scores[name] is None), show “REQ FAIL” instead of a bar.
- HUD line at top of game area: tick=N | action=NAME | hung=H | hp=H | safe=S | stim=S | curve=NAME | aggr=A. Watching the eat bar length change WITHOUT changing hunger (just by pressing 1/2/3/4/5) is the response-curve discipline in action; watching the fight bar grow as you press A repeatedly (without enemies dying or health changing) is the personality-modifier discipline; watching the chosen action flip from explore to eat as hunger drifts up over time is the utility-scored runtime weighted choice in action.
- Add a sidebar legend at the bottom showing “CHOSEN: NAME” in green plus a key reminder. Watching the chosen action flicker between two near-tied options (e.g., eat=0.42 vs explore=0.41) reveals one classic utility-AI failure mode, action thrashing, which production utility AIs solve with hysteresis (a small bonus added to the currently-chosen action’s score so it has to be clearly beaten to lose, not just barely beaten). The lesson’s “Decision Making Tips” section lists hysteresis explicitly as a recommended add-on, but the demo intentionally omits it so the thrashing failure mode is visible.
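The two layers described for decide() can be exercised on their own, without a window. The sketch below is a simplified stand-in, not the exercise solution: it takes precomputed raw scores (with `None` marking a failed requirement) instead of calling req_fn and score_fn, and the raw values and the two personality dicts are invented for the demonstration.

```python
PMOD = {"eat": None, "fight": "aggression", "flee": "caution", "explore": "curiosity"}


def decide(raw_scores, personality):
    """Return (chosen action, scores dict) from raw scores and a personality dict.

    raw_scores maps action name -> float, or None when requirements failed.
    """
    best, best_s, scores = "idle", float("-inf"), {}
    for name, raw in raw_scores.items():
        if raw is None:
            scores[name] = None  # Layer 1: gated out, never compared
            continue
        trait = PMOD[name]
        mod = 1.0 if trait is None else personality[trait]
        s = raw * mod  # Layer 2: continuous score times personality weight
        scores[name] = s
        if s > best_s:
            best_s, best = s, name
    return best, scores


# Same world state for both agents: no food, an enemy that is also a threat.
raw = {"eat": None, "fight": 0.6, "flee": 0.5, "explore": 0.2}
bold = {"aggression": 0.9, "caution": 0.5, "curiosity": 0.5}
timid = {"aggression": 0.1, "caution": 0.5, "curiosity": 0.5}

print(decide(raw, bold)[0])   # fight (0.54 beats flee at 0.25)
print(decide(raw, timid)[0])  # flee (0.25 beats fight at 0.06)
print(decide(raw, bold)[1]["eat"])  # None: eat was never considered
```

Only the personality dict differs between the two calls, so any change in the chosen action comes from data rather than from the scoring code.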
💡 Hint
The single most important detail is keeping the three disciplines orthogonal in the code so you can see each one work in isolation.
Response curve: the curve choice for hunger lives entirely in CURVES[hunger_curve_i][1], applied inside score_eat. Nothing else changes when you press 1/2/3/4/5, so the eat bar’s length change is purely the curve’s output shape.
Personality modifier: this lives entirely in personality_mod(name), as a multiplicative factor outside the per-action score function. Nothing inside score_fight reads the aggression value directly, so you can raise aggression from 0.2 to 0.9 and watch the fight bar grow proportionally without any scoring-function code running differently.
Requirements gate: this lives entirely in the if not req_fn(agent): continue line in decide(). It’s a Layer-1 binary check that prevents an action from ever entering the score comparison, so “REQ FAIL” in the sidebar means the action wasn’t even considered (not that it scored badly). Keep scores as a dict that holds None for gated-out actions and the float score for considered actions, so the sidebar render can distinguish “not even considered” from “considered and lost.”
Action thrashing: leave hysteresis OUT of decide() on the first pass so the failure mode is visible. If you want to add it as a stretch goal, store last_action at module level and add a small bonus (e.g., +0.05) to that action’s score before the max comparison, which is the same ad-hoc bias the lesson’s “Decision Making Tips” section recommends.
✅ Example Solution
import pygame, sys, math, random
from typing import Any, Optional

pygame.init()
W, H, SIDEBAR = 768, 480, 320
screen = pygame.display.set_mode((W + SIDEBAR, H))
font = pygame.font.SysFont('Consolas', 13)
clock = pygame.time.Clock()

# Response curves: same input, different output shapes
def linear(v: float) -> float: return max(0.0, min(1.0, v))
def quadratic(v: float) -> float: return v * v
def exponential(v: float, p: float = 3) -> float: return math.pow(max(0.0, v), p)
def sigmoid(v: float, k: float = 10, c: float = 0.5) -> float: return 1 / (1 + math.exp(-k * (v - c)))
def gaussian(v: float, c: float = 0.5, w: float = 0.2) -> float: return math.exp(-((v - c) ** 2) / (2 * w * w))
CURVES = [('linear', linear), ('quadratic', quadratic), ('exponential', exponential),
          ('sigmoid', sigmoid), ('gaussian', gaussian)]
hunger_curve_i = 3  # default sigmoid

# Agent and world are plain attribute containers
agent = type('Ag', (), {})()
agent.pos = [W // 2, H // 2]
agent.needs = {'hunger': 0.4, 'health': 1.0, 'safety': 1.0, 'stim': 0.5}
agent.personality = {'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5}
world = type('Wd', (), {})(); world.foods, world.enemies, world.threats = [], [], []

def dist_to(a: Any, p: tuple[int, int]) -> float: return math.hypot(a.pos[0]-p[0], a.pos[1]-p[1])
def nearest(a: Any, things: list[tuple[int, int]]) -> Optional[tuple[int, int]]: return min(things, key=lambda t: dist_to(a, t)) if things else None

# Layer 1: binary requirement gates
def can_eat(a: Any) -> bool: return bool(world.foods)
def can_fight(a: Any) -> bool: return bool(world.enemies)
def can_flee(a: Any) -> bool: return bool(world.threats)
def can_explore(a: Any) -> bool: return True

# Layer 2: continuous scores
def score_eat(a: Any) -> float:
    f = nearest(a, world.foods)
    if not f: return 0.0
    d_score = 1 - min(1.0, dist_to(a, f) / 500)
    h_score = CURVES[hunger_curve_i][1](a.needs['hunger'])
    return h_score * d_score
def score_fight(a: Any) -> float:
    if not world.enemies: return 0.0
    return sigmoid(a.needs['health'])
def score_flee(a: Any) -> float:
    if not world.threats: return 0.0
    return exponential(1 - a.needs['safety']) * (1 - a.needs['health'] * 0.5)
def score_explore(a: Any) -> float:
    return (1 - a.needs['stim']) * a.needs['safety'] * 0.5

ACTIONS = [('eat', can_eat, score_eat), ('fight', can_fight, score_fight),
           ('flee', can_flee, score_flee), ('explore', can_explore, score_explore)]
PMOD = {'eat': None, 'fight': 'aggression', 'flee': 'caution', 'explore': 'curiosity'}
def personality_mod(name: str) -> float: return 1.0 if PMOD[name] is None else agent.personality[PMOD[name]]

def decide(a: Any) -> tuple[str, dict[str, Optional[float]]]:
    best, best_s, scores = 'idle', float('-inf'), {}
    for name, req, sf in ACTIONS:
        if not req(a): scores[name] = None; continue
        s = sf(a) * personality_mod(name)
        scores[name] = s
        if s > best_s: best_s, best = s, name
    return best, scores

def take_action(a: Any, name: str) -> None:
    if name == 'eat':
        f = nearest(a, world.foods)
        if f:
            if dist_to(a, f) < 12: world.foods.remove(f); a.needs['hunger'] = 0.0
            else:
                dx, dy = f[0]-a.pos[0], f[1]-a.pos[1]; d = math.hypot(dx, dy) or 1
                a.pos[0] += dx/d * 3; a.pos[1] += dy/d * 3
    elif name == 'fight':
        e = nearest(a, world.enemies)
        if e:
            if dist_to(a, e) < 20: world.enemies.remove(e)
            else:
                dx, dy = e[0]-a.pos[0], e[1]-a.pos[1]; d = math.hypot(dx, dy) or 1
                a.pos[0] += dx/d * 2.5; a.pos[1] += dy/d * 2.5
            # Fighting costs health on every tick the agent is engaged
            a.needs['health'] = max(0.0, a.needs['health'] - 0.005)
    elif name == 'flee':
        t = nearest(a, world.threats)
        if t:
            dx, dy = a.pos[0]-t[0], a.pos[1]-t[1]; d = math.hypot(dx, dy) or 1
            a.pos[0] += dx/d * 4; a.pos[1] += dy/d * 4
            a.needs['safety'] = min(1.0, a.needs['safety'] + 0.01)
    elif name == 'explore':
        a.pos[0] += random.choice([-2, 2]); a.pos[1] += random.choice([-2, 2])
        a.needs['stim'] = min(1.0, a.needs['stim'] + 0.005)

def tick_needs(a: Any) -> None:
    a.needs['hunger'] = min(1.0, a.needs['hunger'] + 0.0008)
    a.needs['stim'] = max(0.0, a.needs['stim'] - 0.0005)
    if world.threats: a.needs['safety'] = max(0.0, a.needs['safety'] - 0.002)
    a.pos[0] = max(8, min(W-8, a.pos[0])); a.pos[1] = max(8, min(H-8, a.pos[1]))

tick_n, paused = 0, False
COL_BAR = {'eat': (255,215,0), 'fight': (244,67,54), 'flee': (255,152,0), 'explore': (156,39,176)}
while True:
    for ev in pygame.event.get():
        if ev.type == pygame.QUIT: pygame.quit(); sys.exit()
        if ev.type == pygame.KEYDOWN:
            if ev.key == pygame.K_f: world.foods.append((random.randint(40, W-40), random.randint(40, H-40)))
            if ev.key == pygame.K_e: world.enemies.append((random.randint(40, W-40), random.randint(40, H-40)))
            if ev.key == pygame.K_t: world.threats.append((random.randint(40, W-40), random.randint(40, H-40)))
            if pygame.K_1 <= ev.key <= pygame.K_5: hunger_curve_i = ev.key - pygame.K_1
            if ev.key == pygame.K_a: agent.personality['aggression'] = min(1.0, agent.personality['aggression'] + 0.1)
            if ev.key == pygame.K_z: agent.personality['aggression'] = max(0.0, agent.personality['aggression'] - 0.1)
            if ev.key == pygame.K_c: agent.personality['caution'] = min(1.0, agent.personality['caution'] + 0.1)
            if ev.key == pygame.K_x: agent.personality['caution'] = max(0.0, agent.personality['caution'] - 0.1)
            if ev.key == pygame.K_r:
                agent.pos = [W//2, H//2]; agent.needs = {'hunger':0.4,'health':1.0,'safety':1.0,'stim':0.5}
                world.foods, world.enemies, world.threats = [], [], []
            if ev.key == pygame.K_SPACE: paused = not paused
    # Decide every frame so the sidebar stays live; only act when unpaused
    action, scores = decide(agent)
    if not paused: take_action(agent, action); tick_needs(agent); tick_n += 1
    # Play area
    screen.fill((44,62,80))
    for f in world.foods: pygame.draw.circle(screen, (255,215,0), (int(f[0]), int(f[1])), 8)
    for e in world.enemies: pygame.draw.circle(screen, (244,67,54), (int(e[0]), int(e[1])), 10)
    for t in world.threats: pygame.draw.circle(screen, (255,152,0), (int(t[0]), int(t[1])), 12)
    pygame.draw.circle(screen, (33,150,243), (int(agent.pos[0]), int(agent.pos[1])), 12)
    hud = f"tick={tick_n} action={action} hung={agent.needs['hunger']:.2f} hp={agent.needs['health']:.2f} safe={agent.needs['safety']:.2f} stim={agent.needs['stim']:.2f} curve={CURVES[hunger_curve_i][0]} aggr={agent.personality['aggression']:.1f}"
    screen.blit(font.render(hud, True, (255,255,255)), (8, 8))
    # Sidebar: one score bar per action
    pygame.draw.rect(screen, (30,30,30), (W, 0, SIDEBAR, H))
    screen.blit(font.render('Utility scores per action:', True, (200,200,200)), (W+10, 20))
    for i, (name, _, _) in enumerate(ACTIONS):
        s = scores.get(name); y = 50 + i * 50
        screen.blit(font.render(f'{name}:', True, (220,220,220)), (W+10, y))
        if s is None:
            screen.blit(font.render('REQ FAIL', True, (160,160,160)), (W+90, y))
        else:
            col = COL_BAR[name]
            dim = col if name == action else (col[0]//2, col[1]//2, col[2]//2)
            pygame.draw.rect(screen, dim, (W+90, y, max(1, int(s * 200)), 18))
            screen.blit(font.render(f'{s:.3f}', True, (255,255,255)), (W+295, y))
    screen.blit(font.render(f'CHOSEN: {action}', True, (76,175,80)), (W+10, 280))
    screen.blit(font.render('F=food E=enemy T=threat', True, (180,180,180)), (W+10, 320))
    screen.blit(font.render('1-5=hunger curve', True, (180,180,180)), (W+10, 340))
    screen.blit(font.render('A/Z=aggr C/X=caution', True, (180,180,180)), (W+10, 360))
    screen.blit(font.render('R=reset SPACE=pause', True, (180,180,180)), (W+10, 380))
    pygame.display.flip(); clock.tick(60)
🎯 Quick Quiz
Question 1: The lesson’s decide() method iterates all four candidate actions (eat, fight, flee, explore), computes a utility score for each viable action, and picks the action with the highest score. Compare this with a behavior-tree Selector, which tries children in a fixed top-down child-list order and picks the first one that returns SUCCESS. Which statement most accurately captures when each architecture fits and what the practical difference is on a given tick?
Question 2: The lesson’s score_eat function applies CURVES[hunger_curve_i][1](agent.needs['hunger']) — the response curve transforms the raw hunger value into a behavioral weight. With the same hunger value of 0.7, the curve choice produces dramatically different scores: linear → 0.70, quadratic → 0.49, exponential (cubed) → 0.34, sigmoid (k=10, c=0.5) → ~0.88, gaussian (c=0.5, w=0.2) → ~0.61. Why does designer-controlled curve choice matter as much as the raw input value?
Question 3: The lesson’s decide() loop multiplies each action’s raw score by a personality modifier looked up from the agent’s personality dict ({'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5} by default; fight uses aggression, flee uses caution, explore uses curiosity, eat uses no modifier). The same world state + same scoring functions produce a different chosen action depending on the personality dict — a high-aggression agent (aggression: 0.9) fights an enemy that a low-aggression agent (aggression: 0.1) flees from. What does this externalization-as-data-dict buy you?
What's Next?
Congratulations! You've completed the AI for Games section! Next, we'll dive into Networking & Multiplayer to create connected gaming experiences!