
Simple Decision Making

Intelligent Decision Systems

Create AI that makes smart choices! Learn utility-based AI, decision trees, fuzzy logic, and goal-oriented action planning (GOAP) to build NPCs that adapt to changing situations! 🧠💡🎯

Understanding Decision Making

🤔 The Decision Process Analogy

Think of AI decision-making like choosing what to have for lunch:

Decision Making Implementation

import math
from typing import Any

class UtilityAI:
    """Utility-based AI decision system"""
    def __init__(self, agent: Any) -> None:
        self.agent: Any = agent
        self.actions: list[dict[str, Any]] = []
        self.setup_actions()
    
    def setup_actions(self) -> None:
        """Define available actions with scoring functions"""
        self.actions = [
            {
                'name': 'eat',
                'score_func': self.score_eat,
                'requirements': lambda: self.agent.find_food() is not None
            },
            {
                'name': 'fight',
                'score_func': self.score_fight,
                'requirements': lambda: self.agent.find_enemy() is not None
            },
            {
                'name': 'flee',
                'score_func': self.score_flee,
                'requirements': lambda: self.agent.find_threat() is not None
            },
            {
                'name': 'explore',
                'score_func': self.score_explore,
                'requirements': lambda: True
            }
        ]
    
    def decide(self) -> str:
        """Select best action based on utility scores"""
        best_action = None
        best_score = float('-inf')
        
        for action in self.actions:
            # Check if action is possible
            if not action['requirements']():
                continue
            
            # Calculate score
            score = action['score_func']()
            
            # Apply personality modifiers
            score *= self.get_personality_modifier(action['name'])
            
            if score > best_score:
                best_score = score
                best_action = action['name']
        
        return best_action or 'idle'
    
    # Response curves
    def linear(self, value: float, min_val: float = 0, max_val: float = 1) -> float:
        """Linear response curve: the input itself, clamped to [min_val, max_val]"""
        return max(min_val, min(max_val, value))
    
    def quadratic(self, value: float) -> float:
        """Quadratic response curve"""
        return value * value
    
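    def exponential(self, value: float, power: float = 2) -> float:
        """Power-law response curve (value ** power); the lesson calls it 'exponential'"""
        # Clamp at 0 so a fractional power never receives a negative base
        return math.pow(max(0.0, value), power)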
    
    def sigmoid(self, value: float, steepness: float = 10, center: float = 0.5) -> float:
        """Sigmoid response curve"""
        return 1 / (1 + math.exp(-steepness * (value - center)))
    
    def gaussian(self, value: float, center: float = 0.5, width: float = 0.2) -> float:
        """Gaussian (bell curve) response"""
        return math.exp(-math.pow(value - center, 2) / (2 * width * width))
    
    # Scoring functions
    def score_eat(self) -> float:
        """Score eating action"""
        hunger = self.agent.needs['hunger']
        food = self.agent.find_food()
        
        if not food:
            return 0
        
        distance = self.agent.distance_to(food)
        # Clamp so food farther than 500 units scores 0 instead of going negative
        distance_score = max(0.0, 1 - (distance / 500))
        
        # High hunger = high score
        hunger_score = self.exponential(hunger, 3)
        
        return hunger_score * distance_score
    
    def score_fight(self) -> float:
        """Score fighting action"""
        enemy = self.agent.find_enemy()
        if not enemy:
            return 0
        
        health = self.agent.needs['health']
        strength = self.agent.attributes['strength']
        
        # Consider health and strength
        combat_score = self.sigmoid(health) * strength
        
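        # Consider enemy threat level
        threat_score = enemy.threat_level * 0.5
        
        # Floor at 0: a negative score multiplied by the personality modifier
        # would rank HIGHER for low aggression, inverting the trait
        return max(0.0, combat_score - threat_score)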
    
    def score_flee(self) -> float:
        """Score fleeing action"""
        threat = self.agent.find_threat()
        if not threat:
            return 0
        
        health = self.agent.needs['health']
        fear = 1 - self.agent.needs['safety']
        
        # Low health + high fear = flee
        flee_score = self.exponential(fear) * (1 - health)
        
        # Distance to threat (clamped so threats beyond 200 units give 0 urgency)
        distance = self.agent.distance_to(threat)
        urgency = max(0.0, 1 - (distance / 200))
        
        return flee_score * urgency
    
    def score_explore(self) -> float:
        """Score exploration action"""
        curiosity = self.agent.personality['curiosity']
        # Assumes needs['stimulation'] stores the unmet need (higher = more bored)
        boredom = self.agent.needs['stimulation']
        safety = self.agent.needs['safety']
        
        # High curiosity + boredom + safety = explore
        return curiosity * boredom * safety * 0.5
    
    def get_personality_modifier(self, action: str) -> float:
        """Apply personality-based modifiers"""
        personality = self.agent.personality
        
        modifiers = {
            'fight': personality.get('aggression', 0.5),
            'flee': personality.get('caution', 0.5),
            'explore': personality.get('curiosity', 0.5),
            'eat': 1.0  # No personality modifier
        }
        
        return modifiers.get(action, 1.0)

[Figure: five normalized response curves overlaid on a 0-to-1 chart: linear, quadratic, exponential (cubed), sigmoid, and gaussian. A dashed vertical line at input 0.7 crosses the curves at 0.70 (linear), 0.49 (quadratic), 0.34 (exponential), 0.88 (sigmoid), and 0.61 (gaussian). A footer ties the curves to the lesson's score_eat (exponential) and score_fight (sigmoid).]
Five response curves used by the utility AI's score_* functions. The dashed line at input = 0.7 shows the same input value producing very different output scores depending on the curve — sigmoid behaves like a threshold (snap to high near the center), exponential delays the response (high values dominate late), and gaussian peaks at the center then falls off both sides. Tuning a utility AI is largely about picking the curve whose shape matches how the agent should react to that need.
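
The worked values at input = 0.7 can be reproduced in a few lines. This is a minimal sketch that uses standalone versions of the five curves with the lesson's default parameters (power 3 for the exponential curve, as in score_eat); the CURVES dict and the curve_outputs helper are illustrative names, not part of the lesson's class.

```python
import math

def linear(v): return max(0.0, min(1.0, v))
def quadratic(v): return v * v
def exponential(v, power=3): return math.pow(v, power)
def sigmoid(v, steepness=10, center=0.5):
    return 1 / (1 + math.exp(-steepness * (v - center)))
def gaussian(v, center=0.5, width=0.2):
    return math.exp(-((v - center) ** 2) / (2 * width * width))

# Illustrative lookup table (hypothetical name, insertion order preserved)
CURVES = {
    "linear": linear,
    "quadratic": quadratic,
    "exponential": exponential,
    "sigmoid": sigmoid,
    "gaussian": gaussian,
}

def curve_outputs(value):
    """Return each curve's output for the same input, rounded to 2 decimals."""
    return {name: round(fn(value), 2) for name, fn in CURVES.items()}

outputs = curve_outputs(0.7)
for name, out in outputs.items():
    print(f"{name:12s} {out:.2f}")
# linear       0.70
# quadratic    0.49
# exponential  0.34
# sigmoid      0.88
# gaussian     0.61
```

The same input spans outputs from 0.34 to 0.88, which is why swapping the curve can change the winning action even when no need has changed.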

Best Practices

⚡ Decision Making Tips

Key Takeaways

🏋️‍♂️ Practice Exercise

🏋️‍♂️ Exercise 1: Utility AI Agent with Response Curves + Personality Modifiers + Requirements Gating in One Pygame Window

Objective: Build a runnable pygame window (the example solution is roughly 150 lines) that makes three independent utility-AI techniques visible every frame, using a 768×480 play area plus a 320px score-bar sidebar (1088×480 total). A single agent with four needs (hunger, health, safety, stim) and three personality traits (aggression, caution, curiosity) considers four candidate actions each tick (eat / fight / flee / explore) and picks the highest-scoring one in a single decide() loop: if not action.requirements(): continue; score = action.score_func(agent) * personality_modifier(action); if score > best: best = action.

(a) Utility-scored choice at runtime: unlike a behavior tree whose Selector tries children in a fixed top-down order, decide() ranks ALL viable options by numeric score every tick, so the same agent in the same world picks differently as its needs drift. An agent at hunger=0.3 might explore, while at hunger=0.8 the same agent prefers eat, even with food much further away, because eat’s score now exceeds explore’s.

(b) Response curves shape behavior: keys 1/2/3/4/5 swap the curve used by score_eat (linear / quadratic / exponential / sigmoid / gaussian), so the SAME hunger value (e.g., 0.7) produces visibly different eat scores per curve. Sigmoid behaves like a soft threshold (it snaps to high near center=0.5), exponential delays the response (only high values score strongly), and gaussian peaks at the center and falls off on both sides. The sidebar bar for eat changes length without any other state changing.

(c) Personality modifiers as multiplicative weights: keys A/Z adjust aggression and C/X adjust caution. With both an enemy and a threat in the world, the same state and the same scoring functions produce a different chosen action per personality (a high-aggression agent fights where a low-aggression agent flees), which shows that personality lives entirely in the data dict and never touches the scoring code.

Keys F/E/T spawn food (gold) / enemy (red) / threat (orange) at random positions. The sidebar shows a live bar chart of all four scores each tick, with the chosen action at full color and the rest dimmed. Needs drift over time (hunger climbs +0.0008/tick, stim falls −0.0005/tick, safety drops while threats are present). The HUD shows the tick count, chosen action, all four needs, the active hunger curve, and the aggression value.
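
The contrast in (a) can be sketched without pygame. The snippet below is a simplified illustration, not the exercise solution: selector_choice and utility_choice are hypothetical helpers, the world is reduced to a single food_dist value, and the eat score assumes the sigmoid hunger curve with the 500-px distance falloff described in the instructions.

```python
import math

def sigmoid(v, k=10.0, c=0.5):
    return 1 / (1 + math.exp(-k * (v - c)))

def selector_choice(world):
    """Behavior-tree style: the first viable child in a fixed order wins."""
    children = (("eat", world["food_dist"] is not None), ("explore", True))
    for name, viable in children:
        if viable:
            return name
    return "idle"

def utility_choice(needs, world, curiosity=0.5):
    """Utility style: score every viable action, then take the maximum."""
    scores = {}
    if world["food_dist"] is not None:  # requirements gate
        falloff = 1 - min(1.0, world["food_dist"] / 500)
        scores["eat"] = sigmoid(needs["hunger"]) * falloff
    scores["explore"] = (1 - needs["stim"]) * needs["safety"] * 0.5 * curiosity
    return max(scores, key=scores.get), scores

calm = {"hunger": 0.3, "stim": 0.5, "safety": 1.0}
starving = {"hunger": 0.8, "stim": 0.5, "safety": 1.0}
near, far = {"food_dist": 100.0}, {"food_dist": 400.0}

print(selector_choice(near), selector_choice(far))   # eat eat
print(utility_choice(calm, near)[0])                 # explore
print(utility_choice(starving, far)[0])              # eat
```

The selector picks eat whenever food exists, regardless of hunger. The utility version explores while hunger is low, even with food nearby, and switches to eat once hunger is high, even though the food is four times further away.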

Instructions:

  1. Initialize pygame with a 768×480 play area + 320px sidebar (1088×480 total). Create an agent object at center (W/2, H/2) with needs = {'hunger': 0.4, 'health': 1.0, 'safety': 1.0, 'stim': 0.5} and personality = {'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5}; create a world object with empty foods, enemies, threats lists.
  2. Define five response-curve functions as standalone defs: linear(v) = clamp 0..1; quadratic(v) = v*v; exponential(v, p=3) = v**p; sigmoid(v, k=10, c=0.5) = 1/(1+exp(−k*(v−c))); gaussian(v, c=0.5, w=0.2) = exp(−(v−c)**2/(2*w*w)). Store them in a list CURVES = [('linear', linear), ('quadratic', quadratic), ...] indexed by hunger_curve_i (default 3 = sigmoid). Same input value, different output shapes — that’s the entire point.
  3. Define four can_X(agent) requirement functions returning True/False (can_eat needs food in world; can_fight needs enemy; can_flee needs threat; can_explore always returns True) and four score_X(agent) functions returning a float in roughly 0..1. score_eat applies CURVES[hunger_curve_i][1](agent.needs['hunger']) times a distance-decay factor 1 − min(1.0, dist_to_food / 500); score_fight = sigmoid(agent.needs['health']); score_flee = exponential(1 − agent.needs['safety']) * (1 − agent.needs['health'] * 0.5); score_explore = (1 − agent.needs['stim']) * agent.needs['safety'] * 0.5.
  4. Define decide(agent) as the single utility-AI loop. Iterate ACTIONS = [(name, req_fn, score_fn), ...]: if req_fn(agent) returns False, store scores[name] = None and continue (Layer-1 binary gate — this is what prevents the agent from even considering eat when there’s no food); otherwise compute s = score_fn(agent) * personality_mod(name) where personality_mod looks up agent.personality[trait] per action via a PMOD = {'eat': None, 'fight': 'aggression', 'flee': 'caution', 'explore': 'curiosity'} table (Layer-2 continuous score). Track best_score and best_action; return both the chosen action and the full scores dict so the sidebar can render the bar chart for ALL viable actions, not just the winner.
  5. Define take_action(agent, name): eat moves toward the nearest food at 3 px/tick and consumes it when within 12 px (sets hunger=0.0); fight moves toward the nearest enemy at 2.5 px/tick, losing 0.005 health per tick while closing in, and removes the enemy when within 20 px; flee moves AWAY from the nearest threat at 4 px/tick (regenerates safety +0.01/tick while moving); explore jitters the agent ±2 px on each axis (regenerates stim +0.005/tick).
  6. Define tick_needs(agent): hunger climbs +0.0008/tick (slow drift toward starvation), stim falls −0.0005/tick (boredom builds), safety drops −0.002/tick while threats exist in the world (otherwise stable). Clamp agent.pos to play-area bounds. This is what makes the chosen action change over time even with no key presses — needs drift, scores drift, the max-score winner shifts.
  7. Main loop: keys F/E/T spawn food/enemy/threat at random play-area positions; keys 1/2/3/4/5 switch hunger_curve_i; A/Z adjust aggression by ±0.1 (clamped 0..1); C/X adjust caution by ±0.1; R resets agent + world; SPACE pauses. Each frame, call action, scores = decide(agent) so the sidebar stays live even while paused; on unpaused frames, follow it with take_action(agent, action), tick_needs(agent), and tick_n += 1.
  8. Render play area: food as gold 8-px circles, enemies as red 10-px circles, threats as orange 12-px circles, agent as blue 12-px circle. Render the sidebar with one row per action: action name + a horizontal bar whose width equals int(score * 200) with the chosen action’s bar at FULL color and non-chosen bars at half-brightness; show numeric score to the right of each bar; if the action is gated out by requirements (scores[name] is None), show “REQ FAIL” instead of a bar.
  9. HUD line at top of game area: tick=N | action=NAME | hung=H | hp=H | safe=S | stim=S | curve=NAME | aggr=A. Watching the eat bar length change WITHOUT changing hunger (just by pressing 1/2/3/4/5) is the response-curve discipline in action; watching the fight bar grow as you press A repeatedly (without enemies dying or health changing) is the personality-modifier discipline; watching the chosen action flip from explore to eat as hunger drifts up over time is the utility-scored runtime weighted choice in action.
  10. Add a sidebar legend at the bottom showing “CHOSEN: NAME” in green plus a key reminder. Watching the chosen action flicker between two near-tied options (e.g., eat=0.42 vs explore=0.41) reveals a classic utility-AI failure mode: action thrashing. Production utility AIs typically solve it with hysteresis, a small bonus added to the currently chosen action’s score so that a rival has to beat it clearly, not just barely. The lesson’s “Decision Making Tips” section lists hysteresis as a recommended add-on, but the demo intentionally omits it so the thrashing stays visible.
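
The explore-to-eat flip described in step 9 can be checked headlessly before any rendering exists. The sketch below is a simplified model under stated assumptions: the food sits at a fixed distance of 400 px (the real agent jitters while exploring), only eat and explore compete, and first_eat_tick is a hypothetical helper. The drift and regeneration rates are the ones given in steps 5 and 6.

```python
import math

def sigmoid(v, k=10.0, c=0.5):
    return 1 / (1 + math.exp(-k * (v - c)))

def first_eat_tick(food_dist=400.0, curiosity=0.5, max_ticks=2000):
    """Return (tick, history) for the first tick on which eat beats explore."""
    needs = {"hunger": 0.4, "safety": 1.0, "stim": 0.5}
    falloff = 1 - min(1.0, food_dist / 500)  # assumed constant distance
    history = []
    for tick in range(max_ticks):
        eat = sigmoid(needs["hunger"]) * falloff
        explore = (1 - needs["stim"]) * needs["safety"] * 0.5 * curiosity
        # eat is listed first in ACTIONS, so explore must be strictly greater
        action = "explore" if explore > eat else "eat"
        history.append(action)
        if action == "eat":
            return tick, history
        needs["stim"] = min(1.0, needs["stim"] + 0.005)        # explore effect
        needs["hunger"] = min(1.0, needs["hunger"] + 0.0008)   # tick_needs drift
        needs["stim"] = max(0.0, needs["stim"] - 0.0005)       # tick_needs drift
    return None, history

flip_tick, history = first_eat_tick()
print(f"explore wins for {flip_tick} ticks, then eat takes over")
```

No key is pressed and no object moves in this model; the winner changes only because hunger rises while exploring satisfies the stim need, which is the behavior the bar chart makes visible.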
💡 Hint

The single most important detail is keeping the three disciplines orthogonal in the code so you can see each one work in isolation.

The response-curve choice for hunger lives entirely in CURVES[hunger_curve_i][1], applied inside score_eat. Nothing else changes when you press 1/2/3/4/5, so any change in the eat bar’s length is purely the curve’s output shape. The personality modifier lives entirely in personality_mod(name), a multiplicative factor applied outside the per-action score function. Nothing inside score_fight reads the aggression value, so you can move aggression from 0.2 to 0.9 and watch the fight bar grow proportionally while the scoring code runs exactly as before. The requirements gate lives entirely in the if not req_fn(agent): continue line in decide(). It is a Layer-1 binary check that keeps an action out of the score comparison altogether, so “REQ FAIL” in the sidebar means the action wasn’t even considered (not that it scored badly).

Keep scores as a dict that holds None for gated-out actions and the float score for considered ones, so the sidebar can distinguish “not even considered” from “considered and lost.” For action thrashing, leave hysteresis OUT of decide() on the first pass so the failure mode is visible. As a stretch goal, store last_action at module level and add a small bonus (e.g., +0.05) to that action’s score before the max comparison, the same ad-hoc bias the lesson’s “Decision Making Tips” section recommends.
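
The hysteresis stretch goal might look like the sketch below. It is one possible shape, not the lesson's implementation: decide_with_hysteresis and replay are hypothetical names, the function takes an already computed scores dict (None for gated-out actions) instead of calling the score functions, and the 0.05 bonus is the example value from the hint.

```python
from typing import Optional

HYSTERESIS_BONUS = 0.05  # assumed value, taken from the hint's "+0.05" example

def decide_with_hysteresis(scores: dict[str, Optional[float]],
                           last_action: Optional[str],
                           bonus: float = HYSTERESIS_BONUS) -> str:
    """Pick the max-score action, favoring the action chosen last tick."""
    best, best_s = "idle", float("-inf")
    for name, s in scores.items():
        if s is None:            # gated out by requirements, never compared
            continue
        if name == last_action:  # incumbent gets a small head start
            s += bonus
        if s > best_s:
            best, best_s = name, s
    return best

def replay(frames, bonus):
    """Run a sequence of per-tick score dicts and collect the choices."""
    chosen, last = [], None
    for scores in frames:
        last = decide_with_hysteresis(scores, last, bonus)
        chosen.append(last)
    return chosen

# Two near-tied actions trading the lead, then a clear win for explore
FRAMES = [
    {"eat": 0.42, "explore": 0.41, "fight": None},
    {"eat": 0.41, "explore": 0.42, "fight": None},
    {"eat": 0.42, "explore": 0.41, "fight": None},
    {"eat": 0.41, "explore": 0.50, "fight": None},
]

print(replay(FRAMES, bonus=0.0))   # ['eat', 'explore', 'eat', 'explore']
print(replay(FRAMES, bonus=0.05))  # ['eat', 'eat', 'eat', 'explore']
```

Without the bonus the choice flips every tick on a 0.01 difference. With it, eat holds until explore leads by more than the bonus, so a genuine change in priorities still gets through.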

✅ Example Solution
import pygame, sys, math, random
from typing import Any, Optional
pygame.init()
W, H, SIDEBAR = 768, 480, 320
screen = pygame.display.set_mode((W + SIDEBAR, H))
font = pygame.font.SysFont('Consolas', 13)
clock = pygame.time.Clock()

def linear(v: float) -> float: return max(0.0, min(1.0, v))
def quadratic(v: float) -> float: return v * v
def exponential(v: float, p: float = 3) -> float: return math.pow(max(0.0, v), p)
def sigmoid(v: float, k: float = 10, c: float = 0.5) -> float: return 1 / (1 + math.exp(-k * (v - c)))
def gaussian(v: float, c: float = 0.5, w: float = 0.2) -> float: return math.exp(-((v - c) ** 2) / (2 * w * w))
CURVES = [('linear', linear), ('quadratic', quadratic), ('exponential', exponential),
          ('sigmoid', sigmoid), ('gaussian', gaussian)]
hunger_curve_i = 3  # default sigmoid

agent = type('Ag', (), {})()
agent.pos = [W // 2, H // 2]
agent.needs = {'hunger': 0.4, 'health': 1.0, 'safety': 1.0, 'stim': 0.5}
agent.personality = {'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5}
world = type('Wd', (), {})(); world.foods, world.enemies, world.threats = [], [], []

def dist_to(a: Any, p: tuple[int, int]) -> float: return math.hypot(a.pos[0]-p[0], a.pos[1]-p[1])
def nearest(a: Any, things: list[tuple[int, int]]) -> Optional[tuple[int, int]]: return min(things, key=lambda t: dist_to(a, t)) if things else None

def can_eat(a: Any) -> bool: return bool(world.foods)
def can_fight(a: Any) -> bool: return bool(world.enemies)
def can_flee(a: Any) -> bool: return bool(world.threats)
def can_explore(a: Any) -> bool: return True

def score_eat(a: Any) -> float:
    f = nearest(a, world.foods)
    if not f: return 0.0
    d_score = 1 - min(1.0, dist_to(a, f) / 500)
    h_score = CURVES[hunger_curve_i][1](a.needs['hunger'])
    return h_score * d_score

def score_fight(a: Any) -> float:
    if not world.enemies: return 0.0
    return sigmoid(a.needs['health'])

def score_flee(a: Any) -> float:
    if not world.threats: return 0.0
    return exponential(1 - a.needs['safety']) * (1 - a.needs['health'] * 0.5)

def score_explore(a: Any) -> float:
    return (1 - a.needs['stim']) * a.needs['safety'] * 0.5

ACTIONS = [('eat', can_eat, score_eat), ('fight', can_fight, score_fight),
           ('flee', can_flee, score_flee), ('explore', can_explore, score_explore)]
PMOD = {'eat': None, 'fight': 'aggression', 'flee': 'caution', 'explore': 'curiosity'}
def personality_mod(name: str) -> float: return 1.0 if PMOD[name] is None else agent.personality[PMOD[name]]

def decide(a: Any) -> tuple[str, dict[str, Optional[float]]]:
    best, best_s, scores = 'idle', float('-inf'), {}
    for name, req, sf in ACTIONS:
        if not req(a): scores[name] = None; continue
        s = sf(a) * personality_mod(name)
        scores[name] = s
        if s > best_s: best_s, best = s, name
    return best, scores

def take_action(a: Any, name: str) -> None:
    if name == 'eat':
        f = nearest(a, world.foods)
        if f:
            if dist_to(a, f) < 12: world.foods.remove(f); a.needs['hunger'] = 0.0
            else:
                dx, dy = f[0]-a.pos[0], f[1]-a.pos[1]; d = math.hypot(dx, dy) or 1
                a.pos[0] += dx/d * 3; a.pos[1] += dy/d * 3
    elif name == 'fight':
        e = nearest(a, world.enemies)
        if e:
            if dist_to(a, e) < 20: world.enemies.remove(e)
            else:
                dx, dy = e[0]-a.pos[0], e[1]-a.pos[1]; d = math.hypot(dx, dy) or 1
                a.pos[0] += dx/d * 2.5; a.pos[1] += dy/d * 2.5
                a.needs['health'] = max(0.0, a.needs['health'] - 0.005)
    elif name == 'flee':
        t = nearest(a, world.threats)
        if t:
            dx, dy = a.pos[0]-t[0], a.pos[1]-t[1]; d = math.hypot(dx, dy) or 1
            a.pos[0] += dx/d * 4; a.pos[1] += dy/d * 4
            a.needs['safety'] = min(1.0, a.needs['safety'] + 0.01)
    elif name == 'explore':
        a.pos[0] += random.choice([-2, 2]); a.pos[1] += random.choice([-2, 2])
        a.needs['stim'] = min(1.0, a.needs['stim'] + 0.005)

def tick_needs(a: Any) -> None:
    a.needs['hunger'] = min(1.0, a.needs['hunger'] + 0.0008)
    a.needs['stim'] = max(0.0, a.needs['stim'] - 0.0005)
    if world.threats: a.needs['safety'] = max(0.0, a.needs['safety'] - 0.002)
    a.pos[0] = max(8, min(W-8, a.pos[0])); a.pos[1] = max(8, min(H-8, a.pos[1]))

tick_n, paused = 0, False
COL_BAR = {'eat': (255,215,0), 'fight': (244,67,54), 'flee': (255,152,0), 'explore': (156,39,176)}

while True:
    for ev in pygame.event.get():
        if ev.type == pygame.QUIT: pygame.quit(); sys.exit()
        if ev.type == pygame.KEYDOWN:
            if ev.key == pygame.K_f: world.foods.append((random.randint(40, W-40), random.randint(40, H-40)))
            if ev.key == pygame.K_e: world.enemies.append((random.randint(40, W-40), random.randint(40, H-40)))
            if ev.key == pygame.K_t: world.threats.append((random.randint(40, W-40), random.randint(40, H-40)))
            if pygame.K_1 <= ev.key <= pygame.K_5: hunger_curve_i = ev.key - pygame.K_1
            if ev.key == pygame.K_a: agent.personality['aggression'] = min(1.0, agent.personality['aggression'] + 0.1)
            if ev.key == pygame.K_z: agent.personality['aggression'] = max(0.0, agent.personality['aggression'] - 0.1)
            if ev.key == pygame.K_c: agent.personality['caution'] = min(1.0, agent.personality['caution'] + 0.1)
            if ev.key == pygame.K_x: agent.personality['caution'] = max(0.0, agent.personality['caution'] - 0.1)
            if ev.key == pygame.K_r:
                agent.pos = [W//2, H//2]; agent.needs = {'hunger':0.4,'health':1.0,'safety':1.0,'stim':0.5}
                agent.personality = {'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5}  # reset traits too
                world.foods, world.enemies, world.threats = [], [], []
            if ev.key == pygame.K_SPACE: paused = not paused

    action, scores = decide(agent)
    if not paused: take_action(agent, action); tick_needs(agent); tick_n += 1

    screen.fill((44,62,80))
    for f in world.foods: pygame.draw.circle(screen, (255,215,0), (int(f[0]), int(f[1])), 8)
    for e in world.enemies: pygame.draw.circle(screen, (244,67,54), (int(e[0]), int(e[1])), 10)
    for t in world.threats: pygame.draw.circle(screen, (255,152,0), (int(t[0]), int(t[1])), 12)
    pygame.draw.circle(screen, (33,150,243), (int(agent.pos[0]), int(agent.pos[1])), 12)
    hud = f"tick={tick_n} action={action} hung={agent.needs['hunger']:.2f} hp={agent.needs['health']:.2f} safe={agent.needs['safety']:.2f} stim={agent.needs['stim']:.2f} curve={CURVES[hunger_curve_i][0]} aggr={agent.personality['aggression']:.1f}"
    screen.blit(font.render(hud, True, (255,255,255)), (8, 8))

    pygame.draw.rect(screen, (30,30,30), (W, 0, SIDEBAR, H))
    screen.blit(font.render('Utility scores per action:', True, (200,200,200)), (W+10, 20))
    for i, (name, _, _) in enumerate(ACTIONS):
        s = scores.get(name); y = 50 + i * 50
        screen.blit(font.render(f'{name}:', True, (220,220,220)), (W+10, y))
        if s is None:
            screen.blit(font.render('REQ FAIL', True, (160,160,160)), (W+90, y))
        else:
            col = COL_BAR[name]
            dim = col if name == action else (col[0]//2, col[1]//2, col[2]//2)
            pygame.draw.rect(screen, dim, (W+90, y, max(1, int(s * 200)), 18))
            screen.blit(font.render(f'{s:.3f}', True, (255,255,255)), (W+295, y))
    screen.blit(font.render(f'CHOSEN: {action}', True, (76,175,80)), (W+10, 280))
    screen.blit(font.render('F=food E=enemy T=threat', True, (180,180,180)), (W+10, 320))
    screen.blit(font.render('1-5=hunger curve', True, (180,180,180)), (W+10, 340))
    screen.blit(font.render('A/Z=aggr  C/X=caution', True, (180,180,180)), (W+10, 360))
    screen.blit(font.render('R=reset  SPACE=pause', True, (180,180,180)), (W+10, 380))
    pygame.display.flip(); clock.tick(60)
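
The personality-modifier behavior can also be checked without opening a window. The following is a minimal sketch that copies the fight and flee formulas and the PMOD mapping from the solution; raw_scores and choose are hypothetical helpers, and the needs values (full health, safety at 0.2) are assumed purely for illustration, with both an enemy and a threat taken to be present.

```python
import math

def sigmoid(v, k=10.0, c=0.5):
    return 1 / (1 + math.exp(-k * (v - c)))

def exponential(v, p=3.0):
    return math.pow(max(0.0, v), p)

# Which personality trait scales which action (subset of the solution's PMOD)
PMOD = {"fight": "aggression", "flee": "caution"}

def raw_scores(needs):
    """Scores before personality is applied; no trait is read here."""
    return {
        "fight": sigmoid(needs["health"]),
        "flee": exponential(1 - needs["safety"]) * (1 - needs["health"] * 0.5),
    }

def choose(needs, personality):
    """Multiply each raw score by its trait and return the winner."""
    weighted = {name: s * personality[PMOD[name]]
                for name, s in raw_scores(needs).items()}
    return max(weighted, key=weighted.get), weighted

needs = {"health": 1.0, "safety": 0.2}           # assumed world state
timid = {"aggression": 0.1, "caution": 0.5}
bold = {"aggression": 0.9, "caution": 0.5}

print(choose(needs, timid)[0])  # flee
print(choose(needs, bold)[0])   # fight
```

raw_scores returns the same numbers for both agents; only the data dict passed to choose differs, which is the separation the exercise is meant to demonstrate.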

🎯 Quick Quiz

Question 1: The lesson’s decide() method iterates all four candidate actions (eat, fight, flee, explore), computes a utility score for each viable action, and picks the action with the highest score. Compare this with a behavior-tree Selector, which tries children in a fixed top-down child-list order and picks the first one that returns SUCCESS. Which statement most accurately captures when each architecture fits and what the practical difference is on a given tick?

Question 2: The lesson’s score_eat function applies CURVES[hunger_curve_i][1](agent.needs['hunger']) — the response curve transforms the raw hunger value into a behavioral weight. With the same hunger value of 0.7, the curve choice produces dramatically different scores: linear → 0.70, quadratic → 0.49, exponential (cubed) → 0.34, sigmoid (k=10, c=0.5) → ~0.88, gaussian (c=0.5, w=0.2) → ~0.61. Why does designer-controlled curve choice matter as much as the raw input value?

Question 3: The lesson’s decide() loop multiplies each action’s raw score by a personality modifier looked up from the agent’s personality dict ({'aggression': 0.5, 'caution': 0.5, 'curiosity': 0.5} by default; fight uses aggression, flee uses caution, explore uses curiosity, eat uses no modifier). The same world state + same scoring functions produce a different chosen action depending on the personality dict — a high-aggression agent (aggression: 0.9) fights an enemy that a low-aggression agent (aggression: 0.1) flees from. What does this externalization-as-data-dict buy you?

What's Next?

Congratulations! You've completed the AI for Games section! Next, we'll dive into Networking & Multiplayer to create connected gaming experiences!