Skip to main content

Command Palette

Search for a command to run...

Using the Agrona library (Part 6)

Agents & Idle Strategies

Published
8 min read

"The Agent is Agrona's answer to the question: 'What if your thread were a perfectly tuned event loop?'"

Agrona's Agent framework provides a composable, duty-cycle model for building concurrent services. Instead of thread pools and blocking queues, you get single-threaded event loops with pluggable idle strategies that control the CPU-vs-latency trade-off.


What is an Agent?

An Agent is a unit of work with a simple contract: implement doWork() and return the amount of work done. Agrona's AgentRunner calls doWork() in a tight loop, using an IdleStrategy to decide what to do when there's no work.


The Agent Interface

import org.agrona.concurrent.Agent;

public interface Agent {
    
    // Called in a loop by AgentRunner.
    // Return: number of work items processed (0 means "nothing to do")
    int doWork() throws Exception;
    
    // Human-readable name (for logging, monitoring)
    String roleName();
    
    // Called once when AgentRunner starts the agent
    default void onStart() { }
    
    // Called once when AgentRunner shuts down the agent
    default void onClose() { }
}

A Simple Agent

public class OrderAgent implements Agent {
    private final OneToOneRingBuffer ringBuffer;
    private final AtomicCounter ordersProcessed;
    
    public OrderAgent(OneToOneRingBuffer ringBuffer, AtomicCounter counter) {
        this.ringBuffer = ringBuffer;
        this.ordersProcessed = counter;
    }
    
    @Override
    public int doWork() {
        // Read messages from ring buffer — return count of messages read
        return ringBuffer.read((msgTypeId, buffer, index, length) -> {
            processOrder(buffer, index, length);
            ordersProcessed.increment();
        });
    }
    
    @Override
    public String roleName() {
        return "order-processor";
    }
    
    @Override
    public void onStart() {
        System.out.println("Order processor starting...");
    }
    
    @Override
    public void onClose() {
        System.out.println("Order processor shutting down...");
    }
}

AgentRunner — The Event Loop

AgentRunner wraps an Agent and runs it on a dedicated thread:

import org.agrona.concurrent.AgentRunner;
import org.agrona.concurrent.SleepingMillisIdleStrategy;
import org.agrona.concurrent.ShutdownSignalBarrier;

// 1. Create the agent
Agent orderAgent = new OrderAgent(ringBuffer, counter);

// 2. Choose an idle strategy
IdleStrategy idleStrategy = new SleepingMillisIdleStrategy(1);

// 3. Create and start the AgentRunner
AgentRunner agentRunner = new AgentRunner(
    idleStrategy,                          // What to do when idle
    Throwable::printStackTrace,            // Error handler
    null,                                  // AtomicCounter for errors (optional)
    orderAgent                             // The agent to run
);

// 4. Start on a new thread
AgentRunner.startOnThread(agentRunner);

// 5. Wait for shutdown signal (e.g., Ctrl+C)
ShutdownSignalBarrier barrier = new ShutdownSignalBarrier();
barrier.await();

// 6. Clean shutdown
agentRunner.close();

What AgentRunner.run() Looks Like Internally

// Simplified AgentRunner event loop
public void run() {
    agent.onStart();
    
    while (isRunning) {
        try {
            int workCount = agent.doWork();
            idleStrategy.idle(workCount);
        } catch (Throwable t) {
            errorHandler.onError(t);
        }
    }
    
    agent.onClose();
}

Idle Strategies — The CPU Knob

Idle strategies control what happens when doWork() returns 0 (no work available). This is where you tune the latency vs CPU usage trade-off:

BusySpinIdleStrategy — Minimum Latency

// Burns 100% CPU — but gives nanosecond response time
IdleStrategy idle = new BusySpinIdleStrategy();
// Uses Thread.onSpinWait() (Java 9+) which issues a PAUSE instruction
// Tells the CPU "I'm spinning" → saves power and avoids pipeline stalls

Use when: You have a dedicated CPU core and need absolute minimum latency. Typical in trading systems where latency is worth more than electricity.

YieldingIdleStrategy — Good Balance

// Yields the CPU when idle, but resumes quickly
IdleStrategy idle = new YieldingIdleStrategy();
// Progression: spin N times → Thread.yield()

Use when: Low-latency is important but you want to share the CPU occasionally.

BackoffIdleStrategy — Adaptive

// Progressively backs off: spin → yield → park (sleep)
IdleStrategy idle = new BackoffIdleStrategy(
    100,   // maxSpins before yield
    10,    // maxYields before parking
    1,     // minParkPeriodNs (1 ns minimum)
    1_000  // maxParkPeriodNs (1 μs maximum)
);

Use when: Load is bursty — aggressive during bursts, low CPU when quiet.

SleepingMillisIdleStrategy — Low CPU

// Sleeps for a fixed duration when idle
IdleStrategy idle = new SleepingMillisIdleStrategy(1); // 1 ms sleep
// Uses LockSupport.parkNanos()

Use when: Latency of milliseconds is acceptable and CPU efficiency matters.

The Decision Matrix

Strategy Latency CPU Idle Best For
BusySpinIdleStrategy ~ns 100% Ultra-low-latency (dedicated core)
NoOpIdleStrategy ~ns 100% Benchmarking
YieldingIdleStrategy ~1 μs 80-100% Low-latency, shared cores
BackoffIdleStrategy ~1-100 μs Adaptive Bursty workloads
SleepingMillisIdleStrategy ~1 ms 0-5% Background processing
SleepingIdleStrategy ~μs Low Configurable sleep in ns

Composite Agents — Combining Duties

Run multiple agents on a single thread using composition:

import org.agrona.concurrent.CompositeAgent;

Agent orderAgent = new OrderAgent(ringBuffer1, counter1);
Agent marketDataAgent = new MarketDataAgent(ringBuffer2, counter2);
Agent heartbeatAgent = new HeartbeatAgent(heartbeatCounter);

// All three agents share a single thread!
Agent compositeAgent = new CompositeAgent(
    orderAgent,
    marketDataAgent,
    heartbeatAgent
);

AgentRunner runner = new AgentRunner(
    new BackoffIdleStrategy(100, 10, 1, 1000),
    Throwable::printStackTrace,
    null,
    compositeAgent
);

AgentRunner.startOnThread(runner);

Why compose on one thread?

  • Fewer threads = fewer context switches

  • Agents that share data don't need synchronization

  • Easier to reason about (no concurrency within a thread)

  • Common in Aeron: media driver + conductor on one thread


Agent Lifecycle

Graceful Shutdown

// Pattern: Use ShutdownSignalBarrier for clean shutdown
ShutdownSignalBarrier barrier = new ShutdownSignalBarrier();

AgentRunner runner = new AgentRunner(idle, errorHandler, null, agent);
AgentRunner.startOnThread(runner);

// Main thread waits for SIGTERM/SIGINT
barrier.await();

// Clean shutdown: stops the loop, calls agent.onClose()
try (runner) {
    // AutoCloseable — close() is called by try-with-resources
}

Building a Complete Service

Here's a complete example: a message processing service with telemetry:

public class TradingService implements Agent, AutoCloseable {
    private final OneToOneRingBuffer inboundRingBuffer;
    private final OneToOneRingBuffer outboundRingBuffer;
    private final AtomicCounter messagesReceived;
    private final AtomicCounter messagesPublished;
    private final AtomicCounter errors;
    private final UnsafeBuffer scratchBuffer = new UnsafeBuffer(new byte[256]);
    
    @Override
    public int doWork() {
        int workDone = 0;
        
        // Duty 1: Read from inbound ring buffer
        workDone += inboundRingBuffer.read((msgTypeId, buffer, index, length) -> {
            messagesReceived.increment();
            
            try {
                // Process the message
                byte[] result = processOrder(buffer, index, length);
                
                // Write result to outbound ring buffer
                scratchBuffer.putBytes(0, result);
                outboundRingBuffer.write(2, scratchBuffer, 0, result.length);
                messagesPublished.increment();
                
            } catch (Exception e) {
                errors.increment();
            }
        }, 10); // Read up to 10 messages per duty cycle
        
        return workDone;
    }
    
    @Override
    public String roleName() {
        return "trading-service";
    }
    
    @Override
    public void onStart() {
        System.out.println("[" + roleName() + "] Starting...");
    }
    
    @Override
    public void onClose() {
        System.out.println("[" + roleName() + "] Processed: " +
            messagesReceived.get() + " messages");
    }

    @Override
    public void close() {
        messagesReceived.close();
        messagesPublished.close();
        errors.close();
    }
}

// Bootstrap:
public static void main(String[] args) {
    // ... set up ring buffers, counters ...
    
    TradingService service = new TradingService(inbound, outbound, counters);
    
    AgentRunner runner = new AgentRunner(
        new BackoffIdleStrategy(100, 10, 1, 1000),
        (t) -> System.err.println("Error: " + t.getMessage()),
        errorCounter,
        service
    );
    
    AgentRunner.startOnThread(runner);
    
    new ShutdownSignalBarrier().await();
    runner.close();
}

AgentRunner vs Thread Pools

Aspect AgentRunner + Agent ExecutorService
Threading model 1 thread per Agent(Runner) Pool of worker threads
Scheduling Continuous duty cycle Task submission
Idle behavior Configurable IdleStrategy Thread.park() or steal
Latency Nanoseconds (busy spin) Microseconds (context switch)
Composition CompositeAgent (same thread) Not composable
Lifecycle onStart / onClose None
Error handling Pluggable ErrorHandler UncaughtExceptionHandler
CPU affinity Easy (pin thread to core) Hard
Use case Latency-sensitive event loops General concurrent workloads

Common Pitfalls

Pitfall 1: Blocking in doWork()

// ❌ NEVER block in doWork()!
@Override
public int doWork() {
    String line = bufferedReader.readLine(); // BLOCKS the entire agent!
    Thread.sleep(100);                       // BLOCKS the entire agent!
    socket.accept();                         // BLOCKS the entire agent!
    return 1;
}

// ✅ Only do non-blocking work in doWork()
@Override
public int doWork() {
    return ringBuffer.read(handler, 10); // Non-blocking polling
}

Pitfall 2: Burning CPU Without Awareness

// ❌ Using BusySpinIdleStrategy when you don't need it
AgentRunner runner = new AgentRunner(
    new BusySpinIdleStrategy(), // 100% CPU even when idle!
    errorHandler, null, agent
);

// ✅ Use BackoffIdleStrategy for most use cases
AgentRunner runner = new AgentRunner(
    new BackoffIdleStrategy(100, 10, 1, 1_000_000),
    errorHandler, null, agent
);

Pitfall 3: Forgetting to Close

// ❌ Leaking the thread!
AgentRunner.startOnThread(runner);
// ... never calling runner.close()

// ✅ Always close on shutdown
try {
    barrier.await();
} finally {
    CloseHelper.quietClose(runner);
}

Summary

The one-liner: Agrona's Agent framework provides a duty-cycle event loop with pluggable idle strategies — giving you sub-microsecond latency with continuous polling or millisecond latency with minimal CPU usage, all with a single knob.