Autonomous Infrastructure Guardian

It never blinks.

Every existing monitoring tool is a tripwire. It waits. It alerts. It pages a human. The human is the intelligence — the tool is just a sensor.

Marshal is the intelligence. It watches your servers, services, and systems — not as a passive monitor waiting for thresholds to breach, but as an active agent that understands context, detects developing problems before they become incidents, and acts with calibrated confidence.

The human is the exception — consulted only when the blast radius demands it.

System Class: Infrastructure Guardian
LLM in Stack: None
Decision Model: Deterministic
Phase 0: Complete ✓
Audit Log: Append-only
Action Tiers: 4 (configurable)
Sensors: CPU/Mem/Disk/HTTP
Observe-Only: First 7 Days
The Three Principles

Not a better monitor. A different thing entirely.

Monitoring tools operate on thresholds. Marshal operates on context. These three principles define what separates an autonomous agent from a sophisticated alert system.

Principle 01
Proactive over Reactive
Marshal acts before thresholds breach. A signal that climbs steadily toward 85% CPU over 20 minutes is a different situation from a sudden spike to 85%, and Marshal treats the two differently. It acts in the low 70s while the signal is still trending, not at 85% once the threshold has been breached.
TRENDING signal at t=0: 62%
TRENDING signal at t=10m: 68%
TRENDING signal at t=20m: 74%
→ NOTIFY_AND_ACT at 74%, not at 85%
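The trend in the readings above can be quantified as a rate of change. The following is a minimal sketch, assuming a least-squares slope over (minute, value) samples; the function names `velocity_per_minute` and `minutes_until` are illustrative and do not come from Marshal itself.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (minute, value)


def velocity_per_minute(samples: Sequence[Sample]) -> float:
    """Least-squares slope of the samples, in units per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def minutes_until(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Projected minutes until `limit` at the current slope, or None if not rising."""
    slope = velocity_per_minute(samples)
    if slope <= 0:
        return None
    _, latest = samples[-1]
    return max(limit - latest, 0.0) / slope


# The three readings from the example: 62% -> 68% -> 74% over 20 minutes.
cpu = [(0.0, 62.0), (10.0, 68.0), (20.0, 74.0)]
print(velocity_per_minute(cpu))   # rise in percentage points per minute
print(minutes_until(cpu, 85.0))   # projected minutes until the 85% line
```

A sudden spike to 85% would produce no such lead time, which is the distinction the principle draws between a trend and a breach.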
Principle 02
Contextual over Absolute
The same CPU spike means different things at 2am on a Sunday versus 2pm on a Tuesday. Before a deploy versus 20 minutes after one. In isolation versus alongside a correlated memory trend. Marshal tracks all of it.
85% CPU at 03:00 Sunday → ANOMALY
85% CPU at 14:00 Tuesday → normal
Same signal. Opposite response.
Temporal baseline: 168-cell grid
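One way to picture the 168-cell grid is as one baseline per hour of the week (24 hours × 7 days). The sketch below is an assumption about how such a grid could be kept, not Marshal's actual storage: the class `TemporalBaseline`, the in-memory lists, and the small standard-deviation floor are all hypothetical. The 10-sample minimum per cell is the figure quoted under Phase 3.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Dict, List, Optional

MIN_SAMPLES = 10  # per-cell minimum quoted under Phase 3


def cell_index(ts: datetime) -> int:
    """Map a timestamp to one of 168 hour-of-week cells (Monday 00:00 is cell 0)."""
    return ts.weekday() * 24 + ts.hour


class TemporalBaseline:
    """Hypothetical in-memory grid: one list of samples per hour-of-week cell."""

    def __init__(self) -> None:
        self.cells: Dict[int, List[float]] = defaultdict(list)

    def record(self, ts: datetime, value: float) -> None:
        self.cells[cell_index(ts)].append(value)

    def deviation_sigma(self, ts: datetime, value: float) -> Optional[float]:
        """Distance from this cell's mean in sigmas, or None if the cell is too sparse."""
        samples = self.cells.get(cell_index(ts), [])
        if len(samples) < MIN_SAMPLES:
            return None
        sd = max(pstdev(samples), 1e-6)  # assumed floor to avoid dividing by zero
        return abs(value - mean(samples)) / sd


baseline = TemporalBaseline()
sunday_3am = datetime(2024, 1, 7, 3, 0)    # a Sunday
tuesday_2pm = datetime(2024, 1, 2, 14, 0)  # a Tuesday
quiet = [18, 20, 22, 19, 21, 20, 18, 22, 21, 19]
busy = [78, 82, 85, 80, 88, 75, 84, 81, 86, 79]
for week, (q, b) in enumerate(zip(quiet, busy)):
    baseline.record(sunday_3am + timedelta(weeks=week), q)
    baseline.record(tuesday_2pm + timedelta(weeks=week), b)

print(baseline.deviation_sigma(sunday_3am, 85.0))   # far outside the quiet cell
print(baseline.deviation_sigma(tuesday_2pm, 85.0))  # within the busy cell's spread
```

With made-up sample data like this, the same 85% reading is dozens of sigmas out at 03:00 Sunday and under one sigma at 14:00 Tuesday.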
Principle 03
Learning over Rules
Marshal builds a model of this environment, not generic best practices. "Restart nginx" reliably resolves memory leaks on this service but never fixes latency spikes. Marshal learns from outcome tracking and stops recommending things that don't work here.
restart_nginx + memory_leak → 0.91
restart_nginx + latency_spike → 0.22
→ Marshal stops using it for latency
→ Surfaces to human with note
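The scores above can be read as a rolling mean of outcomes per (action, signal type) pair, as the Phase 2 description puts it. A minimal sketch follows, under several assumptions: the numeric score assigned to each outcome label, the window size, and the 0.5 cutoff are placeholders, and `EffectivenessTracker` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# Assumed scoring of the Phase 1 outcome labels; UNKNOWN is treated as no evidence.
OUTCOME_SCORE = {"RESOLVED": 1.0, "PARTIAL": 0.5, "FAILED": 0.0}

Key = Tuple[str, str]  # (action, signal_type)


class EffectivenessTracker:
    def __init__(self, window: int = 50, min_effective: float = 0.5) -> None:
        self.min_effective = min_effective  # assumed cutoff, not from the source
        self.history: Dict[Key, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)  # rolling window of recent outcomes
        )

    def record(self, action: str, signal_type: str, outcome: str) -> None:
        score = OUTCOME_SCORE.get(outcome)
        if score is not None:
            self.history[(action, signal_type)].append(score)

    def effectiveness(self, action: str, signal_type: str) -> Optional[float]:
        scores = self.history.get((action, signal_type))
        return sum(scores) / len(scores) if scores else None

    def should_use(self, action: str, signal_type: str) -> bool:
        """Untried actions stay eligible; proven-ineffective ones are dropped."""
        eff = self.effectiveness(action, signal_type)
        return eff is None or eff >= self.min_effective


tracker = EffectivenessTracker()
for outcome in ["RESOLVED"] * 10 + ["FAILED"]:
    tracker.record("restart_nginx", "memory_leak", outcome)
for outcome in ["RESOLVED"] * 2 + ["FAILED"] * 7:
    tracker.record("restart_nginx", "latency_spike", outcome)

print(round(tracker.effectiveness("restart_nginx", "memory_leak"), 2))
print(round(tracker.effectiveness("restart_nginx", "latency_spike"), 2))
print(tracker.should_use("restart_nginx", "latency_spike"))
```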
The Confidence Gate

Every action earns its autonomy.

The Confidence Gate is the core decision function. It weighs signal urgency against action risk and returns one of four tiers. This gate is what separates Marshal from both passive monitors and naive automation.

Autonomous
Marshal writes the action to the audit DB, executes it, and continues its patrol. No human notification required. Used only for reversible, well-understood actions with proven outcome history in this environment.
high confidence + low risk
Notify & Act
Marshal executes and simultaneously notifies a human. The action proceeds — but a human knows it happened and can intervene. For medium-risk actions where urgency outweighs the cost of waiting for approval.
med confidence + med risk
Recommend
Marshal surfaces a specific action recommendation with full signal context. A human approves or dismisses. Default tier during Phase 0 (observe-only period). Used for all high-risk or low-confidence decisions indefinitely.
med confidence + high risk
Observe
Signal is noted and accumulated. Too early to act — either confidence is low or the deviation hasn't persisted long enough. Re-evaluated every patrol tick. Signals that persist through Observe accumulate toward threshold.
low confidence
decision_tier = gate(signal_significance, action_confidence, action_risk, uptime_days)

if uptime_days < observe_only_days (7): force RECOMMEND regardless of confidence
if action.confirmation_required: force RECOMMEND regardless of risk score

AUTONOMOUS: significance ≥ auto_threshold AND risk ≤ 0.35 AND confidence ≥ auto_confidence
NOTIFY_AND_ACT: significance ≥ notify_threshold AND risk ≤ 0.65
RECOMMEND: significance ≥ recommend_threshold
OBSERVE: default (significance below all thresholds)
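The rules above translate fairly directly into a small function. The sketch below makes two assumptions that are not in the source. First, the threshold values for `auto_threshold`, `notify_threshold`, `recommend_threshold`, and `auto_confidence` are placeholders; only the risk ceilings (0.35, 0.65) and the 7-day observe-only window come from the text. Second, "force RECOMMEND" is interpreted as a ceiling, so a signal below every threshold still stays in OBSERVE.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    OBSERVE = 0
    RECOMMEND = 1
    NOTIFY_AND_ACT = 2
    AUTONOMOUS = 3


@dataclass(frozen=True)
class GateConfig:
    auto_threshold: float = 0.80       # placeholder value
    notify_threshold: float = 0.60     # placeholder value
    recommend_threshold: float = 0.40  # placeholder value
    auto_confidence: float = 0.85      # placeholder value
    auto_max_risk: float = 0.35        # from the rules above
    notify_max_risk: float = 0.65      # from the rules above
    observe_only_days: int = 7         # from the rules above


def gate(significance: float, confidence: float, risk: float, uptime_days: float,
         confirmation_required: bool = False,
         cfg: GateConfig = GateConfig()) -> Tier:
    if (significance >= cfg.auto_threshold and risk <= cfg.auto_max_risk
            and confidence >= cfg.auto_confidence):
        tier = Tier.AUTONOMOUS
    elif significance >= cfg.notify_threshold and risk <= cfg.notify_max_risk:
        tier = Tier.NOTIFY_AND_ACT
    elif significance >= cfg.recommend_threshold:
        tier = Tier.RECOMMEND
    else:
        tier = Tier.OBSERVE

    # Assumed reading of "force RECOMMEND": cap the tier, never raise it.
    forced = uptime_days < cfg.observe_only_days or confirmation_required
    if forced and tier.value > Tier.RECOMMEND.value:
        tier = Tier.RECOMMEND
    return tier


print(gate(0.90, 0.90, 0.20, uptime_days=30))  # all AUTONOMOUS conditions met
print(gate(0.90, 0.90, 0.20, uptime_days=3))   # inside the observe-only window
print(gate(0.90, 0.90, 0.50, uptime_days=30))  # risk too high for AUTONOMOUS
```

Checking the tiers from most to least autonomous means the first match is always the highest tier the signal qualifies for, and the two override rules are applied last so they cannot be bypassed.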
The Audit Invariant
Every autonomous action Marshal takes is written to the append-only audit log before execution. This is non-negotiable and cannot be disabled. If execution fails mid-command, the attempt is still recorded. The audit log is the source of truth — not the execution result. Borrowed from AiMe's Evidence Ledger invariant: the system is always accountable for its own behavior.
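The invariant can be illustrated with SQLite from the Python standard library. This is a sketch under stated assumptions, not Marshal's schema: the `audit` table, its columns, the event names, and the helper `execute_with_audit` are hypothetical. Append-only behavior is approximated here with triggers that reject UPDATE and DELETE.

```python
import sqlite3
import time
from typing import Callable


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, ts REAL NOT NULL, "
        "action TEXT NOT NULL, event TEXT NOT NULL, detail TEXT)"
    )
    # Triggers make existing rows immutable, so the table can only grow.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS audit_no_{op.lower()} "
            f"BEFORE {op} ON audit "
            "BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END"
        )
    conn.commit()
    return conn


def _append(conn: sqlite3.Connection, action: str, event: str, detail: str = "") -> None:
    conn.execute(
        "INSERT INTO audit (ts, action, event, detail) VALUES (?, ?, ?, ?)",
        (time.time(), action, event, detail),
    )
    conn.commit()


def execute_with_audit(conn: sqlite3.Connection, action: str,
                       run: Callable[[], None]) -> bool:
    _append(conn, action, "ATTEMPT")  # committed before anything executes
    try:
        run()
    except Exception as exc:
        _append(conn, action, "FAILED", repr(exc))  # the attempt row remains
        return False
    _append(conn, action, "SUCCEEDED")
    return True


def failing_action() -> None:
    raise RuntimeError("service did not restart")


conn = open_audit_log()
ok = execute_with_audit(conn, "rotate_logs", lambda: None)
failed_ok = execute_with_audit(conn, "restart_app", failing_action)
events = [row[0] for row in conn.execute("SELECT event FROM audit ORDER BY id")]

try:
    conn.execute("DELETE FROM audit")
    tamper_blocked = False
except sqlite3.Error:
    tamper_blocked = True
print(events, tamper_blocked)
```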
The Patrol Loop

Collect. Analyze. Gate. Act.

Marshal runs a continuous patrol loop — every tick, it collects all sensor readings, analyzes deviation from baseline, scores significance, passes through the confidence gate, and executes or notifies. Fail-open: a tick exception never stops the loop.

Step 1
Collect

All active sensors fire: CPU, memory, disk, swap, load average, network I/O, and HTTP endpoint health and latency. Every reading is written to the append-only signal store (signals.db, WAL mode).
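As a rough illustration of the collect step, the sketch below reads two metrics with the standard library and writes them to a SQLite file opened in WAL mode. Only the file name `signals.db` and the WAL setting come from the text; the `signals` table layout and the function names are assumptions, and a real sensor set would cover far more than disk usage and load average.

```python
import os
import shutil
import sqlite3
import tempfile
import time
from typing import Dict


def collect_system() -> Dict[str, float]:
    """Read a few system metrics using only the standard library."""
    usage = shutil.disk_usage(os.path.abspath(os.sep))
    readings = {"disk_used_pct": 100.0 * usage.used / usage.total}
    if hasattr(os, "getloadavg"):  # not available on Windows
        try:
            readings["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # load average could not be obtained on this host
    return readings


def open_signal_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # write-ahead logging
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signals ("
        "ts REAL NOT NULL, sensor TEXT NOT NULL, "
        "metric TEXT NOT NULL, value REAL NOT NULL)"
    )
    conn.commit()
    return conn


def write_readings(conn: sqlite3.Connection, sensor: str,
                   readings: Dict[str, float]) -> None:
    now = time.time()
    conn.executemany(
        "INSERT INTO signals (ts, sensor, metric, value) VALUES (?, ?, ?, ?)",
        [(now, sensor, metric, value) for metric, value in readings.items()],
    )
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    conn = open_signal_store(os.path.join(tmp, "signals.db"))
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    write_readings(conn, "system", collect_system())
    stored = conn.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
    conn.close()
print(journal_mode, stored)
```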

Step 2
Analyze

The pattern engine computes deviation (σ from baseline), velocity (rate of change per minute), and duration (minutes since the deviation threshold was exceeded). Correlation detection flags co-anomalous signals.
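The three quantities in this step can be computed from a short window of samples. The sketch below is one possible reading, with assumptions labeled: the 2σ deviation threshold is a placeholder, velocity is taken as the simple change between the first and last sample, and `analyze` and `Analysis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Analysis:
    deviation_sigma: float   # current distance from baseline, in sigmas
    velocity: float          # units per minute across the window
    duration_minutes: float  # length of the current unbroken anomalous run


def analyze(samples: Sequence[Tuple[float, float]], baseline_mean: float,
            baseline_std: float, sigma_threshold: float = 2.0) -> Analysis:
    """`samples` are (minute, value) pairs, oldest first, with at least one entry."""
    std = max(baseline_std, 1e-9)  # assumed floor to avoid dividing by zero
    sigmas = [abs(value - baseline_mean) / std for _, value in samples]

    (t0, v0), (t1, v1) = samples[0], samples[-1]
    velocity = (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0

    # Walk backwards from the newest sample until the run of anomalies breaks.
    duration = 0.0
    for (t, _), sigma in zip(reversed(samples), reversed(sigmas)):
        if sigma < sigma_threshold:
            break
        duration = t1 - t
    return Analysis(sigmas[-1], velocity, duration)


window = [(0.0, 50.0), (5.0, 52.0), (10.0, 70.0), (15.0, 75.0), (20.0, 80.0)]
result = analyze(window, baseline_mean=50.0, baseline_std=5.0)
print(result)
```

In this made-up window the signal has been more than 2σ from baseline since minute 10, so the duration is 10 minutes even though the window covers 20.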

Step 3
Gate

Each anomalous signal is scored for significance. The confidence gate evaluates action candidates from the playbook against current signal context and returns AUTONOMOUS / NOTIFY_AND_ACT / RECOMMEND / OBSERVE.

Step 4
Act

Autonomous actions execute (after the audit write). Notify-and-act actions execute and fire a webhook. Recommendations surface to the dashboard. Outcomes are tracked and fed back into action effectiveness weights.
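The four steps run inside a loop that is described as fail-open: an exception in one tick must not stop the next. A minimal sketch of that property follows. The `patrol` function, its `max_ticks` parameter, and the injectable `sleep` are hypothetical conveniences for demonstration, not part of Marshal.

```python
import logging
import time
from typing import Callable, List, Optional

log = logging.getLogger("patrol")


def patrol(tick: Callable[[], None], interval_s: float,
           max_ticks: Optional[int] = None,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `tick` repeatedly and return the number of ticks that raised."""
    failures = 0
    count = 0
    while max_ticks is None or count < max_ticks:
        try:
            tick()  # collect, analyze, gate, act
        except Exception:
            failures += 1
            log.exception("patrol tick %d failed; continuing", count)
        count += 1
        sleep(interval_s)
    return failures


calls: List[int] = []


def flaky_tick() -> None:
    calls.append(len(calls))
    if len(calls) == 2:
        raise RuntimeError("sensor timeout")


failed = patrol(flaky_tick, interval_s=0.0, max_ticks=3, sleep=lambda _s: None)
print(failed, len(calls))
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` free to stop the loop, which is usually what an operator wants.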

Signal Significance Formula

Borrowed from Ethos. Calibrated for infrastructure.

Marshal's significance formula shares structural DNA with Ethos's resistance scoring — an additive formula with independently tunable components. Signal urgency, not just current value, determines action tier. Range: 0.0 → 1.0.

Deviation Score 0 – 0.45 (primary component)
σ from baseline, capped at 5σ. A 2.1σ deviation scores 0.189. A 4.2σ deviation scores 0.378. The score scales linearly up to the cap, so outliers beyond 5σ add nothing further and cannot dominate the total. Formula: min(deviation_sigma / 5.0, 1.0) × 0.45
Duration Bonus 0 – 0.30 (persistence matters)
A signal that has been anomalous for 30+ minutes scores the full duration bonus. A brief spike does not. Rewards sustained anomalies over transient noise. Formula: min(duration_minutes / 30.0, 1.0) × 0.30
Velocity Bonus 0 – 0.15 (trending amplified)
Rate of change per minute (capped at 5 units/min). A signal trending hard upward gets additional urgency even before deviation becomes severe. This is what enables proactive action. Formula: min(velocity / 5.0, 1.0) × 0.15
Correlation Bonus 0 – 0.10 (cascade detection)
Active in Phase 4. When multiple signals are simultaneously anomalous in correlated directions, each receives a bonus. A single disk anomaly is bad. CPU + memory + error rate simultaneously trending together suggests a cascade — act faster. Formula: min(correlated_count × 0.025, 0.10)
significance = dev_score + dur_bonus + vel_bonus + corr_bonus  (range: 0.0 → 1.0)

dev_score = min(deviation_sigma / 5.0, 1.0) × 0.45
dur_bonus = min(duration_minutes / 30.0, 1.0) × 0.30
vel_bonus = min(velocity / 5.0, 1.0) × 0.15
corr_bonus = min(correlated_count × 0.025, 0.10) ← Phase 4

Example: deviation=4.2σ, duration=15min, velocity=0.3/min, correlation=0
→ 0.378 + 0.150 + 0.009 + 0.000 = 0.537 → RECOMMEND tier
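The formula and the worked example can be checked with a few lines of Python. This sketch follows the four component formulas as written, with one labeled assumption: negative inputs (for example, a falling signal's velocity) are clamped to zero so that each component stays inside its documented range.

```python
def significance(deviation_sigma: float, duration_minutes: float,
                 velocity: float, correlated_count: int = 0) -> float:
    def unit(x: float) -> float:
        # Clamp to [0, 1]; the lower clamp is an assumption, the upper is the min().
        return max(0.0, min(x, 1.0))

    dev_score = unit(deviation_sigma / 5.0) * 0.45
    dur_bonus = unit(duration_minutes / 30.0) * 0.30
    vel_bonus = unit(velocity / 5.0) * 0.15
    corr_bonus = max(0.0, min(correlated_count * 0.025, 0.10))  # Phase 4
    return dev_score + dur_bonus + vel_bonus + corr_bonus


# The worked example: deviation=4.2σ, duration=15min, velocity=0.3/min, correlation=0
score = significance(4.2, 15.0, 0.3, 0)
print(round(score, 3))
```

Because the components are additive and independently capped, the maximum of 1.0 is reached only when all four are saturated at once.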
Build Phases

Phase 0 runs today. Everything compounds.

Each phase is independently deployable and builds on the previous one. Phase 0 alone is more capable than most monitoring setups. Everything beyond Phase 0 turns data into intelligence.

Phase 0 ✓
Foundation
Signal collection, significance scoring, human-confirmed recommendations. System sensor (CPU/memory/disk/network), HTTP health sensor, significance formula, confidence gate (RECOMMEND-only), append-only audit log, webhook notifications. Smoke test passing.
Complete
Phase 1
Autonomous Actions
AUTONOMOUS tier enabled. Low-risk, reversible actions execute without human confirmation — rotate_logs, clear_tmp, restart_app. Audit-before-execute invariant enforced. Outcome tracking: RESOLVED/PARTIAL/FAILED/UNKNOWN.
7d Phase 0 data
Phase 2
Learning Loop
Marshal tracks the outcome of every autonomous action and keeps a rolling mean effectiveness per (action, signal_type) pair. Action selection incorporates these effectiveness weights. If "restart nginx" has 0.22 effectiveness for latency, Marshal stops using it and says so.
1wk Phase 1 data
Phase 3
Temporal Intelligence
168-cell baseline grid per signal (24 hours × 7 days). 3am Sunday and 2pm Tuesday have separate baselines. This eliminates false positives from predictable periodic load, so alert fatigue drops and the AUTONOMOUS threshold can be lowered safely.
2wks data, 10 samples/cell
Phase 4
Correlation Engine
Multi-signal anomaly detection. Co-anomalous signals detected by direction alignment and onset time offset (<5 min). Correlation bonus in significance formula. Root-signal identification for cascades — act on the cause, not the symptoms.
Phase 3 baselines
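The Phase 4 description names two tests for co-anomalous signals: direction alignment and an onset offset under 5 minutes. A small sketch of both follows. The `Anomaly` record and function names are hypothetical, and picking the earliest onset as the root signal is an assumed heuristic, since the text does not say how the root is identified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Anomaly:
    signal: str
    onset_minute: float
    direction: int  # +1 rising, -1 falling


def correlated_count(target: Anomaly, active: List[Anomaly],
                     max_offset_min: float = 5.0) -> int:
    """Count other anomalies moving the same way that began within the offset."""
    return sum(
        1
        for other in active
        if other.signal != target.signal
        and other.direction == target.direction
        and abs(other.onset_minute - target.onset_minute) < max_offset_min
    )


def root_signal(active: List[Anomaly]) -> Optional[Anomaly]:
    """Assumed heuristic: the anomaly with the earliest onset is the likely cause."""
    return min(active, key=lambda a: a.onset_minute) if active else None


active = [
    Anomaly("memory", onset_minute=100.0, direction=+1),
    Anomaly("cpu", onset_minute=102.0, direction=+1),
    Anomaly("error_rate", onset_minute=103.5, direction=+1),
    Anomaly("disk_free", onset_minute=101.0, direction=-1),
    Anomaly("latency", onset_minute=140.0, direction=+1),
]
cpu = active[1]
count = correlated_count(cpu, active)
corr_bonus = min(count * 0.025, 0.10)  # correlation bonus from the significance formula
print(count, corr_bonus, root_signal(active).signal)
```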
Phase 5
REST API + Dashboard
Full REST API: signal history, active anomalies, action effectiveness, patrol control, watch rules, audit log. Single-file dashboard with signal grid, pending recommendations, approve/dismiss surface, live patrol status.
Phase 1+
Phase 6
Log + Process Sensors
Tail log files for ERROR/WARN/EXCEPTION patterns (log sensor). Per-process CPU, memory, threads, restart count (process sensor). Enables application-level visibility on top of system-level signals.
Phase 0 architecture
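The log sensor's pattern matching could look something like the sketch below, which counts ERROR, WARN, and EXCEPTION occurrences in a batch of lines. The regular expression and the function name `count_levels` are assumptions. As written, the pattern also counts longer tokens that begin with those words, such as WARNING, under the shorter label.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the three patterns named above at the start of a word.
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN|EXCEPTION)")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count pattern hits across lines; a line may contribute more than one hit."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(LEVEL_PATTERN.findall(line))
    return counts


sample_lines = [
    "2024-01-07 03:00:01 INFO request served in 12ms",
    "2024-01-07 03:00:02 WARN connection pool at 90%",
    "2024-01-07 03:00:03 ERROR upstream timed out",
    "2024-01-07 03:00:03 ERROR unhandled EXCEPTION in worker 4",
    "2024-01-07 03:00:04 WARNING retrying request",
]
levels = count_levels(sample_lines)
print(dict(levels))
```

Counts like these could then be fed into the same significance scoring as any other signal.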
Phase 7
Verum Integration
Marshal's audit log exported in Ethos-compatible JSONL format. Ethos ingests it as doc_type="action" passages. Verum scores Marshal's behavioral alignment against documented SRE best practices. An infrastructure guardian that can prove its own integrity.
Ethos + Verum live
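JSONL means one JSON object per line. The sketch below shows the general shape of such an export. Only `doc_type="action"` comes from the description above; every other field name (`ts`, `action`, `tier`, `outcome`) is a hypothetical placeholder, since the Ethos schema is not specified here.

```python
import json
from typing import Iterable, List, Mapping


def to_jsonl(entries: Iterable[Mapping[str, object]]) -> str:
    """Serialize audit entries as JSONL, tagging each record as an action passage."""
    lines: List[str] = []
    for entry in entries:
        record = {**entry, "doc_type": "action"}  # doc_type always wins
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical audit entries; field names are placeholders.
audit_entries = [
    {"ts": 1704596400.0, "action": "rotate_logs", "tier": "AUTONOMOUS", "outcome": "RESOLVED"},
    {"ts": 1704600000.0, "action": "restart_app", "tier": "RECOMMEND", "outcome": "UNKNOWN"},
]
exported = to_jsonl(audit_entries)
print(exported, end="")
```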
The Product Family

Systems that do more than react.

Four products. One architecture. One philosophy. All four share the same underlying idea: systems that understand context, act with calibrated confidence, and are accountable for every decision.

Personal Cognition
AiMe
The bond between AI and person — cognitive operating layer. Model-agnostic companion that knows you.
Active
Behavioral Corpus
Ethos
The corpus — what integrity looks like, extracted from the full human record. 15 values, resistance-weighted, deterministic.
Phase 0+
AI Evaluation
Verum
The evaluation layer — verified truth, applied in real time to AI outputs. Corpus-grounded certification. Verified by Verum.
Live
Infrastructure
Marshal
The knight — autonomous infrastructure guardian. Always on patrol. Authority to act. Accountable for every decision.
Phase 0 ✓
The Phase 7 loop: Marshal generates the evidence. Ethos extracts the values. Verum certifies the alignment. An infrastructure guardian that can prove its own behavioral integrity via an external evaluation pipeline is a fundamentally different product from one that just claims it works. The product moat is the loop itself — the more Marshal acts, the more behavioral evidence accumulates, the stronger the Verum certification becomes.
"Every existing monitor waits for a human to decide.
Marshal decides.
The human is the exception."