Risoluto provides multiple observability surfaces: Prometheus metrics, real-time event streams, audit logs, request tracing, and structured logs.

Prometheus Metrics

Available at GET /metrics in Prometheus exposition format. Scrape at 10-15 second intervals.
curl -s http://127.0.0.1:4000/metrics
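To make the exposition format concrete, here is a minimal Python sketch that parses text like the output of the curl command above into (name, labels, value) samples. The sample payload is hypothetical (real label values and counts will differ), and the parser is deliberately simplified: it ignores timestamps and does not unescape label values.

```python
import re

# Hypothetical sample of /metrics output; real values will differ.
SAMPLE = """\
# HELP risoluto_http_requests_total Total HTTP requests
# TYPE risoluto_http_requests_total counter
risoluto_http_requests_total{method="GET",status="200"} 120
risoluto_http_requests_total{method="POST",status="500"} 3
risoluto_webhook_dlq_count 0
"""

# metric_name, optional {labels}, then the sample value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that does not look like a sample
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

For anything beyond a quick check, a maintained Prometheus client library is a better choice than a hand-rolled parser.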

Metric reference

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| risoluto_http_requests_total | Counter | method, status | Total HTTP requests |
| risoluto_http_request_duration_seconds | Histogram | method | Request latency distribution |
Useful queries:
# Error rate (5xx)
sum(rate(risoluto_http_requests_total{status=~"5.."}[5m]))
/ sum(rate(risoluto_http_requests_total[5m]))

# P95 latency
histogram_quantile(0.95, rate(risoluto_http_request_duration_seconds_bucket[5m]))
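As a rough illustration of what histogram_quantile does with the _bucket series, the Python sketch below estimates a quantile from cumulative buckets by linear interpolation inside the bucket that contains the target rank. The bucket boundaries and counts are invented for the example, and the function is a simplified model of the PromQL behaviour (it requires 0 < q <= 1), not Prometheus's actual implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must include a +Inf bucket holding the total observation count.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # not enough information to interpolate
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if upper == math.inf:
        # The rank falls beyond the last finite bucket: report its bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    prev_count = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    # Hypothetical latency buckets in seconds: (le, cumulative count)
    example = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    print(histogram_quantile(0.95, example))
```

With these made-up buckets the rank of 95 lands in the (0.5, 1.0] bucket, so the estimate is roughly 0.81 seconds. The result is only as precise as the bucket layout allows.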
| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| risoluto_orchestrator_polls_total | Counter | status | Poll cycles (success, error, skipped) |
Useful queries:
# Polls per minute
rate(risoluto_orchestrator_polls_total{status="success"}[5m]) * 60

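# Stall detection: no polls of any status in 10 minutes
# (sum across the status label so an idle "error" or "skipped" series alone does not match)
sum(increase(risoluto_orchestrator_polls_total[10m])) == 0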
| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| risoluto_agent_runs_total | Counter | outcome | Agent completions (completed, failed, oom, stalled) |
| risoluto_container_cpu_percent | Gauge | container | Sandbox container CPU usage percentage |
| risoluto_container_memory_percent | Gauge | container | Sandbox container memory usage percentage |
Useful queries:
# Success rate (1h)
sum(rate(risoluto_agent_runs_total{outcome="completed"}[1h]))
/ sum(rate(risoluto_agent_runs_total[1h]))

# Runs by outcome (stacked chart)
sum by (outcome) (increase(risoluto_agent_runs_total[1h]))

# Containers at CPU saturation
max by (container) (risoluto_container_cpu_percent) > 90
Webhook delivery pipeline metrics: ingest volume, duplicate rejection, DLQ routing, backlog depth, and end-to-end processing latency.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| risoluto_webhook_deliveries_total | Counter | provider, status | Raw webhook deliveries received |
| risoluto_webhook_duplicates_total | Counter | provider | Deliveries rejected as duplicates by the idempotency store |
| risoluto_webhook_events_processed_total | Counter | provider, outcome | Events that passed validation and reached the processor |
| risoluto_webhook_processor_retries_total | Counter | provider, reason | Retry attempts inside the webhook processor |
| risoluto_webhook_dlq_total | Counter | reason | Events moved to the dead-letter queue |
| risoluto_webhook_subscription_checks_total | Counter | provider, status | Subscription health checks performed |
| risoluto_webhook_backlog_count | Gauge | provider | Current count of unprocessed webhook events |
| risoluto_webhook_dlq_count | Gauge | (none) | Current count of events in the dead-letter queue |
| risoluto_webhook_last_delivery_age_seconds | Gauge | provider | Seconds since the last successful delivery |
| risoluto_webhook_processing_latency_seconds | Histogram | provider | End-to-end webhook processing latency |
Useful queries:
# Webhook backlog growing
max(risoluto_webhook_backlog_count) > 100

# DLQ accumulating
increase(risoluto_webhook_dlq_total[15m]) > 0

# Delivery staleness (webhook feed broken)
max(risoluto_webhook_last_delivery_age_seconds) > 900

# P95 webhook processing latency
histogram_quantile(0.95, rate(risoluto_webhook_processing_latency_seconds_bucket[5m]))

Scrape configuration

scrape_configs:
  - job_name: risoluto
    static_configs:
      - targets: ["risoluto:4000"]
    metrics_path: /metrics
    scrape_interval: 10s

Alert rules

groups:
  - name: risoluto
    rules:
      - alert: HighErrorRate
        expr: >
          sum(rate(risoluto_http_requests_total{status=~"5.."}[5m]))
          / sum(rate(risoluto_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HTTP error rate above 5%"

      - alert: AgentRunFailures
        expr: rate(risoluto_agent_runs_total{outcome="failed"}[15m]) > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Agent runs failing consistently"

      - alert: PollStalled
        expr: sum(increase(risoluto_orchestrator_polls_total[10m])) == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator hasn't polled in 10 minutes"

Request Tracing

Every request gets an X-Request-ID header:
  • Incoming: If the client sends X-Request-ID, it is preserved
  • Generated: Otherwise a UUID v4 is assigned
  • Response: The ID is always returned in the response headers
Use this ID to correlate logs, metrics, and audit entries for a single request.
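The three rules above can be sketched in a few lines of Python. This is an illustrative model only, not Risoluto's actual middleware: the function names are hypothetical, and treating an empty header value as absent is an assumption.

```python
import uuid

HEADER = "X-Request-ID"


def resolve_request_id(request_headers):
    """Return the client's X-Request-ID if present, else a new UUID v4."""
    for name, value in request_headers.items():
        # HTTP header names are case-insensitive.
        # Assumption: an empty value counts as "not sent".
        if name.lower() == HEADER.lower() and value:
            return value
    return str(uuid.uuid4())


def handle(request_headers):
    """Hypothetical handler: resolve the ID and echo it in the response."""
    request_id = resolve_request_id(request_headers)
    response_headers = {HEADER: request_id}
    return request_id, response_headers


if __name__ == "__main__":
    print(handle({"x-request-id": "abc-123"}))  # incoming ID is preserved
    print(handle({}))                           # a fresh UUID v4 is generated
```

Clients that set their own X-Request-ID can then search logs and audit entries for that same value.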

Error Tracking

Sentry-compatible error tracking when SENTRY_DSN is set:
export SENTRY_DSN=https://your-key@sentry.io/project-id
When enabled, exceptions are captured with full stack traces, breadcrumbs track the last 100 operations, and context (issue identifier, attempt count) is attached to every error. The DSN is redacted in log output. When SENTRY_DSN is not set, a no-op tracker is used, so error tracking adds zero overhead.
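The behaviour described here (a tracker chosen by SENTRY_DSN, a bounded breadcrumb buffer, and DSN redaction) might look roughly like the Python sketch below. Every class and function name is hypothetical, BufferingTracker is a stand-in that only stores events in memory instead of sending them anywhere, and the redaction format is an assumption.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn):
    """Mask the key portion of a DSN so it is safe to log (assumed format)."""
    parts = urlsplit(dsn)
    if parts.username is None:
        return dsn  # nothing to mask
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


class NoopTracker:
    """Used when SENTRY_DSN is unset: every call does nothing."""

    def add_breadcrumb(self, message):
        pass

    def capture(self, error, context=None):
        pass


class BufferingTracker:
    """Hypothetical stand-in for a real client; keeps events in memory."""

    def __init__(self, dsn, max_breadcrumbs=100):
        self.dsn = dsn
        self.breadcrumbs = deque(maxlen=max_breadcrumbs)  # oldest entries drop off
        self.events = []

    def add_breadcrumb(self, message):
        self.breadcrumbs.append(message)

    def capture(self, error, context=None):
        self.events.append({
            "error": repr(error),
            "breadcrumbs": list(self.breadcrumbs),
            "context": dict(context or {}),
        })


def make_tracker(env):
    """Pick a tracker based on whether SENTRY_DSN is set in `env`."""
    dsn = env.get("SENTRY_DSN")
    return BufferingTracker(dsn) if dsn else NoopTracker()


if __name__ == "__main__":
    print(redact_dsn("https://your-key@sentry.io/project-id"))
```

Because both trackers expose the same methods, calling code does not need to check whether error tracking is enabled.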

Process Logs

Logs are emitted to stdout via Pino in structured format:
| Variable | Values | Default |
| --- | --- | --- |
| RISOLUTO_LOG_FORMAT | logfmt, json | logfmt |
| LOG_LEVEL | trace through fatal | info |
# Persist logs to file
node dist/cli/index.js --port 4000 2>&1 | tee risoluto.log

# JSON format for log aggregators (Loki, Datadog, etc.)
RISOLUTO_LOG_FORMAT=json node dist/cli/index.js --port 4000
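When the JSON format is enabled, each log line can be parsed independently. The Python sketch below keeps only records at or above a chosen level. It assumes Pino's default record shape (a numeric level field where info is 30 and error is 50, plus msg); Risoluto's actual field names may differ, and the sample lines are invented.

```python
import json

# Pino's default numeric levels.
PINO_LEVELS = {"trace": 10, "debug": 20, "info": 30, "warn": 40, "error": 50, "fatal": 60}

# Hypothetical log lines, as they might appear in risoluto.log.
SAMPLE = [
    '{"level":30,"time":1712700000000,"msg":"poll complete"}',
    '{"level":50,"time":1712700001000,"msg":"agent run failed"}',
    'not json',
]


def filter_logs(lines, min_level="warn"):
    """Return parsed records whose level is at or above `min_level`."""
    threshold = PINO_LEVELS[min_level]
    kept = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines (e.g. startup banners)
        if not isinstance(record, dict):
            continue
        level = record.get("level")
        if isinstance(level, str):
            # Tolerate string level labels in case they are configured.
            level = PINO_LEVELS.get(level.lower())
        if isinstance(level, (int, float)) and level >= threshold:
            kept.append(record)
    return kept


if __name__ == "__main__":
    for record in filter_logs(SAMPLE):
        print(record["msg"])
```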

Data Persistence

All attempt and event data is stored in SQLite (risoluto.db) with WAL mode:
.risoluto/
├── risoluto.db           # Attempts, events, issue index
├── risoluto.db-shm       # WAL shared-memory
├── risoluto.db-wal       # Write-ahead log
├── config/               # Operator config overlay
├── secrets.enc           # Encrypted credential store
├── secrets.audit.log     # Access audit trail
└── master.key            # Encryption master key

Querying data

# Via API
curl -s http://127.0.0.1:4000/api/v1/state | jq
curl -s http://127.0.0.1:4000/api/v1/NIN-6/attempts | jq

# Via CLI helper
./risoluto MT-42
./risoluto --attempt <attempt-id>

# Direct SQLite (WAL mode allows reads while Risoluto is running; -readonly prevents accidental writes)
sqlite3 -readonly .risoluto/risoluto.db \
  "SELECT attempt_id, issue_identifier, status FROM attempts ORDER BY started_at DESC LIMIT 10;"

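The same direct query can be run from a script. The sketch below uses Python's built-in sqlite3 module and opens the database with a read-only URI so the script cannot modify Risoluto's data. The table and column names are taken from the query above; the function name is hypothetical, and anything else about the schema is unknown.

```python
import sqlite3
from pathlib import Path


def recent_attempts(db_path, limit=10):
    """Return the most recent attempts as a list of dicts (read-only)."""
    # mode=ro asks SQLite to open the file read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            "SELECT attempt_id, issue_identifier, status "
            "FROM attempts ORDER BY started_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    for attempt in recent_attempts(".risoluto/risoluto.db"):
        print(attempt)
```

Prefer the HTTP API for anything automated; direct SQLite access is mainly useful for ad hoc inspection, since the schema may change between releases.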
What’s Next

Monitoring Stack

Set up Prometheus and Grafana with pre-built queries and alerts.

Troubleshooting

Diagnose common failures and recovery procedures.

Notifications

Configure Slack notifications for agent lifecycle events.

Dashboard Guide

Board view, issue inspector, and live event stream.
Last modified on April 10, 2026