
Hassette Architecture Overview

Hassette is an async-first Python framework for building Home Assistant automations. It connects to Home Assistant over WebSocket, routes incoming events through a typed pub/sub bus, dispatches them to user-defined App classes, and provides a web UI for monitoring the running system.


1. Component Ownership

Every component is a Resource in a parent/child tree rooted at the Hassette instance. Apps get four lightweight handles (Bus, Scheduler, Api, StateManager) that delegate to shared framework services.

graph TD
    accTitle: Component Ownership Tree
    accDescr: Parent-child resource hierarchy from Hassette root to per-app handles

    Hassette

    subgraph infra["Infrastructure Services"]
        EventStreamService
        DatabaseService
        CommandExecutor
        WebsocketService
    end

    subgraph core["Core Services"]
        BusService
        SchedulerService
        ApiResource
        StateProxy
    end

    subgraph web["Web Layer"]
        WebApiService
        RuntimeQueryService
        TelemetryQueryService
    end

    subgraph apps["App Management"]
        AppHandler
        AppLifecycleService
        AppRegistry
    end

    Hassette --- infra
    Hassette --- core
    Hassette --- web
    Hassette --- apps

    AppHandler --> AppLifecycleService
    AppHandler --> AppRegistry

    subgraph perapp["Per-App Resources (0..N instances)"]
        App
        App --> Bus
        App --> Scheduler
        App --> Api
        App --> StateManager
        App --> Cache
    end

    AppLifecycleService --> App

    style infra fill:#f0f0f0,stroke:#999
    style core fill:#e8f0ff,stroke:#6688cc
    style web fill:#f0f8e8,stroke:#88aa66
    style apps fill:#fff0e8,stroke:#cc8844
    style perapp fill:#f8f0ff,stroke:#8866cc

Per-app handles are thin wrappers. When an app shuts down, its Bus removes its listeners from BusService, its Scheduler removes its jobs from SchedulerService, and so on. The shared services continue running for other apps.
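
The cleanup behavior described above can be sketched in a few lines. This is a minimal illustration, not Hassette's implementation: SharedBusService, BusHandle, and their methods are hypothetical stand-ins that show how a handle can track only its own registrations and remove them on shutdown.

```python
class SharedBusService:
    """Hypothetical stand-in for a shared service that owns every listener."""

    def __init__(self):
        self.listeners = {}  # listener_id -> (owner, topic)
        self._next_id = 0

    def add_listener(self, owner, topic):
        self._next_id += 1
        self.listeners[self._next_id] = (owner, topic)
        return self._next_id

    def remove_listener(self, listener_id):
        self.listeners.pop(listener_id, None)


class BusHandle:
    """Hypothetical per-app wrapper: remembers only the ids it registered."""

    def __init__(self, owner, service):
        self._owner = owner
        self._service = service
        self._ids = []

    def on(self, topic):
        listener_id = self._service.add_listener(self._owner, topic)
        self._ids.append(listener_id)
        return listener_id

    def shutdown(self):
        # Remove this app's listeners; other apps' entries are untouched.
        for listener_id in self._ids:
            self._service.remove_listener(listener_id)
        self._ids.clear()


service = SharedBusService()
lights, climate = BusHandle("lights", service), BusHandle("climate", service)
lights.on("hass.event.state_changed.light.*")
climate.on("hass.event.state_changed.climate.*")
lights.shutdown()
remaining_owners = sorted(owner for owner, _ in service.listeners.values())
```

After lights shuts down, the shared service still holds the climate listener and keeps serving it.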


2. Service Dependencies

Services declare depends_on at the class level, and the framework computes a wave-based startup order from these declarations. In the graph, arrows point from each dependent to the services it depends on; wave 0 starts first and wave 5 starts last.

graph BT
    accTitle: Service Dependency Graph
    accDescr: Wave-based startup order, wave 0 at the top

    subgraph wave0["Wave 0 — No Dependencies"]
        DB[DatabaseService]
        WS[WebsocketService]
    end

    subgraph wave1["Wave 1"]
        BUS[BusService]
        SCHED[SchedulerService]
        CMD[CommandExecutor]
        API[ApiResource]
    end

    subgraph wave2["Wave 2"]
        SP[StateProxy]
        TQS[TelemetryQueryService]
    end

    subgraph wave3["Wave 3"]
        AH[AppHandler]
    end

    subgraph wave4["Wave 4"]
        RQS[RuntimeQueryService]
    end

    subgraph wave5["Wave 5 — Last to Start"]
        WEB[WebApiService]
    end

    BUS --> DB
    SCHED --> DB
    CMD --> DB
    TQS --> DB
    API --> WS
    SP --> WS & API & BUS & SCHED
    AH --> WS & API & BUS & SCHED & SP
    RQS --> BUS & SP & AH
    WEB --> RQS & TQS

    style wave0 fill:#e8f0ff,stroke:#6688cc
    style wave1 fill:#dde8f8,stroke:#6688cc
    style wave2 fill:#d0e0f0,stroke:#6688cc
    style wave3 fill:#c4d8e8,stroke:#6688cc
    style wave4 fill:#b8d0e0,stroke:#6688cc
    style wave5 fill:#acc8d8,stroke:#6688cc

Shutdown proceeds in reverse wave order — WebApiService stops first, DatabaseService and WebsocketService stop last.
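
As a rough sketch of how waves can be derived from depends_on declarations, the snippet below assigns each service a level one past its deepest dependency. The dependency map covers only a subset of the graph above, and compute_waves is a hypothetical helper, not the framework's function.

```python
from collections import defaultdict

# Subset of the dependency graph above, written as name -> dependencies.
DEPENDS_ON = {
    "DatabaseService": [],
    "WebsocketService": [],
    "BusService": ["DatabaseService"],
    "SchedulerService": ["DatabaseService"],
    "ApiResource": ["WebsocketService"],
    "StateProxy": ["WebsocketService", "ApiResource", "BusService", "SchedulerService"],
    "AppHandler": ["WebsocketService", "ApiResource", "BusService",
                   "SchedulerService", "StateProxy"],
}


def compute_waves(depends_on):
    """Group services into waves: wave = 1 + deepest dependency's wave."""
    level = {}

    def resolve(name, stack=()):
        if name in stack:
            raise ValueError(f"dependency cycle through {name}")
        if name not in level:
            deps = depends_on[name]
            level[name] = 1 + max(
                (resolve(dep, stack + (name,)) for dep in deps), default=-1
            )
        return level[name]

    waves = defaultdict(list)
    for name in depends_on:
        waves[resolve(name)].append(name)
    return [sorted(waves[i]) for i in sorted(waves)]


startup_order = compute_waves(DEPENDS_ON)
shutdown_order = list(reversed(startup_order))  # reverse wave order
```

For this subset the result has four waves, with DatabaseService and WebsocketService in the first and AppHandler alone in the last.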


3. Event and Data Flow

Events flow from Home Assistant through a four-stage inbound pipeline. Outbound calls go through the Api handle back to HA via REST or WebSocket.

flowchart TD
    accTitle: Event and Data Flow
    accDescr: Inbound event pipeline and outbound API calls

    subgraph ha_in["Home Assistant"]
        HA_IN(("Inbound<br/>WS events"))
    end

    subgraph inbound["Inbound Pipeline"]
        WS["WebsocketService<br/><i>receive loop</i>"]
        ESS["EventStreamService<br/><i>memory channel</i>"]
        BS["BusService<br/><i>topic expand + filter</i>"]
        CE["CommandExecutor<br/><i>invoke + record</i>"]
        WS --> ESS --> BS --> CE
    end

    subgraph cache["State Cache"]
        SP["StateProxy"]
    end

    subgraph app["App"]
        Handler["on_* handler"]
    end

    subgraph outbound["Outbound"]
        AR["ApiResource<br/>(REST)"]
        WSOut["WebsocketService<br/>(WS send)"]
    end

    subgraph ha_out["Home Assistant"]
        HA_OUT(("Outbound<br/>WS / REST"))
    end

    HA_IN --> WS
    WS -. "state_changed<br/>(priority 100)" .-> SP
    CE --> Handler
    SP -. "self.states.*" .-> Handler
    Handler --> AR & WSOut
    AR & WSOut --> HA_OUT

    style ha_in fill:#f0f0f0,stroke:#999
    style ha_out fill:#f0f0f0,stroke:#999
    style inbound fill:#e8f0ff,stroke:#6688cc
    style cache fill:#f0f8e8,stroke:#88aa66
    style app fill:#fff0e8,stroke:#cc8844
    style outbound fill:#f8f0ff,stroke:#8866cc

StateProxy subscribes to state_changed events at priority 100, so its cache is always updated before any user handler sees the event. The CommandExecutor records every invocation to SQLite for the telemetry UI.

| Failure | Behavior |
| --- | --- |
| WS disconnect | _make_connection retries up to 5 times (tenacity, exponential jitter). If serve() still fails, ServiceWatcher restarts the service per its RestartSpec (TRANSIENT, budget 5/300s). |
| Auth failure | InvalidAuthError is a FatalError subclass, so it bypasses ServiceWatcher entirely. _serve_wrapper catches it and calls handle_crash(), setting the service to CRASHED. Hassette shuts down immediately. |
| Handler timeout | Logged, invocation recorded as timed-out |
| DB write failure | 3 retries, then dropped with counter increment |

4. Bus Internals

The Bus handle translates on_*() calls into Listener objects, which the shared BusService indexes by topic for fast dispatch.

flowchart TD
    accTitle: Bus Event Routing
    accDescr: From app subscription through predicate filtering to handler invocation

    subgraph registration["Registration"]
        on["Bus.on_*()"]
        pca["Predicates (P)<br/>Conditions (C)<br/>Accessors (A)"]
        L["Listener"]
        on --> pca --> L
    end

    subgraph routing["BusService Router"]
        exact["Exact topics<br/><i>light.kitchen</i>"]
        glob["Glob topics<br/><i>light.*</i>"]
    end

    subgraph dispatch["Dispatch"]
        match["Predicate check"]
        exec["CommandExecutor"]
        handler["App handler"]
        match --> exec --> handler
    end

    L -- "add_listener()" --> exact & glob
    exact & glob -- "event arrives" --> match

    style registration fill:#e8f0ff,stroke:#6688cc
    style routing fill:#f0f8e8,stroke:#88aa66
    style dispatch fill:#fff0e8,stroke:#cc8844

Topic expansion. A state_changed event for light.office produces three topics in specificity order: hass.event.state_changed.light.office, hass.event.state_changed.light.*, hass.event.state_changed.
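
A minimal sketch of this expansion, assuming the topic layout shown in the example above. The function name expand_topics and its parameters are illustrative, not part of the framework's API.

```python
def expand_topics(event_type, entity_id=None, prefix="hass.event"):
    """Return topics for an event, most specific first."""
    base = f"{prefix}.{event_type}"
    if entity_id is None:
        return [base]
    domain, _, _ = entity_id.partition(".")
    return [
        f"{base}.{entity_id}",  # exact entity
        f"{base}.{domain}.*",   # domain glob
        base,                   # bare event type
    ]


topics = expand_topics("state_changed", "light.office")
```

A router can then look up listeners under each topic in turn, so exact subscriptions and domain-wide globs are both found without scanning every listener.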

Listener behaviors:

| Option | Effect |
| --- | --- |
| debounce=N | Buffer events, fire only if quiet for N seconds |
| throttle=N | Fire immediately, suppress for N seconds |
| duration=N | Fire only if predicate still matches after N seconds |
| once=True | Auto-remove after first invocation |
| priority=N | Higher values dispatch first (StateProxy uses 100 so its cache updates before user handlers run) |
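
To make the throttle semantics concrete, here is a small sketch of a leading-edge throttle: the first event fires, and further events are suppressed until N seconds have passed. The Throttle class is hypothetical and takes the current time as an argument so the behavior is easy to follow; it is not the framework's implementation.

```python
class Throttle:
    """Hypothetical leading-edge throttle: fire, then suppress for `seconds`."""

    def __init__(self, seconds):
        self.seconds = seconds
        self._last_fired = None

    def allow(self, now):
        if self._last_fired is None or now - self._last_fired >= self.seconds:
            self._last_fired = now
            return True
        return False


throttle = Throttle(seconds=10)
decisions = [throttle.allow(t) for t in (0, 3, 9.9, 10, 15, 20)]
```

Debounce is the mirror image: it waits for a quiet period of N seconds before firing, so it needs a timer rather than a simple timestamp comparison.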

5. Scheduler Internals

The Scheduler handle wraps convenience methods (run_in, run_once, run_every, run_daily, run_cron, schedule) around trigger objects. All jobs end up in a shared min-heap inside SchedulerService.

flowchart TD
    accTitle: Scheduler Job Pipeline
    accDescr: From convenience methods through triggers to the dispatch loop

    subgraph api["Scheduler API"]
        methods["run_*() / schedule()"]
    end

    subgraph triggers["Triggers"]
        T["Trigger<br/><i>implements TriggerProtocol</i>"]
    end

    subgraph engine["SchedulerService"]
        heap["Min-heap<br/>by next_run"]
        loop["serve() loop"]
        exec["CommandExecutor"]
        heap -- "pop due" --> loop --> exec
    end

    methods --> T
    T -- "ScheduledJob" --> heap
    exec -. "re-enqueue<br/>if recurring" .-> heap

    style api fill:#e8f0ff,stroke:#6688cc
    style triggers fill:#f0f8e8,stroke:#88aa66
    style engine fill:#fff0e8,stroke:#cc8844

Built-in triggers: After (one-shot delay), Once (one-shot at time), Every (recurring interval), Daily (DST-safe cron), Cron (croniter expression). Custom triggers implement TriggerProtocol.

  • Daily uses cron internally for DST-safe wall-clock scheduling. A naive 24-hour interval would drift across DST transitions.
  • jitter adds a random offset at enqueue time to spread out jobs that would otherwise start at the same moment.
  • Job groups (group=) enable bulk cancellation. Named jobs (name=) support deduplication via if_exists="skip".
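
The heap-and-re-enqueue loop in the diagram can be sketched with the standard heapq module. MiniScheduler and this ScheduledJob are simplified, hypothetical stand-ins: time is passed in explicitly, and a recurring job is just a job with an interval.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class ScheduledJob:
    next_run: float
    seq: int  # tie-breaker so equal next_run values pop in insertion order
    name: str = field(compare=False)
    interval: Optional[float] = field(default=None, compare=False)


class MiniScheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def add(self, name, next_run, interval=None):
        job = ScheduledJob(next_run, next(self._seq), name, interval)
        heapq.heappush(self._heap, job)

    def run_due(self, now):
        """Pop every job due at `now`; re-enqueue recurring ones afterwards."""
        fired, requeue = [], []
        while self._heap and self._heap[0].next_run <= now:
            job = heapq.heappop(self._heap)
            fired.append(job.name)
            if job.interval is not None:
                requeue.append(job)
        for job in requeue:
            self.add(job.name, job.next_run + job.interval, job.interval)
        return fired


sched = MiniScheduler()
sched.add("once", next_run=5.0)
sched.add("tick", next_run=2.0, interval=3.0)
runs = [sched.run_due(t) for t in (1.0, 2.0, 5.0, 7.0)]
```

Because the heap is ordered by next_run, the loop only ever inspects the earliest job, which keeps each dispatch pass cheap regardless of how many jobs are registered.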

6. Database Internals

Hassette stores all telemetry in a local SQLite database managed by DatabaseService. Schema migrations use SQLite's native PRAGMA user_version — no external migration tool.

Schema and Migrations

The migration runner reads PRAGMA user_version on startup and applies each numbered .sql file in order. Every migration runs inside BEGIN IMMEDIATE / COMMIT, with PRAGMA user_version = N as the final statement. A crash mid-migration leaves the database at the previous version; the next startup retries from where it left off.

When the on-disk schema version is older than the code expects, the runner applies forward migrations. When it is newer (database created by a newer binary), DatabaseService raises SchemaVersionError — a fatal error that stops startup and requires manual intervention rather than automatic deletion. When the database is corrupt or otherwise unrecoverable, deleting it is safe: telemetry is observability data and does not affect app execution. Hassette recreates an empty database on the next startup.

On a fresh database (user_version = 0), the runner configures auto_vacuum = INCREMENTAL via a separate sqlite3.Connection before any transaction, because PRAGMA auto_vacuum cannot be set inside BEGIN IMMEDIATE.
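
The migration flow described above can be approximated with the standard sqlite3 module. This is a sketch under stated assumptions: migrations are passed as a dict of SQL strings rather than read from numbered .sql files, migrate and get_version are hypothetical helpers, SchemaVersionError is a local stand-in for the error named above, and the auto_vacuum step is omitted.

```python
import sqlite3


class SchemaVersionError(RuntimeError):
    """Local stand-in for the fatal error named above."""


def get_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, migrations):
    """Apply pending migrations; `migrations` maps target version -> SQL."""
    latest = max(migrations, default=0)
    if get_version(conn) > latest:
        raise SchemaVersionError(
            f"database is at {get_version(conn)}, code expects {latest}"
        )
    for version in sorted(migrations):
        if version <= get_version(conn):
            continue  # already applied
        # The version bump is the last statement inside the transaction.
        script = (
            "BEGIN IMMEDIATE;\n"
            f"{migrations[version]}\n"
            f"PRAGMA user_version = {version};\n"
            "COMMIT;"
        )
        try:
            conn.executescript(script)
        except sqlite3.Error:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
    return get_version(conn)


# isolation_level=None: the script controls transactions itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
good = {
    1: "CREATE TABLE sessions (id INTEGER PRIMARY KEY);",
    2: "CREATE TABLE listeners (id INTEGER PRIMARY KEY);",
}
version_after_good = migrate(conn, good)

bad = dict(good)
bad[3] = "CREATE TABLE extra (id INTEGER); CREATE TABLE sessions (id INTEGER);"
try:
    migrate(conn, bad)
except sqlite3.OperationalError:
    pass  # second statement fails; the whole migration rolls back
version_after_bad = get_version(conn)
```

Because the version bump commits together with the schema change, a failed migration leaves neither partial tables nor an advanced version number behind.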

Unified Executions Table

Handler invocations and scheduled job executions are stored in a single executions table with a kind discriminator ('handler' or 'job'). Two nullable foreign keys — listener_id and job_id — point to the registration tables (listeners and scheduled_jobs respectively). A CHECK constraint enforces that exactly one is non-null per row.

Registration tables remain separate: listeners stores bus listener registrations with their natural key (app_key, instance_index, name, topic); scheduled_jobs stores scheduled job registrations.
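
One way to express the "exactly one foreign key is non-null" rule is a CHECK that compares the two IS NULL results. The schema below is an illustrative sketch: only the table names, the kind values, listener_id, job_id, and the listeners natural key come from the text above; every other column is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE listeners (
    id INTEGER PRIMARY KEY,
    app_key TEXT NOT NULL,
    instance_index INTEGER NOT NULL,
    name TEXT NOT NULL,
    topic TEXT NOT NULL,
    UNIQUE (app_key, instance_index, name, topic)
);
CREATE TABLE scheduled_jobs (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL  -- assumed column, for illustration only
);
CREATE TABLE executions (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('handler', 'job')),
    listener_id INTEGER REFERENCES listeners(id),
    job_id INTEGER REFERENCES scheduled_jobs(id),
    -- true only when exactly one of the two keys is NULL
    CHECK ((listener_id IS NULL) <> (job_id IS NULL))
);
"""


def try_insert(conn, kind, listener_id, job_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO executions (kind, listener_id, job_id) VALUES (?, ?, ?)",
            (kind, listener_id, job_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO listeners (app_key, instance_index, name, topic) "
    "VALUES ('lights', 0, 'on_motion', 'hass.event.state_changed')"
)
db.execute("INSERT INTO scheduled_jobs (name) VALUES ('nightly')")

accepted = [
    try_insert(db, "handler", 1, None),  # listener only
    try_insert(db, "job", None, 1),      # job only
    try_insert(db, "handler", None, None),  # neither
    try_insert(db, "job", 1, 1),            # both
]
```

The first two inserts are accepted; the last two violate the CHECK and are rejected.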

Synchronous Registration

BusService and SchedulerService both declare depends_on: [DatabaseService], so the database is always ready before any listener or job registration can occur.

Each bus.on_state_change() (and all other bus.on_*() methods) awaits the database INSERT inline before returning. The listener's db_id is a valid integer immediately when the awaited call returns — there is no background registration task or deferred persistence.

The same applies to scheduler methods: scheduler.run_every() and all other scheduler.run_*() methods await the job registration before returning.


7. Api Internals

The per-app Api handle delegates all transport to shared singletons. Single-entity reads use REST; bulk reads and service calls use WebSocket.

flowchart TD
    accTitle: Api Transport
    accDescr: How per-app Api delegates to shared REST and WebSocket transports

    subgraph app["Per-App"]
        Api
    end

    subgraph transport["Shared Singletons"]
        AR["ApiResource<br/>(aiohttp)"]
        WS["WebsocketService"]
    end

    subgraph ha["Home Assistant"]
        REST["REST API"]
        WSAPI["WebSocket API"]
    end

    Api -- "get_state(id)" --> AR
    Api -- "call_service()<br/>get_states()" --> WS
    AR -- "HTTP" --> REST
    WS -- "WS frame" --> WSAPI

    style app fill:#e8f0ff,stroke:#6688cc
    style transport fill:#fff0e8,stroke:#cc8844
    style ha fill:#f0f0f0,stroke:#999

| Method | Transport | Pattern |
| --- | --- | --- |
| get_state(entity_id) | REST | GET /api/states/{id} |
| get_states() | WebSocket | get_states command |
| call_service() | WebSocket | fire-and-forget or send_and_wait |
| fire_event() | WebSocket | fire-and-forget |

Auth: both transports use the long-lived access token from HassetteConfig.token. It is injected as a Bearer header for REST and sent during the auth handshake for WebSocket.


8. StateManager and StateProxy

StateProxy maintains an in-memory cache of all entity states. StateManager provides typed per-app access with Pydantic model validation and caching.

flowchart TD
    accTitle: State Management
    accDescr: How entity states flow from HA through the cache to typed app access

    subgraph sources["Cache Population"]
        bus_sub["Bus subscription<br/>(priority 100)"]
        poll["Periodic poll<br/>(run_every)"]
    end

    subgraph proxy["StateProxy"]
        cache["In-memory dict<br/>entity_id to HassStateDict"]
    end

    subgraph access["StateManager (per-app)"]
        attr["self.states.light<br/><i>DomainStates[LightState]</i>"]
        item["self.states[CustomState]<br/><i>DomainStates[T]</i>"]
        get["self.states.get(entity_id)<br/><i>raw lookup</i>"]
    end

    subgraph convert["Type Conversion"]
        SR["StateRegistry<br/>domain to model class"]
        TR["TypeRegistry<br/>scalar conversion"]
    end

    bus_sub --> cache
    poll --> cache
    cache --> attr & item & get
    attr & item --> SR & TR

    style sources fill:#f0f8e8,stroke:#88aa66
    style proxy fill:#fff0e8,stroke:#cc8844
    style access fill:#e8f0ff,stroke:#6688cc
    style convert fill:#f8f0ff,stroke:#8866cc

  • Read access is lock-free: CPython dict assignment is atomic, and the proxy replaces whole objects rather than mutating them.
  • DomainStates caches validated Pydantic models keyed by context_id (a UUID from HA). When the context ID is unchanged, the cached model is returned without re-validating.
  • On disconnect, StateProxy clears the cache and marks itself not-ready. On reconnect, it bulk-reloads via get_states_raw().
9. Web/UI Layer

The web layer is opt-in. WebApiService starts a uvicorn/FastAPI server. The frontend is a Preact SPA. Two data source services provide live and historical data.

flowchart TD
    accTitle: Web Layer
    accDescr: How the frontend connects to backend data sources

    subgraph browser["Browser"]
        SPA["Preact SPA"]
    end

    subgraph server["WebApiService"]
        rest["REST endpoints<br/>/api/health, /api/apps,<br/>/api/telemetry/*, ..."]
        ws["/api/ws<br/>WebSocket"]
        static["Static files<br/>SPA catch-all"]
    end

    subgraph data["Data Sources"]
        RQS["RuntimeQueryService<br/><i>live state, event buffer,<br/>WS broadcast</i>"]
        TQS["TelemetryQueryService<br/><i>SQLite: listeners, jobs,<br/>errors, sessions</i>"]
    end

    SPA -- "fetch" --> rest
    SPA <-- "push events" --> ws
    rest --> RQS & TQS
    ws --> RQS

    style browser fill:#e8f0ff,stroke:#6688cc
    style server fill:#fff0e8,stroke:#cc8844
    style data fill:#f0f8e8,stroke:#88aa66

  • RuntimeQueryService subscribes to bus events and fans them out to all connected WebSocket clients, using one asyncio.Queue per client.
  • The SPA catch-all returns index.html for all non-asset paths, enabling client-side routing.
  • When config.run_web_api is False, the service blocks on shutdown_event.wait() without binding a port, preserving the dependency graph.
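
A per-client queue fan-out like the one in the first bullet might look like the sketch below. Broadcaster is a hypothetical class, and dropping messages for a client whose queue is full is an assumed policy chosen for the example, not documented framework behavior.

```python
import asyncio


class Broadcaster:
    """Hypothetical fan-out hub: one bounded queue per connected client."""

    def __init__(self, maxsize=100):
        self._clients = set()
        self._maxsize = maxsize

    def connect(self):
        queue = asyncio.Queue(maxsize=self._maxsize)
        self._clients.add(queue)
        return queue

    def disconnect(self, queue):
        self._clients.discard(queue)

    def broadcast(self, message):
        """Offer the message to every client; return how many accepted it."""
        delivered = 0
        for queue in self._clients:
            try:
                queue.put_nowait(message)
                delivered += 1
            except asyncio.QueueFull:
                pass  # assumed policy: a slow client misses this message
        return delivered


async def demo():
    hub = Broadcaster(maxsize=1)
    a, b = hub.connect(), hub.connect()
    first = hub.broadcast({"type": "state_changed"})
    await a.get()  # client a drains its queue; client b does not
    second = hub.broadcast({"type": "ping"})
    return first, second, a.qsize(), b.qsize()


result = asyncio.run(demo())
```

Giving each client its own queue means a stalled browser tab only affects its own backlog; the broadcaster never blocks on a slow consumer.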

10. Resource Lifecycle

Every component extends Resource (synchronous init) or Service (long-running serve() loop). The LifecycleMixin provides status transitions and readiness signaling.

State Transitions

flowchart TD
    accTitle: Resource Lifecycle States
    accDescr: Status transitions for all framework components

    NOT_STARTED:::neutral -- "start()" --> STARTING:::active
    STARTING -- "handle_running()" --> RUNNING:::active
    RUNNING -- "shutdown()" --> STOPPING:::active
    STOPPING -- "handle_stop()" --> STOPPED:::neutral

    STARTING -- "error" --> FAILED:::error
    RUNNING -- "error" --> FAILED
    RUNNING -- "FatalError" --> CRASHED:::error
    FAILED -- "restart()" --> STARTING
    FAILED -- "PERMANENT\nexhausted" --> CRASHED
    FAILED -- "TEMPORARY\nexhausted" --> EXHAUSTED_DEAD:::error
    FAILED -- "TRANSIENT\nexhausted" --> EXHAUSTED_COOLING:::error
    EXHAUSTED_COOLING -- "after cooldown" --> STARTING
    EXHAUSTED_COOLING -- "cooldown limit\nexceeded" --> EXHAUSTED_DEAD

    classDef neutral fill:#f0f0f0,stroke:#999,color:#333
    classDef active fill:#e8f0ff,stroke:#6688cc,color:#333
    classDef error fill:#ffe8e8,stroke:#cc6666,color:#333

Readiness vs. Running

These are separate concerns that are easy to confuse:

| Concept | Method | What it does | Who calls it |
| --- | --- | --- | --- |
| Status | handle_running() | Sets RUNNING, emits event | Framework (automatic) |
| Readiness | mark_ready() | Unblocks depends_on waiters | Resource: end of on_initialize(). Service: inside serve() once processing |

A component can be RUNNING but not ready (still initializing internal state), or ready but not yet RUNNING (edge case during transition).

Wave Startup and Shutdown

Dependencies are computed into topological levels. Within a wave, the framework calls start() on each child so their initialization can proceed concurrently, then waits for all to become ready before starting the next wave.

Shutdown proceeds in reverse wave order. A per-wave timeout triggers _force_terminal() on non-compliant children, which recursively force-stops without running hooks (accepted risk for stuck services).
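
The wave startup rule (start everything in a wave concurrently, then wait for readiness before moving on) can be sketched with asyncio. Component and start_in_waves are hypothetical; the asyncio.Event plays the role of mark_ready(), and the timeout value is arbitrary.

```python
import asyncio


class Component:
    """Hypothetical component whose readiness is signaled by an Event."""

    def __init__(self, name, log):
        self.name = name
        self.ready = asyncio.Event()
        self._log = log

    async def start(self):
        self._log.append(f"start {self.name}")
        await asyncio.sleep(0)  # stand-in for initialization work
        self.ready.set()        # analogous to mark_ready()
        self._log.append(f"ready {self.name}")


async def start_in_waves(waves, timeout=5.0):
    for wave in waves:
        # Kick off every start() in the wave so they run concurrently.
        tasks = [asyncio.create_task(c.start()) for c in wave]
        # Block until the whole wave has signaled readiness.
        await asyncio.wait_for(
            asyncio.gather(*(c.ready.wait() for c in wave)), timeout
        )
        await asyncio.gather(*tasks)


async def demo():
    log = []
    waves = [
        [Component("DatabaseService", log), Component("WebsocketService", log)],
        [Component("BusService", log)],
    ]
    await start_in_waves(waves)
    return log


log = asyncio.run(demo())
```

In the resulting log, both wave-0 services report ready before BusService starts, while their own start calls interleave.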

Service Supervision

When a Service transitions to FAILED, ServiceWatcher reads that service's restart_spec class attribute and drives the restart decision. Every Service subclass declares a RestartSpec as a class-level attribute; services that don't declare one inherit the default (TRANSIENT, budget 5/300s).

RestartSpec

RestartSpec is a frozen dataclass in hassette.resources.restart:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| restart_type | RestartType | TRANSIENT | Strategy governing restart and exhaustion behavior |
| non_retryable_error_names | tuple[str, ...] | () | Exception names that skip restart and go straight to exhaustion handling |
| fatal_error_names | tuple[str, ...] | () | Exception names that trigger immediate system shutdown |
| backoff_base_seconds | float | 2.0 | Initial delay before first restart attempt |
| backoff_multiplier | float | 2.0 | Factor applied to backoff on each successive attempt |
| backoff_max_seconds | float | 60.0 | Maximum backoff delay |
| budget_intensity | int | 5 | Maximum restarts allowed within the sliding window |
| budget_period_seconds | float | 300.0 | Sliding window size in seconds |
| startup_timeout_seconds | float | 30.0 | How long to wait for mark_ready() after a restart |
| cooldown_seconds | float | 300.0 | Duration of the long-cooldown phase (TRANSIENT services only) |
| max_cooldown_cycles | int | 0 | Maximum cooldown cycles before transitioning to EXHAUSTED_DEAD; 0 means infinite |

Usage:

from hassette.resources.restart import RestartSpec
from hassette.resources.service import Service
from hassette.types.enums import RestartType


class MyService(Service):
    restart_spec = RestartSpec(
        restart_type=RestartType.TRANSIENT,
        budget_intensity=3,
        budget_period_seconds=120,
        fatal_error_names=("SchemaVersionError",),
    )
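
To show how the three backoff fields interact, the sketch below computes successive delays with the default values. The formula (base times multiplier to the power of the attempt number, capped at the maximum) is an assumption inferred from the field descriptions, and backoff_delay is a hypothetical helper.

```python
def backoff_delay(attempt, base=2.0, multiplier=2.0, maximum=60.0):
    """Delay before restart `attempt` (0-based), capped at `maximum`.

    Assumed formula: base * multiplier ** attempt.
    """
    return min(base * multiplier ** attempt, maximum)


delays = [backoff_delay(n) for n in range(6)]
# With the defaults: 2, 4, 8, 16, 32 seconds, then capped at 60.
```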

RestartType

RestartType is a StrEnum with three values:

| Value | Behavior when budget is exhausted |
| --- | --- |
| PERMANENT | Transitions to CRASHED and triggers system shutdown. Used for services that are structurally required (BusService, SchedulerService). |
| TRANSIENT | Enters a long cooldown (EXHAUSTED_COOLING), then resets the budget and retries. Useful for services with intermittent failures (WebsocketService, DatabaseService). |
| TEMPORARY | Transitions to EXHAUSTED_DEAD, with no further restarts. Used for optional background services (FileWatcherService, WebUiWatcherService). |

Sliding-Window Budget

RestartBudget tracks restart timestamps within a rolling time window. When the number of recorded restarts within the window reaches budget_intensity, the budget is exhausted.

The window slides continuously: a restart from 10 minutes ago no longer counts against the budget if budget_period_seconds is 300. When a service successfully reaches RUNNING and signals readiness, the budget resets automatically — brief instability followed by a successful recovery doesn't accumulate permanently toward exhaustion.
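
A sliding-window budget of this kind can be sketched with a deque of timestamps. SlidingWindowBudget is a hypothetical illustration, not the framework's RestartBudget; time is passed in explicitly so the window arithmetic is visible.

```python
from collections import deque


class SlidingWindowBudget:
    """Hypothetical budget: at most `intensity` restarts per `period_seconds`."""

    def __init__(self, intensity=5, period_seconds=300.0):
        self.intensity = intensity
        self.period_seconds = period_seconds
        self._stamps = deque()

    def _prune(self, now):
        # Drop restarts that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.period_seconds:
            self._stamps.popleft()

    def record_restart(self, now):
        self._prune(now)
        self._stamps.append(now)

    def is_exhausted(self, now):
        self._prune(now)
        return len(self._stamps) >= self.intensity

    def reset(self):
        # Called after a successful recovery.
        self._stamps.clear()


budget = SlidingWindowBudget(intensity=3, period_seconds=120.0)
for t in (0.0, 10.0, 20.0):
    budget.record_restart(t)
exhausted_at_20 = budget.is_exhausted(20.0)
exhausted_at_125 = budget.is_exhausted(125.0)  # the restart at t=0 has expired
```

Three restarts within 20 seconds exhaust a 3/120s budget, but by t=125 the oldest restart has left the window and the budget has room again.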

Three-Layer Error Routing

A service error passes through three layers before a restart is attempted. The first is handled by the service wrapper; the other two are evaluated by ServiceWatcher.restart_service() on each FAILED event:

  1. FatalError subclasses — raised inside serve(), caught by the service wrapper, route directly to CRASHED status and shutdown. These bypass ServiceWatcher entirely.
  2. fatal_error_names — exception type names checked by ServiceWatcher on FAILED events. Triggers immediate system shutdown even if restarts remain in the budget.
  3. non_retryable_error_names — exception type names checked by ServiceWatcher. Skips the restart entirely and jumps directly to exhaustion handling.

Errors that don't match any of the above proceed through the normal restart flow: budget check → exponential backoff → restart.
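
The routing order can be summarized as a single decision function. This is a sketch of the logic as described, not the framework's code: Decision, route_error, and the local FatalError and InvalidAuthError classes are stand-ins defined here for illustration.

```python
from enum import Enum


class Decision(Enum):
    CRASH = "crash"        # layer 1: FatalError subclass
    SHUTDOWN = "shutdown"  # layer 2: name in fatal_error_names
    EXHAUST = "exhaust"    # layer 3, or budget exhausted
    RESTART = "restart"    # normal flow: backoff, then restart


class FatalError(Exception):
    """Local stand-in for the framework's FatalError base class."""


class InvalidAuthError(FatalError):
    """Local stand-in for the auth failure named earlier."""


def route_error(exc, fatal_error_names=(), non_retryable_error_names=(),
                budget_exhausted=False):
    if isinstance(exc, FatalError):
        return Decision.CRASH
    name = type(exc).__name__
    if name in fatal_error_names:
        return Decision.SHUTDOWN
    if name in non_retryable_error_names:
        return Decision.EXHAUST
    if budget_exhausted:
        return Decision.EXHAUST
    return Decision.RESTART


decisions = [
    route_error(InvalidAuthError()),
    route_error(KeyError("x"), fatal_error_names=("KeyError",)),
    route_error(ValueError("x"), non_retryable_error_names=("ValueError",)),
    route_error(ConnectionError(), budget_exhausted=True),
    route_error(ConnectionError()),
]
```

What "exhaust" means in practice depends on the service's RestartType, as described in the RestartType table.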

New Statuses

Two statuses represent exhaustion states specific to services:

| Status | Meaning |
| --- | --- |
| EXHAUSTED_DEAD | Budget exhausted, no further restarts will occur. Terminal state. |
| EXHAUSTED_COOLING | Budget exhausted; service is in long-cooldown before budget reset and retry. |

Per-Service Restart Specs

| Service | Type | Budget (intensity/period) | Notes |
| --- | --- | --- | --- |
| BusService | PERMANENT | 2 / 30s | Structural: shutdown if it can't stay up |
| SchedulerService | PERMANENT | 2 / 30s | Structural: shutdown if it can't stay up |
| WebsocketService | TRANSIENT | 5 / 300s | startup_timeout=60s; HA may take time to come back |
| DatabaseService | TRANSIENT | 3 / 120s | fatal_error_names=("SchemaVersionError",) |
| WebApiService | TRANSIENT | 3 / 60s | |
| CommandExecutor | TRANSIENT | 3 / 120s | |
| FileWatcherService | TEMPORARY | 3 / 60s | Optional; stops permanently on exhaustion |
| WebUiWatcherService | TEMPORARY | 3 / 60s | Optional; stops permanently on exhaustion |