Borrow TTL, Auto-Return, and Session Persistence
Generated: 2025-08-31 (local time)
This document describes how the Hub manages the lifetime of borrowed Playwright sessions using a lease/TTL, how timed-out sessions are auto-returned to the pool, and how session metadata is persisted in Redis to survive Hub restarts.
Overview
- When a client borrows a browser via the Hub, the Hub now creates a short-lived lease and persists minimal session metadata in Redis.
- If the borrower does not return the session within the TTL, a background sweeper automatically returns the capacity to the pool and asks the Worker to recycle the browser instance.
- Persisted session metadata enables the Hub to resume management after restarts.
Configuration
Environment variables (Hub):
- HUB_BORROW_TTL_SECONDS
  - Default: 900 (15 minutes). Minimum enforced: 60 seconds; maximum: 24 hours.
  - Sets the default lease time for a borrowed session.
- HUB_BORROW_TTL_SWEEP_SECONDS
  - Default: 10 seconds when not specified.
  - Controls the interval for the background sweeper that checks for expired leases and auto-returns.

Per-request override (borrow body):
- ttlSeconds (integer) can be supplied in the /session/borrow request body to override the default.
- Enforced bounds: 60 <= ttlSeconds <= 86400.
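The interaction between the environment default and the per-request override can be sketched as follows. This is a minimal illustration, not the Hub's implementation: the function names are hypothetical, and it assumes out-of-range values are clamped into the documented bounds (the Hub may instead reject them).

```python
import os
from typing import Mapping, Optional

MIN_TTL_SECONDS = 60          # documented minimum
MAX_TTL_SECONDS = 24 * 3600   # documented maximum (86400 seconds)
DEFAULT_TTL_SECONDS = 900     # documented default for HUB_BORROW_TTL_SECONDS


def clamp_ttl(value: int) -> int:
    """Force a TTL into [60, 86400]. Clamping (rather than rejecting) is an assumption."""
    return max(MIN_TTL_SECONDS, min(MAX_TTL_SECONDS, value))


def resolve_ttl(request_body: Mapping, env: Optional[Mapping] = None) -> int:
    """Use ttlSeconds from the borrow body if present, else the environment default."""
    env = os.environ if env is None else env
    default = int(env.get("HUB_BORROW_TTL_SECONDS", DEFAULT_TTL_SECONDS))
    requested = request_body.get("ttlSeconds")
    return clamp_ttl(default if requested is None else int(requested))
```

Under these assumptions, `resolve_ttl({"ttlSeconds": 1200}, {})` yields 1200, while a request for 5 seconds is raised to the 60-second minimum.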
API Usage
Borrow endpoint:
- POST /session/borrow
- Headers: x-hub-secret:
Return endpoint:
- POST /session/return
- Headers: x-hub-secret:
Redis Keys and Lifecycle
The following keys are written and maintained by the Hub:
- available:{labelKey} (list): items ready to be borrowed.
- inuse:{labelKey} (list): items currently borrowed; each item is a JSON blob including browserId, nodeId, etc.
- session:{browserId} (hash): persisted metadata for a borrowed session.
  - Fields:
    - browserId: string
    - labelKey: string
    - runId: string
    - nodeId: string (may be empty)
    - borrowedAtUtc: ISO-8601 UTC timestamp
    - ttlSeconds: string-int value of the lease
- borrow_ttl:{browserId} (string): the lease key with an expiration equal to the TTL. Presence indicates the lease is still active.
- browser_run:{browserId} (string, TTL ~ 6h): lightweight mapping to attribute logs to a runId.
- browser_test:{browserId} (string, transient): optional mapping used by log forwarding.
- recycle:{browserId} (string, ~2 minutes): marker that asks the Worker to tear down the sidecar/browser and replenish capacity.
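To make the key layout concrete, the sketch below builds the key names and the session hash fields listed above. The helper names are hypothetical, and the exact timestamp formatting the Hub uses for borrowedAtUtc is an assumption (any ISO-8601 UTC form would match the description).

```python
from datetime import datetime, timezone


def session_keys(browser_id: str, label_key: str) -> dict:
    """Return the Redis key names associated with one borrowed browser."""
    return {
        "available": f"available:{label_key}",
        "inuse": f"inuse:{label_key}",
        "session": f"session:{browser_id}",
        "lease": f"borrow_ttl:{browser_id}",
        "run": f"browser_run:{browser_id}",
        "test": f"browser_test:{browser_id}",
        "recycle": f"recycle:{browser_id}",
    }


def session_fields(browser_id, label_key, run_id, node_id, ttl_seconds, now=None):
    """Build the session:{browserId} hash; every value is stored as a string."""
    now = now or datetime.now(timezone.utc)
    return {
        "browserId": browser_id,
        "labelKey": label_key,
        "runId": run_id,
        "nodeId": node_id or "",           # documented as possibly empty
        "borrowedAtUtc": now.isoformat(),  # ISO-8601 UTC; exact format is assumed
        "ttlSeconds": str(ttl_seconds),    # documented as a string-int
    }
```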
Lifecycle summary:
1. Borrow success → move one item from available: to inuse: atomically (Lua), set borrow_ttl and write session:{browserId}.
2. Client uses the browser (WS connection is between client and Worker).
3. Return path:
   - If client returns before TTL: Hub moves item back to available:, cleans lease/session keys, posts recycle marker, updates results/logs.
   - If client does not return and TTL expires: BorrowTtlSweeperService detects missing borrow_ttl, atomically moves the in-use item back to available:, cleans up session and mappings, and sets recycle marker.
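The lifecycle can be modeled with a small in-memory stand-in for Redis. This is a sketch for reasoning about key transitions only: FakeStore, borrow, and give_back are hypothetical names, the manual clock replaces real key expiration, and the real Hub performs the list move atomically in a Lua script rather than in application code. The borrowedAtUtc field is omitted because the fake clock is not a wall clock.

```python
import json


class FakeStore:
    """Tiny in-memory stand-in for Redis: lists, hashes, and expiring strings."""

    def __init__(self):
        self.lists = {}     # key -> list of JSON strings
        self.hashes = {}    # key -> dict of fields
        self.strings = {}   # key -> (value, expires_at)
        self.now = 0.0      # manual clock in seconds

    def set_ex(self, key, value, ttl):
        self.strings[key] = (value, self.now + ttl)

    def exists(self, key):
        entry = self.strings.get(key)
        return entry is not None and entry[1] > self.now


def borrow(store, label_key, run_id, ttl_seconds):
    """Move one item available -> inuse, then write the lease and session metadata."""
    available = store.lists.setdefault(f"available:{label_key}", [])
    if not available:
        return None  # no capacity
    item = available.pop(0)
    store.lists.setdefault(f"inuse:{label_key}", []).append(item)
    blob = json.loads(item)
    browser_id = blob["browserId"]
    store.set_ex(f"borrow_ttl:{browser_id}", "1", ttl_seconds)
    store.hashes[f"session:{browser_id}"] = {
        "browserId": browser_id,
        "labelKey": label_key,
        "runId": run_id,
        "nodeId": blob.get("nodeId", ""),
        "ttlSeconds": str(ttl_seconds),
    }
    return browser_id


def give_back(store, label_key, browser_id):
    """Move the item inuse -> available, clean lease/session keys, set recycle marker."""
    inuse = store.lists.get(f"inuse:{label_key}", [])
    match = next((i for i in inuse if json.loads(i)["browserId"] == browser_id), None)
    if match is None:
        return False
    inuse.remove(match)
    store.lists.setdefault(f"available:{label_key}", []).append(match)
    store.strings.pop(f"borrow_ttl:{browser_id}", None)
    store.hashes.pop(f"session:{browser_id}", None)
    store.set_ex(f"recycle:{browser_id}", "1", 120)  # ~2 minutes, per the key table
    return True
```

Advancing `store.now` past the lease duration makes `store.exists("borrow_ttl:...")` return False, which is the condition the sweeper uses to treat a session as expired.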
Auto-Return Sweeper
BorrowTtlSweeperService (Hosted Service):
- Runs periodically (HUB_BORROW_TTL_SWEEP_SECONDS; default 10s).
- Scans session:* keys. For each session:
  - If borrow_ttl:{browserId} exists → lease still active, skip.
  - If lease is missing → TTL expired:
    - Atomically move from inuse:{labelKey} to available:{labelKey} using the same return Lua script as the /session/return endpoint.
    - Clean up: browser_run:/browser_test:, borrow_ttl:, session: keys.
    - Emit a recycle:{browserId} marker for the Worker to refresh capacity.
- Logs one line per sweep with processed, returned, and error counts.

Notes:
- The sweeper tolerates Hub restarts: persisted session:{browserId} holds the context needed to return capacity even if the Hub was down when the TTL elapsed.
- Capacity queue signaling is not required for correctness; availability is restored and new borrows will see it. Future enhancements may add queue wake-ups from the sweeper.
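A single sweeper pass can be sketched as a pure function over plain Python structures. This is not the BorrowTtlSweeperService code: sweep_once and its parameters are hypothetical, and the handling of a session whose item is missing from inuse (counted as an error and dropped) is an assumption, since the document does not describe that case.

```python
def sweep_once(sessions, active_leases, inuse, available):
    """
    One sweep over persisted sessions.

    sessions:      dict browserId -> session metadata (must contain "labelKey")
    active_leases: set of browserIds whose borrow_ttl key still exists
    inuse:         dict labelKey -> list of item dicts (each has "browserId")
    available:     dict labelKey -> list of item dicts
    Returns (stats, recycle) where recycle lists browserIds needing a recycle marker.
    """
    stats = {"processed": 0, "returned": 0, "errors": 0}
    recycle = []
    for browser_id in list(sessions):  # copy keys; we delete while iterating
        stats["processed"] += 1
        if browser_id in active_leases:
            continue  # lease still active, skip
        label_key = sessions[browser_id]["labelKey"]
        bucket = inuse.get(label_key, [])
        item = next((i for i in bucket if i["browserId"] == browser_id), None)
        del sessions[browser_id]  # clean up session metadata either way
        if item is None:
            stats["errors"] += 1  # assumed handling: nothing to return
            continue
        bucket.remove(item)
        available.setdefault(label_key, []).append(item)
        recycle.append(browser_id)
        stats["returned"] += 1
    return stats, recycle
```

Because the decision depends only on persisted session metadata and the absence of the lease key, the same pass works after a Hub restart: sessions whose TTL elapsed while the Hub was down simply appear as expired on the first sweep.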
Operational Guidance and Troubleshooting
- Extending a lease mid-run: not currently supported via an API; borrowers should choose an appropriate ttlSeconds when borrowing.
- If you see sessions sticking in inuse: without a matching borrow_ttl:, the sweeper will clean them on its next pass.
- Use Prometheus metrics and Dashboard to observe borrow outcomes and pool utilization. TTL expirations currently do not have a dedicated metric, but can be inferred from logs and capacity changes.
- For long-running tests, increase HUB_BORROW_TTL_SECONDS or set ttlSeconds per request.
- Ensure clocks are reasonably in sync across Hub and Workers; the Hub uses UTC timestamps for session metadata and a separate alive TTL for worker liveness.
Examples
Borrow with a custom TTL (20 minutes):
Request:
POST /session/borrow
x-hub-secret: runner-secret
Content-Type: application/json

{
  "labelKey": "AppB:Chromium:UAT",
  "ttlSeconds": 1200
}
Return:
POST /session/return
x-hub-secret: runner-secret
Content-Type: application/json

{
  "labelKey": "AppB:Chromium:UAT",
  "browserId": "b-123"
}
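The same two requests can be built from a client script. The sketch below uses only the Python standard library and constructs the requests without sending them; build_hub_request and the base URL http://hub.example:5000 are hypothetical, and the shape of the Hub's responses is not described here, so no response parsing is shown.

```python
import json
import urllib.request


def build_hub_request(base_url: str, path: str, secret: str, payload: dict):
    """Build a POST request carrying the x-hub-secret header and a JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"x-hub-secret": secret, "Content-Type": "application/json"},
    )


HUB_URL = "http://hub.example:5000"  # hypothetical address

borrow_req = build_hub_request(
    HUB_URL, "/session/borrow", "runner-secret",
    {"labelKey": "AppB:Chromium:UAT", "ttlSeconds": 1200},
)
return_req = build_hub_request(
    HUB_URL, "/session/return", "runner-secret",
    {"labelKey": "AppB:Chromium:UAT", "browserId": "b-123"},
)
```

Passing either object to `urllib.request.urlopen` would send it. In practice, a client should wrap its test run in try/finally so the return request is issued even on failure, rather than relying on the TTL sweeper to reclaim the session.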
Compatibility and Defaults
- Backward compatible: existing clients that do not send ttlSeconds inherit the default HUB_BORROW_TTL_SECONDS (15 minutes by default).
- Hub restarts: persisted session metadata allows the Hub to reconcile and auto-return expired sessions post-restart.
Related Documents
- docs/tasks.md (Improvement #12)
- Configuration and deployment: see README.md and docker-compose.yml for environment wiring.