Deal-Hunter Agent — Hardened Plan v2 (post red-team #1 + OSS research)¶
Date: 2026-06-24 · Supersedes v1. To be red-teamed again before finalizing.
0. The four reframes forced by red-team #1¶
- Legal → human-in-the-loop is mandatory. eBay's 2026-02-20 User Agreement bans automated "data-gathering/extraction tools" and "any end-to-end flow that places orders without human review." Its API License also forbids feeding eBay content to an LLM and requires <6h purge. So: no fully-autonomous eBay bidding. Design = auto-discover → auto-verify → auto-rank → one-click human approval → Gixen snipe. (This overrides the earlier "full-auto" choice; full-auto eBay = account-ban risk. Full-auto remains possible only in the narrow, compliant sense of "Gixen fires the pre-approved max at close.")
- eBay content is processed deterministically — never by the LLM. Regex/spec gates only; purge raw eBay content <6h; never send seller PII to Claude.
- Drop the classifieds scrapers entirely. Marktplaats/2dehands/Kleinanzeigen/Leboncoin run DataDome/Akamai, ban + sue scrapers. Use ONLY native saved-search email alerts (Gmail, DKIM/SPF-verified) → re-resolve via official means. LLM free-text extraction applies to classifieds text only.
- Valuation must be live; budget must be reserved. Static
market_refis fatal in the 2026 DRAM spike — use a rolling trimmed-median with a staleness gate. Global budget = a reserved-commitment ledger (Σ live max-bids ≤ cap), not a per-item check.
1. Revised architecture¶
┌──────── Orchestrator: APScheduler 3.x (cron discover + DateTrigger snipe) ────────┐
eBay Browse API ─▶ Ingest(B1) ─▶ [deterministic spec parse] ─┐
Gmail saved-search ─▶ Ingest(B1) ─▶ Extract(B2, LLM+injection-guard) ─▶ Verify(B3) ─▶ Value(B4) ─▶ Decide ─▶ APPROVE(human) ─▶ Act(B5 Gixen)
│ └─ Notify(B7) digest/approval
└──────── SQLAlchemy2+Alembic / SQLite WAL (state, ledger, audit, <6h eBay TTL) ───────┘
Cross-cutting from day 1: Security(B-sec) · Tests/Mocks/CI(B8) · Observability+dead-man-switch · Docs/ADRs
2. Functional blocks + chosen OSS stack¶
| # | Block | Stack (maintained 2026) | Key hardening |
|---|---|---|---|
| B1 | Ingestion | httpx + ~30-line OAuth client-credentials helper + tenacity + pyrate-limiter(SQLite backend) for eBay; Gmail MCP for classifieds alert emails |
eBay token TTL 2h (refresh ~90%); 5k/day cap tracked persistently; <6h cache TTL for raw eBay content (separate store from rate-limit counter); DKIM/SPF-verify alert emails, re-resolve item by id, allowlist URL hosts, block private IPs (SSRF) |
| B2 | Extraction (classifieds free-text ONLY) | Claude native structured outputs (messages.parse, strict schema) + optional instructor; protectai/llm-guard PromptInjection scanner |
Listing text = data, wrapped in delimiters; operator instructions only in system; regex cross-validation: LLM may only narrow, never override (Pydantic @model_validator intersects LLM vs regex); ignore self-reported confidence for escalation; LLM output never touches price/budget/seller-facing text |
| B3 | Compatibility gates | plain typed predicates over a Module dataclass (+ optional rule-engine BSD); local SQLite part-number→spec DB seeded from JEDEC JESD21-C/400-5 offsets + spdr/decode-dimms parsers + Kingston/Crucial catalogs |
~20 hard gates (SODIMM≠DIMM, DDR4≠DDR5, 1.2V≠1.35V, ¬ECC, rank, pair); PN diff vs canonical; unknown PN = "unverified", not PASS; matched-pair gated for (2,x) |
| B4 | Valuation | numpy+scipy.stats (MAD-trimmed median) + pandas/river (trend+staleness) + CurrencyConverter (offline ECB FX, timestamped) |
Live rolling market_ref per (capacity,modules,speed,condition,region); staleness gate → ALERT-only if old/thin; landed cost = price+ship+FX+VAT 21%+customs+GSP fee+payment FX+E[DOA loss]; sold-comps gated → fall back to Browse active-ask distribution + request Insights |
| B5 | Bidding + budget | httpx → Gixen api.php (notags=1); reserved-budget ledger (atomic SQL); python-statemachine lifecycle |
UPDATE budget SET committed=committed+:max WHERE committed+:max<=cap (race-free); idempotency UNIQUE(account,ebay_item_id) + write-ahead PENDING intent row before Gixen call; FSM PENDING→REGISTERED→VERIFIED→FIRED→WON/LOST; re-verify snipe at T-30m; risk-coupled, jittered max bid; human approval gate before register |
| B6 | Orchestration/persistence | APScheduler 3.x (AsyncIO + SQLAlchemyJobStore) + SQLAlchemy 2.x + Alembic + pydantic-settings+keyring + tenacity/purgatory |
Event-driven: cron discovery (15–30m) + DateTrigger(close_T−lead) per soon-ending auction; SQLite WAL+busy_timeout+single-writer; jobs persist→crash-resume; startup reconciliation vs Gixen list-snipes; tz-aware UTC everywhere |
| B7→B8 | Tests/CI/Obs/Docs (DAY 1) | pytest+pytest-asyncio+pytest-cov+hypothesis+mutmut; respx+vcrpy/pytest-recording; WireMock + fake-bidder FastAPI for live-mock; GitHub Actions; pre-commit; ruff+mypy; uv lock; bandit+pip-audit+gitleaks/trufflehog; structlog(redaction)+OpenTelemetry+healthchecks.io; mkdocs-material+ADRs |
autouse network-egress blocker in tests (no real bid ever); BIDDING_ENV=mock refuses non-localhost bidding host; secret redaction in logs+cassettes; dead-man's-switch alerts if run misses an auction-close window |
3. Consolidated safety/compliance guardrails (hard-coded, non-optional)¶
- Human approval before any binding bid (default-on, not a toggle). Digest/ALERT is the product surface; snipe only on explicit per-item confirm.
- eBay listings: deterministic only, raw content purged <6h, no seller PII to LLM, ≤5k calls/day.
- No scraping — classifieds via native email alerts only; any source without alerts is excluded.
- EU-only auto-bid; non-EU items → ALERT (human reviews import/VAT).
- Reserved-budget ledger + global cap + per-item cap + max-concurrent-open-snipes; fail-closed kill-switch & DRY_RUN-default-true (must be auditably armed); jittered sub-market max bids; secret redaction; GDPR retention TTL; personal-use only.
4. Revised decision logic¶
landed = price + shipping + fx_fee + import_vat(country) + customs + gsp_fee + E[DOA_loss(tested,rating,protection)]
market_ref = staleness_guarded( MAD_trimmed_median( recent_sold_comps[capacity,modules,speed,cond,region] ) )
deal = clamp((market_ref - landed)/market_ref, 0, 1)
risk = Σ wᵢ·platform_verifiable_featureᵢ # seller rating/age/feedback, buyer-protection, return-policy ONLY; self-claims neutral
adjusted = deal · risk
max_bid = min( market_ref·0.80·risk − jitter, MAX_BID_CAP ) # risk-coupled, jittered, below true value, in AUCTION's native currency + FX buffer
RESERVE budget atomically; DECIDE BID(→human approve) / ALERT / ASK-SELLER / SKIP
reservation policy: secretary-style over the buying window (strict early, relax near deadline); BIN/Best-Offer compared as first-class alternative; never greedily spend the whole cap on the first mediocre deal
5. Roadmap (engineering baked in from v0)¶
- v0 — Foundation (1–2d): repo +
uv+ruff/mypy/pre-commit+ GitHub Actions skeleton + SQLAlchemy/Alembic schema + pydantic-settings/keyring + structlog + healthchecks ping + WireMock fakes + network-egress test blocker + first ADRs + mkdocs. No business logic ships without a test + a migration + a doc. - v1 — Alert-only digest (3–4d): B1(eBay+Gmail)→B2/B3 verify→B4 value→B7 digest. Golden-set (≥50 labelled listings incl. injection/DIMM-mislabel/1.35V/ECC) gates CI. No bidding.
- v2 — Approved snipe (2–3d): B5 ledger+Gixen+FSM, human-approval gate, DRY_RUN shadow 2 weeks, live-mock end-to-end, reconciliation. Arm only after shadow validation.
- v3 — Classifieds assist (2–3d): email-alert ingestion (NL/BE/FR), ASK-SELLER auto-draft (templated, no budget leak), MemTest86-on-receipt as a cost input.
- v4 — Generalize: pluggable compatibility profiles (SSD/GPU); revisit GDPR if scope leaves personal use.
6. Iterative update & "keep-everything-in-sync" strategy (your new requirement)¶
Definition of Done (enforced in CI, every change): code + unit test + property test (for money/parse) + migration (if schema) + docs/ADR update + passing live-mock + coverage ≥ gate. A PR that changes one without the others fails CI.
- Contract tests pin the shape of each external API (eBay/Gixen/Gmail/Anthropic) via recorded vcrpy cassettes; a nightly "drift" job replays against real sandboxes and alerts on schema drift → forces a synced update of client + mock + test together.
- Schema/state sync: Alembic migration required for any model change; CI runs upgrade→downgrade→upgrade; a check fails if models and migrations diverge.
- Prompt/version sync: Claude model id + extraction prompt + output schema are versioned together; the golden-set regression must pass before a prompt/model bump merges.
- Config sync: pydantic-settings is the single typed source; a CI check fails if .env.example, the settings model, and the docs table diverge.
- Mock/real parity: every external call goes through one client module; its WireMock stub + respx fixture live beside it; a "parity" test asserts the mock satisfies the same contract test as the recorded real response.
- Feature flags + incremental rollout: new sources/logic land behind flags (default off), validated in shadow/DRY_RUN, then enabled. The dead-man's-switch + daily digest header (DRY_RUN state, baseline age, discovery counts, budget committed) keep the human aware each day.
- Dependency sync: uv.lock committed; Renovate/Dependabot PRs run the full gate; pip-audit blocks known-vuln deps.
7. Antipatterns to avoid (from red-team)¶
LLM-as-source-of-truth for specs/prices; trusting self-reported "tested"/screenshots as positive signal; per-item-only budget checks; idempotency keyed on local id; fixed-formula guessable max bid; static baseline; daily-cron for time-critical snipes; fail-open safety; logging request bodies/secrets; scraping behind ToS; treating zero-results as success; naive local datetimes.
8. Open decisions¶
Budget cap + thresholds + 0.80 factor (calibrate in shadow); sold-comp source (pursue Insights vs active-ask proxy); host (local Task Scheduler vs small VPS — needs reliable wake before auction close); Postgres vs SQLite if concurrency grows; whether to keep eBay at all given its API-License LLM clause (mitigated by deterministic-only path).