Decisions¶
Lightweight decision log. Plan-affecting or plan-extending choices go here so code and
docs/plan-final.md never drift. Newest first.
2026-06-28 -- PyPI packaging: dist name qmaster + Trusted Publishing¶
quartermaster is taken on PyPI (the unrelated 2017 package), so the DISTRIBUTION name is qmaster
(pip install qmaster). The import package stays quartermaster; both quartermaster and a new
qmaster CLI alias point at the same entry point. Publishing uses Trusted Publishing (OIDC) via
.github/workflows/release.yml -- NO API token stored anywhere (matches the no-secrets ethos): a PyPI
pending-publisher trusts the repo's release workflow, which uv builds + uv publish
--trusted-publishing always on a published Release or manual dispatch (environment: pypi,
id-token: write). pyproject gained authors (Wombat164, no email), urls, keywords, classifiers, and a
PEP-639 license; uv build + twine check both pass. The README pip install line flips to qmaster
only AFTER the first publish lands (so it never points at a package that isn't live).
Published 2026-06-28 (manual workflow dispatch, OIDC, no token) -- qmaster 0.1.0 (wheel + sdist)
live at https://pypi.org/project/qmaster/; README + opt-in table flipped to pip install qmaster.
The pip path's Quickstart uses an inline stdin demo (the bundled examples/sample-alert.eml ships only
in the git tree, not the wheel); the from-source path keeps the sample command. Future GitHub Releases
auto-publish via the same workflow.
2026-06-28 -- Cleared the deferred audit items (Dependabot, v0.1.0, history rewrite)¶
Operator-authorized the items the external audit had deferred:
- Merged the 5 Dependabot PRs -- Actions to Node-24 majors (checkout v7, gitleaks v3, pages v5/v6),
still SHA-pinned; clears the Node-20 deprecation. (Caught + fixed a uv.lock drift the version bump
introduced.) CI green.
- Released v0.1.0 -- first tagged Phase-1 release: version 0.0.0 -> 0.1.0, uv.lock refresh,
annotated tag, GitHub Release. Adopters can pin git+...@v0.1.0.
- Rewrote history -- scrubbed the author email dev@vdhome.be -> the GitHub noreply alias across all
30 affected commits (filter-branch, targeted so Dependabot[bot] attribution survived), force-pushed;
git config set so future commits use noreply. origin/main now has 0 occurrences. (The email was
already public pre-rewrite, so this removes it from the reachable tip, not from existing clones/caches.)
- Still NOT done (need credentials / the Settings UI): publish to PyPI under an owned name; set the
social-preview image.
- CI + docs green on the rewritten HEAD.
2026-06-28 -- External gap + security audit (public repo, from the outside)¶
Two adversarial reviews from a fresh public clone + GitHub/PyPI/Pages recon. CORE security posture verified SOUND: gitleaks clean over all history, pip-audit clean (base+dev+gmail, no CVEs), all 7 Actions SHA-pinned, uv.lock hash-pinned, no workflow-injection, egress/DRY_RUN/SecretStr/LLM-boundary claims all hold. Findings + fixes (this commit):
- Dependency-confusion (headline): the README told adopters
pip install quartermaster, but PyPIquartermasteris an unrelated dormant 2017 package -> they'd install a stranger's code. Fixed: README installs from git (uvx/uv tool install/pipx/pip install git+...), states Python 3.13+, warns off the PyPI name. - Zero-setup overstated: no sample shipped, so the Quickstart produced nothing. Added
examples/sample-alert.eml; the Quickstart points at it (verified -> a 1-row digest). - README lower half read as "go build v0 / resolve Gate 0" (contradicting "runnable, tested"):
replaced with a Development section; Tech blurb corrected to the ACTUAL stack (dropped keyring /
tenacity / numpy / scipy / vcrpy / ... that aren't used); added a CI badge + "personal project" note;
removed
KICKOFF.md(internal agent-scaffolding clutter) from the public root. - SECURITY.md overclaim fixed: dropped "free of personal data" (git commit metadata carries the author email).
- docs.yml least-privilege:
id-token: writescoped to the deploy job only. - Added
CONTRIBUTING.md; set repo topics; de-named thercdecodename at HEAD. - 212 tests still pass; ruff/mypy-strict/bandit clean.
DEFERRED (operator calls, surfaced not done): publish to PyPI under an owned name (vs the git
install); cut a v0.1.0 tag + Release; set a social-preview image; merge the open Dependabot PRs; a
git-history rewrite to remove the author email + the rcde codename from past commits (it remains in
history -- the email is already public, so weigh the force-push cost).
2026-06-28 -- Final pass: healthchecks ping, security docs, Action SHA-pins, codename scrub¶
Tidy-up (the remaining Phase-1 plumbing + red-team follow-ons):
- healthchecks.io ping (
health.py): best-effort GET toQM_HEALTHCHECKS_PING_URLat the end of a successful run (a dead-man's-switch for scheduled runs); never raises. Wired into the CLI. 3 tests. - SECURITY.md: documented email-input credentials -- IMAP app-password (full-mailbox scope -> use a
dedicated account) + the Gmail OAuth token (live credential under
data/, revoke via Google; the 7-day restricted-scope caveat). - GitHub Actions pinned to commit SHAs (
ci.yml+docs.yml, version in a comment): a moving tag can't be repointed to malicious code. Dependabot bumps them. - Scrubbed an internal codename from the kickoff prompt (renamed it to
KICKOFF.md, de-codenamed its title); "private" -> public; README updated. (Git history retains the prior names.) - 212 tests pass; ruff/mypy-strict/bandit clean.
Phase-1 is feature-complete: provider-agnostic input -> ingest (regex + guard + LLM cross-val) -> fitment -> valuation (landed / ECB-FX / live+bootstrap baseline) -> deal% -> surface -> ranked digest, runnable end-to-end, gated by the golden set. Remaining is Phase-2 (bidding) + ops (a scheduler).
2026-06-28 -- Golden set: the v1 CI quality gate (53 labeled cases)¶
tests/test_golden.py -- a labeled corpus asserted against the funnel so extraction, fitment, and
the safety boundaries can't silently regress (plan sec.7/sec.9 v1 gate). 53 cases:
- fitment (26): happy / wrong-gen / wrong-form / oversized / too-many / buffered / ECC -> UNVERIFIED / 1.35V -> PASS+note / unknown-critical -> UNVERIFIED / non-positive -> REJECT.
- extraction (16): structured + spaced kits, single sticks, DDR3/4/5, the desktop-UDIMM mislabel, RDIMM, and price formats incl. the "DDR4 EUR 80" not-glued-digit trap + EU/US separators.
- injection (8): each body the guard must flag -> the funnel returns LOW confidence + deterministic- only (never silently trusts an injected listing).
- source-leak (3): only CLASSIFIEDS_EMAIL is LLM-eligible; SERPAPI / EBAY raise
LlmBoundaryViolation. - a meta-test keeps the corpus >= 50.
209 tests pass; ruff/mypy-strict/bandit clean. The injection corpus is the deterministic seed; live- LLM-extraction injection cases grow it when a live LLM run is wired.
2026-06-28 -- OSS-adoptability: provider-agnostic email input (reverses Gmail-API-as-core)¶
Red-team (2 agents: adoptability audit + prior-art research) found the Gmail-API input undermined the
"easily adopted by others" goal: google libs in CORE deps (pulled requests/httplib2/protobuf for
every adopter, incl. zero-key users), a 20-40 min GCP+OAuth onboarding, and the silent
gmail.readonly RESTRICTED-scope trap (unverified personal app -> refresh token expires after ~7
days). And Gmail was the ONLY live reader (provider lock-in). The RawListing boundary was already
clean, so the fix was cheap. Reversed to the idiomatic fetch/parse split + extras pattern (notmuch /
DVC):
mail.py(core, stdlib-only):parse_email(raw) -> RawListing(stdlibemail, RFC822; text/plain preferred, stripped-html fallback; subject->title; first link->url) + readersread_path(.eml parsed, .txt = bare body),read_mbox,read_stdin,read_imap(stdlibimaplib, ANY provider via app-password). No third-party deps.- Default
QM_MAIL_SOURCE=file-- drop.eml/.txtin a dir (or pipe to stdin) -> digest, no account, no keys, no extras. IMAP is the provider-agnostic live path. - Gmail API demoted to the optional
[gmail]extra; google libs OUT of core;gmail.pyslimmed to sharemail.parse_email(fetchformat=raw); CLI errors friendly if the extra is missing. Its caveats (7-day token, whole-mailbox scope) are documented in the file. (Supersedes the P1.3c entry.) - stdin also answers the "use a provider's Gmail MCP" idea for free: an agent fetches + pipes the
.eml; Quartermaster owns no provider code in that path. - config:
mail_source+mail_path+imap_*(password = SecretStr) +gmail_*paths;.env.exampleregenerated (Path defaults viaas_posix, can't drift cross-OS). Verified:printf ... | QM_MAIL_SOURCE=stdin python -m quartermasterrenders a digest. 155 pass; clean.
2026-06-28 -- P1.3c: Gmail one-label reader (read-only OAuth)¶
Operator chose the Gmail API (read-only OAuth) over IMAP. gmail.py, same two-layer shape as the
other adapters:
- PURE core (
message_to_raw,read_messages): a Gmail message resource (dict) ->RawListing(plaintext body; text/plain preferred, else stripped text/html; subject -> title; first link -> url). base64url-decoded, malformed-message-safe. 5 tests, no API. - WIRING (
load_credentials,read_label): scopegmail.readonly; loads/refreshes the token or runs the InstalledApp consent flow on first run; lists + gets messages under one label -> RawListings. OAuth client secret + token live underdata/(gitignored), never committed. Not CI-tested (needs real Google credentials); the google libs are lazy-imported + mypy-overridden (untyped SDK boundary). - Adds google-api-python-client + google-auth-oauthlib. 154 pass; ruff/mypy-strict/bandit clean.
- NEXT: wire it into the CLI (config: label + token/secret paths) so a run reads the alert label; then the healthchecks ping + the golden set.
2026-06-28 -- P1.4c: SerpApi live-baseline resolver (real comps -> APPROVE-eligible)¶
pipeline.make_baseline_resolver(fetch, fx, fallback=bootstrap_baseline) returns a BaselineResolver
tying the existing pieces -- query_for(spec) -> fetch_shopping_comps -> live_baseline -- into
real market comps, falling back to the bootstrap table when comps are thin (<5) OR the fetch fails
(SerpApiError / httpx errors caught -> bootstrap; never crashes the pass). Caches by query within a
pass (one fetch per distinct spec).
- CLI uses the live resolver when
QM_SERPAPI_API_KEYis set, else bootstrap (logged as thebaselinemode); the key is read once + bound into the fetch closure. - This is what unlocks APPROVE-eligible items: bootstrap (cold-start) keeps everything ALERT; a LIVE baseline + EU + EUR + fresh FX -> APPROVE_ELIGIBLE. Demo verified (live EUR 150 ref -> two APPROVE rows ranked by deal%).
- 4 tests (live comps -> LIVE; thin -> bootstrap; SerpApi error -> bootstrap; cache = 1 fetch/query). 149 pass; ruff/mypy-strict/bandit clean.
- REMAINING Phase-1: P1.3c the Gmail one-label reader (feeds RawListings; needs the access-method decision), healthchecks ping, golden set.
2026-06-28 -- P1.4b: assemble() + CLI -- runnable end-to-end¶
pipeline.py + __main__.py make the funnel runnable. assemble(extracted, *, profile, fx,
baseline_for, today) -> DigestItem | None runs one ExtractedListing through fitment + landed cost +
baseline -> surface (None when no price). run_pass() batches it. null_extractor is a
deterministic-only LLM, so the funnel runs on the regex extractor alone when no Anthropic key exists.
- CLI (
python -m quartermaster,[project.scripts]): reads classifieds body.txtfiles, runs the funnel, prints the digest. Runnable with NO live keys (deterministic extraction + bootstrap baseline); uses Claude whenQM_ANTHROPIC_API_KEYis set; structured logging (secret-redacted). - Phase-1 assembly defaults: classifieds = EU; EU + EUR -> costs-complete (price is the landed cost, domestic pickup); else costs-incomplete -> ALERT. Injected: LLM, FxSnapshot, baseline resolver, today.
- Demo verified: 3 fixtures -> the DDR5 kit dropped (incompatible with the DDR4 G513QR), 2 ALERT rows ranked by deal%.
- 8 tests (assemble paths, null_extractor, run_pass, CLI smoke x2 isolated from any local
.env). 145 pass; ruff/mypy-strict/bandit clean. - REMAINING Phase-1: a SerpApi live-baseline resolver (swap bootstrap for live comps), P1.3c the Gmail one-label reader (feeds RawListings), healthchecks ping, golden set.
2026-06-27 -- P1.4a: the ranked digest (the human-facing surface)¶
digest.py -- Phase-1's only output. DigestItem (listing identity + EvaluatedListing +
extraction confidence) -> rank() (drop DROP-surface; APPROVE-eligible first, then deal% desc) ->
render_digest() (ASCII; one-line header scorecard: DRY_RUN / FX age / approve+alert counts; per-row
tag, deal%, landed EUR, confidence, and the demoting reason). Pure + deterministic, ASCII-only.
- 4 tests (tier-then-deal ordering + DROP excluded; header + rows; alert reason on the row; empty / drop-only). 137 pass; ruff/mypy-strict/bandit clean.
- The pipeline now renders end-to-end (sample verified): listing -> fitment -> landed + FX -> baseline -> deal% -> surface -> ranked digest.
- REMAINING Phase-1 (I/O + assembly, none safety-critical): an
assemble()tying ExtractedListing -> DigestItem (one digest pass), P1.3c the Gmail one-label reader, the CLI entry point, the healthchecks ping, and the golden set. The funnel + cross-val + surface logic are all in + tested.
2026-06-27 -- P1.3b adapter: Anthropic structured-output extractor (llm.py)¶
The real LlmExtractor for ingest. Two layers so the parsing is unit-tested without the SDK or a
live call:
build_extractor(create=...): PURE core -- forces a single structured-output tool call (emit_listing, JSON-schema'd to the RAM fields), parses the tool input, defensively coerces values (numbers-as-strings/floats -> Decimal/int, junk -> None; never raises).anthropic_extractor(api_key=...): thin wiring -- constructsanthropic.Anthropic+ bindsmessages.create. Key passed in (fromconfig.anthropic_api_key), never read here.- Model: Haiku 4.5 default (cheap/fast for field extraction; overridable). System prompt re-states the data-not-instructions framing; the model has NO tools beyond the emitter (no agency), and its output is then cross-validated by ingest.
- Adds
anthropic(light SDK -- httpx/pydantic, no torch). 5 tests via a fakecreate(parse, string/float coercion, junk->None, no-tool->empty, forced-tool + wrapped-prompt). 133 pass; clean. - P1.3 is now complete except the I/O edge: NEXT P1.3c the Gmail one-label reader (bodies ->
extract_listing), then P1.4 digest.
2026-06-27 -- P1.3b(ii): ingest orchestration + cross-validation (the safety core)¶
ingest.py ties the P1.3 pieces together for one classifieds body: assert_llm_allowed (B1) ->
guard scan -> wrap_untrusted + INJECTED LLM (Callable[[str], LlmExtraction]; the real Anthropic
client is the next thin adapter) -> cross-validate vs the regex extractor -> ExtractedListing.
- Cross-validation is the safety core: per money-critical field (price, total capacity), the LLM is trusted only when it AGREES with the deterministic read; on conflict the DETERMINISTIC value wins and confidence drops to LOW. Single-source field -> MEDIUM; agreement -> HIGH. So an injected or hallucinated number can at worst yield a LOW-confidence spec, never a silently-trusted one.
- Capacity is reconciled on TOTAL GB (the money value); when totals agree the LLM's NxM config is preferred (it parses kits better than the regex's bare-capacity guess).
- The model's enum strings (ddr_gen / form / currency) are VALIDATED into our enums, never raw.
- Unsafe body (guard flagged) -> the LLM is never called; deterministic-only, LOW.
- Money bug caught + fixed:
parse_price("DDR4 EUR 80")matched the4in DDR4 as the price -> added a not-preceded-by-letter lookbehind + a regression test (found by the orchestration test). - Tier-A: 8 ingest tests (agreement HIGH; price/capacity conflict LOW + deterministic-wins; LLM-only MEDIUM; unsafe-skips-LLM; boundary-block; to_listing). 128 pass; ruff/mypy-strict/bandit clean.
- NEXT: the Anthropic adapter (the real
LlmExtractor, structured-output tool-call) + P1.3c the Gmail one-label reader -> then P1.4 digest.
2026-06-27 -- P1.3b(i): injection defense, middle ground (architecture-first, llm-guard pluggable)¶
Operator chose a MIDDLE GROUND between a light heuristic and the full (heavy) llm-guard dep.
guard.py is defense-in-depth where the strength is ARCHITECTURE, not one ML classifier:
wrap_untrusted(body): frames the body in hard delimiters as DATA, not instructions.HeuristicScanner(dependency-light): flags model-targeting injection patterns ("ignore previous instructions", fake<system>tags, "you are now", reveal-prompt, ...), strips control chars, caps length (8000) ->ScanResult(safe, sanitized, reasons).PromptScannerProtocol = the extension point: anLlmGuardScanner(ML) drops in behind a futureguardextra viadefault_scanner(); NO torch/transformers in the base install (pluggable-backend pattern, per the shared-memory reference).- The real blast-radius limiter is downstream (P1.3b-ii): a fixed structured-output schema, NO tools/ agency, and money-critical fields cross-validated vs the deterministic extractor -> an injection at worst yields a wrong RamSpec, caught -> UNVERIFIED.
- 7 tests; 120 pass; ruff/mypy-strict/bandit clean; stdlib only.
- NEXT P1.3b(ii): extraction orchestration (
assert_llm_allowed-> guard -> injected Claude structured-output -> cross-val ->ExtractedListing), LLM client injected/mocked; then the anthropic adapter + P1.3c Gmail one-label reader.
2026-06-27 -- P1.3a: deterministic extraction (heuristic-first, the cross-val oracle)¶
extract.py -- regex parse of classifieds text into a RamSpec + price, NO LLM. Applies the
edge-LLM doctrine (heuristic-first / LLM-enriches): this deterministic read is both a standalone
extractor for structured listings AND the oracle the LLM path (P1.3b) must corroborate on
money-critical fields (price, capacity), else the spec stays UNVERIFIED.
parse_spec(text) -> RamSpec: DDR gen, speed (MHz / MT/s / DDRx-NNNN), kit (NxM GB) or bare capacity (assume 1x; LLM cross-checks), form factor (SO-DIMM/UDIMM), ECC, registered. Unparsed fields stay None -> UNVERIFIED (never guesses a critical field into a PASS).parse_price(text) -> (Decimal, Currency) | None: currency symbol/code either side of the number; EU + US decimal/thousands separators normalised; non-positive / unparseable -> None.- RAM-aware patterns + generic price parsing live in
extract.py(the ingest concern), keeping the engine category-clean; generalises behind a protocol with category #2. - Tier-A: 14 tests incl. totality properties (parse_spec / parse_price never raise on arbitrary
text; any parsed price > 0). 113 pass; ruff/mypy-strict/bandit clean. No new deps (stdlib
re). - NEXT P1.3b: LLM enrichment -- Claude structured output over the email body (B1-allowed via
assert_llm_allowed) + llm-guard (injection defense) + THIS extractor as cross-validation; then P1.3c the Gmail one-label reader.
2026-06-27 -- Architecture: generic engine vs category plugins (RamSpec stays)¶
Operator question: is RamSpec the right name if the app does more than RAM? Verdict + refactor.
RamSpecis correct. It models RAM-specific fields (ddr_gen / ecc / registered / voltage) meaningless to other categories; a generic name would lie + invite cramming. Generalisation = category-specific specs behind a generic funnel, NOT one renamed spec.- The funnel is already generic: evaluate / Surface / Listing / LandedCost / FxRates / Comp /
live_baseline / deal_pct / ledger / FSM / the LLM-routing boundary are category-agnostic, and
evaluate()depends on a genericAssessment, notRamSpec. RAM-specificity is localized tofitment.py(compatibility). - Fixed two leaks: RAM bits had crept into generic modules --
bootstrap_baseline+BOOTSTRAP_EUR_PER_GB(in valuation) andquery_for(in serpapi). Moved both into a newram.py(RAM category glue: search-query builder + cold-start pricing).valuation.py+serpapi.pyare now category-clean;valuation._to_centsbecame publicto_cents. - Generalisation shape (deferred until category #2): add a sibling category module (e.g.
ssd.py: its spec + gates + query + bootstrap) and extract a thinFitmentprotocol soassess()takes an injected gate set. NOT built now -- you can't design that abstraction well from a single example.
99 tests pass; ruff/mypy-strict/bandit clean. Pure move, no behaviour change.
2026-06-27 -- Plumbing: structlog redaction (GAP-2) + Phase-1 Listing/LLM-routing (GAP-3)¶
The two deferred red-team prereqs, landed before P1.3's LLM path.
- logging.py (GAP-2):
configure_logging(settings)wires structlog (JSON, level filter, ISO timestamp) with a default-deny redaction processor -- any field whose KEY matches a secret marker (api_key / token / password / secret / authorization / ping_url / ...) is masked, so a secret can't reach a sink even if a caller puts it in an event. A test proves a loggedserpapi_api_keyvalue never appears in output. Born redacted, before the first client logs. - listings.py (GAP-3):
Listing(Phase-1 discovered-item record: source/title/url/price/..., separate from the Phase-2Snipe.Source) +ListingSource(CLASSIFIEDS_EMAIL / SERPAPI_SHOPPING / EBAY) + the B1/B2 boundary:llm_allowed/assert_llm_allowed-> only classifieds-email bodies may reach an LLM; SerpApi + eBay are deterministic-only and raiseLlmBoundaryViolation. The guard + its test exist NOW so P1.3 cannot regress the source-leak boundary. - Adds
structlog. 99 tests pass; ruff/mypy-strict/bandit clean (tests ignore S105/S106 -- fake secret fixtures).
This closes the red-team's "composition + plumbing" prereq (evaluate seam + GAP-2 + GAP-3). NEXT: P1.3 ingest (classifieds alert-email extraction via LLM + llm-guard + regex cross-val; eBay deterministic-only) -> P1.4 ranked digest + scorecard + CLI + golden set.
2026-06-27 -- P1.2b (ii): SerpApi Google-Shopping client (closes P1.2)¶
The data source that produces the Comp list for live_baseline. serpapi.py:
fetch_shopping_comps(query, *, api_key, gl, hl, ...)-> GET SerpApi google_shopping -> parseshopping_results->Comp(price, currency, source="serpapi_google_shopping"). Price fromextracted_price(Decimal via str, no float math); currency from the formatted-price symbol (EUR/GBP/USD) with the request locale as fallback. Skips price-less / non-positive items; raisesSerpApiErroron a SerpApi error payload,httpx.HTTPStatusErroron a bad status.query_for(spec)builds the search string from a RamSpec.- Deterministic-only (boundary B2): only numeric price + currency + source are kept; no listing text, never sent to an LLM.
- Secret handling: the key is a query param (SerpApi's design) so the URL holds it -- never
logged; passed in by the caller (unwrapped from
config.serpapi_api_key), never read from disk. - Adds
httpx. respx-mocked tests (7); the egress backstop guarantees no live call in CI; a live smoke run stays manual/local with the real Bitwarden key. - Follow-ups (noted): tenacity retry + pyrate-limiter rate-limit (plan sec.6); query tuning; seller/ condition filters. structlog + redaction (red-team GAP-2) still precedes any logger here.
P1.2 is now end-to-end: SerpApi comps -> trimmed-median market_ref -> deal% -> evaluate() surface.
2026-06-27 -- P1.2b (i): live trimmed-median baseline (pure money core)¶
The compare half of P1.2, money core first. valuation.py gains Comp (a deterministic price
comparable, NEVER sent to an LLM) + live_baseline(comps, fx):
- Each comp is FX-converted to EUR cents, then a 10%-trimmed median ->
market_ref(robust to a single mispriced listing). TaggedLIVE, carryingn_comps. - Below
MIN_LIVE_COMPS(5) ->None; the caller falls back to the bootstrap table (BOOTSTRAP tag), so the funnel keeps thin-comp listings ALERT-only (plan sec.3 "thin comps -> ALERT"). - DEVIATION: a stdlib trimmed median instead of numpy/scipy (plan sec.6) -- dep-light + exact on integer cents; revisit if weighting / IQR is needed.
- Tier-A: 5 tests (too-few -> None, exact median, outlier-resistance, currency conversion, + property: result within the [min, max] comp range). 83 pass; ruff/mypy-strict/bandit clean.
- NEXT P1.2b (ii): the SerpApi Google-Shopping client (httpx, respx-mocked) that produces the
Complist from live data, reading the key fromconfig.serpapi_api_key.
2026-06-27 -- The evaluate() seam: fit + value -> surface (red-team GAP-1)¶
src/quartermaster/evaluation.py composes the two pure cores into one EvaluatedListing with a
Surface decision -- the first place fitment + valuation are exercised together (they were disjoint
libraries; this was the red-team's #1 structural gap).
Surface: DROP (hard-incompatible, not shown) / ALERT_ONLY (surfaced for a human; the Phase-1 default, never auto-actioned) / APPROVE_ELIGIBLE ("would be approvable" -- no button in Phase-1).evaluate(assessment, landed_cents, baseline, *, is_eu, costs_complete, fx_age_days)(pure): REJECT -> DROP (blockers travel with it); else APPROVE_ELIGIBLE only when ALL of {verdict PASS, baseline LIVE, EU, costs complete, FX fresh (<= 4d)} hold; otherwise ALERT_ONLY with the demoting reasons attached. This is where the red-team's staleness->ALERT + incomplete-cost->ALERT now live.- Tier-A: 9 tests (one per surface path + a property: REJECT always DROPs; APPROVE_ELIGIBLE implies every safety condition). 78 pass; ruff/mypy-strict/bandit clean. No new deps.
is_eu/costs_complete+ the raw listing record are INPUTS here; they get a home on the Phase-1Listingmodel next (GAP-3). P1.2b's live baseline plugs straight intobaseline.
2026-06-27 -- Red-team pass: foundation hardening (pre-P1.2b)¶
Three independent adversarial reviews (money-correctness, security/safety, gap-analysis) before building further. The foundation was found sound (integer-cents math, ledger atomicity, release- exactly-once, no float leakage, no secrets/PII, no network/autonomous-action path yet). Fixes this increment:
- fitment (HIGH bug): a malformed spec (
capacity_gb_per_module <= 0/module_count <= 0/ negative) previously returned PASS -- surfacing garbage as biddable and feeding a negativemarket_ref. Both CRITICAL gates now REJECT non-positive values; property tests cover it. - FX hardening:
FxRatesenforces a per-currency plausibility band (0.1..10 EUR/unit) so a mis-scaled/inverted rate (e.g. USD=100) fails closed instead of silently 100x-ing landed cost;FxSnapshot.age_daysclamps at 0 (clock skew can't read as "fresh"). - LandedCost validation: rejects negative price/shipping and
import_vat_rateoutside [0,1) (catches the 21-vs-0.21 fat-finger). - FSM fail-closed (money): removed the
FIRED -> ERRORedge -- a fired-but-unconfirmed snipe routes only to NEEDS_HUMAN_RECONCILE (still holding); the sole budget-releasing edge out of FIRED is a confirmed LOST. Property test asserts it. - Egress backstop hardened: also guards
connect_ex+getaddrinfo(DNS); wording softened (respx is the primary control, this is the backstop). New test covers connect_ex + DNS. - bootstrap table: test asserts it covers every
DdrGen;.github/dependabot.ymladded (github-actions + uv, weekly).
69 tests pass; ruff/mypy-strict/bandit clean.
DEFERRED -- reviewers converged that a small composition+plumbing increment should precede P1.2b
(each is cheap now, expensive after network/LLM code):
1. evaluate() + EvaluatedListing -- the missing seam composing source + RamSpec + verdict +
landed cost + deal% + provenance; hosts the staleness->ALERT and incomplete-cost->ALERT rules.
2. structlog + secret redaction -- must precede the first secret-bearing client (SerpApi).
3. Phase-1 Listing model + source taxonomy (SERPAPI_SHOPPING) + llm_allowed routing + the B1
source-leak guard test -- so comp data is deterministic-only from the first byte.
THEN P1.2b (live SerpApi trimmed-median baseline) -> P1.3 ingest -> P1.4 digest + golden set.
Smaller deferred (documented, not yet fixed): landed-cost completeness flag (customs/GSP/payment-FX/
E[DOA] are an additive under-estimate -> inflates deal% cross-border; force ALERT until modelled);
create_snipe reserve-before-idempotency-insert (add SAVEPOINT + regression test); Snipe.budget_id
scoping for reconcile; VCR cassette redaction (before vcrpy lands); pin Actions to commit SHAs;
mid-lifecycle FSM timeout edges; 1.35V SO-DIMM -> RISK; the kickoff-prompt codename (scrub vs
remove -- operator call).
2026-06-27 -- P1.2a: valuation money core (EUR/USD/GBP first-class)¶
The Phase-1 "value" block, pure + network-free (src/quartermaster/valuation.py). Money in
Decimal, rounded to integer EUR cents (the ledger unit).
- EUR / USD / GBP are first-class (
Currencyenum +FxRates): no currency is special-cased -- EUR just has rate 1.FxRatesvalidates every currency is present + positive and EUR == 1. EUR is the settlement/comparison base (the ledger is EUR cents); listings + comps may be priced in any first-class currency and convert via one FX snapshot (sourced upstream, e.g. ECB). (Operator directive: make USD + GBP first-class just like EUR.) LandedCost.eur_cents(fx):(price + shipping) * fx -> + import VAT -> ROUND_HALF_UP to cents.import_vat_ratedefaults 0 (private EU classifieds add none); > 0 for cross-border/retail.bootstrap_baseline(spec): cold-start EUR/GB table -> aBaselinetaggedbootstrap(vslive/stale) so every valuation carries provenance (plan sec.3 cold-start + sec.7).deal_pct = clamp((market_ref - landed) / market_ref, 0, 1).- Tier-A: 15 worked examples + 4 hypothesis properties (eur_cents non-negative + monotonic in price;
EUR conversion is identity; deal_pct stays in [0,1]). valuation.py 98% cov; ruff/mypy-strict/bandit
clean; 59 tests pass. No new deps (stdlib
Decimal). - FX source (NOT hardcoded):
fx.py::ecb_fx_rates()builds a timestampedFxRatessnapshot from ECB euro reference rates via thecurrency_converterlibrary (bundled + refreshable;FxSnapshot.as_oflets callers guard staleness). The0.92 / 1.17literals exist only in tests. - Scope: the LIVE SerpApi Google-Shopping comp baseline (real comps -> trimmed median) is P1.2b.
2026-06-27 -- Repo renamed: deal-hunter-agent -> quartermaster¶
Aligned the repo slug + Pages URL with the project/package/brand name (operator request). All references updated; docs site now at https://wombat164.github.io/quartermaster/.
2026-06-27 -- Docs site: MkDocs Material + GitHub Pages (docs-as-code)¶
A docs site accompanies the repo (the plan's deferred "mkdocs + ADR site", sec.6). Chose MkDocs
Material -> GitHub Pages over the GitHub Wiki: docs-as-code (versioned with the repo, PR-reviewed,
CI-built --strict) instead of an out-of-band wiki.
mkdocs.yml(Material theme, light/dark, nav over the existingdocs/tree). Added adocsoptional-dependency extra (mkdocs-material).docs/index.mdlanding page.docs/security.md+docs/decisions.mdinclude the canonical rootSECURITY.md/DECISIONS.mdviapymdownx.snippets(--8<--) -- single source of truth, no drift..github/workflows/docs.yml: buildsmkdocs build --strict(fails on broken links/orphans) and deploys to Pages (least-privilegepages: write+id-token: write;configure-pages enablement).site/+.cache/gitignored; README links the site (https://wombat164.github.io/quartermaster/).
2026-06-27 -- Public-first posture: MIT license, config separation, security docs¶
The repo goes public from here (operator decision: public from the get-go). Pre-flip audit:
gitleaks full-history = no leaks; no PII / no tracked secrets or DB state; .gitignore already
blocks every secret/state class. Changes:
- License: MIT (was "Proprietary -- personal use only").
LICENSEadded;pyprojectupdated. Copyright "2026 Wombat164" (the repo's pseudonymous owner; keeps it PII-free). - Config separation (12-factor):
src/quartermaster/config.py(pydantic-settings, added as a runtime dep -- on-plan per sec.6). Runtime config via env (prefixQM_) or a gitignored.env;dry_rundefaults True (fail-safe). Secret fields areSecretStr(masked, never logged)..env.exampleis rendered FROM the model (render_env_example) with a drift test -- can't diverge. - Secrets in the operator's store (Bitwarden), injected via env at runtime
(
export QM_SERPAPI_API_KEY=$(bw get password serpapi)); never committed/read-from-disk in normal use. DEVIATION/extension from the plan's keyring-only note (sec.4/sec.6): Bitwarden is the operator's actual secret store; keyring stays a supported alternative backend. - SECURITY.md added: security model + private vuln reporting via GitHub security advisories.
- README: Configuration & secrets section + License/Security pointers.
- Tests:
tests/test_config.py-- fail-safedry_rundefault,.env.exampleno-drift, SecretStr never leaks inrepr/str.
2026-06-27 -- P1.1: fitment + compatibility core (the verify funnel)¶
First Phase-1 (search+compare) increment. src/quartermaster/fitment.py: a deterministic,
network-free RAM compatibility filter -- the "funnel before the click" (plan sec.4 verify block).
- Model:
RamSpec(all fields Optional;None= "not stated" = unknown) +FitmentProfile(target-machine constraints). A concreteG513QRprofile seeds it -- the project's founding use-case (ASUS ROG Strix G15 G513QR: DDR4-3200 SO-DIMM, 2 slots, 64 GB max [2x32], non-ECC, 1.2 V). - Gates: 9 pure
(spec, profile) -> GateResultpredicates (ddr_gen, form_factor, buffered, ecc, per_module_capacity, kit_fit, dual_channel, speed, voltage).assess()aggregates: REJECT dominates; else any RISK or any CRITICAL-gate UNKNOWN -> UNVERIFIED; else PASS. Honours the plan's "unknown on a critical gate -> never PASS" rule. - Verdict semantics: REJECT (hard-incompatible) / UNVERIFIED (missing critical data OR a risk like ECC-on-non-ECC) / PASS (fits; compatible-but-suboptimal stays PASS with a note, e.g. single-channel or below-rated speed).
- Choices / deviations: unstated
registeredassumed unbuffered (SO-DIMM RDIMM is exotic); ECC-on-non-ECC -> RISK (UNVERIFIED), not REJECT (usually boots as non-ECC, not guaranteed); speed + voltage are non-critical (a same-gen module always runs, just maybe slower / at JEDEC). - Tier-A rigor (sec.9): 13 worked examples (one per path) + 5 hypothesis properties (assess is
total; unknown-critical never PASSes; wrong-gen always REJECTs; oversized-module always REJECTs;
in-spec matched kits always PASS). fitment.py 100% cov; ruff + mypy-strict + bandit clean; suite
37 passing. No new runtime deps (stdlib
dataclasses+enum); no LLM, no network. - Scope: consumes an already-structured
RamSpec; free-text ->RamSpecextraction (LLM + llm-guard) is P1.3, and the gate set will grow (CAS latency, mislabel heuristics, the v1 golden set).
2026-06-26 -- STRATEGIC PIVOT: search+compare first; bid/buy deferred; eBay API dropped¶
Gate 0 RESOLVED -> NO-GO on the eBay API. The deep-research
(docs/background/ebay-gate0-research.md) found production eBay Buy/Browse API access is
partner-only, business-model-gated, "no guarantee", with no surfaced precedent of a
granted personal/hobbyist app -- too uncertain to build on. Our design is compliant; the
risk is commercial rejection, not legality. Decision:
- DROP the eBay API dependency. eBay becomes manual-only (browse + manual Gixen later). No EPN application blocks development.
- PIVOT the product to SEARCH + COMPARE first: discover available value-for-fit RAM and rank by landed cost vs a live market baseline -- the "funnel before the click" that was always the stated core value.
- DEFER bid/buy (Gixen snipe + approval + the ledger/FSM built in increment 2) to Phase 2. Increment 2 is NOT wasted -- it is the tested Phase-2 foundation, sitting ready.
- Data sources (compliant, no eBay API, no scraping): discovery = classifieds saved-search ALERT EMAILS (body = dataset); compare baseline = SerpApi Google Shopping (retail + Amazon + eBay listings AS PRICE COMPS, incl EU) + a bootstrap EUR/GB table. NOTE: dropping the API does NOT lose the eBay price signal (SerpApi surfaces eBay via Google Shopping); we lose only API-driven eBay sniping (deferred anyway).
- Roadmap reshaped: Phase 1 = search+compare (alert/compare-only, NO approve button); Phase 2 = bid/buy. plan-final.md carries a pivot banner; the detailed design there remains valid for Phase 2 + the engine.
2026-06-26 -- v0 increment 3: CI + network-egress blocker (the safety net)¶
- GitHub Actions
.github/workflows/ci.yml:uv sync --locked(lock-drift gate) -> ruff check -> ruff format --check -> mypy --strict -> pytest -> bandit (src) -> pip-audit, plus a separate gitleaks job. Runs on main + dev + PRs. - AUTOUSE network-egress blocker (
conftest.block_network): patchessocket.connect/socket.create_connectionto raiseNetworkBlockedon any non-loopback host, so NO test can ever reach eBay / Gixen / SerpApi / a classifieds inbox or place a real bid. Loopback passes through (SQLite uses no socket, so the DB tests are unaffected). Verified bytest_egress(external blocked; loopback reaches the OS error, not NetworkBlocked). - Ran
ruff formatacross the tree (consistent style; the CI format gate is green). - 19 tests pass; ruff + mypy --strict + bandit clean; pip-audit reports no CVEs.
2026-06-25 -- v0 increment 2: schema core (ledger + FSM + source-tag)¶
Tier-A money/safety -> full rigor (plan sec.9).
- Models (money as integer CENTS, never float): Budget (singleton; CHECK
0 <= committed <= cap) + Snipe (source immutable via @validates; idempotency
UNIQUE(account, ebay_item_id, snapshot_hash); UTC-aware timestamps via a
UTCDateTime TypeDecorator since SQLite has no native tz).
- Reserved-budget ledger (sec.3 CRIT fix): atomic reserve (guarded UPDATE, refuses to
exceed cap), guarded release (refuses underflow), reconcile (recomputes
committed = sum(reserved over HOLDING snipes) -- the leak detector).
- FSM (sec.5): hand-rolled enum + transition table. DEVIATION from python-statemachine
(sec.6): chosen for transparency + property-testability of the money invariant; can be
wrapped later without changing the table. The HOLDING set drives release -- a reservation
frees exactly when a snipe LEAVES holding; NEEDS_HUMAN_RECONCILE stays holding
(fail-closed on money).
- snipes service: create (reserve + insert) and transition (validate edge + release)
run in the caller's session transaction (release in the SAME tx as the state change).
- SQLite WAL + busy_timeout + foreign_keys via a connect listener.
- Alembic initial migration (reversible); custom-type import wired into the env mako
template so future autogen does not break.
- Tests (17 pass): a hypothesis STATEFUL property test (committed == sum(holding reserved)
and 0 <= committed <= cap across random create/transition sequences) + reserve/release/
reconcile units + FSM + source-immutability + idempotency + migration
upgrade -> downgrade -> upgrade. Still NO bidding/LLM/eBay/SPD-DB.
2026-06-25 -- v0 increment 1: toolchain + skeleton¶
- uv project (src-layout, hatchling build); dev extras: ruff, mypy (strict), pytest
(+asyncio, cov), respx, hypothesis, bandit, pip-audit, pre-commit.
uv.lockcommitted (reproducible builds, plan sec.9). - ruff lint selects S (security), DTZ (no-naive-datetime -> the plan's tz-aware-UTC rule), ASYNC, B, I, UP, E/F, RUF. mypy strict on src + tests.
- pre-commit: ruff + ruff-format + mypy + gitleaks (gitleaks mandatory, plan sec.4).
- DEVIATION: Python PINNED to 3.13 via a COMMITTED
.python-version(un-ignored it in.gitignore). Reason: plan sec.6 chose 3.13, and the later scientific/ML deps (numpy/scipy/llm-guard) may lack 3.14 wheels;>=3.13alone let uv pick 3.14.requires-pythonstays>=3.13. - No runtime deps yet (v0 foundation); deps are added per functional block as v1 lands.
- Validated: ruff clean, mypy strict clean, pytest green on 3.13.14.
2026-06-25 -- Architecture + UX artifacts distilled from plan-final¶
docs/architecture.md+docs/ux-mock.mdmakeplan-final.mdconcrete (they double as the eBay-application "data flows + mocks" artifacts). No plan changes -- pure distillation of the authoritative plan.- UX channel model: digest = OUTBOUND, batched daily; approval = OUT-OF-BAND signed token, distinct from the attacker-influenceable classifieds inbound inbox; critical push reserved for time-critical, verified, EU items only. (Implements trust boundary B2 + anti-alert-fatigue.)
- Anti-confirmation-fatigue: only verified + EU + costs-known items get an APPROVE button; everything else is ALERT-ONLY (no button). Type-to-confirm friction scales with spend above a threshold. (Threshold value = OPEN; see ux-mock open questions.)
- Confidence/provenance per row: badge (VERIFIED/UNVERIFIED/BOOTSTRAP) + comps state/N + source tag + landed-cost breakdown, per decision-support best practice.
- Open UX questions tracked in
docs/ux-mock.md(digest surface, friction threshold, push channel, progressive disclosure) -- resolve before/at v1.