Pathlight Integration
Three independent gates so the studio can run AI-driven research and rescans inside Canopy without ever exposing a buyer to a runaway bill. The architecture, the rejected alternatives, the failure modes I guarded against, and the line item that becomes a fixed monthly maximum by construction.
The question that gates this section: how do you let an operator press a button that costs the studio real money without exposing the buyer to a runaway bill? That question sounds like a side effect of caring about cost, but it is actually the load-bearing constraint that shaped the entire integration. Anywhere AI is in the path, cost is a wild card unless something deliberately stops it from being one.

The standard answer is 'usage-based billing with a soft alert threshold.' I have read enough postmortems on that pattern to know what happens when the alert fires at three in the morning UTC and the engineer who can flip the switch is asleep: the bill keeps growing for the entire window between the alert and the response. Every alerting layer I have ever shipped on top of usage-based billing has eventually leaked through. So I built the integration on the inverse posture: the bill cannot grow past the cap, by construction, regardless of whether anyone is awake.

The integration ships behind three independent gates. Each gate is capable of stopping the call on its own, and all three have to pass for the call to fire. First, the capability has to be turned on per install. Second, the trigger has to be manual, or rule-bounded with explicit human approval, never a silent background scheduler. Third, the monthly budget cap has to have headroom, checked and reserved atomically in the very same write that records the spend. If any one gate fails, the call aborts and the operator sees which gate blocked it. The buyer never gets a surprise bill from a runaway loop because the architecture does not allow runaway loops.

I considered scheduled background scans with usage-based billing as the obvious starting point. Wake up, scan the prospect list, surface findings to the operator. It is the pattern most agency tools ship. I rejected it for two reasons.
The first is the cost-runaway problem above: a bug in the scheduler that re-enqueues tasks faster than they complete cannot be bounded by a soft threshold; only a hard cap with atomic reservation can stop it. The second is the operator-trust problem: when the system can scan on its own, the operator is no longer the agent who chose to spend the studio's money. Every Canopy install has to feel like a shop the buyer owns the keys to, and a system that bills the buyer for actions the buyer never authorized is the opposite of that.

The mechanism for the budget cap is the part that took the most iteration. The naive shape is a check-then-reserve: read the current period's usage, compare it to the cap, and if there is headroom, fire the call and then write the new usage row. When the budget has one scan of headroom remaining, two admin clicks at the same instant can both pass this check and both fire, and the buyer ends up over the cap. That is a textbook time-of-check to time-of-use (TOCTOU) bug, and the textbook fix is atomic check-and-reserve: the same write that records the new usage row is conditional on the cap not being exceeded. Either the row lands and the call is allowed to fire, or the row does not land and the call gets a user-facing block. There is no window where two concurrent paths can both pass. I shipped that fix in the May 5 audit closure work; before that, the gate was correct in the single-operator case but had this concurrency window. The fix is a single conditional database write; it affects either one row or zero rows, never two.

The capability toggle and the manual-trigger requirement are simpler in mechanism and just as important in posture. Capability is per-install: a Canopy install where the buyer has not opted into Pathlight integration cannot fire any call, no matter how many other gates would pass. This means the integration ships dark on every new install and lights up only when the buyer signs the engagement that includes it.
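The atomic check-and-reserve described above can be sketched as a single conditional UPDATE. This is a minimal illustration, not Canopy's actual code: the `budget` table, its columns, the `try_reserve` helper, and the use of SQLite are all assumptions made for the example. It models the period's usage as a counter on one row, so the "one row or zero rows" outcome shows up directly as the statement's row count.

```python
import sqlite3

# Hypothetical schema: one counter row per install per billing period.
SCHEMA = """
CREATE TABLE budget (
    install_id TEXT NOT NULL,
    period     TEXT NOT NULL,
    cap_units  INTEGER NOT NULL,
    used_units INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (install_id, period)
)
"""

def try_reserve(conn, install_id, period, units=1):
    """Reserve units only if the cap still has headroom.

    The headroom check lives in the WHERE clause of the same statement
    that records the spend, so there is no gap between check and write.
    Returns True if exactly one row was updated, False if none was.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE budget SET used_units = used_units + ? "
            "WHERE install_id = ? AND period = ? "
            "AND used_units + ? <= cap_units",
            (units, install_id, period, units),
        )
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Placeholder data: a cap of 10 with exactly one scan of headroom left.
conn.execute("INSERT INTO budget VALUES ('demo-install', 'demo-period', 10, 9)")
conn.commit()

first = try_reserve(conn, "demo-install", "demo-period")
second = try_reserve(conn, "demo-install", "demo-period")
print(first, second)  # True False
```

The two calls here run one after the other, so the example only shows the conditional write refusing the second reservation. Under real concurrency the guarantee comes from the database executing the single statement atomically against that row.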
The manual-trigger requirement means there is no cron, no scheduler, no event-loop feedback that can fire a Pathlight call without an operator having clicked something. Rule-bounded triggers exist, for example when a deal moves to a stage where rescanning makes sense or a customer site flags a change worth checking, but every rule that fires Pathlight requires explicit human approval before the call goes through. The rule queues the call as awaiting approval; the operator approves; the call fires. Approval is not a checkbox on the rule itself; it is a per-call action.

The integration surfaces in three places inside Canopy. Prospecting candidate research is the most common: an operator selects a contact or a list of contacts, kicks off a Pathlight scan against each prospect's domain, and the scan output flows into the pipeline as a record attached to the contact. The scan score becomes a sortable field; the recommendations attach to the prospect's notes; the operator can compose a follow-up email referencing specific findings. Change monitoring is the second: existing customer sites come up for rescan on a schedule, but the rescans are queued for operator review rather than auto-fired, and the queue surfaces in the operations banner when something is waiting. Competitive intelligence is the third: an operator can scan a buyer's direct competitor and store the result alongside the buyer's own scan history, so the next conversation about positioning has data on both sides of the table. All three flow through the same three-gate path. There is no privileged code path that bypasses the gates for any reason.

The failure mode I most carefully guarded against is the one that originally motivated the atomic check-and-reserve: two admin clicks at the same instant when the budget has one scan of headroom remaining. The two clicks must not both pass. One scan fires; the other gets a user-facing block that names the budget cap as the reason.
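A rough sketch of the three-gate path, returning a result that names the blocking gate. Every name here (`GateResult`, `ScanRequest`, `check_gates`, the gate labels, and all message text except the budget one) is hypothetical. The in-memory `reserve` function is a single-threaded stand-in for the atomic database reservation, used only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class GateResult:
    allowed: bool
    blocked_by: Optional[str] = None  # which gate stopped the call, if any
    message: Optional[str] = None     # operator-facing explanation

@dataclass(frozen=True)
class ScanRequest:
    install_id: str
    domain: str
    approved_by: Optional[str]  # operator who clicked or approved; None if nobody did

def check_gates(
    req: ScanRequest,
    capability_enabled: Callable[[str], bool],
    reserve_budget: Callable[[str], bool],
) -> GateResult:
    # Gate 1: the install must have opted in to the integration.
    if not capability_enabled(req.install_id):
        return GateResult(False, "capability", "integration is not enabled for this install")
    # Gate 2: a human must have triggered or approved this specific call.
    if req.approved_by is None:
        return GateResult(False, "approval", "this scan is awaiting operator approval")
    # Gate 3: budget is checked last so a call blocked by an earlier gate
    # never holds a reservation.
    if not reserve_budget(req.install_id):
        return GateResult(False, "budget", "budget cap reached, this scan was not run")
    return GateResult(True)

# Stand-in gates for the demo (NOT concurrency-safe; illustration only).
headroom = {"demo-install": 1}

def reserve(install_id: str) -> bool:
    if headroom.get(install_id, 0) < 1:
        return False
    headroom[install_id] -= 1
    return True

def enabled(install_id: str) -> bool:
    return install_id == "demo-install"

r1 = check_gates(ScanRequest("demo-install", "example.com", "operator-1"), enabled, reserve)
r2 = check_gates(ScanRequest("demo-install", "example.com", "operator-2"), enabled, reserve)
r3 = check_gates(ScanRequest("demo-install", "example.com", None), enabled, reserve)
r4 = check_gates(ScanRequest("other-install", "example.com", "operator-1"), enabled, reserve)
print(r1.allowed, r2.blocked_by, r3.blocked_by, r4.blocked_by)
# True budget approval capability
```

Returning a value instead of raising keeps the block something the UI can render directly, with the gate name available for linking to the relevant setting or banner.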
The block is a real UI surface, not a thrown error: the operator sees 'budget cap reached, this scan was not run' and a link to the operations banner where the cap and the period reset date are visible. The same shape applies if the capability toggle is off, or if the manual approval is missing for a rule-fired trigger. The architecture's job is to make the block legible, not just to prevent the call.

A second failure mode worth naming: all three gates pass, the call is attempted, and the upstream Pathlight service errors out for an unrelated reason. The reservation has been written and the budget has been decremented, but the call did not produce a result. The right behavior is a refund: a release that adds the reserved units back to the available budget. Canopy implements that release in the failure path so the buyer is never charged for a call that did not produce a deliverable. The release is atomic against the same row the reservation wrote, for the same reason the reservation has to be atomic in the first place.

The honest framing on cost: AI as a recurring line item belongs in a productized engagement only when the cost is a known monthly maximum, not a wild card. Canopy's pricing for installs that include Pathlight integration is structured around the budget cap. The buyer chooses a cap that fits the volume of scans they expect to run; the studio runs scans against that cap; the cap is the ceiling. There is no scenario where the buyer's bill exceeds the cap because there is no code path that allows a scan to fire when the cap has been reached. Every line item is bounded.

The operational consequence the buyer feels is exactly the one the architecture targets: the AI line item is a fixed monthly maximum, every month, by construction. Not by hope, not by alerting, not by post-hoc reconciliation. The cap is the ceiling, the cap is enforced at the same write that records the spend, and the buyer never gets a surprise bill from a runaway loop.
That is the version of AI integration that belongs in a productized engagement, and as far as I can tell, it is the only version that does.
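For the upstream-failure case described earlier, the reserve-then-release flow might look like the following. This is a sketch under assumptions, not the shipped implementation: the `budget` table, the `reserve`, `release`, and `run_scan` helpers, the status strings, and the placeholder scan result are all invented for illustration, and the billing period column is omitted for brevity.

```python
import sqlite3

def reserve(conn, install_id, units=1):
    # Conditional write: lands only if the cap still has headroom.
    with conn:
        cur = conn.execute(
            "UPDATE budget SET used_units = used_units + ? "
            "WHERE install_id = ? AND used_units + ? <= cap_units",
            (units, install_id, units),
        )
    return cur.rowcount == 1

def release(conn, install_id, units=1):
    # Give reserved units back in one statement against the same row.
    # The guard keeps used_units from ever going below zero.
    with conn:
        cur = conn.execute(
            "UPDATE budget SET used_units = used_units - ? "
            "WHERE install_id = ? AND used_units >= ?",
            (units, install_id, units),
        )
    return cur.rowcount == 1

def run_scan(conn, install_id, call_pathlight):
    """Reserve first, call second, release if the call produced nothing."""
    if not reserve(conn, install_id):
        return ("blocked", None)
    try:
        result = call_pathlight()
    except Exception:
        # Broad catch is deliberate in this sketch: any upstream failure
        # means no deliverable, so the reserved units are refunded.
        release(conn, install_id)
        return ("released", None)
    return ("ok", result)

def used(conn, install_id):
    row = conn.execute(
        "SELECT used_units FROM budget WHERE install_id = ?", (install_id,)
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE budget (install_id TEXT PRIMARY KEY, "
    "cap_units INTEGER NOT NULL, used_units INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO budget VALUES ('demo-install', 1, 0)")  # placeholder cap of 1
conn.commit()

def failing_call():
    raise RuntimeError("upstream unavailable")

failed = run_scan(conn, "demo-install", failing_call)
after_failure = used(conn, "demo-install")
succeeded = run_scan(conn, "demo-install", lambda: {"score": 72})  # placeholder result
blocked = run_scan(conn, "demo-install", lambda: {"score": 72})
print(failed[0], after_failure, succeeded[0], blocked[0])  # released 0 ok blocked
```

With a cap of one, the failed call leaves the budget untouched, the next call succeeds and consumes the unit, and the call after that is blocked at the cap.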