
Inside a $180K Custom AI Build: How a Sydney Logistics Firm Cut Ops Headcount 30% in 2026

HornTech Australia · AI Development

The 11pm Email That Started It All

The founder of a 24-person Sydney logistics firm, call him Marcus, sent his ops manager a one-line email at 11:14pm on a Tuesday in late January 2026. "We cannot keep doing this." The week before, the team had missed a customs deadline on a container of medical equipment because three different humans, in three different inboxes, had each assumed someone else was tracking it. The cost of the slip, including expedited re-clearance, demurrage, and the reputational hit with their largest client, came to more than half a year of one ops coordinator's salary.

Marcus had been hearing the same pitch from every consultant in his LinkedIn DMs since Q3 2025: just get an AI agent to do it. Every demo looked beautiful. None of them, he kept finding, knew anything about how a freight forwarder in Botany actually moves a container from a vessel manifest to a delivery confirmation. The off-the-shelf agents could write breezy customer emails. They could not read a Maersk EDI feed and cross-reference it against an internal job number while the cargo was still on the water.

What follows are the build notes for what happened next: the 12-week story of how that ops disaster became a custom Claude agent stack, what it actually cost, what broke in production, and what Marcus would do differently if he had to start over. Names and a few specifics are anonymised; the numbers and the timeline are not.

What the Old Stack Was Doing (And Why It Was Bleeding Money)

Before the build, Marcus's team ran on what he later called "the duct tape stack": Cargowise on the operational side, a separate accounting tool, three shared inboxes, two WhatsApp groups, a half-broken Zapier flow that nobody had touched in eighteen months, and a single Google Sheet that everyone secretly considered the source of truth. Five ops coordinators spent most of their days reconciling these pieces by hand and chasing each other to confirm what had actually happened.

The visible problem was speed. Quotes were taking 36 to 48 hours when faster competitors were doing same-day. The invisible problem, the one that drove the late-night email, was that the duct tape was hiding error rates that compounded as volume grew. A missed shipping notice in February became a missed customs window in March became a missed delivery promise in April. By the time anything showed up in the P&L, four people had already touched the file.

Marcus had obtained quotes for three solutions over the previous quarter. A new TMS implementation came in at around AUD $240,000 over twelve months, locked in for three years, and required hiring a dedicated systems person to run it. An "AI-powered" SaaS add-on at AUD $4,800 a month was, on closer inspection, a rebranded OCR tool. A traditional dev shop offered a custom build at AUD $320,000 with a six-month lead time. None of them would ship value before the next peak season.

The Decision: Build, Buy, or Hire an Agency

By early February, Marcus had narrowed the choice to three viable paths. The first was buying a vertical SaaS product and forcing his ops to bend around it. The second was hiring two more coordinators and a junior ops manager at a fully loaded annual cost north of AUD $310,000, which would only buy him a year of headroom before the same problem returned at higher volume. The third was a custom Claude agent stack built specifically against his Cargowise data and his three inbox sources.

The argument that won, from his eventual development partner, was simple. SaaS solves a generic problem averagely. Custom development solves a specific problem precisely. For a freight workflow that touches a manifest format, an internal naming convention, and a customer relationship that nobody outside the building understands, the gap between generic and precise is exactly the gap between "polite assistant" and "saved deal." That is the kind of moat custom AI builds deliver in Sydney service businesses, and it is also the place where pre-packaged AI tools quietly fail.

The proposed budget was AUD $180,000 across a 12-week build, with a 90-day post-launch support retainer at AUD $9,500 per month. Roughly 60% of the build budget went to engineering, 25% to data integration and Cargowise plumbing, and 15% to evaluation infrastructure, the part most other quotes had skipped entirely. Marcus signed on the third Monday of February.
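
As a quick check on those percentages, here is a minimal Python sketch that splits the AUD $180,000 total into its three line items. The function and dictionary names are illustrative only; integer arithmetic keeps the dollar figures exact.

```python
# Split the AUD $180,000 build budget using the percentages quoted above.
# Integer arithmetic avoids floating-point rounding on dollar amounts.
BUILD_BUDGET_AUD = 180_000

ALLOCATION_PERCENT = {
    "engineering": 60,
    "data integration and Cargowise plumbing": 25,
    "evaluation infrastructure": 15,
}


def split_budget(total: int, percents: dict[str, int]) -> dict[str, int]:
    """Return the dollar amount for each line item; percentages must sum to 100."""
    if sum(percents.values()) != 100:
        raise ValueError("allocation percentages must sum to 100")
    return {item: total * pct // 100 for item, pct in percents.items()}


for item, amount in split_budget(BUILD_BUDGET_AUD, ALLOCATION_PERCENT).items():
    print(f"{item}: AUD ${amount:,}")
```

On these inputs, engineering comes to AUD $108,000, integration to AUD $45,000 and evaluation infrastructure to AUD $27,000.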


The Build, Week by Week

What follows is how the 12 weeks actually went: no scrum poetry, just the short version of where the time went.

Weeks 1 and 2 were spent inside the warehouse and on the phone, not writing code. The development team shadowed two ops coordinators for three days each, recorded every micro-decision they made on a sample of 40 live shipments, and rebuilt the implicit decision tree on paper. Those two weeks alone surfaced eleven undocumented rules (for example, "if the consignee is in regional NSW and the vessel berths after Friday, hold the customs declaration until Monday morning") that no SaaS product could have known about.
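
Once surfaced, rules like this can be written down as explicit, testable code instead of living in coordinators' heads. The Python sketch below encodes the single example rule quoted above. It is an illustration, not the team's implementation: the `Shipment` fields are hypothetical, "berths after Friday" is read as a Saturday or Sunday berth, and "Monday morning" is assumed to mean 09:00 local time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Shipment:
    job_number: str
    consignee_regional_nsw: bool  # hypothetical flag, derived upstream from the consignee address
    vessel_berth: datetime        # local berthing time


def customs_lodgement_time(shipment: Shipment) -> datetime:
    """Return the earliest time the customs declaration should be lodged.

    Assumptions: "berths after Friday" is read as a Saturday or Sunday berth,
    and "Monday morning" is taken as 09:00 local time.
    """
    berth = shipment.vessel_berth
    if shipment.consignee_regional_nsw and berth.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        monday = (berth + timedelta(days=7 - berth.weekday())).date()
        return datetime(monday.year, monday.month, monday.day, 9, 0)
    return berth  # rule does not apply: lodge as normal


# Saturday 14 February 2026 berth for a regional NSW consignee.
held = customs_lodgement_time(Shipment("71244", True, datetime(2026, 2, 14, 15, 30)))
print(held)  # 2026-02-16 09:00:00
```

Keeping each rule as a small pure function is one way to give every one of the eleven rules its own unit test, and to expose them to the agent as tools rather than asking the model to re-derive them from a prompt.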

Weeks 3 to 5 were data plumbing: Cargowise read access via the official integration layer, EDI ingestion from two carrier feeds, a structured email pipeline parsing the three shared inboxes, and a clean canonical "shipment object" that the agent could reason about. This is the unglamorous work that makes or breaks a custom AI build, and the team spent more time on it than on anything else.
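
A canonical shipment object is largely a decision about how conflicting sources get merged. The article does not say which policy the team chose, so the Python sketch below assumes one common option: per-field "latest observation wins", with the source and timestamp kept alongside every value so an answer can cite where a fact came from. Class names, field names and source labels are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class FieldValue:
    value: Any
    source: str            # hypothetical source label, e.g. "cargowise", "edi:carrier_a", "inbox:ops"
    observed_at: datetime  # when the source reported this value


@dataclass
class CanonicalShipment:
    job_number: str
    values: dict[str, FieldValue] = field(default_factory=dict)

    def apply(self, name: str, value: Any, source: str, observed_at: datetime) -> bool:
        """Merge one update; the most recently observed value wins. Returns True if applied."""
        current = self.values.get(name)
        if current is not None and observed_at <= current.observed_at:
            return False  # older (or same-time) observation: keep the existing value
        self.values[name] = FieldValue(value, source, observed_at)
        return True

    def cite(self, name: str) -> str:
        """Value plus provenance, so an answer can point back to the underlying record."""
        fv = self.values[name]
        return f"{name} = {fv.value} (source: {fv.source}, observed {fv.observed_at.isoformat()})"


job = CanonicalShipment("71244")
job.apply("vessel_eta", "2026-03-02T06:00", "edi:carrier_a",
          datetime(2026, 2, 27, 8, 0, tzinfo=timezone.utc))
# An email stamped earlier than the EDI message is processed second; its stale ETA is ignored.
job.apply("vessel_eta", "2026-03-01T18:00", "inbox:ops",
          datetime(2026, 2, 26, 9, 0, tzinfo=timezone.utc))
print(job.cite("vessel_eta"))
```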

Weeks 6 to 8 were the agent itself: a Claude Sonnet 4.6 backbone (with Claude Opus 4.7 reserved for the harder reasoning steps) using tool calls against the canonical shipment store, an evaluation harness that scored the agent against the 40 shadowed shipments, and an internal Slack interface where the ops team could ask "what is the status of job 71244" and get a complete answer with citations back to the underlying records.
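
Tool use here means the model asks the application to run a function, such as a shipment lookup, instead of guessing. A minimal sketch, assuming a hypothetical `get_shipment_status` tool and an in-memory dictionary standing in for the canonical shipment store: the tool definition follows the Anthropic Messages API shape (`name`, `description`, `input_schema`), and the handler returns each fact with a record ID the agent can cite.

```python
import json

# Tool definition in the shape the Anthropic Messages API uses for tool use:
# a name, a description, and a JSON Schema under "input_schema".
# The tool name and fields are hypothetical.
SHIPMENT_STATUS_TOOL = {
    "name": "get_shipment_status",
    "description": (
        "Look up the current status of a shipment by internal job number. "
        "Every fact is returned with the ID of the record it came from, so answers can cite it."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "job_number": {"type": "string", "description": "Internal job number, e.g. 71244"},
        },
        "required": ["job_number"],
    },
}

# In-memory stand-in for the canonical shipment store; values and record IDs are invented.
DEMO_STORE = {
    "71244": [
        {"field": "customs_status", "value": "lodged", "record_id": "cargowise:job/71244"},
        {"field": "vessel_eta", "value": "2026-03-02T06:00", "record_id": "edi:carrier_a/msg/001"},
    ],
}


def handle_get_shipment_status(tool_input: dict) -> str:
    """Execute the tool locally and return JSON text to send back in a tool_result block."""
    job = str(tool_input.get("job_number", ""))
    facts = DEMO_STORE.get(job)
    if facts is None:
        return json.dumps({"error": f"no shipment found for job {job!r}"})
    return json.dumps({"job_number": job, "facts": facts})


print(handle_get_shipment_status({"job_number": "71244"}))
```

In a live agent loop, the tool definition goes in the `tools` list of a Messages API request; when Claude responds with a `tool_use` block naming `get_shipment_status`, the application runs the handler and returns its output in a `tool_result` block. The API call itself is left out so the sketch runs offline.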

Weeks 9 and 10 were guardrails and evals. Every action the agent could take in the world (sending an email, updating Cargowise, marking a job complete) sat behind a two-level confirmation: a confidence score from the agent itself, and a human approval step for anything below a configurable threshold. The team also built a regression test set of 120 historical shipments with known outcomes, and required a 95% pass rate before any model or prompt change went to production.
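
The two-level confirmation and the 95% release gate are both small pieces of logic. The Python sketch below shows one way to express them, under stated assumptions: the action names and dataclass are hypothetical, the threshold is a single number (the article only says it was configurable), and out-of-range confidence scores are routed to a human as a fail-safe.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    EXECUTE = "execute"
    HUMAN_APPROVAL = "human_approval"


@dataclass
class ProposedAction:
    kind: str          # hypothetical, e.g. "send_email", "update_cargowise", "mark_complete"
    job_number: str
    confidence: float  # the agent's self-reported confidence, expected in [0, 1]


def route_action(action: ProposedAction, threshold: float) -> Route:
    """Level 1: the agent's confidence score. Level 2: a human approves anything below threshold."""
    if not 0.0 <= action.confidence <= 1.0:
        return Route.HUMAN_APPROVAL  # malformed score (out of range or NaN): fail safe to a human
    return Route.EXECUTE if action.confidence >= threshold else Route.HUMAN_APPROVAL


def passes_regression_gate(passed: int, total: int, required_percent: int = 95) -> bool:
    """Release gate: pass rate must reach required_percent. Integer math avoids float edge cases."""
    if total <= 0:
        return False
    return passed * 100 >= required_percent * total


# With 120 historical shipments, 114 passes (95%) clear the bar and 113 do not.
print(passes_regression_gate(114, 120), passes_regression_gate(113, 120))  # True False
```

Routing malformed scores to a human, rather than rejecting the action outright, keeps the ops team in the loop without silently dropping work.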

Weeks 11 and 12 were a soft rollout. Two ops coordinators worked alongside the agent on real jobs while the other three kept doing things the old way. By the end of week 12, the agent-paired coordinators were handling 2.3 times as many shipments per day as the control group, with measurably fewer errors. That result triggered the company-wide rollout.

What Broke in Production (And What We Fixed)

No build of this size ships clean. Three incidents stood out in the first months of full rollout, and the way the team handled them is the real story behind why a custom AI stack is different from a SaaS purchase.

Break one: in week 14, the agent started confidently confirming customs clearance on shipments that had not actually cleared. The root cause was a Cargowise webhook delay of up to four hours during peak periods, which meant the agent was reading stale data and treating it as live. The fix was a recency check baked into the canonical store: any record older than 30 minutes had to be re-fetched before the agent could act on it. Caught in 48 hours, fixed in 72.
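
The recency check amounts to refusing to act on any cached record older than a maximum age. A minimal Python sketch, assuming a hypothetical `StoredRecord` type and a `refetch` callable standing in for a direct read from Cargowise (no real Cargowise API is shown):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_AGE = timedelta(minutes=30)  # threshold described in the incident write-up


@dataclass
class StoredRecord:
    job_number: str
    customs_cleared: bool
    fetched_at: datetime  # when this copy was last read from the system of record


def fresh_record(record: StoredRecord, now: datetime,
                 refetch: Callable[[str], StoredRecord],
                 max_age: timedelta = MAX_AGE) -> StoredRecord:
    """Return a record safe to act on: re-read it if the stored copy is older than max_age."""
    if now - record.fetched_at > max_age:
        return refetch(record.job_number)
    return record


def can_confirm_clearance(record: StoredRecord, now: datetime,
                          refetch: Callable[[str], StoredRecord]) -> bool:
    """Only confirm clearance from data that has passed the recency check."""
    return fresh_record(record, now, refetch).customs_cleared


# Demo: a 4-hour-old copy says "cleared", but a fresh read says otherwise.
NOW = datetime(2026, 4, 20, 10, 0, tzinfo=timezone.utc)
refetched: list[str] = []


def demo_refetch(job_number: str) -> StoredRecord:
    refetched.append(job_number)  # hypothetical stand-in for a direct Cargowise read
    return StoredRecord(job_number, customs_cleared=False, fetched_at=NOW)


stale = StoredRecord("71244", customs_cleared=True, fetched_at=NOW - timedelta(hours=4))
print(can_confirm_clearance(stale, NOW, demo_refetch))  # False
```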

Break two: in week 17, customer satisfaction on agent-drafted emails dropped sharply for one large customer. The cause was tone. The agent was writing in HornTech's neutral business voice, but this customer expected the slightly chattier, first-name, "g'day mate" register that one coordinator had been using for years. The fix was a per-customer voice profile in the prompt, populated from the last 50 emails between that customer and the firm. Caught in 5 days, fixed the same day.
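
The article says only that each voice profile was populated from the last 50 emails with a customer; it does not describe how. One plausible approach, sketched below in Python, is to quote a few of the firm's own most recent replies as style examples in the prompt. The `Email` type, the three-example default, the customer name and the prompt wording are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    sent_at: datetime
    from_firm: bool  # True if written by the firm's staff, False if written by the customer
    body: str


def build_voice_profile(customer: str, emails: list[Email],
                        limit: int = 50, examples: int = 3) -> str:
    """Build a prompt section describing how the firm usually writes to one customer.

    Looks only at the `limit` most recent emails in the relationship, then quotes the
    newest few replies written by the firm as style examples (a few-shot approach).
    """
    recent = sorted(emails, key=lambda e: e.sent_at, reverse=True)[:limit]
    firm_replies = [e for e in recent if e.from_firm][:examples]
    lines = [
        f"When writing to {customer}, match the register of these recent emails from our team:",
        "mirror the greeting, level of formality and sign-off, but never reuse facts from them.",
    ]
    for e in firm_replies:
        lines.append(f"<example>\n{e.body.strip()}\n</example>")
    return "\n".join(lines)


# Demo with 60 synthetic emails, alternating firm/customer, one per hour.
history = [Email(datetime(2026, 1, 1 + i // 24, i % 24), i % 2 == 0, f"msg {i}") for i in range(60)]
profile = build_voice_profile("Example Freight Co", history)
print(profile)
```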

Break three was the one Marcus credits with making him a believer. In week 22, the agent flagged for human review a shipment that no human had thought to question. It had spotted a tiny mismatch between the declared HS code and the commercial invoice description, the kind of thing that, had it reached customs, would have triggered a four-week investigation and a five-figure fine. The team confirmed the discrepancy, contacted the consignor, fixed the paperwork, and the shipment cleared on schedule. That is the moment the agent paid for itself.
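
The article does not describe how the agent reasoned its way to the flag, so the Python sketch below uses a different, deliberately simple technique: a deterministic keyword cross-check that flags a shipment when the invoice description points to HS chapters that exclude the declared one. The chapter numbers are real Harmonized System chapters (30 pharmaceutical products, 84 machinery and mechanical appliances, 85 electrical machinery, 90 optical, measuring and medical instruments), but the keyword lists and the example are hypothetical and nowhere near complete. A check like this would complement a model's review, not replace it.

```python
# Illustrative map from invoice keywords to Harmonized System chapters.
# The chapters are real; the keyword lists are hypothetical and far from complete.
CHAPTER_KEYWORDS = {
    "30": ("pharmaceutical", "medicament", "vaccine"),       # pharmaceutical products
    "84": ("pump", "compressor", "machine"),                 # machinery, mechanical appliances
    "85": ("electrical", "battery", "transformer"),          # electrical machinery and equipment
    "90": ("medical", "surgical", "diagnostic", "optical"),  # optical, measuring, medical instruments
}


def expected_chapters(description: str) -> set[str]:
    """Chapters whose keywords appear (as substrings) in the invoice description."""
    text = description.lower()
    return {ch for ch, words in CHAPTER_KEYWORDS.items() if any(w in text for w in words)}


def needs_hs_review(declared_hs_code: str, invoice_description: str) -> bool:
    """Flag when the description matches some chapters but not the declared code's chapter."""
    declared_chapter = declared_hs_code.replace(".", "")[:2]
    expected = expected_chapters(invoice_description)
    return bool(expected) and declared_chapter not in expected


# Hypothetical example: a surgical instrument declared under a chapter 84 code.
print(needs_hs_review("8471.30", "Surgical stapler, sterile, single use"))  # True
```

An empty keyword match returns False, so the heuristic stays quiet when it has nothing to say and leaves the call to the agent or a human.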

"For the first six weeks I kept waiting for the catch. The catch was that we built the thing right and now we cannot imagine running the company without it. The hardest part was committing to a custom build instead of taking the easy SaaS option that would not have actually solved our problem."


The Numbers, 90 Days In

Three months after full rollout, the picture was clear enough to publish internally. Two coordinators who had asked to leave because the workload was unsustainable were now actively recommending the firm to ex-colleagues. Of the five ops roles, two were redeployed to growth-side work (account management, business development) rather than made redundant, and one was eliminated through natural attrition; the remaining two ops staff now handled what previously took five people.

Quote turnaround dropped from a median of 36 to 48 hours to under 4 hours. The error rate on customs declarations fell from a baseline of 7.4% to under 1%. Revenue per ops headcount rose roughly 60%, even before counting the soft win of pitching new clients with same-day quotes. The agent's compute and API costs ran at around AUD $4,200 per month, well inside the contingency allocated in the build budget.

The total first-year cost picture, build plus 9 months of post-launch retainer plus compute, came to roughly AUD $268,500. Compared against the AUD $310,000+ alternative of hiring three more humans (and not solving the underlying error rate), the payback was complete inside 11 months on cost savings alone, with the revenue growth from faster quoting on top of that.
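
The payback claim can be checked with one division. A minimal Python sketch, assuming the avoided hiring cost accrues evenly across the year and using the AUD $310,000 lower bound of the alternative (which makes the estimate conservative); it ignores the revenue upside.

```python
def payback_months(first_year_cost: float, annual_cost_avoided: float) -> float:
    """Months until avoided cost covers the spend, assuming savings accrue evenly each month."""
    if annual_cost_avoided <= 0:
        raise ValueError("annual_cost_avoided must be positive")
    monthly_saving = annual_cost_avoided / 12
    return first_year_cost / monthly_saving


months = payback_months(268_500, 310_000)  # first-year total vs. lower bound of the hiring alternative
print(f"Payback in about {months:.1f} months")  # Payback in about 10.4 months
```

At roughly 10.4 months, the result sits inside the 11-month figure quoted above.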

For Sydney businesses weighing similar projects, the relevant lesson is not the headline number. It is that the AUD $180,000 figure included AUD $27,000 of evaluation infrastructure, the part most cheaper quotes leave out, and leaving it out is exactly what separates an AI project that ships from one that quietly dies six months in.


What This Means If You're a Sydney SME Considering Custom AI

Marcus's build worked because four conditions held. The team had a real, measurable problem (not a vague desire for "AI"). The workflow had enough proprietary logic that a generic tool would not solve it. There was internal capacity to support a 12-week project without going dark on the day job. And there was a development partner willing to invest the first two weeks in the warehouse instead of behind a laptop.

If those four conditions hold for your Sydney business, custom AI development pays back faster than most CFOs expect, especially against the alternative of hiring linearly into a process that does not scale. If they do not hold, the honest answer is that an off-the-shelf SaaS or a focused process redesign will probably get you further for less. The category of problem matters more than the appetite for the technology.

The build playbook covered above is the same one HornTech uses across our custom AI development engagements for Sydney service businesses. The patterns hold whether the workflow is freight forwarding, medical scheduling, accounting, or recruiting; what changes is the data plumbing and the per-customer logic. For teams already thinking about how AI search visibility plays into the same picture, our companion piece on AI Search Optimisation pricing in Sydney 2026 walks through the discovery side of the same problem.

Anthropic's own customer case library documents the same pattern across dozens of industries: custom builds against Claude consistently outperform generic SaaS in workflows where the institutional knowledge sits outside any vendor's training data.

Frequently Asked Questions

How much does a custom AI build cost in Sydney in 2026?

A focused production build for a single workflow lands between AUD $80,000 and AUD $250,000, with most SME projects clustering in the AUD $120,000 to AUD $200,000 range. Anything below AUD $60,000 is typically a thin wrapper over an off-the-shelf API and will struggle to handle workflow-specific edge cases. The AUD $180,000 build described above is roughly the median for a 12-week, single-workflow Sydney engagement.

How long does a custom AI development project take?

A scoped, single-workflow build runs 8 to 14 weeks from kickoff to soft rollout, plus a 60 to 90 day stabilisation period. Multi-workflow or cross-system projects can run 4 to 6 months. Anything quoted under 6 weeks for a real production agent is almost certainly skipping evaluation infrastructure, which is the part you cannot skip.

Should I hire a custom AI development agency or build in-house?

Build in-house if you already have one or more senior engineers with production LLM experience and the bandwidth to dedicate 60% of their time for 3 months. Otherwise, an agency engagement is faster, cheaper in total cost, and meaningfully de-risked because the team has already made (and survived) the mistakes you would otherwise make. Most Sydney SMEs do not have the in-house option realistically available.

What does an AI consultant in Sydney actually do day-to-day?

The honest answer is that a competent AI consultant spends roughly 40% of their time inside your business understanding the workflow, 30% on data plumbing and integration, 20% on agent design and evaluation, and 10% on the model and prompt work that gets all the marketing attention. Anyone who flips that ratio is selling you the easy part.

What is the biggest mistake Sydney businesses make with custom AI?

Underspending on evaluation. The cheapest builds skip the regression test set and the per-action confidence scoring, which makes the agent look great in a demo while it quietly accumulates errors in production. By month four, trust collapses and the project is shelved. Spending 15% to 20% of the build budget on evaluation infrastructure is what separates AI projects that survive from ones that die quietly.

Want to See If Custom AI Fits Your Business?

Not every Sydney SME needs a custom AI build. The honest answer for many businesses is that an off-the-shelf tool, a process redesign, or a smarter hire will get you further for less money. HornTech runs a free 45-minute discovery conversation specifically to figure out which category your situation falls into, with no pitch attached if a custom build is not the right fit.

Browse our custom AI development services page for the full engagement model, or book a discovery call if you want to talk through your specific workflow before deciding anything. We typically reply within one Sydney business day.
