Performance optimisation isn't a nice-to-have when your users are on 3G in Kampala or Nairobi. Here's the exact stack and techniques we use to build sub-2s load times across East Africa.
Frank Tamale
Founder & Lead Engineer

Most performance advice is written for a developer in San Francisco on a MacBook connected to gigabit fibre, testing against a CDN node that's 10ms away. That's not your user in Kampala. Their phone is a mid-range Android from 2021, their connection is fluctuating between 3G and 4G, and the nearest CDN node might be 80ms away on a good day.
We've shipped over 30 web products in East African markets. Here's what we've learned — often the hard way — about building web apps that feel fast for those users.
The standard Next.js performance tutorial will tell you to use next/image, enable ISR, and deploy to Vercel. That's all correct and you should do it. But it's the floor, not the ceiling. When we profiled a client's site against real users in Nairobi, we found that following all the standard recommendations still left us with a 4.8s LCP on a median device. We needed to get it under 2s.
LCP target: Aim for under 2.5s on a simulated mid-range Android (Moto G4 class) throttled to "Fast 3G" in Chrome DevTools. That's the baseline test environment we use for every East African deployment.
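That baseline can also be enforced automatically with Lighthouse CI, which fails a build when a metric regresses past a budget. A sketch assuming `@lhci/cli`; the URL, throttling numbers, and thresholds below are illustrative placeholders, not our production values:

```js
// lighthouserc.js — example budget config for @lhci/cli (placeholders throughout)
module.exports = {
  ci: {
    collect: {
      url: ["https://yoursite.com/"],
      numberOfRuns: 3, // median of 3 runs smooths out variance
      settings: {
        formFactor: "mobile",
        // Roughly approximates the DevTools "Fast 3G" preset
        throttling: { rttMs: 562, throughputKbps: 1600, cpuSlowdownMultiplier: 4 },
      },
    },
    assert: {
      assertions: {
        // Fail the build if simulated LCP or CLS regresses past the budget
        "largest-contentful-paint": ["error", { maxNumericValue: 2500 }],
        "cumulative-layout-shift": ["error", { maxNumericValue: 0.1 }],
      },
    },
  },
};
```

Run it with `npx lhci autorun` in CI; the assert step exits non-zero when a budget is blown.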
Before touching a single line of code, instrument your actual user population. Google Analytics 4's Core Web Vitals report and Chrome UX Report (CrUX) will tell you the 75th percentile LCP, CLS, and INP for your real traffic — not your local machine. The numbers are almost always worse than you expect.
```bash
# Pull CrUX data for your domain using the PageSpeed Insights API
curl "https://www.googleapis.com/pagespeedonline/v5/runPagespeed\
?url=https://yoursite.com\
&strategy=mobile\
&key=YOUR_API_KEY" | jq '.loadingExperience.metrics'
```

`next/image` handles WebP conversion and lazy loading. That's table stakes. What it won't do for you is enforce disciplined `sizes` attributes — and without them, Next.js will serve a 1200px image to a 390px phone screen.
```jsx
// ❌ This serves a massive image to mobile users
<Image src={hero} alt="Hero" fill />

// ✅ Tells the browser exactly what size to request at each breakpoint
<Image
  src={hero}
  alt="Hero"
  fill
  sizes="(max-width: 640px) 100vw, (max-width: 1024px) 50vw, 800px"
  priority // only for above-the-fold images
/>
```

Beyond `sizes`, we pre-generate AVIF variants for hero images. AVIF averages 40–50% smaller than WebP for photographic content. The Next.js config is one line:
```js
// next.config.js
module.exports = {
  images: {
    formats: ["image/avif", "image/webp"], // AVIF first, WebP fallback
    deviceSizes: [390, 640, 828, 1080, 1200], // match real device widths
    minimumCacheTTL: 60 * 60 * 24 * 30, // 30-day CDN cache
  },
};
```

JavaScript parse time on a mid-range Android is 3–5× slower than on a modern laptop. Every kilobyte of JS you send costs more than you think. Our rules:
- Run `@next/bundle-analyzer` on every major dependency addition
- Cherry-pick imports (`import { format } from 'date-fns'`, not `import * as dateFns`)
- Use `next/dynamic` with `ssr: false` for any component that's below the fold or interaction-triggered
- Replace `moment.js` with `date-fns` (saves ~65KB gzipped)
- Replace `lodash` with native equivalents or cherry-picked imports

```jsx
// Lazy-load heavy components — they don't block the initial render
import dynamic from "next/dynamic";

const HeavyChart = dynamic(() => import("@/components/HeavyChart"), {
  ssr: false,
  loading: () => <div className="h-64 animate-pulse bg-surface-hi rounded-xl" />,
});
```

Vercel's Edge Network has nodes in Johannesburg and a growing presence in West Africa. For East African users, this means your ISR-cached pages can be served from a node that's 30–40ms away rather than from Frankfurt or US-East. The configuration for aggressive edge caching looks like this:
```tsx
// app/products/[slug]/page.tsx
export const revalidate = 3600; // ISR: revalidate at most every hour

export async function generateStaticParams() {
  const products = await getTopProducts(200); // pre-build the top 200
  return products.map((p) => ({ slug: p.slug }));
}
```

On Vercel, `revalidate` is what drives the edge cache: the node keeps serving the cached page and rebuilds it in the background once the hour is up, giving you `s-maxage=3600, stale-while-revalidate` semantics without any extra configuration. A word of caution: returning `Cache-Control` from `generateMetadata` only emits a `<meta>` tag, not an HTTP response header, so don't rely on it for edge caching.

Custom fonts are a common source of CLS. The right pattern with Next.js is to use `next/font` with `display: 'swap'` and `preload: true`, and to declare font variables on `:root` rather than injecting them into `body` directly.
```tsx
// app/layout.tsx
import { Syne, DM_Sans } from "next/font/google";

const syne = Syne({
  subsets: ["latin"],
  variable: "--font-display",
  display: "swap",
  preload: true,
  weight: ["400", "700", "800"],
});

const dmSans = DM_Sans({
  subsets: ["latin"],
  variable: "--font-sans",
  display: "swap",
  weight: ["300", "400", "500"],
});

export default function RootLayout({ children }) {
  return (
    <html className={`${syne.variable} ${dmSans.variable}`}>
      <body>{children}</body>
    </html>
  );
}
```

Synthetic testing (Lighthouse, PageSpeed Insights) is useful for catching regressions in CI. But the number that matters is your real-user p75 LCP from CrUX. Set up an alert with the CrUX API that fires when it crosses your threshold; we run ours in a weekly GitHub Action.
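A minimal sketch of that check, assuming the CrUX API at `chromeuxreport.googleapis.com`; the origin, API key, and 2.5s budget below are placeholders:

```js
// crux-check.js — fails CI when real-user p75 LCP exceeds the budget.
// Run as: CRUX_API_KEY=... node crux-check.js
const BUDGET_MS = 2500;

// Pure helper so the pass/fail logic is testable on its own
function exceedsBudget(p75Ms, budgetMs = BUDGET_MS) {
  return p75Ms > budgetMs;
}

// Query the CrUX API for an origin's p75 LCP (phone traffic only)
async function fetchP75Lcp(origin, apiKey) {
  const res = await fetch(
    `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ origin, formFactor: "PHONE" }),
    }
  );
  if (!res.ok) throw new Error(`CrUX API error: ${res.status}`);
  const data = await res.json();
  return Number(data.record.metrics.largest_contentful_paint.percentiles.p75);
}

async function main() {
  const p75 = await fetchP75Lcp("https://yoursite.com", process.env.CRUX_API_KEY);
  if (exceedsBudget(p75)) {
    console.error(`p75 LCP ${p75}ms exceeds the ${BUDGET_MS}ms budget`);
    process.exit(1); // non-zero exit fails the GitHub Actions job
  }
  console.log(`p75 LCP ${p75}ms is within budget`);
}

// In CI, invoke it: main().catch((e) => { console.error(e); process.exit(1); });
```

Gate the workflow on the script's exit code; the only credential needed is a CrUX API key from the Google Cloud console.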
Tip: The Core Web Vitals report in Google Search Console, which is powered by CrUX, is the easiest way to monitor your real-user metrics without any code changes. Check it before and after any significant deployment.
Our pre-ship checklist:

- `sizes` on every `next/image` component
- AVIF and WebP enabled in `next.config.js` image formats
- `@next/bundle-analyzer` run — target <150KB first-load JS
- Below-the-fold and interaction-triggered components behind `next/dynamic`
- `next/font` with `display: swap` for all custom fonts

None of these steps are individually heroic. It's the combination that gets you from a 4.8s LCP to a 1.4s LCP. Build the discipline into your process from the start and you won't be retrofitting it under deadline pressure.