# FreightSense

AI-powered shipment delay intervention engine.
*Deterministic risk scoring · LLM reasoning · Human-override audit*
## What is FreightSense?
FreightSense is a two-layer AI decision system that helps operations teams decide exactly what to do when a shipment is at risk of arriving late. Instead of manually reviewing every delayed order, it:
- Scores the risk deterministically using historical benchmarks (delay magnitude, financial exposure, category/market late-delivery rates).
- Reasons with an LLM (LLaMA 3.3 70B via Groq) to classify the recommended intervention and estimate cost savings.
- Audits every decision — including human overrides — in a persistent log, so nothing is a black box.
The result: a one-click decision dashboard where analysts see the risk, the AI's recommendation, the reasoning, and any past overrides — all in one place.
## Architecture
```
┌─────────────────────────────────────────┐
│               Browser UI                │
│     (Single-page, vanilla JS + CSS)     │
└──────────────────┬──────────────────────┘
                   │ REST / JSON
┌──────────────────▼──────────────────────┐
│           FastAPI (main.py)             │
│   POST /api/evaluate                    │
│   POST /api/evaluate/{id}/override      │
│   GET  /api/audit                       │
│   GET  /api/audit/{id}/overrides        │
│   GET  /api/meta                        │
└───────────┬─────────────────┬───────────┘
            │                 │
┌───────────▼──────────────┐ ┌▼─────────────────────────┐
│ Layer 1 — Deterministic  │ │ Layer 2 — LLM            │
│ deterministic.py         │ │ llm_evaluator.py         │
│                          │ │                          │
│ • Delay days             │▶│ • Groq llama-3.3-70b     │
│ • Financial exposure     │ │ • Structured JSON prompt │
│ • Risk score (0–100)     │ │ • Confidence score       │
│ • Guardrail flags        │ │ • Cost-saving estimate   │
│ • Benchmark lookup       │ │ • Free-text reasoning    │
└───────────┬──────────────┘ └──────────────────────────┘
            │
┌───────────▼──────────────┐
│   aiosqlite Database     │
│  evaluations + overrides │
└──────────────────────────┘
```

## Decision matrix
| Risk Score | Recommendation | Trigger condition |
|---|---|---|
| ≥ 75 | EXPEDITE | High delay + high exposure |
| 50–74 | DISCOUNT | Moderate delay, retention risk |
| 25–49 | MONITOR | Low delay, watch required |
| < 25 | NO_ACTION | Within acceptable variance |
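The matrix maps directly onto a small threshold function. A minimal sketch (the authoritative thresholds live in `deterministic.py`):

```python
def layer1_recommendation(risk_score: float) -> str:
    """Map a deterministic risk score (0-100) to an intervention,
    following the decision matrix above."""
    if risk_score >= 75:
        return "EXPEDITE"
    if risk_score >= 50:
        return "DISCOUNT"
    if risk_score >= 25:
        return "MONITOR"
    return "NO_ACTION"

print(layer1_recommendation(71.3))  # → DISCOUNT
```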
When Layer 1 and the LLM disagree, the UI surfaces a disagreement badge and shows both recommendations side-by-side so the human can make the final call.
## Quick Start
### Prerequisites
- Python 3.12+
- uv (recommended) or pip
- A Groq API key
### 1 — Clone & install
```bash
git clone https://github.com/your-org/freightsense.git
cd freightsense
uv sync   # creates .venv and installs all dependencies
```

### 2 — Configure
```bash
cp .env.example .env   # then edit .env
```

```ini
GROQ_API_KEY=gsk_...
DATABASE_URL=./freightsense.db   # SQLite path (or /tmp/freightsense.db in Cloud Run)
GROQ_MODEL=llama-3.3-70b-versatile
```

### 3 — Run
```bash
uv run uvicorn main:app --reload
# → http://localhost:8000
```

Or open the interactive notebook UI:
```bash
uv run marimo edit main.py
```

The API docs are at http://localhost:8000/docs and the operational dashboard at http://localhost:8000.
## API Reference
### POST /api/evaluate
Submit a shipment for risk scoring and LLM intervention recommendation.
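As a quick illustration, the endpoint can be called with nothing but the Python standard library (a sketch — the URL assumes a local dev server, and the commented-out lines are what actually send the request):

```python
import json
from urllib import request

API_URL = "http://localhost:8000/api/evaluate"  # assumes a local `uv run uvicorn` instance

payload = {
    "customer_segment": "Corporate",
    "market": "USCA",
    "category_name": "Electronics",
    "shipping_mode": "Standard Class",
    "days_scheduled": 5,
    "days_actual_estimate": 9,
    "order_item_total": 1200.00,
    "profit_ratio": 0.18,
}

req = request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send against a running server:
# with request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["risk_score"], result["llm_recommendation"])
```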
```jsonc
// Request
{
  "order_id": "ORD-00123",           // optional — auto-generated if blank
  "customer_segment": "Corporate",   // Consumer | Corporate | Home Office
  "market": "USCA",                  // USCA | Europe | LATAM | Pacific Asia | Africa
  "category_name": "Electronics",
  "shipping_mode": "Standard Class",
  "days_scheduled": 5,
  "days_actual_estimate": 9,
  "order_item_total": 1200.00,
  "profit_ratio": 0.18
}
```

```jsonc
// Response — 201 Created
{
  "evaluation_id": 42,
  "delay_days": 4.0,
  "risk_score": 71.3,
  "financial_exposure": 249.48,
  "confidence_tier": "HIGH",
  "layer1_recommendation": "DISCOUNT",
  "layer1_intervention_cost": 120.0,
  "llm_recommendation": "DISCOUNT",
  "confidence_score": 0.87,
  "reasoning": "A 4-day delay on a Corporate Electronics order in USCA …",
  "estimated_cost_saving": 180.0,
  "guardrail_flags": [],
  "llm_available": true,
  "layers_disagree": false
}
```

### POST /api/evaluate/{id}/override
Record a human override decision against any evaluation (supports multiple revisions).
```jsonc
{
  "override_decision": "ACCEPT",   // ACCEPT | REJECT | CUSTOM
  "override_reason": "Customer is a strategic account — expedite instead.",
  "outcome_notes": ""
}
```

### GET /api/audit
Paginated audit log of all evaluations with their latest override status.
```
GET /api/audit?skip=0&limit=50
```

### GET /api/audit/{id}/overrides
Full override history for a single evaluation (shows every revision in order).
### GET /api/meta
Returns available categories and markets for populating UI dropdowns.
## Benchmarks
FreightSense ships with pre-computed benchmark statistics derived from ~180 k real supply-chain records (`data/DataCoSupplyChainDataset.csv`). These are compiled once into `data/benchmarks.json` and loaded at startup — no database query at inference time.
```bash
# Regenerate benchmarks after updating the source CSV
uv run python scripts/build_benchmarks.py
```

Each benchmark group (category × market) stores:
- Average scheduled days
- Average delay days
- Late delivery rate
- Average profit ratio
- Sample size
## Deployment — Google Cloud Run
FreightSense is designed for Cloud Run with a single warm instance (`minScale=1`) so the SQLite audit log stays hot between requests.
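In a Knative/Cloud Run service spec, that warm instance is pinned with the `minScale` autoscaling annotation. A sketch (the repo's `service.yaml` has the actual values):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: freightsense
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"  # keep one instance warm for the SQLite log
```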
### One-time GCP setup
```bash
# Set your GitHub details and run
GITHUB_ORG=your-org GITHUB_REPO=freightsense bash scripts/setup_gcp.sh
```

This script:
- Enables all required GCP APIs
- Creates an Artifact Registry Docker repo
- Creates a Service Account for GitHub Actions
- Configures Workload Identity Federation (no long-lived keys)
- Stores `GROQ_API_KEY` in Secret Manager
- Prints the three GitHub Variables to set
### GitHub Variables to add
After running the setup script, add these under Settings → Secrets and variables → Actions → Variables in your repo:
| Variable | Value |
|---|---|
| `GCP_PROJECT_ID` | your GCP project ID |
| `GCP_WORKLOAD_IDENTITY_PROVIDER` | printed by setup script |
| `GCP_SERVICE_ACCOUNT` | printed by setup script |
### Continuous deployment
Every push to main automatically:
```
push to main
     │
     ▼
Checkout → Auth (Workload Identity) → Build image → Push to Artifact Registry
     │
     ▼
gcloud run services replace service.yaml   (tagged with $GITHUB_SHA)
     │
     ▼
Cloud Run deploys new revision, old revision drains
```

Manual deploys are also available via the Actions → Run workflow button in the GitHub UI.
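The manual deploy button comes from GitHub Actions' `workflow_dispatch` trigger; the workflow's trigger block likely resembles this sketch (the real `deploy.yml` in the repo is authoritative):

```yaml
# .github/workflows/deploy.yml — trigger section (illustrative)
on:
  push:
    branches: [main]    # every push to main deploys automatically
  workflow_dispatch:    # adds the "Run workflow" button in the Actions tab
```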
## Project Structure
```
freightsense/
├── main.py                    # FastAPI app entry point
├── service.yaml               # Cloud Run service spec
├── Dockerfile
├── pyproject.toml
│
├── app/
│   ├── api/
│   │   ├── routes.py          # All 5 endpoints
│   │   └── schemas.py         # Pydantic I/O models
│   ├── core/
│   │   ├── benchmarks.py      # Benchmark store (in-memory dict)
│   │   ├── deterministic.py   # Layer 1 — risk scoring engine
│   │   ├── llm_evaluator.py   # Layer 2 — Groq LLM integration
│   │   └── config.py          # Settings (pydantic-settings)
│   ├── db/
│   │   ├── database.py        # aiosqlite init + connection
│   │   └── models.py          # Async CRUD helpers
│   └── static/
│       ├── index.html         # Single-page operations dashboard
│       └── styles.css
│
├── data/
│   ├── benchmarks.json        # Pre-computed benchmark stats (~180 k records)
│   └── DataCoSupplyChainDataset.csv
│
├── scripts/
│   ├── build_benchmarks.py    # Regenerate benchmarks.json
│   ├── setup_gcp.sh           # One-time GCP resource bootstrap
│   └── test_groq.py           # Smoke-test Groq connectivity
│
└── .github/
    └── workflows/
        └── deploy.yml         # GitHub Actions CI/CD
```

## Tech Stack
| Layer | Technology |
|---|---|
| API framework | FastAPI 0.115 |
| LLM inference | Groq API — LLaMA 3.3 70B Versatile |
| Async database | aiosqlite 0.22 + raw SQL |
| Data processing | pandas 2.2 |
| Config | pydantic-settings |
| Runtime | Python 3.12, uvicorn |
| Container | Docker (python:3.12-slim, non-root) |
| CI/CD | GitHub Actions |
| Cloud | Google Cloud Run + Artifact Registry + Secret Manager |