Email Finder Service - Detailed Implementation Plan

⚡ Quick Start (TL;DR)

For experienced engineers, here's the condensed implementation path:

Phase 0: Create staging branch → Configure branch protection → Create CONTRIBUTING.md
Phase 1: Create 8 new KV namespaces + 1 Hyperdrive → Update wrangler.toml → Set secrets
Phase 2: Create 3 GitHub workflows → Set secrets/variables → Enable status checks
Phase 3: pnpm add @sentry/cloudflare → Wrap index.ts → Update error handler
Phase 4: Add [observability] to wrangler.toml → Configure Axiom export in CF Dashboard
Phase 5: Create 3 Axiom dashboards → Configure 4 monitors
Phase 6: Create 5 Checkly checks (health, find×2, validate×2)
Phase 7: Create 3 K6 scripts → Add to CI workflow
Phase 8: Install CodeRabbit app → Create .coderabbit.yaml
Phase 9: Add coverage thresholds → Create test factories

⚠️ Critical Path

Phases 0, 1, and 2 must run in that order (Phase 0 → Phase 1 → Phase 2), and together they block all other work.

📋 Prerequisites

Required Access

Resource	Required Permission	Who Can Grant
GitHub Repository	Admin	Repository owner
Cloudflare Account	Edit Workers, KV, Hyperdrive	Account owner
Sentry Organization	Admin (to create project)	Sentry admin
Axiom Organization	Admin (to create datasets)	Axiom admin
Checkly Account	Admin	Account owner
Slack Workspace	Create webhook	Workspace admin

Required Tools (Local Machine)

# Verify installations
node --version    # >= 20.0.0
pnpm --version    # >= 9.0.0
wrangler --version # >= 4.0.0
git --version     # >= 2.40.0

# Optional but recommended
k6 --version      # For local load testing
jq --version      # For JSON processing

Pre-Implementation Checklist

✓ Complete before starting

All team members have GitHub repository access
Cloudflare API token created with Workers permissions
Slack channel created for notifications (#email-finder-alerts)
Slack incoming webhook URL created (Channel Settings → Integrations → Incoming Webhooks)
Budget approved for external services (Sentry, Checkly)
Team notified of upcoming workflow changes
Current production deployment is stable
Verify test:e2e script exists in package.json (or create it - see below)

test:e2e Script Setup

If the test:e2e script doesn't exist in package.json, add it:

{
  "scripts": {
    "test:e2e": "vitest run --config vitest.e2e.config.ts"
  }
}

Create vitest.e2e.config.ts if it doesn't exist:

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    include: ['tests/e2e/**/*.test.ts'],
    testTimeout: 30000, // 30s timeout for E2E tests
    hookTimeout: 30000,
  },
});

🔍 Current State Analysis

Current Stack

Component	Technology
Runtime	Cloudflare Workers with Hono v4.7
Language	TypeScript with Valibot validation
Database	Neon PostgreSQL via Hyperdrive
Caching	Cloudflare KV (4 namespaces)
Linting	Biome v2.0.0
Testing	Vitest v4.0 with unit and E2E tests
Logging	Axiom (basic integration)

Gaps Identified

⚠️ Critical Issue

Staging and production share the same KV namespaces and Hyperdrive configuration, so the two environments are not isolated: anything staging writes to the cache or the database lands in production data.

No documented Git flow or branch protection rules
No CI/CD pipeline (GitHub Actions)
No error tracking (Sentry)
No distributed tracing (OpenTelemetry)
No synthetic monitoring (Checkly)
No load testing (K6)
No AI code review (CodeRabbit)

🛠️ Tool Stack Summary

Category	Tool	Purpose
CI/CD	GitHub Actions	Automated testing and deployment
Code Review	CodeRabbit	AI-powered PR reviews
Error Tracking	Sentry	Exception monitoring, performance insights
Tracing	Cloudflare OTEL	Distributed tracing with automatic instrumentation
Logging/Dashboards	Axiom	Events, monitors, dashboards
Synthetic Monitoring	Checkly	API health checks, multi-region testing
Load Testing	K6	Performance and stress testing
Linting	Biome	Code quality (already configured)
Unit/E2E Testing	Vitest	Automated test suite
Notifications	Slack	Team alerts and deployment notifications

Phase 0: Git Flow & Best Practices

Time: Day 1 Morning (~2-3 hours)

Git Flow Overview

┌─────────────────────────────────────────────────────────────────┐
│                        PRODUCTION                                │
│                           main                                   │
│                            ▲                                     │
│                            │ PR (requires approval + CI)         │
│                            │                                     │
│                        STAGING                                   │
│                         staging                                  │
│                            ▲                                     │
│                            │ PR (requires CI)*                   │
│                            │                                     │
│                       DEVELOPMENT                                │
│              feature/* | fix/* | chore/*                        │
└─────────────────────────────────────────────────────────────────┘

* CI status check requirement is enabled in Phase 2 after workflows are created.

Branch Naming Conventions

Type	Pattern	Example
Production	`main`	main
Staging	`staging`	staging
Feature	`feature/<ticket>-<desc>`	feature/EF-123-add-validation
Bug Fix	`fix/<ticket>-<desc>`	fix/EF-456-null-pointer
Chore	`chore/<desc>`	chore/update-dependencies
Hotfix	`hotfix/<desc>`	hotfix/critical-auth-fix
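
The naming table above can be enforced mechanically, for example in a pre-push hook or a CI step. The following is a minimal sketch, not part of the existing repository: the function name `isValidBranchName` is hypothetical, and the ticket format (uppercase letters, a dash, digits, as in `EF-123`) is an assumption inferred from the examples.

```typescript
// Patterns mirror the branch naming table. The ticket format (e.g. EF-123)
// and the lowercase kebab-case description are assumptions based on the examples.
const BRANCH_PATTERNS: RegExp[] = [
  /^main$/,
  /^staging$/,
  /^(feature|fix)\/[A-Z]+-\d+-[a-z0-9]+(-[a-z0-9]+)*$/,
  /^(chore|hotfix)\/[a-z0-9]+(-[a-z0-9]+)*$/,
];

// Returns true when the branch name matches one of the allowed patterns.
function isValidBranchName(branch: string): boolean {
  return BRANCH_PATTERNS.some((pattern) => pattern.test(branch));
}
```

With these patterns, `feature/EF-123-add-validation` is accepted, while `feature/add-validation` is rejected because it has no ticket prefix.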

Mandatory Rules

⚠️ No Exceptions

1. Never push directly to main
2. Never push directly to staging
3. All code must pass through staging before production
4. All PRs require CI checks to pass*
5. PRs to main require approval

Branch Protection Configuration

Configure in GitHub Repository Settings → Branches:

Setting (main branch)	Value
Require pull request before merging	✅ Enabled
Require approvals	✅ 1 approval required
Dismiss stale approvals	✅ Enabled
Require status checks	❌ Disabled (enable in Phase 2)
Require up to date branches	❌ Disabled (sub-option of status checks; enable in Phase 2)
Allow force pushes	❌ Disabled

Commit Message Convention

Follow Conventional Commits format: <type>(<scope>): <description>

Type	Description	Example
`feat`	New feature	feat(validation): add bypass_cache parameter
`fix`	Bug fix	fix(smtp): handle timeout errors gracefully
`docs`	Documentation only	docs(readme): update deployment instructions
`refactor`	Code change (no bug fix or feature)	refactor(cache): simplify key generation
`test`	Adding or updating tests	test(find): add edge case coverage
`chore`	Maintenance tasks	chore(deps): update @sentry/cloudflare
`ci`	CI/CD changes	ci: add K6 smoke test to staging
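
The `<type>(<scope>): <description>` format above can be checked with a single regular expression. This is a minimal sketch under stated assumptions: the function name `isConventionalCommit` is hypothetical, only the seven types from the table are allowed, and the scope is treated as optional because the `ci` example has none.

```typescript
// The commit types listed in the table above.
const COMMIT_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore", "ci"] as const;

// <type>(<scope>): <description>, where "(<scope>)" is optional.
// The characters allowed inside a scope are an assumption.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([a-z0-9@/-]+\\))?: \\S.*$`
);

// Validates the subject line (first line) of a commit message.
function isConventionalCommit(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}
```

Only the subject line is checked here; a hook would typically pass the first line of the commit message.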

Verification Checklist

🛑 STOP: Complete before Phase 1

CONTRIBUTING.md created with Git flow documentation
Branch protection configured for main (WITHOUT status checks)
Branch protection configured for staging (WITHOUT status checks)
staging branch created from main
Test: Direct push to main is blocked
Test: Direct push to staging is blocked
Test: PR to main without approval is blocked
Team notified of new workflow rules

Phase 1: Infrastructure Separation (CRITICAL)

Time: Day 1 Afternoon (~3-4 hours)

⚠️ Critical Pre-requisite

This phase MUST be completed before CI/CD. Staging and production currently share the same KV namespaces and Hyperdrive, causing data isolation issues.

Create Staging KV Namespaces

# Create staging-specific KV namespaces
wrangler kv namespace create "PATTERN_CACHE_STAGING"
wrangler kv namespace create "DOMAIN_CACHE_STAGING"
wrangler kv namespace create "RESULT_CACHE_STAGING"
wrangler kv namespace create "NEGATIVE_CACHE_STAGING"

Create Production KV Namespaces

# Create production-specific KV namespaces
wrangler kv namespace create "PATTERN_CACHE_PRODUCTION"
wrangler kv namespace create "DOMAIN_CACHE_PRODUCTION"
wrangler kv namespace create "RESULT_CACHE_PRODUCTION"
wrangler kv namespace create "NEGATIVE_CACHE_PRODUCTION"

Create Staging Hyperdrive

wrangler hyperdrive create email-finder-staging \
  --connection-string="postgresql://..."

Update wrangler.toml

# Staging Environment
[env.staging]
name = "email-finder-service-staging"
vars = { ENVIRONMENT = "staging" }

[[env.staging.kv_namespaces]]
binding = "PATTERN_CACHE"
id = "<NEW_STAGING_PATTERN_CACHE_ID>"

[[env.staging.kv_namespaces]]
binding = "DOMAIN_CACHE"
id = "<NEW_STAGING_DOMAIN_CACHE_ID>"

# ... repeat for all KV namespaces

[[env.staging.hyperdrive]]
binding = "HYPERDRIVE"
id = "<NEW_STAGING_HYPERDRIVE_ID>"

Set Staging Secrets

wrangler secret put API_KEYS --env staging
wrangler secret put AXIOM_API_TOKEN --env staging
wrangler secret put SOCKS5_PROXY_USER --env staging
wrangler secret put SOCKS5_PROXY_PASS --env staging
wrangler secret put OCHECKER_API_KEY --env staging
wrangler secret put NO2BOUNCE_API_KEY --env staging
wrangler secret put MILLIONVERIFIER_API_KEY --env staging
wrangler secret put BYTEMINE_API_TOKEN --env staging

# Note: SENTRY_DSN will be set in Phase 3 after Sentry is configured

Set Production Secrets

# Skip if production secrets already exist from current deployment
# Verify with: wrangler secret list --env production

wrangler secret put API_KEYS --env production
wrangler secret put AXIOM_API_TOKEN --env production
wrangler secret put SOCKS5_PROXY_USER --env production
wrangler secret put SOCKS5_PROXY_PASS --env production
wrangler secret put OCHECKER_API_KEY --env production
wrangler secret put NO2BOUNCE_API_KEY --env production
wrangler secret put MILLIONVERIFIER_API_KEY --env production
wrangler secret put BYTEMINE_API_TOKEN --env production

# Note: SENTRY_DSN will be set in Phase 3 after Sentry is configured

Note: If this is an existing production service, these secrets may already be configured. Run wrangler secret list --env production to verify. Only set secrets that are missing.

Verification Checklist

🛑 STOP: Complete before Phase 2

Staging KV namespaces created with unique IDs (4 namespaces)
Production KV namespaces created with unique IDs (4 namespaces)
Staging Hyperdrive created (separate from production)
wrangler.toml updated with staging-specific IDs
wrangler.toml updated with production-specific IDs
Staging secrets configured (8 secrets, excluding SENTRY_DSN)
Production secrets verified or configured (8 secrets, excluding SENTRY_DSN)
Test deploy to staging: wrangler deploy --env staging
Verify staging /health/ready returns 200
Test deploy to production: wrangler deploy --env production
Verify production /health/ready returns 200
Write test data to staging KV, verify it does NOT appear in production
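
The last checklist item verifies isolation at runtime. A static check on the configured IDs can complement it. The sketch below assumes the binding IDs have already been read out of the `[env.staging]` and `[env.production]` blocks into plain objects; the helper name `findSharedBindings` and the sample IDs in the usage note are hypothetical.

```typescript
// Binding name -> resource ID, as copied from one [env.*] block of wrangler.toml.
type BindingIds = Record<string, string>;

// Returns the binding names whose IDs are identical in both environments,
// i.e. the resources that are still shared and therefore not isolated.
function findSharedBindings(staging: BindingIds, production: BindingIds): string[] {
  return Object.keys(staging).filter(
    (binding) =>
      production[binding] !== undefined && production[binding] === staging[binding]
  );
}
```

For example, calling it with `{ PATTERN_CACHE: "aaa", HYPERDRIVE: "hd-1" }` and `{ PATTERN_CACHE: "bbb", HYPERDRIVE: "hd-1" }` reports `HYPERDRIVE` as still shared. An empty result means every binding present in both environments points at a different resource.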

Phase 2: CI/CD Pipeline with GitHub Actions

Time: Day 2 (~4-5 hours)

GitHub Repository Variables

Go to Repository Settings → Secrets and variables → Actions → Variables:

Variable	Value
`STAGING_API_URL`	`https://email-finder-service-staging.<subdomain>.workers.dev`
`PRODUCTION_API_URL`	`https://email-finder-service.<subdomain>.workers.dev`
`SENTRY_ENABLED`	`false` (change to `true` after Phase 3)

Create CI Workflow

.github/workflows/ci.yml

name: CI

on:
  pull_request:
    branches: [main, staging]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 9

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'pnpm'

      - run: pnpm install --frozen-lockfile

      - name: Type Check
        run: pnpm typecheck

      - name: Lint
        run: pnpm lint

      - name: Unit Tests
        run: pnpm test

      - name: Dead Code Check
        run: pnpm knip

Create Deploy Staging Workflow

.github/workflows/deploy-staging.yml

name: Deploy Staging

on:
  push:
    branches: [staging]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with: { version: 9 }
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck
      - run: pnpm lint
      - run: pnpm test
      - run: pnpm knip

  deploy:
    needs: ci
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with: { version: 9 }
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile

      - name: Deploy to Staging
        run: pnpm wrangler deploy --env staging
        env:
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}

      - name: Post-Deploy Smoke Test
        run: |
          sleep 5
          response=$(curl -s -o /dev/null -w "%{http_code}" \
            "${{ vars.STAGING_API_URL }}/health/ready")
          if [ "$response" != "200" ]; then
            echo "Smoke test failed! Got HTTP $response"
            exit 1
          fi
          echo "Smoke test passed!"

      - name: E2E Tests
        run: pnpm test:e2e
        env:
          API_URL: ${{ vars.STAGING_API_URL }}
          API_KEY: ${{ secrets.STAGING_API_KEY }}

      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1.25.0
        with:
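          # toJSON escapes quotes and newlines in the commit message so the payload stays valid JSON
          payload: |
            {"text": ${{ toJSON(format('Staging deploy {0}: {1}', job.status, github.event.head_commit.message)) }} }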
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
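
The post-deploy smoke test in the workflow above waits a fixed 5 seconds before a single request. If that proves flaky, one alternative is to poll the health endpoint until it answers 200 or a retry budget runs out. The following is a minimal sketch of that idea, not what the workflow currently does: `waitForHealthy` and its options are hypothetical names, and the probe function is injected so the logic can be exercised without a network.

```typescript
interface WaitOptions {
  attempts: number; // maximum number of probes
  delayMs: number; // pause between probes
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// `probe` performs one health request and resolves to its HTTP status code.
async function waitForHealthy(
  probe: () => Promise<number>,
  { attempts, delayMs, sleep = defaultSleep }: WaitOptions
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // A network error counts as "not ready yet" instead of aborting the loop.
    const status = await probe().catch(() => 0);
    if (status === 200) return true;
    if (attempt < attempts) await sleep(delayMs);
  }
  return false;
}
```

A caller could pass a probe such as `() => fetch(url).then((res) => res.status)`, where `url` is the `/health/ready` address, and fail the job when the function resolves to `false`.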

Create Deploy Production Workflow

.github/workflows/deploy-production.yml

name: Deploy Production

on:
  push:
    branches: [main]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with: { version: 9 }
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck
      - run: pnpm lint
      - run: pnpm test
      - run: pnpm knip

  # E2E tests must pass against staging BEFORE deploying to production
  e2e-staging-gate:
    needs: ci
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with: { version: 9 }
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile
      - name: E2E Tests Against Staging
        run: pnpm test:e2e
        env:
          API_URL: ${{ vars.STAGING_API_URL }}
          API_KEY: ${{ secrets.STAGING_API_KEY }}

  deploy:
    needs: e2e-staging-gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Required for Sentry commits

      - uses: pnpm/action-setup@v4
        with: { version: 9 }
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'pnpm' }
      - run: pnpm install --frozen-lockfile

      - name: Deploy to Production
        run: pnpm wrangler deploy --env production
        env:
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}

      - name: Post-Deploy Smoke Test
        run: |
          sleep 5
          response=$(curl -s -o /dev/null -w "%{http_code}" \
            "${{ vars.PRODUCTION_API_URL }}/health/ready")
          if [ "$response" != "200" ]; then
            echo "Production smoke test failed! Got HTTP $response"
            exit 1
          fi
          echo "Production smoke test passed!"

      - name: Create Sentry Release
        if: ${{ vars.SENTRY_ENABLED == 'true' }}
        uses: getsentry/action-release@v1
        env:
          SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
          SENTRY_ORG: ${{ secrets.SENTRY_ORG }}
          SENTRY_PROJECT: ${{ secrets.SENTRY_PROJECT }}
        with:
          environment: production
          set_commits: auto

      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1.25.0
        with:
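          # toJSON escapes quotes and newlines in the commit message so the payload stays valid JSON
          payload: |
            {"text": ${{ toJSON(format('Production deploy {0}: {1}', job.status, github.event.head_commit.message)) }} }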
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Required GitHub Secrets

Secret	Purpose	When to Add
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare account for deployments	Phase 2
`CLOUDFLARE_API_TOKEN`	Wrangler authentication	Phase 2
`STAGING_API_KEY`	API key for staging E2E tests	Phase 2
`SLACK_WEBHOOK_URL`	Deployment notifications (GitHub Actions only)	Phase 2
`SENTRY_AUTH_TOKEN`	Sentry release management	Phase 3
`SENTRY_ORG`	Sentry organization slug	Phase 3
`SENTRY_PROJECT`	Sentry project slug	Phase 3

Note on Slack Webhooks: This webhook is used only for GitHub Actions deployment notifications. Axiom (Phase 5) and Checkly (Phase 6) require separate Slack integrations configured in their respective dashboards. You can use the same Slack channel but each service needs its own webhook/integration.

Enable Status Checks

📝 After CI workflow runs once

Update branch protection rules to require the CI / ci status check for both main and staging branches.

Verification Checklist

🛑 STOP: Complete before Phase 3

GitHub variables configured: STAGING_API_URL, PRODUCTION_API_URL, SENTRY_ENABLED
.github/workflows/ci.yml created and committed
.github/workflows/deploy-staging.yml created and committed
.github/workflows/deploy-production.yml created and committed
GitHub secrets configured: CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_API_TOKEN
GitHub secrets configured: STAGING_API_KEY, SLACK_WEBHOOK_URL
Test PR to staging triggers CI workflow
Update main branch protection to require CI / ci status check
Update staging branch protection to require CI / ci status check
Test: PR with failing CI cannot be merged
Staging deployment succeeds and smoke test passes
Slack notifications received

Phase 3: Sentry Error Tracking

Time: Day 3 (~2-3 hours)

Install Sentry SDK

pnpm add @sentry/cloudflare

Configure wrangler.toml

[version_metadata]
binding = "CF_VERSION_METADATA"

Update Type Definitions

src/config/bindings.ts

export interface Env {
  // ... existing bindings ...

  /** Environment name (staging, production) - set in wrangler.toml */
  ENVIRONMENT?: string;

  /** Sentry DSN for error reporting */
  SENTRY_DSN?: string;

  /** Cloudflare Worker version metadata (auto-populated) */
  CF_VERSION_METADATA?: {
    id: string;
    tag: string;
    timestamp: string;
  };
}

Wrap Worker Export

src/index.ts

import * as Sentry from '@sentry/cloudflare';
import { app } from './app';
import { scheduled } from './scheduled';
import type { Env } from './config/bindings';

export default Sentry.withSentry(
  (env: Env) => ({
    dsn: env.SENTRY_DSN,
    release: env.CF_VERSION_METADATA?.id,
    environment: env.ENVIRONMENT ?? 'development',
    tracesSampleRate: env.ENVIRONMENT === 'production' ? 0.1 : 1.0,
    sendDefaultPii: false,
  }),
  {
    fetch: app.fetch,
    scheduled,
  } as ExportedHandler<Env>
);

Error Handler Integration

Update src/http/middleware/error-handler.ts to capture exceptions:

src/http/middleware/error-handler.ts

import * as Sentry from '@sentry/cloudflare';

// Inside the error handler, before returning the response:
Sentry.captureException(err, {
  extra: {
    requestId,
    path: instance,
    method: c.req.method,
  },
  tags: {
    endpoint: c.req.path,
    environment: c.env.ENVIRONMENT,
  },
});

Set Sentry DSN Secrets

# Get DSN from Sentry project settings
wrangler secret put SENTRY_DSN --env staging
wrangler secret put SENTRY_DSN --env production

Enable Sentry in CI/CD

Add GitHub secrets: SENTRY_AUTH_TOKEN, SENTRY_ORG, SENTRY_PROJECT
Change GitHub variable SENTRY_ENABLED from false to true

Verification Checklist

🛑 STOP: Complete before Phase 4

@sentry/cloudflare installed in package.json
wrangler.toml has [version_metadata] binding
src/config/bindings.ts has SENTRY_DSN and CF_VERSION_METADATA types
src/index.ts wrapped with Sentry.withSentry()
SENTRY_DSN secret set for staging environment
SENTRY_DSN secret set for production environment
Deploy to staging and trigger a test error
Verify error appears in Sentry dashboard
GitHub variable SENTRY_ENABLED changed to true

Phase 4: OpenTelemetry Tracing

Time: Day 4 (~2-3 hours)

Enable Cloudflare Automatic Tracing

[observability]
enabled = true

[observability.logs]
enabled = true
invocation_logs = true
head_sampling_rate = 1

[observability.tracing]
enabled = true
head_sampling_rate = 1

Configure OTEL Export to Axiom

Navigate to Workers & Pages → Your Worker → Settings → Observability
Under "Trace export", click "Add destination"
Select "HTTP" as destination type
Configure:
- Endpoint: https://api.axiom.co/v1/traces
- Header: Authorization: Bearer <AXIOM_API_TOKEN>
- Header: X-Axiom-Dataset: email-finder-traces

Verification Checklist

🛑 STOP: Complete before Phase 5

wrangler.toml has [observability] section with tracing enabled
Deploy to staging with observability enabled
Cloudflare Dashboard shows traces in Workers → Observability
Axiom trace export configured in Cloudflare Dashboard
Make several API requests to staging
Verify traces appear in Axiom dataset

Phase 5: Axiom Dashboards & Monitors

Time: Day 4 (~2-3 hours)

Create Dashboards

Dashboard	Metrics
Operations	Requests/min, success/failure rate, response times (p50/p95/p99), cache hit ratio
Cost Analytics	Cost per request by validator, daily/weekly trends, validator efficiency
Infrastructure Health	Circuit breaker states, rate limit triggers, DB query latency

Create Monitors

Monitor	Condition	Action
High Error Rate	error_rate > 5% for 5m	Slack alert
Slow Response	p95_latency > 10s for 5m	Slack alert
Circuit Open	circuit_state = 'open'	Slack alert
Provider Down	provider_errors > 10 in 1m	Slack alert
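
To make the "High Error Rate" condition concrete, the sketch below evaluates `error_rate > 5% for 5m` over per-minute request counts. It illustrates one reading of the condition (aggregate rate across the window) and is not a description of how Axiom evaluates monitors; the type and function names are hypothetical.

```typescript
// Request and error counts for one minute of traffic.
interface MinuteBucket {
  requests: number;
  errors: number;
}

// True when the aggregate error rate across the window exceeds the threshold.
// Pass the last five buckets to model a 5-minute window.
function errorRateExceeded(buckets: MinuteBucket[], threshold = 0.05): boolean {
  const requests = buckets.reduce((sum, b) => sum + b.requests, 0);
  const errors = buckets.reduce((sum, b) => sum + b.errors, 0);
  if (requests === 0) return false; // no traffic, nothing to alert on
  return errors / requests > threshold;
}
```

Because the comparison is strict, a window sitting at exactly 5% does not fire; only a rate above the threshold does.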

Axiom Slack Integration Setup

In Axiom Dashboard, go to Settings → Integrations
Add Slack integration (OAuth or Webhook)
Select the #email-finder-alerts channel
Test the integration with a sample alert

Note: This is a separate integration from the GitHub Actions Slack webhook. Axiom manages its own Slack connection.

Verification Checklist

🛑 STOP: Complete before Phase 6

Operations Dashboard created in Axiom
Cost Analytics Dashboard created in Axiom
Infrastructure Health Dashboard created in Axiom
All 4 monitors configured
Slack integration configured for alerts
Test alert by temporarily lowering threshold

Phase 6: Checkly Synthetic Monitoring

Time: Day 5 (~2-3 hours)

API Check Strategy

Cache-hit checks (every 5 minutes): Use consistent test data that hits cache
Cache-bypass checks (every 60 minutes): Use bypass_cache=true for full validation

Check Volume Summary

Check	Frequency	Regions	Daily Volume	External Cost
Health	1 min	4	5,760/day	None
Find (cached)	5 min	1	288/day	None
Find (bypass)	60 min	1	24/day	Yes
Validate (cached)	5 min	1	288/day	None
Validate (bypass)	60 min	1	24/day	Yes

Total external API consumption: 48 calls/day
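
The daily volumes in the table follow from frequency and region count: runs per day = (1440 minutes / frequency in minutes) × regions. The short sketch below reproduces the table's numbers so the figures can be recomputed if a frequency or region count changes; the variable and function names are illustrative only.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Number of check runs per day for a given frequency and region count.
function dailyRuns(frequencyMinutes: number, regions: number): number {
  return (MINUTES_PER_DAY / frequencyMinutes) * regions;
}

// Values copied from the Check Volume Summary table.
const checks = [
  { name: "Health", frequencyMinutes: 1, regions: 4, external: false },
  { name: "Find (cached)", frequencyMinutes: 5, regions: 1, external: false },
  { name: "Find (bypass)", frequencyMinutes: 60, regions: 1, external: true },
  { name: "Validate (cached)", frequencyMinutes: 5, regions: 1, external: false },
  { name: "Validate (bypass)", frequencyMinutes: 60, regions: 1, external: true },
];

// Only the cache-bypass checks reach external providers.
const externalCallsPerDay = checks
  .filter((c) => c.external)
  .reduce((sum, c) => sum + dailyRuns(c.frequencyMinutes, c.regions), 0);
```

Here `dailyRuns(1, 4)` gives 5,760 for the health check and `externalCallsPerDay` evaluates to 48, matching the table.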

Checkly Slack Integration Setup

In Checkly Dashboard, go to Alerts → Alert Channels
Add Slack channel integration
Authorize Checkly to post to #email-finder-alerts
Configure alert conditions (failure, recovery, degraded)
Test with a manual alert

Note: This is a separate integration from GitHub Actions and Axiom. Each service maintains its own Slack connection for reliability.

Verification Checklist

🛑 STOP: Complete before Phase 7

Checkly account created and configured
Health check created (1 min frequency, 4 regions)
Find cached check created (5 min frequency)
Find bypass check created (60 min frequency)
Validate cached check created (5 min frequency)
Validate bypass check created (60 min frequency)
All checks passing for staging and production
Slack alert channel configured

Phase 7: K6 Load Testing

Time: Day 5 (~2 hours)

Create Load Test Scripts

Create a tests/load/ directory with three scripts:

Smoke Test Script

tests/load/smoke-test.js

import http from 'k6/http';
import { check, sleep } from 'k6';

const API_URL = __ENV.API_URL;
const API_KEY = __ENV.API_KEY || '';

export const options = {
  vus: 5,
  duration: '30s',
  thresholds: {
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const healthRes = http.get(`${API_URL}/health/ready`);
  check(healthRes, {
    'health status is 200': (r) => r.status === 200,
  });
  sleep(1);
}

Load Test Script

tests/load/load-test.js

import http from 'k6/http';
import { check, sleep } from 'k6';

const API_URL = __ENV.API_URL;
const API_KEY = __ENV.API_KEY || '';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up to 50 users
    { duration: '5m', target: 50 },   // Stay at 50 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3000'],
    http_req_failed: ['rate<0.05'],
  },
};

export default function () {
  const healthRes = http.get(`${API_URL}/health/ready`);
  check(healthRes, {
    'health status is 200': (r) => r.status === 200,
  });

  if (API_KEY) {
    const findRes = http.post(
      `${API_URL}/find`,
      JSON.stringify({
        full_name: `User ${__VU}-${__ITER}`,
        domain: 'k6-load-test.example.com',
      }),
      {
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': API_KEY,
        },
      }
    );
    check(findRes, {
      'find responds': (r) => r.status === 200 || r.status === 404,
    });
  }

  sleep(1);
}

Stress Test Script

tests/load/stress-test.js

/**
 * WARNING: This test should ONLY run against STAGING with mocked external providers.
 * Running against production will consume significant external API credits.
 */
import http from 'k6/http';
import { check, sleep } from 'k6';

const API_URL = __ENV.API_URL;
const API_KEY = __ENV.API_KEY || '';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Warm up
    { duration: '3m', target: 100 },  // Increase load
    { duration: '3m', target: 150 },  // Push further
    { duration: '3m', target: 200 },  // Near limit
    { duration: '2m', target: 0 },    // Recovery
  ],
  thresholds: {
    http_req_duration: ['p(95)<5000'],  // More lenient
    http_req_failed: ['rate<0.10'],     // Allow up to 10% failures
  },
};

export default function () {
  const healthRes = http.get(`${API_URL}/health/ready`);
  check(healthRes, {
    'health status is 200': (r) => r.status === 200,
  });

  // Only test find endpoint occasionally (1 in 10 iterations)
  if (API_KEY && __ITER % 10 === 0) {
    const findRes = http.post(
      `${API_URL}/find`,
      JSON.stringify({
        full_name: `Stress User ${__VU}`,
        domain: 'k6-stress-test.example.com',
      }),
      {
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': API_KEY,
        },
      }
    );
    check(findRes, {
      'find responds under stress': (r) => 
        r.status === 200 || r.status === 404 || r.status === 429,
    });
  }

  sleep(0.5);
}

Add K6 to CI/CD Pipeline

Add this step to deploy-staging.yml after E2E tests:

- name: Run K6 Smoke Test
  uses: grafana/k6-action@v0.3.1
  with:
    filename: tests/load/smoke-test.js
  env:
    API_URL: ${{ vars.STAGING_API_URL }}
    API_KEY: ${{ secrets.STAGING_API_KEY }}

Running Load Tests Manually

# Smoke test (CI)
k6 run tests/load/smoke-test.js -e API_URL=https://... -e API_KEY=...

# Load test (performance baseline)
k6 run tests/load/load-test.js -e API_URL=https://... -e API_KEY=...

# Stress test (staging only!)
k6 run tests/load/stress-test.js -e API_URL=https://staging... -e API_KEY=...

⚠️ Warning: Stress Tests

Stress tests should ONLY run against STAGING with mocked external providers. Running against production will consume significant external API credits.

Verification Checklist

🛑 STOP: Complete before Phase 8

tests/load/ directory created
smoke-test.js created
load-test.js created
stress-test.js created with warning comment
Run smoke test locally: all checks pass
K6 step added to deploy-staging.yml
K6 smoke test passes in CI pipeline

Phase 8: CodeRabbit AI Code Review

Time: Day 6 (~1-2 hours)

Install CodeRabbit GitHub App

Go to the CodeRabbit GitHub App page
Install on LeadMagic/email-finder-service repository
Grant permissions: read code, write PR comments

Create Configuration File

.coderabbit.yaml

language: en
early_access: false
reviews:
  auto_review:
    enabled: true
    drafts: false
    base_branches:
      - main
      - staging
  request_changes_workflow: false
  high_level_summary: true
  poem: false
  review_status: true
  path_filters:
    - '!**/pnpm-lock.yaml'
    - '!**/package-lock.json'
    - '!**/*.md'
  path_instructions:
    - path: 'src/**/*.ts'
      instructions: |
        Focus on:
        - Type safety and proper error handling
        - Performance implications for Cloudflare Workers
        - Security vulnerabilities
        - Proper async/await handling
    - path: 'tests/**/*.ts'
      instructions: |
        Focus on:
        - Test coverage completeness
        - Edge case handling
        - Mock correctness
chat:
  auto_reply: true

Verification Checklist

🛑 STOP: Complete before Phase 9

CodeRabbit GitHub App installed on repository
.coderabbit.yaml created and committed
Create test PR with a code change
CodeRabbit posts automatic review comment
Review contains high-level summary

Phase 9: Enhanced Test Suite

Time: Day 6 (~2 hours)

Coverage Requirements

vitest.config.ts

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    include: ['tests/**/*.test.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      exclude: ['node_modules/', 'tests/', '**/*.d.ts', '**/*.config.*'],
      thresholds: {
        statements: 70,
        branches: 60,
        functions: 70,
        lines: 70,
      },
    },
  },
});

Test Data Factories

tests/factories/email.factory.ts

export function createFindRequest(overrides = {}) {
  return {
    full_name: 'Test User',
    domain: 'example.com',
    ...overrides,
  };
}

export function createValidateRequest(overrides = {}) {
  return {
    email: 'test@example.com',
    ...overrides,
  };
}
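
The factories above accept untyped overrides. A typed variant can catch misspelled field names at compile time. The sketch below is one way to do that, under stated assumptions: the `FindRequest` shape is inferred from the factory defaults (real payloads may carry more fields), and `makeFactory` is a hypothetical helper, not an existing utility in the repository.

```typescript
// Shape inferred from the factory defaults; treat it as an assumption.
interface FindRequest {
  full_name: string;
  domain: string;
}

// Builds a factory that merges per-test overrides over fixed defaults.
// Each call returns a fresh object, so tests cannot mutate shared state.
function makeFactory<T extends object>(defaults: T) {
  return (overrides: Partial<T> = {}): T => ({ ...defaults, ...overrides });
}

const createTypedFindRequest = makeFactory<FindRequest>({
  full_name: "Test User",
  domain: "example.com",
});

// Override only the field the test cares about.
const acmeRequest = createTypedFindRequest({ domain: "acme.test" });
```

Because overrides are spread last, `acmeRequest.domain` is `"acme.test"` while `full_name` keeps its default.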

Verification Checklist

✅ All Phases Complete!

vitest.config.ts updated with coverage thresholds
tests/factories/ directory created
Factory functions created for common test data
Run pnpm test - all tests pass
Run pnpm test --coverage - coverage meets thresholds (the v8 provider requires the @vitest/coverage-v8 package)
Coverage report generated in coverage/ directory

📅 Implementation Order

Day	Phase	Duration	Key Deliverables
Day 1 AM	Phase 0: Git Flow	~2-3h	CONTRIBUTING.md, branch protection, staging branch
Day 1 PM	Phase 1: Infrastructure (CRITICAL)	~3-4h	8 KV namespaces, Hyperdrive, wrangler.toml updates
Day 2	Phase 2: CI/CD Pipeline	~4-5h	3 GitHub workflows, secrets, status checks enabled
Day 3	Phase 3: Sentry Integration	~2-3h	SDK installed, error handler integrated, DSN secrets set
Day 4	Phase 4 & 5: OTEL + Axiom	~4-6h	Tracing enabled, 3 dashboards, 4 monitors
Day 5	Phase 6 & 7: Checkly + K6	~4-5h	5 Checkly checks, 3 K6 scripts, K6 in CI
Day 6	Phase 8 & 9: CodeRabbit + Tests	~3-4h	CodeRabbit app, coverage thresholds, test factories

📊 Timeline Comparison

Approach	Estimated Time	Notes
Manual coding	~5 weeks	Traditional development without AI assistance
AI-assisted (Claude + Cursor)	~1-2 weeks	Roughly 2.5-5x faster - recommended approach

📝 Phase Dependencies

Sequential (must be in order): Phase 0 → Phase 1 → Phase 2
Parallelizable (after Phase 2): Phases 3-9 can be done in any order or simultaneously

🔄 Rollback Procedures

Deployment Rollback

# List recent deployments
wrangler deployments list --env production

# Roll back to a specific version (pass a version ID from the list above)
wrangler rollback <VERSION_ID> --env production

# Or rollback to previous deployment
wrangler rollback --env production

Rollback Decision Matrix

Symptom	Severity	Action
Error rate > 10%	P1	Immediate rollback
Latency > 10s (p95)	P1	Immediate rollback
Error rate 5-10%	P2	Investigate, rollback if not resolved in 15 min
Feature not working	P3	Investigate, hotfix if possible
Minor issue	P4	Fix forward in next deployment
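
The two numeric rows of the matrix (P1 and P2) can be encoded as a small decision function, for example to annotate alerts with a suggested action. This is a sketch under stated assumptions: the type and function names are hypothetical, and the P3/P4 rows are left out because they depend on human judgment rather than metrics.

```typescript
type RollbackAction = "rollback" | "investigate" | "none";

interface ServiceHealth {
  errorRate: number; // fraction of failed requests, 0..1
  p95LatencyMs: number; // 95th percentile latency in milliseconds
}

// Maps current metrics to the action from the decision matrix.
function rollbackAction({ errorRate, p95LatencyMs }: ServiceHealth): RollbackAction {
  // P1: error rate above 10% or p95 latency above 10s means immediate rollback.
  if (errorRate > 0.1 || p95LatencyMs > 10_000) return "rollback";
  // P2: error rate of 5-10% means investigate, then roll back if unresolved after 15 minutes.
  if (errorRate >= 0.05) return "investigate";
  return "none";
}
```

The 15-minute escalation for P2 is not modeled; a caller would need to track how long the `investigate` state has lasted.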

🚨Incident Response

Severity Levels

Level	Description	Response Time	Examples
P1	Service down, all customers affected	15 minutes	500 errors, deployment failure
P2	Degraded service, partial impact	1 hour	Slow responses, one provider down
P3	Minor issue, workaround available	4 hours	Non-critical feature broken
P4	Cosmetic, no functional impact	Next sprint	Typo in response

Incident Response Process

1. DETECT → Alert triggered or customer report
2. ASSESS → Check Sentry, Axiom, Cloudflare; determine severity
3. COMMUNICATE → Notify team in Slack #email-finder-alerts
4. MITIGATE → Rollback if deployment caused; enable circuit breaker if provider issue
5. RESOLVE → Identify root cause, implement fix, deploy via normal process
6. POST-MORTEM → Document timeline, define action items (P1/P2 only)
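
Step 4 mentions enabling a circuit breaker when a provider is failing. The plan does not show how the service implements it, so the following is only a minimal sketch of the pattern: `createCircuitBreaker`, `failureThreshold`, and `cooldownMs` are hypothetical names, and the caller supplies the current time so the logic stays deterministic.

```typescript
// Minimal circuit breaker sketch (hypothetical API, not the service's own).
interface CircuitBreaker {
  canRequest(nowMs: number): boolean;
  recordSuccess(): void;
  recordFailure(nowMs: number): void;
}

function createCircuitBreaker(
  failureThreshold: number,
  cooldownMs: number,
): CircuitBreaker {
  let failures = 0;
  let openedAt: number | null = null; // null means the circuit is closed

  return {
    // Closed circuits always allow requests; open circuits allow a trial
    // request only after the cooldown has elapsed (the "half-open" state).
    canRequest(nowMs) {
      if (openedAt === null) return true;
      return nowMs - openedAt >= cooldownMs;
    },
    // A success closes the circuit and resets the failure count.
    recordSuccess() {
      failures = 0;
      openedAt = null;
    },
    // Reaching the threshold opens (or re-opens) the circuit.
    recordFailure(nowMs) {
      failures += 1;
      if (failures >= failureThreshold) openedAt = nowMs;
    },
  };
}
```

A caller would check `canRequest(Date.now())` before contacting a validation provider and skip to the next provider when it returns `false`.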

💻Local Development Guide

Initial Setup

# Clone repository
git clone git@github.com:LeadMagic/email-finder-service.git
cd email-finder-service

# Install dependencies
pnpm install

# Verify setup
pnpm typecheck && pnpm lint && pnpm test

Environment Configuration

Create a .dev.vars file in the repository root:

API_KEYS=["dev-key-12345"]
AXIOM_API_TOKEN=xaat-xxx
SOCKS5_PROXY_USER=xxx
SOCKS5_PROXY_PASS=xxx
OCHECKER_API_KEY=xxx
NO2BOUNCE_API_KEY=xxx
MILLIONVERIFIER_API_KEY=xxx
BYTEMINE_API_TOKEN=xxx
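
The `API_KEYS` value above looks like a JSON array stored as a string. Assuming the Worker parses it that way (the plan does not confirm this), a defensive parser could look like the sketch below; `parseApiKeys` is a hypothetical helper name.

```typescript
// Parse the API_KEYS binding, assuming it holds a JSON array of strings.
function parseApiKeys(raw: string | undefined): string[] {
  if (!raw) {
    throw new Error("API_KEYS is not set");
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("API_KEYS must be valid JSON");
  }
  const isNonEmptyStringArray =
    Array.isArray(parsed) &&
    parsed.length > 0 &&
    parsed.every((key) => typeof key === "string" && key.length > 0);
  if (!isNonEmptyStringArray) {
    throw new Error("API_KEYS must be a non-empty JSON array of strings");
  }
  return parsed as string[];
}
```

Failing fast with a descriptive message makes a malformed `.dev.vars` entry easier to spot than a later authentication error.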

Running Locally

# Start development server
pnpm dev
# → Server running at http://localhost:8787

# Test health endpoint
curl http://localhost:8787/health/ready

Common Development Tasks

Task	Command
Format code	`pnpm format`
Lint code	`pnpm lint`
Fix lint issues	`pnpm lint:fix`
Type check	`pnpm typecheck`
Run tests	`pnpm test`
Run E2E tests	`pnpm test:e2e`
Check dead code	`pnpm knip`

🔧Troubleshooting Guide

Common Issues

1. "Wrangler not authenticated"

# Solution:
wrangler login
# Or set environment variable:
export CLOUDFLARE_API_TOKEN="your-token"

2. "KV namespace not found"

# Verify namespace exists:
wrangler kv namespace list
# Compare IDs with wrangler.toml
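
The comparison can be scripted rather than done by eye. The sketch below assumes the list command prints a JSON array of objects with `id` and `title` fields, and that the IDs from wrangler.toml have been collected into an array by hand; `findMissingNamespaceIds` is a hypothetical helper.

```typescript
// Shape assumed for each entry in the namespace list output.
interface KvNamespace {
  id: string;
  title: string;
}

// Returns the configured IDs that do not exist in the account.
function findMissingNamespaceIds(
  configuredIds: string[],
  listOutputJson: string,
): string[] {
  const namespaces = JSON.parse(listOutputJson) as KvNamespace[];
  const existing = new Set(namespaces.map((ns) => ns.id));
  return configuredIds.filter((id) => !existing.has(id));
}
```

An empty result means every ID in wrangler.toml matches a namespace in the account; any returned ID points at a typo or a namespace created under a different account.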

3. "Hyperdrive connection failed"

# 1. Verify Hyperdrive exists
wrangler hyperdrive list

# 2. Test database connection directly
psql "$DATABASE_URL"

# 3. Recreate if needed
wrangler hyperdrive create email-finder-staging --connection-string="..."

4. "CI workflow failing"

Error	Cause	Solution
Type error	TypeScript issue	Run `pnpm typecheck` locally
Lint error	Code style issue	Run `pnpm lint:fix`
Test failed	Broken test	Run `pnpm test` locally
Deploy failed	Missing secret	Add secret to GitHub repository

🔐Environment Variables Summary

GitHub Repository Variables

Variable	Initial Value	Purpose
`STAGING_API_URL`	`https://email-finder-service-staging.*.workers.dev`	Staging endpoint
`PRODUCTION_API_URL`	`https://email-finder-service.*.workers.dev`	Production endpoint
`SENTRY_ENABLED`	`false`	Controls Sentry releases

GitHub Actions Secrets

Secret	Purpose	When to Add
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare account	Phase 2
`CLOUDFLARE_API_TOKEN`	Wrangler auth	Phase 2
`STAGING_API_KEY`	E2E tests	Phase 2
`SLACK_WEBHOOK_URL`	Notifications	Phase 2
`SENTRY_AUTH_TOKEN`	Sentry releases	Phase 3
`SENTRY_ORG`	Sentry org slug	Phase 3
`SENTRY_PROJECT`	Sentry project slug	Phase 3

Cloudflare Worker Secrets

Set via wrangler secret put <NAME> --env <ENV>

Secret	Purpose	When to Add
`API_KEYS`	Application API keys	Phase 1
`AXIOM_API_TOKEN`	Axiom logging	Phase 1
`SOCKS5_PROXY_USER`	SMTP proxy credentials	Phase 1
`SOCKS5_PROXY_PASS`	SMTP proxy credentials	Phase 1
`OCHECKER_API_KEY`	OChecker validation	Phase 1
`NO2BOUNCE_API_KEY`	No2Bounce validation	Phase 1
`MILLIONVERIFIER_API_KEY`	MillionVerifier validation	Phase 1
`BYTEMINE_API_TOKEN`	ByteMine enrichment	Phase 1
`SENTRY_DSN`	Sentry project DSN	Phase 3

📖Glossary

Term	Definition
Circuit Breaker	Pattern that prevents cascading failures by stopping requests to failing services
Cold Start	Initial latency when a Worker instance is first created
DSN	Data Source Name - connection string for Sentry
E2E Test	End-to-end test that validates entire user flows
Hono	Lightweight web framework that runs on Cloudflare Workers and other JavaScript runtimes
Hyperdrive	Cloudflare's database connection pooling service
KV	Cloudflare's key-value storage service
MTTR	Mean Time To Recovery - average time to recover from failure
OTEL	OpenTelemetry - open standard for traces, metrics, and logs (used here for distributed tracing)
SLI/SLO	Service Level Indicator/Objective - metrics and targets for service health
Smoke Test	Quick test to verify basic functionality after deployment
Synthetic Monitoring	Automated tests that simulate user behavior
TTL	Time To Live - expiration time for cached data
Wrangler	Cloudflare's CLI tool for Workers development

⚠️Risk Assessment

Implementation Risks

Risk	Probability	Impact	Mitigation
Shared KV data corruption during migration	Medium	High	Phase 1 creates new namespaces (no migration needed)
CI/CD breaks existing workflow	Low	Medium	Gradual rollout, status checks added after CI works
Sentry adds latency	Low	Low	Async error reporting, minimal overhead
Checkly costs exceed budget	Low	Low	Optimized check frequency, 48 external calls/day
Team resistance to new workflow	Medium	Medium	Clear documentation, training session

Rollback Difficulty

Component	Difficulty	Time	Notes
Worker code	Easy	< 1 min	Cloudflare instant rollback
KV namespaces	Medium	5-10 min	Update wrangler.toml + deploy
Secrets	Easy	< 2 min	Re-set via wrangler CLI
GitHub workflows	Easy	< 2 min	Git revert + push
Branch protection	Easy	< 2 min	GitHub UI changes

🎯Success Metrics

Metric	Target
CI/CD	All deployments automated, < 10 min pipeline
Error Tracking	< 5 min MTTR for P1 issues
Monitoring	99.9% uptime visibility
Testing	> 70% code coverage
Code Review	100% PR coverage with AI review
Git Flow	100% compliance with branch protection rules

📁Files to Create/Modify

Files to Create

CONTRIBUTING.md
.github/
  workflows/
    ci.yml
    deploy-staging.yml
    deploy-production.yml
.coderabbit.yaml
tests/
  load/
    smoke-test.js
    load-test.js
    stress-test.js
  factories/
    email.factory.ts
  integration/
    (new integration tests)

Files to Modify

File	Changes	Phase
`wrangler.toml`	Add staging/production KV IDs, ENVIRONMENT vars, version_metadata, observability	1, 3, 4
`src/index.ts`	Wrap export with Sentry.withSentry()	3
`src/config/bindings.ts`	Add ENVIRONMENT, SENTRY_DSN, CF_VERSION_METADATA types	3
`src/http/middleware/error-handler.ts`	Add Sentry.captureException()	3
`vitest.config.ts`	Add coverage thresholds	9
`package.json`	Add @sentry/cloudflare dependency	3
`.github/workflows/deploy-staging.yml`	Add K6 smoke test step	7

📚References

Official Documentation

Resource	URL
Cloudflare Workers	developers.cloudflare.com/workers/
Cloudflare KV	developers.cloudflare.com/kv/
Wrangler CLI	developers.cloudflare.com/workers/wrangler/
Hono Framework	hono.dev
Vitest	vitest.dev
GitHub Actions	docs.github.com/en/actions
Sentry for Cloudflare	docs.sentry.io
Axiom	axiom.co/docs
Checkly	checklyhq.com/docs
K6	k6.io/docs
CodeRabbit	docs.coderabbit.ai
Biome	biomejs.dev

🔮Future Improvements

Planned improvements after all phases are implemented and stabilized:

Secrets Management

🔐 Doppler Integration

Centralize all secrets in Doppler for unified management with version history, audit trail, and automated rotation.

When to adopt: When team grows beyond 5 developers or compliance requires audit trails.

Infrastructure Improvements

Improvement	Benefit	Trigger
Separate Staging Database	Full data isolation, safe schema testing	High Priority
Preview Environments	Ephemeral environments per PR	PR queue > 3
Canary Deployments	Gradual rollout with auto-rollback	Deployment incidents > 2/quarter
Feature Flags	Runtime toggles without deployments	Need A/B testing

Enhanced Observability

Improvement	Description	Priority
Public Status Page	Real-time incident communication	P2
SLO/SLI Dashboards	Formal service level objectives with error budgets	P2
OpenAPI Documentation	Auto-generated API docs with interactive explorer	P2

Testing Enhancements

Tool	Purpose	Priority
Pact	Consumer-driven contract testing	P4
Stryker	Mutation testing for test quality	P4
Chaos Engineering	Deliberately inject failures to test resilience	P4

Security Enhancements

Tool	Purpose	Priority
Dependabot	Automated dependency updates	P1
Snyk	Deep vulnerability analysis	P2
Gitleaks	Secret detection in code	P2

📄 Document Information

Version: 1.0.0 | Last Updated: January 2026

For the complete markdown source, download detailed_plan.md
