System Architecture

Technical design, data flow, and infrastructure overview

Design Principles

RayRay is built around four core principles that support compliance, auditability, and operational efficiency:

  1. Spatial Evidence Traceability — Every AI extraction maintains coordinate-based links to source text via bounding boxes
  2. Mandatory Human-in-the-Loop — LangGraph checkpoints enforce human review before any database write
  3. Immutable Audit Logs — Blockchain-style hash chains provide tamper-evident event history
  4. Labor Optimization Metrics — Automated time-savings calculation for agency ROI reporting

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         RAYRAY PLATFORM                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐        │
│  │   Frontend   │────▶│   Backend    │────▶│  PostgreSQL  │        │
│  │  Next.js 14  │     │   FastAPI    │     │   Database   │        │
│  │  React 18    │◀────│  Python 3.11 │◀────│   (Async)    │        │
│  └──────────────┘     └──────────────┘     └──────────────┘        │
│         │                    │                    │                 │
│         │                    ▼                    │                 │
│         │            ┌──────────────┐            │                 │
│         │            │  LangGraph   │            │                 │
│         │            │  Workflow    │            │                 │
│         │            └──────────────┘            │                 │
│         │                    │                    │                 │
│         │                    ▼                    │                 │
│         │            ┌──────────────┐            │                 │
│         │            │  LLM APIs    │            │                 │
│         │            │ Anthropic/   │            │                 │
│         │            │ OpenAI       │            │                 │
│         │            └──────────────┘            │                 │
│         │                                         │                 │
│         └────────────▶┌──────────────┐◀──────────┘                 │
│                        │  Audit Log   │                             │
│                        │  (Append-Only│                             │
│                        │   SHA-256)   │                             │
│                        └──────────────┘                             │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Extraction Workflow

The extraction pipeline enforces M-25-21 compliance through a multi-stage workflow with mandatory human checkpoints:

Stage 1: UPLOAD
    └── PDF received → Stored with metadata → Audit logged

Stage 2: PARSE
    └── PyMuPDF extracts text blocks with bounding boxes
    └── Each block: {id, text, x0, y0, x1, y1, page_number}

Stage 3: EXTRACT (AI-Assisted)
    └── LLM processes full document text
    └── Returns structured extractions with confidence scores
    └── Each extraction linked to source text blocks

Stage 4: VALIDATE (Redundancy Check)
    └── Confidence threshold check (< 60% = flagged)
    └── Duplicate detection (similarity > 80%)
    └── Required field validation
    └── Source coverage check

Stage 5: CHECKPOINT (Human Review) ◀── M-25-21 MANDATORY
    └── Workflow pauses
    └── Human reviews flagged and unflagged items
    └── Actions: APPROVE / MODIFY / REJECT

Stage 6: COMMIT
    └── Only after human approval
    └── Original values preserved if modified
    └── Audit log updated with reviewer info
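
The Stage 4 checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the RayRay implementation: the `Candidate` dataclass, the `validate` function, and the flag names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the pipeline actually uses. The 60% and 80% thresholds come from the stage description above.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

CONFIDENCE_THRESHOLD = 0.60   # below this, the item is flagged
DUPLICATE_SIMILARITY = 0.80   # above this, two values count as duplicates


@dataclass
class Candidate:
    """Hypothetical in-memory shape of one AI extraction before review."""
    label: str
    value: str
    confidence: float
    source_block_ids: list[int] = field(default_factory=list)
    flags: list[str] = field(default_factory=list)


def validate(candidates: list[Candidate], required_labels: set[str]) -> list[str]:
    """Flag candidates in place; return required labels that are missing."""
    for i, cand in enumerate(candidates):
        if cand.confidence < CONFIDENCE_THRESHOLD:
            cand.flags.append("low_confidence")
        if not cand.source_block_ids:
            cand.flags.append("no_source")  # source coverage check
        # Duplicate detection: compare only against earlier items with the same label.
        for earlier in candidates[:i]:
            if earlier.label != cand.label:
                continue
            ratio = SequenceMatcher(None, earlier.value, cand.value).ratio()
            if ratio > DUPLICATE_SIMILARITY:
                cand.flags.append("possible_duplicate")
                break
    present = {c.label for c in candidates}
    return sorted(required_labels - present)
```

Flags only annotate items for the reviewer; in this sketch nothing is dropped before Stage 5, which matches the requirement that a human sees both flagged and unflagged items.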

Component Architecture

Frontend Stack

Component    Technology                Purpose
Framework    Next.js 14 (App Router)   SSR, routing, API routes
UI Library   React 18                  Component model
Styling      Tailwind CSS              Utility-first CSS
PDF Viewer   react-pdf-viewer          Document display with overlays
State        React Context             Auth state, theme

Backend Stack

Component        Technology              Purpose
Framework        FastAPI                 Async API server
ORM              SQLAlchemy 2.0          Async database access
Workflow         LangGraph               HITL checkpoint enforcement
PDF Processing   PyMuPDF                 Text extraction with coordinates
Auth             python-jose + passlib   JWT + bcrypt
MFA              pyotp                   TOTP (RFC 6238)

Data Model

Core Entities

Document
├── id: int (PK)
├── filename: str
├── file_path: str
├── status: enum [pending, processing, reviewed, committed]
├── uploaded_by: int (FK → User)
└── created_at: datetime

Page
├── id: int (PK)
├── document_id: int (FK)
├── page_number: int
└── dimensions: {width, height}

TextBlock
├── id: int (PK)
├── page_id: int (FK)
├── text: str
├── x0, y0, x1, y1: float  ◀── Bounding box coordinates
└── confidence: float

Extraction
├── id: int (PK)
├── document_id: int (FK)
├── label: str
├── value: str
├── original_value: str      ◀── AI output (preserved)
├── confidence: float
├── review_status: enum [pending, approved, rejected]
├── reviewed_by: int (FK)    ◀── M-25-21: Human verifier
├── reviewed_at: datetime
├── workflow_run_id: str     ◀── Traceability
└── workflow_checkpoint: str
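
How a review action might update these fields can be sketched as below. The `ExtractionRecord` dataclass and `apply_review` function are hypothetical names for illustration. One assumption is labeled in the code: because `review_status` has no "modified" value, a MODIFY action is treated here as an approval with a changed `value`, while `original_value` keeps the AI output.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ExtractionRecord:
    """Hypothetical in-memory mirror of the Extraction review fields."""
    label: str
    value: str
    original_value: str
    review_status: str = "pending"
    reviewed_by: Optional[int] = None
    reviewed_at: Optional[datetime] = None


def apply_review(rec: ExtractionRecord, action: str, reviewer_id: int,
                 new_value: Optional[str] = None) -> ExtractionRecord:
    """Apply APPROVE / MODIFY / REJECT and stamp the human reviewer."""
    if action not in {"APPROVE", "MODIFY", "REJECT"}:
        raise ValueError(f"unknown review action: {action}")
    if action == "MODIFY":
        if new_value is None:
            raise ValueError("MODIFY requires new_value")
        rec.value = new_value            # original_value is left untouched
        rec.review_status = "approved"   # assumption: a modification implies approval
    elif action == "APPROVE":
        rec.review_status = "approved"
    else:
        rec.review_status = "rejected"
    rec.reviewed_by = reviewer_id
    rec.reviewed_at = datetime.now(timezone.utc)
    return rec
```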

ExtractionSource (link table)
├── extraction_id: int (FK)
├── text_block_id: int (FK)  ◀── Spatial evidence link
└── relevance_score: float

AuditLog
├── _id: uuid
├── _timestamp: datetime
├── _prev_hash: str          ◀── Chain integrity
├── _hash: str
├── event_type: str
├── user_id: int
├── resource_type: str
├── resource_id: int
├── outcome: enum [success, failure]
└── details: json
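
The `_prev_hash` / `_hash` pair is what makes the log tamper-evident. A minimal sketch of the idea, using only the standard library, is shown below. The function names, the all-zero genesis value, and the canonical JSON serialization are assumptions made for illustration; the source does not specify how RayRay serializes an entry before hashing.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's _prev_hash


def entry_hash(entry: dict) -> str:
    """SHA-256 over every field except _hash, serialized deterministically."""
    payload = {k: v for k, v in entry.items() if k != "_hash"}
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list[dict], fields: dict) -> dict:
    """Link a new entry to the previous one and append it."""
    prev = log[-1]["_hash"] if log else GENESIS_HASH
    entry = {**fields, "_prev_hash": prev}
    entry["_hash"] = entry_hash(entry)
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS_HASH
    for entry in log:
        if entry["_prev_hash"] != prev or entry["_hash"] != entry_hash(entry):
            return False
        prev = entry["_hash"]
    return True
```

Because each hash covers the previous entry's hash, changing any historical entry invalidates it and every entry after it. Field values must be JSON-serializable in this sketch, so timestamps would be stored as ISO 8601 strings.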

Relationship Diagram

Document 1──────* Page 1──────* TextBlock
    │ 1                            │ *
    │                              │
    └──────* Extraction *──────────┘
                  │ *       (via ExtractionSource)
                  │
                  │ 1
               Reviewer (User)

User 1──────* AuditLog
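
Following an Extraction through ExtractionSource to its TextBlock rows yields the bounding boxes that back it. One way a viewer overlay could use them is sketched below. The `Box` type and both helper functions are hypothetical, and the sketch assumes coordinates measured from the page's top-left corner in the same units as the Page dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in page coordinates (assumed top-left origin)."""
    x0: float
    y0: float
    x1: float
    y1: float


def union(boxes: list[Box]) -> Box:
    """Smallest box covering all source blocks of one extraction."""
    if not boxes:
        raise ValueError("an extraction needs at least one source block")
    return Box(
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def to_overlay(box: Box, page_width: float, page_height: float) -> dict:
    """Convert page coordinates to percentages for a scalable highlight."""
    return {
        "left": 100.0 * box.x0 / page_width,
        "top": 100.0 * box.y0 / page_height,
        "width": 100.0 * (box.x1 - box.x0) / page_width,
        "height": 100.0 * (box.y1 - box.y0) / page_height,
    }
```

Expressing the overlay as percentages keeps the highlight aligned when the viewer zooms, since it no longer depends on the rendered pixel size.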

LangGraph Workflow Implementation

from typing import Dict, List, Optional, TypedDict

from langgraph.graph import END, StateGraph
from langgraph.checkpoint.memory import MemorySaver

# Define state
class ExtractionState(TypedDict):
    document_id: int
    extractions: List[Dict]
    review_status: str  # pending, approved, rejected
    reviewer_id: Optional[str]
    ...

# The node functions (extract_from_pdf, redundancy_check_node,
# human_review_node, commit_to_database) and the should_commit
# router are defined elsewhere in the codebase.

# Build workflow
workflow = StateGraph(ExtractionState)
workflow.add_node("extract", extract_from_pdf)
workflow.add_node("redundancy_check", redundancy_check_node)
workflow.add_node("human_review", human_review_node)  # CHECKPOINT
workflow.add_node("commit", commit_to_database)

# Edges
workflow.set_entry_point("extract")  # the graph needs an entry point to compile
workflow.add_edge("extract", "redundancy_check")
workflow.add_edge("redundancy_check", "human_review")
workflow.add_conditional_edges("human_review", should_commit, {
    "commit": "commit",
    "reject": END,
    "pending": END,  # Pause for human input
})
workflow.add_edge("commit", END)

# Compile with checkpointing
checkpointer = MemorySaver()  # Use PostgresSaver in production
app = workflow.compile(checkpointer=checkpointer)

Key Implementation Details

  • Checkpoint Enforcement: The workflow pauses at human_review and cannot proceed until external input is provided via the review API
  • State Persistence: Production deployments should use PostgresSaver for durable checkpoint storage
  • Redundancy Checks: Pre-review validation flags low-confidence items (thresholds configurable)
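
The workflow routes on the return value of `should_commit`, which is not shown in this section. A plausible sketch is given below; it is an assumption about the router's logic, not the project's code. The state type is redeclared with `total=False` only so the example is self-contained.

```python
from typing import Dict, List, Optional, TypedDict


class ExtractionState(TypedDict, total=False):
    document_id: int
    extractions: List[Dict]
    review_status: str  # pending, approved, rejected
    reviewer_id: Optional[str]


def should_commit(state: ExtractionState) -> str:
    """Return the routing key used by add_conditional_edges."""
    status = state.get("review_status", "pending")
    # Fail closed: an approval without an identified reviewer is not committed.
    if status == "approved" and state.get("reviewer_id"):
        return "commit"
    if status == "rejected":
        return "reject"
    return "pending"
```

Treating every unknown or incomplete state as "pending" means a missing reviewer or a malformed status can never reach the commit node by accident.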

Frontend Route Structure

apps/web/src/app/
├── layout.tsx              # Root layout (fonts, providers)
├── (marketing)/            # Public pages — no auth required
│   ├── page.tsx            # Landing (/)
│   └── login/page.tsx      # Sign in (/login)
├── (app)/                  # Protected pages — auth required
│   ├── layout.tsx          # App layout with nav + footer
│   ├── dashboard/          # Main dashboard
│   ├── documents/          # Document list + detail
│   ├── compare/            # Document comparison
│   ├── settings/           # User settings + MFA
│   └── profile/            # User profile
└── docs/                   # Documentation (this site)

Security Architecture

Authentication Flow

1. User submits credentials (email + password)
2. Backend verifies against bcrypt hash
3. If MFA enabled:
   a. Return partial JWT with mfa=false
   b. User submits TOTP code
   c. Verify against stored secret
   d. Issue full JWT with mfa=true
4. JWT stored in httpOnly cookie + localStorage
5. Subsequent requests include Authorization header
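
Step 3c, verifying the TOTP code, is handled by pyotp in the backend stack. To show what that check involves, here is a standard-library sketch of the RFC 6238 algorithm itself. It is a stand-in for illustration rather than the pyotp API; the function names and the one-step drift window are assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32 secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret_b32: str, submitted: str,
                at: Optional[float] = None, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        moment = now + drift * 30
        if moment < 0:
            continue
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(totp(secret_b32, moment), submitted):
            return True
    return False
```

With the RFC 6238 test secret (the ASCII string `12345678901234567890`) at Unix time 59, the 8-digit code is `94287082`, so the 6-digit code is `287082`.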

Authorization Model

Role       Can Commit   Can Manage Users   NIST Mapping
Observer   No           No                 AC-3 (Read-only)
Analyst    No           No                 AC-3 (Drafts only)
Reviewer   Yes          No                 AC-3 (Full CRUD)
Admin      Yes          Yes                AC-3 (Privileged)
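
The role table maps directly onto a small permission lookup. The sketch below is hypothetical: the `Role` enum, the `PERMISSIONS` mapping, and the action names `commit` and `manage_users` are illustrative and not taken from the RayRay codebase.

```python
from enum import Enum


class Role(str, Enum):
    OBSERVER = "observer"
    ANALYST = "analyst"
    REVIEWER = "reviewer"
    ADMIN = "admin"


# Mirrors the "Can Commit" and "Can Manage Users" columns of the role table.
PERMISSIONS = {
    Role.OBSERVER: frozenset(),
    Role.ANALYST: frozenset(),
    Role.REVIEWER: frozenset({"commit"}),
    Role.ADMIN: frozenset({"commit", "manage_users"}),
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny by default: unknown roles and actions are rejected."""
    return action in PERMISSIONS.get(role, frozenset())
```

A check such as `is_allowed(user_role, "commit")` would sit in front of the commit endpoint, so only Reviewer and Admin accounts can move an approved extraction into the database.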

Infrastructure Requirements

Development

# Minimum requirements
- Node.js 18+
- Python 3.11+
- PostgreSQL 15+
- 4 GB RAM

# Start services (run each command from the repository root; the API and web servers each need their own terminal)
docker-compose up -d db
cd apps/api && uvicorn app.main:app --reload
cd apps/web && npm run dev

Production

Component      Recommendation                        Notes
Database       Managed PostgreSQL (RDS, Cloud SQL)   Enable point-in-time recovery
API Hosting    Containerized (ECS, Cloud Run)        Auto-scaling recommended
Frontend       Vercel, Cloudflare Pages              Edge caching for static assets
File Storage   S3, GCS with encryption               Server-side encryption required
Secrets        Vault, AWS Secrets Manager            Never commit to code

Related Documentation