Document Comparison & Scoring

Multi-Criteria Decision Analysis with Full Audit Trail

Overview

RayRay's comparison engine enables intelligence analysts and decision makers to evaluate multiple systems against configurable scoring profiles. The engine produces ranked results with full transparency into score calculations, supporting M-25-21 compliance through human-in-the-loop verification and immutable audit trails.

Key Capabilities

  • Configurable Scoring Profiles — Define what matters: weights, normalization methods, and target values
  • Multi-Document Comparison — Compare 2-4 systems simultaneously
  • Score Breakdown — Full transparency into how each score was calculated
  • Export Formats — JSON, CSV, and PowerPoint briefing exports
  • Profile Snapshots — Audit trail preserves scoring configuration at comparison time

Supported Comparison Types

| Comparison Type | Use Case | Example Attributes |
|---|---|---|
| System Performance | Comparing technical specifications | Range, accuracy, reliability, power output |
| Cost-Benefit Analysis | Budget-constrained procurement | Unit cost, maintenance cost, lifecycle cost |
| Operational Suitability | Deployment feasibility | Weight, deployment time, crew requirements |
| Mission-Specific | Custom weighted profiles | User-defined attributes and weights |

Scoring Engine Architecture

The scoring engine (ScoringService) implements a three-phase calculation pipeline that transforms raw extracted values into normalized, weighted scores.

Phase 1: Value Parsing

Raw extraction values (strings like "50km" or "$2.5M") are parsed into numeric values with unit normalization. The parser handles multiple unit formats:

# Distance units
"50km" → 50.0 (kilometers)
"30 miles" → 48.28 (converted to km)
"100nm" → 185.2 (nautical miles to km)

# Currency units (normalized to millions)
"$2,500,000" → 2.5
"$2.5M" → 2.5
"$500K" → 0.5

# Time units (normalized to months)
"6 months" → 6.0
"2 years" → 24.0
"18 mo" → 18.0
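
The parsing rules above can be sketched as a small helper. This is an illustrative simplification, not the actual ScoringService parser; the conversion tables are a hypothetical subset of the formats it handles:

```python
import re

# Conversion factors to canonical units (km for distance).
# A hypothetical subset of the formats the parser supports.
DISTANCE_TO_KM = {"km": 1.0, "miles": 1.60934, "nm": 1.852}

def parse_distance(raw: str) -> float:
    """Parse strings like '50km', '30 miles', or '100nm' into kilometers."""
    m = re.match(r"([\d.,]+)\s*([a-z]+)", raw.strip().lower())
    value, unit = float(m.group(1).replace(",", "")), m.group(2)
    return round(value * DISTANCE_TO_KM[unit], 2)

def parse_currency(raw: str) -> float:
    """Parse '$2,500,000', '$2.5M', or '$500K' into millions of dollars."""
    s = raw.strip().lstrip("$").replace(",", "").upper()
    if s.endswith("M"):
        return float(s[:-1])
    if s.endswith("K"):
        return float(s[:-1]) / 1000.0
    return float(s) / 1_000_000.0

parse_distance("30 miles")   # → 48.28
parse_currency("$500K")      # → 0.5
```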

Phase 2: Normalization

Raw numeric values are normalized to a 0-1 scale using the configured method. RayRay supports three normalization modes:

MAX Normalization (Higher is Better)

Used when higher values indicate better performance (e.g., range, accuracy).

normalized = (value - min_value) / (max_value - min_value)

Example: Detection range comparison
  System A: 150km → normalized = 0.50
  System B: 120km → normalized = 0.00 (worst)
  System C: 180km → normalized = 1.00 (best)

Where min=120, max=180

MIN Normalization (Lower is Better)

Used when lower values indicate better performance (e.g., cost, weight, deployment time).

normalized = (max_value - value) / (max_value - min_value)

Example: Unit cost comparison
  System A: $8.5M  → normalized = 0.60
  System B: $6.2M  → normalized = 1.00 (best)
  System C: $12.0M → normalized = 0.00 (worst)

Where min=6.2, max=12.0

TARGET Normalization (Closer to Target is Better)

Used when an optimal value exists and deviation in either direction is undesirable.

normalized = 1.0 - |value - target| / range

Example: Operating frequency with target 9.5 GHz
  System A: 9.2 GHz → normalized = 0.80
  System B: 9.5 GHz → normalized = 1.00 (optimal)
  System C: 10.1 GHz → normalized = 0.60

Where target=9.5, min=9.0, max=10.5, range=1.5
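
All three modes fit in one small function. This is an illustrative sketch, not the ScoringService implementation; clamping out-of-range results to [0, 1] is an assumption:

```python
def normalize(value, method, min_v, max_v, target=None):
    """Normalize a raw value to [0, 1] using the configured method.

    Sketch of the three modes above. Clamping to [0, 1] for
    out-of-range inputs is an assumption.
    """
    span = max_v - min_v
    if method == "max":          # higher is better
        score = (value - min_v) / span
    elif method == "min":        # lower is better
        score = (max_v - value) / span
    elif method == "target":     # closer to target is better
        score = 1.0 - abs(value - target) / span
    else:
        raise ValueError(f"unknown normalize method: {method}")
    return round(max(0.0, min(1.0, score)), 2)

normalize(150, "max", 120, 180)            # detection range → 0.5
normalize(8.5, "min", 6.2, 12.0)           # unit cost → 0.6
normalize(9.2, "target", 9.0, 10.5, 9.5)   # frequency → 0.8
```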

Phase 3: Weighted Score Calculation

Normalized values are multiplied by attribute weights and summed to produce the final score on a 0-100 scale:

final_score = Σ(normalized_value × weight) × 100

Example: "Mission Priority" profile
  detection_range: weight=0.30, normalized=0.71 → weighted=0.213
  accuracy:        weight=0.25, normalized=0.95 → weighted=0.238
  reliability:     weight=0.10, normalized=0.80 → weighted=0.080
  cost:            weight=0.15, normalized=0.64 → weighted=0.096
  weight_kg:       weight=0.05, normalized=0.60 → weighted=0.030
  power_output:    weight=0.10, normalized=0.80 → weighted=0.080
  frequency_band:  weight=0.05, normalized=0.50 → weighted=0.025
                                    Total = 0.762
                              Final Score = 76.2
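
The same calculation as a runnable sketch. Note the breakdown above rounds each weighted term to three decimals for display (e.g., accuracy contributes exactly 0.2375, shown as 0.238), so the exact total is 0.7615:

```python
# Weighted-sum calculation for the "Mission Priority" example above.
weights = {
    "detection_range": 0.30, "accuracy": 0.25, "reliability": 0.10,
    "cost": 0.15, "weight_kg": 0.05, "power_output": 0.10,
    "frequency_band": 0.05,
}
normalized = {
    "detection_range": 0.71, "accuracy": 0.95, "reliability": 0.80,
    "cost": 0.64, "weight_kg": 0.60, "power_output": 0.80,
    "frequency_band": 0.50,
}

# Weights must sum to 1.0 for the 0-100 scale to be meaningful.
assert abs(sum(weights.values()) - 1.0) < 1e-9

final_score = sum(normalized[a] * w for a, w in weights.items()) * 100
# final_score ≈ 76.2 (exactly 76.15; the breakdown rounds each term)
```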

Confidence Adjustment

When extraction confidence falls below 70%, the scoring engine applies a graduated penalty to account for data quality uncertainty:

if confidence < 0.70:
    penalty = 1.0 - (0.1 × (1.0 - confidence))
    normalized = normalized × penalty

Example:
  confidence = 0.50 → penalty = 0.95 → 5% reduction
  confidence = 0.30 → penalty = 0.93 → 7% reduction
  confidence = 0.60 → penalty = 0.96 → 4% reduction

Note: The maximum penalty is capped at 10% to prevent over-penalizing uncertain but potentially valuable data.
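
The penalty rule can be expressed directly from the pseudocode above (a minimal sketch; rounding to four decimals is an assumption):

```python
def apply_confidence_penalty(normalized: float, confidence: float) -> float:
    """Apply the graduated low-confidence penalty described above.

    The penalty scales linearly with (1 - confidence) and is capped
    at 10%, which is reached only at confidence = 0.
    """
    if confidence < 0.70:
        penalty = 1.0 - 0.1 * (1.0 - confidence)
        normalized *= penalty
    return round(normalized, 4)

apply_confidence_penalty(0.80, 0.50)  # 0.80 × 0.95 → 0.76
apply_confidence_penalty(0.80, 0.90)  # no penalty → 0.8
```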

Scoring Profiles

Scoring profiles define the "recipe" for comparison. Each profile specifies:

| Property | Type | Description |
|---|---|---|
| name | string | Profile identifier (e.g., "Mission Priority") |
| method | enum | Scoring algorithm: weighted_sum, topsis, ahp, custom |
| attributes | array | List of ScoringAttribute configurations |
| is_default | boolean | Whether this is the default profile for new comparisons |

Attribute Configuration

{
  "attribute_name": "detection_range",  // Matches Extraction.label
  "display_name": "Detection Range",    // Human-readable label
  "weight": 0.30,                       // 30% of final score
  "normalize_method": "max",            // Higher is better
  "unit": "km",                         // Display unit
  "priority": 10                        // Sort order (lower = first)
}

Target Value Configuration

{
  "attribute_name": "operating_frequency",
  "display_name": "Operating Frequency",
  "weight": 0.15,
  "normalize_method": "target",
  "target_value": 9.5,                  // Optimal frequency in GHz
  "scale_min": 8.0,                     // Minimum expected
  "scale_max": 12.0,                    // Maximum expected
  "unit": "GHz"
}
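
Before running a comparison, a profile can be sanity-checked against these constraints. The helper below is hypothetical (not a RayRay API); it verifies that weights sum to 1.0 and that target-mode attributes carry a target_value within their scale bounds:

```python
def validate_profile(attributes: list[dict]) -> list[str]:
    """Return a list of problems found in a profile's attribute list.

    Hypothetical validation sketch using the field names shown in the
    configuration examples above.
    """
    problems = []
    total = sum(a.get("weight", 0.0) for a in attributes)
    if abs(total - 1.0) > 1e-6:
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    for a in attributes:
        if a.get("normalize_method") == "target":
            t = a.get("target_value")
            lo, hi = a.get("scale_min"), a.get("scale_max")
            if t is None:
                problems.append(f"{a['attribute_name']}: target mode requires target_value")
            elif lo is not None and hi is not None and not (lo <= t <= hi):
                problems.append(f"{a['attribute_name']}: target_value outside scale bounds")
    return problems
```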

Comparison Workflow

Step 1: Select Documents

Choose 2-4 documents with approved extractions. Only extractions with review_status: "approved" are included in scoring calculations.

Step 2: Select Scoring Profile

Choose a pre-configured profile or create a custom one. The profile's attribute list determines which extraction labels are compared.

Step 3: Run Comparison

The scoring engine calculates normalized scores for each attribute, applies weights, and produces ranked results.

Step 4: Review Results

Results include final scores, rankings, and full score breakdowns showing how each attribute contributed to the final score.

Delta Analysis (Version Comparison)

Track changes across document versions by comparing extractions from different revisions of the same source material:

| Delta Type | Trigger | Significance |
|---|---|---|
| Value Change | Numeric value differs by more than 5% | May indicate specification update |
| New Attribute | Attribute present in new version only | Added capability or data |
| Removed Attribute | Attribute present in old version only | Deprecated or removed capability |
| Confidence Change | Confidence score changed significantly | Data quality improvement or degradation |

Delta comparisons are useful for:

  • Tracking specification changes across contract amendments
  • Validating data extraction consistency
  • Identifying document version drift
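
The delta classification can be sketched as a function over two versions' extracted values (a hypothetical illustration; the 5% value-change threshold matches the trigger described above):

```python
def compute_deltas(old: dict, new: dict, threshold: float = 0.05) -> list[dict]:
    """Classify changes between two document versions' extracted values.

    `old` and `new` map attribute names to numeric values. A sketch of
    the delta types above (confidence-change detection omitted).
    """
    deltas = []
    for name in old.keys() - new.keys():
        deltas.append({"attribute": name, "type": "removed_attribute"})
    for name in new.keys() - old.keys():
        deltas.append({"attribute": name, "type": "new_attribute"})
    for name in old.keys() & new.keys():
        if old[name] and abs(new[name] - old[name]) / abs(old[name]) > threshold:
            deltas.append({"attribute": name, "type": "value_change",
                           "old": old[name], "new": new[name]})
    return deltas
```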

Radar Visualization

Comparison results can be visualized as radar (spider) charts, providing an intuitive view of system capabilities across multiple dimensions:

  • Axes — One axis per scoring attribute
  • Scale — 0-100 normalized scale for all axes
  • Overlay — Multiple systems displayed simultaneously
  • Area — Larger area indicates better overall performance

Radar visualizations are particularly effective for:

  • Quick visual comparison of 2-3 systems
  • Identifying strengths and weaknesses at a glance
  • Briefing presentations for decision makers
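
The "larger area is better" heuristic can be made concrete: with N attribute scores plotted as radii at equal angles, the radar polygon decomposes into N triangles, each with area ½·r₁·r₂·sin θ. A pure-math sketch (not part of the product API):

```python
import math

def radar_area(scores: list[float]) -> float:
    """Area of the radar polygon formed by scores plotted at equal angles.

    Each score is a radius on the 0-100 scale; adjacent radii r1 and r2
    separated by angle theta contribute a triangle of area
    0.5 * r1 * r2 * sin(theta).
    """
    n = len(scores)
    theta = 2 * math.pi / n
    return sum(0.5 * scores[i] * scores[(i + 1) % n] * math.sin(theta)
               for i in range(n))

# A system scoring 100 on all 4 axes: 4 × ½ × 100 × 100 × sin(90°) = 20000
radar_area([100, 100, 100, 100])
```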

API Endpoints

Create Comparison

POST /api/comparisons/
Content-Type: application/json
Authorization: Bearer {token}

{
  "name": "Q1 Radar Comparison",
  "documentIds": [1, 2, 3],
  "profileId": 1,
  "createdBy": "analyst@example.com"
}

Response: 200 OK
{
  "id": 42,
  "name": "Q1 Radar Comparison",
  "profileId": 1,
  "status": "calculated",
  "results": [
    {
      "documentId": 3,
      "documentName": "EL_M_2084_Datasheet.pdf",
      "finalScore": 82.5,
      "rank": 1,
      "scoreBreakdown": {...},
      "rawValues": {...}
    },
    ...
  ]
}

List Comparisons

GET /api/comparisons/?status=calculated&limit=50
Authorization: Bearer {token}

Response: 200 OK
[
  {
    "id": 42,
    "name": "Q1 Radar Comparison",
    "profileId": 1,
    "documentIds": [1, 2, 3],
    "status": "calculated",
    "createdAt": "2026-03-12T10:00:00Z"
  },
  ...
]

Get Comparison Details

GET /api/comparisons/{comparison_id}
Authorization: Bearer {token}

Response: 200 OK
{
  "id": 42,
  "name": "Q1 Radar Comparison",
  "profileSnapshot": {
    "name": "Mission Priority",
    "method": "weighted_sum",
    "attributes": [...]
  },
  "results": [...]
}

Recalculate Comparison

POST /api/comparisons/{comparison_id}/recalculate
Authorization: Bearer {token}

Response: 200 OK
{
  "id": 42,
  "status": "calculated",
  "results": [...]  // Updated results
}

Delete Comparison

DELETE /api/comparisons/{comparison_id}
Authorization: Bearer {token}

Response: 200 OK
{
  "message": "Comparison deleted",
  "id": 42
}

Exporting Comparison Results

JSON Export

GET /api/comparisons/{comparison_id}/export?format=json
Authorization: Bearer {token}

Response: application/json
{
  "comparison": {
    "id": 42,
    "name": "Q1 Radar Comparison",
    "profileSnapshot": {...}
  },
  "results": [
    {
      "rank": 1,
      "documentName": "EL_M_2084_Datasheet.pdf",
      "finalScore": 82.5,
      "scoreBreakdown": {...},
      "rawValues": {...}
    },
    ...
  ]
}

CSV Export

GET /api/comparisons/{comparison_id}/export?format=csv
Authorization: Bearer {token}

Response: text/csv
Rank,Document,Final Score,Detection Range,Accuracy,Reliability,...
1,EL_M_2084_Datasheet.pdf,82.5,180 km,92%,10000 hrs,...
2,AN_TPQ-53_SpecSheet.pdf,76.2,150 km,95%,8000 hrs,...
3,GM200_MMIA.pdf,71.8,135 km,90%,7500 hrs,...

PowerPoint Briefing Export

GET /api/comparisons/{comparison_id}/export/pptx
Authorization: Bearer {token}

Response: application/vnd.openxmlformats-officedocument.presentationml.presentation

Slides included:
1. Title slide (comparison name, date, author)
2. Rankings summary with bar chart
3. Per-system score breakdown (one slide each)
4. Side-by-side specifications table
5. Methodology / audit trail slide

Time Savings Estimation

GET /api/comparisons/{comparison_id}/time-savings?sessions_per_year=50
Authorization: Bearer {token}

Response: 200 OK
{
  "comparisonId": 42,
  "documentCount": 3,
  "attributeCount": 7,
  "manualMinutes": 240,
  "aiMinutes": 45,
  "savedMinutes": 195,
  "savedPercent": 81.25,
  "annualSavedHours": 162.5,
  "annualSavedFteDays": 20.3,
  "m2521Compliant": true
}
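
The response fields follow directly from the inputs. A worked sketch of the arithmetic (8-hour FTE days assumed):

```python
# Reproduce the time-savings arithmetic from the response above.
manual_minutes, ai_minutes = 240, 45
sessions_per_year = 50

saved_minutes = manual_minutes - ai_minutes                   # 195
saved_percent = 100 * saved_minutes / manual_minutes          # 81.25
annual_saved_hours = saved_minutes * sessions_per_year / 60   # 162.5
annual_saved_fte_days = round(annual_saved_hours / 8, 1)      # 20.3
```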

Role-Based Permissions

| Role | Create | View | Recalculate | Delete | Export |
|---|---|---|---|---|---|
| Observer | | | | | |
| Analyst | | | | | |
| Reviewer | | | | | |
| Admin | | | | | |

Audit Trail

Every comparison operation is logged with:

  • Profile Snapshot — Full scoring configuration preserved at comparison time
  • Raw Values — Original extracted values stored for verification
  • Score Breakdown — Step-by-step calculation transparency
  • User Attribution — Who created/recalculated the comparison
  • Timestamp — When each operation occurred

This audit trail ensures comparisons can be defended and reproduced, supporting M-25-21 requirements for AI-assisted decision documentation.

Best Practices

Profile Design

  • Ensure weights sum to 1.0 (100%) for meaningful final scores
  • Use TARGET normalization when optimal values are known (e.g., frequency bands)
  • Set scale_min/scale_max to bound normalization when data ranges are predictable
  • Document the rationale for weight distributions in profile descriptions

Data Quality

  • Only compare documents with approved extractions
  • Review low-confidence extractions before including in comparisons
  • Flag missing attributes for manual investigation
  • Use delta analysis to validate extraction consistency

Decision Documentation

  • Export comparison results before briefings for offline reference
  • Include profile snapshot in decision documentation
  • Archive comparisons rather than deleting for audit purposes
  • Use time savings metrics to demonstrate ROI to leadership

Related