Skip to content

Architecture

Overview

Docling Studio architecture{ width="700" }

Two services communicating via REST. The frontend is a Vue 3 SPA served by Nginx in production. The backend is a FastAPI app that wraps Docling's document conversion engine.

Zooming into the backend

The schema above shows the macro view. Inside the backend, the code follows a Hexagonal Architecture (ports & adapters) with strict layer boundaries:

┌──────────────────────────────────────────────────────┐
│                     Backend                           │
│                                                      │
│   ┌──────────┐                                       │
│   │   api/   │  ← HTTP (FastAPI routes, Pydantic)    │
│   └────┬─────┘                                       │
│        │ calls                                       │
│   ┌────▼─────┐                                       │
│   │services/ │  ← Use case orchestration             │
│   └──┬────┬──┘                                       │
│      │    │                                          │
│  ┌───▼──┐ ┌▼───────────┐                             │
│  │domain│ │persistence/ │                             │
│  │      │ │             │                             │
│  │bbox  │ │ SQLite CRUD │  ← Storage (your blue box) │
│  │parse │ │ file store  │                             │
│  └──────┘ └─────────────┘                             │
│  ↑ pure Python, no deps   ↑ aiosqlite               │
└──────────────────────────────────────────────────────┘

Dependencies flow inward: api → services → domain. The domain layer has zero knowledge of HTTP or database.

Backend — Hexagonal Architecture (ports & adapters)

The backend follows the hexagonal / ports-and-adapters pattern. The domain layer defines ports (abstract protocols in domain/ports.py); infra/ provides adapters that implement them. Dependencies flow inward: API → Services → Domain. The domain layer has zero knowledge of HTTP, database, or any framework.

document-parser/
├── main.py                   # FastAPI app, CORS, lifespan, health endpoint
├── domain/                   # Pure domain — no HTTP, no DB
│   ├── models.py             # Document, AnalysisJob dataclasses
│   ├── ports.py              # Abstract protocols (DocumentConverter, DocumentChunker)
│   ├── value_objects.py      # ConversionResult, ChunkingOptions, ChunkResult
│   └── bbox.py               # Bounding box coordinate normalization
├── api/                      # HTTP layer (FastAPI routers)
│   ├── schemas.py            # Pydantic DTOs (camelCase serialization)
│   ├── documents.py          # /api/documents endpoints
│   └── analyses.py           # /api/analyses endpoints (create, rechunk, delete)
├── persistence/              # Data layer (SQLite via aiosqlite)
│   ├── database.py           # Connection management, schema init
│   ├── document_repo.py      # Document CRUD
│   └── analysis_repo.py      # AnalysisJob CRUD
├── infra/                    # Infrastructure adapters
│   ├── settings.py           # Environment-based configuration
│   ├── local_converter.py    # In-process Docling converter (local mode)
│   ├── serve_converter.py    # HTTP client for Docling Serve (remote mode)
│   ├── local_chunker.py      # In-process chunking (HierarchicalChunker, HybridChunker)
│   ├── rate_limiter.py       # Sliding-window rate limiting middleware
│   └── bbox.py               # Bbox coordinate normalization helpers
├── services/                 # Use case orchestration
│   ├── document_service.py   # Upload, delete, preview
│   └── analysis_service.py   # Async Docling processing + chunking
└── tests/                    # pytest (199 tests)

Layer responsibilities

Layer Role Depends on
domain Dataclasses, value objects, abstract ports Nothing (pure Python)
persistence SQLite CRUD, aiosqlite domain (models)
infra Adapters: converters, chunker, rate limiter, settings domain (ports, value objects)
services Orchestrate use cases, call converters/chunkers domain + persistence + infra
api HTTP endpoints, Pydantic DTOs, error handling services

API contract

The API uses camelCase serialization (via Pydantic alias_generator), while the backend uses snake_case internally. The pages_json field contains raw dataclasses.asdict() output, so page data uses snake_case (page_number, not pageNumber).

API design rule — no UX-shaped routes (#269)

The backend exposes DDD-granular services, not screens. One route ≈ one domain operation: chunk CRUD lives under /api/documents/{id}/chunks/*, store CRUD under /api/stores/*, document versions under /api/documents/{id}/versions/*, and so on.

If a UI screen needs several calls to render or submit, the sequencing is done client-side in a Pinia store action (features/*/store.ts). The backend never grows a "screen-shaped" aggregate route just because it would shorten one frontend function.

Exceptions are limited to atomicity / transactional guarantees the client cannot achieve by chaining calls — e.g. POST .../chunks/{id}/split writes two new chunks + an audit row atomically; doing it client-side would race the audit log. When such a bundled operation is added, the service docstring must spell out the atomicity argument.

Reviewers: any new route under /api/* that doesn't fit one of these two buckets (single domain op, or named atomicity exception) is a red flag. See docs/design/269-backend-ddd-audit.md for the full classification of every route as of 0.6.1.

Frontend — Feature-Based

The frontend is organized by feature, each with its own store, API client, and UI components.

frontend/src/
├── app/                      # App shell, router, global styles
├── pages/                    # Route-level pages
│   ├── HomePage.vue
│   ├── StudioPage.vue        # PDF viewer + config + results
│   ├── DocumentsPage.vue
│   ├── HistoryPage.vue
│   └── SettingsPage.vue
├── features/                 # Feature modules
│   ├── analysis/             # Analysis store, API, bbox scaling, UI
│   │   ├── store.ts
│   │   ├── api.ts
│   │   ├── bboxScaling.ts    # Pure math: page coords → pixel coords
│   │   └── ui/
│   │       ├── BboxOverlay.vue
│   │       ├── AnalysisPanel.vue
│   │       ├── StructureViewer.vue
│   │       └── ...
│   ├── chunking/             # Chunk panel UI + rechunk action
│   ├── document/             # Document store, API, upload
│   ├── feature-flags/        # Feature flag store (reads /api/health)
│   ├── history/              # History store, navigation
│   └── settings/             # Theme, locale, API URL
└── shared/                   # Cross-feature utilities
    ├── types.ts              # All shared TypeScript interfaces
    ├── i18n.ts               # FR/EN translations
    ├── format.ts             # Date/size formatters
    └── api/http.ts           # HTTP client (fetch wrapper)

Data flow

User action → Pinia store action → API client (fetch) → Backend REST endpoint
Backend response → Pinia store state → Vue reactivity → UI update

Key design decisions

  • Pinia stores per feature, not global. Each feature owns its state.
  • TypeScript strict mode with shared interfaces in shared/types.ts.
  • No component library — custom CSS with CSS variables for theming.
  • vue-tsc in CI to catch type errors before merge.

Feature Flags

The frontend adapts its UI based on the backend's capabilities. On startup, the feature flag store fetches /api/health and reads the engine and deploymentMode fields.

Flag Condition Effect
chunking engine === 'local' Shows chunking options in the analysis panel
disclaimer deploymentMode === 'huggingface' Shows a disclaimer banner at the top of the app

This allows the same frontend build to work with both local and remote backends without conditional compilation.

Rate Limiting

The backend applies a sliding-window rate limiter as middleware:

  • 60 requests per 60 seconds per client IP
  • The /api/health endpoint is excluded
  • When the limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header

Analysis Lifecycle

An analysis job follows this state machine:

PENDING → RUNNING → COMPLETED
                  → FAILED
Status Description
PENDING Job created, waiting for a processing slot
RUNNING Docling conversion in progress
COMPLETED Conversion finished — results available (markdown, HTML, pages, chunks)
FAILED Conversion error — error_message contains details

The backend limits parallel jobs via MAX_CONCURRENT_ANALYSES (default: 3) to avoid overloading the CPU during Docling processing.

Local vs Remote Mode

The backend supports two conversion engines, selected via the CONVERSION_ENGINE environment variable:

Local Remote
Engine In-process Docling (PyTorch) HTTP client to Docling Serve
Chunking Available (in-process) Not available
Docker image latest-local (~1.9 GB) latest-remote (~270 MB)
ML models Downloaded on first run (~400 MB) Managed by Docling Serve
CPU/RAM 4+ CPUs, 6+ GB RAM 2 CPUs, 2 GB RAM

The converter is selected at startup in main.py via _build_converter(). The chunker (_build_chunker()) is only instantiated in local mode — in remote mode, the chunking feature flag is disabled and the UI hides the chunking panel.

Health Endpoint

GET /api/health returns the backend status:

{
  "status": "ok",
  "engine": "local",
  "version": "0.3.0",
  "deploymentMode": "self-hosted"
}

The frontend uses this response to:

  1. Verify the backend is reachable
  2. Evaluate feature flags (chunking, disclaimer)
  3. Display the app version