Architecture¶
Overview¶
{ width="700" }
Two services communicating via REST. The frontend is a Vue 3 SPA served by Nginx in production. The backend is a FastAPI app that wraps Docling's document conversion engine.
Zooming into the backend¶
The schema above shows the macro view. Inside the backend, the code follows a Hexagonal Architecture (ports & adapters) with strict layer boundaries:
┌──────────────────────────────────────────────────────┐
│ Backend │
│ │
│ ┌──────────┐ │
│ │ api/ │ ← HTTP (FastAPI routes, Pydantic) │
│ └────┬─────┘ │
│ │ calls │
│ ┌────▼─────┐ │
│ │services/ │ ← Use case orchestration │
│ └──┬────┬──┘ │
│ │ │ │
│ ┌───▼──┐ ┌▼───────────┐ │
│ │domain│ │persistence/ │ │
│ │ │ │ │ │
│ │bbox │ │ SQLite CRUD │ ← Storage (your blue box) │
│ │parse │ │ file store │ │
│ └──────┘ └─────────────┘ │
│ ↑ pure Python, no deps ↑ aiosqlite │
└──────────────────────────────────────────────────────┘
Dependencies flow inward: api → services → domain. The domain layer has zero knowledge of HTTP or database.
Backend — Hexagonal Architecture (ports & adapters)¶
The backend follows the hexagonal / ports-and-adapters pattern. The domain layer defines ports (abstract protocols in domain/ports.py); infra/ provides adapters that implement them. Dependencies flow inward: API → Services → Domain. The domain layer has zero knowledge of HTTP, database, or any framework.
document-parser/
├── main.py # FastAPI app, CORS, lifespan, health endpoint
│
├── domain/ # Pure domain — no HTTP, no DB
│ ├── models.py # Document, AnalysisJob dataclasses
│ ├── ports.py # Abstract protocols (DocumentConverter, DocumentChunker)
│ ├── value_objects.py # ConversionResult, ChunkingOptions, ChunkResult
│ └── bbox.py # Bounding box coordinate normalization
│
├── api/ # HTTP layer (FastAPI routers)
│ ├── schemas.py # Pydantic DTOs (camelCase serialization)
│ ├── documents.py # /api/documents endpoints
│ └── analyses.py # /api/analyses endpoints (create, rechunk, delete)
│
├── persistence/ # Data layer (SQLite via aiosqlite)
│ ├── database.py # Connection management, schema init
│ ├── document_repo.py # Document CRUD
│ └── analysis_repo.py # AnalysisJob CRUD
│
├── infra/ # Infrastructure adapters
│ ├── settings.py # Environment-based configuration
│ ├── local_converter.py # In-process Docling converter (local mode)
│ ├── serve_converter.py # HTTP client for Docling Serve (remote mode)
│ ├── local_chunker.py # In-process chunking (HierarchicalChunker, HybridChunker)
│ ├── rate_limiter.py # Sliding-window rate limiting middleware
│ └── bbox.py # Bbox coordinate normalization helpers
│
├── services/ # Use case orchestration
│ ├── document_service.py # Upload, delete, preview
│ └── analysis_service.py # Async Docling processing + chunking
│
└── tests/ # pytest (199 tests)
Layer responsibilities¶
| Layer | Role | Depends on |
|---|---|---|
| domain | Dataclasses, value objects, abstract ports | Nothing (pure Python) |
| persistence | SQLite CRUD, aiosqlite | domain (models) |
| infra | Adapters: converters, chunker, rate limiter, settings | domain (ports, value objects) |
| services | Orchestrate use cases, call converters/chunkers | domain + persistence + infra |
| api | HTTP endpoints, Pydantic DTOs, error handling | services |
API contract¶
The API uses camelCase serialization (via Pydantic alias_generator), while the backend uses snake_case internally. The pages_json field contains raw dataclasses.asdict() output, so page data uses snake_case (page_number, not pageNumber).
API design rule — no UX-shaped routes (#269)¶
The backend exposes DDD-granular services, not screens. One route ≈ one domain operation: chunk CRUD lives under /api/documents/{id}/chunks/*, store CRUD under /api/stores/*, document versions under /api/documents/{id}/versions/*, and so on.
If a UI screen needs several calls to render or submit, the sequencing is done client-side in a Pinia store action (features/*/store.ts). The backend never grows a "screen-shaped" aggregate route just because it would shorten one frontend function.
Exceptions are limited to atomicity / transactional guarantees the client cannot achieve by chaining calls — e.g. POST .../chunks/{id}/split writes two new chunks + an audit row atomically; doing it client-side would race the audit log. When such a bundled operation is added, the service docstring must spell out the atomicity argument.
Reviewers: any new route under /api/* that doesn't fit one of these two buckets (single domain op, or named atomicity exception) is a red flag. See docs/design/269-backend-ddd-audit.md for the full classification of every route as of 0.6.1.
Frontend — Feature-Based¶
The frontend is organized by feature, each with its own store, API client, and UI components.
frontend/src/
├── app/ # App shell, router, global styles
├── pages/ # Route-level pages
│ ├── HomePage.vue
│ ├── StudioPage.vue # PDF viewer + config + results
│ ├── DocumentsPage.vue
│ ├── HistoryPage.vue
│ └── SettingsPage.vue
│
├── features/ # Feature modules
│ ├── analysis/ # Analysis store, API, bbox scaling, UI
│ │ ├── store.ts
│ │ ├── api.ts
│ │ ├── bboxScaling.ts # Pure math: page coords → pixel coords
│ │ └── ui/
│ │ ├── BboxOverlay.vue
│ │ ├── AnalysisPanel.vue
│ │ ├── StructureViewer.vue
│ │ └── ...
│ ├── chunking/ # Chunk panel UI + rechunk action
│ ├── document/ # Document store, API, upload
│ ├── feature-flags/ # Feature flag store (reads /api/health)
│ ├── history/ # History store, navigation
│ └── settings/ # Theme, locale, API URL
│
└── shared/ # Cross-feature utilities
├── types.ts # All shared TypeScript interfaces
├── i18n.ts # FR/EN translations
├── format.ts # Date/size formatters
└── api/http.ts # HTTP client (fetch wrapper)
Data flow¶
User action → Pinia store action → API client (fetch) → Backend REST endpoint
│
Backend response → Pinia store state → Vue reactivity → UI update
Key design decisions¶
- Pinia stores per feature, not global. Each feature owns its state.
- TypeScript strict mode with shared interfaces in
shared/types.ts. - No component library — custom CSS with CSS variables for theming.
- vue-tsc in CI to catch type errors before merge.
Feature Flags¶
The frontend adapts its UI based on the backend's capabilities. On startup, the feature flag store fetches /api/health and reads the engine and deploymentMode fields.
| Flag | Condition | Effect |
|---|---|---|
chunking |
engine === 'local' |
Shows chunking options in the analysis panel |
disclaimer |
deploymentMode === 'huggingface' |
Shows a disclaimer banner at the top of the app |
This allows the same frontend build to work with both local and remote backends without conditional compilation.
Rate Limiting¶
The backend applies a sliding-window rate limiter as middleware:
- 60 requests per 60 seconds per client IP
- The
/api/healthendpoint is excluded - When the limit is exceeded, the API returns
429 Too Many Requestswith aRetry-Afterheader
Analysis Lifecycle¶
An analysis job follows this state machine:
| Status | Description |
|---|---|
PENDING |
Job created, waiting for a processing slot |
RUNNING |
Docling conversion in progress |
COMPLETED |
Conversion finished — results available (markdown, HTML, pages, chunks) |
FAILED |
Conversion error — error_message contains details |
The backend limits parallel jobs via MAX_CONCURRENT_ANALYSES (default: 3) to avoid overloading the CPU during Docling processing.
Local vs Remote Mode¶
The backend supports two conversion engines, selected via the CONVERSION_ENGINE environment variable:
| Local | Remote | |
|---|---|---|
| Engine | In-process Docling (PyTorch) | HTTP client to Docling Serve |
| Chunking | Available (in-process) | Not available |
| Docker image | latest-local (~1.9 GB) |
latest-remote (~270 MB) |
| ML models | Downloaded on first run (~400 MB) | Managed by Docling Serve |
| CPU/RAM | 4+ CPUs, 6+ GB RAM | 2 CPUs, 2 GB RAM |
The converter is selected at startup in main.py via _build_converter(). The chunker (_build_chunker()) is only instantiated in local mode — in remote mode, the chunking feature flag is disabled and the UI hides the chunking panel.
Health Endpoint¶
GET /api/health returns the backend status:
The frontend uses this response to:
- Verify the backend is reachable
- Evaluate feature flags (chunking, disclaimer)
- Display the app version