Getting Started

Docker Compose (recommended)

git clone https://github.com/scub-france/Docling-Studio.git
cd Docling-Studio
docker compose up --build

cd document-parser
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

cd frontend
npm install
npm run dev

With both processes running, the frontend is served at http://localhost:3000 and proxies API calls to the backend at http://localhost:8000.
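To confirm that both development servers are listening, a small smoke check can probe the two URLs above. This is a minimal sketch using only the Python standard library. The helper name `is_up` is hypothetical and not part of Docling Studio, and it treats any HTTP response, including an error status, as proof that the server is running.

```python
import urllib.error
import urllib.request

# URLs taken from the text above; adjust them if you changed the ports.
FRONTEND_URL = "http://localhost:3000"
BACKEND_URL = "http://localhost:8000"

# Bypass any system proxy settings: both targets are local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url` (hypothetical helper)."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is running.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError subclasses OSError).
        return False


if __name__ == "__main__":
    for name, url in (("frontend", FRONTEND_URL), ("backend", BACKEND_URL)):
        print(f"{name}: {'up' if is_up(url) else 'down'} ({url})")
```

If the backend is reported as down, check that the `uvicorn` process is still running on port 8000 before debugging the frontend proxy.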

Backend

cd document-parser
pip install pytest pytest-asyncio httpx
pytest tests/ -v

Frontend

cd frontend
npm run test:run

The following pipeline options map directly to Docling's `PdfPipelineOptions`.

| Option | Default | Description |
| --- | --- | --- |
| `do_ocr` | `true` | OCR for scanned pages and embedded images |
| `do_table_structure` | `true` | Table detection and row/column reconstruction |
| `table_mode` | `accurate` | `accurate` (TableFormer) or `fast` |
| `do_code_enrichment` | `false` | Specialized OCR for code blocks |
| `do_formula_enrichment` | `false` | Math formula recognition (LaTeX output) |
| `do_picture_classification` | `false` | Classify images by type |
| `do_picture_description` | `false` | Generate image descriptions via VLM |
| `generate_picture_images` | `false` | Extract detected images as separate files |
| `generate_page_images` | `false` | Rasterize each page as an image |
| `images_scale` | `1.0` | Scale factor for generated images (0.1–10) |
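The option table can be mirrored in code as a small validated container. The sketch below is illustrative only: the `PipelineOptions` class is hypothetical, not Docling Studio's actual model, and it uses only the standard library. Field names, defaults, and the allowed ranges come from the table. How Docling Studio forwards these values to Docling (for example, where `table_mode` ends up inside `PdfPipelineOptions`) is not shown here.

```python
from dataclasses import asdict, dataclass

ALLOWED_TABLE_MODES = ("accurate", "fast")


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical container mirroring the option table (documented defaults)."""

    do_ocr: bool = True
    do_table_structure: bool = True
    table_mode: str = "accurate"
    do_code_enrichment: bool = False
    do_formula_enrichment: bool = False
    do_picture_classification: bool = False
    do_picture_description: bool = False
    generate_picture_images: bool = False
    generate_page_images: bool = False
    images_scale: float = 1.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the table.
        if self.table_mode not in ALLOWED_TABLE_MODES:
            raise ValueError(f"table_mode must be one of {ALLOWED_TABLE_MODES}")
        if not 0.1 <= self.images_scale <= 10:
            raise ValueError("images_scale must be between 0.1 and 10")


if __name__ == "__main__":
    # Example: faster table extraction with page images rendered at 2x.
    opts = PipelineOptions(table_mode="fast", generate_page_images=True, images_scale=2.0)
    print(asdict(opts))
```

Validating early keeps an out-of-range `images_scale` or a misspelled `table_mode` from reaching a long-running conversion.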

All configuration is done via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `CORS_ORIGINS` | `http://localhost:3000,...` | CORS allowed origins |
| `UPLOAD_DIR` | `./uploads` | File storage directory |
| `DB_PATH` | `./data/docling_studio.db` | SQLite database path |
| `CONVERSION_TIMEOUT` | `600` | Max seconds per Docling conversion |
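As an illustration of how these variables could be consumed, the sketch below reads them with the documented defaults. It is not the project's actual settings code: `Settings` and `load_settings` are hypothetical names, and the assumption that `CORS_ORIGINS` is a comma-separated list is inferred from its default. The full default origin list is elided in the table, so only its first entry is used here.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; not Docling Studio's real one."""

    cors_origins: list[str]
    upload_dir: Path
    db_path: Path
    conversion_timeout: int  # seconds


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    # Only the first documented default origin is known; the rest is elided.
    raw_origins = environ.get("CORS_ORIGINS", "http://localhost:3000")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    timeout = int(environ.get("CONVERSION_TIMEOUT", "600"))
    if timeout <= 0:
        raise ValueError("CONVERSION_TIMEOUT must be a positive number of seconds")

    return Settings(
        cors_origins=origins,
        upload_dir=Path(environ.get("UPLOAD_DIR", "./uploads")),
        db_path=Path(environ.get("DB_PATH", "./data/docling_studio.db")),
        conversion_timeout=timeout,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping makes the loader easy to exercise with a plain dictionary, without touching the real process environment.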

| Resource | Minimum | Recommended |
| --- | --- | --- |
| Memory | 6 GB | 8 GB+ |
| CPUs | 4 | 8+ |

All Docker images are multi-arch (linux/amd64 + linux/arm64). No GPU required.
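A quick preflight check against the resource table might look like the sketch below. The function names are hypothetical and the thresholds come from the table. Memory detection relies on `os.sysconf`, which is not available on every platform, and the values describe the host rather than any limit applied to the Docker engine or to an individual container, so treat the result as a rough hint.

```python
import os
from typing import Optional

# Thresholds from the resource table above.
MIN_CPUS, MIN_MEMORY_GB = 4, 6
REC_CPUS, REC_MEMORY_GB = 8, 8


def classify(cpus: int, memory_gb: float) -> str:
    """Return 'recommended', 'minimum', or 'insufficient' (hypothetical helper)."""
    if cpus >= REC_CPUS and memory_gb >= REC_MEMORY_GB:
        return "recommended"
    if cpus >= MIN_CPUS and memory_gb >= MIN_MEMORY_GB:
        return "minimum"
    return "insufficient"


def detect_memory_gb() -> Optional[float]:
    """Best-effort physical memory in GiB; None where os.sysconf is unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    cpus = os.cpu_count() or 0
    memory_gb = detect_memory_gb()
    if memory_gb is None:
        print(f"CPUs: {cpus}; memory could not be detected on this platform")
    else:
        print(f"CPUs: {cpus}, memory: {memory_gb:.1f} GiB -> {classify(cpus, memory_gb)}")
```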