Deployment and Optimization of DeepSeek-OCR
High-Performance GPU Inference Setup | CUDA 11.8 | vLLM 0.8.5
DeepSeek-OCR is an AI model that reads and understands text in images and documents with high accuracy. Running on a GPU with vLLM, it can process scanned PDFs, invoices, and similar documents quickly.
Minimum Requirements for DeepSeek-OCR
Hardware Requirements
Component | Minimum | Recommended
GPU | NVIDIA A30 (16–24GB VRAM) | A100 / RTX 6000 Ada (40–80GB VRAM)
CUDA Compute Capability | ≥ 7.0 | ≥ 8.0
VRAM | 16GB | 96GB+
System RAM | 64GB | 128GB
Storage | 40GB free | 60GB+
Software Requirements
Ubuntu 22.04 / 24.04
NVIDIA Driver ≥ 520
CUDA Toolkit 11.8
Python 3.10
PyTorch 2.6.0 (CU118)
vLLM 0.8.5 (cu118 build)
Conda environment (Miniconda)
Verify GPU
Checks if your GPU is properly detected by the OS and NVIDIA drivers.

If GPU details appear, continue. If not, install the NVIDIA drivers first.
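The same check can be scripted. The Python sketch below is a minimal illustration, assuming nvidia-smi is on PATH and the driver is recent enough to accept the compute_cap query field (older drivers may not support it). The helper names are hypothetical; the 7.0 threshold is the minimum compute capability from the requirements table.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; compute_cap needs a reasonably recent driver (assumption).
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,compute_cap",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as 'NVIDIA A30, 24576, 8.0' (memory.total is in MiB)."""
    name, mem_mib, cap = (field.strip() for field in line.split(","))
    major, minor = (int(part) for part in cap.split("."))
    return {"name": name, "memory_gib": int(mem_mib) / 1024,
            "compute_capability": (major, minor)}


def meets_minimum(gpu: dict, min_cap=(7, 0)) -> bool:
    """True if the GPU reaches the minimum compute capability (tuples compare element-wise)."""
    return gpu["compute_capability"] >= min_cap


def query_gpus() -> list:
    """Run nvidia-smi and parse every reported GPU."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found: install the NVIDIA driver first")
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            status = "OK" if meets_minimum(gpu) else "below minimum"
            print(f"{gpu['name']}: {gpu['memory_gib']:.1f} GiB, "
                  f"compute capability {gpu['compute_capability']}, {status}")
    except (RuntimeError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"GPU check failed: {exc}")
```

The sketch deliberately checks only compute capability and just reports memory, because nvidia-smi can report slightly less total memory than a card's nominal size.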
Install Miniconda
Download and install Miniconda, which provides Conda for isolating DeepSeek-OCR's dependencies.
Downloads the Miniconda installer and saves it as miniconda.sh
Installs Miniconda silently (-b) into your home directory
Activates Conda so that the conda command works in your terminal
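Because the original commands are not reproduced here, the sketch below rebuilds the three steps as a dry run: it prints the shell commands rather than executing them. It assumes a Linux x86_64 or aarch64 machine, the public repo.anaconda.com download location, and an install prefix of ~/miniconda3 passed with -p; adjust these if your setup differs.

```python
import platform
from pathlib import Path

# Public Miniconda download location (installer naming is an assumption to double-check).
BASE_URL = "https://repo.anaconda.com/miniconda"


def installer_url(machine: str) -> str:
    """Linux installer URL for an architecture as reported by platform.machine()."""
    if machine not in ("x86_64", "aarch64"):
        raise ValueError(f"unsupported architecture for this sketch: {machine}")
    return f"{BASE_URL}/Miniconda3-latest-Linux-{machine}.sh"


def install_commands(machine: str, prefix: Path) -> list:
    """Download, silent install (-b) into `prefix`, then activate conda in the current shell."""
    return [
        f"wget {installer_url(machine)} -O miniconda.sh",
        f"bash miniconda.sh -b -p {prefix}",
        f'eval "$({prefix}/bin/conda shell.bash hook)"',
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    try:
        for command in install_commands(platform.machine(), Path.home() / "miniconda3"):
            print(command)
    except ValueError as exc:
        print(exc)
```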
Create Conda Environment
Creates an isolated environment for installing DeepSeek-OCR and its dependencies safely. Run these commands to create (or re-create) it.
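After activating the environment, a quick check confirms you are in the intended interpreter. This sketch assumes the environment is named deepseek-ocr (a hypothetical name; substitute your own) and relies on conda setting CONDA_DEFAULT_ENV to the name of the activated environment.

```python
import os
import sys

EXPECTED_PYTHON = (3, 10)      # from the software requirements list
EXPECTED_ENV = "deepseek-ocr"  # hypothetical environment name


def check_environment(version_info, env_name) -> list:
    """Return a list of problems; an empty list means the interpreter looks right."""
    problems = []
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        problems.append(f"Python {version_info[0]}.{version_info[1]} found, expected 3.10")
    if env_name != EXPECTED_ENV:
        problems.append(f"active conda env is {env_name!r}, expected {EXPECTED_ENV!r}")
    return problems


if __name__ == "__main__":
    issues = check_environment(sys.version_info, os.environ.get("CONDA_DEFAULT_ENV"))
    print("\n".join(issues) if issues else "Environment looks correct.")
```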

Install PyTorch (CUDA 11.8 build)
Installs GPU-enabled PyTorch 2.6.0 that matches CUDA 11.8.
Verify PyTorch GPU access
Checks if PyTorch can detect and use your GPU properly.
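A minimal verification script, assuming the CUDA 11.8 wheel of PyTorch 2.6.0 (whose version string normally carries a +cu118 suffix). It uses only standard torch attributes: torch.__version__, torch.version.cuda, torch.cuda.is_available(), and the device name and capability queries. The helper name check_torch_build is hypothetical.

```python
from typing import Optional


def check_torch_build(version: str, cuda: Optional[str], gpu_available: bool) -> list:
    """Compare torch build info against the versions this guide expects."""
    problems = []
    if not version.startswith("2.6.0"):
        problems.append(f"torch {version} installed, expected 2.6.0")
    if cuda != "11.8":
        problems.append(f"torch built for CUDA {cuda}, expected 11.8")
    if not gpu_available:
        problems.append("torch.cuda.is_available() is False")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        # torch.version.cuda is None for CPU-only builds.
        for issue in check_torch_build(torch.__version__, torch.version.cuda,
                                       torch.cuda.is_available()):
            print(issue)
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0),
                  "capability:", torch.cuda.get_device_capability(0))
```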

Clone DeepSeek-OCR and install requirements
Downloads the DeepSeek-OCR code and installs the required Python libraries.
After Cloning DeepSeek-OCR You Will See These Files
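The file listing itself is not reproduced here. As a rough substitute, this sketch lists the top-level entries of the clone and confirms that requirements.txt is present. The directory name DeepSeek-OCR (git's default for that repository name) and a top-level requirements.txt are assumptions; list_repo is a hypothetical helper.

```python
from pathlib import Path


def list_repo(repo: Path) -> list:
    """Sorted top-level entry names of the clone; fails if requirements.txt is missing."""
    if not (repo / "requirements.txt").is_file():
        raise FileNotFoundError(f"{repo / 'requirements.txt'} not found; is this the cloned repo?")
    return sorted(entry.name for entry in repo.iterdir())


if __name__ == "__main__":
    repo = Path("DeepSeek-OCR")  # assumption: git's default clone directory
    try:
        print("\n".join(list_repo(repo)))
    except FileNotFoundError as exc:
        print(exc)
```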

Install CUDA Toolkit 11.8 (Required for FlashAttention)
Download installer

Install toolkit only (not the driver)
Installs CUDA 11.8 compiler tools required to build FlashAttention.
This will take about 5-10 minutes to download and install.
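The download and install commands are not reproduced above. The dry-run sketch below prints one plausible form of them. The runfile name (CUDA 11.8.0 bundled with driver 520.61.05) and the download URL are assumptions to confirm on NVIDIA's CUDA 11.8 archive page; the --silent and --toolkit runfile options install only the toolkit and leave the existing driver alone.

```python
# Assumed CUDA 11.8.0 runfile (bundled driver 520.61.05); confirm on NVIDIA's download archive.
RUNFILE = "cuda_11.8.0_520.61.05_linux.run"
URL = f"https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/{RUNFILE}"


def toolkit_install_commands() -> list:
    """Download the runfile, then install only the toolkit (the driver is left untouched)."""
    return [
        f"wget {URL}",
        # --silent skips the interactive menu; --toolkit limits installation to the toolkit.
        f"sudo sh {RUNFILE} --silent --toolkit",
    ]


if __name__ == "__main__":
    # Dry run: print the commands, do not execute them.
    print("\n".join(toolkit_install_commands()))
```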
Verify installation
Set environment variables (put CUDA 11.8 FIRST in PATH)
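PATH order matters because the first nvcc found wins. The sketch below models the environment as a dictionary, assuming the toolkit's default location /usr/local/cuda-11.8; it puts the CUDA 11.8 directories first, drops duplicates, and prints equivalent export lines (a Python process cannot change its parent shell's environment, so the lines are printed for you to apply). cuda_env is a hypothetical helper.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.8"  # default runfile install location (assumption)


def cuda_env(base: dict, cuda_home: str = CUDA_HOME) -> dict:
    """Copy of `base` with CUDA 11.8 first on PATH and LD_LIBRARY_PATH, duplicates removed."""
    env = dict(base)
    env["CUDA_HOME"] = cuda_home
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        first = f"{cuda_home}/{subdir}"
        rest = [p for p in env.get(var, "").split(os.pathsep) if p and p != first]
        env[var] = os.pathsep.join([first] + rest)
    return env


if __name__ == "__main__":
    # Print shell lines equivalent to the computed environment.
    env = cuda_env(dict(os.environ))
    for var in ("CUDA_HOME", "PATH", "LD_LIBRARY_PATH"):
        print(f'export {var}="{env[var]}"')
```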
Verify that the correct nvcc is now found by running which nvcc; it should point to the CUDA 11.8 installation.
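To confirm the version rather than just the path, parse the output of nvcc --version. This sketch looks for the "release X.Y" token that nvcc prints and expects (11, 8); nvcc_release is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def nvcc_release(version_text: str) -> tuple:
    """Return (major, minor) from the 'release X.Y' token printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' token found in nvcc output")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc is not on PATH")
    else:
        text = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        try:
            release = nvcc_release(text)
            verdict = "OK" if release == (11, 8) else "wrong version, fix PATH"
            print(f"{nvcc}: release {release[0]}.{release[1]} ({verdict})")
        except ValueError as exc:
            print(exc)
```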

Install GCC-11 (required for CUDA 11.8 build tools)
Set GCC-11 as the compiler for this session
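CUDA 11.8's nvcc accepts host compilers up to GCC major version 11, which is why GCC-11 is needed. This sketch reads the compiler from the CC variable (assuming CC=gcc-11 was exported for this session, falling back to gcc) and checks its major version against that limit. gcc_major is a hypothetical helper.

```python
import os
import re
import shutil
import subprocess

MAX_GCC_MAJOR = 11  # newest GCC major version CUDA 11.8's nvcc supports as host compiler


def gcc_major(version_text: str) -> int:
    """Take the last X.Y.Z on the first line of `gcc --version` (the version on Ubuntu builds)."""
    lines = version_text.splitlines()
    majors = re.findall(r"(\d+)\.\d+\.\d+", lines[0] if lines else "")
    if not majors:
        raise ValueError("cannot find an X.Y.Z version in compiler output")
    return int(majors[-1])


if __name__ == "__main__":
    compiler = os.environ.get("CC", "gcc")  # assumption: CC=gcc-11 set for this session
    path = shutil.which(compiler)
    if path is None:
        print(f"{compiler} not found; install gcc-11 first")
    else:
        text = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
        try:
            major = gcc_major(text)
            print(f"{compiler}: GCC {major}",
                  "OK" if major <= MAX_GCC_MAJOR else "too new for CUDA 11.8")
        except ValueError as exc:
            print(exc)
```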
Install FlashAttention

With CUDA 11.8 and GCC-11 in place, the build should now succeed. Compilation takes about 5-10 minutes.
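Once the build finishes, confirm that the module imports. Assuming a FlashAttention 2.x build, this sketch also compares the GPU against FlashAttention 2's hardware requirement (Ampere or newer, compute capability 8.0+, per the FlashAttention README), which is stricter than the 7.0 minimum in the requirements table; treat it as a sanity check rather than a guarantee. flash_attn_supported is a hypothetical helper.

```python
def flash_attn_supported(capability) -> bool:
    """FlashAttention 2 targets Ampere or newer GPUs, i.e. compute capability 8.0 and up."""
    return tuple(capability) >= (8, 0)


if __name__ == "__main__":
    try:
        import flash_attn
        import torch
    except ImportError as exc:
        # Covers a missing package as well as a build that fails to load (e.g. undefined symbols).
        print(f"import failed: {exc}")
    else:
        print("flash_attn", flash_attn.__version__)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            ok = flash_attn_supported(cap)
            print("GPU capability", cap,
                  "supported" if ok else "below FlashAttention 2 requirement")
        else:
            print("CUDA is not available to PyTorch")
```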
Make CUDA & GCC Paths Permanent in Conda Env
Ensures correct CUDA & GCC versions auto-load every time you activate the environment.
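Conda sources the .sh scripts in $CONDA_PREFIX/etc/conda/activate.d/ whenever the environment is activated, which is the usual way to make such settings persistent. The sketch below writes one such script; the file name cuda_gcc.sh and the exact export lines (CUDA_HOME, PATH, LD_LIBRARY_PATH, CC, CXX) are assumptions mirroring the earlier steps. By default it only prints what it would write; pass --write to create the file.

```python
import os
import sys
from pathlib import Path

SCRIPT_NAME = "cuda_gcc.sh"  # hypothetical file name
# Assumed settings mirroring the CUDA and GCC steps; edit if your paths differ.
ENV_LINES = [
    "export CUDA_HOME=/usr/local/cuda-11.8",
    'export PATH="$CUDA_HOME/bin:$PATH"',
    'export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"',
    "export CC=gcc-11",
    "export CXX=g++-11",
]


def script_path(prefix: Path) -> Path:
    """Location conda sources on activation: <prefix>/etc/conda/activate.d/<name>."""
    return prefix / "etc" / "conda" / "activate.d" / SCRIPT_NAME


def write_activate_script(prefix: Path) -> Path:
    """Create the activate.d directory if needed and write the export lines."""
    path = script_path(prefix)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(ENV_LINES) + "\n")
    return path


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("CONDA_PREFIX is not set: activate the environment first")
    elif "--write" in sys.argv:
        print("wrote", write_activate_script(Path(prefix)))
    else:
        # Dry run by default: show the target and contents without writing.
        print(script_path(Path(prefix)))
        print("\n".join(ENV_LINES))
```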
Install vLLM 0.8.5 (CUDA 11.8 wheel)
(If the correct vLLM version is already installed, this step simply confirms or force-reinstalls it.)
Installs the version of vLLM that works correctly with CUDA 11.8 and PyTorch 2.6.
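To confirm which versions actually ended up installed, query the package metadata. This sketch uses importlib.metadata from the standard library and ignores local version tags such as +cu118 when comparing; the expected versions come from the software requirements list, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {"vllm": "0.8.5", "torch": "2.6.0"}  # from the software requirements list


def installed_versions(packages) -> dict:
    """Installed version string per package, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Map package -> (installed, expected) for anything missing or on another version."""
    result = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Compare the public version only, so '0.8.5+cu118' counts as 0.8.5.
        if have is None or have.split("+")[0] != want:
            result[name] = (have, want)
    return result


if __name__ == "__main__":
    problems = mismatches(installed_versions(EXPECTED))
    print(problems or "vLLM and PyTorch versions match the guide.")
```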

Prepare input / output paths and a sample image
Creates directories to store images and OCR output.
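A Python equivalent of the directory setup, as a minimal sketch. The input and output folder names and the base location are hypothetical, so use whatever paths you will enter in the config file.

```python
from pathlib import Path


def prepare_dirs(base: Path) -> tuple:
    """Create <base>/input and <base>/output; existing folders are left as they are."""
    input_dir = base / "input"    # hypothetical folder names
    output_dir = base / "output"
    for folder in (input_dir, output_dir):
        folder.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir
```

For example, prepare_dirs(Path.home() / "deepseek-ocr-data") creates both folders and returns their paths; exist_ok=True makes reruns harmless.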
Now open the config file and set the correct input and output paths.

Save and exit nano:
CTRL + O, then Enter: save the file
CTRL + X: exit the editor
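Before running, it can help to confirm that the paths you typed actually exist. This sketch assumes the config stores plain path strings and that inputs are images or PDFs; the variable names INPUT_PATH and OUTPUT_PATH and their values are hypothetical, so copy the names and values from your own config file.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}  # assumption: the inputs you intend to process


def validate_paths(input_path: str, output_path: str) -> list:
    """Return human-readable problems with the configured paths; empty means they look usable."""
    problems = []
    src = Path(input_path)
    if not src.exists():
        problems.append(f"input path does not exist: {src}")
    elif src.is_file() and src.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unexpected input file type: {src.suffix or '(none)'}")
    if not Path(output_path).is_dir():
        problems.append(f"output directory does not exist: {output_path}")
    return problems


if __name__ == "__main__":
    # Hypothetical values: copy the ones you entered in the config file.
    INPUT_PATH = "input/sample.png"
    OUTPUT_PATH = "output"
    for problem in validate_paths(INPUT_PATH, OUTPUT_PATH) or ["Paths look fine."]:
        print(problem)
```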
Run DeepSeek-OCR
The DeepSeek-OCR model ran successfully.