Deployment and Optimization of DeepSeek-OCR

High-Performance GPU Inference Setup | CUDA 11.8 | vLLM 0.8.5

DeepSeek-OCR is an AI model that reads and understands text in images and documents with high accuracy. It runs on NVIDIA GPUs through vLLM, which allows fast processing of scanned PDFs, invoices, and similar documents.

Minimum Requirements for DeepSeek-OCR

Hardware Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU | NVIDIA A30 (16–24GB VRAM) | A100 / RTX 6000 Ada (40–80GB VRAM) |
| CUDA Compute Capability | ≥ 7.0 | ≥ 8.0 |
| VRAM | 16GB | 96GB+ |
| System RAM | 64GB | 128GB |
| Storage | 40GB free | 60GB+ |

Software Requirements

  • Ubuntu 22.04 / 24.04

  • NVIDIA Driver ≥ 520

  • CUDA Toolkit 11.8

  • Python 3.10

  • PyTorch 2.6.0 (CU118)

  • vLLM 0.8.5 (cu118 build)

  • Conda environment (Miniconda)

Verify GPU

Check that your GPU is properly detected by the OS and the NVIDIA driver.

If GPU details appear, continue. If not, install the NVIDIA drivers first.
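The check above can be sketched with nvidia-smi, which ships with the NVIDIA driver. The GPU_OK variable is a hypothetical name used only for this illustration.

```shell
# nvidia-smi lists each GPU, the driver version, and the highest CUDA
# version that the installed driver supports.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi; then
  GPU_OK=yes
else
  GPU_OK=no
  echo "No usable GPU reported: install the NVIDIA driver first." >&2
fi
echo "GPU detected: $GPU_OK"
```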

Install Miniconda

Download and install Conda, which isolates dependencies for DeepSeek-OCR.

Downloads the Miniconda installer and saves it as miniconda.sh.

Installs Miniconda silently (-b) into your home directory.

Activates Conda so that the conda command works in your terminal.
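A minimal sketch of these three steps, assuming a 64-bit x86 Linux machine and the standard "latest" installer URL. The function name install_miniconda and the download location in your home directory are illustrative choices. The steps are wrapped in a function so nothing is installed until you call it.

```shell
# Assumed installer URL for 64-bit x86 Linux.
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"

install_miniconda() {
  # 1. Download the installer and save it as miniconda.sh
  wget "$MINICONDA_URL" -O "$HOME/miniconda.sh" || return 1
  # 2. Install silently (-b) into ~/miniconda3 (-p sets the prefix)
  bash "$HOME/miniconda.sh" -b -p "$HOME/miniconda3" || return 1
  # 3. Activate Conda so the conda command works in this shell
  . "$HOME/miniconda3/bin/activate"
}

# Skip the install if conda is already available.
command -v conda >/dev/null 2>&1 || echo "conda not found: run install_miniconda"
```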

Create Conda Environment

If Conda shows Terms of Service (ToS) messages, accept the ToS for the default channels and then re-create the environment.

This creates an isolated environment in which DeepSeek-OCR and its dependencies can be installed safely.
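As a sketch of this step, the environment can be created with Python 3.10 as listed in the software requirements. The environment name deepseek-ocr and the function names are assumptions for illustration. The two channel URLs are the default Anaconda channels that Conda names in its ToS message.

```shell
ENV_NAME="deepseek-ocr"   # assumed environment name

create_env() {
  conda create -n "$ENV_NAME" python=3.10 -y
}

accept_tos() {
  # Accept the Terms of Service for the two default Anaconda channels.
  conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
  conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
}

# Typical flow: create the environment; if Conda stops with a ToS
# message, accept the terms and try again, then activate.
#   create_env || { accept_tos && create_env; }
#   conda activate "$ENV_NAME"
```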

Install PyTorch (CUDA 11.8 build)

Installs GPU-enabled PyTorch 2.6.0 that matches CUDA 11.8.
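A sketch of the install, run inside the activated environment. The torchvision and torchaudio versions are assumptions (the releases paired with PyTorch 2.6.0); the guide itself only names PyTorch 2.6.0. The function name is illustrative.

```shell
# PyTorch wheel index for CUDA 11.8 builds.
TORCH_INDEX_URL="https://download.pytorch.org/whl/cu118"

install_torch() {
  # torchvision/torchaudio versions are assumed to match torch 2.6.0.
  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url "$TORCH_INDEX_URL"
}

# Call install_torch once the Conda environment is active.
```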

Verify PyTorch GPU access

Checks if PyTorch can detect and use your GPU properly.
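One way to run this check is a short Python one-off from the shell. The function name check_torch is illustrative; the torch attributes used are standard PyTorch.

```shell
check_torch() {
  python -c 'import torch
print("torch", torch.__version__, "| built for CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))'
}

check_torch || echo "PyTorch check failed: is the environment active and torch installed?" >&2
```

If "CUDA available" prints False, re-check the driver and confirm that the cu118 build of PyTorch was installed.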

Clone DeepSeek-OCR and install requirements

Downloads the DeepSeek-OCR code and installs required Python libraries.
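A sketch of the clone and install, assuming the official GitHub repository URL and a requirements.txt at the repository root. The function name is illustrative.

```shell
REPO_URL="https://github.com/deepseek-ai/DeepSeek-OCR.git"   # assumed official repo

clone_and_install() {
  git clone "$REPO_URL" || return 1
  # Run pip in a subshell so the caller's working directory is unchanged.
  ( cd DeepSeek-OCR && pip install -r requirements.txt )
}

# Call clone_and_install from the directory where the code should live.
```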

After Cloning DeepSeek-OCR You Will See These Files

Install CUDA Toolkit 11.8 (Required for FlashAttention)

Download installer

Install toolkit only (not the driver)

Installs CUDA 11.8 compiler tools required to build FlashAttention.

This will take about 5-10 minutes to download and install.
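The download and toolkit-only install can be sketched as follows, assuming the CUDA 11.8.0 Linux runfile from NVIDIA's download server. The function name is illustrative. The --toolkit flag restricts the install to the toolkit, so the existing driver is left untouched.

```shell
CUDA_RUNFILE="cuda_11.8.0_520.61.05_linux.run"
CUDA_URL="https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/$CUDA_RUNFILE"

install_cuda_toolkit() {
  # Download installer
  wget "$CUDA_URL" -O "$CUDA_RUNFILE" || return 1
  # Install toolkit only (not the driver), without the interactive menu
  sudo sh "$CUDA_RUNFILE" --silent --toolkit
}

# Call install_cuda_toolkit when ready; it needs sudo.
```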

Verify installation
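A quick check, assuming the runfile's default install prefix /usr/local/cuda-11.8 (the variable name is illustrative):

```shell
CUDA_PREFIX_118="/usr/local/cuda-11.8"   # default runfile install location

if [ -x "$CUDA_PREFIX_118/bin/nvcc" ]; then
  # Prints the compiler release; it should report 11.8.
  "$CUDA_PREFIX_118/bin/nvcc" --version
else
  echo "nvcc not found under $CUDA_PREFIX_118" >&2
fi
```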

Set environment variables (put CUDA 11.8 FIRST in PATH)

Verify that the correct nvcc is now found by running which nvcc.
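A sketch of the environment setup for the current shell session, again assuming the default prefix /usr/local/cuda-11.8:

```shell
export CUDA_HOME=/usr/local/cuda-11.8
# Put CUDA 11.8 first so its nvcc wins over any other installed version.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Once the toolkit is installed, this should point into $CUDA_HOME/bin.
which nvcc || echo "nvcc is not on PATH yet" >&2
```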

Install GCC-11 (required for CUDA 11.8 build tools)

Set GCC-11 as the compiler for this session
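A sketch for Ubuntu, assuming the distribution's gcc-11 and g++-11 packages and their usual /usr/bin locations. The function name is illustrative.

```shell
install_gcc11() {
  sudo apt-get update && sudo apt-get install -y gcc-11 g++-11
}

command -v gcc-11 >/dev/null 2>&1 || echo "gcc-11 not installed: run install_gcc11" >&2

# Use GCC-11 for builds started from this shell session only.
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
```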

Install FlashAttention

With CUDA 11.8 and GCC-11 set up, the FlashAttention build should now succeed. Compilation takes about 5-10 minutes.
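A sketch of the build. The pinned version 2.7.3 is an assumption, since the guide does not name one, and the function name is illustrative. MAX_JOBS limits parallel compile jobs and can be lowered if the build runs out of memory.

```shell
install_flash_attn() {
  # --no-build-isolation lets the build use the torch already installed
  # in the environment instead of a temporary build environment.
  MAX_JOBS="${MAX_JOBS:-4}" pip install flash-attn==2.7.3 --no-build-isolation
}

# Call install_flash_attn in the shell where CUDA_HOME, CC and CXX are set.
```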

Make CUDA & GCC Paths Permanent in Conda Env

Ensures correct CUDA & GCC versions auto-load every time you activate the environment.
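One way to do this is a script in the environment's etc/conda/activate.d directory, which Conda sources on every activation. The file name cuda118_gcc11.sh and the function name are arbitrary choices for this sketch.

```shell
write_activate_hook() {
  # CONDA_PREFIX is set by Conda while an environment is active.
  [ -n "${CONDA_PREFIX:-}" ] || { echo "Activate the environment first." >&2; return 1; }
  mkdir -p "$CONDA_PREFIX/etc/conda/activate.d" || return 1
  # The quoted 'EOF' keeps the variables unexpanded in the written file.
  cat > "$CONDA_PREFIX/etc/conda/activate.d/cuda118_gcc11.sh" <<'EOF'
export CUDA_HOME=/usr/local/cuda-11.8
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
EOF
}

# Call write_activate_hook once, with the environment active.
```

After re-activating the environment, which nvcc should resolve to the CUDA 11.8 copy without any manual exports.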

Install vLLM 0.8.5 (CUDA 11.8 wheel)

(If the correct vLLM version is already installed, this step simply confirms or reinstalls it.)

Installs the version of vLLM that works correctly with CUDA 11.8 and PyTorch 2.6.
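A sketch of the install, assuming the cu118 wheel published with the vLLM v0.8.5 GitHub release. The wheel file name and the function name are assumptions to confirm against the release page.

```shell
VLLM_WHEEL="vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl"
VLLM_URL="https://github.com/vllm-project/vllm/releases/download/v0.8.5/$VLLM_WHEEL"

install_vllm() {
  # The extra index lets pip resolve CUDA 11.8 builds of dependencies.
  pip install "$VLLM_URL" --extra-index-url https://download.pytorch.org/whl/cu118
}

# After calling install_vllm, confirm the version with:
#   python -c "import vllm; print(vllm.__version__)"
```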

Prepare input / output paths and a sample image

Creates directories to store images and OCR output.
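For illustration, the directories could be created like this. The location under your home directory and the variable names are hypothetical; any writable paths work.

```shell
# Hypothetical locations; override OCR_ROOT to place them elsewhere.
OCR_ROOT="${OCR_ROOT:-$HOME/deepseek_ocr_data}"
OCR_INPUT_DIR="$OCR_ROOT/input"
OCR_OUTPUT_DIR="$OCR_ROOT/output"

mkdir -p "$OCR_INPUT_DIR" "$OCR_OUTPUT_DIR"

# Copy a sample image into the input directory, for example:
#   cp /path/to/sample.png "$OCR_INPUT_DIR/"
echo "Input:  $OCR_INPUT_DIR"
echo "Output: $OCR_OUTPUT_DIR"
```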

Now open the config file and set the correct input and output paths.
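A sketch of this step, assuming the config file is DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py inside the cloned repository and that it defines INPUT_PATH and OUTPUT_PATH as top-level assignments. Both function names are illustrative. The second function is a non-interactive alternative to editing by hand.

```shell
# Assumed location, relative to the cloned DeepSeek-OCR directory.
CONFIG_FILE="DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py"

# Interactive: open the file in nano and edit the two paths.
edit_config() {
  nano "$CONFIG_FILE"
}

# Non-interactive: rewrite the two assignments in place (GNU sed).
# usage: set_config_paths <input path> <output dir> [config file]
set_config_paths() {
  sed -i \
    -e "s|^INPUT_PATH *=.*|INPUT_PATH = '$1'|" \
    -e "s|^OUTPUT_PATH *=.*|OUTPUT_PATH = '$2'|" \
    "${3:-$CONFIG_FILE}"
}
```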

Save and exit (in nano):

  • CTRL + O, then Enter, to save

  • CTRL + X to exit

Run DeepSeek-OCR
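A sketch of the run, assuming the image script run_dpsk_ocr_image.py in the repository's DeepSeek-OCR-vllm directory. The function name is illustrative. Run it from the cloned DeepSeek-OCR directory with the Conda environment active.

```shell
VLLM_SCRIPT_DIR="DeepSeek-OCR-master/DeepSeek-OCR-vllm"   # assumed repo layout

run_ocr() {
  # Subshell: the caller's working directory is left unchanged.
  ( cd "$VLLM_SCRIPT_DIR" && python run_dpsk_ocr_image.py )
}

# Call run_ocr after config.py points at your input and output paths.
```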

You have now successfully run the DeepSeek-OCR model.
