Deployment and Optimization of DeepSeek-OCR

High-Performance GPU Inference Setup | CUDA 11.8 | vLLM 0.8.5

DeepSeek-OCR is an AI model that reads and understands text in images and documents with high accuracy. It runs on NVIDIA GPUs through vLLM, which allows fast processing of scanned PDFs, invoices, and similar documents.

Minimum Requirements for DeepSeek-OCR

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA A30 (16–24GB VRAM) | A100 / RTX 6000 Ada (40–80GB VRAM) |
| CUDA Compute Capability | ≥ 7.0 | ≥ 8.0 |
| VRAM | 16GB | 96GB+ |
| System RAM | 64GB | 128GB |
| Storage | 40GB free | 60GB+ |

Software Requirements

Ubuntu 22.04 / 24.04
NVIDIA Driver ≥ 520
CUDA Toolkit 11.8
Python 3.10
PyTorch 2.6.0 (CU118)
vLLM 0.8.5 (cu118 build)
Conda environment (Miniconda)

Verify GPU

nvidia-smi

Checks if your GPU is properly detected by the OS and NVIDIA drivers.

If GPU details appear, continue. If not, install the NVIDIA drivers first.
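The same check can be scripted so that it also compares the GPU against the minimums listed above. The following is a minimal sketch, not part of the original guide: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the function names and thresholds (16GB VRAM, driver 520) are illustrative choices taken from the requirements in this document.

```python
import subprocess

MIN_VRAM_MIB = 16 * 1024   # 16GB minimum VRAM from the hardware requirements
MIN_DRIVER_MAJOR = 520     # driver floor from the software requirements


def parse_gpu_rows(csv_text):
    """Parse 'name, memory.total, driver_version' rows (csv,noheader,nounits)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # rsplit keeps a GPU name intact even if it contains a comma
        name, mem_mib, driver = [f.strip() for f in line.rsplit(",", 2)]
        gpus.append({"name": name, "vram_mib": int(mem_mib), "driver": driver})
    return gpus


def meets_minimum(gpu):
    """True if the GPU satisfies the VRAM and driver minimums above."""
    driver_major = int(gpu["driver"].split(".")[0])
    return gpu["vram_mib"] >= MIN_VRAM_MIB and driver_major >= MIN_DRIVER_MAJOR


def query_gpus():
    """Ask nvidia-smi for GPU details; returns [] if the tool is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total,driver_version",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported; install the NVIDIA drivers first.")
    for gpu in gpus:
        status = "OK" if meets_minimum(gpu) else "below minimum"
        print(f"{gpu['name']}: {gpu['vram_mib']} MiB, driver {gpu['driver']} -> {status}")
```

Some cards marketed as 16GB may report slightly less than 16384 MiB, so treat a near-miss as a reason to look closer rather than as a hard failure.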

Install Miniconda

Download and install Conda, which isolates dependencies for DeepSeek-OCR.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

Downloads the Miniconda installer and saves it as miniconda.sh.

bash miniconda.sh -b -p $HOME/miniconda

Installs Miniconda silently (-b) into $HOME/miniconda (-p).

eval "$($HOME/miniconda/bin/conda shell.bash hook)"

Loads Conda's shell hook so that the conda command works in the current terminal session.

Create Conda Environment

conda create -n deepseek-ocr python=3.10 -y

If Conda stops with a Terms of Service (ToS) message, accept the terms for the default channels:

conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

Re-create the environment (only needed if the first attempt failed), then activate it:

conda create -n deepseek-ocr python=3.10 -y
conda activate deepseek-ocr

Creates and activates an isolated environment so that DeepSeek-OCR and its dependencies can be installed safely.

Install PyTorch (CUDA 11.8 build)

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118

Installs GPU-enabled PyTorch 2.6.0 that matches CUDA 11.8.

Verify PyTorch GPU access

python - <<'PY'
import torch
print("torch:", torch.__version__)
print("torch.cuda.runtime:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
PY

Checks if PyTorch can detect and use your GPU properly.
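A common failure at this step is that a different CUDA build of PyTorch is installed than the one intended. As a rough sketch, the version string can be split into its base version and build tag and compared with the 2.6.0 / cu118 pin used in this guide. The helper names below are illustrative, and the check assumes the installed wheel reports its build as a `+cu118` suffix in `torch.__version__`.

```python
def split_torch_version(version):
    """Split '2.6.0+cu118' into ('2.6.0', 'cu118'); the tag is '' if absent."""
    base, _, local_tag = str(version).partition("+")
    return base, local_tag


def is_expected_build(version, want_base="2.6.0", want_tag="cu118"):
    """True if the version string matches the PyTorch build pinned in this guide."""
    return split_torch_version(version) == (want_base, want_tag)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch build:", split_torch_version(torch.__version__))
        print("matches guide:", is_expected_build(torch.__version__))
```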

Clone DeepSeek-OCR and install requirements

git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
cd DeepSeek-OCR
pip install -r requirements.txt

Downloads the DeepSeek-OCR code and installs required Python libraries.

After Cloning DeepSeek-OCR You Will See These Files

[Screenshot: file listing of the cloned DeepSeek-OCR repository]

Install CUDA Toolkit 11.8 (Required for FlashAttention)

Download installer

# Download the CUDA 11.8 Toolkit installer
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

Install toolkit only (not the driver)

sudo sh cuda_11.8.0_520.61.05_linux.run --toolkit --silent --override

Installs the CUDA 11.8 compiler tools required to build FlashAttention.

The download and installation take about 5-10 minutes.

Verify installation

ls -la /usr/local/cuda-11.8/bin/nvcc

Set environment variables (put CUDA 11.8 FIRST in PATH)

export CUDA_HOME=/usr/local/cuda-11.8
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH

Verify that the correct nvcc is now found

which nvcc
nvcc --version
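If several CUDA toolkits are installed, it is easy to pick up the wrong nvcc. The sketch below, which is an illustration rather than part of the original guide, locates the nvcc that the shell would run and extracts the release number from its `--version` output. It assumes that output contains a phrase of the form `release 11.8`; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def nvcc_release(version_text):
    """Extract (major, minor) from nvcc --version output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def active_nvcc_release():
    """Return (path, release) for the first nvcc on PATH, or (None, None)."""
    path = shutil.which("nvcc")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, nvcc_release(result.stdout)


if __name__ == "__main__":
    path, release = active_nvcc_release()
    if path is None:
        print("nvcc not found on PATH; re-check the export lines above.")
    elif release != (11, 8):
        print(f"{path} reports release {release}; expected (11, 8).")
    else:
        print(f"{path} is CUDA 11.8.")
```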

Install GCC-11 (required for CUDA 11.8 build tools)

sudo apt-get install -y gcc-11 g++-11

Set GCC-11 as the compiler for this session

export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11

# Verify
gcc-11 --version
g++-11 --version

pip install psutil

Install FlashAttention

pip install flash-attn --no-build-isolation

With CUDA 11.8 and GCC-11 configured as above, the build should succeed. Compilation takes about 5-10 minutes.

Make CUDA & GCC Paths Permanent in Conda Env

# Run this after FlashAttention has installed successfully
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh << 'EOF'
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
EOF

Ensures correct CUDA & GCC versions auto-load every time you activate the environment.
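After re-activating the environment, it can be useful to confirm that the variables were actually loaded. The following is a small sketch under the assumption that the values are exactly those written to env_vars.sh above; the `env_problems` helper is a hypothetical name, not something provided by Conda.

```python
import os

# Values written to env_vars.sh in the step above
EXPECTED = {
    "CUDA_HOME": "/usr/local/cuda-11.8",
    "CC": "/usr/bin/gcc-11",
    "CXX": "/usr/bin/g++-11",
}


def env_problems(env):
    """Return a list of human-readable problems found in the given environment."""
    problems = []
    for name, want in EXPECTED.items():
        found = env.get(name)
        if found != want:
            problems.append(f"{name} is {found!r}, expected {want!r}")
    cuda_bin = EXPECTED["CUDA_HOME"] + "/bin"
    if cuda_bin not in env.get("PATH", "").split(os.pathsep):
        problems.append(f"{cuda_bin} is not on PATH")
    cuda_lib = EXPECTED["CUDA_HOME"] + "/lib64"
    if cuda_lib not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append(f"{cuda_lib} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    issues = env_problems(os.environ)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("CUDA and GCC variables look correct.")
```

Run it in a fresh terminal after `conda activate deepseek-ocr`; an empty problem list suggests the activation script is being picked up.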

Install vLLM 0.8.5 (CUDA 11.8 wheel)

(If you already installed the correct vllm, this will just confirm/force it.)

# Prefer the cu118 wheel index; this installs vllm 0.8.5 built for cu118
pip install --upgrade "vllm==0.8.5" --extra-index-url https://wheels.vllm.ai/cu118

# Quick check
python -c "import vllm; print('vllm', getattr(vllm, '__version__', None))"

Installs the version of vLLM that works correctly with CUDA 11.8 and PyTorch 2.6.
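Installing vLLM can pull in or replace other packages, so it may be worth re-checking the pinned versions afterwards. The sketch below reads installed versions from package metadata with the standard library's `importlib.metadata`, without importing the packages themselves. The `PINS` table and function names are illustrative; the pins are the versions named in this guide.

```python
from importlib import metadata

# Versions this guide installs; build tags such as '+cu118' are ignored
PINS = {"torch": "2.6.0", "vllm": "0.8.5"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pin_report(pins=PINS, lookup=installed_version):
    """Compare installed versions against the pins and describe each result."""
    report = {}
    for name, want in pins.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == want:
            report[name] = "ok"
        else:
            report[name] = f"found {found}, expected {want}"
    return report


if __name__ == "__main__":
    for name, status in pin_report().items():
        print(f"{name}: {status}")
```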

Prepare input / output paths and a sample image

mkdir -p ~/deepseek_input ~/deepseek_output

Creates directories to store input images and OCR output. Copy a sample image into ~/deepseek_input before running the model.

Now open the config file and set the correct input and output paths:

nano DeepSeek-OCR-vllm/config.py

Save and exit nano:

Press CTRL + O, then Enter, to save.
Press CTRL + X to exit.
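As an illustration of what the edited lines might look like, the fragment below points the config at the directories created earlier. The variable names `INPUT_PATH` and `OUTPUT_PATH` and the file name `sample.png` are assumptions for this sketch; use whatever names your copy of config.py actually defines, and check whether the image script expects the input path to be a single image file or a folder.

```python
import os

# Assumed variable names; match them to what your config.py actually defines.
# 'sample.png' is a hypothetical file name for the image you copied in.
INPUT_PATH = os.path.expanduser("~/deepseek_input/sample.png")
OUTPUT_PATH = os.path.expanduser("~/deepseek_output")
```

Expanding `~` up front avoids relying on the script to resolve the home directory itself.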

Run DeepSeek-OCR

time python run_dpsk_ocr_image.py

If the script finishes without errors, DeepSeek-OCR ran successfully. The time prefix reports how long the run took.
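To confirm that the run produced results, the output directory can be listed afterwards. This is a minimal sketch that assumes the output path was set to ~/deepseek_output as created earlier; it makes no assumption about which files the model writes, and `summarize_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def summarize_outputs(output_dir):
    """Return sorted (relative path, size in bytes) pairs for files under output_dir."""
    root = Path(output_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    files = summarize_outputs("~/deepseek_output")
    if not files:
        print("No output files found; check the paths in config.py.")
    for rel_path, size in files:
        print(f"{size:>10}  {rel_path}")
```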
