# Deployment and Optimization of DeepSeek-OCR

DeepSeek-OCR is a vision-language model that reads and extracts text from images and documents with high accuracy. Served on GPUs with vLLM, it can process scanned PDFs, invoices, and other documents quickly.

#### Minimum Requirements for DeepSeek-OCR

Hardware Requirements

| Component               | Minimum                    | Recommended                        |
| ----------------------- | -------------------------- | ---------------------------------- |
| GPU                     | NVIDIA A30 (16–24GB VRAM)  | A100 / RTX 6000 Ada (40–80GB VRAM) |
| CUDA Compute Capability | ≥ 7.0                      | ≥ 8.0                              |
| VRAM                    | 16GB                       | 96GB+                              |
| System RAM              | 64GB                       | 128GB                              |
| Storage                 | 40GB free                  | 60GB+                              |

#### Software Requirements

* Ubuntu **22.04 / 24.04**
* NVIDIA Driver **≥ 520**
* CUDA Toolkit **11.8**
* Python **3.10**
* PyTorch 2.6.0 (CU118)
* vLLM 0.8.5 (cu118 build)
* Conda environment (Miniconda)

#### Verify GPU

```
nvidia-smi
```

Checks if your GPU is properly detected by the OS and NVIDIA drivers.

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FWqWFfhpvNurrFrqZMVdy%2Funknown.png?alt=media&#x26;token=fc99f94e-1d85-475b-b201-6bc896380e13" alt=""><figcaption></figcaption></figure>

> If GPU details appear, continue.\
> If not, install the NVIDIA drivers first.

#### Install Miniconda

Download and install Conda, which isolates dependencies for DeepSeek-OCR.

```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
```

Downloads the Miniconda installer and saves it as `miniconda.sh`.

```
bash miniconda.sh -b -p $HOME/miniconda
```

Installs Miniconda silently (`-b`) into your home directory (`-p $HOME/miniconda`).

```
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
```

Activates Conda so that the `conda` command works in your terminal.

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FTgr1gsZiuM1aGAT7HBPi%2Funknown.png?alt=media&#x26;token=258f50a1-fba9-4969-8ac1-96e3be4ea389" alt=""><figcaption></figcaption></figure>

#### Create Conda Environment

```
conda create -n deepseek-ocr python=3.10 -y
```

{% hint style="info" %}
If you see Anaconda Terms of Service (ToS) errors when creating the environment, accept the ToS for the default channels first.
{% endhint %}

Run these:

```
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
```

Re-create the environment and activate it:

```
conda create -n deepseek-ocr python=3.10 -y
conda activate deepseek-ocr
```

Creates an isolated environment to install DeepSeek OCR and its dependencies safely.

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FYI8owMg9ZY6k6DvnQz60%2Funknown.png?alt=media&#x26;token=c71217c9-ec03-44c0-bf19-9eb0c765780f" alt=""><figcaption></figcaption></figure>

#### Install PyTorch (CUDA 11.8 build)

```
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
```

Installs GPU-enabled PyTorch 2.6.0 that matches CUDA 11.8.

Verify PyTorch GPU access

```
python - <<'PY'
import torch, sys
print("torch:", torch.__version__)
print("torch.cuda.runtime:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
PY
```

Checks if PyTorch can detect and use your GPU properly.

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FOrCQqui9k7LgyMj9DwCu%2Funknown.png?alt=media&#x26;token=4d164d0c-f915-4039-a54f-c84c15846f7b" alt=""><figcaption></figcaption></figure>

#### Clone DeepSeek-OCR and install requirements

```
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
cd DeepSeek-OCR
pip install -r requirements.txt
```

Downloads the DeepSeek-OCR code and installs required Python libraries.

#### After Cloning DeepSeek-OCR, You Will See These Files

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2F6FVk5qxw7CeXWOSLPTty%2Funknown.png?alt=media&#x26;token=dba2e1ea-d478-47fe-8fd1-7c13122b0a2e" alt=""><figcaption></figcaption></figure>

#### Install CUDA Toolkit 11.8 (Required for FlashAttention)

Download the installer:

```
# Download the CUDA 11.8 Toolkit installer
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
```

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FD0deTazE8KHjyRVib5iL%2Funknown.png?alt=media&#x26;token=30794198-cfbf-465b-8b9c-5c5b0035a69a" alt=""><figcaption></figcaption></figure>

Install toolkit only (not the driver)

```
sudo sh cuda_11.8.0_520.61.05_linux.run --toolkit --silent --override
```

Installs CUDA 11.8 compiler tools required to build FlashAttention.

This will take about 5-10 minutes to download and install.

Verify installation:

```
ls -la /usr/local/cuda-11.8/bin/nvcc
```

Set environment variables (put CUDA 11.8 first in `PATH`):

```
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
```

Verify that the correct `nvcc` is now found:

```
which nvcc
nvcc --version
```

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2Fvb5RgvnFiHDubQcoYu6k%2Funknown.png?alt=media&#x26;token=a84ba80e-fc1d-4858-bfd9-4aa831d3814e" alt=""><figcaption></figcaption></figure>

#### Install GCC-11 (required for CUDA 11.8 build tools)

```
sudo apt-get install -y gcc-11 g++-11
```

Set GCC-11 as the compiler for this session

```
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
```

```
# Verify
gcc-11 --version
g++-11 --version
```

```
pip install psutil
```

Installs `psutil`, which FlashAttention's setup script needs when building with `--no-build-isolation`.

#### Install FlashAttention

```
pip install flash-attn --no-build-isolation
```

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FGU12SH1wwRXhBJwIo1LI%2Funknown.png?alt=media&#x26;token=98760522-76c5-4b26-b2df-5376775d6a2f" alt=""><figcaption></figcaption></figure>

The build compiles FlashAttention's CUDA kernels and typically takes 5–10 minutes.
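Once the build finishes, a quick import check confirms the package is at least installed and importable (a minimal sketch; it does not exercise the CUDA kernels themselves):

```python
# Check whether the flash_attn package is importable after the build.
import importlib.util

spec = importlib.util.find_spec("flash_attn")
print("flash_attn importable:", spec is not None)
```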

#### Make CUDA & GCC Paths Permanent in Conda Env

```
# After successful installation, persist the variables in an activation hook
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh << 'EOF'
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
EOF
```

Ensures correct CUDA & GCC versions auto-load every time you activate the environment.
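After re-activating the environment, the hook can be sanity-checked from Python (a quick sketch; the variables printed are the ones exported above):

```python
# Print the CUDA/compiler variables exported by the conda activation hook.
import os

for var in ("CUDA_HOME", "CC", "CXX"):
    print(var, "=", os.environ.get(var, "(not set)"))
```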

#### Install vLLM 0.8.5 (CUDA 11.8 wheel)

> (If the correct vLLM is already installed, this simply confirms or reinstalls it.)

```
# Prefer the cu118 wheel index; this installs vLLM 0.8.5 built for CUDA 11.8
pip install --upgrade "vllm==0.8.5" --extra-index-url https://wheels.vllm.ai/cu118

# Quick check
python -c "import vllm; print('vllm', getattr(vllm, '__version__', None))"
```

Installs the version of vLLM that works correctly with CUDA 11.8 and PyTorch 2.6.

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FKwrTx3hz9PpnS6SeLRpV%2Funknown.png?alt=media&#x26;token=9ce8cb89-6d23-4d3f-b4ce-f7753f4539ad" alt=""><figcaption></figcaption></figure>

### Prepare input / output paths and a sample image

```
mkdir -p ~/deepseek_input ~/deepseek_output
```

Creates directories to store images and OCR output.
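If you do not have a scanned document handy, a synthetic test image can be generated with Pillow (assumed to be available via `requirements.txt`; the filename `sample.png` and the text content are just examples):

```python
# Generate a simple white image with black text to use as OCR input.
import os
from PIL import Image, ImageDraw

out_dir = os.path.expanduser("~/deepseek_input")
os.makedirs(out_dir, exist_ok=True)

img = Image.new("RGB", (800, 200), "white")
draw = ImageDraw.Draw(img)
draw.text((20, 80), "DeepSeek-OCR test: Invoice #12345, Total: $99.00", fill="black")

out_path = os.path.join(out_dir, "sample.png")
img.save(out_path)
print("Saved", out_path)
```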

Now open the config file and set the correct input/output paths:

```
nano DeepSeek-OCR-vllm/config.py
```

<figure><img src="https://1876135298-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEC5NwtFshv6EATOemuUn%2Fuploads%2FAyDvczhhonyuKXjRTz74%2Funknown.png?alt=media&#x26;token=a03bb457-2e2a-4e07-9ed2-22d17f384abc" alt=""><figcaption></figcaption></figure>

Save and exit:

* **CTRL + O**, then **Enter**, to save
* **CTRL + X** to exit
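The path settings in `config.py` typically look like the following (variable names and values here are illustrative; use the names that actually appear in your checkout):

```python
# Illustrative config.py path entries; adjust to your own directories.
import os

INPUT_PATH = os.path.expanduser("~/deepseek_input/sample.png")
OUTPUT_PATH = os.path.expanduser("~/deepseek_output")
```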

#### **Run DeepSeek-OCR**

```
cd DeepSeek-OCR-vllm
time python run_dpsk_ocr_image.py
```

**The DeepSeek-OCR model ran successfully; the OCR results are written to the output directory configured in `config.py`.**
