# SCAN: dify
date: 2026-02-12 | program: Dify | repo: https://github.com/langgenius/dify | bounty: $500-$1500

## summary
raw_findings: 7 (manual review, semgrep/trufflehog blocked by sandbox) | real: 2 | high_conf: 0 | med_conf: 2 | low_conf: 2 | false_pos: 3

## findings

### F1: SSRF via remote file endpoints when SSRF proxy not configured
severity: high | confidence: medium | type: SSRF | cwe: CWE-918
file: /Users/sebas/Code/bug-bounty/data/repos/dify/api/controllers/console/remote_files.py:31 | tool: manual

```python
# Console endpoint - accepts any URL from path
@console_ns.route("/remote-files/<path:url>")
class GetRemoteFileInfo(Resource):
    @login_required
    def get(self, url: str):
        decoded_url = urllib.parse.unquote(url)
        resp = ssrf_proxy.head(decoded_url)
        if resp.status_code != httpx.codes.OK:
            resp = ssrf_proxy.get(decoded_url, timeout=3)
```

```python
# ssrf_proxy.py - falls back to plain httpx.Client when no proxy configured
def _build_ssrf_client(verify: bool) -> httpx.Client:
    if dify_config.SSRF_PROXY_ALL_URL:
        return httpx.Client(proxy=dify_config.SSRF_PROXY_ALL_URL, ...)
    if dify_config.SSRF_PROXY_HTTP_URL and dify_config.SSRF_PROXY_HTTPS_URL:
        return httpx.Client(mounts=_create_proxy_mounts(), ...)
    # NO PROXY - no IP validation at all
    return httpx.Client(verify=verify, limits=_SSRF_CLIENT_LIMITS)
```

analysis: Dify's SSRF protection relies entirely on an external Squid proxy configured via the SSRF_PROXY_*_URL env vars. When neither SSRF_PROXY_ALL_URL nor the SSRF_PROXY_HTTP_URL/SSRF_PROXY_HTTPS_URL pair is set (common in self-hosted deployments), the ssrf_proxy module builds a plain httpx.Client with no IP or host restrictions. An authenticated user can then make the server request any internal URL (cloud metadata endpoints, internal services, etc.) through several entry points: `/console/api/remote-files/<url>`, `/console/api/remote-files/upload`, the HTTP Request workflow node, external knowledge API connections, and DSL import via URL. The web API exposes the same pattern at `/api/remote-files/<path:url>` (requires an app token). All entry points share the same root cause.

attack_vector: `GET /console/api/remote-files/http%3A%2F%2F169.254.169.254%2Flatest%2Fmeta-data%2F` as an authenticated console user. Alternatively, a workflow HTTP Request node targeting `http://internal-service:port/admin`.
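For reproducing the request path above, a minimal sketch of how the target URL is percent-encoded into a single path segment. `build_remote_file_path` is a hypothetical helper for illustration, not part of Dify; the route prefix is taken from the attack vector above.

```python
from urllib.parse import quote, unquote

PREFIX = "/console/api/remote-files/"


def build_remote_file_path(target_url: str) -> str:
    """Hypothetical helper: percent-encode target_url into one path segment."""
    # safe="" makes quote() encode ':' and '/' as well, so the whole URL
    # travels as a single segment after the route prefix.
    return PREFIX + quote(target_url, safe="")


path = build_remote_file_path("http://169.254.169.254/latest/meta-data/")
print(path)

# The handler's urllib.parse.unquote() call reverses the encoding.
print(unquote(path[len(PREFIX):]))
```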

impact: Access to cloud metadata (e.g., AWS IAM credentials), internal service enumeration, and data exfiltration from the internal network. On cloud deployments, this could escalate to full infrastructure compromise via IMDS credentials.

recommendation: INVESTIGATE - This is a known design pattern (proxy-dependent SSRF protection). The question is whether huntr considers self-hosted misconfigurations in scope. High duplicate risk since this pattern is visible and has likely been reported. The lack of application-level IP validation is the actual bug.
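
To make the missing control concrete, here is a minimal sketch of an application-level destination check, assuming it would run before the no-proxy fallback client is used. `is_public_destination` and the injectable `resolve` parameter are hypothetical names for illustration and do not exist in Dify.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(host: str) -> list:
    """Resolve a hostname to the IP address strings it maps to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_public_destination(url: str, resolve=_resolve) -> bool:
    """Return True only if every address the host resolves to is globally routable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        return False  # unresolvable host: fail closed
    if not addresses:
        return False
    for addr in addresses:
        try:
            # Strip an IPv6 zone id such as "%eth0" before parsing.
            ip = ipaddress.ip_address(addr.split("%", 1)[0])
        except ValueError:
            return False
        # is_global is False for loopback, RFC 1918, link-local (169.254.0.0/16), etc.
        if not ip.is_global:
            return False
    return True
```

A check like this is not sufficient on its own: the HTTP client resolves the hostname again when it connects, so a DNS rebinding attacker can pass validation and then point the name at an internal address. A real fix would also need to pin the connection to the validated IP and re-validate every redirect hop.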

---

### F2: Unsandboxed Jinja2 template execution in code executor
severity: medium | confidence: medium | type: Code Injection | cwe: CWE-94
file: /Users/sebas/Code/bug-bounty/data/repos/dify/api/core/helper/code_executor/jinja2/jinja2_transformer.py:47 | tool: manual

```python
# jinja2_transformer.py - uses plain jinja2.Template, NOT SandboxedEnvironment
def main(**inputs):
    template_code = b64decode('{cls._template_b64_placeholder}').decode('utf-8')
    template = jinja2.Template(template_code)  # UNSANDBOXED
    return template.render(**inputs)
```

```python
# Compare with email rendering which DOES use sandbox:
# libs/email_template_renderer.py
class SandboxedEnvironment(ImmutableSandboxedEnvironment):
    ...
```

analysis: Jinja2 templates in workflow template-transform nodes and advanced prompts are rendered with `jinja2.Template()` directly, not with a `SandboxedEnvironment`. This allows Jinja2 SSTI payloads to execute arbitrary Python code. However, the code runs inside an external sandbox service (`CODE_EXECUTION_ENDPOINT`), not on the Dify API server itself. The security boundary therefore depends entirely on the isolation quality of the sandbox service (likely a Docker container with network enabled: `"enable_network": True`). A successful sandbox escape would be critical, but the attack surface is limited to authenticated workflow creators.

attack_vector: Create a workflow with a Template Transform node containing SSTI payload. This executes within the sandbox service.

impact: Code execution within the sandbox environment. If the sandbox has network access (the code sets `"enable_network": True`), this could be used for lateral movement or data exfiltration from the sandbox's network position.

recommendation: INVESTIGATE - Need to evaluate the sandbox service isolation. The `enable_network: True` flag is concerning. If the sandbox shares network with internal services, this becomes a stepping stone. Moderate duplicate risk.
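
As a defense-in-depth comparison, a minimal sketch of rendering the same kind of template through Jinja2's own sandbox, the approach the email renderer already takes. `render_sandboxed` is a hypothetical helper, not Dify code, and the payload is a generic SSTI probe rather than a confirmed exploit.

```python
from typing import Optional

from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

_env = ImmutableSandboxedEnvironment()


def render_sandboxed(template_code: str, **inputs) -> Optional[str]:
    """Render inside Jinja2's sandbox; return None if unsafe access is rejected."""
    try:
        return _env.from_string(template_code).render(**inputs)
    except SecurityError:
        # Raised when the template reaches for attributes such as __class__.
        return None


print(render_sandboxed("Hello {{ name }}", name="dify"))
print(render_sandboxed("{{ ''.__class__.__mro__[1].__subclasses__() }}"))
```

The Jinja2 sandbox restricts attribute access inside templates, but it should not be treated as a complete security boundary on its own. It would complement the external sandbox service rather than replace it.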

---

### F3: subprocess.getstatusoutput in helper.py
severity: low | confidence: low | type: Command Injection | cwe: CWE-78
file: /Users/sebas/Code/bug-bounty/data/repos/dify/api/libs/helper.py:92 | tool: manual

```python
def run(script):
    return subprocess.getstatusoutput("source /root/.bashrc && " + script)
```

analysis: The `run()` function concatenates its `script` argument directly into a shell command. However, a search of the entire codebase for callers (`helper.run()` or `from libs.helper import run`) returns zero results. The function appears to be dead code: it is never called from any controller, service, or task, and there is no reachable path from any HTTP endpoint.

attack_vector: None found -- function appears unreachable.

impact: If reachable, full RCE. Currently dead code.

recommendation: SKIP - Dead code, no attack path. Not reportable.

---

### F4: Unsafe deserialization on embedding data from database
severity: low | confidence: low | type: Deserialization | cwe: CWE-502
file: /Users/sebas/Code/bug-bounty/data/repos/dify/api/models/dataset.py:1128 | tool: manual

```python
def get_embedding(self) -> list[float]:
    return cast(list[float], unsafe_deserialize(self.embedding))
```

analysis: Unsafe deserialization is used on embedding data from the database. The data is always written by `set_embedding()` which serializes a `list[float]` that comes from the embedding model provider, not from user input. An attacker would need direct database write access to exploit this. If they have DB access, they already have full control.

attack_vector: Requires direct database write access to inject malicious serialized payload into the `embedding` column.

impact: RCE if attacker has DB write access (which implies they already have full control).

recommendation: SKIP - Requires pre-existing DB compromise. Defense-in-depth concern only.
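
If the defense-in-depth concern were ever addressed, one option is a data-only encoding for the float list, so that a tampered column can at worst yield wrong numbers. The sketch below uses the stdlib `array` module. The function names are hypothetical, the excerpt above does not show the actual storage format used by `set_embedding()`, and switching formats would require migrating existing rows.

```python
from array import array


def serialize_embedding(values: list) -> bytes:
    """Pack floats as native-endian 8-byte doubles; no object graph is stored."""
    return array("d", values).tobytes()


def deserialize_embedding(blob: bytes) -> list:
    """Unpack doubles; malformed input raises ValueError instead of running code."""
    arr = array("d")
    if len(blob) % arr.itemsize != 0:
        raise ValueError("embedding blob length is not a multiple of the item size")
    arr.frombytes(blob)
    return arr.tolist()


blob = serialize_embedding([0.1, -2.5, 3.0])
print(deserialize_embedding(blob))
```

`tobytes()` uses the machine's native byte order, so a real implementation that moves data between architectures would need to fix the byte order explicitly, for example with `struct`.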

---

## skipped
| file:line | rule/pattern | reason |
|-----------|-------------|--------|
| api/libs/helper.py:92 | subprocess.getstatusoutput | Dead code, `run()` never called from any reachable path |
| api/models/dataset.py:1128 | unsafe deserialization | Data source is server-generated embeddings, not user input; requires DB compromise |
| api/services/plugin/plugin_migration.py:204 | SQL f-string interpolation | table/column names are hardcoded constants, not user input |
| api/controllers/console/app/statistic.py:59 | convert_datetime_to_date in SQL | timezone uses `:tz` parameter binding, not string interpolation; account.timezone validated against IANA set |
| api/services/website_service.py:226 | httpx.get without ssrf_proxy | URLs are hardcoded to jina.ai/firecrawl external APIs, not user-controlled |
| api/configs/remote_settings_sources/apollo/python_3x.py:27 | urllib.request.urlopen | Server config fetching, URL from env vars not user input |
| api/core/helper/code_executor/*.py | Code execution | Delegated to external sandbox service, not running on API server |
| yaml.safe_load usage (multiple files) | YAML deserialization | All instances use safe_load, not unsafe yaml.load |
| all ORM queries | SQL injection | Using SQLAlchemy ORM .filter_by()/.where() with parameterized queries |
| Jinja2 in email rendering | Template injection | Uses ImmutableSandboxedEnvironment |
| File upload/download | Path traversal | UUID-based storage keys, HMAC-signed URLs, basename sanitization |

## notes

- Semgrep and trufflehog could not be run due to sandbox permissions. Findings are from manual code review only.
- The SSRF finding (F1) has HIGH duplicate risk -- this is a well-known pattern in Dify and the proxy-based architecture is documented.
- The Jinja2 finding (F2) depends on sandbox service isolation quality, which requires dynamic testing to fully evaluate.
- Dify's security architecture relies heavily on the external sandbox service for code execution and the Squid proxy for SSRF prevention. Both are deployment-dependent.
- The codebase shows good security practices in many areas: parameterized SQL queries, HMAC-signed file URLs, HTML content-disposition enforcement, YAML safe_load, rate limiting, and input validation on most endpoints.