CVE-2024-0243
February 24, 2024
With the following crawler configuration:
```python
from bs4 import BeautifulSoup as Soup
url = "https://example.com"
loader = RecursiveUrlLoader(
url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()
```
An attacker in control of the contents of `https://example.com` could place a malicious HTML file in there with links like "https://example.completely.different/my_file.html" and the crawler would proceed to download that file as well even though `prevent_outside=True`.
https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51
Resolved in https://github.com/langchain-ai/langchain/pull/15559
Affected Packages
langchain (PYTHON):
Affected version(s) >=0.0.1 <0.1.0Fix Suggestion:
Update to version 0.1.0Related Resources (7)
Do you need more information?
Contact UsCVSS v4
Base Score:
1
Attack Vector
LOCAL
Attack Complexity
HIGH
Attack Requirements
NONE
Privileges Required
HIGH
User Interaction
PASSIVE
Vulnerable System Confidentiality
LOW
Vulnerable System Integrity
LOW
Vulnerable System Availability
NONE
Subsequent System Confidentiality
LOW
Subsequent System Integrity
LOW
Subsequent System Availability
NONE
CVSS v3
Base Score:
8.1
Attack Vector
NETWORK
Attack Complexity
HIGH
Privileges Required
NONE
User Interaction
NONE
Scope
UNCHANGED
Confidentiality
HIGH
Integrity
HIGH
Availability
HIGH
Weakness Type (CWE)
Server-Side Request Forgery (SSRF)
EPSS
Base Score:
0.07