Penetration testing, a crucial industrial practice for ensuring system security,
has traditionally resisted automation due to the extensive expertise required of
human professionals. We explore the potential of Large Language Models (LLMs) to
revolutionize this field.
We establish a comprehensive benchmark using real-world penetration testing targets,
encompassing both vulnerable machines and Capture The Flag (CTF) challenges. Our findings reveal that
while LLMs demonstrate proficiency in specific sub-tasks—such as using testing tools,
interpreting outputs, and proposing subsequent actions—they encounter difficulties
maintaining an overall testing context.
To address these limitations, we introduce PENTESTGPT, an LLM-empowered automated
penetration testing framework featuring three self-interacting modules. Our evaluation
shows that PENTESTGPT attains a task-completion rate 228.6% higher than that of GPT-3.5
and proves effective in real-world penetration testing scenarios.
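The abstract does not specify how the three self-interacting modules cooperate, but the division of labor it implies (tracking the testing context, choosing the next action, and interpreting tool output) can be pictured as a simple feedback loop. The sketch below is illustrative only, assuming hypothetical module names, a placeholder `query_llm` interface, and an externally supplied `run_commands` executor; none of these details are taken from the paper itself.

```python
# Illustrative sketch of an LLM-driven penetration-testing loop with three
# cooperating modules. Module names and interfaces are assumptions for
# exposition, not the paper's actual design.

from dataclasses import dataclass, field


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend; replace with a real client."""
    raise NotImplementedError


@dataclass
class ReasoningModule:
    """Maintains the overall testing context as an accumulating task history."""
    history: list[str] = field(default_factory=list)

    def next_task(self, findings: str) -> str:
        self.history.append(findings)
        return query_llm(
            "Given the findings so far:\n"
            + "\n".join(self.history)
            + "\nName the single most promising next sub-task."
        )


@dataclass
class GenerationModule:
    """Turns an abstract sub-task into concrete tool commands."""

    def commands_for(self, sub_task: str) -> str:
        return query_llm(f"Produce shell commands to perform: {sub_task}")


@dataclass
class ParsingModule:
    """Condenses raw tool output so it fits back into the reasoning context."""

    def summarize(self, raw_output: str) -> str:
        return query_llm(f"Summarize the security-relevant findings in:\n{raw_output}")


def pentest_loop(initial_findings: str, run_commands, rounds: int = 5) -> None:
    """Drive a fixed number of reason -> generate -> execute -> parse iterations."""
    reasoning, generation, parsing = ReasoningModule(), GenerationModule(), ParsingModule()
    findings = initial_findings
    for _ in range(rounds):
        sub_task = reasoning.next_task(findings)      # decide what to try next
        commands = generation.commands_for(sub_task)  # concretize into commands
        raw = run_commands(commands)                  # executed outside the LLM
        findings = parsing.summarize(raw)             # feed condensed results back
```

Keeping a dedicated reasoning component with its own task history is one way to address the context-maintenance weakness noted above, since the generation and parsing steps never need to see the full interaction transcript.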