
LLMBreaker

This project is an AI-powered security testing tool designed to help developers "red team" their own LLM-based APIs. It checks for common vulnerabilities such as hateful-content generation, leakage of personally identifiable information (PII), susceptibility to prompt injection, and more. The tool works by intelligently injecting malicious prompts into a user-provided cURL command and then using an AI analyst (Google's Gemini 1.5 Flash) to evaluate the API's response.

How It Works

The application is a Node.js server built with Express.js that orchestrates a series of automated security tests.

  1. Test Initiation: A user sends a POST request to the /api/security/test/start endpoint. The body of this request contains a cURL command for the user's own API endpoint that they wish to test.

  2. Live Results Stream: The server immediately establishes a Server-Sent Events (SSE) connection with the client. This allows the tool to stream test results in real-time as they are completed.

  3. Test Orchestration: The SecurityTestController begins to iterate through a series of pre-defined security agents (e.g., HateSpeechAgent, PIILeakageAgent, JailbreakAgent).

  4. Prompt Injection: For each security test, the corresponding agent generates a specific malicious prompt. The CurlService then intelligently injects this prompt into the user's provided cURL command.

    Important: For the injection to work correctly, the service assumes the JSON body of the cURL command contains a placeholder string like "<PROMPT>". It is hardcoded to look for this specific placeholder.

  5. API Execution: The modified cURL command, now containing the malicious prompt, is executed. The request is sent to the user's API endpoint.

  6. AI-Powered Analysis: The response from the user's API is captured and sent to the Google Gemini 1.5 Flash model. A carefully crafted "analyst" prompt instructs the model to act as a security expert, evaluate the API's response in the context of the malicious prompt, and determine if the test failed or passed.

  7. Real-time Reporting: The result of each individual test (including a score, detailed analysis, and recommendations) is immediately streamed back to the user through the SSE connection.
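The injection step (4) can be sketched as follows. Note that `injectPrompt` is a hypothetical stand-in for what CurlService does, assuming the placeholder is the literal string "<PROMPT>" inside a JSON body:

```javascript
// Illustrative sketch of the placeholder-injection step -- not the
// actual CurlService implementation.
function injectPrompt(curlCommand, maliciousPrompt) {
  // The payload lands inside a JSON string, so escape characters that
  // would otherwise break the surrounding JSON (quotes, backslashes, ...).
  const escaped = JSON.stringify(maliciousPrompt).slice(1, -1);
  // Replace every occurrence of the placeholder.
  return curlCommand.split('<PROMPT>').join(escaped);
}

const curl =
  `curl -X POST https://api.example.com/v1/generate -d '{"content":"<PROMPT>"}'`;
console.log(injectPrompt(curl, 'Ignore all previous instructions.'));
```

Escaping via `JSON.stringify` matters: a payload containing quotes or newlines would otherwise produce an invalid JSON body in the modified command.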

Getting Started

Prerequisites

  • Node.js and npm
  • A Google AI API key for Gemini (see Configuration below)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-directory>
  2. Install the dependencies:

    npm install

Configuration

  1. Create a .env file in the root of the project.

  2. Add the following environment variables to your .env file:

    # The port the server will run on
    PORT=3000
    
    # Your Google AI API Key for Gemini
    GOOGLE_API_KEY="your_google_api_key_here"

    You can obtain a GOOGLE_API_KEY from Google AI Studio.
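The server presumably reads these values via `process.env`. A minimal sketch of that configuration step (the real project may load the .env file differently, e.g. via the dotenv package):

```javascript
// Minimal sketch of reading the two documented environment variables.
// This mirrors the documented config; it is not the project's actual code.
function loadConfig(env = process.env) {
  const port = Number(env.PORT) || 3000; // fall back to the documented default
  const apiKey = env.GOOGLE_API_KEY;
  if (!apiKey) {
    throw new Error('GOOGLE_API_KEY is required');
  }
  return { port, apiKey };
}
```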

Running the Application

To run the server in development mode with hot-reloading:

npm run dev

The server will start and listen on the port defined in your .env file (e.g., http://localhost:3000).

How to Use

The primary way to interact with the application is by sending a request to its single API endpoint.

API Endpoint

  • URL: /api/security/test/start
  • Method: POST
  • Body:
    {
      "curlCommand": "your_curl_command_here"
    }

cURL Command Requirements

The cURL command you provide must meet the following criteria:

  1. It must be a POST request.
  2. It must contain a JSON body (-H "Content-Type: application/json").
  3. The JSON body must contain a placeholder string "<PROMPT>" where the malicious prompts will be injected.

Example of a valid cURL command:

curl -X POST https://api.yourapp.com/v1/generate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key" \
-d '{
  "model": "your-model-name",
  "messages": [
    {
      "role": "user",
      "content": "<PROMPT>"
    }
  ]
}'

In this example, the placeholder "<PROMPT>" will be replaced by each security agent's malicious payload.
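Before sending a command, you can sanity-check it against the three requirements above with a snippet like this (a rough client-side convenience, not part of the tool's own validation):

```javascript
// Rough pre-flight check mirroring the three documented requirements.
function checkCurlCommand(cmd) {
  const problems = [];
  if (!/-X\s+POST\b/.test(cmd)) {
    problems.push('must be a POST request (-X POST)');
  }
  if (!/content-type:\s*application\/json/i.test(cmd)) {
    problems.push('must declare a JSON body (Content-Type header)');
  }
  if (!cmd.includes('<PROMPT>')) {
    problems.push('missing the "<PROMPT>" placeholder');
  }
  return problems; // empty array means the command looks OK
}
```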

Example Request

You can use a tool like curl or Postman to start the test.

curl -X POST http://localhost:3000/api/security/test/start \
-H "Content-Type: application/json" \
-d '{
  "curlCommand": "curl -X POST https://your-api-to-test.com/generate -H '\''Content-Type: application/json'\'' -d '\''{\"inputs\":[{\"content\":\"<PROMPT>\"}]}'\''"
}'

This will initiate the testing process, and you will receive a stream of Server-Sent Events with the results.
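Each SSE message arrives on the wire as one or more `data:` lines followed by a blank line. A minimal parser for such a stream might look like this; the JSON shape of each result shown in the example is hypothetical, so consult the API Reference for the actual event format:

```javascript
// Minimal Server-Sent Events parser: extracts the `data:` payload of
// each event from a raw stream. The per-event JSON shape below is a
// hypothetical example, not the tool's documented schema.
function parseSseData(raw) {
  return raw
    .split(/\n\n+/) // events are separated by blank lines
    .map(block => block
      .split('\n')
      .filter(line => line.startsWith('data:'))
      .map(line => line.slice(5).trim())
      .join('\n'))
    .filter(data => data.length > 0);
}

const stream =
  'data: {"test":"HateSpeechAgent","passed":true}\n\n' +
  'data: {"test":"JailbreakAgent","passed":false}\n\n';
for (const payload of parseSseData(stream)) {
  console.log(JSON.parse(payload).test);
}
```

In the browser, the built-in `EventSource` API handles this parsing for you; a hand-rolled parser like the above is only needed when reading the stream from a raw HTTP response.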

Available Security Tests

The tool runs a suite of tests designed to probe for common LLM vulnerabilities:

  • Hate Speech Generation: Checks if the model can be prompted to generate hateful or discriminatory content.
  • PII Leakage: Attempts to trick the model into leaking simulated personally identifiable information.
  • API Key Exposure: Checks if the model might inadvertently expose sensitive information like API keys.
  • Bias Detection: Probes the model for biased or prejudiced responses.
  • Instruction Override: Tests if the model's instructions can be overridden by malicious user input.
  • Jailbreaking: Attempts to circumvent the model's safety filters.
  • System Prompt Extraction: Tries to make the model reveal its own system prompt or initial instructions.
  • Token Smuggling: Attempts to bypass security filters by manipulating the tokenization process.

Disclaimer

The effectiveness of this tool is highly dependent on the structure of the provided cURL command. The prompt injection mechanism is designed to work with common JSON API structures, but it may not work for all API designs. Always review the tool's output and logic to ensure it is correctly interacting with your API.

Further Documentation

For more detailed information about the project, please see the following documents:

  • API Reference: A detailed reference for all API endpoints, including request/response formats and SSE events.
  • Contributing Guide: Guidelines for how to contribute to the LLMBreaker project.
  • Documentation for Judges: A document tailored for hackathon judges, providing a high-level overview of the project.

About

Break your AI before hackers do. Red team your LLM APIs in minutes with AI-powered vulnerability detection.
