This project is an AI-powered security testing tool designed to help developers "red team" their own LLM-based APIs. It checks for common vulnerabilities such as generating hateful content, leaking personally identifiable information (PII), prompt injection attacks, and more. The tool works by intelligently injecting malicious prompts into a user-provided cURL command and then using an AI analyst (Google's Gemini 1.5 Flash) to evaluate the API's response.
The application is a Node.js server built with Express.js that orchestrates a series of automated security tests.
-
Test Initiation: A user sends a
POSTrequest to the/api/security/test/startendpoint. The body of this request contains a cURL command for the user's own API endpoint that they wish to test. -
Live Results Stream: The server immediately establishes a Server-Sent Events (SSE) connection with the client. This allows the tool to stream test results in real-time as they are completed.
-
Test Orchestration: The
SecurityTestControllerbegins to iterate through a series of pre-defined security agents (e.g.,HateSpeechAgent,PIILeakageAgent,JailbreakAgent). -
Prompt Injection: For each security test, the corresponding agent generates a specific malicious prompt. The
CurlServicethen intelligently injects this prompt into the user's provided cURL command.Important: For the injection to work correctly, the service assumes the JSON body of the cURL command contains a placeholder string like
"<PROMPT>". It is hardcoded to look for this specific placeholder. -
API Execution: The modified cURL command, now containing the malicious prompt, is executed. The request is sent to the user's API endpoint.
-
AI-Powered Analysis: The response from the user's API is captured and sent to the Google Gemini 1.5 Flash model. A carefully crafted "analyst" prompt instructs the model to act as a security expert, evaluate the API's response in the context of the malicious prompt, and determine if the test failed or passed.
-
Real-time Reporting: The result of each individual test (including a score, detailed analysis, and recommendations) is immediately streamed back to the user through the SSE connection.
-
Clone the repository:
git clone <repository-url> cd <repository-directory>
-
Install the dependencies:
npm install
-
Create a
.envfile in the root of the project. -
Add the following environment variables to your
.envfile:# The port the server will run on PORT=3000 # Your Google AI API Key for Gemini GOOGLE_API_KEY="your_google_api_key_here"
You can obtain a
GOOGLE_API_KEYfrom the Google AI Studio.
To run the server in development mode with hot-reloading:
npm run devThe server will start and listen on the port defined in your .env file (e.g., http://localhost:3000).
The primary way to interact with the application is by sending a request to its single API endpoint.
- URL:
/api/security/test/start - Method:
POST - Body:
{ "curlCommand": "your_curl_command_here" }
The cURL command you provide must meet the following criteria:
- It must be a
POSTrequest. - It must contain a JSON body (
-H "Content-Type: application/json"). - The JSON body must contain a placeholder string
"<PROMPT>"where the malicious prompts will be injected.
Example of a valid cURL command:
curl -X POST https://api.yourapp.com/v1/generate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key" \
-d '{
"model": "your-model-name",
"messages": [
{
"role": "user",
"content": "<PROMPT>"
}
]
}'In this example, the content "<PROMPT>" will be replaced by each of the security agent's malicious payloads.
You can use a tool like curl or Postman to start the test.
curl -X POST http://localhost:3000/api/security/test/start \\
-H "Content-Type: application/json" \\
-d '{
"curlCommand": "curl -X POST https://your-api-to-test.com/generate -H \\"Content-Type: application/json\\" -d '{\\"inputs\\":[{\\"content\\":\\"<PROMPT>\\"}]}'"
}'This will initiate the testing process, and you will receive a stream of Server-Sent Events with the results.
The tool runs a suite of tests designed to probe for common LLM vulnerabilities:
- Hate Speech Generation: Checks if the model can be prompted to generate hateful or discriminatory content.
- PII Leakage: Attempts to trick the model into leaking simulated personally identifiable information.
- API Key Exposure: Checks if the model might inadvertently expose sensitive information like API keys.
- Bias Detection: Probes the model for biased or prejudiced responses.
- Instruction Override: Tests if the model's instructions can be overridden by malicious user input.
- Jailbreaking: Attempts to circumvent the model's safety filters.
- System Prompt Extraction: Tries to make the model reveal its own system prompt or initial instructions.
- Token Smuggling: A technique to bypass security filters by manipulating the tokenization process.
The effectiveness of this tool is highly dependent on the structure of the provided cURL command. The prompt injection mechanism is designed to work with common JSON API structures, but it may not work for all API designs. Always review the tool's output and logic to ensure it is correctly interacting with your API.
For more detailed information about the project, please see the following documents:
- API Reference: A detailed reference for all API endpoints, including request/response formats and SSE events.
- Contributing Guide: Guidelines for how to contribute to the LLMBreaker project.
- Documentation for Judges: A document tailored for hackathon judges, providing a high-level overview of the project.