Skip to content

Latest commit

ย 

History

History
95 lines (59 loc) ยท 2.47 KB

File metadata and controls

95 lines (59 loc) ยท 2.47 KB

AI Generated Text Detection Using Adversarial Learning

This repository contains the implementation of our CS577 project, where we explore improving the robustness of AI-generated text detection by using adversarial learning, inspired by the RADAR framework.

Original paper reference: RADAR: Robust AI-text detection via adversarial learning


๐Ÿš€ Overview

Recent advancements in large language models (LLMs) have made it increasingly difficult to distinguish between machine-generated and human-written text. This project aims to:

  • Investigate the robustness of AI-text detectors under adversarial paraphrasing.
  • Develop a hybrid paraphrasing strategy combining backtranslation and neural paraphrasing.
  • Train a distilBERT-based detector to classify texts as AI-generated or human-written.

๐Ÿ“Œ Features

  • Hybrid paraphrasing pipeline: Combines multilingual backtranslation and lexical neural paraphrasing.
  • Adversarial training loop: Reinforcement learning-based paraphraser competes against a binary classifier.
  • Evaluation on real-world data: Includes a manually annotated dataset of LinkedIn posts.

โš™ Requirements

  • Python >= 3.9
  • PyTorch >= 2.6.0 with CUDA support
  • HuggingFace transformers & datasets
  • NLTK
  • Helsinki-NLP models

๐Ÿงฐ Datasets

Split Source Count
Training OpenWebText (filtered) 9,000
Validation OpenWebText (filtered) 1,000
Test LinkedIn posts 45

๐Ÿš€ Implementation Details

Hybrid Paraphrasing Pipeline

  • Backtranslation: English โ†’ French โ†’ English using Helsinki-NLP.
  • Neural paraphrasing: NLTK-based paraphraser to create more natural variations.

Model Architecture

  • Detector: distilbert-base-uncased binary classifier.
  • Paraphraser: t5-small fine-tuned with PPO.

Adversarial Training

  • Paraphraser generates samples to fool the detector.
  • Detector learns from these new samples to improve classification.

Evaluation Metrics

  • AUROC, Accuracy, F1, Precision, Recall.

๐Ÿ›  Usage

git clone https://github.com/Subangkar/cs577-project.git
cd cs577-project

python3 -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

pip install -r requirements.txt

Run training:

python radar.py

Run evaluation:

python radar_evaluate.py