🌿 Parsely

Herbarium Specimen Digitization Platform

Parsely Core + Studio help herbaria, museums, and researchers digitize large volumes of specimen labels with the latest AI models in a clean, intuitive workflow.

🔗 Live demo (pre-alpha): parselystudio.com

⚠️ This demo is a pre-alpha release. Features are incomplete and downtime is expected. For stable local use, see the Getting Started section below.

✨ What it does

Given a set of specimen label images, Parsely can:

Preprocess images — crop, deskew, and auto-rotate to prepare for AI extraction.
Run OCR — call Google Vision OCR (or other engines) to extract text from images.
Extract structured data — send the OCR text and images to an LLM via OpenRouter (currently Gemini 2.5 Pro), which parses them into specimen fields (e.g., catalog number, taxon, collector) following the Darwin Core standard.
Edit + review — provide a simple web UI for curators to view images, edit predictions, and export results to CSV.
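
The extract-and-export steps can be pictured as turning each reviewed prediction into a flat record keyed by Darwin Core terms and writing those records to CSV. The sketch below is illustrative only: `SpecimenRecord` and `records_to_csv` are hypothetical names, and the columns (`catalogNumber`, `scientificName`, `recordedBy`, `eventDate`, `locality`) are an assumed subset of Darwin Core, not necessarily the fields Parsely actually exports.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class SpecimenRecord:
    # Hypothetical record type; field names are Darwin Core terms (illustrative subset).
    catalogNumber: str
    scientificName: str
    recordedBy: str
    eventDate: str = ""
    locality: str = ""


def records_to_csv(records: List[SpecimenRecord]) -> str:
    """Serialize reviewed records to CSV text with Darwin Core column headers."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(SpecimenRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up example values, not real specimen data.
    rows = [SpecimenRecord("EXAMPLE-0001", "Quercus alba", "A. Collector", "1952-05-14")]
    print(records_to_csv(rows))
```

Keeping the dataclass field names identical to Darwin Core terms means the CSV header needs no separate mapping step.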

🚀 Getting Started

1. Prerequisites

Make sure the following are installed on your system:

Python 3.11
Node.js 20 and npm
System packages required by OpenCV and HEIF support. On Debian/Ubuntu:
```
sudo apt-get install -y libgl1 libglib2.0-0 libheif1 libde265-0
```

If you don’t have Poetry yet:

```
pip3 install poetry
```
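
If you want to confirm the tools above are on your PATH before installing, a small script like the following can help. It is a minimal sketch, not part of Parsely: `check_prerequisites` is a hypothetical helper, it assumes any Python 3.11+ and Node.js 20+ are acceptable, and it does not check the Debian/Ubuntu system packages.

```python
import shutil
import subprocess
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether the tools listed under Prerequisites appear to be available."""
    # Assumption: any Python >= 3.11 and Node.js >= 20 is acceptable.
    results = {"python>=3.11": sys.version_info >= (3, 11)}
    node = shutil.which("node")
    node_ok = False
    if node:
        # `node --version` prints e.g. "v20.11.1"; keep only the major version.
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        major = out.strip().lstrip("v").split(".")[0]
        node_ok = major.isdigit() and int(major) >= 20
    results["node>=20"] = node_ok
    results["npm"] = shutil.which("npm") is not None
    results["poetry"] = shutil.which("poetry") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```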

2. Clone and install

```
git clone https://github.com/<your-user>/herbarium-processor.git
cd herbarium-processor
poetry install
cd src/herbarium_processor/web/frontend
npm ci
cd -
```

3. Configure environment

Create a .env file in the project root with:

```
OPENROUTER_API_KEY=your_openrouter_key_here
GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
```
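
How Parsely itself reads `.env` is not described here, but a quick sanity check can catch the most common setup mistakes, such as a missing key or a wrong credentials path. This is a minimal, stdlib-only sketch with hypothetical helpers `parse_env_text` and `check_env`; it only understands plain `KEY=value` lines and is not a full dotenv parser.

```python
from pathlib import Path
from typing import Dict, List

# Keys from the .env example above; the check itself is an illustrative assumption.
REQUIRED_KEYS = ("OPENROUTER_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "/path/to/creds.json".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"{key} is missing" for key in REQUIRED_KEYS if not values.get(key)]
    creds = values.get("GOOGLE_APPLICATION_CREDENTIALS")
    if creds and not Path(creds).is_file():
        problems.append(f"credentials file not found: {creds}")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.is_file():
        print("No .env file in the current directory")
    else:
        issues = check_env(parse_env_text(env_path.read_text(encoding="utf-8")))
        print("\n".join(issues) if issues else ".env looks OK")
```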

Optional: install pre-commit hooks (we use this to strip notebook metadata):

```
poetry run pre-commit install
```
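
The exact hook this repository runs is not spelled out here. As an illustration of what stripping notebook metadata usually involves, the sketch below clears code-cell outputs and execution counts and drops cell-level and most notebook-level metadata. `strip_notebook`, `strip_file`, and the choice to keep `kernelspec` and `language_info` are assumptions, not the project's actual configuration.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict

# Assumption: keep only these notebook-level metadata keys; drop everything else.
KEEP_NOTEBOOK_METADATA = ("kernelspec", "language_info")


def strip_notebook(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed .ipynb with outputs and volatile metadata removed."""
    cleaned = json.loads(json.dumps(nb))  # deep copy; notebooks are plain JSON
    meta = cleaned.get("metadata", {})
    cleaned["metadata"] = {k: v for k, v in meta.items() if k in KEEP_NOTEBOOK_METADATA}
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: Path) -> None:
    """Rewrite a notebook file in place with its outputs and metadata stripped."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    # indent=1 mirrors the JSON layout Jupyter writes, which keeps diffs small.
    text = json.dumps(strip_notebook(nb), indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        strip_file(Path(arg))
```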

🖥️ Usage

Option A: Web App

Start the server:
```
poetry run dev
```
Open the frontend at http://localhost:5173/
The API server runs at http://localhost:8000/
Upload images → edit predictions → finalize CSV.
Processed files are stored in /tmp.
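
After `poetry run dev` starts, you can confirm that both the frontend (port 5173) and the API server (port 8000) are accepting connections. This hypothetical helper only tests whether a TCP connection succeeds; it does not call any Parsely endpoint, since the API routes are not documented here.

```python
import socket

# Ports taken from the Usage section above.
SERVICES = {"frontend": ("localhost", 5173), "api": ("localhost", 8000)}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name:8} http://{host}:{port}/ {state}")
```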

Option B: Notebook

Open notebooks/herbarium_processor.ipynb.
Point it to a directory of images (img/bucket).
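
A notebook cell for gathering the input files might look like the following. This is a sketch under assumptions: `find_images` and `is_image` are hypothetical helpers, and the extension list is a guess (HEIF/HEIC is included only because the prerequisites mention HEIF support).

```python
from pathlib import Path
from typing import List, Union

# Assumed extension list; not taken from the Parsely source.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".heic", ".heif"}


def is_image(path: Path) -> bool:
    """True if the file extension (case-insensitive) is in IMAGE_EXTENSIONS."""
    return path.suffix.lower() in IMAGE_EXTENSIONS


def find_images(directory: Union[str, Path]) -> List[Path]:
    """Return image files directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    return sorted(p for p in root.iterdir() if p.is_file() and is_image(p))


if __name__ == "__main__":
    bucket = Path("img/bucket")
    if bucket.is_dir():
        for image in find_images(bucket):
            print(image)
    else:
        print(f"{bucket} not found; point this at your own image directory")
```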

📜 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE for details. See NOTICE and COPYRIGHT for attribution and trademark information.
