🦾 IRON: Unlocking the Full Potential of NPUs 🦾



IRON is an open-source, close-to-metal Python API enabling fast and efficient execution on AMD Ryzen™ AI NPUs. It relies on language bindings around the MLIR-AIE dialect.

The IRON Python API for Ryzen™ AI NPUs is described in the following paper:

E. Hunhoff, J. Melber, K. Denolf, A. Bisca, S. Bayliss, S. Neuendorffer, J. Fifield, J. Lo, P. Vasireddy, P. James-Roxby, E. Keller. "Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface". In 33rd IEEE International Symposium On Field-Programmable Custom Computing Machines, May 2025.

🎯 Operator Dashboard

| Section | Description | Datatype | AIE2 | AIE2P | Status | Design Example |
|---|---|---|---|---|---|---|
| Element-wise Add | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/elementwise_add/ |
| Element-wise Mul | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/elementwise_mul/ |
| GEMM | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/gemm/ |
| GEMV | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/gemv/ |
| GQA | Grouped Query Attention kernel (Single pipeline) | bfloat16 | ✓ | | 🟢 | iron/operators/mha/ |
| MHA | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | ✓ | | 🟢 | iron/operators/mha/ |
| RMSNorm | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/rms_norm/ |
| RoPE | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/rope/ |
| SiLU | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/silu/ |
| Softmax | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/softmax/ |
| Weighted RMSNorm | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/rms_norm/ |
| Copy | Copy | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/mem_copy/ |
| Transpose | Transpose | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/transpose/ |
| AXPY | AXPY | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/axpy/ |
| Reduction | Reduction | bfloat16 | | | 🟡 | |
| Dequant | Dequant Q4NX from AWQ to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/dequant/ |
| RELU | RELU | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/relu/ |
| Leaky RELU (WIP) | Leaky RELU kernel | bfloat16 | ✓ | | ⚪ | iron/operators/leaky_relu/ |
| GELU | GELU | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/gelu/ |
| LayerNorm | LayerNorm | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/layer_norm/ |
| Convolution | Convolution | bfloat16 | | | 🟡 | |
| MaxPool | MaxPool | bfloat16 | | | ⚪ | |
| AveragePool | AveragePool | bfloat16 | | | ⚪ | |
| Tanh | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/tanh/ |
| Sigmoid | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | iron/operators/sigmoid/ |

Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.

📌 Legend

| Status | Meaning |
|---|---|
| 🟢 | Done |
| 🟡 | In Development |
| ⚪ | Not Assigned |
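Most rows in the dashboard have a straightforward mathematical definition that the operator's CPU reference mirrors. As an illustration (not the repository's actual code), here is a NumPy sketch of the RMSNorm kernel's semantics. It computes in float32, since NumPy has no native bfloat16, and the `eps` value is an assumption, not taken from the repository:

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis: x / sqrt(mean(x^2) + eps).

    Illustrative CPU reference only; the NPU kernels in the dashboard
    operate on bfloat16, so float32 stands in here.
    """
    rms = np.sqrt(np.mean(x.astype(np.float32) ** 2, axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
y = rms_norm(x)
```

After normalization, the root-mean-square of each row of `y` is approximately 1, which is the property a `reference.py`-style check can assert.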

Installation (Linux)

These instructions will guide you through everything required for building and executing a program on the Ryzen™ AI NPU, starting from a fresh bare-bones Ubuntu 24.04 or Ubuntu 24.10 install.

Initial Setup

Be sure you have the latest BIOS on your laptop or mini-PC that enables the NPU. See here.

If starting from Ubuntu 24.04 you may need to update the Linux kernel to 6.11+ by installing the Hardware Enablement (HWE) stack:

```shell
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo reboot
```
1. Install the XDNA™ Driver and XRT:

   Instructions from the mlir-aie repository

2. Install the packages needed for IRON and MLIR-AIE:

   ```shell
   # Python versions 3.10, 3.12 and 3.13 are currently supported by our wheels
   sudo apt install \
       build-essential clang clang-14 lld lld-14 python3-venv python3-pip
   ```

3. Set up a virtual environment and activate it:

   ```shell
   python3 -m venv ironenv
   source ironenv/bin/activate
   python3 -m pip install --upgrade pip
   ```

4. Source XRT (installed in step 1):

   ```shell
   source /opt/xilinx/xrt/setup.sh
   ```

5. Install the required Python packages (from requirements.txt):

   ```shell
   pip install -r requirements.txt
   ```

6. To test your installation, build and run the example below:

   ```shell
   ./iron/operators/axpy/test.py
   ```
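Steps 3 and 4 must be repeated in every new shell. The two most common omissions can be caught with a small sanity check; the helper below is a hypothetical sketch (not part of the repository), and it assumes that XRT's setup.sh exports the XILINX_XRT environment variable and that step 2 puts clang on the PATH:

```python
import os
import shutil

def check_iron_env(env=None):
    """Return a list of environment problems before building IRON designs.

    Assumptions (not taken from the repository): XRT's setup.sh exports
    XILINX_XRT, and clang from step 2 must be on PATH.
    """
    env = os.environ if env is None else env
    problems = []
    if "XILINX_XRT" not in env:
        problems.append("XRT not sourced: run `source /opt/xilinx/xrt/setup.sh`")
    if shutil.which("clang") is None:
        problems.append("clang not found: install the packages from step 2")
    return problems
```

An empty list means the shell looks ready; each entry otherwise names the setup step to repeat.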

Building/Using & Testing Operators

All available operators can be found in iron/operators. Each operator directory contains:

  • op.py: The Python operator interface -- an easy entry point for integrating operators into your project. It prescribes how to compile the operator (build artifacts) and how to call it at runtime (buffer sizes, etc.).
  • design.py: The implementation of the operator's NPU code. It often references a kernel in aie_kernels for the compute-core code and describes the data movement using ObjectFIFOs.
  • reference.py: A reference CPU implementation used to validate the correctness of the NPU implementation.
  • test.py: An end-to-end test that instantiates and builds the operator, runs it, and verifies its outputs against the reference.
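As an example of the reference.py/test.py split, the AXPY operator computes y = a*x + y. The following NumPy sketch shows the validation pattern under stated assumptions; the actual files in iron/operators/axpy/ may differ, and the device output here is a stand-in with injected float error modeling bfloat16 precision:

```python
import numpy as np

def reference_axpy(a: float, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """CPU reference for AXPY: a * x + y (illustrative, not the repo's code)."""
    return a * x + y

# In a real test.py the device output would come from the NPU; here a
# stand-in with small perturbations models the bfloat16 result under test.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
y = rng.standard_normal(1024).astype(np.float32)
expected = reference_axpy(2.0, x, y)
device_out = expected + 1e-3 * rng.standard_normal(1024).astype(np.float32)
ok = np.allclose(device_out, expected, atol=1e-2)
```

The loose `atol` reflects that a bfloat16 NPU result cannot match a float32 CPU reference bit-for-bit; the tolerance value is an assumption, not the repository's.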

NOTE: Be sure the XRT setup script has been sourced and the Python environment is activated:

```shell
source /opt/xilinx/xrt/setup.sh
source /path/to/ironenv/bin/activate
```

To build and test all the operators:

```shell
pytest iron/operators/ -m "not extensive"
```

To run the extensive test suite:

```shell
pytest iron/operators/
```

To run a specific operator's tests:

```shell
pytest iron/operators/axpy/
```
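The `-m "not extensive"` filter above implies that long-running cases carry a pytest marker. A sketch of how such a test might be tagged follows; the marker name is taken from the command above, while the test name is hypothetical, and registering the marker in pytest configuration is assumed:

```python
import pytest

# Tests carrying this marker are deselected by `pytest -m "not extensive"`
# and included by the plain `pytest iron/operators/` invocation.
@pytest.mark.extensive
def test_axpy_large_sizes():
    assert True  # placeholder body for illustration
```

Marker expressions compose, so a filter like `-m "not extensive"` simply deselects every test carrying the `extensive` mark.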

Git Hooks (Optional but Recommended)

To ensure your code passes CI linting checks before pushing, install the pre-push hook:

```shell
cp scripts/hooks/pre-push .git/hooks/pre-push
chmod +x .git/hooks/pre-push
```

The hook will run the same linting checks as CI:

  • License checks (reuse)
  • Python formatting (black)
  • C++ formatting (clang-format)

To bypass the hook if needed: `git push --no-verify`


Copyright © 2025 Advanced Micro Devices, Inc.
