
PyTorch Integration

IREE supports compiling and running PyTorch programs represented as nn.Module classes as well as models defined using functorch.


Install IREE pip packages, either from pip or by building from source:

pip install \
  iree-compiler \

Install torch-mlir, which is needed to compile PyTorch models into a format IREE can execute:

pip install -f torch-mlir

A special iree_torch package makes it easy to compile PyTorch programs and run them on IREE:

pip install git+

Running a model

Going from a loaded PyTorch model to one that's executing on IREE happens in four steps:

  1. Compile the model to MLIR
  2. Compile the MLIR to IREE VM flatbuffer
  3. Load the VM flatbuffer into IREE
  4. Execute the model via IREE
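The four steps above form a simple chain in which each step consumes the previous step's output. The sketch below shows that flow. It is a minimal illustration, not part of iree_torch: the function name run_on_iree and its parameters are hypothetical, and each stage is passed in as a callable so the chain can run (and be tested) without IREE or torch-mlir installed.

```python
from typing import Any, Callable


def run_on_iree(
    model: Any,
    example_input: Any,
    to_mlir: Callable[[Any, Any], Any],
    to_vmfb: Callable[[Any, str], bytes],
    load: Callable[[bytes, str], Any],
    backend: str = "llvm-cpu",
) -> Any:
    """Hypothetical helper chaining the four steps; stages are injected."""
    mlir = to_mlir(model, example_input)   # 1. compile the model to MLIR
    vmfb = to_vmfb(mlir, backend)          # 2. compile MLIR to a VM flatbuffer
    invoker = load(vmfb, backend)          # 3. load the flatbuffer into IREE
    return invoker.forward(example_input)  # 4. execute the model via IREE
```

With the real packages you would pass a wrapper such as lambda m, x: torch_mlir.compile(m, x, output_type="linalg-on-tensors", use_tracing=True), along with iree_torch.compile_to_vmfb and iree_torch.load_vmfb. Note that the same backend string is used for both compiling and loading.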


In the following steps, we borrow the model from this BERT colab and assume it is available as model.

Compile the model to MLIR

First, we need to trace and compile our model to MLIR:

import torch_mlir

model = ...  # the model we're compiling
example_input = ...  # an input to the model with the expected shape and dtype
mlir = torch_mlir.compile(
    model, example_input, output_type="linalg-on-tensors", use_tracing=True)

The full list of available output types can be found here; it includes linalg-on-tensors, mhlo, and tosa.

Compile the MLIR to an IREE VM flatbuffer

Next, we compile the resulting MLIR to IREE's deployable file format:

import iree_torch

iree_backend = "llvm-cpu"  # compile for CPUs using LLVM
iree_vmfb = iree_torch.compile_to_vmfb(mlir, iree_backend)

Here we choose which backend to target; llvm-cpu compiles for CPUs using LLVM. See the Deployment Configurations section of this site for a full list of targets and configurations.

The generated flatbuffer can now be serialized and stored for later use, or loaded and executed immediately.
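Storing the flatbuffer for later use amounts to writing its bytes to a file and reading them back. The sketch below assumes that iree_torch.compile_to_vmfb returns the flatbuffer as a bytes object; the helper names write_vmfb and read_vmfb are hypothetical and use only the Python standard library.

```python
from pathlib import Path
from typing import Union


def write_vmfb(vmfb: bytes, path: Union[str, Path]) -> Path:
    """Hypothetical helper: save a compiled VM flatbuffer to disk."""
    path = Path(path)
    path.write_bytes(vmfb)  # the flatbuffer is opaque binary data
    return path


def read_vmfb(path: Union[str, Path]) -> bytes:
    """Hypothetical helper: read a previously saved flatbuffer back."""
    return Path(path).read_bytes()
```

For example, write_vmfb(iree_vmfb, "bert_llvm_cpu.vmfb") after compiling, and later iree_torch.load_vmfb(read_vmfb("bert_llvm_cpu.vmfb"), iree_backend), assuming load_vmfb accepts those bytes. A flatbuffer is compiled for one target, so it should be loaded with the same backend it was compiled for; putting the backend in the (hypothetical) file name is one way to keep track of that.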

Load the VM flatbuffer into IREE

Next, we load the flatbuffer into the IREE runtime. iree_torch provides a convenience method for loading this flatbuffer from Python:

invoker = iree_torch.load_vmfb(iree_vmfb, iree_backend)

Execute the model via IREE

Finally, we can execute the loaded model:

result = invoker.forward(example_input)
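The call is spelled invoker.forward because, we assume, the object returned by load_vmfb exposes each function exported by the compiled module as a method, and torch-mlir names the traced entry point forward. The sketch below is a hypothetical, stdlib-only illustration of that dispatch pattern, not iree_torch's implementation; the exports dictionary stands in for a loaded IREE module.

```python
from typing import Any, Callable, Dict


class Invoker:
    """Hypothetical sketch: expose a module's exported functions as methods."""

    def __init__(self, exports: Dict[str, Callable[..., Any]]):
        # Stand-in for the functions of a loaded IREE module.
        self._exports = exports

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only called when normal attribute lookup fails, e.g. for `forward`.
        try:
            return self._exports[name]
        except KeyError:
            raise AttributeError(f"module has no exported function {name!r}") from None
```

Under this pattern, a module exporting several functions (for example a training step alongside forward) would make each of them callable the same way.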


Training

Training PyTorch models with IREE is supported via functorch. Once the model is defined, the steps for loading it into IREE are nearly identical to the example above.
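What makes this work is that functorch lets a training step be written as a pure function that maps (parameters, batch) to new parameters, with no hidden optimizer state. Such a function can then be traced and compiled like forward. The sketch below shows the shape of such a step for a one-variable linear regression. It is an assumption-laden illustration, not the linked example's code: it uses plain Python with a hand-derived gradient where functorch.grad would normally compute it, and the names train_step and mse_grad are hypothetical.

```python
from typing import List, Tuple

Params = Tuple[float, float]  # (weight, bias) of the model y = w * x + b


def mse_grad(params: Params, xs: List[float], ys: List[float]) -> Params:
    """Gradient of the mean squared error with respect to (w, b)."""
    w, b = params
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def train_step(params: Params, xs: List[float], ys: List[float],
               lr: float = 0.05) -> Params:
    """Pure function: old parameters + batch -> new parameters."""
    dw, db = mse_grad(params, xs, ys)
    w, b = params
    return w - lr * dw, b - lr * db  # plain gradient descent update


# Usage: fit y = 2x + 1 by repeatedly applying the pure step.
params: Params = (0.0, 0.0)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
for _ in range(2000):
    params = train_step(params, xs, ys)
```

Because the step has no side effects, the same compiled function can be called repeatedly from the host loop, and its output parameters can be fed into an inference function.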

You can find a full end-to-end example of defining a basic regression model, training with it, and running inference on it here.

Native / On-device Training

A small (~100-250KB), self-contained binary can be built for deployment to resource-constrained environments; this example shows how. The binary runs a model without needing a Python interpreter.


Colab notebooks
  - Inference on BERT
Example scripts
  - Basic Inference and Training Example
  - Native On-device Training Example