Install the IREE compiler and runtime pip packages, either from prebuilt releases or by building from source:
```shell
pip install \
  iree-compiler \
  iree-runtime
```
Install torch-mlir, which is necessary for compiling PyTorch models to a format IREE can execute:
```shell
pip install -f https://llvm.github.io/torch-mlir/package-index/ torch-mlir
```
The iree_torch package makes it easy to compile PyTorch programs and run them on IREE:
```shell
pip install git+https://github.com/iree-org/iree-torch.git
```
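A quick sanity check that all of the pieces installed correctly is to import them (the `iree.compiler` and `iree.runtime` modules come from the iree-compiler and iree-runtime packages):

```python
# All four imports should succeed without error.
import iree.compiler
import iree.runtime
import torch_mlir
import iree_torch
```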
## Running a model
Going from a loaded PyTorch model to one that's executing on IREE happens in four steps:
1. Compile the model to MLIR
2. Compile the MLIR to an IREE VM flatbuffer
3. Load the VM flatbuffer into IREE
4. Execute the model via IREE
In the following steps, we'll be borrowing the model from
this BERT colab
and assuming it is available as `model`.
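If you just want to follow along without the BERT colab, a tiny hypothetical stand-in model works for every step below; the `TinyModel` class, its layer sizes, and the input shape here are illustrative assumptions, not part of the BERT example:

```python
import torch

# Hypothetical stand-in model, used here only for illustration.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_input = torch.randn(1, 4)
```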
### Compile the model to MLIR
First, we need to trace and compile our model to MLIR:
```python
import torch_mlir

model = ...          # the model we're compiling
example_input = ...  # an input to the model with the expected shape and dtype

mlir = torch_mlir.compile(
    model,
    example_input,
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)
```
The full list of available output types can be found here and includes linalg-on-tensors, MHLO, and TOSA.
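For example, lowering to the TOSA dialect instead is just a matter of changing `output_type` (this sketch assumes the same `model` and `example_input` as above):

```python
# Same tracing flow, but emitting the TOSA dialect instead of
# linalg-on-tensors.
tosa_mlir = torch_mlir.compile(
    model,
    example_input,
    output_type=torch_mlir.OutputType.TOSA,
    use_tracing=True)
```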
### Compile the MLIR to an IREE VM flatbuffer
Next, we compile the resulting MLIR to IREE's deployable file format:
```python
import iree_torch

iree_backend = "llvm-cpu"
iree_vmfb = iree_torch.compile_to_vmfb(mlir, iree_backend)
```
Here we choose which backend to target. See the Deployment Configurations section of this site for a full list of targets and configurations.
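For example, assuming a Vulkan-capable device and driver are available, the same model can be compiled for the GPU by swapping the backend string:

```python
# "vulkan-spirv" is one of IREE's other target backends.
iree_backend = "vulkan-spirv"
iree_vmfb = iree_torch.compile_to_vmfb(mlir, iree_backend)
```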
The generated flatbuffer can now be serialized and stored for another time or loaded and executed immediately.
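A minimal sketch of round-tripping the artifact to disk, assuming `iree_vmfb` behaves like a bytes object (the file name is arbitrary):

```python
# Persist the compiled artifact...
with open("model.vmfb", "wb") as f:
    f.write(iree_vmfb)

# ...and read it back later for loading into the runtime.
with open("model.vmfb", "rb") as f:
    iree_vmfb = f.read()
```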
### Load the VM flatbuffer into IREE
Next, we load the flatbuffer into the IREE runtime.
iree_torch provides a
convenience method for loading this flatbuffer from Python:
```python
invoker = iree_torch.load_vmfb(iree_vmfb, iree_backend)
```
### Execute the model via IREE
Finally, we can execute the loaded model:
```python
result = invoker.forward(example_input)
```
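As a sanity check, you can compare against eager PyTorch execution; this sketch assumes the model produces a single tensor and that `invoker.forward` returns a comparable `torch.Tensor`:

```python
import torch

# Run the same input through eager PyTorch and compare numerically.
expected = model(example_input)
torch.testing.assert_close(result, expected, rtol=1e-4, atol=1e-4)
```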
## Training

Training with PyTorch in IREE is supported via
functorch. The steps for
loading the model into IREE, once it is defined, are nearly identical to the inference workflow above.
You can find a full end-to-end example of defining a basic regression model, training with it, and running inference on it here.
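As a rough illustration of the functorch side only (a sketch, not the full example: the loss, learning rate, and shapes are assumptions matching the hypothetical `TinyModel` above, and the compile-to-IREE step is omitted):

```python
import torch
from functorch import make_functional, grad

# Turn the stateful module into a pure function of (params, input).
func_model, params = make_functional(model)

def loss_fn(params, x, y):
    pred = func_model(params, x)
    return torch.nn.functional.mse_loss(pred, y)

# grad() returns a function computing d(loss)/d(params). Pure functions
# like this are what get traced and compiled, just as in inference.
grad_fn = grad(loss_fn)

x = torch.randn(1, 4)
y = torch.randn(1, 2)
grads = grad_fn(params, x, y)

# One hypothetical SGD step, applied functionally.
params = [p - 1e-2 * g for p, g in zip(params, grads)]
```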
### Native / On-device Training
A small (~100-250 KB), self-contained binary can be built for deploying to resource-constrained environments; it runs a model without a Python interpreter. See this example for an illustration.
Samples:

- Inference on BERT
- Basic Inference and Training Example
- Native On-device Training Example