The IREE compiler and IREE runtime both have their own C/C++ APIs. This page
introduces the available APIs and describes how to use them from your
applications.
Note: There are multiple ways to distribute and depend on C/C++ projects, each
with varying levels of portability, flexibility, and toolchain
compatibility. IREE aims to support common configurations and platforms.
The IREE compiler is structured as a monolithic shared object with a dynamic
plugin system allowing for extensions. The shared object exports symbols for
versioned API functions.
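Because the exported functions are versioned, an application that loads the shared object at runtime can check compatibility before relying on newer entry points. The sketch below assumes the library reports its version as one 32-bit value with the major revision in the high 16 bits and the minor revision in the low 16 bits. That packing and every helper name here are assumptions for illustration, so confirm them against the headers you build with.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helpers: assume a version packed as (major << 16) | minor.
static inline uint16_t api_version_major(uint32_t packed) {
  return (uint16_t)(packed >> 16);
}

static inline uint16_t api_version_minor(uint32_t packed) {
  return (uint16_t)(packed & 0xFFFFu);
}

// Returns true when the loaded library can serve a caller written against
// `required`: the major revision must match and the minor revision must be
// at least as new.
static inline bool api_version_is_compatible(uint32_t loaded,
                                             uint32_t required) {
  return api_version_major(loaded) == api_version_major(required) &&
         api_version_minor(loaded) >= api_version_minor(required);
}
```

An application would typically run a check like this once, right after loading the library, and fall back or report an error if it fails.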
```mermaid
graph TD
  accTitle: IREE compiler linkage model diagram
  accDescr {
    The libIREECompiler.so or IREECompiler.dll shared object contains pipelines,
    target backends, and general passes as private implementation details.
    Compiler plugins interface with the compiler shared object to extend it with
    custom targets, dialects, etc.
    Applications interface with the compiler shared object through the compiler
    C API's exported symbols.
  }

  subgraph compiler[libIREECompiler.so / IREECompiler.dll]
    pipelines("Pipelines
    • Flow
    • Stream
    • etc.")

    targets("Target backends
    • llvm-cpu
    • vulkan-spirv
    • etc.")

    passes("General passes
    • Const eval
    • DCE
    • etc.")
  end

  plugins("Compiler plugins
    • Custom targets
    • Custom dialects
    • etc.")

  application(Your application)

  compiler <-- "Plugin API<br>(static or dynamic linking)" --> plugins
  compiler -. "Compiler C API<br>(exported symbols)" .-> application
```
API definitions can be found in the following locations:
The compiler API is centered around running pipelines to translate inputs to
artifacts. These are modeled via sessions, invocations, sources, and
outputs.
```mermaid
stateDiagram-v2
  accTitle: IREE compiler session and invocation state diagram
  accDescr {
    Input files are opened (or buffers are wrapped) as sources in a session.
    Sources are parsed into invocations, which run pipelines.
    Output files are written (or buffers are mapped) for compilation artifacts.
    Sessions can contain multiple sources and run multiple invocations.
  }

  direction LR
  InputFile --> Source1 : open file
  InputBuffer --> Source2 : wrap buffer

  state Session {
    Source1 --> Invocation1
    Source2 --> Invocation2
    Invocation1 --> Invocation1 : run pipeline
    Invocation2 --> Invocation2 : run pipeline
  }

  Invocation1 --> Output1File : write file
  Invocation1 --> Output1Buffer : map memory
  Invocation2 --> Output2Buffer : map memory
```
A plugin extends the compiler with some combination of target backends,
options, passes, or pipelines. For documentation on compiler plugins, see
compiler/PluginAPI/README.md.
```c
#include <iree/compiler/embedding_api.h>
#include <iree/compiler/loader.h>

int main(int argc, char** argv) {
  // Load the compiler library then initialize it.
  ireeCompilerLoadLibrary("libIREECompiler.so");
  ireeCompilerGlobalInitialize();

  // Create a session to track compiler state and set flags.
  iree_compiler_session_t* session = ireeCompilerSessionCreate();
  ireeCompilerSessionSetFlags(session, argc, argv);

  // Open a file as an input source to the compiler.
  iree_compiler_source_t* source = NULL;
  ireeCompilerSourceOpenFile(session, "input.mlir", &source);

  // Use an invocation to compile from the input source to one or more outputs.
  // The source must be parsed into the invocation before a pipeline can run.
  iree_compiler_invocation_t* inv = ireeCompilerInvocationCreate(session);
  ireeCompilerInvocationParseSource(inv, source);
  ireeCompilerInvocationPipeline(inv, IREE_COMPILER_PIPELINE_STD);

  // Output the compiled artifact to a file.
  iree_compiler_output_t* output = NULL;
  ireeCompilerOutputOpenFile("output.vmfb", &output);
  ireeCompilerInvocationOutputVMBytecode(inv, output);

  // Cleanup state.
  ireeCompilerInvocationDestroy(inv);
  ireeCompilerOutputDestroy(output);
  ireeCompilerSourceDestroy(source);
  ireeCompilerSessionDestroy(session);
  ireeCompilerGlobalShutdown();
}
```
The IREE runtime is structured as a modular set of library components. Each
component is designed to be linked directly into applications and compiled
with LTO-style optimizations.
The low level library components can be used directly or through a higher level
API.
The high level 'runtime' API sits on top of the low level components. It is
relatively terse but does not expose the full flexibility of the underlying
systems.
```mermaid
graph TD
  accTitle: IREE runtime high level API diagram
  accDescr {
    The IREE runtime includes 'base', 'HAL', and 'VM' components, each with
    their own types and API methods.
    A high level "runtime API" sits on top of these component APIs.
    Applications can interface indirectly with the IREE runtime via this
    high level runtime API.
  }

  subgraph iree_runtime[IREE Runtime]
    subgraph base
      base_types("Types
      • allocator
      • status
      • etc.")
    end

    subgraph hal[HAL]
      hal_types("Types
      • buffer
      • device
      • etc.")

      hal_drivers("Drivers
      • local-*
      • vulkan
      • etc.")
    end

    subgraph vm[VM]
      vm_types("Types
      • context
      • invocation
      • etc.")
    end

    runtime_api("Runtime API
    • instance
    • session
    • call")

    base_types & hal_types & hal_drivers & vm_types --> runtime_api
  end

  application(Your application)

  runtime_api --> application
```
Each runtime component has its own low level API. The low level APIs are
typically verbose as they expose the full flexibility of each underlying
system.
```mermaid
graph TD
  accTitle: IREE runtime low level API diagram
  accDescr {
    The IREE runtime includes 'base', 'HAL', and 'VM' components, each with
    their own types and API methods.
    Applications can interface directly with the IREE runtime via the low
    level component APIs.
  }

  subgraph iree_runtime[IREE Runtime]
    subgraph base
      base_types("Types
      • allocator
      • status
      • etc.")
    end

    subgraph hal[HAL]
      hal_types("Types
      • buffer
      • device
      • etc.")

      hal_drivers("Drivers
      • local-*
      • vulkan
      • etc.")
    end

    subgraph vm[VM]
      vm_types("Types
      • context
      • invocation
      • etc.")
    end
  end

  application(Your application)

  base_types & hal_types & hal_drivers & vm_types --> application
```
Runtime API header files are organized by component:
IREE uses its own Virtual Machine (VM) at runtime to interpret program
instructions on the host system.
Tip: EmitC alternate lowering path
VM instructions may be further lowered to C source code for static or
resource-constrained deployment. See the --output-format=vm-c compiler option
and the samples in samples/emitc_modules/ for more information.
The VM supports generic operations like loads, stores, arithmetic, function
calls, and control flow. The VM builds streams of more complex program logic and
dense math into HAL command buffers that are dispatched to hardware backends.
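To make "interpreting generic operations" concrete, here is a minimal sketch of a register-based interpreter loop that handles constants, arithmetic, a conditional branch, and a return. The opcodes, instruction layout, and `toy_vm_run` are hypothetical and chosen only for illustration. They are not IREE's bytecode format or VM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical opcodes for illustration only (not IREE's instruction set).
typedef enum {
  TOY_OP_CONST,   // regs[a] = b
  TOY_OP_ADD,     // regs[a] = regs[b] + regs[c]
  TOY_OP_MUL,     // regs[a] = regs[b] * regs[c]
  TOY_OP_BR_LT,   // if (regs[a] < regs[b]) pc = c
  TOY_OP_RETURN,  // return regs[a]
} toy_opcode_t;

typedef struct {
  toy_opcode_t op;
  int32_t a, b, c;
} toy_instr_t;

enum { TOY_REGISTER_COUNT = 8 };

// Interprets `program` until a TOY_OP_RETURN executes and returns its value.
// Running off the end of the program returns 0.
static int32_t toy_vm_run(const toy_instr_t* program, size_t count) {
  int32_t regs[TOY_REGISTER_COUNT] = {0};
  size_t pc = 0;
  while (pc < count) {
    const toy_instr_t* i = &program[pc++];
    // Operand `a` is a register index in every opcode.
    assert(i->a >= 0 && i->a < TOY_REGISTER_COUNT);
    switch (i->op) {
      case TOY_OP_CONST: regs[i->a] = i->b; break;
      case TOY_OP_ADD: regs[i->a] = regs[i->b] + regs[i->c]; break;
      case TOY_OP_MUL: regs[i->a] = regs[i->b] * regs[i->c]; break;
      case TOY_OP_BR_LT:
        if (regs[i->a] < regs[i->b]) pc = (size_t)i->c;
        break;
      case TOY_OP_RETURN: return regs[i->a];
    }
  }
  return 0;
}

// Example program: multiplies an accumulator by 1, 2, 3, 4, 5 in a loop.
static const toy_instr_t kToyFactorial[] = {
    {TOY_OP_CONST, 0, 1, 0},   // r0 = 1 (accumulator)
    {TOY_OP_CONST, 1, 1, 0},   // r1 = 1 (counter)
    {TOY_OP_CONST, 2, 1, 0},   // r2 = 1 (step)
    {TOY_OP_CONST, 3, 6, 0},   // r3 = 6 (exclusive loop bound)
    {TOY_OP_MUL, 0, 0, 1},     // r0 = r0 * r1
    {TOY_OP_ADD, 1, 1, 2},     // r1 = r1 + r2
    {TOY_OP_BR_LT, 1, 3, 4},   // loop back to the multiply while r1 < r3
    {TOY_OP_RETURN, 0, 0, 0},  // return r0
};
```

In a real system the heavier work (dense math) would not run in a loop like this. As the paragraph above notes, it is recorded into command buffers and dispatched to hardware.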
VM instances can serve multiple isolated execution contexts.
VM contexts are effectively sandboxes for loading modules and running
programs.
VM modules provide all functionality to execution contexts, including
access to hardware accelerators through the HAL. Compiled user programs are
also modules.
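The sandboxing idea can be pictured as one immutable module definition shared by several contexts, each of which holds its own mutable state. The sketch below is a simplified model under that assumption. The `toy_module_t` and `toy_context_t` types are hypothetical and do not correspond to IREE's VM types.

```c
#include <assert.h>

// Hypothetical module definition: immutable data shared by every context.
typedef struct {
  int initial_counter;
} toy_module_t;

// Hypothetical context: private, mutable state for one loaded module.
typedef struct {
  const toy_module_t* module;
  int counter;
} toy_context_t;

// Loads `module` into `out_context`, giving it fresh private state.
static void toy_context_initialize(const toy_module_t* module,
                                   toy_context_t* out_context) {
  assert(module && out_context);
  out_context->module = module;
  out_context->counter = module->initial_counter;
}

// A "module function" that mutates only the calling context's state.
static int toy_context_call_increment(toy_context_t* context) {
  return ++context->counter;
}
```

Calling `toy_context_call_increment` on one context never changes the counter seen by another context, even though both were initialized from the same module.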
```mermaid
stateDiagram-v2
  accTitle: Sample VM Modules
  accDescr {
    Bytecode modules contain program state, program functions, and debug
    information.
    HAL modules contain devices, executables, HAL functions, and HAL types.
    Custom modules may contain external functions and custom types.
  }

  state "Bytecode module" as bytecode {
    bytecode_contents: Module state<br>Program functions<br>Debug information
  }

  state "HAL module" as HAL {
    hal_contents: Devices<br>Executables<br>HAL functions<br>HAL types
  }

  state "Custom module" as custom {
    custom_contents: External functions<br>Custom types
  }
```
IREE uses a Hardware Abstraction Layer (HAL) to model and interact with
hardware devices like CPUs, GPUs and other accelerators.
HAL drivers are used to enumerate and create HAL devices.
HAL devices interface with hardware, such as by allocating device memory,
preparing executables, recording and dispatching command buffers, and
synchronizing with the host.
HAL buffers represent data storage and buffer views represent views into
that storage with associated shapes and types (similar to "tensors").
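The split between storage and a typed, shaped view of that storage can be sketched in plain C. The types below are hypothetical stand-ins rather than the HAL's own structures, and the offset computation assumes a dense row-major layout like the one the samples on this page request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAX_RANK = 4 };

// Hypothetical stand-in for a buffer: untyped bytes and a length.
typedef struct {
  uint8_t* data;
  size_t byte_length;
} toy_buffer_t;

// Hypothetical stand-in for a buffer view: a shape and element size layered
// over a buffer that the view does not own.
typedef struct {
  toy_buffer_t* buffer;
  size_t rank;
  size_t shape[TOY_MAX_RANK];
  size_t element_size;  // bytes per element, e.g. sizeof(float)
} toy_buffer_view_t;

// Total number of elements described by the view's shape.
static size_t toy_buffer_view_element_count(const toy_buffer_view_t* view) {
  size_t count = 1;
  for (size_t i = 0; i < view->rank; ++i) count *= view->shape[i];
  return count;
}

// Byte offset of the element at `indices` in a dense row-major layout.
static size_t toy_buffer_view_offset(const toy_buffer_view_t* view,
                                     const size_t* indices) {
  size_t linear = 0;
  for (size_t i = 0; i < view->rank; ++i) {
    assert(indices[i] < view->shape[i]);
    linear = linear * view->shape[i] + indices[i];
  }
  size_t offset = linear * view->element_size;
  assert(offset + view->element_size <= view->buffer->byte_length);
  return offset;
}
```

Several views with different shapes or element types could point at the same `toy_buffer_t`, which is the reason for keeping the two concepts separate.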
```c
#include <stdio.h>

#include "iree/runtime/api.h"
#include "iree/runtime/testdata/simple_mul_module_c.h"

static void iree_runtime_demo_run_session(iree_runtime_instance_t* instance);
static void iree_runtime_demo_perform_mul(iree_runtime_session_t* session);

//===----------------------------------------------------------------------===//
// 1. Entry point / shared iree_runtime_instance_t setup
//===----------------------------------------------------------------------===//

int main(int argc, char** argv) {
  // Create and configure the instance shared across all sessions.
  iree_runtime_instance_options_t instance_options;
  iree_runtime_instance_options_initialize(&instance_options);
  iree_runtime_instance_options_use_all_available_drivers(&instance_options);
  iree_runtime_instance_t* instance = NULL;
  IREE_CHECK_OK(iree_runtime_instance_create(
      &instance_options, iree_allocator_system(), &instance));

  // All sessions should share the same instance.
  iree_runtime_demo_run_session(instance);

  iree_runtime_instance_release(instance);
  return 0;
}

//===----------------------------------------------------------------------===//
// 2. Load modules and initialize state in iree_runtime_session_t
//===----------------------------------------------------------------------===//

static void iree_runtime_demo_run_session(iree_runtime_instance_t* instance) {
  // TODO(#5724): move device selection into the compiled modules.
  iree_hal_device_t* device = NULL;
  IREE_CHECK_OK(iree_runtime_instance_try_create_default_device(
      instance, iree_make_cstring_view("local-task"), &device));

  // Create one session per loaded module to hold the module state.
  iree_runtime_session_options_t session_options;
  iree_runtime_session_options_initialize(&session_options);
  iree_runtime_session_t* session = NULL;
  IREE_CHECK_OK(iree_runtime_session_create_with_device(
      instance, &session_options, device,
      iree_runtime_instance_host_allocator(instance), &session));
  iree_hal_device_release(device);

  // Load your user module into the session (from memory, from file, etc).
  const iree_file_toc_t* module_file =
      iree_runtime_testdata_simple_mul_module_create();
  IREE_CHECK_OK(iree_runtime_session_append_bytecode_module_from_memory(
      session, iree_make_const_byte_span(module_file->data, module_file->size),
      iree_allocator_null()));

  // Run your functions; you should reuse the session to make multiple calls.
  iree_runtime_demo_perform_mul(session);

  iree_runtime_session_release(session);
}

//===----------------------------------------------------------------------===//
// 3. Call a function within a module with buffer views
//===----------------------------------------------------------------------===//

// func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) ->
// tensor<4xf32>
static void iree_runtime_demo_perform_mul(iree_runtime_session_t* session) {
  iree_runtime_call_t call;
  IREE_CHECK_OK(iree_runtime_call_initialize_by_name(
      session, iree_make_cstring_view("module.simple_mul"), &call));

  // %arg0: tensor<4xf32>
  iree_hal_buffer_view_t* arg0 = NULL;
  static const iree_hal_dim_t arg0_shape[1] = {4};
  static const float arg0_data[4] = {1.0f, 1.1f, 1.2f, 1.3f};
  IREE_CHECK_OK(iree_hal_buffer_view_allocate_buffer_copy(
      iree_runtime_session_device(session),
      iree_runtime_session_device_allocator(session),
      IREE_ARRAYSIZE(arg0_shape), arg0_shape, IREE_HAL_ELEMENT_TYPE_FLOAT_32,
      IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR,
      (iree_hal_buffer_params_t){
          .type = IREE_HAL_MEMORY_TYPE_DEVICE_LOCAL,
          .access = IREE_HAL_MEMORY_ACCESS_ALL,
          .usage = IREE_HAL_BUFFER_USAGE_DEFAULT,
      },
      iree_make_const_byte_span(arg0_data, sizeof(arg0_data)), &arg0));
  IREE_CHECK_OK(iree_hal_buffer_view_fprint(
      stdout, arg0, /*max_element_count=*/4096,
      iree_runtime_session_host_allocator(session)));
  IREE_CHECK_OK(iree_runtime_call_inputs_push_back_buffer_view(&call, arg0));
  iree_hal_buffer_view_release(arg0);

  fprintf(stdout, "\n * \n");

  // %arg1: tensor<4xf32>
  iree_hal_buffer_view_t* arg1 = NULL;
  static const iree_hal_dim_t arg1_shape[1] = {4};
  static const float arg1_data[4] = {10.0f, 100.0f, 1000.0f, 10000.0f};
  IREE_CHECK_OK(iree_hal_buffer_view_allocate_buffer_copy(
      iree_runtime_session_device(session),
      iree_runtime_session_device_allocator(session),
      IREE_ARRAYSIZE(arg1_shape), arg1_shape, IREE_HAL_ELEMENT_TYPE_FLOAT_32,
      IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR,
      (iree_hal_buffer_params_t){
          .type = IREE_HAL_MEMORY_TYPE_DEVICE_LOCAL,
          .access = IREE_HAL_MEMORY_ACCESS_ALL,
          .usage = IREE_HAL_BUFFER_USAGE_DEFAULT,
      },
      iree_make_const_byte_span(arg1_data, sizeof(arg1_data)), &arg1));
  IREE_CHECK_OK(iree_hal_buffer_view_fprint(
      stdout, arg1, /*max_element_count=*/4096,
      iree_runtime_session_host_allocator(session)));
  IREE_CHECK_OK(iree_runtime_call_inputs_push_back_buffer_view(&call, arg1));
  iree_hal_buffer_view_release(arg1);

  IREE_CHECK_OK(iree_runtime_call_invoke(&call, /*flags=*/0));

  fprintf(stdout, "\n = \n");

  // -> tensor<4xf32>
  iree_hal_buffer_view_t* ret0 = NULL;
  IREE_CHECK_OK(iree_runtime_call_outputs_pop_front_buffer_view(&call, &ret0));
  IREE_CHECK_OK(iree_hal_buffer_view_fprint(
      stdout, ret0, /*max_element_count=*/4096,
      iree_runtime_session_host_allocator(session)));
  iree_hal_buffer_view_release(ret0);

  iree_runtime_call_deinitialize(&call);
}
```
```c
#include <stdio.h>

#include "iree/runtime/api.h"

static int iree_runtime_demo_main(void);
static iree_status_t iree_runtime_demo_run_session(
    iree_runtime_instance_t* instance);
static iree_status_t iree_runtime_demo_perform_mul(
    iree_runtime_session_t* session);

#if defined(IREE_RUNTIME_DEMO_LOAD_FILE_FROM_COMMAND_LINE_ARG)

static const char* demo_file_path = NULL;

// Takes the first argument on the command line as a file path and loads it.
int main(int argc, char** argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: session_demo module_file.vmfb\n");
    return 1;
  }
  demo_file_path = argv[1];
  return iree_runtime_demo_main();
}

// Loads a compiled IREE module from the file system.
static iree_status_t iree_runtime_demo_load_module(
    iree_runtime_session_t* session) {
  return iree_runtime_session_append_bytecode_module_from_file(session,
                                                               demo_file_path);
}

#elif defined(IREE_RUNTIME_DEMO_LOAD_FILE_FROM_EMBEDDED_DATA)

#include "iree/runtime/testdata/simple_mul_module_c.h"

int main(int argc, char** argv) { return iree_runtime_demo_main(); }

// Loads the bytecode module directly from memory.
//
// Embedding the compiled output into your binary is not always possible (or
// recommended) but is a fairly painless way to get things working on a variety
// of targets without worrying about how to deploy files or pass flags.
//
// In cases like this the module file is in .rodata and does not need to be
// freed; if the memory needs to be released when the module is unloaded then a
// custom allocator can be provided to get a callback instead.
static iree_status_t iree_runtime_demo_load_module(
    iree_runtime_session_t* session) {
  const iree_file_toc_t* module_file =
      iree_runtime_testdata_simple_mul_module_create();
  return iree_runtime_session_append_bytecode_module_from_memory(
      session, iree_make_const_byte_span(module_file->data, module_file->size),
      iree_allocator_null());
}

#else
#error "must specify a way to load the module data"
#endif  // IREE_RUNTIME_DEMO_LOAD_FILE_FROM_*

//===----------------------------------------------------------------------===//
// 1. Entry point / shared iree_runtime_instance_t setup
//===----------------------------------------------------------------------===//
// Applications should create and share a single instance across all sessions.

// This would live in your application startup/shutdown code or scoped to the
// usage of IREE. Creating and destroying instances is expensive and should be
// avoided.
static int iree_runtime_demo_main(void) {
  // Set up the shared runtime instance.
  // An application should usually only have one of these and share it across
  // all of the sessions it has. The instance is thread-safe, while the
  // sessions are only thread-compatible (you need to lock if its required).
  iree_runtime_instance_options_t instance_options;
  iree_runtime_instance_options_initialize(&instance_options);
  iree_runtime_instance_options_use_all_available_drivers(&instance_options);
  iree_runtime_instance_t* instance = NULL;
  iree_status_t status = iree_runtime_instance_create(
      &instance_options, iree_allocator_system(), &instance);

  // Run the demo.
  // A real application would load its models (at startup, on-demand, etc) and
  // retain them somewhere to be reused. Startup time and likelihood of failure
  // varies across different HAL backends; the synchronous CPU backend is nearly
  // instantaneous and will never fail (unless out of memory) while the Vulkan
  // backend may take significantly longer and fail if there are not supported
  // devices.
  if (iree_status_is_ok(status)) {
    status = iree_runtime_demo_run_session(instance);
  }

  // Release the shared instance - it will be deallocated when all sessions
  // using it have been released (here it is deallocated immediately).
  iree_runtime_instance_release(instance);

  int ret = (int)iree_status_code(status);
  if (!iree_status_is_ok(status)) {
    // Dump nice status messages to stderr on failure.
    // An application can route these through its own logging infrastructure as
    // needed. Note that the status is a handle and must be freed!
    iree_status_fprint(stderr, status);
    iree_status_ignore(status);
  }
  return ret;
}

//===----------------------------------------------------------------------===//
// 2. Load modules and initialize state in iree_runtime_session_t
//===----------------------------------------------------------------------===//
// Each instantiation of a module will live in its own session. Module state
// like variables will be retained across calls within the same session.

// Loads the demo module and uses it to perform some math.
// In a real application you'd want to hang on to the iree_runtime_session_t
// and reuse it for future calls - especially if it holds state internally.
static iree_status_t iree_runtime_demo_run_session(
    iree_runtime_instance_t* instance) {
  // TODO(#5724): move device selection into the compiled modules.
  iree_hal_device_t* device = NULL;
  IREE_RETURN_IF_ERROR(iree_runtime_instance_try_create_default_device(
      instance, iree_make_cstring_view("local-task"), &device));

  // Set up the session to run the demo module.
  // Sessions are like OS processes and are used to isolate modules from each
  // other and hold runtime state such as the variables used within the module.
  // The same module loaded into two sessions will see their own private state.
  iree_runtime_session_options_t session_options;
  iree_runtime_session_options_initialize(&session_options);
  iree_runtime_session_t* session = NULL;
  iree_status_t status = iree_runtime_session_create_with_device(
      instance, &session_options, device,
      iree_runtime_instance_host_allocator(instance), &session);
  iree_hal_device_release(device);

  // Load the compiled user module in a demo-specific way.
  // Applications could specify files, embed the outputs directly in their
  // binaries, fetch them over the network, etc.
  if (iree_status_is_ok(status)) {
    status = iree_runtime_demo_load_module(session);
  }

  // Build and issue the call.
  if (iree_status_is_ok(status)) {
    status = iree_runtime_demo_perform_mul(session);
  }

  // Release the session and free all resources.
  iree_runtime_session_release(session);
  return status;
}

//===----------------------------------------------------------------------===//
// 3. Call a function within a module with buffer views
//===----------------------------------------------------------------------===//
// The inputs and outputs of a call are reusable across calls (and possibly
// across sessions depending on device compatibility) and can be setup by the
// application as needed. For example, an application could perform
// multi-threaded buffer view creation and then issue the call from a single
// thread when all inputs are ready. This simple demo just allocates them
// per-call and throws them away.

// Sets up and calls the simple_mul function and dumps the results:
// func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) ->
// tensor<4xf32>
//
// NOTE: this is a demo and as such this performs no memoization; a real
// application could reuse a lot of these structures and cache lookups of
// iree_vm_function_t to reduce the amount of per-call overhead.
static iree_status_t iree_runtime_demo_perform_mul(
    iree_runtime_session_t* session) {
  // Initialize the call to the function.
  iree_runtime_call_t call;
  IREE_RETURN_IF_ERROR(iree_runtime_call_initialize_by_name(
      session, iree_make_cstring_view("module.simple_mul"), &call));

  // Append the function inputs with the HAL device allocator in use by the
  // session. The buffers will be usable within the session and _may_ be usable
  // in other sessions depending on whether they share a compatible device.
  iree_hal_device_t* device = iree_runtime_session_device(session);
  iree_hal_allocator_t* device_allocator =
      iree_runtime_session_device_allocator(session);
  iree_allocator_t host_allocator =
      iree_runtime_session_host_allocator(session);
  iree_status_t status = iree_ok_status();
  {
    // %arg0: tensor<4xf32>
    iree_hal_buffer_view_t* arg0 = NULL;
    if (iree_status_is_ok(status)) {
      static const iree_hal_dim_t arg0_shape[1] = {4};
      static const float arg0_data[4] = {1.0f, 1.1f, 1.2f, 1.3f};
      status = iree_hal_buffer_view_allocate_buffer_copy(
          device, device_allocator,
          // Shape rank and dimensions:
          IREE_ARRAYSIZE(arg0_shape), arg0_shape,
          // Element type:
          IREE_HAL_ELEMENT_TYPE_FLOAT_32,
          // Encoding type:
          IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR,
          (iree_hal_buffer_params_t){
              // Where to allocate (host or device):
              .type = IREE_HAL_MEMORY_TYPE_DEVICE_LOCAL,
              // Access to allow to this memory:
              .access = IREE_HAL_MEMORY_ACCESS_ALL,
              // Intended usage of the buffer (transfers, dispatches, etc):
              .usage = IREE_HAL_BUFFER_USAGE_DEFAULT,
          },
          // The actual heap buffer to wrap or clone and its allocator:
          iree_make_const_byte_span(arg0_data, sizeof(arg0_data)),
          // Buffer view + storage are returned and owned by the caller:
          &arg0);
    }
    if (iree_status_is_ok(status)) {
      IREE_IGNORE_ERROR(iree_hal_buffer_view_fprint(
          stdout, arg0, /*max_element_count=*/4096, host_allocator));
      // Add to the call inputs list (which retains the buffer view).
      status = iree_runtime_call_inputs_push_back_buffer_view(&call, arg0);
    }
    // Since the call retains the buffer view we can release it here.
    iree_hal_buffer_view_release(arg0);

    fprintf(stdout, "\n * \n");

    // %arg1: tensor<4xf32>
    iree_hal_buffer_view_t* arg1 = NULL;
    if (iree_status_is_ok(status)) {
      static const iree_hal_dim_t arg1_shape[1] = {4};
      static const float arg1_data[4] = {10.0f, 100.0f, 1000.0f, 10000.0f};
      status = iree_hal_buffer_view_allocate_buffer_copy(
          device, device_allocator, IREE_ARRAYSIZE(arg1_shape), arg1_shape,
          IREE_HAL_ELEMENT_TYPE_FLOAT_32,
          IREE_HAL_ENCODING_TYPE_DENSE_ROW_MAJOR,
          (iree_hal_buffer_params_t){
              .type = IREE_HAL_MEMORY_TYPE_DEVICE_LOCAL,
              .access = IREE_HAL_MEMORY_ACCESS_ALL,
              .usage = IREE_HAL_BUFFER_USAGE_DEFAULT,
          },
          iree_make_const_byte_span(arg1_data, sizeof(arg1_data)), &arg1);
    }
    if (iree_status_is_ok(status)) {
      IREE_IGNORE_ERROR(iree_hal_buffer_view_fprint(
          stdout, arg1, /*max_element_count=*/4096, host_allocator));
      status = iree_runtime_call_inputs_push_back_buffer_view(&call, arg1);
    }
    iree_hal_buffer_view_release(arg1);
  }

  // Synchronously perform the call.
  if (iree_status_is_ok(status)) {
    status = iree_runtime_call_invoke(&call, /*flags=*/0);
  }

  fprintf(stdout, "\n = \n");

  // Dump the function outputs.
  iree_hal_buffer_view_t* ret0 = NULL;
  if (iree_status_is_ok(status)) {
    // Try to get the first call result as a buffer view.
    status = iree_runtime_call_outputs_pop_front_buffer_view(&call, &ret0);
  }
  if (iree_status_is_ok(status)) {
    // This prints the buffer view out but an application could read its
    // contents, pass it to another call, etc.
    status = iree_hal_buffer_view_fprint(
        stdout, ret0, /*max_element_count=*/4096, host_allocator);
  }
  iree_hal_buffer_view_release(ret0);

  iree_runtime_call_deinitialize(&call);
  return status;
}
```
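The samples above release each buffer view right after pushing it onto the call, relying on the call to keep its own reference, and they call release even when allocation failed and the pointer is still NULL. A minimal sketch of that ownership convention follows. The `toy_object_t` and `toy_call_t` types are hypothetical, the counter is not thread-safe, and none of this is IREE's implementation.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical reference-counted object (single-threaded for brevity).
typedef struct {
  int ref_count;
  int* destroyed_flag;  // set to 1 when the object is freed (demo only)
} toy_object_t;

static toy_object_t* toy_object_create(int* destroyed_flag) {
  toy_object_t* object = malloc(sizeof(*object));
  if (!object) return NULL;
  object->ref_count = 1;  // the creator owns the first reference
  object->destroyed_flag = destroyed_flag;
  return object;
}

static void toy_object_retain(toy_object_t* object) {
  if (object) ++object->ref_count;
}

// Drops one reference and frees the object when none remain.
// Accepts NULL so cleanup paths can call it unconditionally.
static void toy_object_release(toy_object_t* object) {
  if (!object) return;
  assert(object->ref_count > 0);
  if (--object->ref_count == 0) {
    if (object->destroyed_flag) *object->destroyed_flag = 1;
    free(object);
  }
}

// A container that keeps its own reference to whatever is pushed into it.
typedef struct {
  toy_object_t* input;
} toy_call_t;

static void toy_call_push_input(toy_call_t* call, toy_object_t* object) {
  toy_object_retain(object);
  toy_object_release(call->input);  // drop any previously held input
  call->input = object;
}

static void toy_call_deinitialize(toy_call_t* call) {
  toy_object_release(call->input);
  call->input = NULL;
}
```

With this convention the caller can create an object, push it, and release its own reference immediately. The object stays alive until `toy_call_deinitialize` drops the last reference.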
The compiler and runtime APIs may be used together to build a "just in time"
(JIT) execution engine. JIT compilation allows for last-minute specialization
with no prior knowledge of target devices and avoids issues with version drift,
but it can also constrain deployment options and usage scenarios.