Beyond TinyML: Balance inference accuracy and latency on MCUs

Abstract

Can an ESP32-based MCU run (tiny)ML models accurately and efficiently? This talk showcases how a tiny microcontroller can transparently leverage neighboring nodes to run inference on full, unquantized torchvision models in under 100 ms. We build on vAccel, an open abstraction layer for interoperable hardware acceleration, to let devices like the ESP32 transparently offload ML inference and signal-processing tasks to nearby edge or cloud nodes. Through a lightweight agent and a unified API, vAccel bridges heterogeneous devices, enabling seamless offload without modifying application logic. This session presents our IoT port of vAccel (client and lightweight agent) and demonstrates a real deployment in which an ESP32 delegates inference to a GPU-backed Kubernetes node, reducing latency by three orders of magnitude while preserving Kubernetes-native control and observability. Attendees will see how open acceleration can unify the Cloud–Edge–IoT stack through standard interfaces and reusable runtimes.

When: Jan 31, 2026, 11:50 AM — 12:10 PM
Where: UD2.120 (Chavanne), Brussels


Anastasia Mallikopoulou
Junior Systems Engineer - Observability & Applied ML

Systems engineer with expertise in observability stacks, runtime benchmarking, distributed instrumentation, and applied machine learning for system reliability and performance optimization.

Charalampos Mainas
Systems Researcher

PhD candidate focusing on low-level systems programming, Linux kernel development, hypervisors (KVM, Xen), and unikernel runtime ecosystems.

Anastassios Nanos
Systems Researcher

Research interests include systems software, virtualization, operating systems, containers, and unikernels.