Since June 2025, I’ve been exploring how to run modern LLMs efficiently on mobile devices with limited compute, memory, and power budgets. The goal of the research is to learn how prompting affects energy consumption. My approach centers on MLC LLM, a framework built on top of TVM that enables efficient inference across a wide range of devices.
So far, I have run LLMs on the following devices:
LLM models covered:
I have built up a consolidated understanding of each model’s performance and energy consumption under different prompts.
Pre-built packages are currently available only from https://mlc.ai/wheels/. You can install a pre-built package only if there is a release matching your PC’s OS. For example, on Debian you may not be able to install one, whereas on Windows you can install it easily and get the Android app running smoothly.
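For reference, the install route documented by MLC LLM at the time of writing uses pip with the wheels index as an extra find-links source. The package names below are the CPU variants; they are taken from the docs and may change between releases, so treat this as a sketch rather than a guaranteed command:

```shell
# Install nightly wheels from the MLC index (CPU variants shown;
# pick the matching CUDA/ROCm variants on GPU machines).
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cpu mlc-ai-nightly-cpu

# Quick check that the package imported cleanly.
python -c "import mlc_llm; print(mlc_llm.__file__)"
```

If pip reports "no matching distribution", that usually means no wheel exists for your OS/architecture combination, which is exactly the situation described above for Debian and ARM boards.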
For Android, you must compile the model ahead of time and then run it. On Linux, the Python API can JIT-compile the model on first use.
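The Linux JIT path looks roughly like the sketch below, following the OpenAI-style chat API that MLC LLM exposes. The model ID is an example of an MLC-format weights repo, not necessarily the one used in this project, and running this requires the weights to download and a supported GPU/OpenCL device, so it is illustrative only:

```python
from mlc_llm import MLCEngine

# Example MLC-format model repo (an assumption; substitute your own).
model = "HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC"

# On Linux, MLCEngine JIT-compiles the model library on first use
# and caches the result, so no ahead-of-time build step is needed.
engine = MLCEngine(model)

for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)

engine.terminate()
```

On Android there is no JIT at runtime, which is why the model library has to be cross-compiled and bundled into the app ahead of time.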
The example MLCChat app is built on an older Gradle version; you need to upgrade it from xx to xx.
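Upgrading the wrapper can be done from the project root with Gradle’s own wrapper task; the version placeholder below stands in for whichever version your build error asks for:

```shell
# From the MLCChat project root: regenerate the wrapper at the
# version the Android Gradle Plugin requires.
./gradlew wrapper --gradle-version <required-version>

# Equivalently, edit the distributionUrl line in
# gradle/wrapper/gradle-wrapper.properties by hand.
```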
I ran into this issue in January 2026, when the TVM fix had not yet been merged.
As of February 2026, you must build both of these from source because there are no pre-built packages to use yet. A few things worth knowing that make the Orange Pi 5 different:
Pre-built MLC LLM packages are convenient, but:
| Platform | Pre-built MLC LLM availability | Issues |
|---|---|---|
| Windows | ✔ Works well | No issues found |
| Ubuntu ARM devices | ❌ | Architecture mismatch, missing libs |
| Debian | ❌ | Not available as of June 2025 |
| macOS | ❌ | Not available as of June 2025 |
The setup can run LLMs through 1000+ prompts at a constant ambient temperature.
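A batch run like that can be driven by a small harness that records tokens, wall time, and an energy estimate per prompt. The sketch below is self-contained: `run_prompt` and `read_power_watts` are stand-ins I made up for the real MLC LLM call and the real power sensor, which on an RK3588 board might be a USB power meter or an INA-series I2C sensor:

```python
import statistics
import time

def read_power_watts():
    # Hypothetical sensor read; fixed value so the sketch runs anywhere.
    # Replace with your actual power-meter readout.
    return 5.0

def run_prompt(prompt):
    # Placeholder for the real MLC LLM call; here it just "generates"
    # one token per input word to keep the sketch self-contained.
    return len(prompt.split())

def measure(prompts):
    """Run each prompt and record tokens, wall time, and estimated energy."""
    results = []
    for prompt in prompts:
        start = time.monotonic()
        tokens = run_prompt(prompt)
        elapsed = time.monotonic() - start
        results.append({
            "tokens": tokens,
            "seconds": elapsed,
            "joules": read_power_watts() * elapsed,  # E = P * t
        })
    return results

if __name__ == "__main__":
    runs = measure(["short prompt", "a somewhat longer prompt with more words"])
    print("mean energy (J):", statistics.mean(r["joules"] for r in runs))
```

Keeping the ambient temperature constant matters because the SoC throttles with heat, which would otherwise confound the per-prompt energy numbers.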
Fully custom build with:
Including:
Optimized using MLC’s pipeline:
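For context, the MLC pipeline is driven by the `mlc_llm` CLI. The command sequence below is a sketch of the usual three stages; the model path, quantization, and conversation template are example values, not the exact ones used in this project:

```shell
# 1. Quantize and convert the weights to MLC format.
mlc_llm convert_weight ./models/Llama-3.2-1B-Instruct \
    --quantization q4f16_1 -o ./dist/llama-q4f16_1

# 2. Generate the chat config (tokenizer + conversation template).
mlc_llm gen_config ./models/Llama-3.2-1B-Instruct \
    --quantization q4f16_1 --conv-template llama-3 \
    -o ./dist/llama-q4f16_1

# 3. Compile the model library for the target device via TVM.
mlc_llm compile ./dist/llama-q4f16_1/mlc-chat-config.json \
    --device opencl -o ./dist/llama-q4f16_1/llama-opencl.so
```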
I’m currently focused on getting MLC LLM running smoothly on the Orange Pi 5 Pro, which features:
Since no pre-built MLC/TVM packages exist for this board, I:
Everything now works correctly on the Orange Pi 5 Pro.
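The rough shape of the from-source build I mean, when no wheel matches the board, is below. The cmake configuration step is interactive and the flags are illustrative; the MLC LLM docs remain the authoritative reference:

```shell
# Clone with submodules (brings in the bundled TVM).
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm && mkdir -p build && cd build

# Generate the cmake config; enable OpenCL for the RK3588's Mali GPU.
python ../cmake/gen_cmake_config.py

cmake .. && cmake --build . --parallel "$(nproc)"

# Install the Python bindings against the freshly built runtime.
cd ../python && pip install -e .
```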
Many mobile/embedded devices require custom builds.
TVM generates optimized kernels tailored to your hardware.
Mismatched builds produce cryptic runtime failures.
Compiler flags, OpenCL drivers, and Python wheel compatibility all matter.
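A cheap sanity check before chasing wheel errors is to confirm what architecture and platform tag your interpreter reports, since a wheel built for the wrong tag fails with unhelpful errors; this uses only the standard library:

```python
import platform
import sysconfig

# A wheel's platform tag must match this machine's OS + architecture.
machine = platform.machine()    # e.g. "aarch64" on the Orange Pi 5 Pro
tag = sysconfig.get_platform()  # e.g. "linux-aarch64"

print("machine:", machine)
print("platform tag:", tag)
```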
4-bit and 8-bit models run surprisingly well on RK3588S.
If you have questions or want to exchange ideas, feel free to reach out or open a discussion in the repo.