gptwarrior.github.io

Running MLC LLM on Mobile & Embedded Devices

Since June 2025, I have been exploring how to run modern LLMs efficiently on mobile devices with limited compute, memory, and power resources. The goal of the research is to learn how prompting can impact energy consumption. My approach centers on MLC LLM, a framework built on top of TVM that enables efficient inference on a wide range of devices.

So far, I have run LLM models on the following devices:

LLM models covered:

I have built a consolidated understanding of each model's performance and energy consumption under different prompts.
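The per-prompt energy numbers behind these comparisons can be computed by integrating sampled power draw over time. Below is a minimal sketch, assuming power is sampled at known timestamps (e.g. from an external power meter); the function name is illustrative, not part of MLC LLM.

```python
# Sketch: estimate per-prompt energy from sampled power draw.
# Assumes (timestamp, watts) samples captured while the prompt runs.

def energy_joules(timestamps, power_watts):
    """Trapezoidal integration of power (W) over time (s) -> energy (J)."""
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        total += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return total

# Example: 4 samples over a 3-second generation at roughly 5 W
ts = [0.0, 1.0, 2.0, 3.0]
pw = [4.8, 5.2, 5.0, 4.9]
print(round(energy_joules(ts, pw), 2))  # ~15 J for this window
```

Dividing this energy by the number of generated tokens gives a joules-per-token figure that makes different prompts directly comparable.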

Summary of learnings

MLC LLM and pre-built package

Pre-built packages are currently only available from https://mlc.ai/wheels/. You can install a pre-built package only if there is a release matching your PC's OS and architecture. For example, on Debian you may not be able to install one at all, whereas on Windows installation is straightforward and you can get the Android app running smoothly.
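Before trying an install, it can help to check whether your machine is even likely to match a pre-built wheel. The sketch below is an illustrative heuristic based on the availability I observed; the real source of truth is the file list at https://mlc.ai/wheels/.

```python
# Sketch: heuristic check for pre-built MLC LLM wheel availability.
# Reflects my observations (Windows and x86-64 Linux covered; ARM
# Linux boards generally not) -- verify against https://mlc.ai/wheels/.
import platform

def likely_prebuilt_available(system: str, machine: str) -> bool:
    if system == "Windows":
        return True
    if system == "Linux" and machine in ("x86_64", "AMD64"):
        return True
    # ARM SBCs (aarch64), Debian, etc.: plan on building from source.
    return False

print(likely_prebuilt_available(platform.system(), platform.machine()))
```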

Compile LLM models with mlc llm

For Android, models must be pre-compiled before they can run. On Linux, when using the Python API, you can rely on JIT compilation instead.
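The Android pre-compile flow can be sketched as the three `mlc_llm` steps below. The model name, quantization, and paths are illustrative; flags vary between MLC LLM versions, so check `mlc_llm --help` and the official docs for the exact invocation.

```shell
# 1. Convert and quantize the weights (q4f16_1 = 4-bit weights, fp16 compute)
mlc_llm convert_weight ./Llama-3-8B-Instruct \
    --quantization q4f16_1 -o ./dist/llama3-q4f16_1

# 2. Generate the chat config for the converted model
mlc_llm gen_config ./Llama-3-8B-Instruct \
    --quantization q4f16_1 --conv-template llama-3 -o ./dist/llama3-q4f16_1

# 3. Compile the model library for the Android target
mlc_llm compile ./dist/llama3-q4f16_1/mlc-chat-config.json \
    --device android -o ./dist/llama3-q4f16_1-android.tar
```

The resulting weights and model library are then bundled into the MLCChat app.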

Android Project upgrade for MLCChat

The example MLCChat app is built on an older Gradle version, so the project must be upgraded from xx to xx.

Building TVM From Source for Android to fix an issue in TVM

I ran into this issue in January 2026, before the TVM fix was merged.

Building TVM + MLC LLM From Source for Orange Pi 5 Pro

As of February 2026, both must be built from source because no pre-built packages are available yet. A few things worth knowing that make the Orange Pi 5 different:

✨ Project Setup

Android MLCChat

Python API on Orange PI 5

📦 Why Build From Source?

Pre-built MLC LLM packages are convenient, but:

| Platform | Pre-built MLC LLM availability | Issues |
| --- | --- | --- |
| Windows | ✔ Works well | No issues found |
| Ubuntu ARM devices | ✖ | Architecture mismatch, missing libs |
| Debian | ✖ Not available as of 6/2025 | — |
| Mac OS | ✖ Not available as of 6/2025 | — |

🔧 What I Built Myself

Android MLCChat automation

It can run LLMs over 1000+ prompts while keeping the environment temperature constant.
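The core of such an automation loop can be sketched as below. `run_prompt` is a hypothetical stand-in for the actual driver (e.g. sending the prompt to MLCChat over adb and reading back the reply); the cooldown pause is what keeps the device's thermal conditions comparable across prompts.

```python
# Sketch of a prompt-sweep loop: run many prompts under the same
# conditions and collect per-prompt stats. `run_prompt` is hypothetical.
import time

def run_prompt(prompt: str) -> str:
    # Placeholder: the real automation drives MLCChat on the device.
    return f"reply to: {prompt}"

def sweep(prompts, cooldown_s: float = 0.0):
    results = []
    for p in prompts:
        t0 = time.perf_counter()
        reply = run_prompt(p)
        results.append({"prompt": p, "reply": reply,
                        "latency_s": time.perf_counter() - t0})
        if cooldown_s:  # pause so device temperature stays stable
            time.sleep(cooldown_s)
    return results

stats = sweep(["hello", "summarize X"])
print(len(stats))  # prints 2 -- one record per prompt
```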

✔ TVM (deep learning compiler)

Fully custom build with:

✔ MLC LLM (model compiler + runtime)

Including:

✔ Compiled LLM Models

Optimized using MLC’s pipeline:

🟧 Running on Orange Pi 5 Pro

I’m currently focused on getting MLC LLM running smoothly on the Orange Pi 5 Pro, which features:

Since no pre-built MLC/TVM packages exist for this board, I:

Everything now works correctly on the Orange Pi 5 Pro.


📘 Key Learnings

1. Pre-built packages are limited

Many mobile/embedded devices require custom builds.

2. TVM source builds are crucial

TVM generates optimized kernels tailored to your hardware.

3. MLC LLM must match your TVM build

Mismatched builds produce cryptic runtime failures.

4. ARM SBCs (like Orange Pi) need careful tuning

Compiler flags, OpenCL drivers, and Python wheel compatibility all matter.

5. Quantization makes LLMs viable

4-bit and 8-bit models run surprisingly well on RK3588S.
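The memory arithmetic behind this is simple: weight bytes ≈ parameter count × bits ÷ 8. The sketch below ignores KV cache, activations, and packing overhead, but it shows why 4-bit quantization is what makes a 7B-class model fit comfortably in the RK3588S's RAM.

```python
# Rough weight-memory math for quantized LLMs (ignores KV cache,
# activations, and per-group scale overhead).

def weight_gib(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GiB for a model of the given size."""
    return params_billion * 1e9 * bits / 8 / 2**30

for bits in (16, 8, 4):
    print(f"7B model @ {bits}-bit  ~ {weight_gib(7, bits):.1f} GiB")
# fp16 needs ~13 GiB; 4-bit needs ~3.3 GiB
```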


🚦 Status


🔮 Future Work


🤝 Contributing


📫 Contact

If you have questions or want to exchange ideas, feel free to reach out or open a discussion in the repo.