Can I see a live demo?

Yes — open the public demo or request a private walkthrough.

Benchmark & route LLMs on real Android devices

Name: Evalyard
Brand: Evalyard
Price: 49 USD
Availability: In stock

Evalyard combines a hosted dashboard with a real Android device lab. It reports TTFT (time to first token), tokens/sec, P50/P95 latency, throttling, temperature, and battery metrics.

Self-service dashboard is coming soon.

Try the demo

TTFT

First-token latency per device & model

Tokens/sec

Sustained throughput under load

Thermals & Battery

Temperature, throttle, drain
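
For readers who want to see how the TTFT, tokens/sec, and P50/P95 numbers above are commonly derived, here is a minimal sketch. It is not Evalyard's implementation: the function names are hypothetical, timestamps are assumed to be in seconds, and the percentile uses the simple nearest-rank method.

```python
import math
from typing import Sequence


def ttft_ms(request_start: float, token_times: Sequence[float]) -> float:
    """Time to first token in milliseconds (timestamps are in seconds)."""
    return (token_times[0] - request_start) * 1000.0


def tokens_per_sec(token_times: Sequence[float]) -> float:
    """Decode throughput: tokens after the first, divided by the time spent emitting them."""
    if len(token_times) < 2:
        return 0.0
    elapsed = token_times[-1] - token_times[0]
    return (len(token_times) - 1) / elapsed if elapsed > 0 else 0.0


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Illustrative numbers only, not measurements from any device.
token_times = [0.25, 0.5, 0.75, 1.0, 1.25]          # arrival time of each token
ttft_samples = [120, 180, 150, 400, 200, 170, 160, 190, 210, 175]  # ms, one per run

print(ttft_ms(0.0, token_times))        # first-token latency for one run
print(tokens_per_sec(token_times))      # sustained decode rate for one run
print(percentile(ttft_samples, 50), percentile(ttft_samples, 95))
```

Reporting P95 next to P50 matters on phones because a single throttled run can sit far above the median, as the 400 ms sample does here.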

Evalyard

Data in. Phones optional.

Stream your own metrics into Evalyard, then add real devices when you need them.

Bring your metrics

Latency, quality, power — sent via API.

Logs

Evalyard

Unified bench view

API metrics and device runs in one UI.

Model Device Scenario

Real devices layer

Attach phones for TTFT & on-device perf.

TTFT tokens/sec

Explore & export

Slice, share, export CSVs & snapshots.

CSV PNG Link

Two ways to use Evalyard

Run on real Android devices, or plug in your own metrics via API.

Devices path

Run on real phones

Use our Android lab or your own phones for real-device TTFT & throughput.

Install

Models

Run

Metrics

API path

Send metrics via API

Stream latency / quality logs into Evalyard and reuse the same dashboards.

API

Dashboard
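
As a rough illustration of the API path, the sketch below shapes latency metrics into JSON Lines before sending them. This page does not document the Evalyard API, so the record fields and the `MetricRecord` and `to_json_lines` names are assumptions made for the example (the model, device, and scenario dimensions are borrowed from the bench view labels). No endpoint is called.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class MetricRecord:
    # Hypothetical schema: these field names are placeholders, not Evalyard's.
    model: str
    device: str
    scenario: str
    ttft_ms: float
    tokens_per_sec: float


def to_json_lines(records: Iterable[MetricRecord]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


# Illustrative values only.
batch = [
    MetricRecord("example-model", "example-phone", "chat", 250.0, 4.0),
    MetricRecord("example-model", "example-phone", "summarize", 310.0, 3.5),
]
payload = to_json_lines(batch)
print(payload)
```

One object per line keeps each record independently parseable, which suits streaming logs where a batch may be cut off partway through.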

Get early access

We can provision specific phones, build adapters, and share a read-only dashboard for your team. No spam.

FAQ

Quick answers

Do the plans include devices?

By default it’s BYOD (bring your own Android phones). Device rental / dedicated racks are available for Fabric and Enterprise on request.

What are device-hours?

Device-hours measure the time a phone is actively running your jobs. If you hit the limit, pause runs or enable pay-as-you-go overage.
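
To make the arithmetic concrete, here is a small sketch that adds up active run time across phones and compares it with a limit. The 2-hour limit and the run times are made-up numbers, and the helper names are hypothetical. Actual plan limits and overage pricing are not stated on this page.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Run = Tuple[datetime, datetime]  # (start, end) of one job on one phone


def device_hours(runs: Iterable[Run]) -> float:
    """Total hours that phones spent actively running jobs, summed over all runs."""
    total = sum((end - start for start, end in runs), timedelta())
    return total.total_seconds() / 3600.0


def overage_hours(used: float, limit: float) -> float:
    """Hours beyond the limit, or 0.0 if usage is within it."""
    return max(0.0, used - limit)


# Two phones running in parallel still count separately.
runs = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30)),  # phone A: 1.5 h
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 45)),   # phone B: 0.75 h
]
used = device_hours(runs)
print(used, overage_hours(used, limit=2.0))  # limit is an assumed example value
```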

How do I access the dashboard now?

The self-service dashboard is not publicly available yet. Please book a private demo. We’ll walk you through the metrics and provide screenshots from the current version.

Can I cancel anytime?

Yes — monthly billing, cancel anytime. No long-term lock-in.

Need fully isolated infrastructure or shipped devices? Ask about Enterprise Fabric.

Vote on what we build next

Tell us what to build →

High-load stress testing for on-device LLMs

Automated output grading / evals

Image-based / multimodal models

Plugins / SDK for game engines

Per-device battery & thermal tracking