ARena (UMass Hackathon II)
First time working with VR/AR
48 Hours
Build Time
<100ms
Sync Latency
E-Sports Belongs in Stadiums.
Picture this: a party, everyone's bored, two people start playing Clash Royale on their phones. Within minutes, the whole room is crowding around a tiny screen, hyping up every move. The vibe is electric, but the experience sucks.
That's when the idea hit us: if this energy exists for two people and a phone, imagine a stadium full of fans watching the same match projected in 3D over a real field. Or a World Cup watch party in LA where fans see the game happening in Argentina recreated on the pitch in front of them. Or a concert where the artist performs "live" across ten cities simultaneously.
Demo 🍿

What we built
We built a shared AR experience where fans wear headsets and watch e-sports matches projected in 3D over a real field. Multiple people see the same game at the same moment, anchored to the exact same spot in physical space.
The hard part? Making AR headsets agree on where things are and when they happen. Each device has its own coordinate system and its own clock. We had to solve both problems at once: spatial sync and time sync, without a server.
My work
- QR-based spatial anchoring that locks the arena to physical space and scales it automatically (576 lines)
- Networking layer: a UDP broadcast protocol that syncs devices in real-time with sub-100ms accuracy (4 components, ~400 lines)
The Build

48 hours at HackUMass XIII. None of us had touched Unity or AR before.
When we finally saw two headsets show the same ball getting hit at the same instant, I took the headset off, hands shaking, and just yelled. Put it back on for a split second to confirm it wasn't a fluke. Took it right back off and started jumping around.
People definitely judged us.
The Clever Bit: We Put QR Codes on Seats, Not the Field
Traditional AR stadium setups require massive markers installed on the playing field. Think $50K+ infrastructure, and you can't use the field during setup.
We flipped it: the audience becomes the anchor point. Small QR codes at seats, detected by headsets.
Why this works:
- $5 of printed codes vs $50K installation
- No field modification needed (doesn't block real events)
- Scales to any venue—10 people or 10,000, same setup
The technical trick: Meta's API tells us the QR code's physical size in meters. We calculate `qrSize / arenaSize` and scale dynamically. Print a 5cm code? Arena scales down. Print 20cm? Scales up. No recalibration.

Technical Highlights
Team Setup: Four people, two days, zero Unity experience. We split work early and played to strengths. I handled networking and spatial tracking while others tackled XR tooling, replay data processing, and 3D modeling. Clean interfaces upfront meant we could work in parallel without blocking each other.
Getting Multiple Headsets to See the Same Thing
The Problem: Each headset spawns the arena independently, and each has its own clock. Device A starts at `t=0`. Device B starts 5 seconds later, also at `t=0`. So how do you make them show the same ball at the same moment?

My Solution: The first device to scan a QR code becomes the host. It broadcasts its replay timestamp every 100ms via UDP. Other devices listen, detect the host, and jump to whatever timestamp they're hearing. Simple conductor-and-orchestra logic.
The Result: Sub-100ms sync across devices. No server required. Works completely offline.
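The conductor-and-orchestra logic can be sketched roughly like this. This is a Python stand-in for our Unity/C# networking layer, not the actual implementation; the port number and field names are assumptions:

```python
# Sketch of the host/client sync loop (illustrative; SYNC_PORT and the
# message fields are assumptions, not the real protocol).
import json
import socket
import time

SYNC_PORT = 7777                    # hypothetical broadcast port
BROADCAST_ADDR = "255.255.255.255"
INTERVAL = 0.1                      # host broadcasts every 100 ms

def make_sync_message(device_id: str, replay_time: float) -> bytes:
    """Serialize the host's current replay timestamp as JSON."""
    return json.dumps({"deviceId": device_id, "replayTime": replay_time}).encode()

def run_host(device_id: str, get_replay_time):
    """Host: broadcast the current replay timestamp every 100 ms."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        sock.sendto(make_sync_message(device_id, get_replay_time()),
                    (BROADCAST_ADDR, SYNC_PORT))
        time.sleep(INTERVAL)

def run_client(device_id: str, jump_to):
    """Client: listen for a host and jump to whatever timestamp it hears."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", SYNC_PORT))
    while True:
        data, _ = sock.recvfrom(1024)
        msg = json.loads(data)
        if msg["deviceId"] != device_id:  # ignore our own broadcasts
            jump_to(msg["replayTime"])
```

The client never negotiates with the host; it just chases the most recent timestamp it hears, which is what keeps the logic server-free.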

The protocol is dead simple. A JSON message with device ID, ball time, car time, and play state. 100 bytes total.
Why JSON? I could debug it with Wireshark. When things broke (and they absolutely did), being able to read the packets in plain text saved hours. No binary decoding, no guesswork. Just open Wireshark and see exactly what's being sent.
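For illustration, a packet in that spirit might look like this. The field names are my guesses from the description above (device ID, ball time, car time, play state), not the exact wire format:

```python
# Hypothetical sync packet; field names are illustrative guesses.
import json

packet = {
    "deviceId": "quest-a1b2",
    "ballTime": 12.345,
    "carTime": 12.345,
    "isPlaying": True,
}
encoded = json.dumps(packet).encode("utf-8")
print(len(encoded), "bytes")  # on the order of 100 bytes
```

At that size, one packet every 100ms is negligible traffic, and every field is readable straight off the wire.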
Auto-Scaling to Any QR Code Size
The Problem: QR codes come in different physical sizes. Hard-code the scale for a 10cm marker and it breaks when someone prints a 5cm version. The arena appears twice as large as it should. Not great.
My Solution: Meta's `OVRBounded2D` component reports the QR code's real-world dimensions in meters. I calculate the ratio between the detected QR size and the arena's reference size, then scale the entire 3D model dynamically. Print any size you want.

The Result: 5cm, 10cm, 20cm—doesn't matter. The arena scales automatically. No recalibration needed.
The arena prefab is authored at roughly 10 meters wide—standard Rocket League field scale. A 10cm QR code scales it down 100x. A 20cm code? 50x. The math just works.
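The scaling math in miniature. The 10m reference width comes from the prefab; the function itself is an illustrative sketch, not our Unity code:

```python
# qrSize / arenaSize gives the uniform scale applied to the arena prefab.
def arena_scale(qr_size_m: float, arena_size_m: float = 10.0) -> float:
    """Scale factor for the arena, from the detected QR width in meters."""
    return qr_size_m / arena_size_m

arena_scale(0.10)  # 10cm code -> scale 0.01, arena shrinks 100x
arena_scale(0.20)  # 20cm code -> scale 0.02, arena shrinks 50x
```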
Learning Meta's XR Kit
The hardest part wasn't the networking or the math. It was figuring out Meta's XR SDK with zero useful documentation.
The official docs assumed you already knew what `OVRAnchor`, `OVRLocatable`, and `OVRBounded2D` were. Sample code was sparse. I spent the first 6 hours just trying to get a QR code detected. The API has multiple layers: tracker configuration, anchor fetching, component enabling, pose retrieval. Miss one step and nothing works. No error messages. Just silence.

What finally clicked: treat it like a pipeline. Configure tracker → fetch trackables → get components → enable locatable → retrieve pose → instantiate object. Once I mapped that flow, everything else fell into place. Still took way longer than it should have.
Small Details That Made a Difference

- Device ID filtering: Added `deviceId` to sync messages so headsets ignore their own broadcasts.
→ Prevented feedback loops where a device would sync to itself. Cut network traffic in half.
- 2-second discovery window: Headsets wait 2 seconds before claiming host role.
→ Prevents split-brain scenarios. If multiple devices start simultaneously, they won't all think they're the host.
- The eduroam problem: Six hours in, sync worked perfectly in testing but failed completely in the demo space. Turns out university WiFi blocks UDP broadcast traffic. Solution: mobile hotspot.
→ Almost killed the demo. Now I always test on the target network first. Always.
- JSON over binary: Used human-readable JSON for network messages.
→ Could debug with Wireshark in minutes instead of hours. Readability beats performance for a 48-hour build. Every time.
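The device-ID filtering and the 2-second discovery window fit together as a tiny state machine. This is an illustrative Python sketch of that logic, not the actual implementation:

```python
# Sketch: a device listens for DISCOVERY_WINDOW_S seconds; if it hears
# another device's broadcast it becomes a client, otherwise it claims host.
import time

DISCOVERY_WINDOW_S = 2.0

class SyncRole:
    def __init__(self, device_id, now=time.monotonic):
        self.device_id = device_id
        self.now = now
        self.started = now()
        self.role = "discovering"

    def on_message(self, msg):
        # Ignore our own broadcasts (prevents self-sync feedback loops).
        if msg["deviceId"] == self.device_id:
            return
        self.role = "client"  # someone else is already broadcasting

    def tick(self):
        # Heard nothing for the whole window -> claim the host role.
        if self.role == "discovering" and self.now() - self.started >= DISCOVERY_WINDOW_S:
            self.role = "host"
        return self.role
```

With simultaneous starts this doesn't strictly guarantee a single host, but for a handful of headsets on one network the 2-second window made ties vanishingly unlikely in practice.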
System Architecture
Here's how everything fits together.
The flow:
- QR Detection: Each device scans QR codes using Meta's tracker API
- Spatial Anchoring: QR position becomes the world anchor, arena spawns relative to it
- Role Assignment: First device becomes host (broadcasts), others become clients (listen)
- Replay Sync: Host sends timestamps, clients jump to match
- Continuous Update: 10 broadcasts per second keep everyone aligned
The key: no central server. Devices coordinate peer-to-peer. First device to start becomes the authority. Simple hierarchy, zero configuration.
What I Learned
What Worked
- Splitting work by interfaces, not features. Defining the sync protocol upfront (what data, what format) meant I could build networking while someone else built replay controllers. We integrated in the last 6 hours with minimal friction.
- Starting with Meta's sample scenes. After wasting 4 hours reading incomplete docs, I just opened their sample scene and reverse-engineered it. Learning from working code beats learning from documentation. Always.
What I'd Do Differently
- Test on the target network from day one. The eduroam issue cost us 3 hours of panic debugging. If I'd tested on university WiFi instead of my laptop hotspot earlier, we'd have caught it immediately.
- Add visual debugging from the start. I spent hours staring at console logs trying to figure out why devices weren't syncing. On-screen debug text showing "Host" vs "Client" status and current timestamps would've saved half that time.
Unexpected Discoveries
- AR development is still really early. Meta's SDK works, but it's rough. Even ChatGPT couldn't help much because the APIs are too new. This is what it must've felt like to build iPhone apps in 2008. Exciting but frustrating.
- The hardest problems aren't always technical. Figuring out UDP broadcast took 2 hours. Debugging why it didn't work on eduroam took 3 hours. Sometimes the constraint isn't the algorithm, it's the environment.
End Credits

