ARena (UMass Hackathon II)
First time working with VR/AR
48 Hours
Build Time
<100ms
Sync Latency
E-Sports Belongs in Stadiums.
Picture this: a party, everyone's bored, two people start playing Clash Royale on their phones. Within minutes, the whole room is crowding around a tiny screen, hyping up every move. The vibe is electric, but the experience sucks.
That's when the idea hit us: if this energy exists for two people and a phone, imagine a stadium full of fans watching the same match projected in 3D over a real field. Or a World Cup watch party in LA where fans see the game happening in Argentina recreated on the pitch in front of them. Or a concert where the artist performs "live" across ten cities simultaneously.
Demo 🍿

What we built
We built a shared AR experience where fans wear headsets and watch e-sports matches projected in 3D over a real field. Multiple people see the same game at the same moment, anchored to the exact same spot in physical space.
The hard part? Making AR headsets agree on where things are and when they happen. Each device has its own coordinate system and its own clock. We had to solve both problems at once: spatial sync and time sync, without a server.
My work
- QR-based spatial anchoring that locks the arena to physical space and scales it automatically (576 lines)
- Networking layer: a UDP broadcast protocol that syncs devices in real-time with sub-100ms accuracy (4 components, ~400 lines)
The Build

48 hours at HackUMass XIII. None of us had touched Unity or AR before.
When we finally saw two headsets show the same ball getting hit at the same instant, I took the headset off, hands shaking, and just yelled. Put it back on for a split second to confirm it wasn't a fluke. Took it right back off and started jumping around.
People definitely judged us.
The Clever Bit: We Put QR Codes on Seats, Not the Field
Traditional AR stadium setups require massive markers installed on the playing field. Think $50K+ infrastructure, and you can't use the field during setup.
We flipped it: the audience becomes the anchor point. Small QR codes at seats, detected by headsets.
Why this works:
- $5 of printed codes vs $50K installation
- No field modification needed (doesn't block real events)
- Scales to any venue—10 people or 10,000, same setup
The technical trick: Meta's API tells us the QR code's physical size in meters. We calculate `qrSize / arenaSize` and scale dynamically. Print a 5cm code? Arena scales down. Print 20cm? Scales up. No recalibration.

Technical Highlights
Team Setup: Four people, two days, zero Unity experience. We split work early and played to strengths. I handled networking and spatial tracking while others tackled XR tooling, replay data processing, and 3D modeling. Clean interfaces upfront meant we could work in parallel without blocking each other.
Getting Multiple Headsets to See the Same Thing
The Problem: Each headset spawns the arena independently, and each has its own clock. Device A starts at `t=0`. Device B starts 5 seconds later, also at `t=0`. So how do you make them show the same ball at the same moment?

My Solution: The first device to scan a QR code becomes the host. It broadcasts its replay timestamp every 100ms via UDP. Other devices listen, detect the host, and jump to whatever timestamp they're hearing. Simple conductor-and-orchestra logic.
The Result: Sub-100ms sync across devices. No server required. Works completely offline.
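The conductor-and-orchestra logic can be sketched roughly like this. This is a Python stand-in for our Unity/C# networking layer, not the actual implementation; the port number and field names are assumptions:

```python
# Sketch of the host/client sync loop (illustrative; SYNC_PORT and the
# message fields are assumptions, not the real protocol).
import json
import socket
import time

SYNC_PORT = 7777                    # hypothetical broadcast port
BROADCAST_ADDR = "255.255.255.255"
INTERVAL = 0.1                      # host broadcasts every 100 ms

def make_sync_message(device_id: str, replay_time: float) -> bytes:
    """Serialize the host's current replay timestamp as JSON."""
    return json.dumps({"deviceId": device_id, "replayTime": replay_time}).encode()

def run_host(device_id: str, get_replay_time):
    """Host: broadcast the current replay timestamp every 100 ms."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    while True:
        sock.sendto(make_sync_message(device_id, get_replay_time()),
                    (BROADCAST_ADDR, SYNC_PORT))
        time.sleep(INTERVAL)

def run_client(device_id: str, jump_to):
    """Client: listen for a host and jump to whatever timestamp it hears."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", SYNC_PORT))
    while True:
        data, _ = sock.recvfrom(1024)
        msg = json.loads(data)
        if msg["deviceId"] != device_id:  # ignore our own broadcasts
            jump_to(msg["replayTime"])
```

The client never negotiates with the host; it just chases the most recent timestamp it hears, which is what keeps the logic server-free.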

The protocol is dead simple. A JSON message with device ID, ball time, car time, and play state. 100 bytes total.
Why JSON? I could debug it with Wireshark. When things broke (and they absolutely did), being able to read the packets in plain text saved hours. No binary decoding, no guesswork. Just open Wireshark and see exactly what's being sent.
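For illustration, a packet in that spirit might look like this. The field names are my guesses from the description above (device ID, ball time, car time, play state), not the exact wire format:

```python
# Hypothetical sync packet; field names are illustrative guesses.
import json

packet = {
    "deviceId": "quest-a1b2",
    "ballTime": 12.345,
    "carTime": 12.345,
    "isPlaying": True,
}
encoded = json.dumps(packet).encode("utf-8")
print(len(encoded), "bytes")  # on the order of 100 bytes
```

At that size, one packet every 100ms is negligible traffic, and every field is readable straight off the wire.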
Auto-Scaling to Any QR Code Size
The Problem: QR codes come in different physical sizes. Hard-code the scale for a 10cm marker and it breaks when someone prints a 5cm version. The arena appears twice as large as it should. Not great.
My Solution: Meta's `OVRBounded2D` component reports the QR code's real-world dimensions in meters. I calculate the ratio between the detected QR size and the arena's reference size, then scale the entire 3D model dynamically. Print any size you want.

The Result: 5cm, 10cm, 20cm—doesn't matter. The arena scales automatically. No recalibration needed.
The arena prefab is authored at roughly 10 meters wide—standard Rocket League field scale. A 10cm QR code scales it down 100x. A 20cm code? 50x. The math just works.
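The scaling math in miniature. The 10m reference width comes from the prefab; the function itself is an illustrative sketch, not our Unity code:

```python
# qrSize / arenaSize gives the uniform scale applied to the arena prefab.
def arena_scale(qr_size_m: float, arena_size_m: float = 10.0) -> float:
    """Scale factor for the arena, from the detected QR width in meters."""
    return qr_size_m / arena_size_m

arena_scale(0.10)  # 10cm code -> scale 0.01, arena shrinks 100x
arena_scale(0.20)  # 20cm code -> scale 0.02, arena shrinks 50x
```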
Learning Meta's XR Kit
The hardest part wasn't the networking or the math. It was figuring out Meta's XR SDK with zero useful documentation.
The official docs assumed you already knew what `OVRAnchor`, `OVRLocatable`, and `OVRBounded2D` were. Sample code was sparse. I spent the first 6 hours just trying to get a QR code detected. The API has multiple layers: tracker configuration, anchor fetching, component enabling, pose retrieval. Miss one step and nothing works. No error messages. Just silence.

What finally clicked: treat it like a pipeline. Configure tracker → fetch trackables → get components → enable locatable → retrieve pose → instantiate object. Once I mapped that flow, everything else fell into place. Still took way longer than it should have.
Small Details That Made a Difference

- Device ID filtering: Added `deviceId` to sync messages so headsets ignore their own broadcasts.
→ Prevented feedback loops where a device would sync to itself. Cut network traffic in half.
- 2-second discovery window: Headsets wait 2 seconds before claiming host role.
→ Prevents split-brain scenarios. If multiple devices start simultaneously, they won't all think they're the host.
- The eduroam problem: Six hours in, sync worked perfectly in testing but failed completely in the demo space. Turns out university WiFi blocks UDP broadcast traffic. Solution: mobile hotspot.
→ Almost killed the demo. Now I always test on the target network first. Always.
- JSON over binary: Used human-readable JSON for network messages.
→ Could debug with Wireshark in minutes instead of hours. Readability beats performance for a 48-hour build. Every time.
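The device-ID filtering and the 2-second discovery window fit together as a tiny state machine. This is an illustrative Python sketch of that logic, not the actual implementation:

```python
# Sketch: a device listens for DISCOVERY_WINDOW_S seconds; if it hears
# another device's broadcast it becomes a client, otherwise it claims host.
import time

DISCOVERY_WINDOW_S = 2.0

class SyncRole:
    def __init__(self, device_id, now=time.monotonic):
        self.device_id = device_id
        self.now = now
        self.started = now()
        self.role = "discovering"

    def on_message(self, msg):
        # Ignore our own broadcasts (prevents self-sync feedback loops).
        if msg["deviceId"] == self.device_id:
            return
        self.role = "client"  # someone else is already broadcasting

    def tick(self):
        # Heard nothing for the whole window -> claim the host role.
        if self.role == "discovering" and self.now() - self.started >= DISCOVERY_WINDOW_S:
            self.role = "host"
        return self.role
```

With simultaneous starts this doesn't strictly guarantee a single host, but for a handful of headsets on one network the 2-second window made ties vanishingly unlikely in practice.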
System Architecture
Here's how everything fits together.
The flow:
- QR Detection: Each device scans QR codes using Meta's tracker API
- Spatial Anchoring: QR position becomes the world anchor, arena spawns relative to it
- Role Assignment: First device becomes host (broadcasts), others become clients (listen)
- Replay Sync: Host sends timestamps, clients jump to match
- Continuous Update: 10 broadcasts per second keep everyone aligned
The key: no central server. Devices coordinate peer-to-peer. First device to start becomes the authority. Simple hierarchy, zero configuration.
What I Learned
What Worked
- Splitting work by interfaces, not features. Defining the sync protocol upfront (what data, what format) meant I could build networking while someone else built replay controllers. We integrated in the last 6 hours with minimal friction.
- Starting with Meta's sample scenes. After wasting 4 hours reading incomplete docs, I just opened their sample scene and reverse-engineered it. Learning from working code beats learning from documentation. Always.
What I'd Do Differently
- Test on the target network from day one. The eduroam issue cost us 3 hours of panic debugging. If I'd tested on university WiFi instead of my laptop hotspot earlier, we'd have caught it immediately.
- Add visual debugging from the start. I spent hours staring at console logs trying to figure out why devices weren't syncing. On-screen debug text showing "Host" vs "Client" status and current timestamps would've saved half that time.
Unexpected Discoveries
- AR development is still really early. Meta's SDK works, but it's rough. Even ChatGPT couldn't help much because the APIs are too new. This is what it must've felt like to build iPhone apps in 2008. Exciting but frustrating.
- The hardest problems aren't always technical. Figuring out UDP broadcast took 2 hours. Debugging why it didn't work on eduroam took 3 hours. Sometimes the constraint isn't the algorithm, it's the environment.
End Credits

