Production win

Offline-first Kiosk & Mobile Punching

Disputes dropped from 50/week to 10/week

Mobile / Real-time UX

Platform Performance / Reliability

Architecture diagram

Offline-first punching and verification

Punches are captured locally first, verified with biometric checks, then reconciled when connectivity returns.

Employee device (kiosk or mobile punch action)
-> enqueue punch -> SQS queue (buffered punch events for async processing)
-> verify identity -> Face verification (AWS Rekognition match check)
-> validated event -> Sync service (dedupe, retry, conflict handling)
-> reconcile -> Attendance system (accepted punch timeline)
-> auditable state -> Manager review (fewer disputes and clearer status)

SQS buffering decoupled punch capture from downstream processing availability.
Face verification reduced abuse but still required fallback/review paths.
Sync status stayed explainable so disputes could be resolved from event state.

Attendance systems sound boring until you have to make one work in patchy field conditions with real fraud pressure. Workers needed to punch in and out from kiosks and mobile devices, often with unstable connectivity. If the app required a healthy network every time, valid punches would fail; if it accepted everything with no checks, abuse would rise fast. So I went offline-first: punch events are captured locally and stored until connectivity returns, with SQLite handling on-device durability and a sync path reconciling later. The employee's action no longer depends on the network behaving at that exact moment.
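The capture-then-sync split can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: `LocalPunchStore`, `capturePunch`, and `syncOnce` are hypothetical names, a plain array stands in for the SQLite table, and the client-generated `id` is an assumed way to let the server dedupe retries.

```typescript
// Illustrative sketch only: every name here is hypothetical, and a plain
// array stands in for the on-device SQLite table so the example is runnable.

type PunchKind = "in" | "out";

interface Punch {
  id: string; // client-generated key, reused on every retry so the server can dedupe
  employeeId: string;
  kind: PunchKind;
  capturedAt: number; // device clock, epoch milliseconds
  synced: boolean;
}

class LocalPunchStore {
  private rows: Punch[] = [];

  // Mirrors INSERT OR IGNORE on a primary key: a repeated id is a no-op.
  insert(punch: Punch): void {
    if (!this.rows.some((r) => r.id === punch.id)) {
      this.rows.push({ ...punch });
    }
  }

  // Oldest unsynced punches first, returned as copies.
  pending(limit: number): Punch[] {
    return this.rows
      .filter((r) => !r.synced)
      .sort((a, b) => a.capturedAt - b.capturedAt)
      .slice(0, limit)
      .map((r) => ({ ...r }));
  }

  markSynced(ids: string[]): void {
    for (const row of this.rows) {
      if (ids.indexOf(row.id) !== -1) row.synced = true;
    }
  }
}

// Capturing a punch only writes locally; it never waits on the network.
function capturePunch(
  store: LocalPunchStore,
  employeeId: string,
  kind: PunchKind,
  nowMs: number,
  seq: number
): Punch {
  // A real app would more likely use a UUID; this id is deterministic for the example.
  const punch: Punch = {
    id: `${employeeId}-${nowMs}-${seq}`,
    employeeId,
    kind,
    capturedAt: nowMs,
    synced: false,
  };
  store.insert(punch);
  return punch;
}

// One sync pass: send a batch, then mark only the acknowledged ids as synced.
// `send` is synchronous here for brevity; a real transport would be async.
function syncOnce(
  store: LocalPunchStore,
  send: (batch: Punch[]) => string[],
  batchSize: number
): number {
  const batch = store.pending(batchSize);
  if (batch.length === 0) return 0;
  const acked = send(batch);
  store.markSynced(acked);
  return acked.length;
}
```

The point of the sketch is that a punch stays pending until the server explicitly acknowledges it, so a dropped connection mid-sync leads to a retry with the same id rather than a lost or duplicated punch.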

The failure I am most glad I caught was in load testing, not production. Offline-first has an obvious-in-hindsight failure mode: everything queues while offline, then hits the server all at once when the network returns. I ran roughly 150 punches across about ten devices while offline, then restored connectivity, and the queued writes hit the API in a single burst. The low-capacity staging server fell over. It was bad, and on a real fleet it would have been far worse. The lesson stuck: load testing is non-negotiable for offline-first. The fix was to throttle on both sides: client-side, so each device drains its queue at a sane rate, and server-side, so a reconnect storm cannot take the system down. You design for the burst, not the steady state.
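To make the two-sided throttle concrete, here is a minimal sketch. It assumes a token bucket on the server and a paced, jittered drain schedule on the client; both are common techniques chosen for illustration, not a record of the actual fix. `TokenBucket`, `planDrain`, and all the numbers are hypothetical.

```typescript
// Server side: a token bucket. The clock is passed in so the behaviour is
// deterministic and easy to test. All names and limits are illustrative.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so normal traffic is never delayed
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request may proceed; on false the caller would
  // typically answer "try again later" and the client keeps the punch queued.
  tryTake(nowMs: number, cost: number = 1): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Client side: instead of flushing the whole backlog on reconnect, plan the
// send time (ms after reconnect) for each batch. Jitter spreads devices that
// all come back online at the same moment.
function planDrain(
  queued: number,
  batchSize: number,
  intervalMs: number,
  jitterMs: number,
  rand: () => number = Math.random
): number[] {
  const offsets: number[] = [];
  const batches = Math.ceil(queued / batchSize);
  for (let i = 0; i < batches; i++) {
    offsets.push(i * intervalMs + Math.floor(rand() * jitterMs));
  }
  return offsets;
}
```

With a backlog of 15 punches per device, a batch size of 5, and a 2-second interval, each device sends three small batches over several seconds rather than 15 writes at once, and the bucket caps whatever still arrives together.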

To deter buddy punching I added face verification with AWS Rekognition, but I refused to treat it as unquestionable truth, because it is not. In practice it verified cleanly on roughly eight of ten punches; the other two failed for predictable reasons: poor lighting, a cap, a partially obscured face. The wrong move would be to block those workers or pretend the model is perfect. So I designed for the miss. When verification failed, the system logged it for review and guided the person to fix the input (remove the cap, look at the camera, improve the lighting), and the failed case was reviewed against the stored photo rather than silently rejected. The failure became a recoverable path with an audit trail, not a locked-out worker or a quiet false accept.
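The "design for the miss" rule can be sketched as a small decision function. It assumes the face comparison yields a similarity score from 0 to 100 (as Rekognition's CompareFaces does) or nothing usable at all; the threshold of 90, the type names, and the guidance wording are hypothetical. Note that there is deliberately no "rejected" outcome here.

```typescript
// Hypothetical cut-off on a 0-100 similarity score; tune against real data.
const SIMILARITY_THRESHOLD = 90;

type VerificationOutcome =
  | { status: "verified"; similarity: number }
  | {
      status: "needs_review";
      reason: "no_match" | "low_similarity";
      similarity: number | null;
      guidance: string[];
    };

// Prompts shown to the worker before a retry (wording is illustrative).
const RETRY_GUIDANCE: string[] = [
  "Remove your cap",
  "Look at the camera",
  "Improve the lighting",
];

// `similarity` is null when the comparison produced no usable match.
// A miss is never a silent rejection: it becomes a reviewable outcome.
function decideVerification(
  similarity: number | null,
  threshold: number = SIMILARITY_THRESHOLD
): VerificationOutcome {
  if (similarity === null) {
    return { status: "needs_review", reason: "no_match", similarity: null, guidance: RETRY_GUIDANCE };
  }
  if (similarity >= threshold) {
    return { status: "verified", similarity };
  }
  return { status: "needs_review", reason: "low_similarity", similarity, guidance: RETRY_GUIDANCE };
}
```

A `needs_review` result would still be stored alongside the punch, so a reviewer can compare the captured image against the stored photo later.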

The system also had to stay explainable. In attendance products, somebody eventually asks, "Was the punch captured? Was it synced? Was it rejected? Why?" A system that cannot answer that clearly creates new disputes even when the data model is fine. The outcome was a much better balance than either extreme would have given: disputes dropped from around fifty a week to roughly ten (handled, not eliminated), and legitimate punches stopped getting lost to weak connectivity.
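One way to keep those four questions answerable is an append-only event log per punch, with the answers derived from it. The sketch below assumes that design; the event names, `PunchEvent`, and `explainPunch` are hypothetical and not taken from the actual system.

```typescript
// Hypothetical event types recorded as a punch moves through the pipeline.
type PunchEventType = "captured" | "verified" | "flagged_for_review" | "synced" | "rejected";

interface PunchEvent {
  punchId: string;
  type: PunchEventType;
  at: number; // epoch milliseconds
  detail: string; // human-readable reason, shown to managers
}

interface PunchExplanation {
  captured: boolean;
  synced: boolean;
  rejected: boolean;
  why: string; // latest state and its reason
  history: string[]; // full ordered trail for dispute resolution
}

// Answers "captured? synced? rejected? why?" purely from recorded events.
function explainPunch(punchId: string, log: PunchEvent[]): PunchExplanation {
  const events = log
    .filter((e) => e.punchId === punchId)
    .sort((a, b) => a.at - b.at);
  const has = (t: PunchEventType): boolean => events.some((e) => e.type === t);
  const last = events.length > 0 ? events[events.length - 1] : undefined;
  return {
    captured: has("captured"),
    synced: has("synced"),
    rejected: has("rejected"),
    why: last ? `${last.type}: ${last.detail}` : "no events recorded for this punch",
    history: events.map((e) => `${e.at} ${e.type}: ${e.detail}`),
  };
}
```

Because the explanation is computed from the log rather than stored as a single status field, a dispute can be resolved by reading the trail instead of guessing what happened.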

My rule on this one, and generally: address the problem, do not kick it down the road. The burst-on-reconnect failure got a real throttling fix, and the model's imperfection got an honest fallback, because that is how things actually get solved for good rather than papered over.

Tech stack

React Native

SQLite

AWS Rekognition