PitchCoach
Embodied AI presentation coach with live speech, audio, and webcam feedback
PitchCoach is a live practice environment for pitches, speeches, interviews, demos, and presentations. Instead of only analyzing an uploaded recording afterward, it simulates a coach in the room with real-time transcript capture, avatar reactions, delivery signals, and post-session feedback.
Problem / motivation
Presentation coaching is most useful when it catches both content and delivery issues: unclear problem framing, filler words, pacing problems, poor camera presence, weak calls to action, and audience mismatch. PitchCoach combines those signals into a single practice loop.
Key technical challenges
- Designing a product loop that feels live without depending on fragile real-time AI assumptions.
- Combining transcript, visual, audio, audience, and timeline data into one normalized feedback payload (sketched after this list).
- Keeping API keys and AI-provider calls on the backend while preserving a smooth browser demo.
- Producing useful fallback coaching so the app remains demoable without external AI credentials.
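For illustration, the normalized payload could look something like the TypeScript interface below. Every field name here is a hypothetical stand-in, not the project's actual schema; the point is that live and upload flows converge on one shape.

```typescript
// Hypothetical shape of the normalized feedback payload.
// Field names are illustrative, not the project's actual schema.
interface FeedbackPayload {
  source: "live" | "upload";            // both flows normalize to this shape
  context: {
    presentationType: string;           // e.g. "pitch", "interview", "demo"
    audienceMode: string;               // e.g. "investors", "technical panel"
    coachingIntensity: "gentle" | "standard" | "intense";
  };
  transcript: { text: string; startMs: number; endMs: number }[];
  visual: { presenceScore: number; lightingScore: number };  // 0..1 heuristics
  audio: { avgVolume: number; pauseCount: number; pitchRangeHz: number };
  timeline: { atMs: number; event: string }[];               // notable moments
}
```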
Architecture / workflow
How the system fits together
Browser UI captures webcam, microphone, presentation context, audience mode, and live session state.
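A minimal sketch of how that capture could start in the browser, assuming a standard getUserMedia flow:

```typescript
// Minimal capture setup: request camera and microphone, attach the video
// preview, and keep the stream around for the analysis pipelines below.
async function startCapture(video: HTMLVideoElement): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });
  video.srcObject = stream;  // live self-view for the presenter
  await video.play();
  return stream;             // shared by the visual and audio analyzers
}
```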
Speech recognition produces transcript chunks while browser speech synthesis can deliver spoken coach feedback.
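A minimal sketch of that loop, assuming the browser's Web Speech API (vendor-prefixed as webkitSpeechRecognition in Chromium); the onChunk callback is a hypothetical hook into the session state:

```typescript
// Continuous recognition that emits transcript chunks as results finalize.
// SpeechRecognition is vendor-prefixed in Chromium-based browsers.
const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function startTranscription(onChunk: (text: string) => void) {
  const recognition = new Recognition();
  recognition.continuous = true;       // keep listening across utterances
  recognition.interimResults = false;  // only deliver finalized chunks
  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal) onChunk(result[0].transcript);
  };
  recognition.start();
  return recognition;
}

// Spoken coach feedback via browser speech synthesis.
function speakFeedback(text: string) {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```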
Client-side visual heuristics track camera presence, face position, lighting, eye movement, and head movement signals.
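One cheap way to compute a signal like the lighting estimate is to sample video frames onto a small canvas and average luma; the sketch below is an illustrative assumption, not the project's exact heuristic:

```typescript
// Rough lighting heuristic: average luma over a downsampled video frame.
// An illustrative stand-in for the project's client-side visual signals.
function estimateLighting(video: HTMLVideoElement): number {
  const canvas = document.createElement("canvas");
  canvas.width = 64;  // downsample: a coarse grid is enough for an average
  canvas.height = 48;
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);
  let sum = 0;
  for (let i = 0; i < data.length; i += 4) {
    // Rec. 601 luma from the RGBA pixel values
    sum += 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
  }
  return sum / (data.length / 4) / 255;  // 0 (dark) .. 1 (bright)
}
```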
Audio analysis summarizes volume, vocal energy, pitch range, pauses, and rhythm during the session.
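Volume and pause signals can be derived with the Web Audio API; a sketch, where pause detection (counting long runs of near-silent readings) is left as an assumption:

```typescript
// RMS volume meter over the microphone stream; a long run of near-silent
// readings can be counted as a pause. Thresholds would be tuned per setup.
function createVolumeMeter(stream: MediaStream) {
  const ctx = new AudioContext();  // may need ctx.resume() after a user gesture
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);
  const buf = new Float32Array(analyser.fftSize);

  return function readRms(): number {
    analyser.getFloatTimeDomainData(buf);
    let sumSquares = 0;
    for (const sample of buf) sumSquares += sample * sample;
    return Math.sqrt(sumSquares / buf.length);  // ~0 when silent
  };
}
```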
Node backend exposes feedback endpoints and keeps LLM API keys server-side.
Gemini is preferred when configured, with OpenAI support planned for comparison; when no key is present, the backend falls back to locally generated feedback.
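A sketch of how that API boundary and fallback chain might look as an Express route; callGemini and localHeuristicFeedback are hypothetical helpers standing in for the real provider call and deterministic scorer:

```typescript
import express from "express";

// Hypothetical helpers: the real provider call and the keyless fallback scorer.
declare function callGemini(payload: unknown): Promise<unknown>;
declare function localHeuristicFeedback(payload: unknown): unknown;

const app = express();
app.use(express.json());

// Feedback endpoint: the API key never leaves the server, and the route
// degrades to deterministic local feedback when no provider is configured
// or the model call fails.
app.post("/api/feedback", async (req, res) => {
  const payload = req.body;  // normalized payload from either flow
  try {
    if (process.env.GEMINI_API_KEY) {
      return res.json(await callGemini(payload));  // preferred provider
    }
    res.json(localHeuristicFeedback(payload));     // no key configured
  } catch {
    res.json(localHeuristicFeedback(payload));     // model call failed
  }
});

app.listen(3000);
```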
What I built
- Live coaching mode with transcript capture, timeline events, audience types, and coaching intensity settings.
- Feedback dashboard for pace, filler words, problem clarity, user specificity, demo clarity, impact, and call to action.
- Upload path for transcripts, captions, audio, and video practice files.
- LLM integration point using structured JSON feedback from Gemini, with OpenAI comparison support planned (see the sketch after this list).
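The sketch below shows one way to request structured JSON from Gemini with the @google/generative-ai client; the prompt wording, model choice, and output fields are assumptions, not the project's exact call:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Ask Gemini for machine-readable coaching feedback by forcing a JSON
// response. Prompt wording and output fields are illustrative assumptions.
async function geminiFeedback(payload: object) {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-flash",
    generationConfig: { responseMimeType: "application/json" },
  });
  const result = await model.generateContent(
    `You are a presentation coach. Given this session data, return JSON with
     scores and notes for pace, filler words, clarity, and call to action.
     Session: ${JSON.stringify(payload)}`
  );
  return JSON.parse(result.response.text());
}
```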
Outcomes / metrics
- A working AI product prototype driven by real multimodal signals rather than a chat-only wrapper.
- Clear demonstration of frontend interaction design, backend API boundaries, AI orchestration, and graceful fallback behavior.
Lessons learned
- AI products feel better when deterministic signals and LLM judgment support each other.
- Keeping the payload normalized across live and upload flows makes the backend easier to evolve.
- A useful demo should still work when provider keys are missing or model calls fail.
Screenshots / media
Visual evidence placeholders
Replace these panels with screenshots, demos, diagrams, or notebook exports as each artifact becomes ready for publishing.
Media
Live coaching flow
Choose audience and intensity, practice a pitch, answer a follow-up, and receive structured feedback.
Media
Multimodal signal pipeline
Transcript chunks, webcam heuristics, microphone metrics, and presentation context feed the coach response.