PitchCoach
Embodied AI presentation coach with live speech, audio, and webcam feedback
PitchCoach is a live practice environment for pitches, speeches, interviews, demos, and presentations. Instead of only analyzing an uploaded recording afterward, it simulates a coach in the room with real-time transcript capture, avatar reactions, delivery signals, and post-session feedback.
Problem / motivation
Presentation coaching is most useful when it catches both content and delivery issues: unclear problem framing, filler words, pacing problems, poor camera presence, weak calls to action, and audience mismatch. PitchCoach combines those signals into a single practice loop.
Key technical challenges
- Designing a product loop that feels live without depending on fragile real-time AI assumptions.
- Combining transcript, visual, audio, audience, and timeline data into one normalized feedback payload (sketched after this list).
- Keeping API keys and AI-provider calls on the backend while preserving a smooth browser demo.
- Producing useful fallback coaching so the app remains demoable without external AI credentials.
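For illustration, the normalized payload could look something like the TypeScript interface below. Every field name here is a hypothetical stand-in, not the project's actual schema; the point is that live and upload flows converge on one shape.

```typescript
// Hypothetical shape of the normalized feedback payload.
// Field names are illustrative, not the project's actual schema.
interface FeedbackPayload {
  source: "live" | "upload";            // both flows normalize to this shape
  context: {
    presentationType: string;           // e.g. "pitch", "interview", "demo"
    audienceMode: string;               // e.g. "investors", "technical panel"
    coachingIntensity: "gentle" | "standard" | "intense";
  };
  transcript: { text: string; startMs: number; endMs: number }[];
  visual: { presenceScore: number; lightingScore: number };  // 0..1 heuristics
  audio: { avgVolume: number; pauseCount: number; pitchRangeHz: number };
  timeline: { atMs: number; event: string }[];               // notable moments
}
```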
Architecture / workflow
How the system fits together
Browser UI captures webcam, microphone, presentation context, audience mode, and live session state.
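A minimal sketch of how that capture could start in the browser, assuming a standard getUserMedia flow:

```typescript
// Minimal capture setup: request camera and microphone, attach the video
// preview, and keep the stream around for the analysis pipelines below.
async function startCapture(video: HTMLVideoElement): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });
  video.srcObject = stream;  // live self-view for the presenter
  await video.play();
  return stream;             // shared by the visual and audio analyzers
}
```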
Speech recognition produces transcript chunks while browser speech synthesis can deliver spoken coach feedback.
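A minimal sketch of that loop, assuming the browser's Web Speech API (vendor-prefixed as webkitSpeechRecognition in Chromium); the onChunk callback is a hypothetical hook into the session state:

```typescript
// Continuous recognition that emits transcript chunks as results finalize.
// SpeechRecognition is vendor-prefixed in Chromium-based browsers.
const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function startTranscription(onChunk: (text: string) => void) {
  const recognition = new Recognition();
  recognition.continuous = true;       // keep listening across utterances
  recognition.interimResults = false;  // only deliver finalized chunks
  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal) onChunk(result[0].transcript);
  };
  recognition.start();
  return recognition;
}

// Spoken coach feedback via browser speech synthesis.
function speakFeedback(text: string) {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```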
Client-side visual heuristics track camera presence, face position, lighting, eye movement, and head movement signals.
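One cheap way to compute a signal like the lighting estimate is to sample video frames onto a small canvas and average luma; the sketch below is an illustrative assumption, not the project's exact heuristic:

```typescript
// Rough lighting heuristic: average luma over a downsampled video frame.
// An illustrative stand-in for the project's client-side visual signals.
function estimateLighting(video: HTMLVideoElement): number {
  const canvas = document.createElement("canvas");
  canvas.width = 64;  // downsample: a coarse grid is enough for an average
  canvas.height = 48;
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);
  let sum = 0;
  for (let i = 0; i < data.length; i += 4) {
    // Rec. 601 luma from the RGBA pixel values
    sum += 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
  }
  return sum / (data.length / 4) / 255;  // 0 (dark) .. 1 (bright)
}
```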
Audio analysis summarizes volume, vocal energy, pitch range, pauses, and rhythm during the session.
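Volume and pause signals can be derived with the Web Audio API; a sketch, where pause detection (counting long runs of near-silent readings) is left as an assumption:

```typescript
// RMS volume meter over the microphone stream; a long run of near-silent
// readings can be counted as a pause. Thresholds would be tuned per setup.
function createVolumeMeter(stream: MediaStream) {
  const ctx = new AudioContext();  // may need ctx.resume() after a user gesture
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);
  const buf = new Float32Array(analyser.fftSize);

  return function readRms(): number {
    analyser.getFloatTimeDomainData(buf);
    let sumSquares = 0;
    for (const sample of buf) sumSquares += sample * sample;
    return Math.sqrt(sumSquares / buf.length);  // ~0 when silent
  };
}
```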
Node backend exposes feedback endpoints and keeps LLM API keys server-side.
Gemini is preferred when configured, with OpenAI support planned for comparison; when no key is present, the backend falls back to locally generated feedback.
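A sketch of how that API boundary and fallback chain might look as an Express route; callGemini and localHeuristicFeedback are hypothetical helpers standing in for the real provider call and deterministic scorer:

```typescript
import express from "express";

// Hypothetical helpers: the real provider call and the keyless fallback scorer.
declare function callGemini(payload: unknown): Promise<unknown>;
declare function localHeuristicFeedback(payload: unknown): unknown;

const app = express();
app.use(express.json());

// Feedback endpoint: the API key never leaves the server, and the route
// degrades to deterministic local feedback when no provider is configured
// or the model call fails.
app.post("/api/feedback", async (req, res) => {
  const payload = req.body;  // normalized payload from either flow
  try {
    if (process.env.GEMINI_API_KEY) {
      return res.json(await callGemini(payload));  // preferred provider
    }
    res.json(localHeuristicFeedback(payload));     // no key configured
  } catch {
    res.json(localHeuristicFeedback(payload));     // model call failed
  }
});

app.listen(3000);
```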
What I built
- Live coaching mode with transcript capture, timeline events, audience types, and coaching intensity settings.
- Feedback dashboard for pace, filler words, problem clarity, user specificity, demo clarity, impact, and call to action.
- Upload path for transcripts, captions, audio, and video practice files.
- LLM integration point using structured JSON feedback from Gemini, with OpenAI comparison support planned (see the sketch after this list).
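The sketch below shows one way to request structured JSON from Gemini with the @google/generative-ai client; the prompt wording, model choice, and output fields are assumptions, not the project's exact call:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Ask Gemini for machine-readable coaching feedback by forcing a JSON
// response. Prompt wording and output fields are illustrative assumptions.
async function geminiFeedback(payload: object) {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-flash",
    generationConfig: { responseMimeType: "application/json" },
  });
  const result = await model.generateContent(
    `You are a presentation coach. Given this session data, return JSON with
     scores and notes for pace, filler words, clarity, and call to action.
     Session: ${JSON.stringify(payload)}`
  );
  return JSON.parse(result.response.text());
}
```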
Outcomes / metrics
- A working AI product prototype driven by real multimodal signals rather than a chat-only wrapper.
- Clear demonstration of frontend interaction design, backend API boundaries, AI orchestration, and graceful fallback behavior.
Lessons learned
- AI products feel better when deterministic signals and LLM judgment support each other.
- Keeping the payload normalized across live and upload flows makes the backend easier to evolve.
- A useful demo should still work when provider keys are missing or model calls fail.
Screenshots / media
Visual evidence placeholders
Replace these panels with screenshots, demos, diagrams, or notebook exports as each artifact becomes ready for publishing.
Media
Live coaching flow
Choose audience and intensity, practice a pitch, answer a follow-up, and receive structured feedback.
Media
Multimodal signal pipeline
Transcript chunks, webcam heuristics, microphone metrics, and presentation context feed the coach response.