# Design: Audio Perception Layer

## Problem
LLMs cannot listen. Symbolic evaluation alone can produce scores that rise while actual listening quality degrades.
## Solution
Audio feature extraction via librosa + pyloudnorm. A frozen PerceptualReport dataclass is produced after every audio render. Symbolic-acoustic divergence detection catches cases where the symbolic score looks correct but the rendered audio sounds wrong.
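A minimal sketch of what the PerceptualReport might look like. Only the frozen-dataclass choice and the three dimensions (loudness, spectral, temporal) come from this design; the individual field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PerceptualReport:
    """Immutable perceptual snapshot of one rendered audio file.

    Field names are illustrative; only the three dimensions below
    are fixed by the design.
    """
    # Loudness dimension
    integrated_lufs: float        # e.g. -14.0, a common streaming target
    # Spectral dimension
    spectral_centroid_hz: float   # brightness proxy
    spectral_rolloff_hz: float    # high-frequency content
    spectral_flatness: float      # 0.0 tonal .. 1.0 noise-like
    # Temporal dimension
    onset_density: float          # onsets per second
    tempo_stability: float        # 0.0 drifting .. 1.0 locked
```

Freezing the dataclass keeps each render's perceptual snapshot immutable, so downstream steps can compare or persist reports without worrying about mutation.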
## Components
- AudioPerceptionAnalyzer: extracts LUFS, spectral features (centroid, rolloff, flatness), onset density, tempo stability, 7-band energy distribution, and masking risk (see the sketch after this list)
- PerceptualReport: frozen dataclass with loudness, spectral, and temporal dimensions
- ListeningSimulator: orchestrates post-render perception, persists `perceptual.json`, detects mood divergence via intent keywords
- UseCaseEvaluator: 7 use cases (YouTube BGM, Game BGM, Advertisement, Study Focus, Meditation, Workout, Cinematic) with context-specific scoring
- 5 acoustic divergence critique rules: `symbolic_acoustic_divergence`, `lufs_target_violation`, `spectral_imbalance`, `brightness_intent_mismatch`, `energy_trajectory_violation`
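A rough sketch of the extraction path inside AudioPerceptionAnalyzer, assuming librosa and pyloudnorm as named above. The `analyze` function and its returned dict are illustrative, not the project's actual API; the real analyzer also computes tempo stability, the 7-band energy distribution, and masking risk.

```python
import librosa
import numpy as np
import pyloudnorm as pyln

def analyze(path: str) -> dict:
    """Illustrative subset of the analyzer's features."""
    y, sr = librosa.load(path, sr=None, mono=True)

    # Integrated loudness (LUFS), ITU-R BS.1770 via pyloudnorm.
    meter = pyln.Meter(sr)
    lufs = meter.integrated_loudness(y)

    # Spectral shape: brightness, high-frequency content, noisiness.
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
    rolloff = float(np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)))
    flatness = float(np.mean(librosa.feature.spectral_flatness(y=y)))

    # Temporal texture: musical events per second.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    onset_density = len(onsets) / (len(y) / sr)

    return {
        "integrated_lufs": lufs,
        "spectral_centroid_hz": centroid,
        "spectral_rolloff_hz": rolloff,
        "spectral_flatness": flatness,
        "onset_density": onset_density,
    }
```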
## Pipeline Position
Step 7.5 — after audio render, before optional feedback loopback.
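To make the ordering concrete, here is a hedged sketch of one of the five critique rules, lufs_target_violation, firing at step 7.5. The Critique shape, the rule's signature, the ±1 LU tolerance, and the -14 LUFS target are all assumptions for illustration; in the real pipeline the target would come from the UseCaseEvaluator's context.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Critique:
    rule: str
    failed: bool
    detail: str

def lufs_target_violation(measured_lufs: float, target_lufs: float,
                          tolerance_lu: float = 1.0) -> Critique:
    """Flag renders whose integrated loudness strays from the target."""
    delta = measured_lufs - target_lufs
    return Critique(
        rule="lufs_target_violation",
        failed=abs(delta) > tolerance_lu,
        detail=(f"measured {measured_lufs:.1f} LUFS vs target "
                f"{target_lufs:.1f} LUFS (delta {delta:+.1f} LU)"),
    )

# Step 7.5 in miniature: render -> perceive -> critique -> optional loopback.
critique = lufs_target_violation(measured_lufs=-9.2, target_lufs=-14.0)
if critique.failed:
    print(critique.detail)  # would feed the optional feedback loopback
```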
## Files
- `src/yao/perception/audio_features.py` — AudioPerceptionAnalyzer
- `src/yao/perception/listening_simulator.py` — ListeningSimulator
- `src/yao/perception/use_case_evaluator.py` — UseCaseEvaluator
- `src/yao/verify/acoustic/divergence_rules.py` — 5 critique rules