WildTalker: Talking Portrait Synthesis in the Wild

Chung-Ang University, Seoul, Korea
*Equal contribution
ECCV 2024 Workshop (Wild3D)

WildTalker is a novel method for synthesizing high-quality talking portraits, employing flow-guided temporal masking and multi-scale spectral subtraction to ensure realistic lip synchronization and visual coherence in both controlled and real-world environments.

Abstract

We introduce WildTalker, a novel approach for synthesizing high-quality talking portraits that addresses the challenges of real-world environments. Traditional methods often struggle with transient movements and noisy audio. WildTalker overcomes these issues by integrating flow-guided temporal masking, which identifies and de-emphasizes transient regions that would otherwise disrupt visual coherence. In addition, WildTalker employs multi-scale spectral subtraction for robust audio denoising, ensuring accurate and natural lip synchronization even under challenging auditory conditions. Together, these components allow WildTalker to perform well in both controlled and in-the-wild scenarios, producing realistic and well-synchronized talking portraits. Our experiments demonstrate that WildTalker significantly enhances the quality of audio-driven 3D talking portraits in dynamic settings and achieves superior lip synchronization under challenging audio conditions. These results show that our method outperforms existing approaches not only in real-world scenarios but also in controlled environments, underscoring its potential for practical applications.
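
To make the idea of flow-guided temporal masking concrete, the following is a minimal, illustrative sketch and not the authors' implementation: it estimates dense optical flow between consecutive frames and converts the motion magnitude into a per-pixel weight that down-weights fast-moving, transient regions. The function name `flow_guided_mask` and the parameter `sigma` are assumptions introduced here for illustration; WildTalker's actual masking formulation may differ.

```python
# Hypothetical sketch of flow-guided temporal masking (not the authors' code).
import cv2
import numpy as np

def flow_guided_mask(prev_frame: np.ndarray, cur_frame: np.ndarray,
                     sigma: float = 4.0) -> np.ndarray:
    """Return a per-pixel weight in [0, 1] that de-emphasizes transient,
    fast-moving regions between two consecutive BGR frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)

    # Dense Farneback optical flow; arguments are
    # (prev, next, flow, pyr_scale, levels, winsize, iterations,
    #  poly_n, poly_sigma, flags).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=-1)

    # Large motion -> small weight, so transient regions contribute less
    # to the reconstruction or supervision signal.
    mask = np.exp(-magnitude / sigma)
    return mask.astype(np.float32)
```

Under this reading, the mask can multiply a per-pixel loss or blend weight so that regions dominated by transient motion have a reduced influence on the synthesized portrait.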
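Similarly, the sketch below illustrates one plausible form of multi-scale spectral subtraction for the audio branch; it is an assumption-laden example, not WildTalker's implementation. The names `spectral_subtract`, `multiscale_spectral_subtract`, `alpha`, and `floor` are hypothetical, and the noise estimate (quietest 10% of frames) is a common heuristic rather than a detail from the paper.

```python
# Hypothetical sketch of multi-scale spectral subtraction (illustrative only).
import numpy as np
import librosa

def spectral_subtract(y: np.ndarray, n_fft: int, alpha: float = 2.0,
                      floor: float = 0.05) -> np.ndarray:
    """Single-scale spectral subtraction with a noise spectrum estimated
    from the quietest 10% of STFT frames."""
    hop = n_fft // 4
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)

    # Estimate the per-frequency noise spectrum from low-energy frames.
    frame_energy = mag.sum(axis=0)
    quiet = frame_energy <= np.quantile(frame_energy, 0.1)
    noise = mag[:, quiet].mean(axis=1, keepdims=True)

    # Over-subtract the noise estimate, but never drop below a spectral floor.
    clean_mag = np.maximum(mag - alpha * noise, floor * mag)
    clean = clean_mag * np.exp(1j * phase)
    return librosa.istft(clean, hop_length=hop, length=len(y))

def multiscale_spectral_subtract(y: np.ndarray,
                                 scales=(512, 1024, 2048)) -> np.ndarray:
    """Average denoised signals obtained at several STFT window sizes so that
    both short transients and longer tonal noise are attenuated."""
    return np.mean([spectral_subtract(y, n_fft=s) for s in scales], axis=0)
```

Combining several window sizes is what makes the operation "multi-scale": short windows preserve abrupt speech onsets while long windows resolve narrowband noise, and averaging the reconstructions trades off between the two.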



Proposed Method

[Figure: overview of the WildTalker method]


BibTeX