Abstract: Audio-Visual Speech Enhancement (AVSE) aims to improve speech quality and intelligibility by using both audio and video inputs, which is particularly useful in noisy conditions.
Abstract: Audio-Visual Speech Recognition (AVSR) combines lip-based video with audio and can improve performance in noise, but most methods are trained only on English data. One limitation is the lack ...