Technology continues to grow by leaps and bounds, drawing on many fields to explore new capabilities. One of them is the ability to “reconstruct” a person’s face from a fragment of their voice.
The Speech2Face study, presented in 2019 at the Conference on Computer Vision and Pattern Recognition (CVPR), showed that an artificial intelligence (AI) model can infer what a person looks like from short audio segments.
The paper explains that the goal of researchers Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, and Michael Rubinstein of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) is not to reconstruct people's faces identically, but to produce an image with the physical characteristics that correlate with the analyzed audio.
To achieve this, they designed and trained a deep neural network on millions of YouTube videos of people talking. During training, the model learned to correlate voices with faces, allowing it to produce images with physical attributes similar to the speakers', including age, gender, and ethnicity.
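The core idea can be sketched in a few lines: a voice encoder turns an audio spectrogram into a fixed-size feature vector, and training pushes that vector toward the face features of the same speaker. The following is a minimal toy sketch of that data flow, not the authors' implementation; the single linear layer, the function names, and the random weights are illustrative assumptions (the real model is a deep convolutional network), though the 4096-dimensional face-feature target matches the kind of face embedding described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a voice encoder: maps a spectrogram
# (frequency bins x time frames) to a fixed-size "face feature" vector.
# Speech2Face uses a deep CNN here; one linear projection with random,
# untrained weights only illustrates the shapes involved.
N_FREQ, EMBED_DIM = 257, 4096

W = rng.standard_normal((EMBED_DIM, N_FREQ)) * 0.01  # untrained weights

def voice_to_face_feature(spectrogram: np.ndarray) -> np.ndarray:
    """Pool the spectrogram over time, then project into face-feature space."""
    pooled = spectrogram.mean(axis=1)  # (N_FREQ,)
    return W @ pooled                  # (EMBED_DIM,)

def training_loss(voice_feat: np.ndarray, face_feat: np.ndarray) -> float:
    """Distance the training would minimize: the voice embedding should
    land near the face embedding extracted from the same speaker's video."""
    return float(np.sum((voice_feat - face_feat) ** 2))

# A fake 3-second clip: 257 frequency bins x 300 time frames.
spec = rng.standard_normal((N_FREQ, 300))
feat = voice_to_face_feature(spec)
print(feat.shape)  # (4096,)
```

Because the supervision comes from pairing each voice with the face seen in the same video, the model never needs labeled attributes like age or gender; those correlations emerge from the pairing itself.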