How to Identify AI Vocals and My Thoughts on the New Snippet
Let’s finally put an end to this discussion about whether something is AI or not. First, we need to understand what an AI voice model actually is (since, based on some comments in the subreddit, it’s clear that some people don’t know how it works).
An AI voice model is essentially a filter. In other words, a real voice must first be recorded, and that recording is then processed so its timbre matches the voice the model was trained on. However, this technology still has many limitations, especially in how natural the result sounds compared to a normal recording.
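To make the "filter" point concrete, here is a minimal sketch of what that kind of pipeline looks like in practice. The file names and the `convert_voice()` function are placeholders for illustration only; they stand in for whatever voice-conversion model is actually used and are not a real API.

```python
import librosa
import soundfile as sf

# A voice model cannot create a performance from nothing:
# an existing vocal take has to go in first.
# "input_take.wav" is a placeholder file name.
source_audio, sr = librosa.load("input_take.wav", sr=None, mono=True)

def convert_voice(audio, sample_rate):
    """Hypothetical stand-in for a voice-conversion model: a real one would
    re-synthesize the same performance with the target singer's timbre,
    keeping the original timing, melody, and phrasing."""
    return audio  # placeholder: a real model would return converted audio

converted = convert_voice(source_audio, sr)
sf.write("converted_take.wav", converted, sr)
```

The takeaway is that the timing, melody, and phrasing all come from the person who recorded the input take; only the timbre is swapped.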
With that in mind, let’s list some ways to identify when a track uses AI:
Plosives: AI models still struggle with plosive sounds (stop consonants like “p” and “t,” which are produced with a burst of air). The output often sounds robotic on these consonants. This can be observed, for example, in Sky City, in the famous “pant£££house” line. However, this issue is not present in the snippet shared by Ye, which suggests that the preview is not AI-generated.
Glitched breathing: AI voice models interpret breaths as if they were speech (most of these models are designed for conversation rather than music), so they sometimes glitch on breaths and produce a metallic sound. This issue is also not present in the new snippet.
Long notes: Since AI voice models are primarily built for speech rather than music, they often struggle to sustain long notes, once again producing that characteristic robotic sound. In the snippet, we can clearly hear Ye holding longer notes multiple times, with no signs of glitches or robotic artifacts (see the sketch after this list for one way to check pitch stability on a sustained note).
Pattern recognition in vocals: AI models learn the patterns of a person’s voice. The model previously used on Ye’s vocals was heavily based on his rhyming style and speech patterns (slightly more aggressive and higher-pitched) from the pre-Ye album era, mainly because far more vocal samples are available from that period than from after it. As a result, AI-generated vocals would always sound somewhat uniform, which is not the case in the new snippet.
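For the “long notes” check, here is a rough sketch of how you could inspect pitch stability on an isolated, sustained note using Python and librosa. The file name and the 50-cent threshold are assumptions for illustration; this is a quick sanity check, not a reliable AI detector.

```python
import numpy as np
import librosa

# Rough "long notes" check: estimate the pitch track of an isolated vocal
# clip and see how stable it is during voiced frames. A human holding a note
# drifts smoothly; glitchy AI conversions often show sudden jumps or dropouts.
# "isolated_vocal_clip.wav" and the 50-cent threshold are illustrative
# assumptions, and the clip is assumed to contain a held note.
y, sr = librosa.load("isolated_vocal_clip.wav", sr=None, mono=True)

f0, voiced_flag, _ = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
    sr=sr,
)

# Keep only voiced frames, convert the pitch track to cents relative to the
# first voiced frame, and measure frame-to-frame jumps.
voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]
cents = 1200 * np.log2(voiced_f0 / voiced_f0[0])
jumps = np.abs(np.diff(cents))

print(f"median frame-to-frame pitch change: {np.median(jumps):.1f} cents")
print(f"frames jumping more than 50 cents: {(jumps > 50).sum()} of {jumps.size}")
```

None of this replaces listening; it just gives you something more objective than “it sounds off” when arguing about a snippet.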
Final Thoughts
I’d also like to address a common argument: “This doesn’t sound like Ye; Ye doesn’t sing like this.” Ironically, this would actually prove that an AI model wasn’t used. AI-generated vocals would follow Kanye’s usual singing patterns almost exactly, and the model doesn’t adapt well to variations such as singing in a higher pitch. Anyone can test this: the generated vocals end up sounding horrible, almost like a screeching bird.
AI voice technology has clear limitations. If you’re not familiar with the topic (and many people in this subreddit claim they understand it but have probably never taken a single class on machine learning), you should educate yourself before forming opinions that have no basis in reality.
You’re ruining a rollout that, for once, is actually being done right by Ye. We should be happy that we got a new snippet, new merch, and an interview all within 24 hours.