Digital Breath

 



It's interesting how AI-generated vocals phrase melodies, and determine where to put in breaths. It's not necessary for AI singers to actually breathe, but melodic phrases have breaths built into them.

Breaths are built into the logic of musical phrasing. When you're writing string parts you don't have to be concerned about where the players are going to breathe, but you do when writing for wind instruments and voices.

In a recent piece, Holy Smoker, based on a poem from the late 90s, the AI made an interesting phrasing choice: one phrase spanned three lines (breadth) and carried over the break between two verses, which is pretty brilliant. It's obviously a mistake on the AI's part, because it typically makes symmetrical assessments based on the underlying stems it's using for the main riff--so it's a nice happy accident. I doubt a human singer would do it that way, and if they did, it would be a very creative interpretation of the lyric. I never think about breaths that way when I'm writing an oboe part, for example, beyond deciding where a slur is required. Players can put in breaths as they see fit, which is more expressive for them than having someone tell them how to breathe.

From the book Sentics by Manfred Clynes: "A good musician, however, feels and experiences the expressiveness of the rhythm and will--unless constrained by faulty academic teaching--be faithful to that inner form. A musician may indeed rely on that form with greater trust than on his ability to reproduce mechanical subdivisions of time. (By "mechanical" we mean here arbitrary subdivisions that have no biological counterpart in terms of essentic form programming.)"

"Discuss musical phrasing"  

The subject phrase:

Tell us all the stories
Show us what is real
Kneel before the altar 
[To grovel for the meal 
___________________
Let the light shine
We're lost in a daze]


AI prefers symmetry and does well with 4-line verses and choruses--obviously because pop music is based on 2s, 4s and 8s. It can't suss 3, 5 or 7--or even 6, like 6/8 or 12/8. The phrase "Oranges are sweeter than apples in May" naturally falls into a triple meter because that's how it is spoken, although you could sing it in 4/4.

I've always written minimal, economical lyrics, sung in plainsong with one syllable per note--probably because of my predominantly harmonic and rhythmic approach to music--so I'm of two minds about loose phrasing in pop songs. Musicians with a predominantly melodic approach tend to disregard symmetry, glide above the music and ignore bar lines. But if you play a rhythmic instrument (drums, bass, guitar, keyboard) you're mostly locked to them. This is why I found that one phrase from Holy Smoker so interesting. The verse crossing was brilliant. Since AI singers don't have to consider breaths, they can do things like this. A human vocalist could replicate it, but would AI? Probably not, because TTS voices don't have a style. Styles are unique to humans precisely because of our ability to express them in the flow of life and not have to be "trained" to have style.

Humans generally opt for making things easier to sing, and we are unique in how we pronounce words and how we create phrases. Some words are clunky, both for humans and TTS voices. There are some words a TTS voice can't pronounce, and sometimes you have to convert them to algospeak--like the word "Rael", which I had to spell "Rayelle". It could not pronounce "quaternity", no matter how many times I tried. Nor could it pronounce "cafe late", which had to be spelled "cafe latay". That's interesting, because I didn't have to use an algo equivalent like "cafay" for cafe.
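If you keep running into the same problem words, the respelling step can be done automatically before the lyric goes to the voice. This is only a minimal sketch in Python: the table and the function name `respell` are hypothetical, the two entries are just the examples from this post, and it has nothing to do with how any particular AI music tool works internally.

```python
import re

# Hypothetical respelling table; both entries come from the examples in this post.
RESPELLINGS = {
    "Rael": "Rayelle",
    "cafe late": "cafe latay",
}


def respell(lyric, table=RESPELLINGS):
    """Swap whole words or phrases for spellings the voice can pronounce."""
    # Longest entries first, so a multi-word phrase is handled before
    # any shorter entry that might overlap with it.
    for original in sorted(table, key=len, reverse=True):
        replacement = table[original]
        # \b keeps the match to whole words, so "Israel" is left alone.
        pattern = r"\b" + re.escape(original) + r"\b"
        lyric = re.sub(pattern, lambda _m, r=replacement: r, lyric,
                       flags=re.IGNORECASE)
    return lyric


print(respell("Rael orders a cafe late"))
```

A word like "quaternity" that never came out right has no entry here, because I never found a spelling that worked.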
 
