Isn't That Spatial?
After 80 years of experiencing music in a stereo field, it is unlikely that the ear would quickly adapt to a new audio milieu.
The ear evolved to detect spatial cues: part of hearing a sound is locating its source. If you tap your right hand on the table, then your left, you know where each sound is coming from. This is distinctly different from hearing a recording of those two sounds played directly into the ear canal.
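One of the spatial cues in question is the small difference in arrival time between the two ears. As a rough sketch only, the snippet below estimates that delay with Woodworth's spherical-head approximation, which applies to distant sources between straight ahead (0 degrees) and directly to one side (90 degrees). The head radius and speed of sound are assumed typical values, and the function name is made up for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumption: typical adult head radius
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly room temperature


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate arrival-time difference between the ears, in seconds.

    Uses Woodworth's spherical-head formula for a distant source.
    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for azimuth in (0, 30, 90):
    itd_us = interaural_time_difference(azimuth) * 1e6
    print(f"{azimuth:>3} degrees: {itd_us:.0f} microseconds")
```

Under these assumptions the delay tops out at well under a millisecond, yet it is enough for the ear to tell the right-hand tap from the left. A recording played straight into the ear canal carries such a cue only if it was captured or synthesized with it.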
Recorded music is always artifice, and VR will not necessarily make it seem more real, although it can add realism where a scene needs to feel less fake or to provoke empathy.
Ideally, if music exists in a VR space it should be presented in a virtualized stereo space within the VR environment, such as a stereo speaker pair that exists in the scene. Seeing a "live" performance in VR is really not much different from stereo and is, in actuality, more of a mono experience. Musicians on a stage aren't far enough apart from one another to produce any appreciable spatial effect.
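The point about stage width can be checked with some back-of-the-envelope geometry. The sketch below compares the angle a band subtends from the audience with the angle a stereo speaker pair subtends from a listening position. All distances are hypothetical values chosen for illustration, not measurements, and the function name is invented here.

```python
import math


def subtended_angle_deg(source_spacing_m: float, listener_distance_m: float) -> float:
    """Angle, in degrees, between two sources placed symmetrically
    about the listener's center line."""
    half_angle = math.atan2(source_spacing_m / 2.0, listener_distance_m)
    return math.degrees(2.0 * half_angle)


# Hypothetical: outermost musicians 6 m apart, listener 15 m from the stage.
stage_angle = subtended_angle_deg(6.0, 15.0)

# Hypothetical: speakers 2 m apart, listener 2 m back from the line between them.
speaker_angle = subtended_angle_deg(2.0, 2.0)

print(f"band on stage: {stage_angle:.1f} degrees")
print(f"speaker pair:  {speaker_angle:.1f} degrees")
```

With these made-up numbers the whole band spans roughly 23 degrees, against roughly 53 degrees for the speaker pair, so the stage image is less than half as wide as the stereo one. The gap grows as the listener moves farther back.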
A band on a stage does not need spatialization with a separate emitter for each band member. Where VR might be interesting is in having sounds come from behind, or over the shoulder.
I have always liked earbuds that have lots of leakage because they create a figure/ground experience. Some environments, especially urban areas, have lots of keynotes created by cars and trucks (horns, brakes, etc.) that become ground to what's playing in the earbuds. Binaural recordings are especially interesting because they are spatialized stereo that doesn't require artificial placement of emitters to square visual and audio cues.
ASMR (Autonomous Sensory Meridian Response) is a tingling sensation that binaural recordings are often used to trigger, for example with the effect of a whisper in the ear. Binaural recording in music has interesting applications, such as recording background vocalists from an over-the-shoulder position rather than hard-panning them in a stereo field. But if the music is good, any recording technology should suffice. (Music memory doesn't include it anyway--an earworm is an earworm in the space of your head.)