I've got a very simple masking problem that I just can't find a working solution for.

The black squares should be masked (and get a projection from a different layer, so I can't mask the whole composition, right?), while on the red square I would like to project another image whose size should react to the audio input.
The problem: if I mask the image or the layer itself, the mask gets transformed as well.
Any ideas, or am I already on the wrong track with this approach?
cheers
(PS: sorry, I hope this question belongs in this category.)