Sure, though not out of the box, and it may take some testing and searching.
I see a combination of scale, position and delay there (assuming the clip in the middle is the clip without effects).
The question is how many layers you need. I guess there is an elegant method with around 2-5 layers, and the straightforward solution with one layer for each "lvl".
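To make the straightforward one-layer-per-"lvl" idea a bit more concrete, here is a rough sketch that just works out a scale, position and delay value for each layer. Everything in it is an assumption for illustration: the function name `layer_settings`, the parameter names and the default step values are made up, it does not talk to any software, and I am guessing that level 0 is the untouched middle clip and each further level is scaled, shifted and delayed by a fixed step. You would still have to enter the numbers per layer by hand and tweak them by eye.

```python
def layer_settings(levels, scale_step=0.8, delay_step=0.1, offset_step=(0.0, 0.0)):
    """Return one dict of hypothetical settings per layer.

    Level 0 is assumed to be the clip without effects (scale 1.0, no delay).
    Each further level multiplies the scale by scale_step and adds
    delay_step seconds and offset_step (x, y) of position shift.
    """
    settings = []
    for lvl in range(levels):
        settings.append({
            "layer": lvl + 1,                      # 1-based layer number
            "scale": scale_step ** lvl,            # geometric scaling per level
            "position": (offset_step[0] * lvl,     # linear x shift per level
                         offset_step[1] * lvl),    # linear y shift per level
            "delay_s": delay_step * lvl,           # linear delay per level, in seconds
        })
    return settings


if __name__ == "__main__":
    # Print the values for a five-level stack
    for row in layer_settings(5):
        print(row)
```

Use a `scale_step` above 1.0 if the outer levels should be larger than the middle clip instead of smaller.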
Laptop: XMG P507 // Intel i7-5500 / GTX-1060 / 1tb SSD / 32gb RAM // Lemur / BirdDog Studio NDI
~self employed AV technician / Schu.VT|a|posteo.de~