Sure... not out of the box, though, and it may take some testing and searching.
I see a combination of scale, position & delay there (assuming the clip in the middle is the clip without effects).
The question is how many layers you need. I guess there is an elegant method with around 2-5 layers, and the straightforward solution with one layer for each "lvl".
Laptop: XMG P507 // Intel i7-5500 / GTX-1060 / 1tb SSD / 32gb RAM // Touch OSC / Lemur/ APC-40
~self employed AV technician / Schu.VT@posteo.de