RTX A4000 - Dual GPU - Multiple output setup

Bro, does your rig even lift?
isprod.se
Met Resolume in a bar the other day
Posts: 12
Joined: Tue Nov 23, 2021 10:56

Post by isprod.se »

The whole point of this post is to share some info regarding dual GPU setups for Resolume. Today the info is limited and hard to find. Most people who ask questions don’t share their findings, and even when we can take part in a discussion, we often never learn how that particular project ended. Anyways :-)

I needed to build 2 desktops to serve as a cheaper alternative to a Barco EventMaster/Christie Spyder system. Obviously Resolume is a great option for people on a budget. When you work alone a lot as a one-man company, as many do, providing services both to end clients and as a sub-supplier, and therefore have a limited budget, Arena is a bang-for-the-buck software. Going the custom-build Resolume way still allows the same machines to be used for vMix, After Effects and any other software a client may need.
2 units, one as the main and the other as a backup.

OS: WIN 10 (
Res: Arena 7 (7.8 atm)

Hardware:
mobo: ASUS x299 WS SAGE 10G
cpu: INTEL i9 10980XE
ram: CORSAIR VENGEANCE LPX 32 (x2)
ssd: SAMSUNG EVO 980 m.2 1TB (sys drive)
ssd: SAMSUNG EVO 970 m.2 2TB (content storage)
ssd: SAMSUNG QVO 970 2,5” 4TB (x2 RAID 1 - recording)
fans: ALL NOCTUA
gpu: PNY RTX A4000 (16x) (slot 1, gpu dedicated, bios assigned)
tb3: ASUS THUNDERBOLT EX3 (4x) (slot 2) (can't get this to work properly)
gpu: PNY RTX A4000 (16x) (slot 3) (got it running 16x now, but would prefer 8x so I can utilize PCIEX_4)
n/a: empty slot (0x) (slot 4)
sdi: DECKLINK QUAD 2 (8x) (slot 5)
sdi: DECKLINK QUAD 2 (8x) (slot 6)
audio: ESI MAYA 44 EX (1x) (slot 7)


The CPU offers 48 lanes. The mobo utilizes 44 of these, and the last 4 CPU lanes can be assigned in BIOS to the DMI link, directly toward the southbridge chipset, for increased speed between the southbridge and northbridge.

MOBO LANES (added picture of lane appendix)

32 lanes directed to PLX chipset.
From PLX chipset 64 lanes to PCIE-bridge.
The mobo has 7 slots that are all 16x mechanically and offers electrical configs as follows:

16x, 0x, 16x, 0x, 16x, 0x, 16x
16x, 8x, 16x, 0x, 16x, 0x, 8x (Could this work? this is not announced)
16x, 8x, 16x, 0x, 8x, 8x, 8x (Could this work? this is not announced)
16x, 8x, 8x, 8x, 16x, 0x, 8x (Could this work? this is not announced)
16x, 8x, 8x, 8x, 8x, 8x, 8x
Screenshot-2022-03-07-at-16.19.10.jpg
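As a rough sanity check on the slot layouts listed above, here is a minimal Python sketch that adds up the electrical lanes per layout and tests which layouts would give every installed card the link width it is listed with. The labels A-E, the function names and the per-card width table are assumptions made for illustration; the widths are taken from the hardware list in this post. Fitting within the 64 lanes behind the PLX says nothing about whether the board can actually switch its slots that way.

```python
# Electrical lane layouts for slots 1-7, copied from the list in the post.
# The labels A-E are hypothetical names added for this sketch.
SLOT_CONFIGS = {
    "A": [16, 0, 16, 0, 16, 0, 16],
    "B": [16, 8, 16, 0, 16, 0, 8],
    "C": [16, 8, 16, 0, 8, 8, 8],
    "D": [16, 8, 8, 8, 16, 0, 8],
    "E": [16, 8, 8, 8, 8, 8, 8],
}

# "From PLX chipset 64 lanes to PCIE-bridge", as stated in the post.
PLX_DOWNSTREAM_LANES = 64

# Link width each installed card is listed with, keyed by slot number.
CARDS_GPU2_X16 = {1: 16, 2: 4, 3: 16, 5: 8, 6: 8, 7: 1}
# Same build, but with the second GPU dropped to 8x as the author would prefer.
CARDS_GPU2_X8 = {**CARDS_GPU2_X16, 3: 8}


def lanes_used(config):
    """Total electrical lanes a layout hands out across all seven slots."""
    return sum(config)


def serves(config, wanted):
    """True if every card gets at least the link width it is listed with."""
    return all(config[slot - 1] >= width for slot, width in wanted.items())


def candidates(wanted):
    """Labels of layouts that fit the lane budget and serve every card."""
    return [
        name
        for name, config in SLOT_CONFIGS.items()
        if lanes_used(config) <= PLX_DOWNSTREAM_LANES and serves(config, wanted)
    ]


if __name__ == "__main__":
    for name, config in SLOT_CONFIGS.items():
        print(name, config, "->", lanes_used(config), "lanes")
    print("second GPU at 16x:", candidates(CARDS_GPU2_X16))
    print("second GPU at 8x: ", candidates(CARDS_GPU2_X8))
```

Every layout in the list adds up to exactly 64 lanes. On paper, only the third layout (16x, 8x, 16x, 0x, 8x, 8x, 8x) serves all installed cards with the second GPU at 16x, which seems to match what is observed with the GPUs in PCIEX_1 + PCIEX_3.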
So far (and this is a disturbing issue atm) I have not managed to optimize the system the way I want.
If I have the GPUs in PCIEX_1 + PCIEX_5 (this is the recommended setting from ASUS), I only get the second GPU to run at 8x electrically. Besides that, all PCIEX slots that are 8x/0x are set to 0x and I cannot do anything about it :-/
If I have the GPUs in PCIEX_1 + PCIEX_3, I can get both GPUs to operate at 16x electrically and I can also reach all my other PCIEX slots. Here all the other PCIEX slots that are 8x/0x seem to be in working order without issues. They are all set to 8x, and the connected cards utilize lanes according to their needs.
- I would love some input from you guys regarding this :-)
Preferably I would love to get the GPU in PCIEX_3 going at 8x electrically so I can utilize all the other PCIEX ports in the near future.
Also, no matter how I configure it atm, the tb3 card in slot 2 is just never found :-///

TECHNICAL DETAILS vs NEEDS
I built this system to be used as a main distribution unit, prebuilt in a case with monitors and all. It will partially be used as core infrastructure, so it needs a lot of inputs and outputs.
And as FOH most often uses up to 3 screens for the GUI, we needed the extra synced outs (and at the end of the day one more GPU is far cheaper than an FX4, and even more versatile for these needs and this workflow).
This ended with me adding the second GPU to do some testing, and I will focus on that part shortly.

As you can see I’ve also added 2x DeckLink Quad 2 cards, providing 16 SDI connectors with bi-directional capability. But as I am not able to sync the DeckLink outputs with my GPU (atm at least), SDI is only an option for comfort monitoring and similar outputs that don’t need sync. SDI is also perfect as an input source, which I will need as well.
All monitoring requiring “proper” timing goes through the second GPU, so big LEDs, edge blending etc. will get output on synced outputs.

On a sidenote:
I will try to get my hands on a Quadro Sync II in the near future for both units, primarily to sync the GPUs between the desktops for proper redundancy operation. Hopefully (and this is a bit of wishful thinking haha) the sync card, as it can output a ref signal, could also solve the timing issues on the SDIs. With that said, even if I don’t get better overall latency compared to the discrete outputs, I hope to at least get them in sync.

To give some detailed insight:
I start up Resolume, launch a clip with an MM.SS.MS counter, open Advanced Output and add both Quad 2 cards as outputs (any of the cards’ actual ports). With all monitors in a cluster and the clip running, I then use my phone to snap a photo of all monitors/outputs to see the latency.
These are my findings!

GPU 1 (PCIE_1) shows 00.00.100 GUI + Discrete
GPU 2 (PCIE_3) shows 00.00.100 Discrete
SDI 1 (PCIE_5) shows 00.00.200 ±20ms On any output
SDI 2 (PCIE_6) shows 00.00.320 ±20ms On any output
Please see attached pics
IMG_20220301_012052.jpg
IMG_20220301_012053.jpg
IMG_20220301_012057__01.jpg
IMG_20220301_014315.jpg
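To put the counter readings above into perspective, here is a small Python sketch that parses the MM.SS.MS values and expresses each output's offset from GPU 1 in milliseconds and in frames. The 60 fps frame rate and all names in the code are assumptions for illustration, and the ±20ms spread noted for the SDI outputs is ignored.

```python
# Assumed output frame rate; the post does not state it for this photo test.
FRAME_RATE = 60

# Counter readings as photographed, in MM.SS.MS format.
READINGS = {
    "GPU 1 (PCIE_1)": "00.00.100",
    "GPU 2 (PCIE_3)": "00.00.100",
    "SDI 1 (PCIE_5)": "00.00.200",
    "SDI 2 (PCIE_6)": "00.00.320",
}


def parse_counter(text):
    """Convert an MM.SS.MS counter string to milliseconds."""
    minutes, seconds, millis = (int(part) for part in text.split("."))
    return (minutes * 60 + seconds) * 1000 + millis


def offsets_vs(reference, readings, fps=FRAME_RATE):
    """Offset of every output from the reference: (milliseconds, frames)."""
    ref_ms = parse_counter(readings[reference])
    frame_ms = 1000 / fps
    result = {}
    for name, text in readings.items():
        delta_ms = parse_counter(text) - ref_ms
        result[name] = (delta_ms, delta_ms / frame_ms)
    return result


if __name__ == "__main__":
    for name, (ms, frames) in offsets_vs("GPU 1 (PCIE_1)", READINGS).items():
        print(f"{name}: {ms:+d} ms ({frames:+.1f} frames at {FRAME_RATE} fps)")
```

Under that assumption the first DeckLink is about 6 frames away from the GPU outputs and the second about 13, while both GPUs show the same frame.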
Anyways, let’s move on! :-)

I did a test using 1 GPU and then did the same test with both GPUs installed.
The test used all 4 and 8 outputs respectively. The numbers below are the layer counts reached before dropping to 30-31 fps in Resolume. All monitors were 1080p and all outputs were 1080p60.
___________________________________________________________________

2 GPU mode:
GPU 1
2 monitor GUI, main + advanced output
2 monitor as discrete output
GPU 2
4 monitor as discrete output

(with advanced output open on Screen 2)
Res Benchmark 1080p 94 Layers (noise 75 layers)

(with advanced output closed on Screen 2 and monitor left unused)
Res Benchmark 1080p 100 Layers (noise 78 layers)

(The info below is from Windows Task Manager)
Dedicated Memory Usage GPU 1 USED
Dedicated Memory Usage GPU 2 N/A

Shared Memory Usage GPU 1 USED
Shared Memory Usage GPU 2 N/A

Render 3D Usage GPU 1 78-80%
Render 3D Usage GPU 2 34-41%

Render Copy Usage GPU 1 USED
Render Copy Usage GPU 2 USED
___________________________________________________________________

2 GPU mode with DeckLink 8 + 8 out: (DeckLink also set to 30fps to remove stutter)
GPU 1
2 monitor GUI, main + advanced output
2 monitor as discrete output
GPU 2
4 monitor as discrete output
DECKLINK 1
8 SDI output set 30fps
DECKLINK 2
8 SDI output set 30fps

(with advanced output open on Screen 2)
Res Benchmark 1080p 48 Layers (noise 31 layers)

(with advanced output closed on Screen 2 and monitor left unused)
Res Benchmark 1080p 55 Layers (noise 34 layers)

(The info below is from Windows Task Manager)
Dedicated Memory Usage GPU 1 USED
Dedicated Memory Usage GPU 2 N/A

Shared Memory Usage GPU 1 USED
Shared Memory Usage GPU 2 N/A

Render 3D Usage GPU 1 81-90%
Render 3D Usage GPU 2 34-41%

Render Copy Usage GPU 1 USED
Render Copy Usage GPU 2 USED
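
To make the two benchmark runs above easier to compare, here is a short Python sketch that works out how much the layer counts dropped once the 16 DeckLink outputs were added. The dictionary layout and function name are assumptions for illustration; the figures are the ones reported above.

```python
# Layer counts from the two runs: (benchmark layers, noise layers).
GPU_ONLY = {
    "advanced output open": (94, 75),
    "advanced output closed": (100, 78),
}
WITH_DECKLINK = {
    "advanced output open": (48, 31),
    "advanced output closed": (55, 34),
}


def percent_drop(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for case, (layers, noise) in GPU_ONLY.items():
        dl_layers, dl_noise = WITH_DECKLINK[case]
        print(
            f"{case}: layers {layers} -> {dl_layers} "
            f"(-{percent_drop(layers, dl_layers):.1f}%), "
            f"noise {noise} -> {dl_noise} "
            f"(-{percent_drop(noise, dl_noise):.1f}%)"
        )
```

In both cases the benchmark loses a little under half of its layers, and the noise figure a bit more than half.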


Also, to discuss the DeckLink further: I did try to output on all SDIs on both cards, and the above numbers were generally consistent throughout. I ended up feeding 16x SDI 1920x1080 FHD into a BM MultiView 16, with 2 outputs active on GPU 1 besides the GUI on the first and second screens.

So besides the 2 GUI monitors, during this test I had 18 outputs active, a total of 37 324 800 pxls not counting the GUI monitors.
To simplify, that equals a single resolution of 8640x4320 pxls.
The FHD 1080p benchmark came in at around 51 layers (noise) before hitting the 30fps limit.
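
The pixel arithmetic above can be reproduced with a few lines of Python. The function and constant names are only illustrative.

```python
FHD = (1920, 1080)

SDI_OUTPUTS = 16  # two DeckLink Quad 2 cards, 8 outputs each
GPU_OUTPUTS = 2   # discrete outputs on GPU 1, GUI monitors not counted


def pixel_count(width, height, outputs=1):
    """Total pixels pushed per frame across `outputs` identical outputs."""
    return width * height * outputs


total = pixel_count(*FHD, outputs=SDI_OUTPUTS + GPU_OUTPUTS)

if __name__ == "__main__":
    print(f"{SDI_OUTPUTS + GPU_OUTPUTS} x FHD = {total:,} pixels")
    print("same as 8640x4320:", total == pixel_count(8640, 4320))
```

8640x4320 is simply 4.5 FHD widths by 4 FHD heights, which gives the same 18 FHD frames.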

I can also mention that each card’s 8 outputs were “almost” in sync. Please see the attached pics of the counter for reference.

1 card is basically 2 cards on 1 PCB. The card’s internal routing is designed as a dual system to simplify fill and key, according to the peeps at BM. Could be so, but this really fucks things up if you ask me!

The layout of the outputs is:
Physical by design: R, 1, 2, 3, 4, 5, 6, 7, 8
In fill/key mode: R, 1+2, 3+4, 5+6, 7+8
But if you use them all as direct outputs, the order is as follows: R, 1, 5, 2, 6, 3, 7, 4, 8.
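
One way to read the connector orders above is that the card's two halves (outputs 1-4 and 5-8) are interleaved on the bracket when everything runs as direct outputs. The Python sketch below reproduces both orders under that reading. It is an interpretation of this post, not something taken from Blackmagic documentation, and the function names are made up for the example.

```python
def direct_mode_order(outputs_per_half=4):
    """Interleave the two halves of the card: 1, 5, 2, 6, 3, 7, 4, 8."""
    first_half = range(1, outputs_per_half + 1)
    second_half = range(outputs_per_half + 1, 2 * outputs_per_half + 1)
    return [n for pair in zip(first_half, second_half) for n in pair]


def fill_key_pairs(total_outputs=8):
    """Neighbouring outputs paired up for fill/key: (1, 2), (3, 4), ..."""
    return [(n, n + 1) for n in range(1, total_outputs, 2)]


if __name__ == "__main__":
    print("direct:  ", ["R"] + direct_mode_order())
    print("fill/key:", ["R"] + [f"{a}+{b}" for a, b in fill_key_pairs()])
```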

Conclusion:
Combining 2 GPUs is possible and it works great, but keep in mind that processing seems to be done on the main GPU. The second GPU only provides the additional outputs.
In regard to this, the choice between adding a GPU and simply adding an FX4 is really up to the user, their workflow and, last but not least, what they can fit into their system and budget.

As I described, I went the GPU way, as it gives me more use considering the workflow for this setup. To compare, with an additional GPU I have 4 outs.
For a major LED I can use 4K from the GPU directly as 1 output to the sending card, and I can use the other outs for FHD or other resolutions etc. Therefore this gives me more options and flexibility compared to an FX4, and at a lower cost as well. And no matter how I use my extra outs, they are always in sync and on the same frame as the GUI during real-time render.
Considering the load on the GPU isn’t all that different from when outputting to the DeckLink, I would say that adding another GPU for outputs is like adding a video playback card such as a DeckLink for the same reason.

A playback card has the benefit that it isn’t sending anything unless assigned to send, unlike a GPU discrete output, which is always sending the Windows desktop. But the playback card’s timing is off.
The GPU supports real-time render, but if Resolume crashes you’ll have a desktop shown on your big screens :-P

As I only tried combining RTX A series GPUs, formerly known as Quadro, I cannot speak for other combinations. 2 RTX A cards worked like a charm, no doubt, and with far fewer issues than I originally thought.
The biggest issue is still with lanes, as dual GPU on this mobo wants to interfere with the other PCIe ports’ lane config.
For gaming RTX or the older GTX series I don’t know; as I did not try them, I cannot say.
If any of you guys are from Stockholm, Sweden and have an NVS810, I would love to meet up to do some testing and share it on the forum. That and/or any other card for that matter :-)

Eledtech
Is taking Resolume on a second date
Posts: 20
Joined: Tue Apr 09, 2019 21:22

Re: RTX A4000 - Dual GPU - Multiple output setup

Post by Eledtech »

very great review.
can u test:
6 x 4k dedicated outputs on both a4000 cards in a 11520x4320 canvas?
latency ?
how many 4k files / layers scaled up?
Visual and led designer at eledtech and aseven night club/Berlin.

isprod.se
Met Resolume in a bar the other day
Posts: 12
Joined: Tue Nov 23, 2021 10:56

Re: RTX A4000 - Dual GPU - Multiple output setup

Post by isprod.se »

Eledtech wrote: Thu Mar 17, 2022 22:47 very great review.
can u test:
6 x 4k dedicated outputs on both a4000 cards in a 11520x4320 canvas?
latency ?
how many 4k files / layers scaled up?
Might be able to try next week. I’m waiting for more 4K monitors for my station atm, but I don’t have access to the 8 that would be needed (2 GUI + 6). I will check with a colleague and see what I can do :-)

But as I don’t have a Quadro Sync card atm, I cannot sync the outputs of both GPUs. Each card can sync its own outputs atm. But I never use more than 4 from a GPU, so it’s not critical.
Latency from the GPU has never been an issue for me no matter how hard I push it; only SDI is affected, to my knowledge.

So 11520x4320 = 49 766 400 pxls, that’s about 12 000 000 more than I pushed before, but on a lot fewer outputs.
Would you be able to provide a sample DXV3 file at this size for me to use?
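
For reference, a 11520x4320 canvas divides evenly into six 3840x2160 outputs arranged 3 wide by 2 high. The Python sketch below computes those slices and the pixel difference against the earlier 18 x FHD test. The 3x2 arrangement is an assumption (it is the only grid of UHD tiles that matches these dimensions), and the names are illustrative.

```python
UHD = (3840, 2160)
CANVAS = (11520, 4320)
PREVIOUS_TEST_PIXELS = 37_324_800  # the 18 x FHD test earlier in the thread


def tile_canvas(canvas, tile):
    """Split a canvas into equal tiles, returned as (x, y, width, height)."""
    canvas_w, canvas_h = canvas
    tile_w, tile_h = tile
    if canvas_w % tile_w or canvas_h % tile_h:
        raise ValueError("canvas is not an exact multiple of the tile size")
    return [
        (col * tile_w, row * tile_h, tile_w, tile_h)
        for row in range(canvas_h // tile_h)
        for col in range(canvas_w // tile_w)
    ]


tiles = tile_canvas(CANVAS, UHD)
canvas_pixels = CANVAS[0] * CANVAS[1]

if __name__ == "__main__":
    for index, (x, y, w, h) in enumerate(tiles, start=1):
        print(f"output {index}: x={x} y={y} {w}x{h}")
    print(f"canvas: {canvas_pixels:,} pixels")
    print(f"more than before: {canvas_pixels - PREVIOUS_TEST_PIXELS:,} pixels")
```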

mronis
Met Resolume in a bar the other day
Posts: 6
Joined: Tue Jun 14, 2022 23:36

Re: RTX A4000 - Dual GPU - Multiple output setup

Post by mronis »

Based on your experience, would you still use two GPUs or the SDI cards?

We are looking to build a 5 output UHD system.

I was thinking dual A6000s and two 8K Blackmagic cards. Trying to avoid needing a synced second system.

Matze
Met Resolume in a bar the other day
Posts: 1
Joined: Sun Sep 25, 2022 20:09

Re: RTX A4000 - Dual GPU - Multiple output setup

Post by Matze »

Dual A6000 makes no sense, since the calculations are still done on one card (the one your control monitor with Resolume is connected to). The second A6000 will only play out. You can achieve this cheaper if you combine an A6000 with an A4000.

Arvol
Might as well join the team
Posts: 2768
Joined: Thu Jun 18, 2015 17:36
Location: Oklahoma, USA

Re: RTX A4000 - Dual GPU - Multiple output setup

Post by Arvol »

AJA Corvid line is a good option for sync'd outputs while using a beefy GPU for your rendering.
