FastGlow

  • [FastGlow.fuse](FastGlow.fuse)

In my Glow fuse I approximate a Gaussian blur quickly by applying 3 box blurs in a row, and I do this for 9 different blur sizes. In total this comes to 54 calls to the DCTL kernel (plus a few more for other calculations).
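To illustrate the idea of stacking box blurs, here is a minimal pure-Python sketch in 1D. It is not the fuse's DCTL code; the function names, the clamped edge handling and the radius are all made up for illustration. The call count at the end is only one way to arrive at 54: it assumes each box blur is split into a horizontal and a vertical pass, which the text above does not state.

```python
def box_blur_1d(signal, radius):
    """Average each sample with its neighbours within `radius` (edges clamped)."""
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # clamp index to the valid range
            total += signal[j]
        out.append(total / (2 * radius + 1))
    return out


def approx_gaussian_1d(signal, radius, passes=3):
    """Apply the same box blur `passes` times; the result gets bell-shaped."""
    for _ in range(passes):
        signal = box_blur_1d(signal, radius)
    return signal


# A single bright sample in the middle of a dark row.
impulse = [0.0] * 31
impulse[15] = 1.0
blurred = approx_gaussian_1d(impulse, radius=2)

# Hypothetical breakdown of the 54 kernel calls (the 2 axes are an assumption).
blur_sizes, box_passes, axes = 9, 3, 2
kernel_calls = blur_sizes * box_passes * axes
```

After one pass the impulse is a flat box, after two it is a triangle, and after three it is a smooth bump that peaks at the centre and falls off symmetrically, which is why three passes are usually considered close enough to a Gaussian for a glow.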

When a kernel is done running, the image moves from the GPU back to CPU/RAM. In my tests this is the biggest bottleneck, making the fuse way slower than if everything were done in one kernel (6 fps vs. 24 fps).

FastGlow runs at about 4 frames/sec for Danell and takes 2.1 secs/frame on nmbr73's 2019 MBP (Core i9, Radeon 560X, see Discord post); it is said to be slow on M1 too.

If any DCTL guru knows a way to leave the created image in the GPU's memory so it can be accessed again, this would be a game changer for fuses with many kernel calls. I tried calling _tex2DVec4Write 27 times in one kernel and the frame rate only dropped from 24 fps to 22 fps.

Going through Rays.fuse, I'm seeing a couple of functions I haven't really seen anywhere else:
- Inside the DCTL I see make_intensity(float4, compOrder). I also found it used in LearnNowFX's Long Shadow fuse. Do you know what it does?
- Next, in the processing code, I see node:SetGlobalSize(math.ceil(numRays / 128) * 128) and node:SetWorkSize(128). Does anyone know what these do?
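Whatever the two calls do inside Fusion, the arithmetic in the first one is easy to check: math.ceil(numRays / 128) * 128 rounds numRays up to the next multiple of 128. That resembles the common OpenCL/CUDA habit of padding the total number of work items to a multiple of the work-group size, but whether SetGlobalSize and SetWorkSize really map onto those concepts is an assumption here, not something confirmed by Rays.fuse. A small Python sketch of the rounding (the function names are hypothetical):

```python
import math


def padded_global_size(num_items, work_size=128):
    """Round num_items up to the next multiple of work_size."""
    return math.ceil(num_items / work_size) * work_size


def group_count(num_items, work_size=128):
    """How many groups of work_size are needed to cover num_items."""
    return padded_global_size(num_items, work_size) // work_size


for num_rays in (1, 128, 129, 1000):
    print(num_rays, padded_global_size(num_rays), group_count(num_rays))
```

If that reading is right, a kernel launched this way can get up to 127 more invocations than there are rays, so it would need to skip indices at or beyond numRays.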

Here you can find the source code :)
https://www.steakunderwater.com/wesuckless/viewtopic.php?t=5485