Super Fast Quakespasm Fork
So, I had this in my Twitter feed this morning. It seems the bulk of the rendering code has been pushed over to the GPU:

https://twitter.com/andrei_drexler/status/1457410987011756041

Having tried it myself on my CPU-limited PC, the speed difference is huge: a gain of hundreds of FPS.

AD_Tears has never been so smooth. :)
 
Can We Expect A Version For OSX? 
What about a version for Macs? 
 
If I'm getting a "GL context not found" error, I assume my specs are outdated? 
 
Someone will eventually make a map big enough to make this fork run below 30 fps too. 
 
Very pleased about this even if I cannot take advantage of it. Given the modest binary size, hopefully down the line all QS forks can merge, with the backend being set via cfg. Sticking with QSS for now 
Absolutely Awesome 
I just tried this with ad_tears and omg is it a huge difference. Instead of ~60 fps in outside areas I'm now getting super smooth 400 (!).

This is some pure black magic that Andrei Drexler has done. If there are still some issues with this fork (didn't have any problems while playing through Tears), I hope he'll be able to eliminate them and release a proper full version in the future. Once that has happened this will be my preferred source port for sure. 
To Those With The Knowledge 
What are the chances of this rendering stuff ending up in QSS or vanilla QS?

Just curious on people's thoughts. 
 
It would need a major rewrite. This stuff isn't a simple enable/disable, or any kind of easy trickery like that. It's a total gutting and rework of the renderer, effectively akin to moving it to a different API.

There is scope for renderer improvements in vanilla QS that wouldn't require such a drastic overhaul. The author of this fork has observed that QS was spending 50% of its time doing frustum culling on large maps, but that's actually a quirk inherited from FitzQuake, which frustum-culled individual surfaces rather than using the hierarchical BSP tree. Using the BSP tree, if a parent node is culled then all of its child nodes are automatically culled too, which offers a significant optimization. Of course, that also means CPU time needs to be spent on traversing the BSP tree, and it's less SOA-friendly, but the point remains: the heavy frustum culling overhead in vanilla QS was not in the original GLQuake; it came from a change inherited from FitzQuake.

So that's something that's not as clear-cut as it might seem on the surface, and I'd definitely like to see a comparison benchmark of that code with code that uses the original BSP tree and does proper surface batching before saying anything definite.
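The hierarchical culling idea can be sketched like this (an illustrative, simplified example, not the engine's actual code; the "frustum" is reduced to an axis-aligned box for brevity, where a real engine would test node bounds against the view frustum planes):

```c
#include <stddef.h>

/* One node of a BSP-style bounding hierarchy. Leaves hold surfaces. */
typedef struct { float mins[3], maxs[3]; } box_t;

typedef struct node_s {
    box_t bounds;
    struct node_s *children[2]; /* both NULL for a leaf */
    int num_surfaces;           /* surfaces stored at a leaf */
} node_t;

static int boxes_tested;      /* how many node tests we performed */
static int visible_surfaces;  /* surfaces that survived culling */

/* Stand-in for a frustum test: axis-aligned box overlap. */
static int box_outside(const box_t *a, const box_t *view)
{
    for (int i = 0; i < 3; i++)
        if (a->maxs[i] < view->mins[i] || a->mins[i] > view->maxs[i])
            return 1;
    return 0;
}

/* If a parent node is rejected, its whole subtree is skipped with a
   single test, instead of testing every surface individually. */
static void cull_node(const node_t *n, const box_t *view)
{
    if (!n)
        return;
    boxes_tested++;
    if (box_outside(&n->bounds, view))
        return; /* entire subtree culled here */
    if (!n->children[0] && !n->children[1]) {
        visible_surfaces += n->num_surfaces;
        return;
    }
    cull_node(n->children[0], view);
    cull_node(n->children[1], view);
}
```

The tradeoff mentioned above shows up directly: the recursion is pointer-chasing over a tree, which is harder to vectorize than a flat per-surface loop, but a rejected parent makes its whole subtree free.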

Vanilla QS also has ample room for improvement in how it handles sky and water surfaces. These are largely left at the "brute-force it in software" approach of FitzQuake, and both performance and quality could be massively improved by implementing some shaders for them.

Looking over this code, the author also seems to have done some things similar to what I was doing when I was originally exploring Direct3D 11 with the old DirectQ engine. So he's essentially just taken the old CPU lightmap update code, moved it to a compute shader, and called it done, for example. This is actually a mistake, and there's a much simpler way of handling lightstyle animations that's basically staring you in the face, which doesn't need any lightmap updates at all, and which has even more bits of precision. That's just one example of where even this code needs further work.

So the points for comparison with vanilla QS are:

- Get a meaningful benchmark of frustum culling overhead when actually using the BSP tree to accelerate it.

- Resolve the CPU overhead of sky and water surfaces, getting better performance and better quality.

- Address issues with straight ports of existing CPU code to GPU compute by exploring alternative approaches for e.g. lightstyle animations. 
 
there's a much simpler way of handling lightstyle animations that's basically staring you in the face, which doesn't need any lightmap updates at all, and which has even more bits of precision.

I'm curious to know how this would work. 
I'm Curious Too. 
I'm assuming it would be to upload each lightstyle map separately (or at least, separate UV space in the atlas) and then somehow additively combine them when rendering.

(I also had the idea of 4 lightstyles = 4 channels in an RGBA texture, but this breaks down if you want colored lighting.) 
O Hai 
Author here.

@Hipnotic Rogue: Good to hear it's working well for you, and thanks for the link.

Re: getting this code in QS and QSS, well... vanilla QS, I'd say highly unlikely. QSS is still unlikely, but maybe less so (Spike was looking into it, I think).

@metlslime: Thanks for the link. Unfortunately, there's a small bug in that initial release (dynamic lights had artifacts on AMD), so I'd strongly recommend getting the following one instead.

@Barnak: Unfortunately, macOS isn't supported because Apple deprecated OpenGL and stopped updating it at version 4.1, which lacks compute shaders.

@oh, that person: That's probably it, sorry. What GPU/OS are you using? In any case, the next release will have slightly better error messages for unsupported systems.

@mankrip: I look forward to the day someone makes that map because then I'll be able to justify spending more time on optimizations. In the meantime, people can play not just ad_tears, but even TerShib the way their authors intended.

As for the lightmaps - you could have 4 textures instead of 1 (or one with 4 layers, or an atlas), one for each lightstyle a face could have. RGB would hold the color, alpha would be the lightstyle index. Since that index is constant per face, you can still interpolate the texels, then use the (interpolated) alpha value to find the lightstyle index and the corresponding scaling factor for the (interpolated) RGB. Essentially, it's a just-in-time lightmap update with bilinear filtering, without storing the result in an intermediate texture.
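A CPU-side sketch of that idea (illustrative names, not the fork's actual code; in the real shader this combine would run per fragment, after the hardware's bilinear filtering):

```c
#define MAX_STYLES 64

typedef struct { float r, g, b, a; } texel_t;

/* Current animation scale per lightstyle (a d_lightstylevalue analogue),
   normalized so 1.0 means the baked value unchanged. */
static float style_value[MAX_STYLES];

/* Combine up to 4 lightmap layers for one fragment. Each layer's RGB holds
   the baked light for one style; alpha stores style_index / 255. Because the
   index is constant across a face, bilinear filtering leaves it intact, so
   the scale lookup can happen *after* interpolation. */
static texel_t combine_layers(const texel_t *layers, int count)
{
    texel_t out = {0.0f, 0.0f, 0.0f, 1.0f};
    for (int i = 0; i < count; i++) {
        int style   = (int)(layers[i].a * 255.0f + 0.5f); /* decode index */
        float scale = style_value[style];
        out.r += layers[i].r * scale;
        out.g += layers[i].g * scale;
        out.b += layers[i].b * scale;
    }
    return out;
}
```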

@knew: It would be nice if all the QS forks got merged, but I very much doubt it'll happen. Too many cooks and all.

@Berzerk2k2: Glad to hear that! If you do find issues with it, please let me know on github.

@mh: As much as I appreciate the simplicity and the extra precision of the method you're alluding to, I've actually implemented it and it's a bit slower on a GTX 1060 at 1080p (by ~9% on ad_zendar and ad_sepulcher, ~7% on ad_tears and vanilla demo1, and ~5% on shib8). I might still try a few tweaks, but I'm not expecting a reversal - the compute shader only runs at 10 Hz, and only on tiles that have lightstyles with changes. That being said, I'm still open to suggestions.

Re: BSP traversal - I think Novum tried it for vkQuake and concluded it was slower, but I am curious to see how other attempts might fare. 
 
What I favour about my lightstyle method is that it scales to arbitrary complexity with absolutely no performance difference: you could even have 4 animated lightstyles on every surface in a map, implement lightstyle interpolation, and it would run at exactly the same rate as if the styles were static. Even if it did turn out to be a few percent slower, I consider all of that a fair tradeoff.

The simple version for luminance lightmaps needs one RGBA texture with the styles stored in each channel; it's just a dp4 of the texture lookup with the style values to get the end result.

The full version for coloured light repeats this three times, once for each of the RGB colours. That's three texture lookups which is where most of the performance overhead comes from.
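As a concrete illustration of the scheme described above (a CPU simulation with made-up names, not the actual shader code; in GLSL the dp4 would be a `dot()` on the texture lookup):

```c
typedef struct { float x, y, z, w; } vec4_t;

static float dp4(vec4_t a, vec4_t b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Luminance lightmap: one RGBA texel holds the contribution of each of the
   face's 4 styles, one per channel; the result is a single dot product with
   the current style scales. No lightmap updates are ever needed. */
static float lightmap_luminance(vec4_t texel, vec4_t styles)
{
    return dp4(texel, styles);
}

/* Coloured light repeats this three times, once per colour plane: three
   RGBA lookups whose channels are the per-style contributions of the red,
   green and blue components respectively. */
static void lightmap_rgb(vec4_t rplane, vec4_t gplane, vec4_t bplane,
                         vec4_t styles, float out_rgb[3])
{
    out_rgb[0] = dp4(rplane, styles);
    out_rgb[1] = dp4(gplane, styles);
    out_rgb[2] = dp4(bplane, styles);
}
```

Animating a style only changes the `styles` vector, which is why the cost is identical whether the styles are static or all animated.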

There's a potential optimization path for surfaces with one style where the lightmap can be stored as a single conventional texture. It's some moderate extra complexity and does need a shader change too.

The whole thing is then represented as an array texture, with surf->lightmaptexturenum giving the array slice to use. d_lightstylevalue needs to go to the GPU each frame, but that's just 64 floats in a UBO. There's also a vertex texture lookup to read out the style values.

I believe the Kex engine in the 2021 remaster does something similar, but with 4 textures instead of 3. The trick is to realise you can do it with 3 if you use the colour channels to store styles rather than conventional colours. 
Why 32 Bit Only? 
Is it likely that anyone meeting the minimum system requirements described is still running a 32-bit system?
And yes, the performance improvement is very impressive. It's more than twice as fast as Vulkan on my system. Also interesting: performance doesn't rise when lowering the resolution in any of QS, QSS, or vkQuake. 
 
Don't really have much in the way of checking FPS etc., but yeah, it definitely runs smoother on anything I've thrown at it.

Cheeky I know, but I'm dreaming of an ARM port. :D 
 
Funny how after so many years and so many source ports, many focused on "modern", there's always another one pushing the engine further than anyone else did.

Congrats! 
Website copyright © 2002-2021 John Fitzgibbons. All posts are copyright their respective authors.