Super Fast Quakespasm Fork
So, I had this in my Twitter feed this morning. It seems the bulk of the rendering code has been pushed over to the GPU:

https://twitter.com/andrei_drexler/status/1457410987011756041

Having tried it myself on my CPU-limited PC, the speed difference is huge: a gain of hundreds of FPS.

AD_Tears has never been so smooth. :)
 
Can We Expect A Version For OSX? 
What about a version for Macs? 
 
If I'm getting a "GL context not found" error, I assume my specs are outdated? 
 
Someone will eventually make a map big enough to make this fork run below 30 fps too. 
 
Very pleased about this even if I cannot take advantage of it. Given the modest binary size, hopefully down the line all QS forks can merge, with the backend being set via cfg. Sticking with QSS for now 
Absolutely Awesome 
I just tried this with ad_tears and omg is it a huge difference. Instead of ~60 fps in outside areas I'm now getting super smooth 400 (!).

This is some pure black magic that Andrei Drexler has done. If there are still some issues with this fork (didn't have any problems while playing through Tears), I hope he'll be able to eliminate them and release a proper full version in the future. Once that has happened this will be my preferred source port for sure. 
To Those With The Knowledge 
What are the chances of this rendering stuff ending up in QSS or vanilla QS?

Just curious on people's thoughts. 
 
It would need a major rewrite. This stuff isn't a simple enable/disable, or any kind of easy trickery like that. It's a total gutting and rework of the renderer, effectively akin to moving it to a different API.

There is scope for renderer improvements in vanilla QS that wouldn't require such a drastic overhaul. The author of this fork has observed that QS was spending 50% of its time doing frustum culling on large maps, but that's actually a quirk inherited from FitzQuake, which frustum-culled individual surfaces rather than using the hierarchical BSP tree. Using the BSP tree, if a parent node is culled then all of its child nodes are automatically culled too, which offers a significant optimization. Of course, that also means CPU time needs to be spent on traversing the BSP tree, and it's less SOA-friendly, but the point remains: the heavy frustum culling overhead in vanilla QS was not in the original GLQuake; it came from a change inherited from FitzQuake.

So that's something that's not as clear-cut as it might seem on the surface, and I'd definitely like to see a comparison benchmark of that code with code that uses the original BSP tree and does proper surface batching before saying anything definite.
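The hierarchical culling idea can be sketched like this (an illustrative, simplified example, not the engine's actual code; the "frustum" is reduced to an axis-aligned box for brevity, where a real engine would test node bounds against the view frustum planes):

```c
#include <stddef.h>

/* One node of a BSP-style bounding hierarchy. Leaves hold surfaces. */
typedef struct { float mins[3], maxs[3]; } box_t;

typedef struct node_s {
    box_t bounds;
    struct node_s *children[2]; /* both NULL for a leaf */
    int num_surfaces;           /* surfaces stored at a leaf */
} node_t;

static int boxes_tested;      /* how many node tests we performed */
static int visible_surfaces;  /* surfaces that survived culling */

/* Stand-in for a frustum test: axis-aligned box overlap. */
static int box_outside(const box_t *a, const box_t *view)
{
    for (int i = 0; i < 3; i++)
        if (a->maxs[i] < view->mins[i] || a->mins[i] > view->maxs[i])
            return 1;
    return 0;
}

/* If a parent node is rejected, its whole subtree is skipped with a
   single test, instead of testing every surface individually. */
static void cull_node(const node_t *n, const box_t *view)
{
    if (!n)
        return;
    boxes_tested++;
    if (box_outside(&n->bounds, view))
        return; /* entire subtree culled here */
    if (!n->children[0] && !n->children[1]) {
        visible_surfaces += n->num_surfaces;
        return;
    }
    cull_node(n->children[0], view);
    cull_node(n->children[1], view);
}
```

The tradeoff mentioned above shows up directly: the recursion is pointer-chasing over a tree, which is harder to vectorize than a flat per-surface loop, but a rejected parent makes its whole subtree free.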

Vanilla QS also has ample room for improvement in how it handles sky and water surfaces. These are largely left at the "brute-force it in software" approach of FitzQuake, and both performance and quality could be massively improved by implementing some shaders for them.

Looking over this code, the author also seems to have done some things similar to what I was doing when I was originally exploring Direct3D 11 with the old DirectQ engine. So he's essentially just taken the old CPU lightmap update code, moved it to a compute shader, and called it done, for example. This is actually a mistake, and there's a much simpler way of handling lightstyle animations that's basically staring you in the face, which doesn't need any lightmap updates at all, and which has even more bits of precision. That's just one example of where even this code needs further work.

So the points for comparison with vanilla QS are:

- Get a meaningful benchmark of frustum culling overhead when actually using the BSP tree to accelerate it.

- Resolve the CPU overhead of sky and water surfaces, getting better performance and better quality.

- Address issues with straight ports of existing CPU code to GPU compute by exploring alternative approaches for e.g. lightstyle animations. 
 
there's a much simpler way of handling lightstyle animations that's basically staring you in the face, which doesn't need any lightmap updates at all, and which has even more bits of precision.

I'm curious to know how this would work. 
I'm Curious Too. 
I'm assuming it would be to upload each lightstyle map separately (or at least, separate UV space in the atlas) and then somehow additively combine them when rendering.

(I also had the idea of 4 lightstyles = 4 channels in an RGBA texture, but this breaks down if you want colored lighting.) 
O Hai 
Author here.

@Hipnotic Rogue: Good to hear it's working well for you, and thanks for the link.

Re: getting this code in QS and QSS, well... vanilla QS, I'd say highly unlikely. QSS is still unlikely, but maybe less so (Spike was looking into it, I think).

@metlslime: Thanks for the link. Unfortunately, there's a small bug in that initial release (dynamic lights had artifacts on AMD), so I'd strongly recommend getting the following one instead.

@Barnak: Unfortunately, macOS isn't supported because Apple deprecated OpenGL and stopped updating it at version 4.1, which lacks compute shaders.

@oh, that person: That's probably it, sorry. What GPU/OS are you using? In any case, the next release will have slightly better error messages for unsupported systems.

@mankrip: I look forward to the day someone makes that map because then I'll be able to justify spending more time on optimizations. In the meantime, people can play not just ad_tears, but even TerShib the way their authors intended.

As for the lightmaps - you could have 4 textures instead of 1 (or one with 4 layers, or an atlas), one for each lightstyle a face could have. RGB would hold the color, alpha would be the lightstyle index. Since that index is constant per face, you can still interpolate the texels, then use the (interpolated) alpha value to find the lightstyle index and the corresponding scaling factor for the (interpolated) RGB. Essentially, it's a just-in-time lightmap update with bilinear filtering, without storing the result in an intermediate texture.
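A CPU-side sketch of that idea (illustrative names, not the fork's actual code; in the real shader this combine would run per fragment, after the hardware's bilinear filtering):

```c
#define MAX_STYLES 64

typedef struct { float r, g, b, a; } texel_t;

/* Current animation scale per lightstyle (a d_lightstylevalue analogue),
   normalized so 1.0 means the baked value unchanged. */
static float style_value[MAX_STYLES];

/* Combine up to 4 lightmap layers for one fragment. Each layer's RGB holds
   the baked light for one style; alpha stores style_index / 255. Because the
   index is constant across a face, bilinear filtering leaves it intact, so
   the scale lookup can happen *after* interpolation. */
static texel_t combine_layers(const texel_t *layers, int count)
{
    texel_t out = {0.0f, 0.0f, 0.0f, 1.0f};
    for (int i = 0; i < count; i++) {
        int style   = (int)(layers[i].a * 255.0f + 0.5f); /* decode index */
        float scale = style_value[style];
        out.r += layers[i].r * scale;
        out.g += layers[i].g * scale;
        out.b += layers[i].b * scale;
    }
    return out;
}
```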

@knew: It would be nice if all the QS forks got merged, but I very much doubt it'll happen. Too many cooks and all.

@Berzerk2k2: Glad to hear that! If you do find issues with it, please let me know on github.

@mh: As much as I appreciate the simplicity and the extra precision of the method you're alluding to, I've actually implemented it and it's a bit slower on a GTX 1060 at 1080p (by ~9% on ad_zendar and ad_sepulcher, ~7% on ad_tears and vanilla demo1, and ~5% on shib8). I might still try a few tweaks, but I'm not expecting a reversal - the compute shader only runs at 10 Hz, and only on tiles that have lightstyles with changes. That being said, I'm still open to suggestions.

Re: BSP traversal - I think Novum tried it for vkQuake and concluded it was slower, but I am curious to see how other attempts might fare. 
 
What I favour about my lightstyle method is that it scales to arbitrary complexity with absolutely no performance difference: you could even have 4 animated lightstyles on every surface in a map, implement lightstyle interpolation, and it would run at exactly the same rate as if the styles were static. Even if it did turn out to be a few percent slower, I consider all of that a fair tradeoff.

The simple version for luminance lightmaps needs one RGBA texture with the styles stored in each channel; it's just a dp4 of the texture lookup with the style values to get the end result.

The full version for coloured light repeats this three times, once for each of the RGB colours. That's three texture lookups which is where most of the performance overhead comes from.
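As a concrete illustration of the scheme described above (a CPU simulation with made-up names, not the actual shader code; in GLSL the dp4 would be a `dot()` on the texture lookup):

```c
typedef struct { float x, y, z, w; } vec4_t;

static float dp4(vec4_t a, vec4_t b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Luminance lightmap: one RGBA texel holds the contribution of each of the
   face's 4 styles, one per channel; the result is a single dot product with
   the current style scales. No lightmap updates are ever needed. */
static float lightmap_luminance(vec4_t texel, vec4_t styles)
{
    return dp4(texel, styles);
}

/* Coloured light repeats this three times, once per colour plane: three
   RGBA lookups whose channels are the per-style contributions of the red,
   green and blue components respectively. */
static void lightmap_rgb(vec4_t rplane, vec4_t gplane, vec4_t bplane,
                         vec4_t styles, float out_rgb[3])
{
    out_rgb[0] = dp4(rplane, styles);
    out_rgb[1] = dp4(gplane, styles);
    out_rgb[2] = dp4(bplane, styles);
}
```

Animating a style only changes the `styles` vector, which is why the cost is identical whether the styles are static or all animated.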

There's a potential optimization path for surfaces with one style where the lightmap can be stored as a single conventional texture. It's some moderate extra complexity and does need a shader change too.

The whole thing is then represented as an array texture, with surf->lightmaptexturenum giving the array slice to use. d_lightstylevalue needs to go to the GPU each frame, but that's just 64 floats in a UBO. There's also a vertex texture lookup to read out the style values.

I believe the Kex engine in the 2021 remaster does something similar, but with 4 textures instead of 3. The trick is to realise you can do it with 3 if you use the colour channels to store styles rather than conventional colours. 
Why 32 Bit Only? 
Is it likely that anyone meeting the minimum system requirements described is still running a 32-bit system?
And yes, the performance improvement is very impressive. It's more than twice as fast as Vulkan on my system. Also interesting: performance doesn't rise when lowering the resolution in any of QS, QSS, or vkQuake. 
 
Don't really have much in the way of checking FPS etc., but yeah, it definitely runs smoother on anything I've thrown at it.

Cheeky I know, but I'm dreaming of an ARM port. :D 
 
Funny how after so many years and so many source ports, many focused on "modern", there's always another one pushing the engine further than anyone else did.

Congrats! 
Website copyright © 2002-2021 John Fitzgibbons. All posts are copyright their respective authors.