OK, let me edit this with a better answer/explanation.
-> I'm not trying to convince anybody or force you to use the KZ Drivers.
-> I wrote a few algorithms for the KZ Drivers with the help of the internal SDKs of Crystal Engine, CryEngine, UDK, and the NVAPI SDK.
-> Because I needed to dig deeper, I used a disassembler to check every value/entry in the DLL library used by the official driver, in order to improve execution.
-> Of course it won't work for everybody, since I don't know how every GPU works.
-> I did my best to improve performance on every GPU, including the mobile versions.
-> Yes, Nvidia took my work and used it in 314.21/314.22.
-> You should read what they said in the 314.21/314.22 release notes (OpenCL/DirectCompute and other things were never mentioned before).
-> Go ask Nvidia for an answer; they will tell you nothing about FP16 vs FP64.
I'm not your enemy; I'm just trying to help you use your GPU in a better way. Sorry that my voice is not as loud as a big company like Nvidia's. If Nvidia tells you something, you believe it; if a lone individual tells you something, you don't, because it looks weird (human nature: never listen to the stranger).
How many people thanked me in the past for releasing the GTA IV patch, the Half-Life 2 patch for the GeForce FX series and co., the recompiled shader pack for Crysis, the Spirit Drivers that Guru3D users back then were happy to use, my help to DriverHeaven, etc.
Back to the thread. What I want to say is simple.
What do you want to see: better performance with good graphics, or worse performance with good graphics?
For sure you will choose the first option.
Algorithms can always be improved by using different floating-point types, so I wrote my own code using float2half (which is also available in CUDA) to work in half precision, to gain performance and get better execution on Nvidia GPUs.
You should know that Nvidia GPUs support single and double precision; a half-precision type that is not native can only be used through a wrapper/conversion.
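To illustrate what such a conversion wrapper does, here is a minimal sketch of a truncating float-to-half converter in plain C. It mimics the spirit of CUDA's __float2half, but it is my own simplified illustration (round-toward-zero, no NaN or subnormal handling), not the actual driver code:

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 single (FP32) to a half (FP16) bit pattern.
   Truncates the mantissa; flushes underflow to zero; clamps overflow to Inf. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* rebias 127 -> 15 */
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp >= 31) return (uint16_t)(sign | 0x7C00u); /* overflow -> +/-Inf */
    if (exp <= 0)  return sign;                       /* underflow -> +/-0  */

    return (uint16_t)(sign | (exp << 10) | (mant >> 13)); /* drop 13 mantissa bits */
}

/* Convert a half (FP16) bit pattern back to a single (FP32). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    int32_t  exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0)       bits = sign;               /* zero (subnormals flushed) */
    else if (exp == 31) bits = sign | 0x7F800000u; /* Inf */
    else                bits = sign | ((uint32_t)(exp - 15 + 127) << 23) | (mant << 13);

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The round trip loses the low 13 mantissa bits, which is exactly why a half-precision render is cheaper: fewer significant bits to move and compute.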
Why use a half over a float? (Just so you know: half = half precision, float = single precision, double = double precision.)
So imagine a scene with 100 objects, 10 characters, and 1 plane that uses many polys (well, actually we will use low-poly models to spend fewer polygons during a heavy scene).
Example result: a scene that renders at 100 fps with FP64 will reach around 150 fps with FP32 and around 200 fps with FP16.
To get this result I did some truncation for OpenGL, with FP64toFP16 and FP32toFP16(GLfloat val), and a similar method for DX (don't forget that 16x4 = 64 and 16x2 = 32).
So I limit the precision we push through to force faster rendering (less detailed objects, etc.; but the object is still the same object, and it's also difficult to see a real difference without zooming into the texture).
You can also use variable compensation: duplicate the precision (store a value as two lower-precision parts) so that the result shows no visible difference between FP32 and FP16.
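The compensation idea can be sketched as follows. This is my own illustration of the general "pair of lower-precision values" technique, shown with double split into two floats for portability (the same idea applies to a float split into two halves); the names are hypothetical, not from the KZ driver:

```c
/* Store one high-precision value as a low-precision leading part plus a
   low-precision residual that carries the bits the first part lost. */
typedef struct { float hi, lo; } float_pair;

static float_pair split(double x)
{
    float_pair p;
    p.hi = (float)x;                  /* leading part, rounded to low precision */
    p.lo = (float)(x - (double)p.hi); /* residual: what the rounding dropped */
    return p;
}

static double join(float_pair p)
{
    return (double)p.hi + (double)p.lo; /* recovers most of the original bits */
}
```

A single float cannot represent 1.0 + 1e-9 (it rounds to exactly 1.0), but the pair keeps the tiny tail in the residual, so the joined value is almost indistinguishable from the original.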
Certainly this took a lot of work on my part, but in the end it's good to improve execution.
It's too bad that devs don't take more time to code in a better way, using a mainstream process that works on every type of GPU (I saw that UE4 does this; we'll see, it also reuses old partial code for the lighting).
Sorry if I disturbed anyone; I just want to help... but don't worry: after the threats I received from Nvidia, I will stop the KZ drivers (it was not the first time).
So now you can understand how an Nvidia GPU gets more fps than an AMD GPU on the same scene (one uses FP16 while the other uses FP64); you can clearly see the difference if you look closer at the textures, shaders, etc.