ulf.schroeter Posted August 13, 2011
Proposal: use SSE2 for increased double-precision 3D transform performance. This Intel article has details and sample code.
ulf.schroeter (Author) Posted October 29, 2011
Really? That hasn't been mentioned in the release notes. Anyway, good news!
ulf.schroeter (Author) Posted October 29, 2011
I found the double-precision SSE2 code, but one question: is there no SSE2 acceleration for single matrix * vector multiplication? For example:

    dvec4 &mul(dvec4 &ret,const dmat4 &m,const dvec4 &v) {
        ret.x = m.m00 * v.x + m.m01 * v.y + m.m02 * v.z + m.m03 * v.w;
        ret.y = m.m10 * v.x + m.m11 * v.y + m.m12 * v.z + m.m13 * v.w;
        ret.z = m.m20 * v.x + m.m21 * v.y + m.m22 * v.z + m.m23 * v.w;
        ret.w = v.w;
        return ret;
    }

Besides matrix * matrix multiplication, this should be the most common operation where optimization would make sense (at least it is optimized for float precision).
ulf.schroeter (Author) Posted October 29, 2011
I think I found the answer. Bulk double-precision matrix * vector multiplications are optimized via void SimdSSE2::mulMat3Vec3() and void SimdSSE2::mulMat4Vec4().
frustum Posted October 29, 2011
The memory layout of dmat4 is not efficient for single vector multiplication. dmat4 is rearranged for multiple vector multiplications.
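To illustrate the point about batches, here is a minimal sketch, not the engine's actual code. It assumes a hypothetical column-major affine matrix type (`Mat3x4d`) and point type (`Vec3d`); the real `dmat4` layout is not shown in this thread. The matrix columns are loaded into SSE2 registers once and then reused for every vector, which is the kind of setup cost that pays off over many vectors but not for a single one. A scalar fallback is compiled where SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> // SSE2 intrinsics (__m128d holds two doubles)
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical types for illustration only; not the engine's dvec3/dmat4.
struct Vec3d { double x, y, z; };

// Assumed layout: four columns of three rows, stored column by column.
// The implicit last row is (0, 0, 0, 1), i.e. an affine transform.
struct Mat3x4d { double col[4][3]; };

// Transforms `count` points: out[i] = M * (in[i], 1).
void mul_batch(Vec3d *out, const Mat3x4d &m, const Vec3d *in, std::size_t count) {
    assert(count == 0 || (out != nullptr && in != nullptr));
#if SKETCH_HAS_SSE2
    // One-time setup: the x/y rows of each column go into registers.
    const __m128d c0 = _mm_loadu_pd(m.col[0]);
    const __m128d c1 = _mm_loadu_pd(m.col[1]);
    const __m128d c2 = _mm_loadu_pd(m.col[2]);
    const __m128d c3 = _mm_loadu_pd(m.col[3]);
    for (std::size_t i = 0; i < count; ++i) {
        const Vec3d v = in[i];
        // Broadcast each component and accumulate column * component.
        __m128d xy = _mm_mul_pd(c0, _mm_set1_pd(v.x));
        xy = _mm_add_pd(xy, _mm_mul_pd(c1, _mm_set1_pd(v.y)));
        xy = _mm_add_pd(xy, _mm_mul_pd(c2, _mm_set1_pd(v.z)));
        xy = _mm_add_pd(xy, c3);
        _mm_storel_pd(&out[i].x, xy); // low lane  -> x
        _mm_storeh_pd(&out[i].y, xy); // high lane -> y
        // The third row is left scalar to keep the sketch short.
        out[i].z = m.col[0][2] * v.x + m.col[1][2] * v.y
                 + m.col[2][2] * v.z + m.col[3][2];
    }
#else
    // Generic path: same arithmetic, one row at a time.
    for (std::size_t i = 0; i < count; ++i) {
        const Vec3d v = in[i];
        out[i].x = m.col[0][0] * v.x + m.col[1][0] * v.y + m.col[2][0] * v.z + m.col[3][0];
        out[i].y = m.col[0][1] * v.x + m.col[1][1] * v.y + m.col[2][1] * v.z + m.col[3][1];
        out[i].z = m.col[0][2] * v.x + m.col[1][2] * v.y + m.col[2][2] * v.z + m.col[3][2];
    }
#endif
}
```

A production version would likely process several vectors per iteration and use aligned loads; this sketch only shows why a layout prepared for batches differs from one suited to a single multiplication.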
ulf.schroeter (Author) Posted October 31, 2011
Do you have any rough numbers on how much SSE2 improved double-precision transform performance?
frustum Posted October 31, 2011
The CPU is "Intel® Core i5 CPU 650 @ 3.20GHz".

Set Generic simd processor
minMaxVec3f Time: 0.063 FPS: 15.9834 Mem: 2557.34 Mb/s
minMaxVec4f Time: 0.080 FPS: 12.4738 Mem: 1995.81 Mb/s
minMaxVec3d Time: 0.063 FPS: 15.9289 Mem: 5097.25 Mb/s
minMaxVec4d Time: 0.081 FPS: 12.2850 Mem: 3931.20 Mb/s
mulMat3Vec3f Time: 0.054 FPS: 18.6081 Mem: 2977.30 Mb/s
mulMat4Vec3f Time: 0.054 FPS: 18.4176 Mem: 2946.81 Mb/s
mulMat4Vec4f Time: 0.056 FPS: 17.7939 Mem: 2847.03 Mb/s
mulMat3Vec3d Time: 0.113 FPS: 8.8857 Mem: 2843.41 Mb/s
mulMat4Vec3d Time: 0.115 FPS: 8.6618 Mem: 2771.76 Mb/s

Set SSE simd processor
minMaxVec3f Time: 0.015 FPS: 68.6153 Mem: 10978.45 Mb/s
minMaxVec4f Time: 0.014 FPS: 69.0894 Mem: 11054.30 Mb/s
minMaxVec3d Time: 0.061 FPS: 16.3983 Mem: 5247.45 Mb/s
minMaxVec4d Time: 0.079 FPS: 12.6208 Mem: 4038.67 Mb/s
mulMat3Vec3f Time: 0.023 FPS: 43.8059 Mem: 7008.94 Mb/s
mulMat4Vec3f Time: 0.026 FPS: 38.7357 Mem: 6197.71 Mb/s
mulMat4Vec4f Time: 0.025 FPS: 39.6589 Mem: 6345.43 Mb/s
mulMat3Vec3d Time: 0.107 FPS: 9.3884 Mem: 3004.30 Mb/s
mulMat4Vec3d Time: 0.115 FPS: 8.6634 Mem: 2772.29 Mb/s

Set SSE2 simd processor
minMaxVec3f Time: 0.015 FPS: 68.6059 Mem: 10976.95 Mb/s
minMaxVec4f Time: 0.014 FPS: 69.1085 Mem: 11057.36 Mb/s
minMaxVec3d Time: 0.016 FPS: 63.1592 Mem: 20210.95 Mb/s
minMaxVec4d Time: 0.016 FPS: 63.4961 Mem: 20318.75 Mb/s
mulMat3Vec3f Time: 0.023 FPS: 43.8020 Mem: 7008.32 Mb/s
mulMat4Vec3f Time: 0.026 FPS: 38.7012 Mem: 6192.19 Mb/s
mulMat4Vec4f Time: 0.025 FPS: 39.6873 Mem: 6349.96 Mb/s
mulMat3Vec3d Time: 0.045 FPS: 22.2119 Mem: 7107.79 Mb/s
mulMat4Vec3d Time: 0.045 FPS: 22.2040 Mem: 7105.27 Mb/s
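For a quick read of these numbers, the speedup of SSE2 over the generic path can be taken as the ratio of the FPS values for the same test. The sketch below hard-codes the four double-precision rows copied from the output; the `Row` struct and the `speedup` helper are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type: one benchmark row, FPS values copied from the post.
struct Row {
    const char *name;
    double fps_generic;
    double fps_sse2;
};

// Speedup of the SSE2 path relative to the generic path.
double speedup(const Row &r) {
    assert(r.fps_generic > 0.0);
    return r.fps_sse2 / r.fps_generic;
}

const Row kDoubleRows[] = {
    {"minMaxVec3d", 15.9289, 63.1592},
    {"minMaxVec4d", 12.2850, 63.4961},
    {"mulMat3Vec3d", 8.8857, 22.2119},
    {"mulMat4Vec3d", 8.6618, 22.2040},
};

// Prints one line per test with the SSE2 / generic ratio.
void print_speedups() {
    for (const Row &r : kDoubleRows)
        std::printf("%-13s %.2fx\n", r.name, speedup(r));
}
```

On these figures the double-precision matrix * vector tests come out at roughly 2.5x, and the min/max tests at roughly 4x to 5x.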
ulf.schroeter (Author) Posted October 31, 2011
Thanks, frustum, quite a nice improvement!
alexei.garbuzenko Posted October 31, 2011
As far as I can see, double operations are faster than float ones with SSE2. Would it be reasonable to switch from floats to doubles on SSE2-enabled cores to get better math performance?
frustum Posted October 31, 2011
Nope, double operations aren't faster than float ones. The memory bandwidth is up to two times higher, but the overall FPS is lower.
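A short worked check of this, using only the SSE2 figures posted earlier in the thread: dividing Mem by FPS gives the amount of data moved per frame. That the benchmark reports Mem as FPS times a fixed data size is an inference from the numbers, not something stated in the thread, and the helper name is made up for this sketch.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: data moved per frame, inferred as Mem / FPS.
double mb_per_frame(double mem_mb_per_s, double fps) {
    assert(fps > 0.0);
    return mem_mb_per_s / fps;
}

// Compares the float and double variants of the same SSE2 test.
void print_mem_vs_fps() {
    // Rows copied from the "Set SSE2 simd processor" output.
    const double f = mb_per_frame(7008.32, 43.8020); // mulMat3Vec3f
    const double d = mb_per_frame(7107.79, 22.2119); // mulMat3Vec3d
    std::printf("float:  %.1f Mb per frame\n", f);
    std::printf("double: %.1f Mb per frame\n", d);
}
```

The two values come out at about 160 and 320 Mb per frame, so the double test moves twice the data per frame; its Mem figure is slightly higher even though its FPS is about half.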
alexei.garbuzenko Posted October 31, 2011
Oh, my bad. I thought that the 'Mem' column was proportional to 'FPS'. Thanks for clarifying!