Apple reportedly made many optimizations for Intel processors in the vecLib of Mac OS X 10.5 (see the Accelerate Release Notes), so I decided to test it. Thanks to Lvli007 for lending the machine.
Intel Core 2 Duo T7200:
Convolution (2048 x 256):
            microseconds   GFlops      microseconds   GFlops
            (average)      (average)   (best)         (best)
            ------------   ---------   ------------   ------
CPU # 1:    1345           0.7782      1338           0.7822
CPU # 2:    1343           0.7794      1338           0.7822
Complex Fast Fourier Transform (1024 elements):
            microseconds   GFlops      microseconds   GFlops
            (average)      (average)   (best)         (best)
            ------------   ---------   ------------   ------
CPU # 1:    5.183          9.878       5.113          10.01
CPU # 2:    5.189          9.866       5.125          9.99
Real Fast Fourier Transform (1024 elements):
            microseconds   GFlops      microseconds   GFlops
            (average)      (average)   (best)         (best)
            ------------   ---------   ------------   ------
CPU # 1:    3.369          6.839       3.345          6.888
CPU # 2:    3.335          6.909       3.291          7.001
Dot Product (1024 elements):
            microseconds   GFlops      microseconds   GFlops
            (average)      (average)   (best)         (best)
            ------------   ---------   ------------   ------
CPU # 1:    0.2783         7.354       0.276          7.417
CPU # 2:    0.2747         7.451       0.276          7.417
It is indeed faster; compare with the previous results under 10.4.8. It seems that after a year of work, SSE3 has finally reached the same order of performance as AltiVec, although the convolution is still poor.
But I accidentally got the convolution score under Rosetta:
Convolution (2048 x 256):
            microseconds   GFlops      microseconds   GFlops
            (average)      (average)   (best)         (best)
            ------------   ---------   ------------   ------
CPU # 1:    193.7          5.402       190.7          5.489
CPU # 2:    193.4          5.41        190            5.507
Not only does it far outperform the other three tests under the emulator, it is also about 7 times faster than the native Intel routine. I'm speechless. By the way, convMP.tar.bz2, which can be built as a Universal Binary, is here.
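To check the "7 times" figure, here is a small sketch that divides the native convolution timings by the Rosetta ones. The numbers are copied from the CPU # 1 rows of the two convolution tables; the dictionary and function names are only illustrative and are not part of convMP.

```python
# Convolution (2048 x 256) timings in microseconds, CPU # 1 rows.
native_us = {"average": 1345.0, "best": 1338.0}   # native Intel run
rosetta_us = {"average": 193.7, "best": 190.7}    # same test run under Rosetta


def speedup(slow_us, fast_us):
    """Return how many times shorter fast_us is than slow_us."""
    return slow_us / fast_us


# Ratio of native time to Rosetta time for each column.
ratios = {key: speedup(native_us[key], rosetta_us[key]) for key in native_us}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

Both ratios come out close to 7, which is where the figure in the text comes from. The same division on the GFlops columns gives about the same ratio, since both runs perform the same amount of work.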