Zhe's blog: 11/01/2007

Apple reportedly made many Intel-specific optimizations to vecLib in Mac OS X 10.5 (see the Accelerate Release Notes), so I decided to check. Thanks to Lvli007 for providing the computer.
Intel Core 2 Duo T7200:


Convolution (2048 x 256):

          Average               Best
          Time (µs)  GFlops     Time (µs)  GFlops
          ---------  ------     ---------  ------
CPU #1    1345       0.7782     1338       0.7822
CPU #2    1343       0.7794     1338       0.7822
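For reference, here is a minimal direct-form convolution sketch, assuming "2048 x 256" means 2048 output samples and a 256-tap filter (my reading; the benchmark does not spell it out). Counting P multiplies and P - 1 additions per output gives N(2P - 1) = 1,046,528 operations, and dividing by the best time of 1338 µs reproduces the 0.7822 GFlops above. The helper names conv_valid, conv_flops and gflops are hypothetical, and this only illustrates the operation count, not how vecLib implements it.

```python
def conv_valid(signal, kernel):
    """Direct-form 'valid' convolution: y[n] = sum_p signal[n + p] * kernel[P - 1 - p]."""
    p_len = len(kernel)
    n_out = len(signal) - p_len + 1
    out = []
    for n in range(n_out):
        acc = 0.0
        for p in range(p_len):
            # one multiply-add per filter tap
            acc += signal[n + p] * kernel[p_len - 1 - p]
        out.append(acc)
    return out


def conv_flops(n_out, taps):
    # P multiplies + (P - 1) additions for each output sample
    return n_out * (2 * taps - 1)


def gflops(flops, microseconds):
    # operations per second, in units of 1e9
    return flops / (microseconds * 1e-6) / 1e9


if __name__ == "__main__":
    print(conv_valid([1.0, 2.0, 3.0, 4.0], [1.0, 2.0]))   # [4.0, 7.0, 10.0]
    print(round(gflops(conv_flops(2048, 256), 1338), 4))  # 0.7822 (best, CPU #1)
```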


Complex Fast Fourier Transform (1024 elements):

          Average               Best
          Time (µs)  GFlops     Time (µs)  GFlops
          ---------  ------     ---------  ------
CPU #1    5.183      9.878      5.113      10.01
CPU #2    5.189      9.866      5.125      9.99
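The complex-FFT GFlops figures can be reproduced from the times using the conventional 5 N log2 N operation count for an N-point complex FFT (51,200 for N = 1024). This is my inference from the numbers, not something the benchmark documents; the helper names are hypothetical.

```python
import math


def complex_fft_flops(n):
    # Conventional estimate for an N-point radix-2 complex FFT: 5 * N * log2(N)
    return 5 * n * math.log2(n)


def gflops(flops, microseconds):
    # operations per second, in units of 1e9
    return flops / (microseconds * 1e-6) / 1e9


if __name__ == "__main__":
    flops = complex_fft_flops(1024)         # 51200.0
    print(round(gflops(flops, 5.183), 3))   # 9.878 (average, CPU #1)
    print(round(gflops(flops, 5.113), 2))   # 10.01 (best, CPU #1)
```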


Real Fast Fourier Transform (1024 elements):

          Average               Best
          Time (µs)  GFlops     Time (µs)  GFlops
          ---------  ------     ---------  ------
CPU #1    3.369      6.839      3.345      6.888
CPU #2    3.335      6.909      3.291      7.001
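The real-FFT figures fit a different count: 5 (N/2) log2(N/2) = 23,040 operations, i.e. the cost of an (N/2)-point complex FFT, which matches the common trick of packing an N-point real FFT into a half-length complex one. This count is inferred only from the table (it reproduces the listed GFlops to the digits shown); the helper names are hypothetical.

```python
import math


def real_fft_flops(n):
    # Count inferred from the table: an N-point real FFT costed as an
    # (N/2)-point complex FFT, 5 * (N/2) * log2(N/2)
    half = n // 2
    return 5 * half * math.log2(half)


def gflops(flops, microseconds):
    # operations per second, in units of 1e9
    return flops / (microseconds * 1e-6) / 1e9


if __name__ == "__main__":
    flops = real_fft_flops(1024)  # 23040.0
    for label, us in [("avg, CPU #1", 3.369), ("best, CPU #1", 3.345), ("best, CPU #2", 3.291)]:
        print(label, round(gflops(flops, us), 3))
```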


Dot Product (1024 elements):

          Average               Best
          Time (µs)  GFlops     Time (µs)  GFlops
          ---------  ------     ---------  ------
CPU #1    0.2783     7.354      0.276      7.417
CPU #2    0.2747     7.451      0.276      7.417
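The dot-product rates match 2N - 1 operations for N = 1024 (N multiplies, N - 1 additions): 2047 / 0.276 µs ≈ 7.417 GFlops. A plain sketch, assuming non-empty vectors of equal length; the names are hypothetical, and the loop only illustrates the operation count, not vecLib's implementation.

```python
def dot(a, b):
    # N multiplies and N - 1 additions for length-N vectors
    acc = a[0] * b[0]
    for x, y in zip(a[1:], b[1:]):
        acc += x * y
    return acc


def dot_flops(n):
    return 2 * n - 1


def gflops(flops, microseconds):
    # operations per second, in units of 1e9
    return flops / (microseconds * 1e-6) / 1e9


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))      # 32.0
    print(round(gflops(dot_flops(1024), 0.276), 3))  # 7.417 (best)
```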

It is indeed faster; compare these with my earlier results under 10.4.8. After a year's work, SSE3 can finally compete with AltiVec, though the convolution is still poor.
But I accidentally got a convolution score under Rosetta:


Convolution (2048 x 256):

          Average               Best
          Time (µs)  GFlops     Time (µs)  GFlops
          ---------  ------     ---------  ------
CPU #1    193.7      5.402      190.7      5.489
CPU #2    193.4      5.41       190        5.507

Not only is it far faster than the other three tests under the emulator, it is also about 7 times faster than the native Intel routine. I'm speechless. BTW, the convMP.tar.bz2 that builds a Universal Binary is here.
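As a quick check of the "7 times" figure, here is the ratio of the average convolution GFlops under Rosetta to the native averages, taken from the two convolution tables above:

```python
# Average convolution GFlops from the two convolution tables
native = {"CPU #1": 0.7782, "CPU #2": 0.7794}   # Intel-native vecLib
rosetta = {"CPU #1": 5.402, "CPU #2": 5.41}     # PowerPC build under Rosetta

speedup = {cpu: rosetta[cpu] / native[cpu] for cpu in native}
for cpu, ratio in speedup.items():
    print(cpu, round(ratio, 2))
```

Both CPUs come out at roughly 6.9x, so "about 7 times" holds.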


11/07/2007

New Mac OS X, old test
