AMCC suggested setting the PMU bit (SDRAM0_CFG0[PMU]) to 0 for best
performance on the PPC440 DDR controller. The common 440 DDR setup
files (sdram.c & spd_sdram.c) have been changed accordingly, so all
440 boards using these setup routines automatically receive this
performance increase.
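
As an illustration only, the change amounts to clearing one bit in the
value written to SDRAM0_CFG0. The sketch below is written in Python for
readability (the U-Boot setup code itself is C). The mask value and the
sample register content are hypothetical placeholders, since the bit
position is not given in this file.

```python
# Placeholder mask: the real PMU bit position is NOT stated in this file
# and must be taken from the PPC440 documentation.
HYPOTHETICAL_PMU_MASK = 0x04000000


def clear_bits(value, mask):
    """Return the 32-bit `value` with every bit in `mask` cleared."""
    return value & ~mask & 0xFFFFFFFF


# Made-up register content with the placeholder bit set.
cfg0_old = 0x86000000
cfg0_new = clear_bits(cfg0_old, HYPOTHETICAL_PMU_MASK)

print(f"{cfg0_old:#010x} -> {cfg0_new:#010x}")
```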

Below are some benchmarks done by AMCC to demonstrate this
performance change:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance (Stream benchmark).
* Improves the Mbit/sec for the TTCP benchmark by about 77%
  (454.29 -> 804.06 Mbit/sec).
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw
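
The relative gains can be recomputed from the two result listings. The
short Python sketch below does this, assuming the rates are copied
verbatim from the tables in this file. The variable and function names
are chosen here for illustration and do not come from AMCC.

```python
# Stream rates (MB/s) copied from the two result listings in this file.
stream_pmu1 = {"Copy": 256.7683, "Scale": 246.0157,
               "Add": 255.0316, "Triad": 253.1245}
stream_pmu0 = {"Copy": 262.5167, "Scale": 258.4856,
               "Add": 262.5404, "Triad": 266.8594}

# TTCP throughput (Mbit/sec) with PMU = 1 and PMU = 0.
ttcp_pmu1 = 454.29
ttcp_pmu0 = 804.06


def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


stream_gain = {name: gain_percent(stream_pmu1[name], stream_pmu0[name])
               for name in stream_pmu1}
ttcp_gain = gain_percent(ttcp_pmu1, ttcp_pmu0)

for name, gain in stream_gain.items():
    print(f"{name:6s} {gain:5.1f} %")
print(f"TTCP   {ttcp_gain:5.1f} %")
```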


2006-07-28, Stefan Roese <sr@denx.de>