Skip to content

ROCKPro64 - Alter u-boot im SPI-Flash

ROCKPro64
1 1 543
  • Hallo zusammen, mal wieder ein paar Erfahrungen weiter geben.

    Ich habe mich heute gewundert, das meine NVMe Installation nur folgendes angezeigt hat.

    LnkSta:	Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    

    Hmm 🤔 , da ich diesen ROCKPro64 auch immer zum Testen nutze, habe ich lange gebraucht um das Problem zu finden. Da war wohl ein uralter u-boot im SPI Flash gespeichert. Nachdem ich diesen gelöscht hatte, konnte ich auch wieder die volle Geschwindigkeit geniessen. ☺

    LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    

    Kurz noch ein aktueller Speedtest auf dem 4.19er Kernel

    rock64@rockpro64v2_0:~$ uname -a
    Linux rockpro64v2_0 4.19.0-rc4-1065-ayufan-g72e04c7b3e06 #1 SMP PREEMPT Sat Sep 29 21:27:52 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
    

    Und iozone

    rock64@rockpro64v2_0:~$ sudo iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 
    	Iozone: Performance Test of File I/O
    	        Version $Revision: 3.429 $
    		Compiled for 64 bit mode.
    		Build: linux 
    
    	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
    	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
    	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
    	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
    	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
    	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
    	             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
    	             Vangel Bojaxhi, Ben England, Vikentsi Lapa.
    
    	Run began: Sun Sep 30 11:00:30 2018
    
    	Include fsync in write timing
    	O_DIRECT feature enabled
    	Auto Mode
    	File size set to 102400 kB
    	Record Size 4 kB
    	Record Size 16 kB
    	Record Size 512 kB
    	Record Size 1024 kB
    	Record Size 16384 kB
    	Command line used: iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
    	Output is in kBytes/sec
    	Time Resolution = 0.000001 seconds.
    	Processor cache size set to 1024 kBytes.
    	Processor cache line size set to 32 bytes.
    	File stride size set to 17 * record size.
                                                                  random    random     bkwd    record    stride                                    
                  kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
              102400       4    94816   146795   159030   159656    54653   143233                                                          
              102400      16   299647   411046   438669   442937   194936   404928                                                          
              102400     512  1023181  1024309   960772   982149   800280  1032060                                                          
              102400    1024  1116173  1168683  1112536  1140780  1004678  1178748                                                          
              102400   16384   927019  1387086  1435659  1485404  1470077  1379217                                                          
    
    iozone test complete.
    

    Jetzt bin ich wieder zufrieden 🤓 Dritter ROCKPro64 ist auf dem Weg, damit das nicht nochmal passiert 😉

  • ROCKPro64 - Kernel 5.6 und Wireguard 1.0

    ROCKPro64 linux rockpro64 wireguard
    1
    0 Stimmen
    1 Beiträge
    353 Aufrufe
    Niemand hat geantwortet
  • Der 3. ROCKPro64

    ROCKPro64 rockpro64
    3
    0 Stimmen
    3 Beiträge
    1k Aufrufe
    FrankMF
    Nachdem ich jetzt mein NAS neu gemacht habe, schauen wir mal, was die Chinesen geliefert haben. Bestellt hatte ich ROCKPro64 v2.1 2GB RAM Kühlkörper Netzteil 3A USB-Adapter für eMMC-Modul Endlich habe ich mal an den USB-Adapter für das eMMC-Modul gedacht [image: 1540029625079-img_20181020_115348_ergebnis-resized.jpg] Was ist mir aufgefallen? Das Versionsdatum ist neu (siehe oben) Die PCIe NVMe Karte ist neu Bei der PCIe NVMe Karte liegt eine Abstandshülse aus Messing und eine winzig kleine Schraube bei. Damit bekomme ich aber nicht die NVMe-SSD befestigt. Ich habe dann gemurkst Da sollte Pine64 unbedingt nachbessern! So sieht das dann zusammengebaut aus. [image: 1540029757102-img_20181020_115425_ergebnis-resized.jpg] [image: 1540029767472-img_20181020_115438_ergebnis.jpg] Da ich ein paarmal gelesen hatte, das Leute Probleme mit dem PCIe NVMe Adapter hatten, direkt als erstes mal ein Test ob das reibungslos funktioniert. Sys rock64@rockpro64:/mnt$ uname -a Linux rockpro64 4.4.132-1075-rockchip-ayufan-ga83beded8524 #1 SMP Thu Jul 26 08:22:22 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux lspci rock64@rockpro64:/mnt$ sudo lspci -vvv [sudo] password for rock64: 00:00.0 PCI bridge: Rockchip Inc. RK3399 PCI Express Root Port Device 0100 (prog-if 00 [Normal decode]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort+ <MAbort+ >SERR+ <PERR+ INTx- Latency: 0 Interrupt: pin A routed to IRQ 238 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 00000000-00000fff Memory behind bridge: fa000000-fa0fffff Prefetchable memory behind bridge: 00000000-000fffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME+ Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+ Address: 00000000fee30040 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [b0] MSI-X: Enable- Count=1 Masked- Vector table: BAR=0 offset=00000000 PBA: BAR=0 offset=00000008 Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L1, Exit Latency L0s <256ns, L1 <8us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+ LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+ LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #0, PowerLimit 0.000W; Interlock- NoCompl- SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Off, PwrInd Off, Power+ Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock- Changed: MRL- PresDet- LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via message ARIFwd+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [274 v1] Transaction Processing Hints Interrupt vector mode supported Device specific mode supported Steering table in TPH capability structure Kernel driver in use: pcieport 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 (prog-if 02 [NVM Express]) Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 237 Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b0] MSI-X: Enable+ Count=8 Masked- Vector table: BAR=0 offset=00003000 PBA: BAR=0 offset=00002000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [158 v1] Power Budgeting <?> Capabilities: [168 v1] #19 Capabilities: [188 v1] Latency Tolerance Reporting Max snoop latency: 0ns Max no snoop latency: 0ns Capabilities: [190 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us Kernel driver in use: nvme Da sieht alles gut aus. x4 alles Bestens! iozone rock64@rockpro64:/mnt$ sudo iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 Iozone: Performance Test of File I/O Version $Revision: 3.429 $ Compiled for 64 bit mode. Build: linux Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone, Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root, Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer, Vangel Bojaxhi, Ben England, Vikentsi Lapa. Run began: Sat Oct 20 10:08:28 2018 Include fsync in write timing O_DIRECT feature enabled Auto Mode File size set to 102400 kB Record Size 4 kB Record Size 16 kB Record Size 512 kB Record Size 1024 kB Record Size 16384 kB Command line used: iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random random bkwd record stride kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread 102400 4 63896 108269 91858 95309 32845 73173 102400 16 123393 236653 273766 275807 118450 199130 102400 512 471775 570571 484612 496942 441345 575817 102400 1024 544229 642558 508895 511834 486506 647765 102400 16384 1044520 1100322 1069825 1092146 1089301 1086757 iozone test complete. Das sieht nicht optimal aus, schau ich mir später an. Das hier soll nur ein kurzer Test sein ob das Board rennt Nachdem ich mittlerweile zwei ROCKPro64 im "produktiven" Einsatz habe, war es immer sehr mühsam mal eben was zu testen. Man will die anderen ja nicht immer ausmachen, dran rumhantieren usw. Deswegen jetzt der dritte, der im Moment dann die Rolle des Testkandidaten einnimmt. Ab sofort kann ich wieder nach Lust und Laune, neue Images testen usw.
  • Benchmark

    ROCKPro64 rockpro64
    1
    0 Stimmen
    1 Beiträge
    480 Aufrufe
    Niemand hat geantwortet
  • Ubuntu Bionic - Namen der Interfaces umstellen

    ROCKPro64 howto rockpro64
    1
    0 Stimmen
    1 Beiträge
    610 Aufrufe
    Niemand hat geantwortet
  • 960 EVO M.2 vs. 970 PRO M.2

    ROCKPro64 rockpro64
    2
    0 Stimmen
    2 Beiträge
    2k Aufrufe
    FrankMF
    Die 970 steckt jetzt in meinem Haupt-PC. Dort werkelt ein aktuelles Linux Mint Cinnamon 19. Zum Vergleich. 100M frank@frank-MS-7A34:~$ sudo iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 [sudo] Passwort für frank: Iozone: Performance Test of File I/O Version $Revision: 3.429 $ Compiled for 64 bit mode. Build: linux-AMD64 Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone, Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root, Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer, Vangel Bojaxhi, Ben England, Vikentsi Lapa. Run began: Sun Aug 19 16:52:19 2018 Include fsync in write timing O_DIRECT feature enabled Auto Mode File size set to 102400 kB Record Size 4 kB Record Size 16 kB Record Size 512 kB Record Size 1024 kB Record Size 16384 kB Command line used: iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random random bkwd record stride kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread 102400 4 92640 121912 131074 139525 45719 116653 102400 16 254286 285267 285539 320370 108049 314486 102400 512 537947 581765 606103 598137 537701 588214 102400 1024 566892 547921 567369 597286 518014 558686 102400 16384 1407884 1642148 1941120 2115608 2006947 1668118 iozone test complete. 1000M frank@frank-MS-7A34:~$ sudo iozone -e -I -a -s 1000M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 Iozone: Performance Test of File I/O Version $Revision: 3.429 $ Compiled for 64 bit mode. Build: linux-AMD64 Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone, Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root, Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer, Vangel Bojaxhi, Ben England, Vikentsi Lapa. Run began: Sun Aug 19 15:28:38 2018 Include fsync in write timing O_DIRECT feature enabled Auto Mode File size set to 1024000 kB Record Size 4 kB Record Size 16 kB Record Size 512 kB Record Size 1024 kB Record Size 16384 kB Command line used: iozone -e -I -a -s 1000M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2 Output is in kBytes/sec Time Resolution = 0.000001 seconds. Processor cache size set to 1024 kBytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random random bkwd record stride kB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread 1024000 4 95635 121379 108328 108265 45369 123356 1024000 16 239238 314359 245937 241877 105865 297193 1024000 512 596812 620661 442100 382367 351948 613525 1024000 1024 608903 611898 434687 417192 412018 646465 1024000 16384 1898738 2004622 2143647 2188062 2099674 1983240 iozone test complete. Da scheint auf dem ROCKPro64 noch ein wenig Luft nach oben.
  • Bionic Minimal 0.7.8

    ROCKPro64 rockpro64
    2
    1
    0 Stimmen
    2 Beiträge
    657 Aufrufe
    FrankMF
    Testin Testing
  • ROCKPro64 Übersicht - was geht? **veraltet**

    Angeheftet Verschoben Archiv rockpro64
    5
    0 Stimmen
    5 Beiträge
    2k Aufrufe
    FrankMF
    Ich sehe gerade, das könnte hier auch mal neu gemacht werden.
  • stretch-minimal-rockpro64

    Verschoben Linux rockpro64
    3
    0 Stimmen
    3 Beiträge
    1k Aufrufe
    FrankMF
    Mal ein Test was der Speicher so kann. rock64@rockpro64:~/tinymembench$ ./tinymembench tinymembench v0.4.9 (simple benchmark for memory throughput and latency) ========================================================================== == Memory bandwidth tests == == == == Note 1: 1MB = 1000000 bytes == == Note 2: Results for 'copy' tests show how many bytes can be == == copied per second (adding together read and writen == == bytes would have provided twice higher numbers) == == Note 3: 2-pass copy means that we are using a small temporary buffer == == to first fetch data into it, and only then write it to the == == destination (source -> L1 cache, L1 cache -> destination) == == Note 4: If sample standard deviation exceeds 0.1%, it is shown in == == brackets == ========================================================================== C copy backwards : 2812.7 MB/s C copy backwards (32 byte blocks) : 2811.9 MB/s C copy backwards (64 byte blocks) : 2632.8 MB/s C copy : 2667.2 MB/s C copy prefetched (32 bytes step) : 2633.5 MB/s C copy prefetched (64 bytes step) : 2640.8 MB/s C 2-pass copy : 2509.8 MB/s C 2-pass copy prefetched (32 bytes step) : 2431.6 MB/s C 2-pass copy prefetched (64 bytes step) : 2424.1 MB/s C fill : 4887.7 MB/s (0.5%) C fill (shuffle within 16 byte blocks) : 4883.0 MB/s C fill (shuffle within 32 byte blocks) : 4889.3 MB/s C fill (shuffle within 64 byte blocks) : 4889.2 MB/s --- standard memcpy : 2807.3 MB/s standard memset : 4890.4 MB/s (0.3%) --- NEON LDP/STP copy : 2803.7 MB/s NEON LDP/STP copy pldl2strm (32 bytes step) : 2802.1 MB/s NEON LDP/STP copy pldl2strm (64 bytes step) : 2800.7 MB/s NEON LDP/STP copy pldl1keep (32 bytes step) : 2745.5 MB/s NEON LDP/STP copy pldl1keep (64 bytes step) : 2745.8 MB/s NEON LD1/ST1 copy : 2801.9 MB/s NEON STP fill : 4888.9 MB/s (0.3%) NEON STNP fill : 4850.1 MB/s ARM LDP/STP copy : 2803.8 MB/s ARM STP fill : 4893.0 MB/s (0.5%) ARM STNP fill : 4851.7 MB/s ========================================================================== == Framebuffer read tests. == == == == Many ARM devices use a part of the system memory as the framebuffer, == == typically mapped as uncached but with write-combining enabled. == == Writes to such framebuffers are quite fast, but reads are much == == slower and very sensitive to the alignment and the selection of == == CPU instructions which are used for accessing memory. == == == == Many x86 systems allocate the framebuffer in the GPU memory, == == accessible for the CPU via a relatively slow PCI-E bus. Moreover, == == PCI-E is asymmetric and handles reads a lot worse than writes. == == == == If uncached framebuffer reads are reasonably fast (at least 100 MB/s == == or preferably >300 MB/s), then using the shadow framebuffer layer == == is not necessary in Xorg DDX drivers, resulting in a nice overall == == performance improvement. For example, the xf86-video-fbturbo DDX == == uses this trick. == ========================================================================== NEON LDP/STP copy (from framebuffer) : 602.5 MB/s NEON LDP/STP 2-pass copy (from framebuffer) : 551.6 MB/s NEON LD1/ST1 copy (from framebuffer) : 667.1 MB/s NEON LD1/ST1 2-pass copy (from framebuffer) : 605.6 MB/s ARM LDP/STP copy (from framebuffer) : 445.3 MB/s ARM LDP/STP 2-pass copy (from framebuffer) : 428.8 MB/s ========================================================================== == Memory latency test == == == == Average time is measured for random memory accesses in the buffers == == of different sizes. The larger is the buffer, the more significant == == are relative contributions of TLB, L1/L2 cache misses and SDRAM == == accesses. For extremely large buffer sizes we are expecting to see == == page table walk with several requests to SDRAM for almost every == == memory access (though 64MiB is not nearly large enough to experience == == this effect to its fullest). == == == == Note 1: All the numbers are representing extra time, which needs to == == be added to L1 cache latency. The cycle timings for L1 cache == == latency can be usually found in the processor documentation. == == Note 2: Dual random read means that we are simultaneously performing == == two independent memory accesses at a time. In the case if == == the memory subsystem can't handle multiple outstanding == == requests, dual random read has the same timings as two == == single reads performed one after another. == ========================================================================== block size : single random read / dual random read 1024 : 0.0 ns / 0.0 ns 2048 : 0.0 ns / 0.0 ns 4096 : 0.0 ns / 0.0 ns 8192 : 0.0 ns / 0.0 ns 16384 : 0.0 ns / 0.0 ns 32768 : 0.0 ns / 0.0 ns 65536 : 4.5 ns / 7.2 ns 131072 : 6.8 ns / 9.7 ns 262144 : 9.8 ns / 12.8 ns 524288 : 11.4 ns / 14.7 ns 1048576 : 16.0 ns / 22.6 ns 2097152 : 114.0 ns / 175.3 ns 4194304 : 161.7 ns / 219.9 ns 8388608 : 190.7 ns / 241.5 ns 16777216 : 205.3 ns / 250.5 ns 33554432 : 212.9 ns / 255.5 ns 67108864 : 222.3 ns / 271.1 ns