
[HOWTO] ROCKPro64 - PCIe NVMe card with Samsung 960 EVO M.2

  • You probably recognize this picture by now. 😉 Here are a few things explained, aimed at beginners!

    0_1528781181138_IMG_20180523_111136.jpg

    First, an important note: you currently cannot boot from this card. U-Boot (the bootloader) does not support that at the moment.

    For these experiments I am using the first image release that brings the PCIe port to life:
    bionic-minimal-rockpro64-0.6.52-257-arm64.img.xz

     rock64@rockpro64:~$ uname -a
     Linux rockpro64 4.4.126-rockchip-ayufan-257 #1 SMP Sun Jun 10 18:30:43 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
    

    Running lspci shows us some basic information.

    rock64@rockpro64:~$ lspci
    00:00.0 PCI bridge: Rockchip Inc. RK3399 PCI Express Root Port Device 0100
    01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
    

    Here we see the Rockchip bridge, which is the PCIe slot, and the Samsung SSD. The same information is also available in more detail.

    rock64@rockpro64:~$ sudo lspci -vvv
    [sudo] password for rock64: 
    00:00.0 PCI bridge: Rockchip Inc. RK3399 PCI Express Root Port Device 0100 (prog-if 00 [Normal decode])
    	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort+ <MAbort+ >SERR+ <PERR+ INTx-
    	Latency: 0
    	Interrupt: pin A routed to IRQ 238
    	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    	I/O behind bridge: 00000000-00000fff
    	Memory behind bridge: fa000000-fa0fffff
    	Prefetchable memory behind bridge: 00000000-000fffff
    	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
    	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
    		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    	Capabilities: [80] Power Management version 3
    		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
    		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME+
    	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
    		Address: 00000000fee30040  Data: 0000
    		Masking: 00000000  Pending: 00000000
    	Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
    		Vector table: BAR=0 offset=00000000
    		PBA: BAR=0 offset=00000008
    	Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
    		DevCap:	MaxPayload 256 bytes, PhantFunc 0
    			ExtTag- RBE+
    		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
    			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
    			MaxPayload 128 bytes, MaxReadReq 512 bytes
    		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
    		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L1, Exit Latency L0s <256ns, L1 <8us
    			ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
    		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
    			ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
    		LnkSta:	Speed 5GT/s, Width x2, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
    			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
    		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
    			Control: AttnInd Off, PwrInd Off, Power+ Interlock-
    		SltSta:	Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock-
    			Changed: MRL- PresDet- LinkState-
    		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
    		RootCap: CRSVisible-
    		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
    		DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via message ARIFwd+
    		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
    		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
    			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
    			 Compliance De-emphasis: -6dB
    		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
    			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    	Capabilities: [100 v2] Advanced Error Reporting
    		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
    		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    	Capabilities: [274 v1] Transaction Processing Hints
    		Interrupt vector mode supported
    		Device specific mode supported
    		Steering table in TPH capability structure
    	Kernel driver in use: pcieport
    
    01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 (prog-if 02 [NVM Express])
    	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
    	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    	Latency: 0
    	Interrupt: pin A routed to IRQ 237
    	Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16K]
    	Capabilities: [40] Power Management version 3
    		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
    		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
    		Address: 0000000000000000  Data: 0000
    	Capabilities: [70] Express (v2) Endpoint, MSI 00
    		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
    			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
    		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
    			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
    			MaxPayload 128 bytes, MaxReadReq 512 bytes
    		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
    		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
    			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
    		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
    			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
    		LnkSta:	Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
    		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
    		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
    			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
    			 Compliance De-emphasis: -6dB
    		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
    			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
    		Vector table: BAR=0 offset=00003000
    		PBA: BAR=0 offset=00002000
    	Capabilities: [100 v2] Advanced Error Reporting
    		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
    		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
    	Capabilities: [158 v1] Power Budgeting <?>
    	Capabilities: [168 v1] #19
    	Capabilities: [188 v1] Latency Tolerance Reporting
    		Max snoop latency: 0ns
    		Max no snoop latency: 0ns
    	Capabilities: [190 v1] L1 PM Substates
    		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
    			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
    		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
    			   T_CommonMode=0us LTR1.2_Threshold=0ns
    		L1SubCtl2: T_PwrOn=10us
    	Kernel driver in use: nvme
    

    A lot of this is stuff most people know nothing about, me included 😉 But there is some interesting information in here too.

    LnkSta:	Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    

    What does this line tell us? The SSD is connected to the processor with two lanes (x2). Or should it be more? The adapter is actually supposed to support x4.
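
Both ends of the link report what they could do (LnkCap) and what was actually negotiated (LnkSta), so the two are worth comparing side by side. A minimal sketch that pulls just those fields out of the verbose listing; the function name link_summary is made up for illustration, and it assumes the lspci -vvv layout shown above:

```shell
# Print device address, field name, speed and width for every LnkCap and
# LnkSta line found in `lspci -vvv` output read on stdin.
link_summary() {
    awk '
        # Device header line, e.g. "01:00.0 Non-Volatile memory controller: ..."
        $1 ~ /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]$/ { dev = $1 }
        /LnkCap:|LnkSta:/ {
            kind = ($0 ~ /LnkCap:/) ? "LnkCap" : "LnkSta"
            speed = ""; width = ""
            for (i = 1; i < NF; i++) {
                if ($i == "Speed") speed = $(i + 1)
                if ($i == "Width") width = $(i + 1)
            }
            gsub(/,/, "", speed)   # strip the trailing comma
            gsub(/,/, "", width)
            print dev, kind, speed, width
        }
    '
}
```

Usage: sudo lspci -vvv | link_summary. In the listing above, both the root port and the SSD advertise x4 in LnkCap while LnkSta shows x2, so the link trained narrower than either end allows. The root port also tops out at 5GT/s, which is why the SSD does not run at its 8GT/s.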

    According to this link, it should move at most 800 MB/s.
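
As a rough sanity check on that figure, the theoretical ceiling follows from the link parameters: 5 GT/s per lane with 8b/10b line encoding (used by PCIe 1.x and 2.0) leaves 500 MB/s per lane and direction. A small sketch of that arithmetic; the function name pcie_raw_mbs is hypothetical, and packet and protocol overhead are deliberately not modelled, so real transfers land below these numbers:

```shell
# Theoretical per-direction PCIe bandwidth in MB/s, after line encoding
# but before packet and protocol overhead.
#   $1 = transfer rate in GT/s (2.5, 5 or 8)
#   $2 = number of lanes
pcie_raw_mbs() {
    awk -v gts="$1" -v lanes="$2" 'BEGIN {
        # 2.5 and 5 GT/s links use 8b/10b encoding, 8 GT/s links use 128b/130b.
        if (gts + 0 >= 8) { num = 128; den = 130 } else { num = 8; den = 10 }
        # GT/s * 1000 = Mbit/s on the wire per lane; 8 bits per byte.
        printf "%d\n", (gts * 1000 * lanes * num) / (den * 8)
    }'
}
```

pcie_raw_mbs 5 2 gives 1000 for the x2 link seen here, and pcie_raw_mbs 5 4 gives 2000 for a full x4 link at the same speed.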

    We got a bit off topic, so back to it. We now know that the SSD is present. Running sudo fdisk -l gives the following output.

    rock64@rockpro64:~$ sudo fdisk -l
    Disk /dev/ram0: 4 MiB, 4194304 bytes, 8192 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/mmcblk0: 14.7 GiB, 15811477504 bytes, 30881792 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: gpt
    Disk identifier: 298096AF-4287-4988-B53A-24CDE27C1C8D
    
    Device          Start      End  Sectors  Size Type
    /dev/mmcblk0p1     64     8063     8000  3.9M Linux filesystem
    /dev/mmcblk0p2   8064     8191      128   64K Linux filesystem
    /dev/mmcblk0p3   8192    16383     8192    4M Linux filesystem
    /dev/mmcblk0p4  16384    24575     8192    4M Linux filesystem
    /dev/mmcblk0p5  24576    32767     8192    4M Linux filesystem
    /dev/mmcblk0p6  32768   262143   229376  112M Microsoft basic data
    /dev/mmcblk0p7 262144 30881758 30619615 14.6G Linux filesystem
    
    
    Disk /dev/nvme0n1: 232.9 GiB, 250059350016 bytes, 488397168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    
    
    Disk /dev/zram0: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/zram1: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/zram2: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/zram3: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/zram4: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    
    
    Disk /dev/zram5: 323 MiB, 338722816 bytes, 82696 sectors
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes
    

    The interesting part:

    Disk /dev/nvme0n1: 232.9 GiB, 250059350016 bytes, 488397168 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
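
To confirm the drive without scrolling past every RAM disk, the kernel also exposes each block device under /sys/block, with its size counted in 512-byte sectors. A minimal sketch; the function name list_nvme_disks is made up, and the optional directory argument exists only so the function can be pointed at a test directory:

```shell
# List NVMe block devices with their size in bytes, read from sysfs.
#   $1 = sysfs block directory (defaults to /sys/block)
list_nvme_disks() {
    sysdir="${1:-/sys/block}"
    for dev in "$sysdir"/nvme*; do
        [ -r "$dev/size" ] || continue      # no match or unreadable: skip
        sectors="$(cat "$dev/size")"        # sysfs counts in 512-byte units
        echo "$(basename "$dev") $(( sectors * 512 ))"
    done
}
```

On the system above this should report nvme0n1 with 250059350016 bytes, matching the fdisk output (488397168 sectors of 512 bytes).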
    

    WARNING! From here on there is a risk of data loss. Please engage your brain first, and remember that I take no responsibility for your data 🙂
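
Before touching the partition table, it is worth making sure nothing on the target device is mounted. A minimal sketch of such a guard; is_mounted is a made-up helper name, and it simply checks the first column of /proc/mounts for entries that start with the device path:

```shell
# Exit 0 if the given device, or any partition on it, appears as a mount
# source in the mounts table; exit 1 otherwise.
#   $1 = device path, e.g. /dev/nvme0n1
#   $2 = mounts table to read (defaults to /proc/mounts)
is_mounted() {
    awk -v dev="$1" '
        index($1, dev) == 1 { found = 1 }   # first column starts with the device path
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

Usage: is_mounted /dev/nvme0n1 && echo "still mounted, do not repartition". The prefix match is intentionally coarse: /dev/nvme0n1 would also match a device named /dev/nvme0n10, which errs on the safe side.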

    A brand-new drive first has to be set up, using sudo fdisk /dev/nvme0n1.

    fdisk /dev/nvme0n1
    Welcome to fdisk (util-linux 2.25.2).
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    
    Command (m for help): n
    Partition type
    p primary (0 primary, 0 extended, 4 free)
    e extended (container for logical partitions)
    Select (default p): p
    Partition number (1-4, default 1): 1
    First sector (2048-488397168, default 2048): 2048
    Last sector, +sectors or +size{K,M,G,T,P} (2048-488397168, default 488397168): 488397168
    Created a new partition 1 of type 'Linux' and of size 232,9 GiB.
    
    Command (m for help): w
    The partition table has been altered.
    Calling ioctl() to re-read partition table.
    Syncing disks.
    

    Then format the newly created partition.

    sudo mkfs.ext4 /dev/nvme0n1p1
    

    With

    sudo mount /dev/nvme0n1p1 /mnt/
    

    the SSD is then mounted into the system.

    Speed test

    Write test

    rock64@rockpro64:/mnt$ sudo dd if=/dev/zero of=sd.img bs=1M count=4096 conv=fdatasync
    4096+0 records in
    4096+0 records out
    4294967296 bytes (4.3 GB, 4.0 GiB) copied, 12.6595 s, 339 MB/s
    rock64@rockpro64:/mnt$ sudo dd if=/dev/zero of=sd.img bs=1M count=4096 conv=fdatasync
    4096+0 records in
    4096+0 records out
    4294967296 bytes (4.3 GB, 4.0 GiB) copied, 12.6277 s, 340 MB/s
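
The rate dd prints can be recomputed from the byte count and the elapsed time on the same line, which is handy when comparing several runs. Note that dd counts 1 MB as 1,000,000 bytes. A minimal sketch; dd_mbs is a made-up helper that reads dd summary lines on stdin:

```shell
# Recompute throughput in MB/s (1 MB = 1,000,000 bytes) from dd summary
# lines read on stdin, one result per line.
dd_mbs() {
    awk '/ bytes .* copied, / {
        bytes = $1                                   # first field is the byte count
        for (i = 1; i < NF; i++)
            if ($i == "copied,") secs = $(i + 1)     # seconds follow "copied,"
        printf "%.0f\n", bytes / secs / 1000000
    }'
}
```

Since dd writes its summary to stderr, a run can be piped in with 2>&1, for example: sudo dd if=/dev/zero of=sd.img bs=1M count=4096 conv=fdatasync 2>&1 | dd_mbs. Fed the two summary lines above, it yields 339 and 340.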
    

    Read test

    rock64@rockpro64:/mnt$ sudo hdparm -tT /dev/nvme0n1
    
    /dev/nvme0n1:
     Timing cached reads:   2562 MB in  2.00 seconds = 1280.89 MB/sec
     Timing buffered disk reads: 1734 MB in  3.00 seconds = 577.90 MB/sec
    

    The same again, following this guide.

    rock64@rockpro64:/mnt$ sudo dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.41565 s, 314 MB/s
    rock64@rockpro64:/mnt$ echo 3 | sudo tee /proc/sys/vm/drop_caches 
    3
    rock64@rockpro64:/mnt$ dd if=tempfile of=/dev/null bs=1M count=1024 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.62841 s, 659 MB/s
    rock64@rockpro64:/mnt$ dd if=tempfile of=/dev/null bs=1M count=1024 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.701997 s, 1.5 GB/s
    rock64@rockpro64:/mnt$ dd if=tempfile of=/dev/null bs=1M count=1024 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.664641 s, 1.6 GB/s
    rock64@rockpro64:/mnt$ dd if=tempfile of=/dev/null bs=1M count=1024 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.681466 s, 1.6 GB/s
    rock64@rockpro64:/mnt$ dd if=tempfile of=/dev/null bs=1M count=1024 
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.687785 s, 1.6 GB/s
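
One thing that helps when reading these numbers: only the first read after dropping the caches (659 MB/s) actually comes from the SSD. The repeated reads at 1.5 to 1.6 GB/s are faster than a 5GT/s x2 link can deliver, so they are served from the page cache in RAM. A sketch that clears the cache before every pass so each one has to hit the drive; cold_reads and the DROP_CMD variable are made-up names, and the default drop command is the one from the session above and needs sudo:

```shell
# Read a file several times, clearing the page cache before each pass.
# Prints the dd summary line of every pass.
#   $1 = file to read
#   $2 = number of passes (default 3)
#   DROP_CMD = command used to clear the cache (override for testing)
cold_reads() {
    file="$1"
    passes="${2:-3}"
    drop="${DROP_CMD:-echo 3 | sudo tee /proc/sys/vm/drop_caches}"
    i=0
    while [ "$i" -lt "$passes" ]; do
        eval "$drop" >/dev/null                        # clear cached file data
        LC_ALL=C dd if="$file" of=/dev/null bs=1M 2>&1 | tail -n 1
        i=$(( i + 1 ))
    done
}
```

Usage: cold_reads /mnt/tempfile 5. Each pass should then land near the first, uncached result instead of the cached ones.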
    

    Now one would need to be able to judge these results. Since this is my first SSD in a PCIe slot, I cannot really do that. What is certain is that more should actually be possible. I will leave the assessment to others. @tkaiser

    Conclusion:

    This could become a really interesting SoC, so let me dream a little: booting from the PCIe SSD, with bulk storage attached via USB3. It could make a very promising NAS, or whatever else you like. Have fun testing!

    And thanks to Kamil for his work on this Linux image!

  • Addendum

    A different SATA card, and a riser card with a GPU attached, do not start up.

    rock64@rockpro64v2_1:~$ uname -a
    Linux rockpro64v2_1 4.4.132-1075-rockchip-ayufan-ga83beded8524 #1 SMP Thu Jul 26 08:22:22 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
    
