Has anyone experienced a panic/reboot caused by bad RAM? : Clien

A few months ago I asked here about an issue where my Linux box crashes and reboots.

I think this might be a clue. (This is on Ubuntu.)

After a crash and reboot, go to the /var/log directory and you will find files such as kern.log, kern.log.1, kern.log.2.gz ...

The most recent crash is in kern.log, the one without a .1 or .2 suffix.
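Older crashes end up in the rotated files, some of them gzip-compressed. As a rough sketch (the helper names `open_log` and `kern_logs_newest_first` are made up for illustration, and the naming pattern is assumed to be exactly the one listed above), the files can be collected newest first and opened the same way regardless of compression:

```python
import glob
import gzip
import os

def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")

def kern_logs_newest_first(log_dir="/var/log"):
    """Return kern.log, kern.log.1, kern.log.2.gz, ... ordered by rotation number."""
    def rotation(path):
        parts = os.path.basename(path).split(".")
        # kern.log -> 0, kern.log.1 -> 1, kern.log.2.gz -> 2
        return int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 0
    return sorted(glob.glob(os.path.join(log_dir, "kern.log*")), key=rotation)
```

Reading these files usually requires root or membership in the adm group on Ubuntu.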

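Open it in an editor and search for a line containing 0.000000] (the number is padded with spaces, as in [    0.000000]). The lines just above and below it look roughly like this.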

2023-08-16T22:35:00.786949+09:00 x300 kernel: [20536.167624] show_signal: 128 callbacks suppressed
2023-08-16T22:35:00.786977+09:00 x300 kernel: [20536.167628] traps: php[604781] general protection fault ip:5575d2d2011f sp:7ffdce616220 error:0 in php[5575d2a00000+451000]
2023-08-16T23:53:35.558765+09:00 x300 kernel: [    0.000000] Linux version 6.2.0-27-generic (buildd@lcy02-amd64-001) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.2.0-17ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.40) #28-Ubuntu SMP PREEMPT_DYNAMIC Wed Jul 12 22:39:51 UTC 2023 (Ubuntu 6.2.0-27.28-generic 6.2.15)
2023-08-16T23:53:35.558770+09:00 x300 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.0-27-generic root=UUID=59fabc67-ad69-494a-a989-768eb21c14ee ro

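The leading field is the wall-clock timestamp, and the number inside [ ] is the number of seconds since boot, which keeps increasing while the system is up.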
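To make the two fields concrete, here is a minimal sketch that splits one line of the excerpt above into wall-clock time, seconds since boot, and message. The regular expression and the function name `parse_kern_line` are my own assumptions based only on the line format shown here; other syslog configurations may format lines differently.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpt above:
# <ISO timestamp> <host> kernel: [<seconds since boot>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s?(?P<msg>.*)$"
)

def parse_kern_line(line):
    """Return (wall_clock, uptime_seconds, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return datetime.fromisoformat(m["ts"]), float(m["uptime"]), m["msg"]
```

For the first line of the excerpt this gives an uptime of 20536.167624 seconds and the message "show_signal: 128 callbacks suppressed".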

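The system had been up for 20536 seconds (about 5.7 hours) before the crash. A general protection fault in php is the last thing logged before it went down and rebooted, and from there the log continues with the boot messages, the same ones you see when you run dmesg.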
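Instead of searching in an editor, the same lookup can be scripted. This is only a sketch: it assumes every boot starts with a "Linux version" line at uptime 0.000000, as in the excerpt above, and `lines_before_each_boot` is a name I made up.

```python
import re
from collections import deque

# The first kernel message of a boot has uptime 0.000000 and starts with "Linux version".
BOOT_RE = re.compile(r"kernel: \[\s*0\.000000\] Linux version")

def lines_before_each_boot(lines, context=20):
    """Yield (boot_line, preceding_lines) for every boot marker found in lines."""
    recent = deque(maxlen=context)  # rolling window of the last `context` lines
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_RE.search(line):
            yield line, list(recent)
            recent.clear()
        else:
            recent.append(line)

def print_pre_boot_context(path="/var/log/kern.log", context=20):
    """Print the lines logged just before each boot recorded in the given file."""
    with open(path, errors="replace") as f:
        for boot, before in lines_before_each_boot(f, context):
            print("=== boot:", boot[:100])
            for prev in before:
                print("   ", prev)
```

Calling `print_pre_boot_context()` prints what was logged right before each boot. In the excerpt above the last message is stamped 22:35:00 and the boot marker 23:53:35, so the log by itself does not pin down the exact moment the machine went down.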

Besides general protection fault, I also saw messages like this one:

BUG: scheduling while atomic: chmod/689742/0x02ea572f

But in most cases the reboot seems to have come after a general protection fault.
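To see how often each kind of message shows up, and in which programs, the two messages quoted above can be counted across the logs. A `traps:` line reports a fault in a user-space process (php here), while a `BUG:` line comes from the kernel itself, so this sketch keeps them apart. The patterns cover only the two messages quoted in this post, and the names `SIGNATURES` and `count_signatures` are illustrative.

```python
import re
from collections import Counter

# Patterns derived from the two messages quoted in this post; extend as needed.
SIGNATURES = {
    "user_gpf": re.compile(r"traps: (?P<proc>[^\[\s]+)\[\d+\] general protection fault"),
    "sched_atomic": re.compile(r"BUG: scheduling while atomic: (?P<proc>[^/\s]+)/"),
}

def count_signatures(lines):
    """Count (signature, process name) pairs across the given log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            m = pattern.search(line)
            if m:
                counts[(name, m["proc"])] += 1
    return counts
```

If the faults turn out to be spread over many unrelated programs rather than concentrated in php, that would fit a hardware problem better than a bug in a single program, though it would not prove one.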

Searching online, I see quite a few posts saying this error went away after the RAM was replaced.

Has anyone here run into a similar case?

The RAM currently installed is Samsung DDR4-3200 32GB*2, and the dmidecode output is as follows.

# dmidecode --type memory
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.


Handle 0x000E, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 128 GB
        Error Information Handle: 0x000D
        Number Of Devices: 2


Handle 0x0015, DMI type 17, 92 bytes
Memory Device
        Array Handle: 0x000E
        Error Information Handle: 0x0014
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: SODIMM
        Set: None
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL A
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 3200 MT/s
        Manufacturer: Samsung
        Serial Number: 0411A611
        Asset Tag: Not Specified
        Part Number: M471A4G43AB1-CWE
        Rank: 2
        Configured Memory Speed: 3200 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: Unknown
        Module Manufacturer ID: Bank 1, Hex 0xCE
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 32 GB
        Cache Size: None
        Logical Size: None


Handle 0x0018, DMI type 17, 92 bytes
Memory Device
        Array Handle: 0x000E
        Error Information Handle: 0x0017
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: SODIMM
        Set: None
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL B
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 3200 MT/s
        Manufacturer: Samsung
        Serial Number: 0411A61C
        Asset Tag: Not Specified
        Part Number: M471A4G43AB1-CWE
        Rank: 2
        Configured Memory Speed: 3200 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: Unknown
        Module Manufacturer ID: Bank 1, Hex 0xCE
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 32 GB
        Cache Size: None
        Logical Size: None
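The dmidecode output above can be boiled down to one record per module, which makes it easy to check that both sticks report the same part number and speed. This is a sketch that assumes the text layout shown above (a "Memory Device" heading followed by "Key: Value" lines); `parse_memory_devices` is a hypothetical helper, not part of dmidecode.

```python
def parse_memory_devices(text):
    """Parse `dmidecode --type memory` text into one dict per 'Memory Device' block."""
    devices, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            current = {}
            devices.append(current)
        elif not stripped or stripped.startswith("Handle "):
            current = None  # a blank line or a new handle ends the current block
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices
```

For the output above this yields two records, both with Part Number M471A4G43AB1-CWE and Configured Memory Speed 3200 MT/s. The Physical Memory Array block also reports Error Correction Type: None, i.e. these are non-ECC modules.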
