T4 - More flashmem woes

Moderators: Gully, peteru

Post Reply
stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

T4 - More flashmem woes

Post by stevebow » Wed Jul 10, 2019 10:34

I had something very odd happen to my T4 this morning, something I have not seen before.

While watching the news this morning, all of a sudden the video started flashing (between around 5 frames black and 5 frames live). There was no audio. Around 10-15 secs. later, the flashing changed from 5 fr. black to 5 fr. GUI busy logo (75% circle over black).

I fired up the PC to get a serial log:

Code: Select all

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2019.07.10 09:12:59 =~=~=~=~=~=~=~=~=~=~=~=
{3442}<  2017.547> [eFilePushThreadRecorder] OVERFLOW while recording
{3442}<  2027.170> [eFilePushThreadRecorder] OVERFLOW while recording
brcmnand brcmnand.0: timeout waiting for command 0 (1)
brcmnand brcmnand.0: irq status 22a00000, intfc status f00000e0
brcmnand brcmnand.0: EDU still active: 00000001
brcmnand brcmnand.0: EDU still active: 00000001
brcmnand brcmnand.0: EDU still active: 00000001
UBIFS error (pid 3690): ubifs_read_node: bad node type (192 but expected 2)
UBIFS error (pid 3690): ubifs_read_node: bad node at LEB 11:806096, LEB mapping status 1
Not a node, first 24 bytes:
00000000: 4e a2 5f 52 09 73 65 02 ad 63 68 44 80 56 07 74 75 65 35 73 c0 73 5e 78                          N._R.se..chD.V.tue5s.s^x
CPU: 0 PID: 3690 Comm: telnetd Tainted: GF          O 3.14.2 #2
Stack : 00000000 00000000 00000000 00000000{3442}<  2093.32 000000004> [eFilePushThr 00000000eadRecorder] OVE 80976a52
INFO: rcu_sched self-detected stall on CPUINFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=21049 jiffies, g=9300, c=9299, q=111)
Task dump for CPU 0:
telnetd         R running      0  3690    506 0x08100004
Stack : cb92c20c 00000000 cb905980 cb905980 0049cb10 7f8d00b4 7f8d03fc 00000003
  00000004 004c0000 7f8d0178 00000000 7f8d0000 801015c4 77441b8c 00000000
  004c0000 00000001 7f8d03fc 7f8d00b4 773a3ae4 80011af4 004f0000 0000000a
  00000001 00000001 7744ce00 004c58b8 00000e6a 774a4628 00000fab 0000002f
  0049cb10 7f8d00b4 7f8d03fc 7efcf900 00000001 00000000 2e373120 00000000
  ...
Call Trace:
[<80768d44>] __schedule+0x36c/0x7ec
[<80011af4>] handle_sys+0x134/0x158

 { 0}  (t=21049 jiffies g=9300 c=9299 q=131)
CPU: 0 PID: 3690 Comm: telnetd Tainted: GF          O 3.14.2 #2
Stack : 00000000 00000000 00000000 00000000 00000000 00000000 80976a52 0000003f
  00000000 00000000 00000e6a 00010000 cb92c298 808e0ce7 8082613c 00000000
  00000e6a 809738d8 808e49c4 00000000 808e0000 80765abc 00000004 8002da3c
  00000000 00000000 8082c2f8 c5443474 c5443474 cb92c298 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 c54433f0
  ...
Call Trace:
[<8000c94c>] show_stack+0x64/0x7c
[<80766748>] dump_stack+0xc0/0xf0
[<8008c848>] rcu_check_callbacks+0x504/0x87c
[<8003b7d4>] update_process_times+0x48/0x88
[<80097600>] tick_sched_timer+0x70/0x33c
[<800547a4>] __run_hrtimer.isra.4+0x68/0x110
[<80055210>] hrtimer_interrupt+0x1c8/0x508
[<8000f4f8>] c0_compare_interrupt+0x50/0x88
[<80081264>] handle_irq_event_percpu+0x78/0x27c
[<800856e0>] handle_percpu_irq+0x8c/0xbc
[<800808e8>] generic_handle_irq+0x34/0x4c
[<80008cac>] do_IRQ+0x18/0x2c
[<80003754>] plat_irq_dispatch+0xac/0x10c
[<80006c88>] ret_from_irq+0x0/0x4
[<80033480>] __do_softirq+0xe4/0x2fc
[<80033984>] irq_exit+0x78/0x84
[<80003528>] brcm_intc_dispatch.isra.0+0x90/0x210
[<80003774>] plat_irq_dispatch+0xcc/0x10c
[<80006c88>] ret_from_irq+0x0/0x4
[<803dff28>] __arch_local_irq_restore+0x18/0x30
[<8007eadc>] vprintk_emit+0x268/0x560
[<80765abc>] printk+0x38/0x44
[<8000bfa4>] show_stacktrace+0x80/0x248
[<8000c908>] show_stack+0x20/0x7c
[<80766748>] dump_stack+0xc0/0xf0
[<802b0d98>] ubifs_read_node+0x1a0/0x2c0
[<802d2230>] ubifs_tnc_read_node+0x164/0x174
[<802b2120>] tnc_read_node_nm+0xe8/0x1f4
[<802b5058>] ubifs_tnc_locate+0x210/0x268
[<802b5828>] ubifs_tnc_lookup_nm+0x38/0x178
[<802a8570>] ubifs_lookup+0xac/0x200

[... log then loops from "INFO: rcu_sched self-detected stall on CPUINFO:" line]
What was interesting was that the log was being written at about a 300 baud rate (brought back fond memories :) ) in bursts of 5/6 chars. or so.

I also tried to telnet in, but the T4 was not responsive at all (no login prompt).

At that point, I repowered, and noticed the following:

Code: Select all

[...]
UBI: scanning is finished
UBI: attached mtd0 (name "rootfs", size 2041 MiB) to ubi0
UBI: PEB size: 1048576 bytes (1024 KiB), LEB size: 1040384 bytes
UBI: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
UBI: VID header offset: 4096 (aligned 4096), data offset: 8192
UBI: good PEBs: 2039, bad PEBs: 2, corrupted PEBs: 0
UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
UBI: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 1134490118
UBI: available PEBs: 0, total reserved PEBs: 2039, PEBs reserved for bad PEB handling: 78
UBI: background thread "ubi_bgt0d" started, PID 69
ALSA device list:
  No soundcards found.
UBIFS: background thread "ubifs_bgt0_0" started, PID 70
UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 0, name "rootfs"
UBIFS: LEB size: 1040384 bytes (1016 KiB), min./max. I/O unit sizes: 4096 bytes/4096 bytes
UBIFS: FS size: 2025627648 bytes (1931 MiB, 1947 LEBs), journal size 12550144 bytes (11 MiB, 13 LEBs)
UBIFS: reserved for root: 0 bytes (0 KiB)
UBIFS: media format: w4/r0 (latest is w4/r0), UUID 05573954-D033-4F2D-A3CF-C5FBAFC1672F, small LPT model
VFS: Mounted root (ubifs filesystem) on device 0:11.
devtmpfs: mounted
Freeing unused kernel memory: 260K (8092f000 - 80970000)

INIT: version 2.88 booting
[...]
Fortunately the T4 booted normally.

It appears that the flashmem had played up yet again, judging from the UBIFS error lines in the first log, and fortunately recovered on boot by UBIFS. The "bad PEBs: 2" I have always had and I see these bad PEBs too during flash erase, so I am not worried about these.

I have 26 boot logs still, of these, 5 have "UBIFS: recovery needed" lines. Of those, 3 were successfully "UBIFS: recovery completed" and 2 required reflashing.

I think it may be slowly high time to order one of these devices as a spare. I see they can be had for around AU$15 at Mouser, but not at RS or Element 14, unfortunately.

User avatar
peteru
Uber Wizard
Posts: 9735
Joined: Tue Jun 12, 2007 23:06
Location: Sydney, Australia
Contact:

Re: T4 - More flashmem woes

Post by peteru » Thu Jul 11, 2019 00:32

It looks like there is a flash memory failure trend starting to appear. Ate you running 19.3?

There are two primary suspects:

1. Age related flash failures.
2. Bugs in 19.3 firmware.

So far there is not enough data to make a call. We need to keep in mind the fact that the most likely users to notice these problems are also the same users that tend to run beta releases, so let's not be to hasty in blaming 19.3!

"Beauty lies in the hands of the beer holder."
Blog.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: T4 - More flashmem woes

Post by stevebow » Thu Jul 11, 2019 10:45

peteru wrote:
Thu Jul 11, 2019 00:32
Ate you running 19.3?

No, I am on 17.5-20190207. I first noticed a flashmem problem (I assume, the symptoms were the same) around a year after I purchased the T4:

T4 failed to boot today, needed reflashing

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: T4 - More flashmem woes

Post by prl » Thu Jul 11, 2019 13:14

peteru wrote:
Thu Jul 11, 2019 00:32
There are two primary suspects:

1. Age related flash failures.
...
We need to keep in mind the fact that the most likely users to notice these problems are also the same users that tend to run beta releases, so let's not be to hasty in blaming 19.3!

Those same users are also likely to have obtained their T4s longest ago.
Peter
T4 HDMI
U4, T4, T3, T2, V2 test/development machines
Sony BDV-9200W HT system
LG OLED55C9PTA 55" OLED TV

User avatar
peteru
Uber Wizard
Posts: 9735
Joined: Tue Jun 12, 2007 23:06
Location: Sydney, Australia
Contact:

Re: T4 - More flashmem woes

Post by peteru » Thu Jul 11, 2019 17:09

prl wrote:
Thu Jul 11, 2019 13:14
peteru wrote:
Thu Jul 11, 2019 00:32
There are two primary suspects:

1. Age related flash failures.
...
We need to keep in mind the fact that the most likely users to notice these problems are also the same users that tend to run beta releases, so let's not be to hasty in blaming 19.3!

Those same users are also likely to have obtained their T4s longest ago.

And would have probably had more erase/program cycles than a typical user.

Flash memory :(

"Beauty lies in the hands of the beer holder."
Blog.

IanB
Wizard
Posts: 1550
Joined: Sat Jan 24, 2009 14:04
Location: Melbourne

Re: T4 - More flashmem woes

Post by IanB » Sun Jul 14, 2019 10:01

Probably time to start looking at nand flash diagnostics and tools, maybe .../drivers/mtd/test/*

Post Reply

Return to “Hardware Discussion”