Bad block. Bad, bad block...

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Post by prl » Tue Jul 02, 2019 00:31

My test T4 wouldn't boot after a USB upgrade to 20190630. It just sat showing "Booting..." on the front display.

Looking at the serial output showed:

Code:

No filesystem could mount root, tried:  ubifs
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)

That prompted me to look at the serial output of a USB upgrade:

Code:

Reading usbdisk0:beyonwiz/hdp/rootfs.bin: ....................................................................................................................................................................................................... 207618048 bytes read
Done. 207618048 bytes read
offset: 00000000   size: 7F900000
Force erasing... 
Erasing flash...Erase address  0x5a00000 failed, bad block?
Erase address  0x5b00000 failed, bad block?
Erase address  0x8600000 failed, bad block?
Erase address  0x9000000 failed, bad block?
Programming...

>>>[ui_flash] Writing data (nandflash0.rootfs) from source (beyonwiz/hdp/rootfs.bin)
done. 207618048 bytes written

That looks like a cause. Bad block. Bad, bad block...

Any suggestions other than a quick trip to WA for the T4 to see Dr Warkus?

Full serial output for the USB upgrade attached for anyone inclined to pore over the entrails.
Attachments
CoolTerm Capture 2019-07-02 00-11-38.txt
(25.14 KiB) Downloaded 93 times
Peter
T4 HDMI
U4, T4, T3, T2, V2 test/development machines
Sony BDV-9200W HT system
LG OLED55C9PTA 55" OLED TV

peteru
Uber Wizard
Posts: 9735
Joined: Tue Jun 12, 2007 23:06
Location: Sydney, Australia

Post by peteru » Tue Jul 02, 2019 01:13

Don't panic, those are perfectly normal on NAND flash devices. The drivers will catch and remap those bad blocks. With only four bad blocks out of the whole flash, you have nothing to complain about.

Chalk up the corruption to the usual NAND bit-flip issues that you'd expect to see over time. I don't think there's anything extraordinary happening.

"Beauty lies in the hands of the beer holder."
Blog.

Post by prl » Tue Jul 02, 2019 03:34

So if that isn't causing the problem, why this?

Code:

UBI: background thread "ubi_bgt0d" started, PID 68
UBIFS: background thread "ubifs_bgt0_0" started, PID 69
UBIFS error (pid 1): ubifs_read_node: bad node type (255 but expected 9)
UBIFS error (pid 1): ubifs_read_node: bad node at LEB 195:573456, LEB mapping status 0
Not a node, first 24 bytes:
00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.2 #1
Stack : 00000000 00000000 00000000 00000000 00000000 00000000 80986a52 00000033
          00000000 00000000 00000001 00010000 cfc22d08 808ecce7 8083139c 00000000
          00000001 809838d8 ffffffea 802af188 00000003 807681e0 000000c3 8002c4b4
          00000001 00000000 80837558 cfc39bd4 cfc39bd4 cfc22d08 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 cfc39b50
          ...
Call Trace:
[<8000c1f8>] show_stack+0x20/0x70
[<8076c794>] dump_stack+0xc0/0xf0
[<802b5018>] ubifs_read_node+0x1d0/0x2b8
[<802bf2d4>] dbg_old_index_check_init+0x88/0xf8
[<802b1524>] ubifs_mount+0xe8c/0x1778
[<800f9efc>] mount_fs+0x1c/0x100
[<80118398>] vfs_kern_mount.part.9+0x60/0x188
[<8011af74>] do_mount+0x1f0/0xb30
[<8011bd58>] SyS_mount+0xe8/0x18c
[<8093bda0>] do_mount_root+0x28/0xb4
[<8093c008>] mount_block_root+0x13c/0x26c
[<8093c338>] prepare_namespace+0xe0/0x1fc
[<8093bc28>] kernel_init_freeable+0x268/0x298
[<80763b7c>] kernel_init+0x10/0x100
[<80006b28>] ret_from_kernel_thread+0x14/0x1c

UBIFS: background thread "ubifs_bgt0_0" stops
List of all partitions:
0800       156290904 sda  driver: sd
  0801       156289024 sda1 204e4d4b-5ad5-4d95-b47f-e482c78e3f23
1f00         2089984 mtdblock0  (driver?)
1f01         2097152 mtdblock1  (driver?)
1f02            7168 mtdblock2  (driver?)
1f03             640 mtdblock3  (driver?)
1f04            1920 mtdblock4  (driver?)
1f05             512 mtdblock5  (driver?)
1f06             512 mtdblock6  (driver?)
1f07             256 mtdblock7  (driver?)
1f08             256 mtdblock8  (driver?)
1f09               8 mtdblock9  (driver?)
No filesystem could mount root, tried:  ubifs
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)
Rebooting in 180 seconds..

Looks like an actual corrupt flash block to me.
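
The 24 bytes dumped after "Not a node" are where UBIFS expects a common node header. The sketch below decodes such a header in Python to show why an all-0xFF region produces "bad node type (255 but expected 9)". The header layout, the magic number and the index node type are assumptions taken from the kernel's ubifs-media.h and should be checked against the 3.14 sources; `parse_common_header` is a hypothetical helper name.

```python
import struct

# Assumed from the kernel's fs/ubifs/ubifs-media.h; verify against your kernel source.
UBIFS_NODE_MAGIC = 0x06101831
UBIFS_IDX_NODE = 9  # index node, the type the mount code expected in the log

# Common header: magic, crc, sqnum, len, node_type, group_type, 2 padding bytes.
COMMON_HEADER = struct.Struct("<IIQIBB2x")  # little-endian, 24 bytes in total


def parse_common_header(raw):
    """Decode the first 24 bytes of a would-be UBIFS node (hypothetical helper)."""
    magic, crc, sqnum, length, node_type, group_type = COMMON_HEADER.unpack(raw[:24])
    return {
        "magic": magic,
        "magic_ok": magic == UBIFS_NODE_MAGIC,
        "crc": crc,
        "sqnum": sqnum,
        "len": length,
        "node_type": node_type,
        "group_type": group_type,
    }


# The 24 bytes shown in the kernel log: every byte is 0xFF.
dumped = bytes.fromhex("ff" * 24)
header = parse_common_header(dumped)
print("magic ok:", header["magic_ok"], "node type:", header["node_type"])
```

Erased NAND reads back as 0xFF, so a header like this is what a referenced but never programmed region would look like.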

Post by peteru » Tue Jul 02, 2019 17:52

It is flash corruption, but not as a result of the bootloader flashing process. That process has correctly identified failed blocks that cannot be erased (that is, reset to all 0xFF) and mapped them out. From the log you posted, you have blocks that appear to be referenced, but not programmed, as evidenced by the 0xFF values. It is possible that the updated firmware has bugs in the code. I know that there were some changes in that area. Can you get more info? Perhaps another reflash to see if it is reproducible?

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Post by stevebow » Tue Jul 02, 2019 19:05

Sounds not too dissimilar to my own bit flipping issue recently:

T4 - Flash mem failed again

A reflash has fixed that - until next time...

I have been experiencing this problem since Oct. 2016:

T4 failed to boot today, needed reflashing

Post by peteru » Tue Jul 02, 2019 20:00

It happens. I had to reflash my T4 (with 17.5) on the weekend because the flash got enough bit flips that it died on Wed or Thu last week. It was one of the factors that prompted me to finish 19.3 for the T-series (and U4).

If there is a reproducible test case, I'm happy to investigate further, but for random, once a year errors, there's not much that can be done.

Post by prl » Tue Jul 02, 2019 22:31

peteru wrote:
Tue Jul 02, 2019 20:00
If there is a reproducible test case, I'm happy to investigate further, but for random, once a year errors, there's not much that can be done.

There is a reproducible test case on my T4. I have loaded the 20190630 firmware from USB about 4-5 times, always with it resulting in a crash when the firmware boots.

What hasn't been reproduced (yet) are serial logs of the firmware load and reboot to see whether common blocks in the flash device are involved.

That will come once I've checked in the picon updates.

Post by stevebow » Wed Jul 03, 2019 10:21

peteru wrote:
Tue Jul 02, 2019 20:00
If there is a reproducible test case, I'm happy to investigate further, but for random, once a year errors, there's not much that can be done.

In my case it appears to happen around once a year and is easily worked around (reflash), so I am not concerned at this stage.

prl wrote:
Tue Jul 02, 2019 22:31
What hasn't been reproduced (yet) are serial logs of the firmware load and reboot to see whether common blocks in the flash device are involved.

Funny you should say that. Each reflash that I still have logs for shows the same two erase failures prior to flashing the rootfs:

Code:

*** USB 2.0  Flash Disk        (1047 MB, lbs=512) ***
*** beyonwiz/hdp/rootfs.bin: File found
Looking for usbdisk0:beyonwiz/hdp/rootfs.bmp
Found 720x576 splash image
Reading usbdisk0:beyonwiz/hdp/rootfs.bin: ....................................................................................................................................................... 157286400 bytes read
Done. 157286400 bytes read
offset: 00000000   size: 7F900000
Force erasing... 
Erasing flash...Erase address  0x5a00000 failed, bad block?
Erase address  0x5b00000 failed, bad block?
Programming...

>>>[ui_flash] Writing data (nandflash0.rootfs) from source (beyonwiz/hdp/rootfs.bin)
done. 157286400 bytes written

Looking back, my earliest post on this was back in Oct. 2016: T4 failed to boot today, needed reflashing - this is something that is easily reproducible here.

I've attached the serial log of my most recent flash, done after my most recent bit flip episode in May. Two of your four bad blocks are at 0x5a00000 and 0x5b00000, same as mine, out of the 4096 blocks in these devices. Interesting.

The device's datasheet indicates each block consists of 256 pages. Each page consists of 4096 bytes of data plus 224 "spare" bytes for remapping purposes. I can only assume that these erase failure messages can be treated as warnings, that at least one of the spare bytes is in use in at least one corrupted page in each block, and all is good. For now.
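
The geometry quoted above makes it easy to turn the erase-failure addresses into block numbers. A minimal sketch in Python, assuming the addresses printed by the bootloader are byte offsets into the rootfs partition and that a block holds 256 pages of 4096 data bytes (1 MiB). `failed_blocks` is a hypothetical helper, and the log excerpts are copied from the two serial logs posted in this thread:

```python
import re

PAGE_DATA_BYTES = 4096   # data bytes per page, per the datasheet figures above
PAGES_PER_BLOCK = 256    # pages per erase block, per the datasheet figures above
BLOCK_BYTES = PAGE_DATA_BYTES * PAGES_PER_BLOCK  # 0x100000, i.e. 1 MiB

ERASE_FAIL = re.compile(r"Erase address\s+(0x[0-9a-fA-F]+) failed")


def failed_blocks(log_text):
    """Map each failed erase address to its block index (hypothetical helper).

    Assumes the address is a byte offset from the start of the partition.
    """
    result = {}
    for match in ERASE_FAIL.findall(log_text):
        address = int(match, 16)
        result[address] = address // BLOCK_BYTES
    return result


prl_log = """Erasing flash...Erase address  0x5a00000 failed, bad block?
Erase address  0x5b00000 failed, bad block?
Erase address  0x8600000 failed, bad block?
Erase address  0x9000000 failed, bad block?"""

stevebow_log = """Erasing flash...Erase address  0x5a00000 failed, bad block?
Erase address  0x5b00000 failed, bad block?"""

prl_blocks = failed_blocks(prl_log)
stevebow_blocks = failed_blocks(stevebow_log)
common = sorted(set(prl_blocks) & set(stevebow_blocks))

for address in sorted(prl_blocks):
    shared = "also in second log" if address in common else ""
    print(hex(address), "-> block", prl_blocks[address], shared)
```

Under those assumptions the four addresses land on blocks 90, 91, 134 and 144 of the partition, and the two that appear in both logs are blocks 90 and 91.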
Attachments
T4_Flash.zip
(3.03 KiB) Downloaded 71 times

Post by peteru » Wed Jul 03, 2019 14:07

prl wrote:
Tue Jul 02, 2019 22:31
There is a reproducible test case on my T4. I have loaded the 20190630 firmware from USB about 4-5 times, always with it resulting in a crash when the firmware boots.

That's interesting. Have you also tried another download, just in case you have a corrupted image? I have not experienced any issues with my T4, even though it's unhealthy.
prl wrote:
Tue Jul 02, 2019 22:31
What hasn't been reproduced (yet) are serial logs of the firmware load and reboot to see whether common blocks in the flash device are involved.

The logs from the bootloader flash update should be pretty much identical from run to run. The bad blocks should stay bad and they should be identified as such and remapped. The first-boot logs could be more interesting and may actually reveal information that might help, but I suspect that you will need to pass some extra arguments to the kernel to enable more debug info. If you want to go down that path, you'll end up having to learn a bit about interrupting the bootloader, configuring bootloader variables and/or manually booting the kernel with custom arguments. Hopefully you will not need to learn about unbricking. :shock: You'll also need to learn about the underlying kernel implementation of the flash handling layers. That's a lot of stuff to absorb...

While I am not ruling out some software regression that could be causing issues, I'd start looking elsewhere. So far you seem to be the only person with this specific problem, which would suggest that the distributed image is OK. You also don't seem to have issues with other versions of the firmware, which would suggest that the hardware is OK. Logically, that leaves the downloaded firmware file that you have been using as the most likely culprit. I'll run some hashes on the file I have here and post them so that you can check...

Post by peteru » Wed Jul 03, 2019 14:26

Code:

$ md5sum -b beyonwiz-19.3-beyonwizt4-20190630_usb.zip
7d7785afdc1b3bbad53b7a0fbbcfc350 *beyonwiz-19.3-beyonwizt4-20190630_usb.zip

$ sha1sum -b beyonwiz-19.3-beyonwizt4-20190630_usb.zip
a8523db7e5d7392fd5ebe9e4a78734d7c2f81942 *beyonwiz-19.3-beyonwizt4-20190630_usb.zip

$ sha256sum -b beyonwiz-19.3-beyonwizt4-20190630_usb.zip
92f32d92215ea6bb4e3c7242509b4ea6cd701947c18962d27692753a2d1519a9 *beyonwiz-19.3-beyonwizt4-20190630_usb.zip

Post by prl » Wed Jul 03, 2019 14:43

Code:

Cambyses:Firmware prl$ md5 beyonwiz-19.3-beyonwizt4-20190630_usb.zip
MD5 (beyonwiz-19.3-beyonwizt4-20190630_usb.zip) = 7d7785afdc1b3bbad53b7a0fbbcfc350

Cambyses:Firmware prl$ shasum  beyonwiz-19.3-beyonwizt4-20190630_usb.zip
a8523db7e5d7392fd5ebe9e4a78734d7c2f81942  beyonwiz-19.3-beyonwizt4-20190630_usb.zip

macOS md5 doesn't have a -b (read in binary mode) flag, but the checksum is the same anyway. The macOS shasum output also matches your sha1sum value.

It doesn't look like file corruption.
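
For anyone without md5sum/sha1sum/sha256sum to hand, the same check can be done in one pass with Python's standard hashlib. This is only a sketch; `file_digests` is a hypothetical helper name, and the file name and MD5 value in the comments are the ones quoted in the posts above.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at path (hypothetical helper).

    The file is opened in binary mode and read in chunks, so large firmware
    images do not need to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Example use, with the file name and value quoted in this thread:
#   digests = file_digests("beyonwiz-19.3-beyonwizt4-20190630_usb.zip")
#   digests["md5"] == "7d7785afdc1b3bbad53b7a0fbbcfc350"
```

Note that this only validates the file it is pointed at. To check the copy on the USB drive, it has to be run against the files on that drive.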

Post by stevebow » Wed Jul 03, 2019 15:32

Have you tried reflashing a previous working version, as you've mentioned only flashing 20190630? While the checksums are good, it would be worth a go anyway, with nothing to lose, you never know.

Post by prl » Wed Jul 03, 2019 16:19

stevebow wrote:
Wed Jul 03, 2019 15:32
Have you tried reflashing a previous working version, as you've mentioned only flashing 20190630? While the checksums are good, it would be worth a go anyway, with nothing to lose, you never know.

I'll give it a go.

Post by peteru » Wed Jul 03, 2019 17:44

prl wrote:
Wed Jul 03, 2019 14:43
It doesn't look like file corruption.

OK, the downloaded file looks good. It's still possible that the data on the USB drive is corrupt. Maybe it isn't, but...

Post by prl » Wed Jul 03, 2019 17:58

peteru wrote:
Wed Jul 03, 2019 17:44
prl wrote:
Wed Jul 03, 2019 14:43
It doesn't look like file corruption.
OK, the downloaded file looks good. It's still possible that the data on the USB drive is corrupt. Maybe it isn't, but...

How do you know that wasn't what I checksummed? :) (It wasn't, in fact.)

I'm currently trying a USB update of 20190207, and have removed the old copy of 20190630 from the USB drive. But if the 20190207 update succeeds, I'll try to do 20190630 again, and I'll compare the extracted files on the USB against an extract on the Mac.

Post by prl » Wed Jul 03, 2019 18:16

prl wrote:
Wed Jul 03, 2019 16:19
stevebow wrote:
Wed Jul 03, 2019 15:32
Have you tried reflashing a previous working version, as you've mentioned only flashing 20190630? While the checksums are good, it would be worth a go anyway, with nothing to lose, you never know.

I'll give it a go.

A USB update of 20190207 worked just fine. Off to try 20190630 again.

Post by prl » Wed Jul 03, 2019 22:24

prl wrote:
Wed Jul 03, 2019 18:16
prl wrote:
Wed Jul 03, 2019 16:19
stevebow wrote:
Wed Jul 03, 2019 15:32
Have you tried reflashing a previous working version, as you've mentioned only flashing 20190630? While the checksums are good, it would be worth a go anyway, with nothing to lose, you never know.

I'll give it a go.

A USB update of 20190207 worked just fine. Off to try 20190630 again.

Three successive updates of the T4 to 20190630 worked without problem.

It's a pity I didn't check the old copy of 20190630 before I deleted it, but I'll modify my script that loads the firmware onto the USB to compare the copy of the firmware on the USB stick with one extracted on my Mac.

Post by peteru » Wed Jul 03, 2019 23:19

So, probably a bad flash block, but on the USB drive. It may be a good idea to run f3probe on your USB drive, see what it reports.

Post by prl » Thu Jul 04, 2019 13:34

peteru wrote:
Wed Jul 03, 2019 23:19
So, probably a bad flash block, but on the USB drive. It may be a good idea to run f3probe on your USB drive, see what it reports.

The drive I'm using is a known fake (conference freebie), but it's been partitioned to use only the first 2GB of the claimed 4GB after I did a write/read-back test to check its actual size (my program, not f3).

Code:

Cambyses:src prl$ brew install f3
Warning: You are using macOS 10.11.
We (and Apple) do not provide support for this old version.
You will encounter build failures with some formulae.
Please create pull requests instead of asking for help on Homebrew's GitHub,
Discourse, Twitter or IRC. You are responsible for resolving any issues you
experience, as you are running this old version.

Turns out that it works, despite the dire warnings.

f3write/f3read reports no errors, even after ejecting and re-plugging the thumb drive. To check whether there might have been a bad block in the files that were already on the thumb drive, I reformatted the partition (keeping its reduced size) and re-ran f3write/f3read. That also reported no errors, even after ejecting and re-plugging the thumb drive.

Running brew install f3 on my Mac installed f3write and f3read, but did not install f3probe or f3fix.

Post by stevebow » Thu Jul 04, 2019 13:57

prl wrote:
Wed Jul 03, 2019 18:16
A USB update of 20190207 worked just fine. Off to try 20190630 again.

I heard that sigh of relief here in Sydney! 8)

Post by prl » Thu Jul 04, 2019 14:11

stevebow wrote:
Thu Jul 04, 2019 13:57
prl wrote:
Wed Jul 03, 2019 18:16
A USB update of 20190207 worked just fine. Off to try 20190630 again.

I heard that sigh of relief here in Sydney! 8)

:D

I'm still a bit puzzled about just what broke.

Post by peteru » Thu Jul 04, 2019 14:38

The controller on the USB drive will need to perform the occasional bad block remapping. If there was a bad block and it was remapped using an address from the "ghost" memory on the fake drive, then it's entirely possible to end up with incorrect data in an unrelated block. The fact that the T4 log is showing the contents of the flashed image to contain 0xFF suggests that whatever happened to the data on the USB drive most likely resulted in some data being replaced with an erased, but not programmed, block.
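
One way to look for that kind of damage in a copy of rootfs.bin is to scan it for aligned chunks that contain nothing but 0xFF. The sketch below does that in Python. `erased_looking_chunks` is a hypothetical helper, and the 4096-byte chunk size is an assumption matching the page size quoted earlier in the thread. A flash image may legitimately contain 0xFF padding, so a hit is only a hint; comparing against a known-good copy is more conclusive.

```python
def erased_looking_chunks(path, chunk_size=4096):
    """Return byte offsets of chunk-aligned regions made up entirely of 0xFF.

    Hypothetical helper: chunk_size is assumed to match the flash page size.
    A short final chunk is reported too if every byte in it is 0xFF.
    """
    blank = b"\xff" * chunk_size
    offsets = []
    offset = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if chunk == blank[:len(chunk)]:
                offsets.append(offset)
            offset += len(chunk)
    return offsets


# Example use (path is the one shown in the bootloader log):
#   for offset in erased_looking_chunks("beyonwiz/hdp/rootfs.bin"):
#       print(hex(offset))
```

Running it on both the suspect copy and a fresh extract, then comparing the two lists, would show whether the copy on the drive gained any blank regions.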

I guess we learned that if/when users experience issues that are not reproducible, even after a reflash, it may be time to try with a different USB drive. It's a long shot, but it can happen. You are lucky that the kernel caught the corruption. Had it been in a non-critical part, it could have resulted in all sorts of bizarre behaviours. Of course, you also managed to figure out what went wrong by hooking up the serial cable. A typical user may have concluded that it was their T4 that died, rather than suspecting their USB drive.

Post by prl » Thu Jul 04, 2019 15:35

I'm currently making a script to check the firmware image files on the USB drive against those in the ZIP file.
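
A minimal sketch of such a check in Python, not the actual script: it walks the members of the ZIP and compares each one's recorded size and CRC-32 with the file of the same relative path under the USB mount point. `compare_zip_to_dir` and the mount path in the comment are hypothetical names. CRC-32 is enough to catch accidental corruption; a stronger hash would need the ZIP member to be decompressed as well.

```python
import os
import zipfile
import zlib


def compare_zip_to_dir(zip_path, root):
    """Return [(member name, problem)] for ZIP members that do not match files under root.

    Hypothetical helper: `root` is the directory the ZIP was extracted into,
    for example the mount point of the USB drive.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = os.path.join(root, *info.filename.split("/"))
            if not os.path.isfile(target):
                problems.append((info.filename, "missing"))
                continue
            crc = 0
            size = 0
            with open(target, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    crc = zlib.crc32(chunk, crc)
                    size += len(chunk)
            if size != info.file_size or crc != info.CRC:
                problems.append((info.filename, "differs"))
    return problems


# Example use (the mount point is an assumption):
#   compare_zip_to_dir("beyonwiz-19.3-beyonwizt4-20190630_usb.zip", "/Volumes/USB")
```

As with any read-back check, the comparison is more trustworthy after the drive has been ejected and re-plugged, so that the data is read from the device rather than from the cache.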