UTF8 characters not displayed in CIFS mount

Moderators: Gully, peteru

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

UTF8 characters not displayed in CIFS mount

Post by stevebow » Thu Jun 07, 2018 12:02

Running 20180417 Production f/w on my T4. I noticed that some entries in the Media Player were not displaying UTF8 characters in file and directory names on my NAS. Looking further, it appears the characters may be getting lost somewhere at the SMB level.

SSHing directly into my NAS, these are a few examples of UTF8 names:

Code:

-rw-rw-rw-    1 admin    everyone         0 Jun  7 11:19 ==冨田勲++.txt
drwxrwsrwx    3 Steve    everyone      4096 Feb 18  2016 冨田勲/
drwxrwsrwx   13 Steve    everyone      4096 Apr 19  2016 喜多郎/
drwxrwsrwx    4 Steve    everyone      4096 Feb 18  2016 坂本龍一/

Logging into the T4 and doing the same via the T4 smb mount yields:

Code:

-rwxr-xr-x    1 root     root             0 Jun  7 11:19 ==*
drwxr-xr-x    2 root     root             0 Feb 18  2016 /
drwxr-xr-x    2 root     root             0 Apr 19  2016 /
drwxr-xr-x    2 root     root             0 Feb 18  2016 /

I deliberately added "==" and "++" to the .txt filename - note how the filename is truncated after the "==".

On the T4 I have created the NAS mountpoint via the GUI, and the corresponding entry in auto.network is:

Code:

TS509_Public509 -fstype=cifs,user=Steve,pass=####,rw,iocharset=utf8,cache=loose ://192.168.#.#/Public509

So the iocharset parameter is present.
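
One way to confirm that the option reached the live mount, and not only the autofs map, is to read the options back from /proc/mounts. This is a minimal Python sketch, assuming a Python 3 interpreter is available; the function name, the sample line and its mount point are made up for illustration and are not taken from the T4.

```python
def cifs_mount_options(mounts_text):
    """Map each CIFS mount point to a dict of its mount options.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, filesystem type, options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "cifs":
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value  # flag options such as "rw" get an empty value
        result[fields[1]] = options
    return result


# Hypothetical sample; on a real system use open("/proc/mounts").read().
SAMPLE = (
    "//192.0.2.10/Public509 /media/autofs/TS509_Public509 cifs "
    "rw,relatime,vers=1.0,cache=loose,iocharset=utf8 0 0\n"
    "/dev/sda1 /media/hdd ext4 rw,relatime 0 0\n"
)

for mount_point, opts in cifs_mount_options(SAMPLE).items():
    print(mount_point, "iocharset =", opts.get("iocharset", "(not set)"))
```

If iocharset is missing from the live options even though it is in auto.network, the problem would be in how the mount is set up and not in the tools used to list the files.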

The UTF8 characters are properly displayed in Windows File Explorer as well as in Mac Finder (both AFP and SMB).

A bug, or something I need to further configure in the T4? Thanks.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Thu Jun 07, 2018 19:09

Do UTF-8 characters display correctly if they are in files on the HDD? Do they display correctly in the GUI?
Peter
T4 HDMI
U4, T4, T3, T2, V2 test/development machines
Sony BDV-9200W HT system
LG OLED55C9PTA 55" OLED TV

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Thu Jun 07, 2018 20:51

I copied the directories/file from the NAS to the T4 (renaming .txt to .mp3 so it would be visible in Media Player). To retain filesystem integrity, I first tarred them on the NAS, sftp'd them across to the T4, and untarred them there. During tar extraction, I did see the UTF8 characters displayed (using Putty).

This is what I see on the T4 HDD (telnet'd via Putty):

Code:

root@beyonwizt4:/media/hdd/movie/UTF8_Files# ls -alF
drwxr-xr-x    5 root     users         4096 Jun  7 20:29 ./
drwxr-xr-x   16 root     root         45056 Jun  7 20:29 ../
-rw-rw-rw-    1 root     root             0 Jun  7 11:19 ==
drwxrwxrwx    3 root     root          4096 Jun  7 20:29 /
drwxrwxrwx    3 root     root          4096 Jun  7 20:29 /
drwxrwxrwx    3 root     root          4096 Jun  7 20:29 /

i.e. the same result as when viewing the NAS directly via CIFS. The result is similar in the Media Player GUI, the difference being that the filename is not truncated after the "==":
[Attached image: T4_UTF8_NAS_to_T4_via_tar.png]

The result was the same with the T4's File Commander also. I might add that I log in to my NAS the same way as my T4 i.e. Putty/telnet, the settings the same for both as far as I can see.

MrQuade
Uber Wizard
Posts: 11844
Joined: Sun Jun 24, 2007 13:40
Location: Perth

Re: UTF8 characters not displayed in CIFS mount

Post by MrQuade » Thu Jun 07, 2018 21:13

Ok, that should rule out Samba settings at least.
I would have thought the Wiz would handle regular UTF-8 characters. Maybe something special about Japanese ones? I've glanced at articles that mentioned special handling for UTF-8 Japanese character sets. I have no idea what needs to be done though, mind you.
Logitech Harmony Ultimate+Elite RCs
Beyonwiz T2/3/U4/V2, DP-S1 PVRs
Denon AVR-X3400h, LG OLED65C7T TV
QNAP TS-410 NAS, Centos File Server (Hosted under KVM)
Ubiquiti UniFi Managed LAN/WLAN, Draytek Vigor130/Asus RT-AC86U Internet
Pixel 4,5&6, iPad 3 Mobile Devices

IanB
Wizard
Posts: 1550
Joined: Sat Jan 24, 2009 14:04
Location: Melbourne

Re: UTF8 characters not displayed in CIFS mount

Post by IanB » Fri Jun 08, 2018 09:23

Have you set Putty's character set to UTF-8? The default is Latin-1.

Does the font putty is using include the required glyphs?

What does "ls -l | od -c" show?

MrQuade
Uber Wizard
Posts: 11844
Joined: Sun Jun 24, 2007 13:40
Location: Perth

Re: UTF8 characters not displayed in CIFS mount

Post by MrQuade » Fri Jun 08, 2018 09:42

IanB wrote:
Fri Jun 08, 2018 09:23
Have you set putty's character set to UTF-8, the default is Latin-1?

Does the font putty is using include the required glyphs?

What does "ls -l | od -c" show?
I'd assumed he was using the same client settings when he ssh'd into his NAS.

IanB
Wizard
Posts: 1550
Joined: Sat Jan 24, 2009 14:04
Location: Melbourne

Re: UTF8 characters not displayed in CIFS mount

Post by IanB » Fri Jun 08, 2018 11:01

Why would you assume that? Let the OP answer for themselves.



Assume just makes an ASS out of yoU and ME.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Fri Jun 08, 2018 13:51

Hi IanB,

I'm afraid MrQuade is correct. Per my post from yesterday: "I might add that I log in to my NAS the same way as my T4 i.e. Putty/telnet, the settings the same for both as far as I can see."

My Putty setting "Window->Translation->Remote character set" is UTF-8. I have this setting for all my Putty session settings actually. I do see the characters rendered correctly in Putty when logged in to my NAS.

Piping ls through od as you suggested:

Code:

root@beyonwizt4:/media/hdd/movie/UTF8_Files# ls -l
-rw-rw-rw-    1 root     root             0 Jun  7 11:19 ==
drwxrwxrwx    3 root     root          4096 Jun  7 20:29
drwxrwxrwx    3 root     root          4096 Jun  7 20:29
drwxrwxrwx    3 root     root          4096 Jun  7 20:29

root@beyonwizt4:/media/hdd/movie/UTF8_Files# ls -l | od -c
0000000    -   r   w   -   r   w   -   r   w   -                   1
0000020    r   o   o   t                       r   o   o   t
0000040                                            0       J   u   n
0000060        7       1   1   :   1   9       =   =  \n   d   r   w   x
0000100    r   w   x   r   w   x                   3       r   o   o   t
0000120                        r   o   o   t
0000140                4   0   9   6       J   u   n           7       2
0000160    0   :   2   9      \n   d   r   w   x   r   w   x   r   w   x
0000200                    3       r   o   o   t                       r
0000220    o   o   t                                           4   0   9
0000240    6       J   u   n           7       2   0   :   2   9      \n
0000260    d   r   w   x   r   w   x   r   w   x                   3
0000300    r   o   o   t                       r   o   o   t
0000320                                4   0   9   6       J   u   n
0000340        7       2   0   :   2   9      \n
0000352

root@beyonwizt4:/media/hdd/movie/UTF8_Files# ls -l | od -x
0000000     722d    2d77    7772    722d    2d77    2020    2020    2031
0000020     6f72    746f    2020    2020    7220    6f6f    2074    2020
0000040     2020    2020    2020    2020    2020    2030    754a    206e
0000060     3720    3120    3a31    3931    3d20    0a3d    7264    7877
0000100     7772    7278    7877    2020    2020    2033    6f72    746f
0000120     2020    2020    7220    6f6f    2074    2020    2020    2020
0000140     2020    3420    3930    2036    754a    206e    3720    3220
0000160     3a30    3932    0a20    7264    7877    7772    7278    7877
0000200     2020    2020    2033    6f72    746f    2020    2020    7220
0000220     6f6f    2074    2020    2020    2020    2020    3420    3930
0000240     2036    754a    206e    3720    3220    3a30    3932    0a20
0000260     7264    7877    7772    7278    7877    2020    2020    2033
0000300     6f72    746f    2020    2020    7220    6f6f    2074    2020
0000320     2020    2020    2020    3420    3930    2036    754a    206e
0000340     3720    3220    3a30    3932    0a20
0000352

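Note that od -x prints little-endian 16-bit words, so each pair of bytes appears swapped in the hex output. It appears the directory names have been dropped entirely (the 0x20 before each newline is just the separator that ls -l prints after the timestamp) and the filename truncated after the two "=" characters.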
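
To make the byte order concrete, here is a small Python sketch (assuming Python 3; the helper name is made up for illustration) that turns one line of the od -x output above back into the bytes that od -c reports. It assumes little-endian 16-bit words, which is what the dump suggests, since "-r" (0x2d 0x72) shows up as 722d.

```python
def od_x_line_to_bytes(line):
    """Convert one line of `od -x` output back to the original byte order.

    The first field is the octal offset; the rest are 16-bit words printed
    as four hex digits, assumed here to be little-endian.
    """
    words = line.split()[1:]  # drop the offset column
    out = bytearray()
    for word in words:
        value = int(word, 16)
        out.append(value & 0xFF)  # low byte comes first in the stream
        out.append(value >> 8)    # high byte follows
    return bytes(out)


# Line at offset 0000060 of the `ls -l | od -x` dump above.
LINE = "0000060     3720    3120    3a31    3931    3d20    0a3d    7264    7877"
print(od_x_line_to_bytes(LINE))
```

The decoded bytes end the first entry with "==" followed directly by a newline, so none of the UTF-8 bytes of the name reached the pipe.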

Unfortunately the flavour of Busybox included with both my NASes (Qnap) doesn't include the od command, so I can't run the same comparison there.

IanB
Wizard
Posts: 1550
Joined: Sat Jan 24, 2009 14:04
Location: Melbourne

Re: UTF8 characters not displayed in CIFS mount

Post by IanB » Fri Jun 08, 2018 15:06

Seems busybox's ls is somewhat crippled with utf-8 characters.

Try using "echo *" to try and reveal the hidden characters. i.e. :-

cd /media/hdd/movie/UTF8_Files
echo *

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Fri Jun 08, 2018 16:23

The firmware can (and does) display UTF-8-encoded Unicode code points. The font coverage may not be great, though, and it wouldn't surprise me if there were no Asian code points. I know that if a code point is displayed and there's no font entry for it, nothing is displayed (but IIRC it is logged in the debug log).

The display of UTF-8 encoded Unicode code points when using ssh or telnet is down to the capabilities of the respective servers and the terminal window display.
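
To see which code points a font would need to cover for a given filename, something like the following Python sketch could be used (assuming Python 3; the function name is illustrative only). It lists each distinct non-ASCII character with its code point and Unicode name.

```python
import unicodedata


def codepoints_needed(name):
    """Return (U+XXXX, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in name:
        if ord(ch) < 0x80 or ch in seen:
            continue  # skip ASCII and characters already recorded
        seen.append(ch)
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in seen]


for code, label in codepoints_needed("==冨田勲++.txt"):
    print(code, label)
```

Any code point in that list that has no glyph in the skin's font (or its replacement font) would be expected to vanish from the GUI.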

IanSav
Uber Wizard
Posts: 16846
Joined: Tue May 29, 2007 15:00
Location: Melbourne, Australia

Re: UTF8 characters not displayed in CIFS mount

Post by IanSav » Fri Jun 08, 2018 16:56

Hi Prl,
prl wrote:
Fri Jun 08, 2018 16:23
The firmware can (and does) display UTF-8-encoded Unicode code points. The font coverage may not be great, though, and it wouldn't surprise me if there were no Asian code points. I know that if a code point is displayed and there's no font entry for it, nothing is displayed (but IIRC it is logged in the debug log).
Yes, missing glyphs leave the display blank but log an error like "[eTextPara] unicode U+ 6f1 not present". (I am seeing a bit of this testing VirtualKeyBoard.)
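
A quick way to collect the missing code points from a debug log is sketched below in Python (assuming Python 3). The pattern is modelled only on the single log line quoted above, so the exact spacing and hex formatting are assumptions, and the second distinct sample line is invented for illustration.

```python
import re

# Accept any amount of whitespace after "U+", since the quoted line has a
# space there and other lines may not.
MISSING_GLYPH = re.compile(r"\[eTextPara\] unicode U\+\s*([0-9A-Fa-f]+) not present")


def missing_codepoints(log_text):
    """Return the sorted, de-duplicated code points reported as missing."""
    found = {int(m.group(1), 16) for m in MISSING_GLYPH.finditer(log_text)}
    return sorted(found)


SAMPLE_LOG = (
    "[eTextPara] unicode U+ 6f1 not present\n"
    "some unrelated line\n"
    "[eTextPara] unicode U+51a8 not present\n"  # hypothetical line
    "[eTextPara] unicode U+ 6f1 not present\n"
)
print(["U+%04X" % cp for cp in missing_codepoints(SAMPLE_LOG)])
```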

Regards,
Ian.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Fri Jun 08, 2018 17:07

IanSav wrote:
Fri Jun 08, 2018 16:56
... missing glyphs leave the display blank ...

I think it's more accurate to say that missing glyphs do not affect the display at all. They do not paint pixels, and they do not move the cursor.
IanSav wrote:
Fri Jun 08, 2018 16:56
... but log an error like "[eTextPara] unicode U+ 6f1 not present".

Thanks for confirming that.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Sun Jun 10, 2018 10:25

IanB wrote:
Fri Jun 08, 2018 15:06
Seems busybox's ls is somewhat crippled with utf-8 characters.

Try using "echo *" to try and reveal the hidden characters. i.e. :-

cd /media/hdd/movie/UTF8_Files
echo *

Code:

root@beyonwizt4:/media/hdd/movie/UTF8_Files# echo *
冨田勲 喜多郎 坂本龍一 ==冨田勲++.mp3
Interesting...

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Sun Jun 10, 2018 10:29

prl wrote:
Fri Jun 08, 2018 16:23
The firmware can (and does) display UTF-8-encoded Unicode code points.

Seems that it does - but why would "echo" show the Japanese glyphs and "ls" not?

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Sun Jun 10, 2018 11:25

stevebow wrote:
Sun Jun 10, 2018 10:29
prl wrote:
Fri Jun 08, 2018 16:23
The firmware can (and does) display UTF-8-encoded Unicode code points.

Seems that it does - but why would "echo" show the Japanese glyphs and "ls" not?

You'd need to look at the busybox code for ls and the glob library (which the shell would be using to expand the echo arguments).
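
To take both ls and the shell's globbing out of the picture, the directory entries can be read as raw bytes. This is a minimal Python sketch, assuming a Python 3 interpreter is available on the machine being tested; the function name is made up for illustration.

```python
import os


def raw_names(directory):
    """Return (bytes, hex string) for every entry in directory, sorted."""
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    entries = sorted(os.listdir(os.fsencode(directory)))
    return [(name, name.hex()) for name in entries]


if __name__ == "__main__":
    for name, hexed in raw_names("."):
        # Show the exact bytes first, then a best-effort UTF-8 decoding.
        print(hexed, name.decode("utf-8", errors="replace"))
```

If the hex column contains multi-byte sequences (bytes of 0x80 and above) for the affected names, the filesystem is storing them intact and only the listing tool is dropping them.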

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Sun Jun 10, 2018 11:39

I guess this can be chalked up to the T4's particular flavour of Busybox/ls not being able to properly return those characters.

What I am still mystified by though is how Putty is rendering those glyphs at all. The font used is Courier New, which contains no Asian language glyphs at all.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Sun Jun 10, 2018 12:21

prl wrote:
Fri Jun 08, 2018 16:23
The font coverage may not be great, though, and it wouldn't surprise me if there were no Asian code points. I know that if a code point is displayed and there's no font entry for it, nothing is displayed (but IIRC it is logged in the debug log).

The GUI skin I am using is easy-skin-aus-hd, and digging a little deeper I see the relevant font is tuxtxt.ttf, which is actually Vera Sans Mono Bold:
[Attached image: tuxtxt_Glyphs.png]

There aren't any Asian glyphs present - I very much suspect any further glyphs may have been cut to reduce filesize, which is quite understandable.

May I suggest if Unicode glyphs are not present in the target font, rather than simply rendering a 0x20, perhaps an alternative glyph could be substituted, the unicode number of which could be defined in the GUI's code, as that substitute glyph may not be present in other fonts.

Or, if the GUI can mix fonts mid string, a glyph from a font that is guaranteed (hah!) to be present across firmware releases.

For example, for tuxtxt.ttf the glyph U+25CA could be used (the open diamond). This particular glyph is present in many but not all fonts in /usr/share/fonts (I presume that's where all GUI related fonts live). I think the presence of a symbol would at least be better than a space in affected directory/filenames.
Last edited by stevebow on Sun Jun 10, 2018 12:25, edited 1 time in total.

IanSav
Uber Wizard
Posts: 16846
Joined: Tue May 29, 2007 15:00
Location: Melbourne, Australia

Re: UTF8 characters not displayed in CIFS mount

Post by IanSav » Sun Jun 10, 2018 12:24

Hi Stevebow,

I think it would be better to find more complete versions of the few fonts that are actually used on Enigma2.

Regards,
Ian.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Sun Jun 10, 2018 12:37

IanSav wrote:
Sun Jun 10, 2018 12:24
I think it would be better to find more complete versions of the few fonts that are actually used on Enigma2.

I was considering the limited flash storage space available to firmware, but now that I actually check :roll: :

Code:

root@beyonwizt4:~# df -h
Filesystem                Size      Used Available Use% Mounted on
ubi0:rootfs               1.8G    139.8M      1.6G   8% /
devtmpfs                563.7M      4.0K    563.7M   0% /dev
tmpfs                    64.0K         0     64.0K   0% /media
tmpfs                   563.9M    736.0K    563.1M   0% /var/volatile
/dev/sda1               931.2G    906.6G     24.6G  97% /media/hdd

While there is actually plenty of space available for a few more complete fonts, the problem remains that your desired font (e.g. Vera Sans Mono Bold) may simply not be available in a version with many more glyphs than those needed for Latin-based languages.

Investigating further, it appears that the Vera family of fonts is now under the auspices of Gnome. The font can now be found here: http://ftp.gnome.org/pub/GNOME/sources/ ... vera/1.10/ - it turns out that this is essentially the same as the font supplied with the T4.

There is a replacement project at https://dejavu-fonts.github.io/ , which adds many more glyphs to the original font, but still no Asian language glyphs. The fonts haven't been updated since July 2016, so I expect they never will be.

But short story long :) , I still feel it would be better for the GUI to substitute a symbol rather than a 0x20 for non-existent glyphs.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Sun Jun 10, 2018 13:31

stevebow wrote:
Sun Jun 10, 2018 12:21
May I suggest if Unicode glyphs are not present in the target font, rather than simply rendering a 0x20, perhaps an alternative glyph could be substituted, the unicode number of which could be defined in the GUI's code, as that substitute glyph may not be present in other fonts.

In the UI code, if there is no glyph for a Unicode codepoint, a space isn't used instead, nothing is used. As I said, "missing glyphs do not affect the display at all. They do not paint pixels, and they do not move the cursor."
stevebow wrote:
Sun Jun 10, 2018 12:21
For example, for tuxtxt.ttf the glyph 25CA could be used (the open diamond). This particular glyph is present in many but not all fonts in /usr/share/fonts

IMO, the correct replacement would be U+FFFD (REPLACEMENT CHARACTER), but, unfortunately, I suspect that that is missing in all fonts.

As far as the runtime cost of having more glyphs in the fonts goes, the font glyphs are managed through a cache, and so having more complete fonts shouldn't cause main memory bloat.

However, while the T4 and U4 have lots of firmware flash, the T2 and T3 only have 512MB.

The current fonts aren't very large: the ones used for general text display are ~150-350kB each.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Sun Jun 10, 2018 13:41

stevebow wrote:
Sun Jun 10, 2018 12:21
The GUI skin I am using is easy-skin-aus-hd, and digging a little deeper I see the relevant font is tuxtxt.ttf, which is actually Vera Sans Mono Bold:
[Attached image: tuxtxt_Glyphs.png]

tuxtxt.ttf is aliased to Console, and isn't much used in easy-skin-aus-hd (mostly for the Console screen, and not much elsewhere). The most widely used font alias in the skin is Regular, which is aliased to nmsbd.ttf. nmsbd.ttf is about 7 times bigger than tuxtxt.ttf.

The Regular font is aliased to md_khmurabi_10.ttf in Full-Metal-Wizard and to nmsbd.ttf in OverlayHD.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Sun Jun 10, 2018 15:17

Here are the contents of nmsbd.ttf (Nemesis Flatline Bold). It doesn't include any east Asian language glyphs.
[Attached images: five screenshots of the nmsbd.ttf glyph tables]

IanB
Wizard
Posts: 1550
Joined: Sat Jan 24, 2009 14:04
Location: Melbourne

Re: UTF8 characters not displayed in CIFS mount

Post by IanB » Sun Jun 10, 2018 15:26

stevebow wrote:
Sun Jun 10, 2018 11:39
...
What I am still mystified by though is how Putty is rendering those glyphs at all. The font used is Courier New, which contains no Asian language glyphs at all.
Yes that is weird, and not only putty but also the web browsers. The Windoze font system seems to have some sort of substitution algorithm. Matching by eyeball it seems to harvest the missing glyphs from Meiryo. Have a play with the Character Map tool. :twisted:

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Tue Jun 12, 2018 10:18

prl wrote:
Sun Jun 10, 2018 13:31
IMO, the correct replacement would be U+FFFD (REPLACEMENT CHARACTER), but, unfortunately, I suspect that that is missing in all fonts.

That would make a lot more sense. Out of interest, I checked all the font families in my Windows font directory, and only 6 families have a glyph (diamond with a question mark within) for that position, the minority by far.

As it happens, 3 of the PVR fonts do contain this glyph: Droid Sans Bold, the Open Sans family and the Roboto family.
prl wrote:
Sun Jun 10, 2018 13:31
However, while the T4 and U4 have lots of firmware flash, the T2 and T3 only have 512MB.

The current fonts aren't very large: the ones used for general text display are ~150-350kB each.

That would make it a little tight then for T2/T3. Some of those multilingual fonts run into the tens of MB.
prl wrote:
Sun Jun 10, 2018 13:41
tuxtxt.ttf is aliased to Console, and isn't much used in easy-skin-aus-hd (mostly for the Console screen, and not much elsewhere). The most widely used font alias in the skin is Regular, which is aliased to nmsbd.ttf.

You are quite right. My apologies to all for my Vera Sans tangent, I had my fonts mixed. :oops: A consequence of a dozen consoles/windows/crap open on the screen at once! :?

Anyhoo, I substituted Nemesis Flatline with a font that does contain a U+FFFD, just to confirm what you already know prl :) , that this glyph is indeed not rendered as a substitute for non-existent glyphs.

I also substituted Malgun Gothic (~13MB) from my Windows fonts directory. This is a good font to try because it contains both Latin and Japanese glyphs:
[Attached image: T4_UTF8_Japanese.png]

Success! Well, partially. Just to refresh, we should see this:

Code:

-rw-rw-rw-    1 admin    everyone         0 Jun  7 11:19 ==冨田勲++.txt
drwxrwsrwx    3 Steve    everyone      4096 Feb 18  2016 冨田勲/
drwxrwsrwx   13 Steve    everyone      4096 Apr 19  2016 喜多郎/
drwxrwsrwx    4 Steve    everyone      4096 Feb 18  2016 坂本龍一/
So even this particular font appears not to be complete. But at least this exercise has confirmed the issue is with the font and not the GUI.

I tried searching for a more complete Nemesis Flatline font, but the only references seem to be with other PVRs, including a Nemesis Flatline skin. This font also contains no copyright/source information, so I suspect it is something rolled together for a particular skin.

Would it be worth the effort (for completeness?) to modify the GUI to use U+FFFD to substitute non-existent glyphs, and either use the glyph from one of the other supplied fonts or add it to Nemesis Flatline?

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Tue Jun 12, 2018 10:46

IanB wrote:
Sun Jun 10, 2018 15:26
stevebow wrote:
Sun Jun 10, 2018 11:39
...
What I am still mystified by though is how Putty is rendering those glyphs at all. The font used is Courier New, which contains no Asian language glyphs at all.
Yes that is weird, and not only putty but also the web browsers. The Windoze font system seems to have some sort of substitution algorithm. Matching by eyeball it seems to harvest the missing glyphs from Meiryo. Have a play with the Character Map tool. :twisted:

Yes, Windows appears to be playing some sort of trickery. If you paste the ls listing from my NAS into Notepad, then change Notepad's font to something silly (like Cooper Black), the Latin characters are all changed but not the Japanese. Pasting into a Word document shows the font to be MS Gothic. I am not sure how this trickery is managed in Windows as I can't find a setting for it.

prl
Wizard God
Posts: 32703
Joined: Tue Sep 04, 2007 13:49
Location: Canberra; Black Mountain Tower transmitters

Re: UTF8 characters not displayed in CIFS mount

Post by prl » Tue Jun 12, 2018 12:59

stevebow wrote:
Tue Jun 12, 2018 10:18
Anyhoo, I substituted Nemesis Flatline with a font that does contain a U+FFFD, just to confirm what you already know prl :) , that this glyph is indeed not rendered as a substitute for non-existent glyphs.

And there doesn't seem to be a built-in way to do that. The font mechanism does allow fallbacks, by setting a replacement font as a class variable in the eTextPara object. If that's set and a character isn't found in the normal font for the eTextPara, the character is looked up again in the replacement font. There's a limited override for this that allows you to specify Unicode codepoints that will always use their replacement font glyphs even if there is a glyph for the codepoint in the normal font for the eTextPara.

The replacement font mechanism is used in the skins (they have replacement="1" in their alias), easy-skin-aus-hd & OverlayHD use ae_AlMateen.ttf as the replacement font. Full-Metal-Wizard uses md_khmurabi_10.ttf as the replacement font, but that is pretty pointless, since it's also used as the font file in all normal font aliases except Console :roll:.

But even where ae_AlMateen.ttf is the replacement font, it's not likely to help all that much, because it's one of the smaller fonts (total file size). It's 12.5kB, while nmsbd.ttf is 35.3kB and md_khmurabi_10.ttf is 24.8kB (I know that's a very crude way to measure font size).

But if a replacement is set up that has east Asian (or south- or southeast-Asian) glyphs then they should be available in all GUI text displays that use Asian codepoints.

There isn't any mechanism to allow a skin to set forced replacement codepoints, even though the underlying C++ code supports that.

It wouldn't be hard, though, to add code for a "last ditch attempt" to supply a glyph if none of those mechanisms found a glyph.

All the font glyph replacement stuff is done in eTextPara::renderString() in lib/gdi/font.cpp.
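
The lookup order described above, plus the proposed "last ditch" step, can be sketched as a toy model in Python. This is not the code in lib/gdi/font.cpp: fonts are modelled as plain sets of code points, every name here is hypothetical, the U+FFFD step is the suggestion from this thread and not existing behaviour, and falling back to the normal font when a forced code point is missing from the replacement font is an assumption of the sketch.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD


def choose_glyph(codepoint, normal, replacement, forced=frozenset(), last_ditch=False):
    """Return (font name, code point) to draw, or None if nothing is drawn.

    normal and replacement are (name, set of code points) pairs standing in
    for real font files.
    """
    normal_name, normal_cps = normal
    repl_name, repl_cps = replacement

    # Forced code points prefer the replacement font even when the normal
    # font has a glyph (falling through otherwise is an assumption).
    if codepoint in forced and codepoint in repl_cps:
        return (repl_name, codepoint)
    if codepoint in normal_cps:
        return (normal_name, codepoint)
    if codepoint in repl_cps:
        return (repl_name, codepoint)
    if last_ditch:
        # Proposed extra step: draw U+FFFD from whichever font has it.
        for name, cps in (normal, replacement):
            if REPLACEMENT_CHARACTER in cps:
                return (name, REPLACEMENT_CHARACTER)
    # Behaviour as described in the thread: no pixels, cursor not moved.
    return None


regular = ("Regular", {ord("A"), ord("=")})
fallback = ("Replacement", {0x0627, REPLACEMENT_CHARACTER})

print(choose_glyph(ord("A"), regular, fallback))
print(choose_glyph(0x7530, regular, fallback))
print(choose_glyph(0x7530, regular, fallback, last_ditch=True))
```

In this model the last-ditch step only helps if at least one of the two fonts actually carries a U+FFFD glyph, which matches the concern raised earlier about that glyph being absent from most of the fonts.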

IanSav
Uber Wizard
Posts: 16846
Joined: Tue May 29, 2007 15:00
Location: Melbourne, Australia

Re: UTF8 characters not displayed in CIFS mount

Post by IanSav » Tue Jun 12, 2018 14:24

Hi,

The OpenPLi people have been pointing me to DejaVu as a more complete font set. I am experimenting with it at the moment and issues with Persian have been mostly resolved.

Here is a link to the font if you want to try it out yourself: https://dejavu-fonts.github.io/

This font is going to be added to OverlayHD. ;)

Regards,
Ian.

stevebow
Master
Posts: 482
Joined: Thu Sep 03, 2015 11:21
Location: Sydney

Re: UTF8 characters not displayed in CIFS mount

Post by stevebow » Thu Jul 05, 2018 13:34

prl wrote:
Tue Jun 12, 2018 12:59
It wouldn't be hard, though, to add code for a "last ditch attempt" to supply a glyph if none of those mechanisms found a glyph.

It would obviously be better to render something for missing glyphs than nothing at all (visually), as it stands now.
