smufl-discuss

[smufl-discuss] Re: Discussing Unicode and encoding dilemmas


David Webber

From: Grzegorz Rolek

> Glyph index is fundamental to any TrueType-based font implementation (thus
> also OpenType). It's the number by which the font engine, and the font
> itself, recognizes which glyph is which in every glyph-related operation.
> This is different to a code point (a character), which is a text encoding
> thing. Any particular code point is mapped to a particular glyph index in
> a font.

OK - I wasn't 100% sure that this is what you meant.

> For example, in Unicode encoding, a code point U+0020, the space
> character, is normally mapped to a glyph representing space, but that
> glyph can be identified within the font with any index the font developer
> wants. This mapping, and there can be several of these for different text
> encodings for any given font, are all contained in the font.<

In my own experience of creating fonts, I got the impression that there
was (in any given font) a 1-1 mapping of code-point onto 'glyph index' and
that you can't give symbols the same code point if they have a different
glyph index. But I haven't researched it.

> This exact code point (character) versus glyph distinction makes it
> possible to have more glyphs in the font than any particular text encoding
> requires. I believe the API you're using is able to pick up any glyph from
> a font solely by its index, whether it's mapped to a code point
> (character) or not. <

I don't think so. Windows has two basic sets of APIs: one for use with
fonts using old-fashioned code pages, and another for use with Unicode
(UTF-16).

The first set takes text strings in which each character is a single byte,
and so you can't have more than 256 of them. In this case, I believe the
single-byte characters are effectively glyph indices - at least for
old-fashioned Windows symbol fonts.

The second set (preferred these days) gets its characters from UTF-16LE
encoded code points (and I'd have to check which APIs are happy with
surrogate pairs).
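
To make the surrogate-pair question concrete, here is a minimal Python sketch (not Windows API code) of how a code point outside the Basic Multilingual Plane, such as U+1D11E MUSICAL SYMBOL G CLEF, turns into two 16-bit code units in UTF-16LE, while U+0020 stays a single unit. The helper name `utf16_units` is made up for this illustration.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of `text` when encoded as UTF-16LE."""
    data = text.encode("utf-16-le")
    # Each UTF-16 code unit is 2 bytes; "<H" reads little-endian unsigned 16-bit values.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

space = utf16_units("\u0020")       # a BMP character: one code unit
g_clef = utf16_units("\U0001D11E")  # beyond the BMP: a surrogate pair

print([hex(u) for u in space])
print([hex(u) for u in g_clef])
```

An API that treats every 16-bit unit as a complete character would see the G clef as two separate "characters", which is why surrogate support has to be checked for each API.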

I can't see that a computer program can ask for a Unicode character by
specifying the glyph index: a different font could surely be arranged to
have different glyph indices corresponding with the same code point, and the
program using it would never know. [One of my sample files has a Balkan
song with a title in a single text string in both Latin and Cyrillic
letters: I can change the font and, as long as it supports both, I get the
same text. I know the code points of the letters but nobody has told me
what the internal glyph indices are.]
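
The point can be sketched with a toy model in Python. The two tables below are invented for illustration (they are not taken from any real font) and stand in for the code-point-to-glyph-index mapping each font carries; `glyph_indices` is likewise a hypothetical helper.

```python
# Two hypothetical fonts: same code points, different internal glyph indices.
FONT_A_CMAP = {0x0020: 3, 0x0041: 36, 0x0416: 570}
FONT_B_CMAP = {0x0020: 1, 0x0041: 4, 0x0416: 212}

def glyph_indices(text, cmap, missing=0):
    """Look up each character's code point in the font's mapping.

    Glyph index 0 is used for unmapped characters, following the TrueType
    convention that index 0 is the 'missing glyph'.
    """
    return [cmap.get(ord(ch), missing) for ch in text]

text = "A \u0416"  # Latin A, space, Cyrillic Zhe: one string, two scripts

print(glyph_indices(text, FONT_A_CMAP))
print(glyph_indices(text, FONT_B_CMAP))
```

The program only ever stores the code points in `text`. Swapping the font changes the indices that come out of the lookup, but the text itself is unchanged, which matches the behaviour described for the Latin and Cyrillic title.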

> Still, this is pretty low-level. Font features are a higher level
> mechanism for picking up and shuffling around such unencoded, that is, not
> mapped to any code point, glyphs in the text stream. But yes, these are an
> addition to bare TrueType fonts, although they're supported on most, if
> not all, modern platforms. Especially on Microsoft's platforms, as it's
> the one who has developed, in collaboration with Adobe, the whole
> technology more than a decade ago.<

I am not (yet) aware of the range of what is possible with OpenType
technology - I have only created TrueType fonts in the past. But my
instinct is to keep things simple. The idea that you have nearly all of
what you need to write music in a simple form, but you have to use more
complicated techniques to get just one or two of the symbols, just feels
horribly wrong, especially when there are code points occupied by symbols
(like upside-down clefs) which are neither use to man nor beast :-(

I understand that fonts can be designed so that, e.g., whenever "fi" is
encountered the two characters are automatically replaced by a single
ligature character, even when there is no code point for the composite
character. But the clef vs clef-change choice is a very different case from
that: sometimes you want one and sometimes the other.
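
As a rough sketch of the "fi" case, the following Python toy replaces a run of glyphs with a single ligature glyph that has no code point of its own. It only illustrates the idea and is not how any particular font engine implements it; the glyph names, the rule table and `apply_ligatures` are all assumptions made for the example.

```python
def apply_ligatures(glyphs, rules):
    """Replace each matching run of component glyphs with its ligature glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        for components, ligature in rules:
            n = len(components)
            if tuple(glyphs[i:i + n]) == components:
                out.append(ligature)  # one unencoded glyph replaces the run
                i += n
                break
        else:
            out.append(glyphs[i])     # no rule matched at this position
            i += 1
    return out

# Hypothetical rule: the glyphs "f" then "i" become the single glyph "f_i".
RULES = [(("f", "i"), "f_i")]

print(apply_ligatures(["f", "i", "n", "e"], RULES))
print(apply_ligatures(["f", "a", "n"], RULES))
```

The substitution here fires on every match with no input from the program, which is exactly what a clef versus clef-change choice cannot rely on.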

Dave

David Webber
Mozart Music Software
http://www.mozart.co.uk/
