Does the ilreq ABNF work for consonant clusters that don't form conjuncts?

This issue is carried over from an unanswered issue at https://github.com/w3c/ilreq/issues/31

In the following I distinguish between consonant _clusters_ and _conjuncts_: the latter involve special shaping and the hiding of the VIRAMA, which, as far as I can tell, is where the difference lies.

Section [2. Indic orthographic syllable boundaries](https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries) of the ilreq doc contains a set of ABNF rules for indicating syllable boundaries, which are referred to for many applications, such as vertical text, line wrapping, initial-letter styling, etc. The examples include Tamil; however, with the exception of க்ஷ, ஶ்ரீ, and ஸ்ரீ, modern consonant clusters in Tamil don't form conjuncts in the same way as, say, Devanagari or Bengali. Instead, Tamil simply places a pulli (virama) dot above a consonant that has no following vowel, e.g. கேட்டுக்.

Given a Tamil word such as யாவற்றையும் (yāvaṟṟaiyum), should the break points for text segmentation, line breaking, drop letters (if the cluster appeared at the start of the text), letter spacing in horizontal text, and vertical text representation conform to this:

A ![screen shot 2017-12-06 at 09 21 15](https://user-images.githubusercontent.com/4839211/33654128-132e6b6e-da67-11e7-935d-5c584a6e9664.png)

or this?

B ![screen shot 2017-12-06 at 09 21 32](https://user-images.githubusercontent.com/4839211/33654147-1f15f654-da67-11e7-8222-aa3d9eac5f5a.png)

The latter is what the ilreq document currently suggests.
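To make the two candidate segmentations concrete without relying on the screenshots, here is a minimal Python sketch. It assumes that A breaks after a consonant carrying a visible pulli and that B keeps the whole cluster together, as the ilreq rules suggest. The function `segment` and its `join_across_virama` flag are hypothetical names for illustration; the logic is a simplified stand-in and not the ilreq ABNF itself.

```python
import unicodedata

PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA (pulli)

# யாவற்றையும் (yāvaṟṟaiyum), spelled out as code points to avoid ambiguity
WORD = "\u0baf\u0bbe\u0bb5\u0bb1\u0bcd\u0bb1\u0bc8\u0baf\u0bc1\u0bae\u0bcd"


def segment(text, join_across_virama):
    """Split text into units (hypothetical helper, not the ilreq grammar).

    Combining marks (vowel signs, pulli) always attach to the preceding unit.
    If join_across_virama is True, a letter that follows a pulli also attaches
    to the preceding unit, so a whole consonant cluster becomes one unit.
    """
    units = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        after_virama = bool(units) and units[-1].endswith(PULLI)
        if units and (is_mark or (join_across_virama and after_virama)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


for label, join in (("A-like", False), ("B-like", True)):
    print(label, " | ".join(segment(WORD, join)))
```

Under these assumptions the A-like result has six units, with ற் and றை separate, while the B-like result has five, with ற்றை as a single unit. Neither variant can tell whether a font actually renders a conjunct, which is the gap this issue is asking about.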

A similar question arises in other scripts when a font doesn't produce a certain conjunct, for one reason or another, or when a ZWNJ is added to prevent a conjunct from forming. Where are the break points for the following? Are they:

C ![screen shot 2017-12-06 at 09 50 50](https://user-images.githubusercontent.com/4839211/33655497-1b03f698-da6b-11e7-9cf0-83fe60fa5b69.png)

or

D ![screen shot 2017-12-06 at 09 51 06](https://user-images.githubusercontent.com/4839211/33655512-21ec0a5e-da6b-11e7-8ef4-de68ec880481.png)

Given that for a more typical rendering of the text the break points, as described in the ilreq doc, would be:

E ![screen shot 2017-12-06 at 09 51 18](https://user-images.githubusercontent.com/4839211/33655532-342d21ee-da6b-11e7-8d81-5cb031529159.png)

UAX#29 currently doesn't produce E, which is what the ilreq doc requires, for Devanagari; it produces something more like C. But UAX#29 is about to change so that, by default, a whole consonant cluster will be treated as a unit (i.e. E). The effect of that upcoming change is not completely clear, however, for scripts like Tamil, or for Devanagari when the virama is showing. I'm looking for expert advice on what would be expected in those situations.
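The ZWNJ case can be stated as a policy choice in code. The sketch below is illustrative only: the sequence KA + VIRAMA + ZWNJ + SSA is a hypothetical example (not necessarily the text in the screenshots), and `clusters` and `zwnj_breaks` are made-up names. It joins consonants across a virama and lets the caller decide whether a ZWNJ after the virama ends the unit.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

PLAIN = "\u0915\u094d\u0937"            # KA + VIRAMA + SSA, normally a conjunct
WITH_ZWNJ = "\u0915\u094d\u200c\u0937"  # same, with ZWNJ so the virama stays visible


def clusters(text, zwnj_breaks):
    """Group code points into units (hypothetical helper for illustration).

    Marks and ZWNJ attach to the preceding unit. A letter joins the preceding
    unit if that unit ends in a virama, or ends in virama + ZWNJ when
    zwnj_breaks is False.
    """
    units = []
    for ch in text:
        prev = units[-1] if units else ""
        is_mark = unicodedata.category(ch).startswith("M")
        joins_after_virama = prev.endswith(VIRAMA)
        joins_after_zwnj = prev.endswith(VIRAMA + ZWNJ) and not zwnj_breaks
        if prev and (is_mark or ch == ZWNJ or joins_after_virama or joins_after_zwnj):
            units[-1] += ch
        else:
            units.append(ch)
    return units


print(len(clusters(PLAIN, zwnj_breaks=True)))       # conjunct kept as one unit
print(len(clusters(WITH_ZWNJ, zwnj_breaks=True)))   # ZWNJ treated as a break
print(len(clusters(WITH_ZWNJ, zwnj_breaks=False)))  # ZWNJ ignored for segmentation
```

A rule like this only sees code points. When a font simply lacks the conjunct glyph and shows the virama, the code points are identical to `PLAIN`, so no code-point-based rule can produce a different answer for that case.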

A reliance on the shape of the text is not described in the ilreq document, which I think is problematic. (It's also problematic for the general concept of grapheme clusters in Unicode, which should count the whole of a conjunct sequence such as ஶ்ரீ as one unit, but not a Tamil consonant cluster such as த்தை.)
