Thoughts on Unicode in India

I have been thinking about multilingual publishing on the Web and in other digital forms, specifically in Indic languages, in the context of my work at Pratham Books and around Wikipedia. The more I think about it, the more I believe it to be crucial both to preserving language and culture and to being able to publish and spread knowledge and culture.

These are early thoughts that I will refine into a white paper over the next few months, and I would welcome feedback on them.

  1. While there are many ways to achieve a legal framework for inter-operable content (CC, GFDL, PD, the Copyright Act Amendment for the Print Impaired, etc.), there needs to be a technical framework for such interoperability as well.
  2. Given that we (Pratham Books) publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and Unicode is a global standard.
  3. Given India’s push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used in the creation of Indic content, because content in non-Unicode fonts is otherwise a huge barrier to conversion to print-friendly formats.
  4. Unicode, being an open global standard, helps keep content accessible in the future and avoids proprietary font and vendor lock-in.
  5. The limitation is the lack of high-quality and varied OpenType Indic Unicode typefaces that are optimised for both screen and print.
  6. Given the importance of linguistic diversity to India’s cultural heritage, it is imperative that greater attention be paid to the development of such fonts under licenses that allow for free re-use and for fixing any issues that arise in the fonts.
  7. The Govt. should fund the open development of at least 5 such fonts for each of the 22 Constitutionally recognised languages and make these available not just for free, but under a free license that allows re-use and improvement as well.
  8. The GoI has recognised this and notified Unicode 5.1.0 as the de-facto standard for all eGovernance projects. This standard needs to be more widely adopted for all Government digital projects and any software or content procurement as well.
  9. Use of Unicode allows for universal search (within a page, web search, etc.), sorting and indexing, text-to-voice synthesis and machine translation, and for greater and better search engine optimisation. Whether it also reduces bandwidth/storage is less clear and needs checking.
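
On the bandwidth/storage part of point 9, a rough way to check is to count the bytes the same kind of text needs under the common Unicode encodings. What follows is only a minimal sketch: the sample words and the helper name `encoded_sizes` are my own illustrations, not taken from any standard or tool.

```python
# Compare how many bytes a short word needs under two Unicode encodings.
# The sample words and the helper name are illustrative assumptions.
samples = {
    "English": "book",
    "Hindi": "किताब",
}


def encoded_sizes(text):
    """Return the code point count and the byte length in UTF-8 and UTF-16."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte order mark is counted
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


for label, text in samples.items():
    print(label, text, encoded_sizes(text))
```

Each Devanagari character takes three bytes in UTF-8 and two in UTF-16, while an ASCII letter takes one byte in UTF-8. So Unicode text in an Indic script is not smaller than the same text in a legacy single-byte font encoding. The case for Unicode rests on interoperability, search and longevity rather than on size.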

I have to add that I am not an expert in this field and may have got things completely wrong – I really do need your help in working on this, please.

Update on 22.02.2011

Thanks to Santhosh over at the Wikimedia India mailing list, I have learnt much, in particular that Unicode isn’t a font as such but a method of encoding information.

From what I understand – there are three components:

  1. Input (Different types of keyboard layouts are used, but these are independent of the method of encoding.)
  2. Encoding and storing the input (ASCII is the older method. Unicode is the standard.)
  3. Representing visually, for the human user, what has been input and encoded. (Fonts or typefaces; these are, to an extent, independent of the encoding method used.)
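
To make the second component concrete, here is a small sketch using only Python's standard library. It lists the Unicode code point and official character name behind each character of a Hindi word, then stores and reloads the text. The sample word and the helper name `describe` are my own choices for illustration.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode character name) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


word = "हिंदी"  # sample word, chosen only for illustration

# Encoding: every character is a numbered code point, whatever keyboard
# layout produced it and whatever font will later draw it.
for code_point, name in describe(word):
    print(code_point, name)

# Storing: the code points are serialised to bytes (UTF-8 here) and can be
# read back unchanged on any platform that understands Unicode.
stored = word.encode("utf-8")
assert stored.decode("utf-8") == word
```

Nothing in this snippet mentions a font. The stored bytes stay the same whichever Unicode-compliant typeface is later used to display them, which is the sense in which the third component is largely independent of the second.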

An excellent resource is the “List of available Indic fonts for scripts encoded in Unicode”: http://indlinux.org/wiki/index.php/IndicFontsList

Update on 17.03.2011

To add: http://generalprop.newschallenge.org/SNC/ViewItem.aspx?pguid=900d111c-1475-4d78-8fab-789663818724&itemguid=696ce9df-02c5-4585-8062-a96e57ef50f7

About gkjohn

Recovering lawyer, erstwhile entrepreneur, pretend polymath, hopeful zookeeper and future dilettante and farmer of organic strawberries. Work at @aksharadotorg and @klpdotorg. Previously at @prathambooks. Was a @tedfellow.

6 Responses to Thoughts on Unicode in India

  1. Anonymous says:

    I don’t subscribe to the idea that the Government should be in the business of funding development of OpenType, Unicode-compliant fonts. Perhaps that stems from my belief that the Government per se shouldn’t be in the business of selling milk, bread and all of that. It generally leads to centralized planning and mis-utilization of resources.

    The role of the Government in this case should be limited to providing policy guidance and thereafter ensuring that it hosts a reference implementation (there is a reference font, for example, but no way to re-use it or even test against it for measuring variance). The typical problem with fonts in India is somewhat linked to the fact that the Ministry of IT (TDIL) is the sole representative on the Unicode Committee and has, more often than not, not indulged in a consultative process before forwarding a decision on Unicode. With the availability of the Unicode charts, it should not be difficult to check the existing fonts for inconsistencies and thereafter patch them. Similarly, it should be trivial to work out new keyboard layouts (especially the various typewriter layouts that are requested ever so often) and package them to be made available via Linux distributions.

    Specific to fonts, it is not only display-ready fonts that are required. Investment is required to have print-ready fonts as well. The lack of print-ready fonts makes it extremely difficult to bring aesthetics into the publication of Indic content. A way of resolving this, without waiting for the Government to get involved, is to look at a Kickstarter model: identify developers who are willing to put in time and effort to develop fonts in the Open Source way and thereafter open up Kickstarter pledges for each font.

    The entire set of events from input through storage and rendering to printing requires fixes and checks against known open standards. And at each step there is a requirement for a policy framework that vendors of software applications should adhere to, especially when supplying software for public consumption.

  2. Anonymous says:

    Sankarshan, thank you very much for your detailed comments. I understand, and would agree with you, on the Government support though I am coming at this from a public policy kind of angle so have left it in, for now. I agree with you, and have included, print fonts as well. Will add the reference font issue – thanks for pointing it out.

  3. Anonymous says:

    Took me a while to recall the name of the reference font but here it is – http://tdil.mit.gov.in/download/SakalBharati.htm

    Additionally, I’d request that you give http://sankarshan.randomink.org/blog/2007/08/22/notes-on-l10n-and-language-technology-recommendations/ and http://sankarshan.randomink.org/blog/2007/11/02/respectlovecontributorsusershow-one-can-progress-with-indic-bits/ a quick read. These are older entries but the situation is somewhat continuing.

    In short, Indic, whether for print/publishing or for desktop usage, requires a structured effort in the areas of input, keyboard layout, a11y, storage of data and rendering of data. In each of these areas there are existing open standards that need to be complied with, and there are certain changes that need to be proposed to the standards. Unicode forms a bulwark because the input and rendering pieces build themselves according to what Unicode stipulates.

  4. Anonymous says:

    Thank you, Sankarshan. On the reference font – what is its function? As a reference, yes, but how so? What does one check against it? And thanks for the two links – learned much from them. Will you be in Bangalore any time soon? I could spend a long time with you and would learn much.

  5. Anonymous says:

    Some small mistakes: In the fifth point, it should be "OpenType", not "open type". And people usually spell it as "typeface" instead of the separated "type face". :)

    I’m still quite curious about how the idea that "Use of Unicode will significantly reduce bandwidth/storage as they are more efficient" came about. Is it an idealised expectation of Unicode? Or are people being misled by the government?

  6. Anonymous says:

    Hai – thanks for pointing out the continuing fallacy I was propagating on the efficiency bit. I don’t quite remember where I got that bit of information from. Fixed the rest.
