I have been thinking about multilingual publishing on the Web and in other digital forms, specifically in Indic languages, in the context of my work at Pratham Books and around Wikipedia. The more I think about it, the more I believe it is crucial both to preserving language and culture and to publishing and spreading knowledge.
These are early thoughts that I will refine into a white paper over the next few months, and I would welcome feedback on them.
- While there are many ways to achieve a legal framework for interoperable content (CC, GFDL, PD or the Copyright Act Amendment for the Print Impaired), there also needs to be a technical framework for such interoperability.
- Given that we (Pratham Books) publish in Indian languages, using Unicode fonts is the only way to achieve cross-platform interoperability, and Unicode is a global standard.
- Given India’s push towards copyright reform for the print impaired, it is imperative that Unicode fonts be used to create Indic content, because content created otherwise is a huge barrier to conversion into formats accessible to print-impaired readers.
- Unicode, being an open global standard, helps keep content accessible in the future and avoids lock-in to proprietary fonts and vendors.
- The limitation is the lack of high-quality, varied OpenType Indic Unicode typefaces that are optimised for both screen and print.
- Given the importance of linguistic diversity to India’s cultural heritage, it is imperative that greater attention be paid to developing such fonts under licenses that allow free re-use and allow anyone to fix issues in the fonts as they arise.
- The Government should fund the open development of at least 5 such fonts for each of the 22 languages listed in the Eighth Schedule of the Constitution, and make these available not just free of charge, but under a free license that permits re-use and improvement as well.
- The GoI has recognised this and notified Unicode 5.1.0 as the standard for all eGovernance projects. This standard needs to be adopted more widely, for all Government digital projects and for any software or content procurement as well.
- Using Unicode can significantly reduce bandwidth and storage, and it enables universal search (within a page, in web search, etc.), sorting and indexing, text-to-speech synthesis and machine translation, as well as better search engine optimisation.
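To illustrate the search point, here is a minimal Python sketch using only the standard `unicodedata` module. It assumes search is done on plain Unicode strings; the helper names `normalise` and `contains` are hypothetical, for illustration only. One practical detail is that the same Devanagari letter can be typed as different code-point sequences, so both the text and the query are normalised (to NFC here) before comparing.

```python
import unicodedata

def normalise(text: str) -> str:
    """Bring canonically equivalent sequences to a single form (NFC),
    so that text which looks identical also compares equal."""
    return unicodedata.normalize("NFC", text)

def contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: font-independent substring search over Unicode text."""
    return normalise(needle) in normalise(haystack)

# The same letter entered two ways: precomposed U+0929 (ऩ) versus
# U+0928 (न) followed by the nukta sign U+093C.
precomposed = "\u0929"
decomposed = "\u0928\u093C"

print(precomposed == decomposed)                        # False: different code points
print(normalise(precomposed) == normalise(decomposed))  # True: canonically equivalent

# "Many languages are spoken in India."
sentence = "भारत में अनेक भाषाएँ बोली जाती हैं"
print(contains(sentence, "भाषाएँ"))  # True
```

Because the search works on code points rather than on glyphs, it gives the same result whichever font is later used to display the text. Sorting is a harder problem: plain code-point order is not a correct linguistic collation for every Indic language, so a real system would need a proper collation library.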
I have to add that I am not an expert in this field and may have got things completely wrong. I really do need your help in working on this, please.
Update on 22.02.2011
Thanks to Santhosh over at the Wikimedia India mailing list, I have learnt a great deal. In particular, Unicode isn’t a font as such but a standard for encoding characters.
From what I understand, there are three components:
- Input (Different keyboard layouts are used, but they are independent of the encoding method.)
- Encoding and storing the input (Older Indic content often used ISCII or font-specific encodings built on ASCII. Unicode is the standard.)
- Rendering what has been input and encoded visually for the human reader. (Fonts or typefaces, which are, to an extent, independent of the encoding method used.)
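To make the middle layer concrete, here is a minimal Python sketch (standard library only; the choice of the word हिन्दी is just an example). It prints the Unicode code points that are actually stored for the word and the UTF-8 bytes used to save or transmit them. Nothing here depends on a keyboard layout or a font. Note that the vowel sign I (U+093F) is drawn to the left of ह on screen but stored after it: placing it correctly is the font's (rendering) job, not the encoding's.

```python
import unicodedata

# "हिन्दी" (Hindi), written as escapes so the exact code points are explicit.
word = "\u0939\u093F\u0928\u094D\u0926\u0940"

# Encoding layer: the text is a sequence of code points in logical
# (typing/reading) order, whichever keyboard produced it and whichever
# font will later display it.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same code points serialised as UTF-8 bytes for storage or transmission.
encoded = word.encode("utf-8")
print(len(word), "code points ->", len(encoded), "bytes in UTF-8")

# Decoding the bytes gives back exactly the same text.
assert encoded.decode("utf-8") == word
```

This prints DEVANAGARI LETTER HA, VOWEL SIGN I, LETTER NA, SIGN VIRAMA, LETTER DA and VOWEL SIGN II, followed by "6 code points -> 18 bytes in UTF-8".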
An excellent resource is the “List of available Indic fonts for scripts encoded in Unicode”: http://indlinux.org/wiki/index.php/IndicFontsList
Update on 17.03.2011