Mondes iranien et indien


Tamil Epigraphy Database project

Thursday 17 November 2016


Tamil Epigraphy - Lexical and Grammatical Database with Searchable Engine
Collaboration between Tamil Virtual Academy, Chennai (India) & UMR 7528 Mondes iranien et indien (France)

Duration: 2 years (August 2016 - July 2018)

Principal investigator
Appasamy Murugaiyan, EPHE-UMR 7528 Mondes iranien et indien, France

Jan Kucera, Institute of South and Central Asia, Charles University, Prague, Czech Republic
Dr R. Poongundran, Epigraphist, Tamil Nadu, India
Dr Vasu Renganathan, Centre for South Asian Studies, University of Pennsylvania, USA


Description and structure of the database
This pilot project aims to make Tamil epigraphic texts available as a common resource for further analysis and exploratory research. Upon completion, the database will serve the needs of researchers and general readers who are interested in probing the history of the Tamils and their culture through Tamil epigraphical data. The database is conceived as a multidisciplinary tool that will cater to the needs of researchers from a number of disciplines in the humanities and social sciences, including history, social anthropology, geography, economics, art history, architecture, linguistics, literature, religion and other related disciplines.

The proposed database will draw its records from Tamil inscriptions on a historical basis and will present all the metadata needed for users to locate each inscription chronologically and geographically. It will display each record in a number of formats, including digital photograph, estampage, typescript, text in the original script (vaṭṭeḻuttu, tamiḻ, grantha) and text in modern Tamil script, along with a suitable transliteration (as in the Madras University Tamil Lexicon). Each text will be accompanied by a translation in English. All of these kinds of data will be stored in a relational database in such a manner that the information can be retrieved from a number of different perspectives, as envisioned by the end users.
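
The relational layout described above could be sketched as follows. This is only an illustration under stated assumptions: the table and column names (`inscription`, `representation`, `kind`), the date-range fields and the placeholder row are hypothetical and are not taken from the project, and Python's built-in `sqlite3` module stands in for the MySQL database that the project plans to use.

```python
import sqlite3

# Hypothetical schema: one row of metadata per inscription, and one row
# per available representation (photograph, transliteration, ...).
SCHEMA = """
CREATE TABLE inscription (
    id         INTEGER PRIMARY KEY,
    source     TEXT NOT NULL,   -- publication the record is drawn from
    place      TEXT,            -- find-spot, for geographic lookup
    year_from  INTEGER,         -- earliest plausible date (CE)
    year_to    INTEGER          -- latest plausible date (CE)
);
CREATE TABLE representation (
    id              INTEGER PRIMARY KEY,
    inscription_id  INTEGER NOT NULL REFERENCES inscription(id),
    kind            TEXT NOT NULL CHECK (kind IN (
        'photograph', 'estampage', 'typescript', 'original_script',
        'modern_tamil', 'transliteration', 'translation_en')),
    content         TEXT NOT NULL  -- the text itself, or a path to an image
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder inscription."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO inscription VALUES (1, 'placeholder source', "
        "'placeholder place', 600, 650)"
    )
    conn.executemany(
        "INSERT INTO representation (inscription_id, kind, content) "
        "VALUES (?, ?, ?)",
        [
            (1, "transliteration", "placeholder transliteration"),
            (1, "translation_en", "placeholder translation"),
        ],
    )
    conn.commit()
    return conn


def representations_between(conn, kind, start, end):
    """Return (inscription id, content) pairs of one kind for inscriptions
    whose date range overlaps the interval [start, end]."""
    rows = conn.execute(
        "SELECT i.id, r.content FROM inscription i "
        "JOIN representation r ON r.inscription_id = i.id "
        "WHERE r.kind = ? AND i.year_from <= ? AND i.year_to >= ? "
        "ORDER BY i.id",
        (kind, end, start),
    )
    return rows.fetchall()
```

Keeping every format in a single `representation` table, rather than one column per format, is one possible design choice: a new format can then be added without altering the table structure.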

The search engine
Text will be entered in Unicode, and the system will evolve as more data is added on an ongoing basis. The system will support searching in several ways: a) in any of the following Unicode scripts: vaṭṭeḻuttu, grantha, classical Tamil and modern Tamil, and b) in either Tamil or English. After segmentation, every item will be tagged for part of speech (POS) and marked for a number of semantic (sub-)fields. A multilevel search can then be carried out on a given lexical item, a given grammatical category or a semantic (sub-)field. The search engine will enable users to look for texts containing any given grammatical category or any other related attribute available in the database. The proposed search engine and relational database are thus designed to be as exhaustive as possible, so that all of the information in the database is accessible from a multidisciplinary perspective.

The database will be stored in the open-source database MySQL, and the search engine will be built in the programming language PHP with AJAX technology. A hosting company, like <>, will be used to host the database and application during the development period, before the final version is moved to the servers of Tamil Virtual Academy <>. The search engine used extensively to search Tamil literary texts, as in <>, will be considered as a template for this project.
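
The multilevel search described above could look roughly like the sketch below. It assumes a hypothetical `token` table in which each segmented item carries a lemma, a POS tag and a semantic field. The table layout, the tag values and the sample rows are invented for illustration and are not drawn from any inscription or from the project itself. Python's `sqlite3` again stands in for the planned MySQL and PHP stack. Normalising the text to Unicode NFC is an added assumption, so that composed and decomposed diacritics in transliterated forms match each other.

```python
import sqlite3
import unicodedata


def nfc(text):
    """Normalise to Unicode NFC so composed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", text)


def build_token_db(rows):
    """Load (inscription_id, position, lemma, pos, semantic_field) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token (inscription_id INTEGER, position INTEGER, "
        "lemma TEXT, pos TEXT, semantic_field TEXT)"
    )
    conn.executemany(
        "INSERT INTO token VALUES (?, ?, ?, ?, ?)",
        [(i, p, nfc(lemma), pos, field) for i, p, lemma, pos, field in rows],
    )
    return conn


def search(conn, lemma=None, pos=None, semantic_field=None):
    """Return (inscription_id, position, lemma) for tokens matching every
    criterion that was supplied; criteria left as None are ignored."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column, value in (
        ("lemma", lemma),
        ("pos", pos),
        ("semantic_field", semantic_field),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(nfc(value))
    where = " AND ".join(clauses) if clauses else "1"
    sql = (
        "SELECT inscription_id, position, lemma FROM token "
        f"WHERE {where} ORDER BY inscription_id, position"
    )
    return conn.execute(sql, params).fetchall()


# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    (1, 1, "nilam", "N", "land"),
    (1, 2, "koṭu", "V", "donation"),
    (2, 1, "kō", "N", "royalty"),
]
```

For example, `search(build_token_db(SAMPLE_ROWS), pos="N", semantic_field="land")` combines a grammatical category with a semantic field, while passing only `lemma` searches for a lexical item on its own.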

Aims of the project

  • Preserve and archive Tamil epigraphical data in digital form.
  • Document a complete list of Tamil inscriptions recorded to date. These inscriptions are currently scattered across many major sources, and one of the primary aims of this project is to account for them systematically. The list will contain a full set of metadata, as found in the volumes of South Indian Inscriptions, and will be compiled from sources such as South Indian Inscriptions, publications of the Tamil Nadu State Department of Archaeology, the journal āvaṇam (published by the Tamil Nadu Archaeological Society), South Indian Temple Inscriptions, Epigraphia Indica, Epigraphia Carnatica, Damilica, Inscriptions of the Pudukkottai State, Travancore Archaeological Series, the Tirumalai-Tiruppati Devastanam Epigraphical series, and other occasional publications and collections.
  • Develop a lexical and grammatical database of inscriptions (5th to 8th century CE).
  • Develop electronic dictionaries of Tamil epigraphy as a complement to the existing Tamil epigraphical dictionaries and glossaries.
  • Contribute to the development of research in Tamil historical linguistics and other historical studies.
  • Distribute the database and the search engine as a free resource under the terms of the GNU General Public License.


