Additional Materials

This section includes techniques and links to help you find answers for your academic papers and projects.

Standards & Specifications

The EAGLES Guidelines

The EAGLES Guidelines provide guidance on markup for text corpora, particularly for identifying features relevant to computational linguistics and lexicography.

TEI P5: Guidelines for Electronic Text Encoding and Interchange

The TEI has served for many years as a mature annotation format for corpora of different types, including linguistically annotated data.

Leipzig Glossing Rules

The rules cover a large part of linguists' needs in glossing texts, but most authors will feel the need to add (or modify) certain conventions (especially category labels). For the Georgian version, see ლაიფციგის გლოსირების წესები.

Eurotyp Guidelines

The Eurotyp Guidelines were developed by the Committee on Computation and Standardization of the Programme in Language Typology.

MULTEXT-East Morphosyntactic Specifications

A multilingual dataset for language engineering research and development, focused on the morphosyntactic level of linguistic description.

The Digital Humanities Manifesto

A vision of the future of the humanities.

Corpus of Mingrelian Language

Under Construction.

Language Processing Tools

IPA Help 2.1

A useful, simple tool for learning to recognize, transcribe, and produce the sounds of the International Phonetic Alphabet (IPA).

Speech Analyzer

Speech Analyzer facilitates acoustic analysis of speech sounds.

Praat: doing phonetics by computer

Praat is a free computer software package for the scientific analysis of speech in phonetics.

Field Linguist's Toolbox

Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text.

ELAN

ELAN is a professional tool for the creation of complex annotations on video and audio resources.

Gephi

Gephi is visualization and exploration software for all kinds of graphs and networks, and it can easily be adapted for the visualization of linguistic data.

PC-KIMMO

The program is designed to generate (produce) and/or recognize (parse) words using a two-level model of word structure in which a word is represented as a correspondence between its lexical level form and its surface level form.

FSM

This resource is a practical guide to finite-state theory and to the use of the Xerox finite-state programming languages LexC and xfst.

AntConc etc.

AntConc is a freeware corpus analysis toolkit for concordancing and text analysis. The same site offers a number of other tools.
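
For readers new to concordancing, the following is a minimal sketch of a keyword-in-context (KWIC) listing, the basic idea behind a concordancer. It does not reflect how AntConc itself is implemented; the function name `kwic`, the `width` parameter, and the sample sentence are illustrative assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for every match of `keyword`.

    Tokenization is deliberately naive: lowercase the text and keep runs of
    word characters. `width` is the number of tokens kept on each side.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, used only to show the call.
sample = "The corpus is small. A small corpus still shows how a corpus tool works."
for left, hit, right in kwic(sample, "corpus", width=2):
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>15} [{hit}] {right}")
```

Dedicated tools add sorting, regular-expression search, and frequency statistics on top of this basic listing.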

General Linguistics Websites

The Linguist List

The Linguist List is dedicated to providing information on language and language analysis.

SIL Organization

SIL serves language communities worldwide, building their capacity for sustainable language development, by means of research, translation, training and materials development.

The World Atlas of Language Structures Online

The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars).

The Language Archive

The Data Archive at the Max Planck Institute for Psycholinguistics stores a large amount of unique material from a wide variety of languages worldwide, recorded and analyzed by researchers from different linguistic disciplines.


Endangered Languages

DOBES

The DOBES Archive contains language documentation data from a great variety of languages from around the world that are in danger of becoming extinct.

Endangered Languages Database

The Endangered Languages Database project includes a database of language endangerment levels with references to collections and recordings of oral literature that exist in archives around the world.

UNESCO Atlas of the World's Languages in Danger

The online edition of the Atlas of the World's Languages in Danger is a tool to monitor the status of endangered languages and the trends in linguistic diversity at the global level.

Georgian Corpora

The GNC Project

The Georgian National Corpus is a comprehensive corpus of the Georgian language covering all stages of its historical development.

The Georgian Language Corpus

The Georgian Language Corpus (GLC) comprises texts written in Old, Middle, and Modern Georgian and is equipped with additional features for their analysis.

Linguistic Portrait of Georgia

The Linguistic Portrait of Georgia is a database that aims to present the results of various projects on Georgian dialects.

Wardrops' Collection Online

The Wardrops' Collection Online (WCO) is a digital repository and research project devoted to the Wardrops' Collection of Georgian manuscripts preserved at the Bodleian Library.

Corpora Worldwide

Open Language Archives Community (OLAC)

OLAC is part of a larger community known as the Open Archives Initiative. The OAI develops and promotes interoperability standards for digital archives, and currently spans dozens of archives and a total of over a million records.

The Child Language Data Exchange System (CHILDES)

CHILDES is the child language component of the TalkBank system.