Objective

The current state of Eastern Armenian studies calls for new approaches and linguistic tools, both to validate key empirical hypotheses and findings and to expand the field of research. A corpus-based approach will make it possible to revisit aspects of traditional grammar that have not been sufficiently studied and will facilitate the development of new descriptive and theoretical concepts.

The Eastern Armenian National Corpus (EANC) provides linguists with a searchable, annotated database of Eastern Armenian. EANC includes empirical linguistic data ranging from classical Standard Eastern Armenian literature to Yerevan street talk recorded and transcribed in 2008.

The immediate objective of EANC is to help linguists find and explore sentences (occurrences) in Standard Eastern Armenian (SEA) texts that meet specific search criteria. EANC allows searching for:

wordforms and lexemes
part-of-speech categories, morphological attributes, and inflection types
punctuation
contextual queries and collocations
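
As a rough illustration of what attribute-based search over an annotated corpus involves, the sketch below filters sentences by wordform, lexeme, part of speech, and morphological features. It is not EANC's implementation or query syntax: the `Token` structure, the `matches` and `search` functions, the tag names (`N`, `PUNCT`, `pl`, `sg`, `nom`), and the two toy sentences are all assumptions made for the example. Contextual queries and collocations are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical structure, not EANC's schema)."""
    wordform: str                      # surface form as it appears in the text
    lexeme: str                        # dictionary form
    pos: str                           # part-of-speech tag (illustrative tagset)
    features: frozenset = frozenset()  # morphological attributes


def matches(token, wordform=None, lexeme=None, pos=None, features=()):
    """Return True if the token satisfies every criterion that was given."""
    if wordform is not None and token.wordform != wordform:
        return False
    if lexeme is not None and token.lexeme != lexeme:
        return False
    if pos is not None and token.pos != pos:
        return False
    # Every requested feature must be present on the token.
    return set(features) <= token.features


def search(sentences, **criteria):
    """Return the sentences containing at least one matching token."""
    return [s for s in sentences if any(matches(t, **criteria) for t in s)]


# Toy data for illustration only.
sentences = [
    [Token("գրքեր", "գիրք", "N", frozenset({"pl", "nom"}))],
    [Token("տուն", "տուն", "N", frozenset({"sg", "nom"})),
     Token(".", ".", "PUNCT")],
]

by_lexeme = search(sentences, lexeme="գիրք")                # lexeme search
plural_nouns = search(sentences, pos="N", features={"pl"})  # POS + attribute
with_punct = search(sentences, pos="PUNCT")                 # punctuation
```

Each criterion is optional, so the same function covers searches by wordform alone, by lexeme alone, or by any combination of category and attributes.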

EANC also lets researchers build a user-defined subcorpus, such as a single-author subcorpus or a subcorpus restricted to specific genres and/or periods.
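
Conceptually, a user-defined subcorpus is a selection of documents by metadata. The following minimal sketch shows that idea under stated assumptions: the `Document` fields, the `build_subcorpus` function, and the placeholder titles, authors, and genre labels are hypothetical and do not reflect EANC's actual metadata or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Document-level metadata (hypothetical fields)."""
    title: str
    author: str
    genre: str
    year: int


def build_subcorpus(documents, authors=None, genres=None, period=None):
    """Select documents by author, genre, and/or an inclusive year range."""
    selected = []
    for doc in documents:
        if authors is not None and doc.author not in authors:
            continue
        if genres is not None and doc.genre not in genres:
            continue
        if period is not None and not (period[0] <= doc.year <= period[1]):
            continue
        selected.append(doc)
    return selected


# Placeholder metadata for illustration only.
documents = [
    Document("Text 1", "Author A", "fiction", 1905),
    Document("Text 2", "Author A", "press", 1998),
    Document("Text 3", "Author B", "fiction", 2003),
    Document("Text 4", "Speaker C", "spoken", 2008),
]

single_author = build_subcorpus(documents, authors={"Author A"})
modern_fiction = build_subcorpus(documents, genres={"fiction"}, period=(1990, 2010))
```

Filters left as `None` are ignored, so a single-author subcorpus and a genre-and-period subcorpus are two calls to the same function.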

Since EANC provides samples of actual SEA usage across periods, genres, and discourse formats, it can also serve as a powerful educational resource. English translations are provided for about 85 percent of the tokens, which makes the corpus easier to use for non-native speakers, such as learners of Armenian. EANC can also be used in fields such as literary and cultural studies, journalism, and history.

EANC is a "national" corpus in the sense that it attempts to build the fullest possible representation of the Eastern Armenian language in all its culturally and socially significant aspects, following the tradition of existing online national corpora such as the British National Corpus and the Russian National Corpus.

Importantly, EANC is as much about corpus linguistics as it is about Armenian studies. The EANC team aims to build a modern, flexible linguistic database that can serve as a platform for creating corpora of other languages, exploring statistical approaches to language description, and applying natural language processing methods.