Written discourse


As of March 2009, written discourse in EANC includes over 106 million tokens. There are 510 authors in the EANC database, not counting the press subcorpus.
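The genre table in this section gives percentages relative to the whole of EANC, while the period table gives percentages relative to written discourse only. The following is a minimal sketch, not part of the EANC tooling, that re-adds the genre token counts and recomputes each genre's share of written discourse. The names `GENRE_TOKENS` and `parse_count` are hypothetical helpers introduced for illustration; the figures are copied from the genre table.

```python
# Token counts copied verbatim from the genre table in this section.
GENRE_TOKENS = {
    "Press": "47 264 735",
    "Fiction": "37 279 344",
    "Science": "13 875 930",
    "Other Non-Fiction": "4 735 997",
    "Poetry": "3 648 160",
}


def parse_count(text: str) -> int:
    """Convert a space-grouped count such as '47 264 735' to an int."""
    return int("".join(text.split()))


# Sum of the five genres; this should match the stated written-discourse total.
written_total = sum(parse_count(v) for v in GENRE_TOKENS.values())
print(f"Total written discourse: {written_total}")

# Share of each genre within written discourse (not within all of EANC).
for genre, raw in GENRE_TOKENS.items():
    share = 100 * parse_count(raw) / written_total
    print(f"{genre}: {share:.1f}% of written discourse")
```

Computed this way, press comes to about 44% of written discourse and fiction to about 35%, which agrees with the Total row of the period table.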

Written Discourse              Tokens    % EANC
Press                      47 264 735     42.9%
Fiction                    37 279 344     33.8%
Science                    13 875 930     12.6%
Other Non-Fiction           4 735 997      4.3%
Poetry                      3 648 160      3.3%
Total Written Discourse   106 804 166     96.8%





The genres of EANC texts are distributed unevenly over time. The 19th and 20th centuries are represented mostly by literary texts, both prose and poetry. Some older press has been added to the corpus in a joint project of EANC and the Armenian National Library (see Press Archive). The bulk of the press subcorpus, however, was acquired by downloading texts from open newspaper archives and thus represents the modern language (from 2000 on) of the internet news resources of the Republic of Armenia (see also Armenian texts online). As a result, the ratio of press to fiction texts for the last decade is very different from the same ratio for the rest of the corpus.
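The size of this difference can be estimated from the period table in this section. The sketch below is illustrative only: it assumes that the table's "Prose" column corresponds to fiction (its total equals the Fiction total in the genre table), and it treats everything outside 2000 - 2008, including undated texts, as "the rest of the corpus". The constant and function names are hypothetical.

```python
# Figures copied from the period table (Total row and 2000 - 2008 row).
PRESS_TOTAL = 47_264_735
PROSE_TOTAL = 37_279_344
PRESS_2000_2008 = 34_552_624
PROSE_2000_2008 = 1_129_320


def press_to_prose(press_tokens: int, prose_tokens: int) -> float:
    """Return the number of press tokens per prose (fiction) token."""
    return press_tokens / prose_tokens


# Ratio for the last decade covered by the corpus.
recent = press_to_prose(PRESS_2000_2008, PROSE_2000_2008)

# Ratio for everything else: all earlier decades plus undated texts.
earlier = press_to_prose(
    PRESS_TOTAL - PRESS_2000_2008,
    PROSE_TOTAL - PROSE_2000_2008,
)

print(f"2000 - 2008:        {recent:.2f} press tokens per prose token")
print(f"rest of the corpus: {earlier:.2f} press tokens per prose token")
```

Under these assumptions there are roughly 30 press tokens per prose token for 2000 - 2008, against roughly 0.35 for the rest of the corpus.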

Non-fiction texts are also distributed unevenly over time. Most scientific texts come from the Soviet period (primarily the 1960s and 1970s). Most legal texts, however, have been obtained from open internet sources and come from the last decade.

Period       Prose                 Poetry                Non-fiction           Press                 Total by period
                 tokens  % period      tokens  % period      tokens  % period      tokens  % period
before 1870     291 930       64%       3 630        1%         n/a        0%     160 704       35%          456 264
1870 - 1879     514 702       53%      48 811        5%     249 572       26%     149 631       16%          962 716
1880 - 1889   1 431 103       74%       4 020        0%      48 411        3%     446 963       23%        1 930 497
1890 - 1899     801 630      100%         n/a        0%         n/a        0%         n/a        0%          801 630
1900 - 1909     735 988       36%      84 430        4%     253 204       12%     954 997       47%        2 028 619
1910 - 1919     451 942       60%      61 526        8%         n/a        0%     245 806       32%          759 274
1920 - 1929     739 636       44%     296 573       18%      44 170        3%     599 488       36%        1 679 867
1930 - 1939   2 211 314       57%      27 747        1%     242 714        6%   1 410 425       36%        3 892 200
1940 - 1949     922 848       46%     138 791        7%     198 717       10%     732 734       37%        1 993 090
1950 - 1959   2 408 255       47%     784 771       15%     462 914        9%   1 421 629       28%        5 077 569
1960 - 1969   4 013 652       57%     479 107        7%     425 842        6%   2 176 226       31%        7 094 827
1970 - 1979   5 885 441       48%     121 854        1%   4 354 936       36%   1 899 469       15%       12 261 700
1980 - 1989   3 983 807       34%      69 216        1%   5 935 592       50%   1 861 032       16%       11 849 647
1990 - 1999   1 227 048       37%      78 553        2%   1 324 881       40%     650 432       20%        3 280 914
2000 - 2008   1 129 320        2%      57 638        0%   4 174 458       10%  34 552 624       88%       39 914 040
undated      10 530 728       82%   1 391 493       11%     896 516        7%       2 575        0%       12 821 312
Total        37 279 344       35%   3 648 160        3%  18 611 927       17%  47 264 735       44%      106 804 166

One of the important objectives of EANC is to collect as many Standard Eastern Armenian (SEA) fiction texts as practicable. EANC includes all school reading texts in today’s Armenian secondary school program, as well as the vast majority of SEA classical literature starting from Khachatur Abovian (mid-19th century). Many classical writings from before 1938 are now accessible in full in the EANC Electronic Library.