Katarina Aladrović Slovaček, MSc, research assistant

Melita Ivanković, teacher

Faculty of Teacher Education, University of Zagreb

Savska cesta 77, Zagreb

Analysis of a Croatian corpus of child language (ages 7 to 12) and its use in teaching

Language, as an abstract system of signs, has two realizations: oral and written. A child first utters syllables, then words, phrases and finally sentences. The process of voice articulation, which is also one of the phases of language learning, is completed by the age of six. In this phase, a child whose language development is typical is capable of composing a sentence of five, six or seven words (Pavličević-Franić, 2005). The vocabulary of a seven-year-old contains around 10,000 words (Pavličević-Franić, Gazdić-Alerić, 2010). Institutional learning of Croatian as a mother tongue starts in kindergarten and takes place through various communicative language games. On starting school (at the age of six or seven), however, a child begins to learn language systematically. Since Croatian, as a Slavic language, is morphologically rich, language learning often presents a real problem to children: they are afraid of learning language and do not like it (Pavličević-Franić, Aladrović, 2009). This shows the need to change language teaching, especially in primary school, which in the Croatian educational system lasts eight years. Such changes require a communicative-functional approach to language teaching (Miljević-Riđički et al., 2004).

The Croatian National Corpus has over a hundred million words (www.hnk.ffzg.hr, June 2010). A database of Croatian child language covering the period up to the age of six was created within the CHILDES database. The corpus of Croatian child language during primary school, however, is not recorded in either of these databases, so the purpose of this research is to analyze a corpus of child language from the age of seven to twelve, the period when children are in the concrete operational stage (Piaget, 1969). The corpus consists of around 1,500 written works collected during research in 30 primary schools across all regions of Croatia, and it contains around 30,000 words. The corpus will be coded and then analyzed at the morphological, syntactic and lexical levels. The research will also try to answer the question of how to use the corpus in mother-tongue teaching in primary school, and so help children come to love their language and learn it happily.

I. Language acquisition and learning

Language distinguishes humans from all other creatures on Earth, and language acquisition is therefore, on the one hand, a completely ordinary occurrence and, on the other hand, a very special and fascinating one. Language acquisition shows general features common to all children in the world: all of them manage to acquire language successfully, regardless of the language to which they are exposed in natural situations and regardless of the teaching method, and they pass through very different language stages in a very short period. A child's language development is connected to their physical, cognitive, emotional, social and communicative development (Owens, 1984). In the first several years, children with typical language development gain full control of their language. By the age of five, a child's vocabulary comprises 1,000 words, and the child has acquired most of the phonological and grammatical system of the language, as well as the basics of word meaning and use and of how language is used in particular situations (McGregor, 2009: 203). Language acquisition also depends on the habits of spoken language in the child's surroundings: the speech of the parents and of other members of the community, including other children who interact with the child. Though all children are able to learn the language to which they are exposed in early childhood, there are nevertheless individual differences among them, arising from features of the mother tongue and from the different circumstances in which the language is learnt, both of which influence the speed of learning.

In order to explain language acquisition, researchers have developed several theories, which fall into three main groups: behaviourist, generative and cognitive. Behaviourists (the most significant of them being B. F. 
Skinner) consider language acquisition to be learnt behaviour and therefore explain it by the creation of associative links between stimulus and response. They believe that language and speech are learnt by imitating the speech of adults, which could be called learning according to the model: auditory/visual stimulus - response to stimulus - reinforcement. The child listens to the model and imitates what they have perceived by ear. By imitating adult speakers, through trial and error, stimulation and repetition, the child acquires language structures, which advances their language development (Pavličević-Franić, 2005).

Generative theory emerged in the 1950s. In linguistics, this period was marked by the development of generative grammar, which views language as the knowledge possessed by its native speakers. The aim of this linguistic theory is to arrive at the grammar in the mind of the speaker (Palmović, 2005). N. Chomsky, the creator of generative grammar, distinguishes the competence of the native speaker, understood as unconscious innate language knowledge, from performance, the actual use of language in an actual situation. He believes that language acquisition is in fact grammar acquisition, because children are born with language abilities and general knowledge about the form of human language (Vilke, 1991). For this knowledge to develop, children must be exposed to the language of their environment. In this way, the children's innate grammar is stimulated and appropriately reinforced. This is how generativists explain the fact that almost all children manage to acquire their mother tongue, regardless of their other differences and of the differences among the languages to which they are exposed (Jelaska, 2007). 
Chomsky is the main representative of the nativist theory, which explains the ease and speed with which children acquire language by the fact that a large part of their language knowledge is innate (Palmović, 2005). According to Chomsky (1965), the innateness of the language model explains the similarity of the language acquisition process across different languages and cultures. Chomsky calls the content of this language model the LAD, the language acquisition device. Under this model, the child is exposed to language data from which they discover the parameters specific to the particular language (Kuvač and Palmović, 2007: 52). This means that all children go through the same stages of language acquisition, use similar structures and make similar deviations from the language to which they are exposed, regardless of which language they are acquiring. They only have to be exposed to some human language, and their innate grammar will be stimulated and reinforced. On this basis one can conclude that speakers have adopted productive rules applicable to new linguistic occurrences, and that language acquisition is therefore the acquisition of grammar and of the cognitive system that enables people to understand and use language. Grammar is not learnt; it is acquired, adopted, spoken (Jelaska, 2007: 68). Though the grammars of natural languages differ, they also share many similarities, which are called universals. These universals are considered important evidence of innateness, because they could not have arisen by accident. The theory of innate ability is also supported by the fact that children master language much better than could be expected on the basis of the language data to which they have been exposed.

The representative of cognitive theories, J. 
Piaget (1967), believes that cognitive abilities enable learning in general, including the learning of language, which means that developed cognitive abilities are a necessary precondition for successful language development (according to Pavličević-Franić, 2005). Piaget considered language to be a means of thinking about reality, so that the appearance of language depends on the structure of reality itself. He accordingly believed that the appearance of language is conditioned by the level of sensorimotor intelligence during the first eighteen months of the child's life. Piaget (1947) further claims that language acquisition and learning happen in four stages: sensorimotor (from birth to the age of two), preoperational (from the age of two to the age of seven), concrete operations (from the age of seven to the age of eleven or twelve) and formal operations. After discussing Piaget’s theory, L. Vygotsky (1962) concluded that the child becomes a rational being at the moment speech emerges, and that the development of cognitive abilities and the child's development as a whole depend on and are conditioned by language (according to Kovačević, 1996).

Language acquisition does not end when the child enters school (in the Croatian educational system, at about the age of seven) but continues until the age of twelve, when language automation occurs, meaning that children have mastered morphology and syntax to the point of automatic use. This period is called the early language learning period, and in the Croatian educational system it lasts from the age of seven to the age of twelve. 
This is the period when language should be learnt by developing and stimulating communicative competence. Since language learning is very often accompanied by pupils' negative attitude towards the mother tongue, caused by the quantity and difficulty of the content, the aim of this paper was to find out whether this attitude could be changed and what improvements could be made if a corpus were introduced as one of the methods of language learning. Research from 2004 (Miljević-Riđički et al.) confirmed that children do not like Croatian as a mother-tongue subject and that it sits at the bottom of the scale of favourite subjects. Research from 2009 (Pavličević-Franić and Aladrović) shows a somewhat better attitude of pupils towards Croatian as a school subject, though it still carries many negative connotations. The extensiveness of the content, inappropriate ways of presenting it and inadequate content can cause problems in learning the standard form of Croatian, as well as “fear of language”, which can in turn cause pupils long-term problems with expression and literacy. By way of illustration, Croatian is the most extensive subject in primary school, taught for five lessons per week during the period of early language learning (until the age of 12). In addition, communication in the mother tongue is the first and most crucial competence for lifelong learning, since a child who has learnt their own language well will more easily learn other languages and other subjects (European Commission, 2005). 
With the aim of improving the quality of Croatian language teaching and learning, the intention was to investigate a small corpus of pupils' written work in order to identify the language problems that pupils encounter at a given age, to adapt language teaching and learning methods accordingly, and thereby to change pupils' attitude towards Croatian as a school subject.

II. Corpus of Children's Language

The Croatian National Corpus includes 101.3 million tokens (www.hnk.ffzg.hr) and consists of a systematic collection of selected texts in contemporary Croatian covering different media, genres, styles, areas and themes. Apart from the Croatian National Corpus, there are other corpora of the Croatian language, such as the Croatian Language Treasury of the Institute of the Croatian Language and Linguistics (www.ihhj.hr, December 2010).

Research on children’s language in Croatia did not start as early as in the United Kingdom, the United States of America or Germany. The first description of children’s language and its lexical development was provided by Ivan Furlan in his dissertation “Diversity of vocabulary and speech structure” (1961). Many language studies were carried out in the 1960s and 1970s; however, their titles contained the word “speech” rather than “language”. At the end of the 1970s, Ante Fulgosi published his paper “Recent Research in Psycholinguistics” (1979), in which he presented recent psycholinguistic research primarily inspired by Noam Chomsky’s generative theory of language acquisition. More systematic research on children’s language did not begin until the 1980s, with the work of Stjepko Težak (Grammar in Primary School, 1980), and the 1990s, with the work of M. Ljubešić, M. Kovačević, Z. Babić and D. Pavličević-Franić, while a larger step forward was made with the opening of the Laboratory for Psycholinguistic Research (POLIN) in 1999. Through their research, the Laboratory members have contributed to the understanding of the acquisition of children’s language within the Croatian language corpus. The first Croatian corpus of children’s language was made by POLIN and is included in the CHILDES world database. It consists of the spontaneous speech of three monolingual children and a corpus of the story-telling abilities of preschool children. The corpus of school children’s language (at the lexical level) has been collected and presented in the First School Dictionary of the Croatian Language, which is also the only e-dictionary with 2,500 explained words accompanied by recorded correct pronunciation, 2,000 drawings and 185 cartoons. It was published by the Institute of the Croatian Language and Linguistics in 2009. 
The corpus of school children’s language also includes textbook language, which was analyzed in 2008 (Pavličević-Franić and Gazdić-Alerić, 2010). Within the textbook corpus, the words that appear most often in textbooks were counted and then sorted into four categories: polysyllabic, affective, professional terminology and other.
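
The counting and sorting step described for the textbook corpus can be illustrated with a minimal Python sketch. It is not the procedure used in the cited analysis, which is not specified here. The word lists AFFECTIVE and TERMINOLOGY, the threshold of four syllables for "polysyllabic", the order in which categories are checked, and all function names are assumptions made purely for illustration. Syllables are approximated by counting vowel letters, which ignores syllabic r in Croatian.

```python
import re
from collections import Counter

VOWELS = set("aeiou")

# Hypothetical word lists, for illustration only (not taken from the study).
AFFECTIVE = {"ljubav", "strah", "sreća"}
TERMINOLOGY = {"imenica", "glagol", "pridjev"}


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def syllable_count(word):
    """Approximate the syllable count by counting vowel letters.

    Assumption: syllabic r (as in 'prst') is not counted.
    """
    return sum(1 for ch in word if ch in VOWELS)


def categorize(word, min_syllables=4):
    """Assign one of the four categories; the check order is an assumption."""
    if word in TERMINOLOGY:
        return "professional terminology"
    if word in AFFECTIVE:
        return "affective"
    if syllable_count(word) >= min_syllables:
        return "polysyllabic"
    return "other"


def most_frequent_by_category(text, top_n=10):
    """Count tokens, keep the top_n most frequent, and group them by category."""
    counts = Counter(tokenize(text))
    grouped = {}
    for word, freq in counts.most_common(top_n):
        grouped.setdefault(categorize(word), []).append((word, freq))
    return grouped


if __name__ == "__main__":
    sample = "Imenica je riječ. Imenica i glagol, ljubav i strah; nevjerojatno!"
    for category, words in most_frequent_by_category(sample).items():
        print(category, words)
```

In a real analysis the word lists would come from a coded lexicon rather than hand-written sets, and the tokens would first be lemmatized, since Croatian words appear in many inflected forms.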