Child Development and Education
Research Partnership Project
Contents
1. Background
2. Aims and objectives
3. Project rationale
4. Data-linkage methodology
5. Study sample and data
6. Project outcomes
7. Anticipated timeline
Appendix 1. Data items selected for linkage analysis
A1. Department of Education Data
A2. Department of Health Data
A3. DEEWR Data (AEDI)
1. Background
The NT Government has contributed $400,000 over the past four years as a founding partner of the SA NT Datalink consortium. This has enabled the development of the technical capacity to support research that uses the de-identified linkage of NT population datasets for policy-relevant analyses not previously possible.
From 2009 to 2013, Menzies CCDE has partnered with NT Health Gains and Planning to conduct the SA NT Datalink Early Childhood Development Demonstration Study (Silburn, Lynch, Guthridge & McKenzie, 2009). The design of the NT study was developed in collaboration with Professor John Lynch, Professor of Public Health, University of Adelaide, so that a parallel study could be conducted in South Australia with comparable SA data, using key variables relevant to children’s early health and development matched to those used in the NT study.
The objectives of both the NT and SA demonstration studies have been to:
a) develop robust methods to link and de-identify population-wide perinatal health, early child development, school education and other relevant datasets for the purpose of describing the population dynamics of childhood growth and development;
b) document the precision, consistency and completeness of the data-linkage across different databases, and establish the number of unique individuals within each dataset and across datasets; and
c) demonstrate the feasibility of using de-identified linked data to quantify the prevalence, associations and consequences of factors in early life of key relevance to children’s longer-term outcomes in health, behaviour and learning.
Most of the lead-up work required for the de-identification and linkage of selected data items from each of the first four datasets used in the NT data-linkage demonstration study is now complete. This has required securing HREC and agency approvals for the creation of the anonymous linkage keys and for the extraction of the relevant data items, attached to their linkage keys, from the respective health, education and AEDI datasets for all children born in the NT from 1993 to 2009. The reliability of the linkage keys used for de-identified linkage of perinatal, immunisation, school enrolment, attendance, NAPLAN and the 2009/10 AEDI data has been examined to document the consistency and quality of key identifying variables and content variables of particular interest within each dataset (e.g. date of birth, gender, Indigenous status).
New statistical methods of multiple imputation have been tested and are being used to correct discrepancies between identifying variables across datasets to minimise sample loss when performing analyses involving data on individuals drawn from multiple datasets.
The success of the SA NT Datalink Demonstration Study in establishing the feasibility of data-linkage analysis to make better use of existing NT administrative data has provided the foundation for the proposed ‘Project’ component of the NTG-Menzies CCDE Research Partnership.
2. Aims and objectives
2.1 Aims
The overall aim of the NTG-Menzies CCDE Research Partnership project is to build upon the experience of the SA NT-Datalink Early Childhood Development Study in creating an NT-specific study population based on de-identified linked data spanning the first antenatal health care visit through to school year 9, covering the 1993-2006 birth cohorts.
This will involve a three-year program of research aimed at identifying specific early life conditions and experiences that adversely or beneficially influence child outcomes, with a view to informing and supporting the policies and programs that have the greatest likelihood of success in improving child outcomes in the NT.
2.2 Objectives
Objective 1. Investigate and report the social, individual, health and family factors that influence achievement in AEDI and NAPLAN literacy and numeracy tests among Northern Territory children.
The key research question to address this objective is “given the unique socio-demographic characteristics of the NT, what are the most salient and potentially modifiable early life determinants which should be the focus of policy and practice to improve the longer-term human capability of the NT population?” Establishing the associations of AEDI and NAPLAN outcomes with early life socio-demographic and health circumstances will require controlling for potential confounding and mediating effects, examining effects in sub-populations and identifying multi-level effects. The covariates to be considered in these analyses will depend on the specific questions but will include family-level variables such as maternal education, age and occupation, and community/neighbourhood-level characteristics assessed through community-level indices of environmental health (e.g. housing overcrowding), social functioning (e.g. per capita rates of child protection and domestic violence notifications, police call-outs etc.) and family support of school education (e.g. average annual school attendance rates).
Research questions related to social factors could include:
a) Identifying the relative contribution of family socio-economic factors (including parental education and occupation and area of residence) to school readiness and achievement in literacy and numeracy tests;
b) Identifying the relative contribution of socio-demographic factors (including family size and structure, ethnicity, language background, maternal and paternal age and family mobility) to school readiness, school attendance and achievement in literacy and numeracy tests.
Research questions relating to health factors that influence school readiness (AEDI), attendance and achievement in literacy and numeracy tests (NAPLAN) among NT children are:
a) Determining the relative contribution of clinical factors (including birth weight, Apgar score, birth length, head circumference, infant growth and nutritional status (GAA data)) to school readiness and achievement in literacy and numeracy tests;
b) Identifying the relative contribution of maternal and child health factors (including mother’s gestational health and medical conditions or health issues identified from birth, e.g. early childhood anaemia) to school readiness and achievement in literacy and numeracy tests.
The final research question under this objective concerns the combined effect of all of these factors and the appropriate covariates: what are the interactions between them, which factors can enhance or mute the beneficial or detrimental effect of others, and are there discrete developmental pathways which can be identified for different groups of children?
Objective 2. Investigate the significant differences between the number of children in birth, school enrolment and school attendance cohorts.
There is significant mobility of children between the Northern Territory and other states and territories, which is poorly described in the currently available data. There are also significant numbers of children who remain in the Northern Territory but do not engage with the school system (not enrolling or not attending regularly). Can the information available from the linked datasets inform our understanding of these cohorts of children?
The specific research questions that require investigation include:
a) What are the numbers and demographic profiles of children entering or leaving the NT in their early years?
b) How many children of school age are not enrolled in school or attend irregularly?
c) What are their demographic characteristics? How useful is linked birth and education data in studying these questions?
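As an illustration of how the linked datasets could be used to size these cohorts, the following is a minimal sketch only. The linkage keys and the function name `cohort_gaps` are hypothetical, and a key missing from the enrolment data could reflect a move interstate, non-enrolment, death or a failed linkage; set differences alone cannot distinguish between these.

```python
# Hypothetical linkage keys; in practice these would be read from the
# de-identified birth, enrolment and attendance datasets.
birth_keys = {"LK001", "LK002", "LK003", "LK004"}
enrolment_keys = {"LK002", "LK003", "LK005"}
attendance_keys = {"LK002"}


def cohort_gaps(births, enrolments, attendance):
    """Compare cohorts by linkage key using set differences."""
    return {
        # Born in the NT but never seen in enrolment data
        "born_not_enrolled": births - enrolments,
        # Enrolled in the NT without an NT birth record (possible in-migration)
        "enrolled_not_born_in_nt": enrolments - births,
        # Enrolled but with no attendance record at all
        "enrolled_no_attendance_record": enrolments - attendance,
    }


gaps = cohort_gaps(birth_keys, enrolment_keys, attendance_keys)
for label, keys in gaps.items():
    print(label, len(keys))
```

The demographic profile of each group could then be described from the content variables attached to the same keys.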
Objective 3. Investigate the child health, parental, family and community factors that relate to children’s vulnerability to child abuse and neglect and the longer-term developmental consequences of such vulnerability.
Child protection notifications and substantiations are a significant indication of the disadvantage and vulnerability of Aboriginal children and families (AIHW, 2012). In 2010-2011, Aboriginal and Torres Strait Islander children were almost 8 times as likely to be the subject of substantiated child abuse and neglect as non-Indigenous children (rates of 34.6 and 4.5 per 1,000 children, respectively). In June 2011, the rate of Aboriginal and Torres Strait Islander children on care and protection orders was over 9 times the rate of non-Indigenous children (rates of 51.4 and 5.4 per 1,000 children, respectively). Similarly, the rate of Aboriginal and Torres Strait Islander children in out-of-home care was 10 times the rate of non-Indigenous children (rates of 51.7 and 5.1 per 1,000 children, respectively).
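The rate ratios quoted above can be checked directly from the reported rates per 1,000 children. The short sketch below does only that arithmetic; the dictionary labels are illustrative and are not taken from the AIHW report.

```python
# Rates per 1,000 children as quoted in the text:
# (Aboriginal and Torres Strait Islander, non-Indigenous)
rates = {
    "substantiations": (34.6, 4.5),
    "care and protection orders": (51.4, 5.4),
    "out-of-home care": (51.7, 5.1),
}

# Rate ratio = Indigenous rate divided by non-Indigenous rate
rate_ratios = {
    measure: round(indigenous / non_indigenous, 1)
    for measure, (indigenous, non_indigenous) in rates.items()
}

for measure, ratio in rate_ratios.items():
    print(f"{measure}: {ratio} times")
```

The computed ratios (7.7, 9.5 and 10.1) agree with the descriptions "almost 8 times", "over 9 times" and "10 times".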
Concerns about the level of Indigenous over-representation in out-of-home placements constituting another ‘Stolen Generation’ have been challenged in recent years by the view that child removal remains a necessary response to the high prevalence of neglect in some communities where high rates of social adversity, family breakdown, chronic stress and ill health, and low levels of parental education and employment are reproduced in a ‘vicious cycle’ of disadvantage (Delfabbro et al., 2010).
Given the unique socio-demographic characteristics of the NT there are several research questions where data-linkage analysis could provide a more nuanced understanding of the key drivers and consequences of childhood vulnerability in the NT. Pending discussion with the Office of Children and Families, these questions could include:
a) What are the most salient and potentially modifiable early life determinants which should be the focus of policy and practice to reduce children’s vulnerability to abuse and neglect?
b) What are the combined effects of all of the early life determinants and their appropriate covariates: what are the interactions between them, which factors can enhance or mute the beneficial or detrimental effect of others, and are there discrete developmental pathways of vulnerability which can be identified for different groups of children?
c) Can the above analysis be used to establish an index of vulnerability which could be used with the available population data to answer the question of whether current child protection practice in the NT represents an under- or over-response of services and agencies, and whether there have been identifiable trends over time in terms of the levels of service response?
d) What are the longer-term pathways of development to age 18 of children who have been in out-of-home care? Do these outcomes differ with regard to the age of the child at the time of placement, whether these are kinship placements, the number and frequency of placements, and the total time spent in alternative care? Key outcomes which should be examined include: developmental functioning at age 5 years (AEDI), school attendance and retention, academic outcomes (NAPLAN years 3, 5, 7 & 9), contact with the juvenile justice and mental health systems etc.
Investigating these child protection related questions will involve securing the approvals needed for the linkage of OCF service data with the already linked datasets. Once the de-identified data are available, the analysis will involve establishing the prevalence and relative contribution of early life circumstances that predict the likelihood of a child’s involvement with the NT child protection system.
This will include consideration of the relative contribution of child clinical factors (e.g. intrauterine alcohol and nicotine exposure, birth weight and perinatal health status, infant growth and nutritional status); parental and family factors (e.g. maternal age and education, parents’ health and mental status, family composition, functioning and mobility); and community factors (e.g. housing overcrowding, indicators of community safety and community social functioning).
Objective 4. Investigate the extent to which NT early childhood development data and its markers match and diverge from those in South Australia.
Given that the SA NT Datalink Early Childhood Development Study was set up to be done in parallel with a comparable South Australian study, there is an opportunity to investigate the extent to which NT data and its markers match and diverge from those in South Australia. This will need to occur after Research Objectives 1 to 3 have been completed in the NT. Objective 4 is technically feasible but will require the preparation of requests for variation to the existing HREC approvals in the NT and SA for the merging of two independently confidentialised linked datasets. Each of the data custodians of the various datasets in each jurisdiction will also need to consent to their data being combined with the data from the other jurisdiction to address some broader questions relating to both jurisdictions. On the basis of current experience and assuming no unforeseen complications, it could take anywhere from 12 to 18 months to obtain all the administrative consents and to complete the analysis and reporting of findings.
3. Project rationale
There is widespread scientific agreement that the early years of a child’s life are of critical importance in shaping longer-term outcomes in health, development, learning and wellbeing across the lifespan. The Commonwealth, state and territory governments through the Council of Australian Governments (COAG) have established a comprehensive agenda for investing in early childhood development and wellbeing to “ensure that by 2020 all children have the best start in life to create a better future for themselves and for the nation.” Key goals of the National Strategy are to reduce the impact of risk factors on children’s development, reduce inequalities in outcomes between groups and improve outcomes for all children. Building better information and robust evidence was highlighted as one of six priorities to progress the goals of the National Strategy.
In the Lancet special series on child development in developing countries, Engle et al. (2007b) concluded that the most effective early child development programs are those which: are targeted towards disadvantaged children; provide services to younger children (less than age 3); have continued duration throughout early childhood; are of high quality, defined by structure (e.g. child-staff ratio, staff training, processes which allow responsive interactions and a variety of activities); provide services directly to children and parents; and are integrated into existing health programs. It is acknowledged that children’s development is shaped by a complex interplay between individual biological factors and a range of social, economic and environmental factors.
For government policy to be better informed by evidence, we need to improve our understanding of how various factors impact at the population level and for significant sub-populations. Understanding how these factors influence children’s developmental trajectories and their capacity to participate in life and learning is essential to the effective targeting and delivery of services and to investigating the extent to which our policies are achieving their stated aims.
While there is a growing body of international work exploring the relationship between specific risk factors and outcomes in early childhood, much less is known about how these risk and protective factors cumulatively impact in whole populations.
Given the continuing poor child health and educational outcomes in the NT and the new policy emphasis on the development and delivery of more effective early childhood and family support services, it is vital that the design, implementation and evaluation of these services are based on reliable evidence and a systematic understanding of the complex interplay between individual, environmental and social forces shaping the lives of children in the NT population context.
4. Data-linkage methodology
The mechanics of the data linkage process are as follows:
- Only the identification/linkage data (e.g. date of birth and name but not birth weight) from each dataset is supplied to the SA NT Datalink linkage unit.
- The SA NT Datalink linkage service generates linkage keys (unique to each individual) and attaches one to each record supplied.
- The keyed records are returned to the custodians in each agency, who attach the linkage keys to their content data but remove the identifying data before supplying the “information” datasets to the researchers.
- The researchers now have de-identified datasets but can use the linkage keys to match records from different sources.
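The separation of roles in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SA NT Datalink implementation: the class `LinkageUnit`, the functions `custodian_release` and `link`, the field names and the sample records are all hypothetical, and exact matching on name and date of birth stands in for whatever matching method the linkage service actually uses.

```python
import itertools


class LinkageUnit:
    """Hypothetical linkage service: it only ever sees identifying fields."""

    def __init__(self):
        self._keys = {}
        self._counter = itertools.count(1)

    def assign_keys(self, identifiers):
        """Map each record id to an opaque key, one key per individual."""
        out = {}
        for record_id, name, dob in identifiers:
            # Assumption: exact match on normalised name + date of birth
            person = (name.strip().lower(), dob)
            if person not in self._keys:
                self._keys[person] = f"LK{next(self._counter):06d}"
            out[record_id] = self._keys[person]
        return out


def custodian_release(records, keys):
    """Custodian attaches linkage keys and strips identifiers before release."""
    hidden = ("record_id", "name", "dob")
    return [
        {"linkage_key": keys[r["record_id"]],
         **{k: v for k, v in r.items() if k not in hidden}}
        for r in records
    ]


def link(left, right):
    """Researcher joins two de-identified datasets on the linkage key."""
    index = {r["linkage_key"]: r for r in right}
    return [{**l, **index[l["linkage_key"]]}
            for l in left if l["linkage_key"] in index]


# Fabricated example records held by two separate custodians
perinatal = [
    {"record_id": "p1", "name": "Alex Smith", "dob": "2001-03-04", "birth_weight_g": 3200},
    {"record_id": "p2", "name": "Sam Lee", "dob": "2002-07-09", "birth_weight_g": 2900},
]
education = [
    {"record_id": "e1", "name": "alex smith ", "dob": "2001-03-04", "attendance_pct": 91},
]

unit = LinkageUnit()
p_keys = unit.assign_keys([(r["record_id"], r["name"], r["dob"]) for r in perinatal])
e_keys = unit.assign_keys([(r["record_id"], r["name"], r["dob"]) for r in education])
linked = link(custodian_release(perinatal, p_keys), custodian_release(education, e_keys))
print(linked)
```

In this sketch the linkage unit never receives content variables such as birth weight, and the researcher never receives names or dates of birth; only the linkage key is common to both sides.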
SA NT-Datalink’s systems and protocols are based on the highest ethical and privacy standards and strong security measures have been implemented to prevent inappropriate use or disclosure of personal information. Only the data custodians have access to personal identifying information and only de-identified linkage keys will be provided to the research team. The linkage process is carefully designed to ensure that no identified information (other than that used for the actual linkage) is supplied by the data custodians and that they receive no identifiable data from other sources.
The de-identified datasets with their anonymous linkage keys are stored separately on a secure computer server at CDU. The nominated Menzies CCDE researchers working on the project (Messrs Silburn and McKenzie) have secure access to these de-identified linked datasets.
The data cleaning stage of the project has included cross-validation analysis to examine the internal consistency and accuracy of the merged datasets, an audit of data completeness, and analysis of possible determinants of missing data to inform the treatment of missing data through standard multiple imputation methods.
With regard to the public reporting of findings from the analysis, our approach is consistent with the data cell size guidelines for use of AIHW data. While there is no national standard for public reporting of small cell sizes, for the purposes of this project we will suppress any positive cell size less than 10, as well as adjacent cells, so that back calculation is not technically possible.
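One way the suppression rule could be applied to a single row of counts published alongside its total is sketched below. This is an assumption about how "adjacent cells" might be operationalised, not the project's procedure: the function name `suppress_row` is hypothetical, and the choice to hide the smallest remaining cell as the complementary suppression is illustrative.

```python
def suppress_row(counts, threshold=10):
    """Replace small positive counts with None, plus one complementary cell.

    Primary suppression hides any count n with 0 < n < threshold.
    If exactly one cell is hidden, it could be recovered by subtracting
    the visible cells from the published row total, so a second cell
    (here, the smallest remaining one) is hidden as well.
    """
    hidden = [0 < n < threshold for n in counts]
    if sum(hidden) == 1:
        candidates = [i for i, is_hidden in enumerate(hidden) if not is_hidden]
        if candidates:
            extra = min(candidates, key=lambda i: counts[i])
            hidden[extra] = True
    return [None if is_hidden else n for n, is_hidden in zip(counts, hidden)]


print(suppress_row([120, 4, 35, 60]))
print(suppress_row([120, 40, 35]))
print(suppress_row([3, 7, 50]))
```

For a two-way table the same check would need to be repeated across both rows and columns, since a cell hidden in one row can still be recovered from its column total.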
5. Study design and analysis
A range of statistical methods will be used in addressing the four research objectives and their associated research questions.