NATIONAL STATISTICAL INSTITUTE

FINAL OPERATIONAL REPORT ON

EXECUTION OF PROJECT UNDER

GRANT AGREEMENT FOR AN ACTION

AGREEMENT NUMBER – 50403.2011.001-2011

“Urban Audit data collection 2012” (Regional and Urban Audit quality analysis and methodological improvements)

NATIONAL STATISTICAL INSTITUTE OF BULGARIA

Ivaylo Gavazki

E-mail:

Sofia, January 2014

1. Description of work

Bulgaria has been part of the Urban Audit since 2004. The data collection provides information and comparable measurements on different aspects of the quality of urban life in cities. The aim of the project is to ensure reliable information on indicators of the demographic, social, economic and environmental conditions of the cities, their Sub-City Districts and their Larger Urban Zones. The project goal is achieved through the application of a uniform city definition that improves geographical comparability and a pilot application of a new methodology for the production of Small Area Estimates at Local Administrative Units level. The “Urban Audit data collection 2012” is closely related to one of the priorities of the Europe 2020 Strategy: inclusive growth in urban areas. It will support the EU policy initiatives on urban development, which often need statistical data at a more detailed geographical level such as cities and Sub-City Districts.

The spatial coverage of the inquiry reached 25% of the country's territory: 18 cities with more than 50 000 inhabitants (out of 257 cities in the country) and 58 municipalities (out of 264) composing their Larger Urban Zones. All Spatial Units are listed in the annex of the Methodological Manual on City Statistics.

Urban Audit is a multiannual inquiry. The reference years for the main data collection are 2011, 2008, 2004 and 2001; 1996 and 1991 were the reference years for the "historical" data collection. A few variables have been collected on an annual basis for the period 2005-2012. The current data collection covers the years 2010-2012.

The tasks performed during the current project phase may be grouped as follows:

  • Delimitation of the new LUZ boundaries using 2011 Census data on commuters, following Eurostat's instructions. Sofia and Pernik were identified as “connected cities” because more than 15% of the employed residents of Pernik work in Sofia;
  • Definition of the new SCDs of Sofia using the 2011 Census total resident population data. The redistricting was done in a GIS environment, based on the statistical units used for the 2011 Census (enumeration areas and control sectors). Each of the 39 SCDs is defined to lie within one of the 24 districts of Stolichna municipality, without crossing administrative boundaries. The attribute table contains information about the population size of each SCD. SCD population varies from 16000 to 40000 inhabitants. It was ascertained that the SCDs previously defined through the 2001 Census data could not be used because of the population growth of the capital city (from 1091772 in 2001 to 1202761 in 2011);
  • Screening of data availability at different spatial levels and for different reference years;
  • Examination of the variable definitions included in the Methodological Manual on City Statistics;
  • Assessment of the methodology and quality criteria used in compiling the different Urban Audit variables with respect to Eurostat's guidelines;
  • Examination of the existing practice and development of a new methodology for the production of Small Area Estimates from sample survey data. The methodology was presented at the NUAC meeting in June 2013;
  • Compilation of the variables available (for the ANNUAL data collection and for the EXHAUSTIVE data collection);
  • Management of the quality;
  • Dissemination actions - the results of previous ANNUAL data collection round are available on: Continuous maintenance and improvement of the Urban Audit topic at the official web site of NSI is planned. Information on the current data collection exercise may be found at:
  • Preparation of an interim technical report;
  • Production of estimates of the missing variables through application of the new Small Area Estimation methodology and interpolation and extrapolation of structures in the time;
  • Besides the dissemination actions foreseen in the grant agreement, an article devoted to the new Small Area Estimation methodology appeared in print (in Bulgarian, in the official periodical of the NSI, Statistics, 1-2/2013);
  • Preparation of the metadata file according to the Euro SDMX Metadata Structure;
  • 187 rows containing potentially erroneous data were validated. At the moment some final checks are running.
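The “connected cities” rule mentioned in the first task can be sketched as a simple threshold test. This is only an illustration: the function names are ours, the 15% threshold is the one quoted in the text, and the counts in the example are hypothetical, not census figures.

```python
def commuting_share(commuters_to_other_city: int, employed_residents: int) -> float:
    """Share of a city's employed residents who work in the other city."""
    if employed_residents <= 0:
        raise ValueError("employed_residents must be positive")
    return commuters_to_other_city / employed_residents


def are_connected(commuters_to_other_city: int, employed_residents: int,
                  threshold: float = 0.15) -> bool:
    """True when the commuting share exceeds the threshold (15% in the text)."""
    return commuting_share(commuters_to_other_city, employed_residents) > threshold


# Hypothetical counts for illustration only (NOT 2011 Census data).
print(are_connected(2_000, 10_000))  # share 0.20, above the threshold
print(are_connected(1_000, 10_000))  # share 0.10, below the threshold
```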

As of 15/01/2014 the coverage of UA variables requested is as follows (the centrally collected variables are excluded from the calculation):

Completeness of variables for the ANNUAL Urban Audit on City level – 98.7%;

Completeness of variables for the ANNUAL Urban Audit on LUZ level – 100.0%;

Completeness of variables for the EXHAUSTIVE Urban Audit on City level – 99.3%;

Completeness of variables for the EXHAUSTIVE Urban Audit on LUZ level – 99.2%;

Completeness of variables for the EXHAUSTIVE Urban Audit on SCD level – 100.0%;

Completeness of variables for the EXHAUSTIVE Urban Audit on National level – 99.2%.
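A completeness figure of this kind can be computed as the share of requested items that carry a value, after the centrally collected variables are left out. The sketch below assumes that reading; the exact counting unit used for the report (variable, or variable by spatial unit and year) is not stated here, and the variable codes in the example are hypothetical.

```python
def completeness(values, centrally_collected=frozenset()):
    """Percentage of requested items that have a value.

    values: dict mapping an item code to its value, or None when missing.
    centrally_collected: codes excluded from the calculation.
    """
    requested = {k: v for k, v in values.items() if k not in centrally_collected}
    if not requested:
        raise ValueError("no requested items left after exclusion")
    filled = sum(1 for v in requested.values() if v is not None)
    return round(100.0 * filled / len(requested), 1)


# Hypothetical codes and values, for illustration only.
example = {"VAR_A": 12.5, "VAR_B": None, "VAR_C": 40, "VAR_D": 7}
print(completeness(example, centrally_collected={"VAR_D"}))  # 2 of 3 filled
```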

The final result of the Urban Audit data collection 2012 is between 98.7% and 100% coverage of the variables, achieved through the application of the estimation methodology. The centrally collected variables, on which we will not work, are excluded. Detailed information on completeness is given in Annex III (Coverage of variables).

Only a few variables are missing. One of them, EN3011V (Percentage of the urban waste water load (in population equivalents) treated according to the applicable standard), is collected for the first time in the framework of the Urban Audit and is missing at LUZ level. The Ministry of Environment and Water collects data on the application of Directive 91/271/EEC concerning urban waste-water treatment and supplies them to the European Commission every two years. The reports contain information on agglomerations with more than 2000 population equivalents, but not on municipalities, as the Directive does not deal with administrative units. Urban waste waters are considered treated according to the applicable standard in cities that have a waste water treatment station with functioning treatment stages that ensures treatment within the required indicators. In such cases, the percentage (in population equivalents) of waste water entering the treatment station is indicated. For cities without a treatment station, or with a treatment station without treatment stages, the percentage of waste water treated according to the standard is considered “0”.
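The decision rule for EN3011V described above can be written down as a small function. This is a sketch under our own naming; the parameters are hypothetical, and the case of a station that has treatment stages but does not meet the required indicators is not covered by the text and is therefore not modelled.

```python
from typing import Optional


def treated_percentage(has_station: bool, has_active_stages: bool,
                       inflow_percentage_pe: Optional[float]) -> float:
    """Percentage of the urban waste water load treated to the standard.

    inflow_percentage_pe: percentage (in population equivalents) of waste
    water entering the treatment station; ignored when the result is 0.
    """
    # No station, or a station without treatment stages: value is 0.
    if not has_station or not has_active_stages:
        return 0.0
    if inflow_percentage_pe is None:
        raise ValueError("inflow percentage is required for a working station")
    if not 0.0 <= inflow_percentage_pe <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return float(inflow_percentage_pe)


# Hypothetical cities, for illustration only.
print(treated_percentage(True, True, 85.0))    # station with active stages
print(treated_percentage(True, False, 85.0))   # station without stages
print(treated_percentage(False, False, None))  # no station
```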

2. Timetable of the project

  • t - Start;
  • t+6 months - Screening of variables;
  • t+12 months - Compilation of variables for the ANNUAL Urban Audit (reference year 2010/2011);
  • t+15 months - Compilation of an interim operational report;
  • t+18 months - Compilation of variables for EXHAUSTIVE Urban Audit (reference year 2011);
  • t+18 months - Compilation of variables for the ANNUAL Urban Audit (reference years 2011/2012);
  • t+21 months - Participation in the quality control;
  • t+24 months - Dissemination of data;
  • t+24 months - Compilation of a final operational report.

3. Methodology and data sources

For most of the variables, the existing international standards and definitions of the Methodological Manual on City Statistics are followed. Further information on deviating definitions used in the data collection may be found in Annex I. Explanations of some specific cases in our statistical practice are also included in that Annex. All the variables that deviate from the UA definitions are flagged with “D” in the database. Deviating methodologies are not used in the collection. EU Regulations on the different statistical surveys are followed.

Basic information on data sources is included as flags in the database. BNSI continues its collaboration with the National Association of the Municipalities in the Republic of Bulgaria (NAMRB). An expert from NAMRB collects data on 15 variables, mostly connected with travel patterns, directly from the municipal authorities. For these variables no data were available at country level. Because 63.9% of the persons employed in the country live in the municipalities composing the LUZs, the average values for the 58 municipalities were considered sufficiently reliable estimates for the country level.

With reference to the increased requirements for methodological improvements set by Eurostat in the framework of the current data collection, the previous project leader created a research team that developed a New Methodology for production of Small Area Estimates at Local Administrative Units level using sample survey data. At the current project stage, for about 30 variables on the employment (economic activity of the population, education and households' income and living conditions) there is no other source of information apart from sample surveys (excluding the census year). The methodology, based on clustering through structural analysis, solves the problem of sample sizes and heterogeneity between the different LAUs. Detailed information on the estimation methodology used is available in Annex II. A misprint in the formula for defining the clusters, in the version included in the Methodological assessment, was corrected. The calculations are based on the correct formula. The methodology is published as an article in the official journal of the BNSI, “Statistics”. The aspiration is to popularize the Urban Audit and to demonstrate to the scientific community the capacity of NSI to produce Small Area Statistics.

4. Quality analysis

Quality management is carried out by the following actions:

  • An examination of the quality and a logical control of the data are carried out;
  • The quality assurance procedures detailed in the Methodological Manual on City Statistics have been applied;
  • The complete set of validation rules has been observed;
  • Data validation is done by running the different types of checks in the data sets.

Quality assessment includes the following procedures:

  • The reliability of the results obtained is illustrated in the Annex below. The table shows that the intra-groups dispersions of the variables selected for the production of the clusters are smaller than the between-groups dispersions, so the clusters are considered to be homogeneous;
  • The coherence between the LAU1 and LAU2 estimates produced with the New Estimation Methodology (and their aggregates at NUTS 3 and country level) and the country level data obtained in the framework of the LFS and SILC was checked. The differences for the different variables are in the order of tenths of a percent;
  • The accuracy of the LAU1 and LAU2 estimates was assessed according to well-known scientific standards, and the quality was found to be satisfactory.

Besides the production of Small Area Estimates, the following quality improvements were made. Until now, information on some variables was provided by BNSI only for the Census year. In this data collection round, such missing data were estimated using adequate extrapolation and interpolation curves. For instance, for the variables connected with the country of birth, information at 01.01.2008, 01.01.2010 and 01.01.2012 was not available. Census data as of 01.02.2011 were considered approximately equal to data as of 01.01.2011. That is why an approach is applied based on the available data from the current demographic statistics at 31.12 and census data. Based on the population structures by country of birth, the parameters of the power function $y(t) = a\,t^{b}$, which is the non-linear dependency between the time factor “t” and the investigated result factor y(t), are estimated by the least squares method. Taking logarithms of both sides of the equation transforms this dependency into a linear one, $Y = A + bT$. The system of two linear equations with two unknowns is solved:

$$A + b\,\overline{T} = \overline{Y}, \qquad A\,\overline{T} + b\,\overline{T^{2}} = \overline{TY},$$

where

$T = \ln t$, $Y = \ln y$, $A = \ln a$ and an overline denotes an arithmetic mean.

The solution is reached through Cramer’s formulas:

$$b = \frac{\overline{TY} - \overline{T}\,\overline{Y}}{\overline{T^{2}} - \overline{T}^{2}}, \qquad A = \frac{\overline{Y}\,\overline{T^{2}} - \overline{T}\,\overline{TY}}{\overline{T^{2}} - \overline{T}^{2}}.$$

Using an exponential function, the estimate of the parameter “a” is calculated: $a = e^{A}$.

Based on the autocorrelation curves thus obtained, estimated through the empirical distribution of the structures of statistical data and interpolation, estimates of the searched variables at 01.01.2012 are calculated. Estimates at 01.01.2010 and 01.01.2008 are calculated through backward extrapolation. The results are transformed from structures to absolute numbers and rounded to the nearest unit.

As the 2012 SILC data will become available only after the Urban Audit 2012 data collection project has finished, preliminary estimates of the respective variables are calculated. Applying an approach analogous to the one described above, based on one-year-ahead extrapolation of the estimates obtained with the Small Area Estimation methodology from 2010 and 2011 SILC data, preliminary data for 2012 are produced.

Two of the requested variables at Sub-City District level concern the total number of deaths and the number of deaths under the age of 65 by sex. At present, NSI has no information on demographic events at census region level. In order to obtain data by single ages and sex, the number of deaths in Sofia-city in 2011 is distributed following the population structure by census regions obtained through the census. Next, the estimates produced at census region level are aggregated up to Sub-City District level. The results are rounded to the nearest unit.
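The curve fitting described in this section (a power function fitted by least squares after taking logarithms) can be illustrated with a short sketch. It assumes the functional form y(t) = a·t^b with a positive time index t and positive values y; the function names are ours, and the data points are synthetic, generated from a known curve, not population structures from the project. How the fitted structures are rescaled to sum to one is not covered here.

```python
import math


def fit_power(ts, ys):
    """Least-squares fit of y = a * t**b on the log-log scale.

    Solves the two normal equations for A = ln a and b, then returns (a, b).
    """
    if len(ts) != len(ys) or len(ts) < 2:
        raise ValueError("need at least two (t, y) pairs of equal length")
    T = [math.log(t) for t in ts]  # requires t > 0
    Y = [math.log(y) for y in ys]  # requires y > 0
    n = len(T)
    mean_T = sum(T) / n
    mean_Y = sum(Y) / n
    mean_TT = sum(x * x for x in T) / n
    mean_TY = sum(x * y for x, y in zip(T, Y)) / n
    det = mean_TT - mean_T ** 2  # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all t values are equal; the system is singular")
    b = (mean_TY - mean_T * mean_Y) / det
    A = (mean_Y * mean_TT - mean_T * mean_TY) / det
    return math.exp(A), b


def predict(a, b, t):
    """Value of the fitted curve at time index t."""
    return a * t ** b


# Synthetic points lying exactly on y = 2 * t**0.5 (illustration only).
ts = [1, 2, 3, 4]
ys = [2 * t ** 0.5 for t in ts]
a, b = fit_power(ts, ys)
print(a, b, predict(a, b, 9))
```

Evaluating `predict` at a time index inside the observed range gives an interpolated value; evaluating it outside the range gives an extrapolated one.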
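The distribution of deaths to Sub-City Districts can be sketched as a proportional allocation followed by aggregation and rounding. The sketch handles a single age and sex cell; the region and district identifiers and all counts are hypothetical, and the function name is ours.

```python
def distribute_deaths(total_deaths, population_by_region, region_to_scd):
    """Allocate a city total to census regions in proportion to population,
    aggregate the regional estimates to Sub-City Districts, then round.

    population_by_region: census population of the age/sex cell per region.
    region_to_scd: mapping from census region to Sub-City District.
    """
    total_population = sum(population_by_region.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")
    scd_estimates = {}
    for region, population in population_by_region.items():
        regional_estimate = total_deaths * population / total_population
        scd = region_to_scd[region]
        scd_estimates[scd] = scd_estimates.get(scd, 0.0) + regional_estimate
    # Rounding is applied after aggregation, so rounded district values
    # need not add up exactly to the city total.
    return {scd: round(value) for scd, value in scd_estimates.items()}


# Hypothetical regions, districts and counts (illustration only).
population = {"R1": 300, "R2": 100, "R3": 600}
mapping = {"R1": "SCD_A", "R2": "SCD_A", "R3": "SCD_B"}
print(distribute_deaths(20, population, mapping))
```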

5. Conclusions

The main results for the project Urban Audit 2012 data collection are:

  • The New Methodology for production of Small Area Estimates at Local Administrative Units level and the reliable estimates produced on this methodological basis;
  • The redistricting of the SCDs of Sofia;
  • The new composition of the LUZs;
  • The metadata file prepared according to the Euro SDMX Metadata Structure;
  • Between 98.7% and 100% completeness of the data requested for the different spatial levels;
  • Link on the official NSI web site that refers to the 2005 – 2009 Annual data.

Annex

Clusters homogeneity by selected variables
Groups (cluster) / Population (by age) / Live births (by age) / Deaths (by age) / Immigrations (by age) / Emigrations (by age) / Persons employed (by age) / Employment (by sectors)
Between-groups dispersions / 0,242860 / 0,035136 / 0,157962 / 0,226736 / 0,218681 / 0,259312 / 0,173176
Intra-groups dispersions:
1 / 0,064961 / 0,000000 / 0,000000 / 0,055092 / 0,090462 / 0,010471 / 0,019462
2 / 0,084676 / 0,000000 / 0,062172 / 0,086273 / 0,098011 / 0,076144 / 0,126327
3 / 0,111220 / 0,005071 / 0,091881 / 0,115350 / 0,111281 / 0,089545 / 0,130858
4 / 0,107683 / 0,008403 / 0,118147 / 0,136991 / 0,130250 / 0,096460 / 0,124147
5 / 0,095044 / 0,012226 / 0,118730 / 0,148954 / 0,137497 / 0,098918 / 0,101621
6 / 0,085493 / 0,015778 / 0,102204 / 0,153705 / 0,142116 / 0,098095 / 0,093013
7 / 0,085884 / 0,022591 / 0,098230 / 0,155507 / 0,142538 / 0,110064 / 0,094326
8 / 0,078202 / 0,026330 / 0,081538 / 0,158883 / 0,136326 / 0,093407 / 0,092427
9 / 0,065611 / 0,025466 / 0,071349 / 0,163903 / 0,143941 / 0,091009 / 0,091059
10 / 0,058744 / 0,020221 / 0,069548 / 0,167386 / 0,140077 / 0,084782 / 0,090847
11 / 0,051943 / 0,014449 / 0,067509 / 0,168872 / 0,140646 / 0,067993 / 0,089509
12 / 0,040807 / 0,011734 / 0,069059 / 0,170836 / 0,154755 / 0,046279 / 0,088101
13 / 0,038577 / 0,009779 / 0,067312 / 0,153846 / 0,158977 / 0,035755 / 0,077811
14 / 0,025524 / 0,007223 / 0,060032 / 0,151008 / 0,148475 / 0,031428 / 0,075531
15 / 0,018726 / 0,003449 / 0,030992 / 0,127630 / 0,153421 / 0,012942 / 0,059738
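
The homogeneity statement can be checked directly against the figures in this table. The sketch below uses one simple reading of the criterion (every intra-groups dispersion is smaller than the between-groups dispersion of the same variable); whether this is exactly the test applied in the project is an assumption. The values are copied from the table, with decimal commas written as points.

```python
VARIABLES = ["Population", "Live births", "Deaths", "Immigrations",
             "Emigrations", "Persons employed", "Employment"]

BETWEEN = [0.242860, 0.035136, 0.157962, 0.226736, 0.218681, 0.259312, 0.173176]

# One row per cluster (1 to 15), one column per variable, as in the table.
INTRA = [
    (0.064961, 0.000000, 0.000000, 0.055092, 0.090462, 0.010471, 0.019462),
    (0.084676, 0.000000, 0.062172, 0.086273, 0.098011, 0.076144, 0.126327),
    (0.111220, 0.005071, 0.091881, 0.115350, 0.111281, 0.089545, 0.130858),
    (0.107683, 0.008403, 0.118147, 0.136991, 0.130250, 0.096460, 0.124147),
    (0.095044, 0.012226, 0.118730, 0.148954, 0.137497, 0.098918, 0.101621),
    (0.085493, 0.015778, 0.102204, 0.153705, 0.142116, 0.098095, 0.093013),
    (0.085884, 0.022591, 0.098230, 0.155507, 0.142538, 0.110064, 0.094326),
    (0.078202, 0.026330, 0.081538, 0.158883, 0.136326, 0.093407, 0.092427),
    (0.065611, 0.025466, 0.071349, 0.163903, 0.143941, 0.091009, 0.091059),
    (0.058744, 0.020221, 0.069548, 0.167386, 0.140077, 0.084782, 0.090847),
    (0.051943, 0.014449, 0.067509, 0.168872, 0.140646, 0.067993, 0.089509),
    (0.040807, 0.011734, 0.069059, 0.170836, 0.154755, 0.046279, 0.088101),
    (0.038577, 0.009779, 0.067312, 0.153846, 0.158977, 0.035755, 0.077811),
    (0.025524, 0.007223, 0.060032, 0.151008, 0.148475, 0.031428, 0.075531),
    (0.018726, 0.003449, 0.030992, 0.127630, 0.153421, 0.012942, 0.059738),
]


def max_intra_by_variable(intra):
    """Largest intra-groups dispersion per variable (column-wise maximum)."""
    return [max(column) for column in zip(*intra)]


def is_homogeneous(between, intra):
    """True when every intra-groups dispersion is below the between-groups one."""
    return all(m < b for m, b in zip(max_intra_by_variable(intra), between))


for name, m, b in zip(VARIABLES, max_intra_by_variable(INTRA), BETWEEN):
    print(f"{name}: max intra {m:.6f}, between {b:.6f}")
print(is_homogeneous(BETWEEN, INTRA))
```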