Spring clean for a decade of South African stats data

02 October 2006

Clean data: Matthew Welch, Alison Siljeur and Lynn Woolfrey are part of the DataFirst team that will "clean" dozens of Statistics South Africa surveys.

The Mellon Foundation in the US has awarded R2.5-million to UCT's DataFirst statistics archive to "clean" more than a decade's worth of South African statistical data.

At this juncture of the country's development, it is crucial for researchers and policymakers that statistics on unemployment and poverty levels can be compared year on year and that accurate data from surveys feed research.

"We will be looking at dozens of Statistics South Africa (StatsSA) surveys since 1994," said economics professor Murray Leibbrandt, director of the Southern Africa Labour and Development Research Unit (SALDRU), which is collaborating with DataFirst on the project.

The work includes the six October Household Surveys conducted since 1994, the six-monthly Labour Force Surveys, census data from 1996 and 2001, income and expenditure surveys, and special once-off surveys such as StatsSA's Time Use Survey.

"There were shifts in questionnaire and sampling design that make it difficult to compare results year by year," Leibbrandt said. "Over time these kinds of comparisons raise their own special problems, which have to do with tracking changes accurately."

The team will look at how these changes affect survey results and what kinds of corrections can be made to make the data comparable.

Different economists and institutions use different data and arrive at markedly different results. Many are concerned about the reliability of surveys such as the Income and Expenditure Survey, citing major technical problems with the data.

The project will take three years to complete. The principal investigators are DataFirst's head, Matthew Welch, and SALDRU deputy director, Dr Martin Wittenberg, an associate professor in the economics department.

Welch said their findings would be made available through the DataFirst website, with recommendations on how the data sets can be used and what their respective strengths and weaknesses are.

"Any adjustments to the officially released datasets will be posted on the DataFirst website."

DataFirst will also hold workshops on the shortfalls of the various data, teaching researchers techniques to analyse the information.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.