This de-duplication process also


Post by asimj1 »

“We worked through approximately 140,000 cells at this stage. The reason for the rigorous QA was that we now have a consistent set of descriptions, with the same concepts described in the same way, so we can begin to de-duplicate the data and hold only one version of a particular data item (cell). This de-duplication process also takes into account the geographic extent of the cell and its granularity, which enables us to append each instance of the data together and so get the best coverage in terms of both geography and detail. We also spent a considerable amount of time working through the lists of appends to ensure that we weren’t appending data together that shouldn’t have been.”
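To make the idea of de-duplicating by geographic extent and granularity more concrete, here is a minimal Python sketch. It is not the team's actual tooling: the field names (`description`, `extent`, `granularity`, `table`), the granularity ranking, and the rule "for each area, keep the most detailed instance" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical ranking: a higher number means finer geographic detail.
GRANULARITY_RANK = {"country": 0, "region": 1, "local_authority": 2, "ward": 3}


def deduplicate(instances):
    """Group instances by normalised description and, for each area,
    keep the source table that offers the finest granularity.

    Returns {description: {area: table_id}}.
    """
    best = defaultdict(dict)  # description -> {area: (rank, table_id)}
    for inst in instances:
        rank = GRANULARITY_RANK[inst["granularity"]]
        per_area = best[inst["description"]]
        for area in inst["extent"]:
            current = per_area.get(area)
            # Append a new area, or replace a coarser instance of the same area.
            if current is None or rank > current[0]:
                per_area[area] = (rank, inst["table"])
    return {
        desc: {area: table for area, (_, table) in per_area.items()}
        for desc, per_area in best.items()
    }


# Illustrative data only: three published instances of the same cell.
instances = [
    {"description": "Age 4 / All persons", "extent": {"England", "Wales"},
     "granularity": "local_authority", "table": "T1"},
    {"description": "Age 4 / All persons", "extent": {"Wales"},
     "granularity": "ward", "table": "T2"},
    {"description": "Age 4 / All persons", "extent": {"Scotland"},
     "granularity": "local_authority", "table": "T3"},
]
coverage = deduplicate(instances)
```

In this sketch the three instances collapse into one data item whose coverage spans England, Wales and Scotland, with Wales taken from the more detailed table. A real append list would still need the kind of manual review described in the post, since identical descriptions do not guarantee that two tables should be joined.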

Because this is a manual process, Jamey Hart, Data Quality Officer on the team, had to do rigorous Quality Assurance (QA) to check that our ‘normalised’ description of each table had captured the essence of what the published table contained. This included eyeballing our description of the table and comparing this to the original supplied version. Jamey continues:

At this point, we moved on to processing the Scottish and Northern Irish data, which essentially repeats the steps outlined above. For Northern Ireland, this was relatively straightforward, as the data was described using codelists and codes similar to those the ONS had used. The differences in the labels were often minor; for example, ‘Aged 4’ was changed to ‘Age 4’ to match the working model of codelists and codes. Northern Ireland also had data describing concepts that apply only there, so in those instances it was a case of adding new codelists and codes.
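The label matching described here could be sketched roughly as follows. This is an illustration only: the post describes a largely manual process, and the function names, the regular expression, and the `ni_only_concept` codelist are hypothetical.

```python
import re


def normalise_label(label):
    """Tidy whitespace and rewrite 'Aged N' as 'Age N' to match the working model."""
    cleaned = " ".join(label.split())
    return re.sub(r"^Aged (\d+)$", r"Age \1", cleaned)


def merge_codelist(model, name, labels):
    """Match incoming labels against codelist `name` in the working model.

    Labels that already exist after normalisation are left alone; anything
    new is added (creating the codelist if needed). Returns the labels added.
    """
    codes = model.setdefault(name, set())
    added = []
    for label in labels:
        norm = normalise_label(label)
        if norm not in codes:
            codes.add(norm)
            added.append(norm)
    return added


# Illustrative working model with one existing codelist.
model = {"age": {"Age 4", "Age 5"}}

# 'Aged 4' matches the existing 'Age 4', so nothing new is added.
added_age = merge_codelist(model, "age", ["Aged 4", "Age 5"])

# A concept that exists only in the new source becomes a new codelist.
added_new = merge_codelist(model, "ni_only_concept", ["Category A"])
```

Returning the list of added labels gives a reviewer something to check by eye, which fits the QA step described earlier in the post.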