We have developed a templating

Post by asimj1 »

Project partners at PRImA (the Pattern Recognition & Image Analysis Lab) at the University of Salford have extended their Aletheia tool (a document analysis, recognition and annotation system) to lift the data from the census volumes. We have developed a templating function that enables automated extraction of numeric values, with attached metadata (meaning), from the images. Individual number recognition rates of over 97% are being achieved, removing the high cost of ‘eyeball’ OCR. To date, Aletheia has mainly been deployed on written/typed text rather than data/tabular-type artefacts. The automated approach is far more efficient, quicker and more cost-effective than checking every number by eye, and it learns as it goes.

Professor Apostolos Antonacopoulos, who leads the PRImA Research Lab at the University of Salford, said: “We recognise the interesting nature and technical challenges of the project, as well as the greater good that will come from having more useful data for research.”

The UK Data Service Census Support team at Jisc has developed a comprehensive and robust quality assurance process based on the internal logic of the data: it compares groups of values that should be equal (e.g. ‘all people’ should equal ‘all males’ + ‘all females’). Outputs from the OCR processes are run through millions of such comparisons to identify and rank numbers that don’t ‘fit’ with their surroundings and require correction.
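
To make the ‘all people’ = ‘all males’ + ‘all females’ idea concrete, here is a minimal Python sketch of that kind of consistency check. It is only an illustration, not the team's actual process: the function name `rank_discrepancies`, the rule format, the column names and the sample figures are all assumptions made up for this example, and ranking by the size of the gap is just one plausible way to order the suspects.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: (total column, columns that should sum to it).
Rule = Tuple[str, List[str]]


def rank_discrepancies(
    rows: Dict[str, Dict[str, int]], rules: List[Rule]
) -> List[Tuple[str, str, int]]:
    """Return (row id, total column, gap) for every failed rule, largest gap first."""
    flagged = []
    for row_id, values in rows.items():
        for total_col, part_cols in rules:
            expected = sum(values[col] for col in part_cols)
            gap = abs(values[total_col] - expected)
            if gap > 0:
                # The total and its parts disagree, so at least one of
                # these numbers was probably misread and needs review.
                flagged.append((row_id, total_col, gap))
    # Sort so that the largest disagreements are reviewed first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Invented sample data standing in for OCR output.
rules = [("all_people", ["all_males", "all_females"])]
rows = {
    "district_a": {"all_people": 1200, "all_males": 580, "all_females": 620},
    "district_b": {"all_people": 1300, "all_males": 410, "all_females": 490},
    "district_c": {"all_people": 757, "all_males": 350, "all_females": 400},
}

for row_id, column, gap in rank_discrepancies(rows, rules):
    print(f"{row_id}: {column} is off by {gap}")
```

A check like this only shows that something in the group is wrong, not which number it is. Presumably that is why the real process runs many overlapping comparisons, so that a misread value stands out against several of its neighbours at once.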