Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?
Updated · arxiv.org · Jul 2
Summary
Researchers have developed a multimodal large language model (LLM) pipeline to digitize diverse historical vehicle registration tables from early 20th-century US state reports.
The LLM-based approach achieved a 95.4% exact cell match rate, sharply reduced critical parsing errors, and cost a fraction of what traditional digitization does.
This method enables domain experts to create large-scale panel datasets more efficiently, potentially transforming access to historical data for economic and social research.
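To make the headline metric concrete, here is a minimal Python sketch of one way an exact cell match rate could be computed: the share of cells in a digitized table whose text is identical to the hand-checked reference. The function name, the whitespace trimming, and the toy table values are illustrative assumptions, not the paper's actual evaluation code or data.

```python
from typing import Sequence


def exact_cell_match_rate(
    predicted: Sequence[Sequence[str]],
    reference: Sequence[Sequence[str]],
) -> float:
    """Return the fraction of cells whose predicted text equals the reference.

    Assumption: both tables have the same shape, and surrounding whitespace
    is ignored. The paper may define the metric differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("tables have different numbers of rows")

    total = 0
    matches = 0
    for pred_row, ref_row in zip(predicted, reference):
        if len(pred_row) != len(ref_row):
            raise ValueError("rows have different numbers of cells")
        for pred_cell, ref_cell in zip(pred_row, ref_row):
            total += 1
            if pred_cell.strip() == ref_cell.strip():
                matches += 1

    if total == 0:
        raise ValueError("tables contain no cells")
    return matches / total


# Toy example with made-up values (not taken from the paper).
reference_table = [
    ["County A", "1,204", "87"],
    ["County B", "3,310", "142"],
]
predicted_table = [
    ["County A", "1,204", "87"],
    ["County B", "3,810", "142"],  # one misread digit
]

rate = exact_cell_match_rate(predicted_table, reference_table)
print(f"Exact cell match rate: {rate:.1%}")
```

In this toy case five of six cells agree, so a single misread digit counts against the whole cell. That strictness is what makes an exact-match figure a demanding measure for numeric tables.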