Updated · arxiv.org · Jul 2
Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?

Summary

  • Researchers have developed a multimodal large language model (LLM) pipeline to digitize diverse historical vehicle registration tables from early 20th-century US state reports.
  • The LLM-based approach achieved a 95.4% exact cell match rate, drastically reduced critical parsing errors, and operated at a fraction of traditional digitization costs.
  • This method enables domain experts to create large-scale panel datasets more efficiently, potentially transforming access to historical data for economic and social research.
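The 95.4% figure is an exact cell match rate. Below is a minimal Python sketch of how such a metric can be computed. It assumes that digitized cells are aligned to the ground truth by (row, column) position, that a cell counts only if its value matches character for character after trimming surrounding whitespace, and that a missing cell counts as a miss. The function name, the data layout, and the sample values are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, Tuple

# Assumed alignment key: (row index, column index) within one table.
Cell = Tuple[int, int]


def exact_cell_match_rate(truth: Dict[Cell, str], predicted: Dict[Cell, str]) -> float:
    """Share of ground-truth cells whose predicted value matches exactly.

    Assumptions (illustrative, not the paper's definition): cells are aligned
    by position, surrounding whitespace is ignored, no other normalization is
    applied, and a cell missing from `predicted` counts as a mismatch.
    """
    if not truth:
        raise ValueError("ground-truth table is empty")
    matches = sum(
        1
        for key, value in truth.items()
        if key in predicted and predicted[key].strip() == value.strip()
    )
    return matches / len(truth)


if __name__ == "__main__":
    # Hypothetical 2x2 table (placeholder values, not real registration data).
    truth = {(0, 0): "State A", (0, 1): "1,234", (1, 0): "State B", (1, 1): "5,678"}
    predicted = {(0, 0): "State A", (0, 1): "1,234", (1, 0): "State B", (1, 1): "5,618"}
    print(exact_cell_match_rate(truth, predicted))  # 0.75
```

Exact matching is deliberately strict: a single misread digit (here "5,618" for "5,678") makes the whole cell a miss, which is why this kind of metric is a demanding test for numeric historical tables.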