Transforming large collections of scientific publications to XML

...collecting statistics about missing bindings and macros, and other errors. This guides debugging and development efforts, leading to iterative improvements in both the tools and the quality of the converted corpus. The build system thus serves as both a production conversion engine and a software test harness. We have now processed the complete arχiv collection through 2006, consisting of more than 400,000 documents (a complete run is a processor-year-size undertaking), continuously improving our success rate. We are now able to convert more than 90% of these documents to XHTML+MathML. We consider over 60% to be successes, converted with no or only minor warnings. While the remaining 30% can also be converted, their quality is doubtful, due to unsupported macros or conversion errors.
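
The kind of bookkeeping described above can be illustrated with a minimal sketch. It is not the authors' build system: the record layout `(doc_id, status, missing_macros)`, the status labels, and the function name `summarize` are all assumptions made for illustration. The sketch tallies how many documents fall into each outcome class and which undefined macros occur most often, which is the sort of ranking that could direct development effort.

```python
from collections import Counter


def summarize(results):
    """Aggregate per-document conversion outcomes.

    `results` is an iterable of (doc_id, status, missing_macros) tuples.
    This record layout and the status labels are hypothetical; they are
    not taken from the paper.
    """
    status_counts = Counter()
    missing = Counter()
    for _doc_id, status, macros in results:
        status_counts[status] += 1   # one outcome per document
        missing.update(macros)       # count every unsupported macro seen
    total = sum(status_counts.values())
    # Share of documents in each outcome class (empty if no input).
    rates = {s: n / total for s, n in status_counts.items()} if total else {}
    # Most frequently missing macros first: candidates for new bindings.
    return rates, missing.most_common()
```

Re-running such a summary after each full conversion pass would show whether the share of clean conversions is rising and which missing macros are worth supporting next.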