Transforming large collections of scientific publications to XML

... collecting statistics about missing bindings and macros, and other errors. This guides debugging and development efforts, leading to iterative improvements in both the tools and the quality of the converted corpus. The build system thus serves as both a production conversion engine and a software test harness. We have now processed the complete arχiv collection through 2006, consisting of more than 400,000 documents (a complete run is a processor-year-size undertaking), continuously improving our success rate. We are now able to convert more than 90% of these documents to XHTML+MathML. We consider over 60% to be successes, converted with no or minor warnings. While the remaining 30% can also be converted, their quality is doubtful, due to unsupported macros or conversion errors.