Abstract

Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffer from a variety of problems in the areas of expressiveness, efficiency, and naturalness for linguistic query. We describe the domain of linguistic trees and discuss the expressive requirements for a query language. Then we present a language that can express a wide range of queries over these trees, and show that the language is first-order complete over trees.

Corresponding author

Correspondence to Catherine Lai.

About this article

Cite this article

Lai, C., Bird, S. Querying Linguistic Trees. J of Log Lang and Inf 19, 53–73 (2010). https://doi.org/10.1007/s10849-009-9086-9


  • DOI: https://doi.org/10.1007/s10849-009-9086-9
