Peter A. Machonis
Phrasal verbs have recently received much attention in Natural Language
Processing (Jackendoff 2002, Sag et al. 2002, Villavicencio 2005). This
paper presents an attempt at parsing English phrasal verbs, with and
without insertion, in large corpora. We used manually constructed
lexicon-grammar tables (Gross 1996) of over 1,200 transitive and ergative
phrasal verbs, along with NooJ, a linguistic development environment that
parses texts using large-scale dictionaries and local grammars. In order to
improve recall, NooJ uses a Text Annotation Structure that holds all
unsolved ambiguities (Silberztein 2007).
The lexicon-grammar tables of phrasal verbs were used to compile a NooJ
phrasal verb dictionary, which was then used in conjunction with a local
grammar to identify phrasal verbs in the Henry James novel The Portrait of
a Lady. When the phrasal verb dictionary and grammar were applied in a
lexical and syntactic analysis of the entire text, NooJ identified 583
potential phrasal verbs. Although the overall accuracy was only 84%, the
concordances generated from our most exhaustive tables (the particles up
and out) achieved 97% accuracy.
Furthermore, the NooJ grammar and dictionary correctly identified over 80
discontinuous instances of phrasal verbs, such as:
Shall I show the gentleman up, ma'am?
She has reasoned the matter well out
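The idea of matching a verb and its particle across inserted material can
be sketched in a few lines of Python. This is only an illustration, not
NooJ's dictionary or local grammar formalism: the lexicon entries, the
inflection table, the function name find_phrasal_verbs, and the max_gap
limit are all assumptions made for the example.

```python
import re

# Toy lexicon (hypothetical entries, not the paper's lexicon-grammar tables):
# maps a verb lemma to the particles it can combine with.
PHRASAL_VERBS = {
    "show": {"up", "in", "out"},
    "reason": {"out"},
}

# Hypothetical hand-written inflection table standing in for a real
# morphological dictionary.
LEMMAS = {
    "show": "show", "shows": "show", "showed": "show", "shown": "show",
    "reason": "reason", "reasons": "reason", "reasoned": "reason",
}


def find_phrasal_verbs(sentence, max_gap=4):
    """Return (lemma, particle, gap) triples, where gap is the number of
    tokens inserted between verb and particle (0 means continuous)."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    matches = []
    for i, token in enumerate(tokens):
        lemma = LEMMAS.get(token)
        if lemma is None:
            continue
        particles = PHRASAL_VERBS[lemma]
        # Check the next token, then up to max_gap tokens beyond it.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if tokens[j] in particles:
                matches.append((lemma, tokens[j], j - i - 1))
                break
    return matches


print(find_phrasal_verbs("Shall I show the gentleman up, ma'am?"))
print(find_phrasal_verbs("She has reasoned the matter well out"))
```

A matcher this naive has no part-of-speech or syntactic information, so it
would overgenerate on real text; it only shows why a bounded insertion
window is enough to recover discontinuous instances like the two above.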
In fact, most of the problems encountered were with continuous examples.
Nouns were sometimes interpreted as part of phrasal verbs:
He carried his hands in his pockets
This happens because the phrasal verb dictionary contains the expression
hand something in, “return.”
False positives also highlighted the problem of distinguishing particles
from prepositions:
Ralph asked while they moved along the platform
Although the Text Annotation Structures also showed the correct
interpretations of these sentences, expanding selectional restrictions of
complements in the lexicon-grammar of phrasal verbs to include semantic
classes and domains (Le Pesant & Mathieu-Colas 1998) might help resolve
ambiguities such as these. Recall will also be improved in the future by
adding intransitive phrasal verbs and phrasal prepositional verbs to the
NooJ database.
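As a rough illustration of how selectional restrictions on complements
could filter such false positives, the sketch below keeps a candidate only
when the semantic class of its object is licensed by the lexicon entry.
The noun classes, the licensed classes for each verb-particle pair, and the
function name is_plausible are all hypothetical; they are not taken from
the lexicon-grammar tables or from Le Pesant & Mathieu-Colas (1998).

```python
# Hypothetical semantic classes for a few nouns (illustrative only).
NOUN_CLASS = {
    "platform": "location",
    "crowd": "human",
    "essay": "document",
    "pockets": "container",
}

# Hypothetical lexicon entries: (verb lemma, particle) -> semantic classes
# allowed for the direct object.
OBJECT_CLASSES = {
    ("move", "along"): {"human"},
    ("hand", "in"): {"document"},
}


def is_plausible(lemma, particle, object_head):
    """Keep a candidate only if its object's class is licensed by the entry."""
    allowed = OBJECT_CLASSES.get((lemma, particle))
    if allowed is None:
        return False  # the pair is not in the toy lexicon at all
    return NOUN_CLASS.get(object_head) in allowed


# "moved along the platform": platform is a location, so the transitive
# phrasal-verb reading is rejected in this toy setup.
print(is_plausible("move", "along", "platform"))
# "hand the essay in": a document object is licensed.
print(is_plausible("hand", "in", "essay"))
```

Under these assumed classes, the reading of along as a particle is rejected
and the prepositional reading is left as the only candidate.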
Nevertheless, using
NooJ to identify phrasal verbs in large corpora has shown that it is indeed
a powerful linguistic tool that can help solve a key problem in Natural
Language Processing.
Works Cited
Gross, Maurice. 1996. Lexicon grammar. In Concise Encyclopedia of Syntactic
Theories. Keith Brown and Jim Miller, eds. 244-258. New York: Elsevier.
Jackendoff, Ray. 2002. English particle constructions, the lexicon, and the
autonomy of syntax. In Verb-particle explorations. Nicole Dehé, Ray
Jackendoff, Andrew McIntyre & Silke Urban, eds. 67-94. New York: Mouton de
Gruyter.
Le Pesant, Denis & Michel Mathieu-Colas. 1998. Introduction aux classes
d’objets. Langages 131: 6-33.
Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake, and Dan
Flickinger. 2002. Multiword expressions: a pain in the neck for NLP. In
Proceedings of the Third International Conference on Intelligent Text
Processing and Computational Linguistics. 1-15. Mexico City: CICLING.
Silberztein, Max. 2002-2006. NooJ Manual. http://www.nooj4nlp.net/
Silberztein, Max. 2007. An alternative to tagging. In Proceedings of NLDB
2007. Lecture Notes in Computer Science 1-11. Berlin: Springer Verlag.
Villavicencio, Aline. 2005. The availability of verb-particle constructions
in lexical resources: How much is enough? Computer Speech and Language 19:
415-432.