The XLE Web Interface


The XLE Web Interface (XLE-Web) is a web-based tool for parsing with LFG grammars and viewing c-structures and f-structures. XLE-Web was initially developed in the LOGON project and is now an integral part of the INESS platform. It uses the XLE tool, which was developed in cooperation with The Parallel Grammar Project (ParGram), a project that gave rise to many of the grammars available on XLE-Web.

XLE-Web allows the user to choose a grammar and type in a sentence to be analysed. The sentence is then processed by the XLE parser, and the resulting c-structures and f-structures are displayed, either one solution at a time, or all solutions together in the form of packed c- and f-structure representations.

1. Parsing sentences

For parsing, type a sentence into the text box and press ‘Parse sentence’. After XLE has finished parsing (which might take some time), c- and f-structures are displayed. If you tick the box ‘Packed representation’, all solutions are displayed simultaneously; otherwise, you can move between solutions using the ‘Previous’ and ‘Next’ buttons.

By default, the root of the parse tree is the ROOTCAT of the grammar specified in the grammar file. If no solution is found, the solutions rooted in the REPARSECAT category (if specified) are displayed. You may explicitly specify a different root category for parsing by prepending the category name and a colon to the string (e.g., ‘NP:small child’).
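The root-category prefix can be illustrated with a small sketch. It only shows how an input such as ‘NP:small child’ divides into a category and a string; it is not the code XLE-Web itself uses. The function name and the fallback value "ROOT" (standing in for the grammar's ROOTCAT) are assumptions made for the example.

```python
def split_root_category(text, default_root="ROOT"):
    """Split 'CAT:string' into (category, string).

    default_root is a hypothetical stand-in for the grammar's ROOTCAT.
    """
    prefix, sep, rest = text.partition(":")
    # Treat the prefix as a category only if it is a single token
    # (non-empty, no whitespace) followed by a colon.
    if sep and prefix and not any(ch.isspace() for ch in prefix):
        return prefix, rest.strip()
    return default_root, text.strip()


print(split_root_category("NP:small child"))  # ('NP', 'small child')
print(split_root_category("small child"))     # ('ROOT', 'small child')
```

Note that under this simple rule any single word followed by a colon at the start of the input is read as a category name.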

The number of analyses found for the sentence is shown immediately above the structures. In some cases the information may contain a plus, e.g. ‘4+1 solutions’. This means that one of the solutions is not displayed because the grammar categorizes it as ‘unoptimal’ compared to the alternative analyses. Unoptimal analyses can be included; see Show unoptimal under Boxes below.
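For users who post-process the reported counts, a label such as ‘4+1 solutions’ can be read as a pair of numbers. The following is a minimal sketch; it assumes the label always has the shape shown in the example above (a number, an optional plus and second number, then the word ‘solution’ or ‘solutions’), which may not cover every message the interface produces.

```python
import re


def parse_solution_count(label):
    """Return (displayed, unoptimal) counts from a label like '4+1 solutions'."""
    match = re.fullmatch(r"\s*(\d+)(?:\+(\d+))?\s+solutions?\s*", label)
    if match is None:
        raise ValueError(f"unrecognised solution label: {label!r}")
    displayed = int(match.group(1))
    # The part after the plus counts analyses the grammar marks as unoptimal.
    unoptimal = int(match.group(2)) if match.group(2) else 0
    return displayed, unoptimal


print(parse_solution_count("4+1 solutions"))  # (4, 1)
```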

2. Display

C-structure display

C-structures are displayed either as ordinary c-structure trees or, in packed mode, as graphs simultaneously displaying the c-structures of all solutions (see further Packed representation under Boxes below). Mousing over a c-structure node highlights both the f-structure projection of that node and all other nodes having the same projection. Solid lines in the c-structure connect nodes projecting the same f-structure. Clicking on a non-terminal c-structure node simplifies the tree dominated by the node to a triangle dominating the corresponding string of word forms. Clicking again replaces the dominated string by [...]. Clicking on a preterminal c-structure node (a lexical category) displays its sub-lexical morphological tree as produced by the finite-state morphological analysis that precedes the syntactic parsing.

F-structure display

In the same fashion, f-structures are either displayed as ordinary (uncontexted) f-structures, one for each solution, or as packed f-structures combining the f-structures of all solutions. (See further Packed representation under Boxes below.)

If several f-structure attributes share their value, the value sub-f-structure is displayed only once, whereas the other attributes only have a reference index (a red number, which is also displayed as a red subscript on the referenced structure) as displayed value. Mousing over the reference index highlights the referenced structure. Clicking on the index makes the highlighting stick.
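The idea of shared values can be sketched with nested Python dictionaries, where sharing is modelled as two attributes pointing at the same object. This is an illustration only: the dictionary encoding, the function name and the example features are assumptions, not the representation XLE-Web uses internally.

```python
def assign_reference_indices(fs):
    """Map id(sub-structure) -> index for sub-structures reached more than once."""
    seen, shared = set(), {}

    def walk(node):
        for value in node.values():
            if isinstance(value, dict):
                if id(value) in seen:
                    # Second visit: this value is shared, so give it an index.
                    shared.setdefault(id(value), len(shared) + 1)
                else:
                    seen.add(id(value))
                    walk(value)

    walk(fs)
    return shared


# Toy control structure: the matrix SUBJ is also the SUBJ of XCOMP.
subj = {"PRED": "pro"}
fs = {"PRED": "try<SUBJ,XCOMP>", "SUBJ": subj,
      "XCOMP": {"PRED": "leave<SUBJ>", "SUBJ": subj}}
print(assign_reference_indices(fs) == {id(subj): 1})  # True
```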

Discriminants

In the packed representation, you can use discriminants to choose or reject solutions. The discriminants represent a reduction of the forest of alternative sentence analyses to a set of binary choices, or local ambiguities. The discriminants are of four different types: lexical, morphological, f-structure and c-structure discriminants. Choosing discriminants involves choosing among the existing set of complete analyses; it does not amount to building new analyses.

As an example, Figure 1 below shows discriminants for the ambiguous sentence I saw the girl with the binoculars.


Figure 1. Discriminants for the ambiguous sentence I saw the girl with the binoculars

Each discriminant has four fields: (1) the string position(s) of the analysed item; (2) the analysis of the item represented by the discriminant, which is chosen by clicking on it; (3) the number of analyses remaining if this choice is made; (4) the option of choosing the compl(ement) of the analysis, i.e., marking it as false by clicking on it, along with the number of analyses remaining if that choice is made.

There is usually more than one way of selecting the desired analysis. The choice of saw as the past tense of ‘see’ rather than the present tense of ‘saw’ can be made by means of the second morphological discriminant or by one of the f-structure discriminants mentioning the predicate ‘see’. Furthermore, the meaning where the girl has the binoculars can be chosen by means of the last f-structure discriminant, where the set of adjuncts of ‘girl’ contains a with-phrase (‘$’ stands for the membership operator ‘∈’), or by an appropriate c-structure discriminant. A c-structure discriminant shows the choice of a phrase structure rule to analyse the associated string, which is partitioned by ‘||’ according to the rule daughters.
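The point that discriminants select among existing analyses, rather than build new ones, can be made concrete with a small sketch. The data below is a toy: the four analyses and their property strings are invented for illustration and are not the output of any grammar on XLE-Web. Each analysis is modelled as a set of properties, and a discriminant is a property that holds of some analyses but not all.

```python
# Hypothetical toy data: each complete analysis is a set of property strings.
analyses = {
    1: {"saw:see+past", "adjunct(girl, with)"},
    2: {"saw:see+past", "adjunct(see, with)"},
    3: {"saw:saw+pres", "adjunct(girl, with)"},
    4: {"saw:saw+pres", "adjunct(saw, with)"},
}


def discriminant_counts(analyses):
    """For each discriminant, count analyses left if it is chosen or rejected."""
    all_props = set().union(*analyses.values())
    table = {}
    for prop in sorted(all_props):
        chosen = sum(1 for props in analyses.values() if prop in props)
        # A property shared by every analysis does not discriminate.
        if chosen < len(analyses):
            table[prop] = (chosen, len(analyses) - chosen)
    return table


def choose(analyses, prop, complement=False):
    """Keep analyses that have prop, or those that lack it if complement is set."""
    return {k: v for k, v in analyses.items() if (prop in v) != complement}


remaining = choose(analyses, "saw:see+past")
remaining = choose(remaining, "adjunct(girl, with)")
print(sorted(remaining))  # [1]
```

In this toy setting the two counts returned by `discriminant_counts` play the role of fields (3) and (4) of a discriminant, and `complement=True` corresponds to marking a discriminant as false.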

3. Options


Not all options below are available for all grammars.

Buttons

Grammar. Brings up a menu of available grammars for a number of languages. The default choice on opening the page is Norwegian Bokmål (pruning) (where ‘pruning’ is a statistically simplified version of the grammar allowing the parser to handle longer sentences and take less time, but possibly missing some analyses).

Parse. Parses the sentence written in the text box according to the chosen grammar, and displays its analyses.

In extended mode:

Morphemes. Displays the morphological analyses of the word or words in the text box as produced by the finite-state morphological transducer which provides the input to the XLE syntactic parser.

Tokens. Lists the tokens of the string entered in the text box as identified by the tokenizer, which operates before morphological analysis and syntactic parsing. ‘TB’ in the output stands for ‘token boundary’.

Generate. ‘Reverses’ the parse, i.e., generates strings back from the f-structure found by the most recent parse.

Prolog. Gives the parse result (the c- and f-structures) in Prolog form, suitable for further processing.

GIT update grammar. Updates the current grammar to its latest version from a repository (relevant for users editing grammars).

Previous. Displays the previous analysis in the list when there is more than one analysis of the sentence.

Next. Displays the next analysis in the list when there is more than one analysis of the sentence.

Top-ranked. Shows the analysis ranked as most probable by a stochastic disambiguator.

Boxes

Packed representation. When a sentence has more than one analysis, an alternative to going through them with the ‘Previous’ and ‘Next’ buttons is to tick the box ‘Packed representation’ (preferably before parsing). This shows all alternative analyses in one packed representation, where both the c-structure and the f-structure are packed. In addition, a menu of discriminants is displayed, allowing the user to zoom in on the desired analysis. (See Discriminants above.) If the number of alternatives exceeds a certain threshold, packed c- and f-structures are not displayed until the number of remaining analyses falls below the threshold.

Substructures of a packed f-structure pertaining to a given context (= a subset of the set of all solutions) are labeled by green context labels, which correspond to the context labels in the packed c-structures. The abbreviated complex context labels (e.g. cv_005) are mouse-sensitive; mousing over them shows their definition in terms of basic choice labels (e.g. b3-b5|c4).

Show XLE messages. When this box is ticked, any messages generated by the XLE parser are displayed. Such messages may in some cases explain why a parse fails.

Show unoptimal. If this box is ticked before the parse, ‘unoptimal’ analyses are included in the output. There will be unoptimal analyses whenever the number of solutions returned contains a plus, e.g. ‘4+1 solutions’. This means that one of the solutions is not displayed (unless this box is ticked), because the grammar categorizes it as ‘unoptimal’ compared to the alternative analyses.

Show dependencies. Ticking this box displays an experimental conversion of the chosen LFG analysis of the sentence to a dependency structure. A menu will appear allowing the choice between three kinds of dependency structures: projective, non-projective and Universal Dependencies (UD).

Suppress CHECK. One of the f-structure attributes in the ParGram LFG grammars is CHECK. The value of this attribute contains information which is consulted by the parsing algorithm during parsing, but which is not the kind of linguistic information about the sentence that the parse results are intended to display. The CHECK feature is therefore suppressed by default, but can be shown by unticking this box.

Suppress complex categories. XLE grammars may use an option called ‘complex categories’ whereby some of the syntactic features are expressed directly on the c-structure nodes rather than only in the f-structure. This may make parsing more efficient and also simplify the grammars. A complex category node has the relevant features added to it within square brackets, e.g. ‘V[fin]’. Ticking this box removes the bracketed features from the displayed node labels in the c-structure, giving ‘V’ in this example.
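The effect of this option on a node label can be sketched as a simple string operation. The sketch assumes that the features form a single, non-nested bracketed group at the end of the label, as in ‘V[fin]’; the function name is made up for the example, and this is not the code used by XLE-Web.

```python
import re


def strip_complex_features(label):
    """Drop a trailing bracketed feature list from a category label."""
    # Assumes one non-nested [...] group at the end of the label.
    return re.sub(r"\[[^\[\]]*\]$", "", label)


print(strip_complex_features("V[fin]"))  # V
print(strip_complex_features("NP"))      # NP
```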

PREDs only. This choice simplifies the displayed f-structure by removing all attributes that do not lead to PRED values: only lexical predicates and the syntactic functions leading to them are kept in the displayed structure.
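The pruning can be pictured as a recursive filter over the f-structure. In the sketch below, f-structures are encoded as nested Python dictionaries and set values (such as adjunct sets) as lists; this encoding, the function name and the example features are assumptions made for illustration, not XLE-Web's own data format.

```python
def preds_only(fs):
    """Keep PRED values and only those attributes that lead to a PRED."""
    pruned = {}
    for attr, value in fs.items():
        if attr == "PRED":
            pruned[attr] = value
        elif isinstance(value, dict):
            sub = preds_only(value)
            if sub:  # keep the attribute only if a PRED survives below it
                pruned[attr] = sub
        elif isinstance(value, list):
            # Set-valued attribute: prune each member, drop empty members.
            subs = [preds_only(v) for v in value if isinstance(v, dict)]
            subs = [s for s in subs if s]
            if subs:
                pruned[attr] = subs
    return pruned


# Toy f-structure with invented feature values.
fs_example = {
    "PRED": "see<SUBJ,OBJ>",
    "TENSE": "past",
    "SUBJ": {"PRED": "pro", "PERS": "1", "NUM": "sg"},
    "OBJ": {
        "PRED": "girl",
        "NUM": "sg",
        "SPEC": {"DET": {"DET-TYPE": "def"}},
        "ADJUNCT": [{"PRED": "with<OBJ>",
                     "OBJ": {"PRED": "binoculars", "NUM": "pl"}}],
    },
}
print(preds_only(fs_example))
```

Here TENSE, PERS, NUM and the whole SPEC branch disappear, because none of them leads to a PRED value.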

Show discriminant weights. Adds a fifth field to each of the discriminants that are shown when choosing ‘Packed representation’. This added field contains the weight assigned to the discriminant by the stochastic disambiguator which ranks alternative analyses according to probability.

Include non-top F-structures. Some grammars may assign phrases analyses that are not part of any full analysis of the sentence. If this box is ticked, such analyses are included in the display.

Show discriminants / c-structure / f-structure. Allows the choice of what parts of the parse result to display.