FreeLing Tagset Description

FreeLing's morphological analyzers and tagger encode morphological information in PoS tags, which are based on the proposals by EAGLES.

EAGLES aims to encode all existing morphological features for most European languages.

EAGLES PoS tags are variable-length labels in which each character corresponds to a morphological feature. The first character in the tag is always the category (PoS). The category determines the length of the tag and the interpretation of each remaining character in the label.

For instance, for category noun we could have the definition:

Position Attribute Values
0 category N:noun
1 type C:common; P:proper
2 gen F:f; M:m; C:c
3 num S:s; P:p; N:n

This definition would allow PoS tags such as NCMS (standing for noun/common/masculine/singular).

Features that are not applicable or are underspecified for a particular word are set to 0 (zero). For instance, the tag NC00 stands for noun/common/underspecified-gender/underspecified-number.
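The positional scheme described above can be illustrated with a short Python sketch. It is not part of FreeLing: the names NOUN_POSITIONS, DEFINITIONS and decode are hypothetical, and the table it uses is only the example noun definition given above, not the tagset of any particular language.

```python
# Hypothetical example: the noun definition from the table above,
# one (attribute, allowed values) pair per tag position.
NOUN_POSITIONS = [
    ("category", {"N": "noun"}),
    ("type", {"C": "common", "P": "proper"}),
    ("gen", {"F": "f", "M": "m", "C": "c"}),
    ("num", {"S": "s", "P": "p", "N": "n"}),
]

# The first character of a tag (the category) selects the definition.
DEFINITIONS = {"N": NOUN_POSITIONS}


def decode(tag):
    """Return a dict mapping each attribute to its decoded value."""
    positions = DEFINITIONS[tag[0]]
    if len(tag) != len(positions):
        raise ValueError(f"tag {tag!r} should have {len(positions)} characters")
    features = {}
    for char, (attribute, values) in zip(tag, positions):
        # "0" marks a feature that is not applicable or underspecified.
        features[attribute] = None if char == "0" else values[char]
    return features


print(decode("NCMS"))
print(decode("NC00"))
```

In this sketch an underspecified feature is reported as None, so NC00 decodes to a common noun with no gender and no number information.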

Note that the interpretation of a character at a certain position of a tag depends on the PoS (indicated by the first character) and on the target language.

For instance, in a language where nouns can have additional features (e.g. case), the tag definition would include one additional position for the case feature. E.g.:

Position Attribute Values
0 category N:noun
1 type C:common; P:proper
2 case N:nominative; G:genitive; D:dative; F:accusative
3 gen F:f; M:m; C:c
4 num S:s; P:p; N:n
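Because the layout depends on both the category and the language, the same position can carry different attributes in different tagsets. The following Python sketch illustrates this with the two example noun definitions from this section. The language keys "lang_a" and "lang_b" and the function attribute_at are hypothetical placeholders, not FreeLing identifiers.

```python
# Hypothetical example: two tagsets that define nouns differently.
# "lang_a" uses the four-position noun definition, "lang_b" the
# five-position definition that adds a case feature at position 2.
TAGSETS = {
    "lang_a": {"N": ["category", "type", "gen", "num"]},
    "lang_b": {"N": ["category", "type", "case", "gen", "num"]},
}


def attribute_at(language, tag, position):
    """Return the attribute name encoded at a position of the tag."""
    layout = TAGSETS[language][tag[0]]  # layout chosen by language and category
    return layout[position]


print(attribute_at("lang_a", "NCMS", 2))   # gen
print(attribute_at("lang_b", "NCNMS", 2))  # case
```

A tag therefore cannot be interpreted reliably without knowing which language's tagset produced it.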

The order of the features is decided by the tagset designer, although the EAGLES guidelines recommend placing the most general and most informative features at the beginning of the label. In this way, a label can be shortened to a prefix that still keeps the essence of the PoS tag. The extreme case is the reduction to a single character, which is the category.
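Prefix shortening amounts to plain string truncation. The Python sketch below groups a few tags by their first two characters (category and type, assuming the example noun definition above). The function shorten and the sample tags are hypothetical and serve only as an illustration.

```python
from collections import Counter


def shorten(tag, length):
    """Keep only the first `length` characters of a positional tag."""
    if length < 1:
        raise ValueError("a tag prefix must keep at least the category")
    return tag[:length]


# Hypothetical sample of noun tags following the example definition.
tags = ["NCMS", "NCFP", "NPMS", "NC00"]

# Coarse-grained counts: category + type only.
by_type = Counter(shorten(tag, 2) for tag in tags)
print(by_type)

# Extreme case: only the category remains.
print({shorten(tag, 1) for tag in tags})
```

Shortened tags of this kind can be useful when fine-grained distinctions such as gender or number are not needed.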

The following sections list the valid Part-of-Speech tags and the attributes with their values for each language supported in FreeLing: