Statistical Dependency Parser and Semantic Role Labelling Module
As an alternative to rule-based Txala dependency parser, a statistical dependency parsing module is also available. It is based on Treeler machine learning library.
The dependency parser is based on the paper [Car07].
The API of the class is the following:
class dep_treeler : public dependency_parser {
public:
/// constructor
dep_treeler(const std::string &cfgfile);
/// destructor
~dep_treeler();
/// analyze given sentence.
void analyze(sentence &s) const;
/// analyze given sentences.
void analyze(std::list<sentence> &ls) const;
/// return analyzed copy of given sentence
sentence analyze(const sentence &s) const;
/// return analyzed copy of given sentences
std::list<sentence> analyze(const std::list<sentence> &ls) const;
};
The constructor for class dep_treeler
expects a configuration file with the contents described below.
Statistical Parser Configuration File
The configuration file for the statistical dependency parser and semantic role labelling module has a single section <Dependencies>
, containing two lines, with a keyword and a value each:
-
The
DependencyTreeler
keyword should be followed by a path to a Treeler configuration file with the dependency parsing model to use. The path may be either absolute or relative to the Statistical Parser configuration file. -
The
Tagset
keyword should be followed by a path to a tagset definition file which will be used to convert the input PoS tags to the short versions and MSD features expected by the Treeler model.
An example of the <Dependencies>
section:
<Dependencies>
## treeler config file for dep parser
DependencyTreeler ./dep/config.dat
## Tagset description file
Tagset ./tagset.dat
</Dependencies>