Web site: The Stanford Natural Language Processing Group
Date of first review / name of reviewer: 5/29/08 [Matthew Jockers]
Additional reviewer(s):
Produced by: The Stanford Natural Language Processing Group
Cost: Free
Description: "A Java implementation of a maximum-entropy (CMM) part-of-speech (POS) tagger."
Platform: Java, Command Line Application
License: The tagger is licensed under the GNU GPL.
Maturity: v1.5
Features:
- Tags parts of speech using the Penn Treebank tag set
- Customizable
- Ability to tag XML content (added in v1.5)
- Ability to produce XML output instead of the default Penn Treebank-style tagged text
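To give a sense of what working with the plain tagged output involves, here is a minimal sketch that splits a tagged line into word/tag pairs. It does not call the tagger itself. The class name `TaggedOutputParser`, the sample line, and the `_` separator are illustrative assumptions; check which separator your version of the tagger writes between a word and its tag, and pass that character instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford tagger distribution.
class TaggedOutputParser {

    /** One token paired with its part-of-speech tag. */
    static final class TaggedToken {
        final String word;
        final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return word + " -> " + tag;
        }
    }

    /**
     * Splits a whitespace-delimited line of word+separator+TAG items.
     * The split happens at the LAST separator in each item, so a word
     * that itself contains the separator character is kept whole.
     */
    static List<TaggedToken> parse(String taggedLine, char separator) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String item : taggedLine.trim().split("\\s+")) {
            int cut = item.lastIndexOf(separator);
            // Skip empty items and items lacking a word or a tag.
            if (cut <= 0 || cut == item.length() - 1) {
                continue;
            }
            tokens.add(new TaggedToken(item.substring(0, cut), item.substring(cut + 1)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hand-written sample in an assumed format, not real tagger output.
        String sample = "The_DT tagger_NN works_VBZ ._.";
        for (TaggedToken token : parse(sample, '_')) {
            System.out.println(token);
        }
    }
}
```

Splitting at the last separator rather than the first is a deliberate choice here: tags in the Penn Treebank set do not contain `_` or `/`, while words sometimes do.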
Advantages:
- Comes with several pre-trained models but can easily be retrained
- Full API for Developers
- Simple to set up and install; the system requires Java 1.5+ to be installed.
- Growing user community with an email list
Disadvantages:
- There is a GUI, but it is only for demo purposes. You must be comfortable with the command line to make real use of this tool.
Tips:
Tutorials:
More information: