Mine data

DiRT (Digital Research Tools) has a new home! Please visit Bamboo DiRT to explore this excellent collection of research tools.

Definition: 

Data mining tools enable researchers to sift through and search for patterns across large amounts of data. 

 

 

Tools:

  • Connexor Technologies Machinese: a resource for "fact-finding, metadata discovery, name recognition, sentiment detection, and language identification tools" (Commercial)
  • D2K: Data to Knowledge: "a rapid, flexible data mining and machine learning system that integrates analytical data mining methods for prediction, discovery, and deviation detection, with data and information visualization tools" (Free academic license, runs on any platform with Java Virtual Machine)
  • dtSearch: "generic text indexing, searching, and retrieval engine" (Commercial; Windows 2000 or later, requires Internet Explorer)
  • Expert System: "integrated natural language analysis and unstructured text management software" (Commercial)
  • GMDH Shell: "predictive analytics and data mining software; provides GMDH-based machine learning technology for classification, continuous value prediction and time series forecasting" (Commercial)
  • HyperPro: provides a graphical, "user-friendly interface to analytic and interpretive tools" for reading electronic texts (Free)
  • Infonic Sentiment: "linguistic processing technology for analysing digital newsfeeds, archives and generic text documents" (Commercial)
  • Insightful Miner: "scalable, data mining and analysis workbench that enables organizations to deliver customized predictive intelligence...specifically designed for statisticians and business analysts without specialized programming skills" (Commercial)
  • Intellexer: "semantic technology for custom built search engines based on natural language processing" (Commercial)
  • Inxight: "easily set up visual environments to explore extensive hierarchies and relationships in your applications, spot relationships and analyze trends in tabular data, and visualize long time horizons...[through] federated search, high-fidelity extraction and visualization" (Commercial)
  • JGAAP: "Java-based, modular program for textual analysis, text categorization, and authorship attribution" (Free)
  • Juxta: "tool for comparing and collating multiple witnesses to a single textual work. The software allows users to set any of the witnesses as the base text, to add or remove witness texts, to switch the base text at will, and to annotate Juxta-revealed comparisons and save the results." (Free, cross-platform)
  • Linguamatics I2E: "provides agile, high performance enterprise text mining, enabling rapid discovery of new intelligence from text...enables you to answer business-critical questions by rapidly extracting relevant facts and relationships from large document collections" (Commercial)
  • MALLET: "a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text." (Open source under the Common Public License; Java-based)
  • Megaputer Text Analyst: "offers semantic analysis of free-form texts, summarization, clustering, navigation, and natural language retrieval with search dynamic refocusing" (Commercial)
  • MONK: "a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study. It supports both micro analyses of the verbal texture of an individual text and macro analyses that let you locate texts in the context of a large document space consisting of hundreds or thousands of other texts." (Free, web-based)
  • NORA: "a text-mining application intended to allow the exploration of verbal patterns in text collections" (superseded by MONK; source code & demo available)
  • Philologic: "primary full-text search, retrieval and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago"; supports TEI, DocBook, and plain text; see also Philomine, which supports machine learning, text mining, and document clustering tasks (Free, Mac/Linux)
  • Rapid Miner: a data mining solution: "The setup is described by XML files which can easily be created with a graphical user interface (GUI)...follows the concept of rapid prototyping leading very quickly to the desired results...can be used as a Java data mining library" (Free; open source, cross-platform)
  • Saplo: A text analysis API with text recommendations, text filtering, text categorization, automatic tagging, automatic related articles and sentiment analysis. Read more in the text analysis API documentation (Free to try; special offers for researchers and universities are available, web-based).  
  • SAS Text Miner: suite of text processing and analysis tools: "transforms textual data into a usable, intelligible format that facilitates classifying documents, finding explicit relationships or associations between documents, and clustering documents into categories" (Commercial, cross-platform)
  • SEASR: provides tools & frameworks for sharing data and research in virtual work environments (Free; open source, Windows/Mac/Linux)
  • SPSS Text Mining for Clementine: extracts key concepts, sentiments, and relationships from unstructured data and converts them to a structured format for predictive modeling (Commercial)
  • SVM Light: "an implementation of Vapnik's Support Vector Machine for the problem of pattern recognition, for the problem of regression, and for the problem of learning a ranking function...[the optimization algorithm] has scalable memory requirements and can handle problems with many thousands of support vectors efficiently" (Free for scientific use, permission required for commercial distribution; open source, cross-platform)
  • SWAPit: visual text mining and retrieval capabilities, including search, term statistics, and summary; visualises semantic relationships among text documents (Commercial)
  • TACT Web: text analysis software that makes TACT TDB databases available on the web to both TACT and non-TACT users (Free, web-based)
  • TAMS Analyzer: "an open source qualitative package for the analysis of textual themes. It can be used for transcribing digital media and for conducting discourse analysis in the social and cultural sciences." (Free, Mac/GnuStep)
  • TAPoR: a searchable list of tools available through the Text Analysis Portal for Research that can be used online. TAPoR is "a gateway to tools for sophisticated analysis and retrieval, along with representative texts for experimentation...manage electronic texts, experiment with online text tools, [and] learn about digital textuality." The TAPoRware tools are also available separately. (Free, web-based)
  • TextOre: provides B2B analytic software and services to examine and extract information from large volumes of unstructured text (Commercial)
  • TextPipe Pro: a text conversion, extraction and manipulation workbench: "specify all your text processing functions in one place, rather than remembering and managing multiple manual jobs across various text editors, command line tools, custom scripts and Word and Excel macros" (Commercial, Windows)
  • TokenX: text visualization, analysis and play tool (Open source)
  • VisualText: IDE for building information extraction systems, natural language processing systems, and text analyzers: "has been used to build a number of applications, including accurate analyzers for extracting information from resumes, systems that categorize web pages, an analyzer that monitors a financial transaction chat, email analyzers, selective web spiders, and more" (Free for non-commercial use)
  • WEKA 3: "a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." (Free; open source, requires Java)
  • WMatrix: "a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains." (Free, browser-based)

 

 

 

Resources:

  • KDNuggets: a directory of data mining tools
  • The Data Mine: a directory of data mining software, applications and tools

 
