Nit wrapper for Stanford CoreNLP

Stanford CoreNLP provides a set of natural language analysis tools. Given raw text input, it can:

  • give the base forms of words (lemmas) and their parts of speech;
  • recognize names of companies, people, etc.;
  • normalize dates, times, and numeric quantities;
  • mark up the structure of sentences in terms of phrases and word dependencies;
  • indicate which noun phrases refer to the same entities;
  • indicate sentiment.

This wrapper requires the Stanford CoreNLP jars, which run on Java 1.8 or later.

See http://nlp.stanford.edu/software/corenlp.shtml.

NLPProcessor

Java client

var proc = new NLPProcessor("path/to/StanfordCoreNLP/jars")

var doc = proc.process("String to analyze")

for sentence in doc.sentences do
    for token in sentence.tokens do
        print "{token.lemma}: {token.pos}"
    end
end
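Each token carries both its lemma and its part-of-speech tag, so filtering on the tag is a natural next step. The sketch below collects the lemmas of every noun in a text. It is an illustration under assumptions, not part of the package: the helper `noun_lemmas` is hypothetical, `token.pos` and `token.lemma` are assumed to be plain `String`s, and the tags are assumed to follow the Penn Treebank convention used by CoreNLP's default English model, where every noun tag (NN, NNS, NNP, NNPS) starts with `NN`.

```nit
import nlp

# Hypothetical helper, not part of the nlp package: returns the lemma of
# every noun in `text`, in document order. Assumes Penn Treebank tags,
# where all noun tags (NN, NNS, NNP, NNPS) start with "NN".
fun noun_lemmas(proc: NLPProcessor, text: String): Array[String]
do
	var lemmas = new Array[String]
	var doc = proc.process(text)
	for sentence in doc.sentences do
		for token in sentence.tokens do
			if token.pos.has_prefix("NN") then lemmas.add(token.lemma)
		end
	end
	return lemmas
end

var proc = new NLPProcessor("path/to/StanfordCoreNLP/jars")
# Prints the noun lemmas found in the sentence, comma-separated.
print noun_lemmas(proc, "The cats chased two mice.").join(", ")
```

Matching on the tag prefix rather than listing every noun tag keeps the filter short; a stricter filter could compare against an explicit list of tags instead.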

NLPServer

The NLPServer provides a wrapper around the StanfordCoreNLPServer.

See https://stanfordnlp.github.io/CoreNLP/corenlp-server.html.

var cp = "/path/to/StanfordCoreNLP/jars"
var srv = new NLPServer(cp, 9000)
srv.start

NLPClient

NLPClient can be used in place of an NLPProcessor, delegating the analysis to an NLPServer backend.

var cli = new NLPClient("http://localhost:9000")
var doc = cli.process("String to analyze")
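Since an NLPClient stands in for an NLPProcessor, code written against NLPProcessor does not need to know which backend does the work. A minimal sketch, assuming NLPClient is a subtype of NLPProcessor (as the description of NLPClient implies) and that an NLPServer is already listening on port 9000; the helper `count_tokens` is hypothetical.

```nit
import nlp

# Hypothetical helper, not part of the nlp package: counts the tokens of
# `text` across all sentences. It only depends on the NLPProcessor type,
# so a local Java processor and an NLPClient can be passed interchangeably.
fun count_tokens(proc: NLPProcessor, text: String): Int
do
	var n = 0
	for sentence in proc.process(text).sentences do
		n += sentence.tokens.length
	end
	return n
end

var local = new NLPProcessor("path/to/StanfordCoreNLP/jars")
# Assumes an NLPServer is already running on localhost:9000.
var remote = new NLPClient("http://localhost:9000")

# Same call, two backends.
print count_tokens(local, "Hello world.")
print count_tokens(remote, "Hello world.")
```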

NLPIndex

NLPIndex extends StringIndex so that an NLPProcessor tokenizes, lemmatizes and tags the terms of each indexed document.

var index = new NLPIndex(proc)

var d1 = index.index_string("Doc 1", "/uri/1", "this is a sample")
var d2 = index.index_string("Doc 2", "/uri/2", "this and this is another example")
assert index.documents.length == 2

var matches = index.match_string("this sample")
assert matches.first.document == d1
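Because NLPIndex lemmatizes terms, a query can match an inflected form that never appears literally in a document. A sketch reusing the documents from the example above, assuming that `match_string` runs the query through the same NLPProcessor as the indexed documents, that matches come back best-first (as the `matches.first` check above suggests), and that CoreNLP lemmatizes "examples" to "example".

```nit
import nlp

var proc = new NLPProcessor("path/to/StanfordCoreNLP/jars")
var index = new NLPIndex(proc)

var d1 = index.index_string("Doc 1", "/uri/1", "this is a sample")
var d2 = index.index_string("Doc 2", "/uri/2", "this and this is another example")

# "examples" never occurs literally in d2, but if the query is lemmatized
# like the documents, it reduces to "example", which does occur in d2.
var matches = index.match_string("examples")
assert not matches.is_empty
assert matches.first.document == d2
```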

TODO

  • Use JWrapper
  • Use options to choose CoreNLP analyzers
  • Analyze sentence dependencies
  • Analyze sentiment

All subgroups and modules

  • group examples (nlp > examples)
  • module nlp (nlp :: nlp): Natural Language Processor based on the StanfordNLP core.
  • module stanford (nlp :: stanford): Natural Language Processor based on the StanfordNLP core.

Ancestors

  • group codecs (core > codecs): Group module for all codec-related manipulations.
  • group collection (core > collection): This module defines several collection classes.
  • group core: Nit common library of core classes and methods.
  • group counter: Simple numerical statistical analysis and presentation.
  • group meta: Simple user-defined meta-level to manipulate types of instances as objects.
  • group parser_base: Simple base for hand-made parsers of all kinds.
  • group poset: Preordered sets and partially ordered sets (i.e., hierarchies).
  • group serialization: Abstract serialization services.
  • group text (core > text): All the classes and methods related to the manipulation of text entities.

Parents

  • group curl: Data transfer powered by the native curl library.
  • group dom: Easy XML DOM parser.
  • group opts: Management of options on the command line.
  • group pthreads: POSIX Threads support.
  • group vsm: Vector Space Model.

Children

  • group examples (nlp > examples)