# Nit wrapper for Stanford CoreNLP

Stanford CoreNLP provides a set of natural language analysis tools.
Given raw text, they can give the base forms of words and their parts of speech,
recognize names of companies, people, etc., normalize dates, times, and numeric
quantities, mark up the structure of sentences in terms of phrases and word
dependencies, indicate which noun phrases refer to the same entities, and
indicate sentiment.

This wrapper needs the Stanford CoreNLP jars that run on Java 1.8+.

See http://nlp.stanford.edu/software/corenlp.shtml.

## Usage

~~~nitish
# Path to the Stanford CoreNLP jars
var proc = new NLPProcessor("path/to/StanfordCoreNLP/jars")

# Analyze a string and get the resulting document
var doc = proc.process("String to analyze")

# Print the lemma and part-of-speech tag of each token
for sentence in doc.sentences do
	for token in sentence.tokens do
		print "{token.lemma}: {token.pos}"
	end
end
~~~

## Nit API

For ease of use, this wrapper introduces a Nit model to handle CoreNLP XML results.

### NLPDocument

[[doc: NLPDocument]]

[[doc: nlp::NLPDocument::from_xml]]
[[doc: nlp::NLPDocument::from_xml_file]]
[[doc: nlp::NLPDocument::sentences]]

### NLPSentence

[[doc: NLPSentence]]

[[doc: nlp::NLPSentence::tokens]]

### NLPToken

[[doc: NLPToken]]

[[doc: nlp::NLPToken::word]]
[[doc: nlp::NLPToken::lemma]]
[[doc: nlp::NLPToken::pos]]

### NLPProcessor

[[doc: NLPProcessor]]

[[doc: nlp::NLPProcessor::java_cp]]

[[doc: nlp::NLPProcessor::process]]
[[doc: nlp::NLPProcessor::process_file]]
[[doc: nlp::NLPProcessor::process_files]]

## Vector Space Model

[[doc: NLPVector]]

[[doc: vector]]

[[doc: nlp::NLPVector::cosine_similarity]]
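
For reference, the usual textbook definition of the cosine similarity between two term vectors `a` and `b` is given below. This is the general formula, not a description of how `NLPVector` is implemented; `a_t` and `b_t` stand for the weight of term `t` in each vector, with absent terms counted as 0.

~~~latex
\cos(\mathbf{a}, \mathbf{b}) =
  \frac{\sum_t a_t \, b_t}{\sqrt{\sum_t a_t^2} \, \sqrt{\sum_t b_t^2}}
~~~

With non-negative weights such as term counts, the value ranges from 0 (no term in common) to 1 (vectors pointing in the same direction).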

## NitNLP binary

The `nitnlp` binary is provided as an example of a NitNLP client.
It compares two strings and displays their cosine similarity value.

Usage:

~~~raw
nitnlp --cp "/path/to/jars" "sort" "Sorting array data"
0.577
~~~
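
The value `0.577` can be reproduced by hand under two assumptions that this README does not state explicitly: the compared vectors hold lemma counts, and CoreNLP lemmatizes both `sort` and `Sorting` to `sort`. Under these assumptions the first string yields the vector (sort: 1) and the second yields (sort: 1, array: 1, data: 1), so the cosine similarity works out as:

~~~latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{1 \cdot 1}{\sqrt{1^2} \, \sqrt{1^2 + 1^2 + 1^2}}
  = \frac{1}{\sqrt{3}}
  \approx 0.577
~~~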

## TODO

* Use JWrapper
* Use options to choose CoreNLP analyzers
* Analyze sentence dependencies
* Analyze sentiment