stanfordcorenlp is a Python wrapper for Stanford CoreNLP. It provides a simple API for text processing tasks such as Tokenization, Part of Speech Tagging, Named Entity Recognition, Constituency Parsing, Dependency Parsing, and more.
Prerequisites
Java 1.8+ (Check with command: java -version) (Download Page)
```python
# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close()  # Do not forget to close! The backend server will consume a lot of memory.
```
Note: you must download an additional model file and place it in the .../stanford-corenlp-full-2018-02-27 folder. For example, you should download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.
```python
# _*_coding:utf-8_*_

# Support for other human languages, e.g. Chinese
sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))
```
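The same pattern applies to other languages by changing the `lang` argument, provided the matching models jar is in the CoreNLP folder. Below is a minimal sketch for German; the `lang='de'` value, path, and sentence are illustrative, not taken from the original examples:

```python
from stanfordcorenlp import StanfordCoreNLP

# Hypothetical German example: assumes lang='de' is accepted by the wrapper
# and that the German models jar sits in the same CoreNLP folder.
with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='de') as nlp:
    print(nlp.word_tokenize('Das Haus ist klein.'))
    print(nlp.pos_tag('Das Haus ist klein.'))
```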
General Stanford CoreNLP API
This loads all the models, which requires more memory, so initialize the server with more memory; 8 GB is recommended.
```python
# General json output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print(nlp.annotate(sentence))
nlp.close()
```
- `pipelineLanguage`: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)
- `outputFormat`: json, xml, text
```python
text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}
print(nlp.annotate(text, properties=props))
nlp.close()
```
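When `outputFormat` is `json`, the returned string can be decoded with the standard library. A minimal sketch follows; the sample text and annotator set are illustrative, and the `sentences`/`tokens` keys follow the CoreNLP server's JSON layout:

```python
import json
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')

# Request JSON explicitly so the response can be parsed with the json module.
props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'json'}
doc = json.loads(nlp.annotate('GDUFS is located in Guangzhou.', properties=props))

# Each sentence carries its tokens with word and part-of-speech fields.
for sent in doc['sentences']:
    print([(token['word'], token['pos']) for token in sent['tokens']])

nlp.close()
```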
```python
# Use an existing server
nlp = StanfordCoreNLP('https://localhost', port=9000)
```
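In this mode the wrapper sends requests to the given address instead of launching its own Java process, and the same calls work as before. A small sketch; the address, port, and the already-running server are assumptions:

```python
from stanfordcorenlp import StanfordCoreNLP

# Assumes a CoreNLP server is already listening at this address and port;
# the wrapper does not start a local Java process in this mode.
nlp = StanfordCoreNLP('https://localhost', port=9000)
print(nlp.pos_tag('Guangdong University of Foreign Studies is located in Guangzhou.'))
```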
Debug
```python
import logging
from stanfordcorenlp import StanfordCoreNLP

# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)

# Check more info from the CoreNLP Server
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()
```
Build
We use setuptools to package our project. You can build from the latest source code with the following command: