PragmaticSegmenterNet

This project is a direct port of Pragmatic Segmenter, which provides rule-based sentence boundary detection.

Usage

The Segmenter class provides the Segment method, which in its simplest usage takes a string:

using System.Collections.Generic;
using PragmaticSegmenterNet;

// Segment with the default language (English).
IReadOnlyList<string> result = Segmenter.Segment("One Sentence. And another sentence.");
// ["One Sentence.", "And another sentence."]

// Segment with an explicit language.
IReadOnlyList<string> result2 = Segmenter.Segment("Anything.", Language.Italian);
// ["Anything."]

The Segment method has a number of optional parameters:

IReadOnlyList<string> Segment(string text, Language language = Language.English, bool cleanText = true, DocumentType documentType = DocumentType.Any)

Language - An enum value selecting one of the supported languages. Defaults to English. See the Languages section below for the currently supported languages.
CleanText - A boolean indicating whether the input text should be cleaned prior to segmentation. Cleaning removes extra newlines and whitespace. Defaults to true.
DocumentType - Used by the text cleaning process to determine which reformatting to apply. For PDFs, this handles newlines in the middle of a sentence; for HTML documents, it handles HTML tags. Defaults to Any, which does not apply any special formatting.
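
The optional parameters described above can be passed as named arguments, so that only the ones you care about need spelling out. The following is a minimal sketch, assuming the PragmaticSegmenterNet package is referenced by the project. The Example class and the SegmentWithoutCleaning helper are hypothetical names used for illustration only, and only DocumentType.Any is used because it is the one document type value named in this README.

```csharp
using System;
using System.Collections.Generic;
using PragmaticSegmenterNet;

// Hypothetical wrapper class, not part of the library.
public static class Example
{
    // Segments English text with cleaning switched off,
    // so the input is passed to the segmenter as-is.
    public static IReadOnlyList<string> SegmentWithoutCleaning(string text)
    {
        return Segmenter.Segment(
            text,
            language: Language.English,
            cleanText: false,
            documentType: DocumentType.Any);
    }

    public static void Main()
    {
        IReadOnlyList<string> sentences =
            SegmentWithoutCleaning("One Sentence. And another sentence.");

        // Print each detected sentence on its own line.
        foreach (string sentence in sentences)
        {
            Console.WriteLine(sentence);
        }
    }
}
```

Disabling cleaning may be useful when the original whitespace should be preserved; leaving cleanText at its default of true is likely the better choice for text extracted from PDFs or web pages.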

Languages

English = 0 (default)
Amharic = 1
Arabic = 2
Armenian = 3
Bulgarian = 4
Burmese = 5
Chinese = 6
Danish = 7
Dutch = 8
French = 9
German = 10
Greek = 11
Hindi = 12
Italian = 13
Japanese = 14
Kazakh = 15 (partial support, potentially only for the Cyrillic form of the alphabet)
Persian = 16
Polish = 17
Russian = 18
Spanish = 19
Urdu = 20

Credit

This project wouldn't be possible without the work done by the Pragmatic Segmenter team.
