Release v1.4.0: Improved language data and alpha Dutch support · explosion/spaCy · GitHub
v1.4.0: Improved language data and alpha Dutch support
✨ Major features and improvements
- NEW: Alpha support for Dutch tokenization.
- Reorganise and improve format of language data.
- Add shared tag map, entity rules, emoticons and punctuation to language data.
- Convert entity rules, morphological rules and lemmatization rules from JSON to Python.
- Update language data for English, German, Spanish, French, Italian and Portuguese.
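As a rough illustration of what moving rules "from JSON to Python" can look like, the sketch below renders a small JSON rule set as Python source and loads it back. It is not spaCy's actual conversion code: the sample rules, the variable name `LEMMA_RULES` and the helper `json_rules_to_python_source` are all assumptions made for this example.

```python
import json
import pprint

# Made-up sample of suffix-rewrite rules, as they might be stored in a JSON file.
RULES_JSON = '{"noun": [["s", ""], ["ses", "s"]], "verb": [["ing", ""], ["ed", "e"]]}'


def json_rules_to_python_source(json_text, variable_name="LEMMA_RULES"):
    """Render JSON rule data as a Python assignment statement (hypothetical helper)."""
    rules = json.loads(json_text)
    # pprint.pformat produces a valid Python literal for dicts, lists and strings.
    return "{} = {}\n".format(variable_name, pprint.pformat(rules))


SOURCE = json_rules_to_python_source(RULES_JSON)

# Evaluate the generated source to confirm it round-trips to the same data.
namespace = {}
exec(SOURCE, namespace)
LEMMA_RULES = namespace["LEMMA_RULES"]
```

Keeping such data in Python modules means it can be imported, shared between languages and combined with ordinary code, which fits the shared language data mentioned in the list above.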
🔴 Bug fixes
- Fix issue #649: Update and reorganise stop lists.
- Fix issue #672: Make `token.ent_iob_` return unicode.
- Fix issue #674: Add missing lemmas for contracted forms of "be" to `TOKENIZER_EXCEPTIONS`.
- Fix issue #683: `Morphology` class now supplies tag map value for the special space tag if it's missing.
- Fix issue #684: Ensure `spacy.en.English()` loads the Glove vector data if available. Previously was inconsistent with behaviour of `spacy.load('en')`.
- Fix issue #685: Expand `TOKENIZER_EXCEPTIONS` with unicode apostrophe (`’`).
- Fix issue #689: Correct typo in `STOP_WORDS`.
- Fix issue #691: Add tokenizer exceptions for "gonna" and "Gonna".
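The tokenizer-exception fixes above (#674, #685, #691) can be illustrated with a minimal, self-contained sketch. It does not import spaCy. The table layout, the plain-string `ORTH` and `LEMMA` keys, the sample entries and the helper `add_apostrophe_variants` are assumptions made for illustration, not spaCy's actual language data or API.

```python
# Hypothetical stand-ins for attribute keys; plain strings in this sketch.
ORTH = "orth"
LEMMA = "lemma"

# Illustrative exception table: each surface string maps to its sub-tokens.
TOKENIZER_EXCEPTIONS = {
    "gonna": [{ORTH: "gon", LEMMA: "go"}, {ORTH: "na", LEMMA: "to"}],
    "Gonna": [{ORTH: "Gon", LEMMA: "go"}, {ORTH: "na", LEMMA: "to"}],
    "I'm": [{ORTH: "I"}, {ORTH: "'m", LEMMA: "be"}],
    "you're": [{ORTH: "you"}, {ORTH: "'re", LEMMA: "be"}],
}


def add_apostrophe_variants(exceptions, replacement="\u2019"):
    """Return a copy that also has entries spelled with a unicode apostrophe."""
    expanded = dict(exceptions)
    for key, tokens in exceptions.items():
        if "'" not in key:
            continue  # nothing to expand for entries without an apostrophe
        new_key = key.replace("'", replacement)
        new_tokens = [
            {
                # Only the surface form changes; lemmas stay as they are.
                attr: (value.replace("'", replacement) if attr == ORTH else value)
                for attr, value in token.items()
            }
            for token in tokens
        ]
        # Do not overwrite an entry that was already defined by hand.
        expanded.setdefault(new_key, new_tokens)
    return expanded


EXPANDED = add_apostrophe_variants(TOKENIZER_EXCEPTIONS)
```

Generating the variants from the ASCII entries, rather than writing them by hand, keeps the two spellings from drifting apart when an entry is later corrected.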
⚠️ Backwards incompatibilities
No changes to the public, documented API, but the previously undocumented language data and model initialisation processes have been refactored and reorganised. If you were relying on the `bin/init_model.py` script, see the new spaCy Developer Resources repo. Code that references internals of the `spacy.en` or `spacy.de` packages should also be reviewed before updating to this version.
📖 Documentation and examples
- NEW: "Adding languages" workflow.
- NEW: "Part-of-speech tagging" workflow.
- NEW: spaCy Developer Resources repo - scripts, tools and resources for developing spaCy.
- Fix various typos and inconsistencies.
👥 Contributors
Thanks to @dafnevk, @jvdzwaan, @RvanNieuwpoort, @wrvhage, @jaspb, @savvopoulos and @davedwards for the pull requests!