UnicodeInDjango – Django
Unicode and Django
This page is an impact analysis of the Django source: which parts need which changes if we want to switch Django from using utf-8 bytestrings internally to using unicode strings throughout.
A first list of things that spring to mind; all of them need more thorough checking:
- database backends need to handle unicode vs. DATABASE_CHARSET translations
- special casing: the psycopg backend will need type handlers for string types (just as it already has type handlers for date/time types)
- the HTTPResponse sending machinery needs to do the unicode to DEFAULT_CHARSET translation
- the HTTPRequest creation process needs to turn outside strings into unicode strings, using the provided charset (if given) or defaulting to DEFAULT_CHARSET (as that is what was sent to the browser when the form was transmitted)
- There should be a way to access the original "raw" (as bytes) GET and POST data. Django already provides raw POST data using the raw_post_data attribute. Perhaps raw_get_data should also be added.
- Special casing: what happens with GET parameters? They don't provide charsets, so what should we do if DEFAULT_CHARSET is utf-8 but the GET parameters aren't valid utf-8? The clean way would be to throw an exception (as in all other places, too)
- The current URI spec (RFC 3986) clearly states that all URIs must be encoded according to UTF-8 so we can assume that this is the case. If this causes a UnicodeDecodeError it makes sense to fall back on windows-1252 or latin-1. Has anyone taken a look at Mark Pilgrim's Universal Encoding Detector? - Noah Slater
- template loaders need to do DEFAULT_CHARSET to unicode translation
- internal usage of str() needs to be checked and presumably changed over to unicode() usage
- debugging stuff needs to use repr() on strings, not str() (or use unicode() and let the HTTP response handling stuff handle the conversion - most debugging stuff is working with the response machinery anyway)
- mail sending functions need to do the right thing with the MIME type
- we should decide whether to normalize the input unicode data so that at the database or application level we can match strings regardless of their decomposition (see the standard lib’s unicodedata module with its normalize() function). I would go for NFC, if there’s consensus around normalizing.
- Lazily evaluated method calls do not currently work with Unicode return values, see #1664. I have provided a potential workaround. - Noah Slater
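The normalization point in the list above can be illustrated with a short sketch. It is written in Python 3 terms, where `str` plays the role of the `unicode` type discussed on this page; `normalize_input` is a hypothetical helper name, not an existing Django function.

```python
import unicodedata


def normalize_input(text, form="NFC"):
    """Return text in the given Unicode normalization form (NFC by default).

    Hypothetical helper for illustration only; not part of Django.
    """
    return unicodedata.normalize(form, text)


composed = "\u00e9"      # 'e with acute' as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# The two spellings render identically but compare unequal as raw strings.
assert composed != decomposed

# After NFC normalization both become the same single-code-point form,
# so string matching no longer depends on the decomposition used.
assert normalize_input(composed) == normalize_input(decomposed)
```

Normalizing once at the input boundary would let database and application code compare strings directly, at the cost of not preserving the exact code points the client sent.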
Please either complete the above list or add headlines with more detailed discussions of the points above. Please post only results here; discussion should take place on the django-developer list.
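As one possible reading of the GET-parameter items in the list, the following sketch decodes percent-encoded query data as UTF-8 first and falls back to windows-1252 and then latin-1, as suggested above. It uses Python 3 (`bytes` and `str` in place of bytestrings and `unicode`); `decode_param` and its parameters are assumptions made for illustration, not an existing Django API.

```python
from urllib.parse import unquote_to_bytes


def decode_param(raw, charset="utf-8", fallbacks=("windows-1252", "latin-1")):
    """Decode raw bytes, trying charset first and then each fallback in order.

    Hypothetical helper for illustration only. Pass fallbacks=() to get the
    strict behaviour, where invalid input raises UnicodeDecodeError.
    """
    last_error = None
    for encoding in (charset, *fallbacks):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as exc:
            last_error = exc
    # Every candidate encoding failed: surface the last decoding error.
    raise last_error


# Percent-decoding yields the raw bytes, comparable to "raw" GET data.
utf8_bytes = unquote_to_bytes("caf%C3%A9")   # valid UTF-8
legacy_bytes = unquote_to_bytes("caf%E9")    # not valid UTF-8

assert decode_param(utf8_bytes) == "caf\u00e9"
assert decode_param(legacy_bytes) == "caf\u00e9"  # via the windows-1252 fallback
```

Because latin-1 maps every byte to a code point, a fallback chain ending in it never raises; whether that is preferable to the strict exception is exactly the open question in the list.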
Last modified on Jun 7, 2012, 5:29:52 AM
© 2005-2025 Django Software Foundation unless otherwise noted. Django is a registered trademark of the Django Software Foundation.