#35316 closed defect (bug) (fixed)
Images with latin extended characters in exif (slovak/czech) are missing thumbnails
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 4.4.2 | Priority: | normal |
| Severity: | normal | Version: | 4.4 |
| Component: | Media | Keywords: | has-patch |
| Focuses: | | Cc: | |
Description
After uploading an image with Latin Extended characters in its EXIF data (Slovak/Czech), there is no thumbnail in the media library, and the image sizes (thumb, medium, large and all custom sizes) are missing in WP, so WordPress automatically inserts the full-size image in themes where the thumbnail size should be. The image size files are created but are NOT registered in WordPress.
Attachments (3)
- 35316.patch (504 bytes) - added by ocean90 10 years ago.
- Image_keywords.png (134.8 KB) - added by pavelevap 10 years ago.
- gabcikovo.jpg (3.1 MB) - added by dd32 10 years ago.
- Image from https://core.trac.wordpress.org/ticket/35316#comment:6 for archival purposes
Change History (24)
#1
@swissspidy
10 years ago
- Milestone Awaiting Review deleted
- Resolution set to duplicate
- Status changed from new to closed
- Version 4.4 deleted
#2
@michalrusina
10 years ago
- Keywords reporter-feedback added
- Resolution duplicate deleted
- Status changed from closed to reopened
- Version set to 4.4
I don't think this is a duplicate of #15955. In this case there is nothing wrong with the filename (gabcikovo.jpg has no extended characters); the whole problem lies in the EXIF data. As I mentioned earlier, the files are generated successfully, but WordPress assumes that they are not.
#4
@swissspidy
10 years ago
- Keywords reporter-feedback removed
#5
@pavelevap
10 years ago
@michalrusina: Interesting, could you please upload or link any image example for testing?
#6
@michalrusina
10 years ago
@pavelevap image for testing: https://a-static.projektn.sk/2016/01/gabcikovo.jpg
#7
@pavelevap
10 years ago
Confirmed. It works well in 4.2.4 and 4.3.1, but 4.4 is broken.
#8
@pavelevap
10 years ago
_wp_attached_file is created, but _wp_attachment_metadata is missing.
This ticket was mentioned in Slack in #core by pavelevap.
10 years ago
#10
@michalrusina
10 years ago
- Severity changed from normal to major
@ocean90
10 years ago
- Attachment 35316.patch added
#11
@ocean90
10 years ago
- Keywords has-patch added
- Milestone changed from Awaiting Review to 4.4.2
- Severity changed from major to normal
Since #33772 attachment metadata includes IPTC keywords. But the keywords are not UTF8 encoded like titles or captions, see tags/4.4/src/wp-admin/includes/image.php?marks=405-409#L404.
Current output:
array(5) {
  ["width"]=>
  int(4016)
  ["height"]=>
  int(2673)
  ["file"]=>
  string(23) "2016/01/gabcikovo-3.jpg"
  ["sizes"]=>
  array(5) {
    ["thumbnail"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-150x150.jpg"
      ["width"]=>
      int(150)
      ["height"]=>
      int(150)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["medium"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-300x200.jpg"
      ["width"]=>
      int(300)
      ["height"]=>
      int(200)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["medium_large"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-768x511.jpg"
      ["width"]=>
      int(768)
      ["height"]=>
      int(511)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["large"]=>
    array(4) {
      ["file"]=>
      string(24) "gabcikovo-3-1024x682.jpg"
      ["width"]=>
      int(1024)
      ["height"]=>
      int(682)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["post-thumbnail"]=>
    array(4) {
      ["file"]=>
      string(24) "gabcikovo-3-1200x799.jpg"
      ["width"]=>
      int(1200)
      ["height"]=>
      int(799)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
  }
  ["image_meta"]=>
  array(12) {
    ["aperture"]=>
    float(4)
    ["credit"]=>
    string(4) "TASR"
    ["camera"]=>
    string(8) "NIKON D4"
    ["caption"]=>
    string(126) "Na snímke turbína na výrobu elektrickej energie vo Vodnej elektrárni Gabèíkovo 9. marca 2015. FOTO TASR - Martin Baumann"
    ["created_timestamp"]=>
    int(1425908436)
    ["copyright"]=>
    string(22) "Tlaèová agentúra SR"
    ["focal_length"]=>
    string(2) "14"
    ["iso"]=>
    string(4) "2500"
    ["shutter_speed"]=>
    string(17) "0.066666666666667"
    ["title"]=>
    string(46) "Vodná turbína na výrobu elektrickej energie"
    ["orientation"]=>
    int(1)
    ["keywords"]=>
    array(2) {
      [0]=>
      string(58) "Slovensko vl�da energetika Vodn� elektr�re� Gab��kovo prem"
      [1]=>
      string(17) "Fico n�v�teva TTX"
    }
  }
}
35316.patch encodes the keywords.
Until this gets fixed in core you can use the following function:
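<?php
// Temporary workaround for #35316: re-encode IPTC keywords that are not valid UTF-8.
function trac35316_fix_iptc_keywords_encoding( $meta ) {
	// Images without IPTC keywords need no fixing.
	if ( empty( $meta['keywords'] ) || ! is_array( $meta['keywords'] ) ) {
		return $meta;
	}
	foreach ( $meta['keywords'] as $key => $keyword ) {
		if ( ! seems_utf8( $keyword ) ) {
			$meta['keywords'][ $key ] = utf8_encode( $keyword );
		}
	}
	return $meta;
}
add_filter( 'wp_read_image_metadata', 'trac35316_fix_iptc_keywords_encoding' );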
#12
@pavelevap
10 years ago
@ocean90: Great, patch works well for adding available sizes.
How could wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...
There are still some encoding issues:
- 3 original keywords from image
Slovensko vláda energetika Vodná elektráreň Gabčíkovo premiér Fico návšteva TTX Slovensko vláda energetika Vodná elektráreň Gabčíkovo prem
- Only 2 results from _wp_attachment_metadata, with some wrong encoding
[keywords] => Array
(
    [0] => Slovensko vláda energetika Vodná elektráreò Gabèíkovo prem
    [1] => Fico návteva TTX
)
#13
@ocean90
10 years ago
Replying to pavelevap:
How could wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...
Because of the broken chars the data gets blocked by some sanity checks in wpdb.
3 original keywords from image
I have only 2:
("Slovensko vl\U00e1da energetika Vodn\U00e1 elektr\U00e1re\U0148 Gab\U010d\U00edkovo premi\U00e9r ","Fico n\U00e1v\U0161teva TTX")
with some wrong encoding
It uses utf8_encode() which is also used for titles and captions. Are those wrong too? If yes that should probably be handled in a separate ticket.
I also noticed that the long keyword is truncated, but it is already truncated when the data comes from iptcparse().
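To illustrate the wpdb point above, here is a minimal sketch. It assumes the sanity check works by stripping bytes that are not valid UTF-8 and refusing the write whenever that stripping changes the value. The names `sketch_strip_invalid_utf8()` and `sketch_process_field()` are hypothetical stand-ins, not WordPress functions, and the byte pattern is deliberately simplified.

```php
<?php
// Hypothetical stand-in, NOT the real wpdb code: keep only byte sequences
// that look like well-formed UTF-8 characters and drop everything else.
// Simplified: it does not reject overlong forms or surrogate code points.
function sketch_strip_invalid_utf8( $value ) {
	$pattern = '/[\x00-\x7F]'              // 1-byte (ASCII)
		. '|[\xC2-\xDF][\x80-\xBF]'        // 2-byte sequence
		. '|[\xE0-\xEF][\x80-\xBF]{2}'     // 3-byte sequence
		. '|[\xF0-\xF4][\x80-\xBF]{3}/';   // 4-byte sequence
	preg_match_all( $pattern, $value, $matches );
	return implode( '', $matches[0] );
}

// Hypothetical stand-in: if stripping changed the value, refuse the write
// (return false) instead of silently storing altered data.
function sketch_process_field( $value ) {
	$stripped = sketch_strip_invalid_utf8( $value );
	return ( $stripped === $value ) ? $value : false;
}

// "vl\xE1da" is "vláda" with a single-byte accented character, as in the keywords.
var_dump( sketch_process_field( "vl\xE1da" ) );
// The same word as valid UTF-8 passes through unchanged.
var_dump( sketch_process_field( "vl\xC3\xA1da" ) );
```

Under these assumptions one invalid keyword is enough to reject the whole metadata value, which would match the symptom: the resized files exist on disk, but _wp_attachment_metadata is never stored.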
#14
@pavelevap
10 years ago
@ocean90: You are right, chars are probably stripped inside strip_invalid_text(): https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2788
And that is why process_fields() returns false here: https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2085
Function returns only 2 keywords, but Windows shows 3 (see attached screenshot). I am not sure what is wrong.
Encoding: Yes, also caption (later saved as post_excerpt) is wrong: Gabèíkovo should be Gabčíkovo. Also copyright is wrong, title probably does not contain problematic chars.
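A small sketch of why "Gabèíkovo" can appear in place of "Gabčíkovo". utf8_encode() treats its input as ISO-8859-1, so the sketch uses iconv() with an explicit source charset to show the same conversion. That the image stores its text in a Central European single-byte encoding is an assumption; ISO-8859-2 is used here for illustration (Windows-1250 maps byte 0xE8 to the same letter). `sketch_decode()` is a hypothetical helper, not part of WordPress.

```php
<?php
// Hypothetical helper: decode raw single-byte text to UTF-8 using a given
// source charset. ISO-8859-1 mirrors what utf8_encode() assumes.
function sketch_decode( $raw, $source_charset ) {
	return iconv( $source_charset, 'UTF-8', $raw );
}

// Assumed raw bytes for the place name: 0xE8 and 0xED are single-byte letters.
$raw = "Gab\xE8\xEDkovo";

// Read as ISO-8859-1, byte 0xE8 is "è".
echo sketch_decode( $raw, 'ISO-8859-1' ), "\n";

// Read as ISO-8859-2 (assumption), byte 0xE8 is "č".
echo sketch_decode( $raw, 'ISO-8859-2' ), "\n";
```

If that assumption holds, re-encoding with utf8_encode() produces valid UTF-8 that the database accepts, but some letters come out wrong because the source charset was guessed as ISO-8859-1.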
@pavelevap
10 years ago
- Attachment Image_keywords.png added
#15
@dd32
10 years ago
A test image full of non-utf8 data that we can test with and also throw into some unit tests would be beneficial here.
#16
@michalrusina
10 years ago
@dd32 this image triggers this bug (was referenced earlier) https://a-static.projektn.sk/2016/01/gabcikovo.jpg
@dd32
10 years ago
- Attachment gabcikovo.jpg added
Image from https://core.trac.wordpress.org/ticket/35316#comment:6 for archival purposes
#17
@dd32
10 years ago
Thanks @michalrusina I read over that and missed it :(
I've uploaded it here for archival purposes in case the origin ever switches it out.
#18
@ocean90
10 years ago
- Owner set to ocean90
- Resolution set to fixed
- Status changed from reopened to closed
#20
@DrewAPicture
10 years ago
#21
@johnbillion
10 years ago
#35325 was marked as a duplicate.
Hi there, thanks for the report.
We're tracking this issue in #15955, see also comment:11:ticket:15955.
Related/duplicates: #18634, #19842, #21217, #22363, #28808, #23588, #32887.