#013038: Error parsing non-US-ASCII attachment file names

Description:

When parsing a mail with an attached file whose name is encoded according to RFC 2047, the file name is not properly extracted.

Here is an example header:


content-disposition: attachment;
 filename="=?iso-8859-1?Q?Val=E9rie_TEST_CV=2Epdf?="

The file name should be extracted as "Valérie_TEST_CV.pdf", but it is currently empty, and the temporary file on disk is named "=_iso-8859-1_Q_Val=E9rie_TEST_CV=2Epdf_=".
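
For reference, the encoded-word in that header can be decoded by hand. The following is a minimal sketch, assuming PHP 7.1 or later; decodeEncodedWord() is a hypothetical helper written for this illustration, not part of eZ Components. It handles a single encoded-word only and leaves charset conversion to iconv when that extension is present. Note that RFC 2047 Q encoding maps "_" to a space, so a strict decoder yields spaces where the raw text has underscores.

```php
<?php
// Sketch: decode one RFC 2047 encoded-word of the form =?charset?encoding?text?=
// decodeEncodedWord() is a hypothetical helper, not an eZ Components API.
function decodeEncodedWord(string $word): ?array
{
    if (!preg_match('/^=\?([^?]+)\?([bq])\?([^?]*)\?=$/i', $word, $m)) {
        return null; // not a well-formed encoded-word
    }
    if (strtolower($m[2]) === 'b') {
        $bytes = base64_decode($m[3]);
    } else {
        // In Q encoding "_" stands for a space and =XX is a hex-encoded byte.
        $bytes = quoted_printable_decode(str_replace('_', ' ', $m[3]));
    }
    return ['charset' => strtolower($m[1]), 'bytes' => $bytes];
}

$decoded = decodeEncodedWord('=?iso-8859-1?Q?Val=E9rie_TEST_CV=2Epdf?=');

// The bytes are still in the declared charset; convert them for display
// only if the iconv extension is available.
$name = function_exists('iconv')
    ? iconv($decoded['charset'], 'UTF-8', $decoded['bytes'])
    : $decoded['bytes'];
echo $name, "\n";
```

The helper returns the raw bytes together with the declared charset so that the caller decides how and when to convert them.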

Here are the modifications I made to fix this problem:
---
In rfc2231_implementation.php:
Line 49, regular expression changed (the first group was matching greedily):


'/\s*(\S*)="?([^;"]*);?/i'

to:

'/\s*(\S*?)="?([^;"]*);?/i'

Line 143:


$cd->fileName = $data['value'];

to:

$cd->fileName = ezcMailTools::mimeDecode($data['value']);

In file_parser.php, line 93:


$fileName = trim( $matches[1], '"' );

to:

$fileName = ezcMailTools::mimeDecode(trim( $matches[1], '"' ));

---
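
The effect of the regular expression change can be checked in isolation. This is a small sketch that runs both patterns quoted above against the parameter from the example header; the variable names are only for illustration.

```php
<?php
// Compare the original (greedy) and patched (lazy) patterns on the
// parameter taken from the example header.
$param  = 'filename="=?iso-8859-1?Q?Val=E9rie_TEST_CV=2Epdf?="';
$greedy = '/\s*(\S*)="?([^;"]*);?/i';
$lazy   = '/\s*(\S*?)="?([^;"]*);?/i';

preg_match($greedy, $param, $g);
preg_match($lazy, $param, $l);

// Greedy: \S* runs up to the last "=" in the string, so the value group is empty.
var_dump($g[1], $g[2] ?? '');

// Lazy: \S*? stops at the first "=", so name and value are split as intended.
var_dump($l[1], $l[2] ?? '');
```

With the greedy pattern the name group swallows most of the encoded-word, which matches the empty file name described in this report.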
But it still doesn't work with a header like this one:


Content-Disposition: attachment; filename="=?iso-8859-1?q?Lettre=20de=20motivation=20directeur=20de=20client=E8le.do?=
 c?="

Nor with this one (which seems to come from Apple):


content-disposition: attachment;
    filename*=ISO-8859-1''CV%20Robert%20Dom%E9nie.doc
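
This last header uses the RFC 2231 extended parameter syntax (charset'language'percent-encoded-text) rather than an RFC 2047 encoded-word. A minimal sketch of decoding such a value, assuming PHP 7.1 or later; decodeExtendedValue() is a hypothetical helper, not an eZ Components API:

```php
<?php
// Sketch: decode an RFC 2231 extended parameter value of the form
// charset'language'percent-encoded-text.
// decodeExtendedValue() is a hypothetical helper for illustration.
function decodeExtendedValue(string $value): ?array
{
    $parts = explode("'", $value, 3);
    if (count($parts) !== 3) {
        return null; // the two single-quote delimiters are missing
    }
    [$charset, $language, $encoded] = $parts;
    return [
        'charset'  => strtolower($charset),
        'language' => $language,              // empty in the header above
        'bytes'    => rawurldecode($encoded), // %XX becomes the raw byte
    ];
}

$result = decodeExtendedValue("ISO-8859-1''CV%20Robert%20Dom%E9nie.doc");
// $result['bytes'] is still ISO-8859-1 and needs converting before display.
```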


Environment:

Operating System:
PHP Version: 5.2.2


Steps to Reproduce:

Parse a mail with one of the headers above.


- Attachments

No attachments for this issue.


- Comments

Fixed in rev. 8265. Support was added for these kinds of attachment file names:


content-disposition: attachment;
    filename*=ISO-8859-1''CV%20Robert%20Dom%E9nie.doc

and


content-disposition: attachment;
 filename="=?iso-8859-1?Q?Val=E9rie_TEST_CV=2Epdf?="

This is not supported (broken):


Content-Disposition: attachment; filename="=?iso-8859-1?q?Lettre=20de=20motivation=20directeur=20de=20client=E8le.do?=
 c?="

or this one (see issue http://issues.ez.no/13098):


Content-Type: application/x-stuff;
                    title*1*=us-ascii'en'This%20is%20even%20more%20
                    title*2*=%2A%2A%2Afun%2A%2A%2A%20
                    title*3=\"isn't it!"
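
For readers unfamiliar with this form: RFC 2231 allows one parameter to be split into numbered sections (name*N, or name*N* when the section is percent-encoded), which the receiver has to sort and join. The sketch below shows one way this could be done, assuming PHP 7.1 or later and parameters that have already been split out of the header; joinContinuations() is a hypothetical helper, not the eZ Components implementation, and the last section is written as a plain quoted string without the stray backslash.

```php
<?php
// Sketch: reassemble RFC 2231 parameter continuations (name*N or name*N*).
// joinContinuations() is a hypothetical helper; $params maps raw parameter
// names to raw values.
function joinContinuations(array $params): array
{
    $sections = [];
    foreach ($params as $name => $value) {
        if (!preg_match('/^([^*]+)\*(\d+)(\*?)$/', $name, $m)) {
            continue; // not a continuation section
        }
        $sections[$m[1]][(int) $m[2]] = [$value, $m[3] === '*'];
    }

    $result = [];
    foreach ($sections as $name => $parts) {
        ksort($parts); // sections may arrive in any order
        $charset = null;
        $bytes = '';
        foreach ($parts as [$value, $isEncoded]) {
            if ($isEncoded) {
                // The first encoded section carries charset'language' as a prefix.
                if ($charset === null && substr_count($value, "'") >= 2) {
                    [$charset, , $value] = explode("'", $value, 3);
                }
                $value = rawurldecode($value);
            } else {
                $value = trim($value, '"'); // plain section, possibly quoted
            }
            $bytes .= $value;
        }
        $result[$name] = ['charset' => $charset, 'bytes' => $bytes];
    }
    return $result;
}

$joined = joinContinuations([
    'title*1*' => "us-ascii'en'This%20is%20even%20more%20",
    'title*2*' => '%2A%2A%2Afun%2A%2A%2A%20',
    'title*3'  => '"isn\'t it!"',
]);
echo $joined['title']['bytes'], "\n";
```

The sketch does no charset conversion and does not validate that the section numbers are contiguous.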

#256939 by Alexandru Stanoi on June 4th, 2008

We still have the problem:

Content-Disposition: attachment;
filename="=?ISO-8859-1?Q?Copie_de_im=E0ge=5Faccentu=E9.jpg?="

The file name should be extracted as "Copie de imàge_accentué.jpg".

But we get:

=_ISO-8859-1_Q_Copie_de_im=E0ge=5Faccentu=E9.jpg_=

eZ Components version: 2008.1

#257908 by feriani khaled on August 25th, 2008

It says in the previous comment (#256939) that these types of headers are not supported, because they are broken.

#257916 by Alexandru Stanoi on August 26th, 2008

The fix for this issue was added to SVN in rev. 8265. The fix was to add the displayFileName property to the ezcMailContentDispositionHeader class, which contains the MIME-decoded file name of an attachment. The tutorial was not updated to reflect this change, so it was easily missed.

Feriani, can you please check whether using $part->contentDisposition->displayFileName solves the problem?

As always there can be broken headers sent by broken client software. One example is in this bug report in comment #256939.

#257956 by Alexandru Stanoi on August 27th, 2008

Why is it broken? It seems OK to us.

Is this the correct header?

Content-Type: image/jpeg;
name="Copie de =?ISO-8859-1?Q?im=E0ge=5Faccentu=E9=2Ejpg?="

But some mail clients send invalid headers, and eZ Components doesn't return a value to tell us whether a header is broken, so what can we do?
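
One possible check on the calling side, offered only as a sketch: test whether a file name still contains raw encoded-word syntax after parsing. looksUndecoded() is a hypothetical helper, not an eZ Components API, and it only works on names in which the "?" characters have not been replaced (unlike the sanitized temporary file names shown in this report).

```php
<?php
// Sketch: heuristic check for a file name that still contains raw
// RFC 2047 encoded-word syntax. looksUndecoded() is a hypothetical helper.
function looksUndecoded(string $name): bool
{
    // A leftover "=?charset?B?" or "=?charset?Q?" suggests the name was not decoded.
    return preg_match('/=\?[^?\s]+\?[bq]\?/i', $name) === 1;
}

var_dump(looksUndecoded('Copie de =?ISO-8859-1?Q?im=E0ge=5Faccentu=E9=2Ejpg?='));
var_dump(looksUndecoded('report.pdf'));
```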

#257944 by feriani khaled on August 27th, 2008

We ran two tests, one with Gmail and the other with Thunderbird.

The name of the attachment is: Copie de imàge_accentué.jpg

Extract of the original headers for Gmail:

Content-Type: image/jpeg;
name="=?ISO-8859-1?Q?Copie_de_im=E0ge=5Faccentu=E9.jpg?="
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fke3wnng0
Content-Disposition: attachment;
filename="=?ISO-8859-1?Q?Copie_de_im=E0ge=5Faccentu=E9.jpg?="

Extract of the original headers for Thunderbird:

Content-Type: image/jpeg;
name="Copie de =?ISO-8859-1?Q?im=E0ge=5Faccentu=E9=2Ejpg?="
Content-Transfer-Encoding: base64
Content-Disposition: inline;
filename*0*=ISO-8859-1''%43%6F%70%69%65%20%64%65%20%69%6D%E0%67%65%5F%61;
filename*1*=%63%63%65%6E%74%75%E9%2E%6A%70%67


echo $part->fileName;
echo $part->contentDisposition->displayFileName;

Results with a mail from Gmail:

- .../tmp/11455-5/=_ISO-8859-1_Q_Copie_de_im=E0ge=5Faccentu=E9.jpg_=
- Copie de imàge_accentué.jpg

Results with a mail from Thunderbird:
- .../var/tmp/11455-4/Copie de =_ISO-8859-1_Q_im=E0ge=5Faccentu=E9=2Ejpg_=
- Copie de imàge_accentué.jpg

Using $part->contentDisposition->displayFileName did the trick. Thanks

#257976 by feriani khaled on August 28th, 2008

- History
Properties
Type: Bug
Priority: Medium
Component: Components » Mail
Affects: 2007.2.1 - eZ components 2007.2.1; 2007.2beta1 - eZ components 2007.2beta1
Fix Version: 2008.1RC1 - eZ components 2008.1RC1
Reporter: Nicolas Huguet
Responsible: Alexandru Stanoi
Status: Closed
Resolution: Fixed
Created: May 16th, 2008
Updated: August 28th, 2008
Resolved: June 4th, 2008