Display an HTML email

From REALbasicWiki

Warning: The following solution has some major limitations and should be considered a draft for you to improve in your own REALbasic project. If you really want to dive into email parsing, have a look at the open source project Zymail.

As HTML-enriched emails are more and more common, you may be interested in displaying such emails (with inline pictures) in your REALbasic project.

For the moment, I have only tried an email sent and received by Mac OS X Mail and saved to disk using the File ▶ Save As... command. Such a file has a .eml extension.


The basics

HTML emails with images may either link to an image stored on a server (bad) or embed the image files inside the body of the email (good). Embedded images are called inline images. They are stored as encoded attachments inside the email, each with its own Content ID (CID). To display such an image, the HTML code of the email refers to it with the special notation cid:content_ID_of_the_attachment. The problem is that the HTMLViewer control in REALbasic was not designed to resolve these references.
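To make the relationship between the cid: reference and the attachment header concrete, here is a minimal sketch in Python (not REALbasic) that builds such an email with the standard email package. The content ID, the domain example.invalid and the image bytes are made up for illustration.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Hypothetical content ID and placeholder image bytes, for illustration only.
cid = make_msgid(domain="example.invalid")  # includes the angle brackets
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"

msg = EmailMessage()
msg["Subject"] = "Inline image demo"
msg.set_content("Plain-text fallback")

# The HTML refers to the image by its content ID, without the angle brackets.
msg.add_alternative(
    f'<html><body><img src="cid:{cid[1:-1]}"></body></html>', subtype="html"
)

# Attach the image to the HTML part as a related, inline resource.
html_part = msg.get_payload()[1]
html_part.add_related(
    fake_jpeg, "image", "jpeg",
    cid=cid, filename="bg_pattern.jpg", disposition="inline",
)

image_part = next(p for p in msg.walk() if p.get_content_type() == "image/jpeg")
print(image_part["Content-ID"])                 # same ID, with angle brackets
print(image_part["Content-Transfer-Encoding"])  # how the bytes are encoded
```

The HTML carries the ID without angle brackets, while the attachment's Content-ID header carries it with them. That is the pair of values the steps below have to match up.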

Step 1: decoding and writing attachments to disk

HTMLViewer needs real image files and, if they are not on a server, they must be in a folder on disk. Attachments are usually encoded (in Base64, for example) to avoid exotic characters, but we will not have to handle this ourselves: SaveToFile decodes them for us.

Sub DisplayHTMLemail( eMail as EmailMessage, intoViewer as HTMLViewer )
  Dim attachments() as EmailAttachment
  Dim TmpMail as FolderItem
  Dim f as FolderItem

  //==== PART 1 ====

  //First make a temporary folder to store the files
  TmpMail = TemporaryFolder.Child( "Tempmail" + Str( Ticks ) )
  TmpMail.CreateAsFolder

  //Then save the attachments to disk.
  attachments() = eMail.Attachments //Get all the attachments of this email

  For each ema as EmailAttachment in attachments() //"ema" stands for "EMail Attachment"
    f = TmpMail.Child( ema.name )
    call ema.SaveToFile( f ) //Quick'n'dirty, we do not even check if it has succeeded :-(
    //NOTE: SaveToFile will automatically decode the attachment. Smart !
  Next

  //For other parts, see below
End Sub
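For readers who want to see what the decoding step amounts to, here is a rough Python equivalent (not REALbasic) using the standard email package. The RAW message is a hand-made stand-in for a .eml file, and save_attachments is a hypothetical helper name; unlike the quick'n'dirty version above, it also drops any directory component from the attachment name.

```python
import tempfile
from email import message_from_string
from email.policy import default
from pathlib import Path

# Hand-made stand-in for the contents of a .eml file (assumption).
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="BOUND"\r\n'
    "\r\n"
    "--BOUND\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<img src="cid:abc123/bg_pattern.jpg">\r\n'
    "--BOUND\r\n"
    "Content-Disposition: inline;\r\n"
    "\tfilename=bg_pattern.jpg\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-Type: image/jpeg;\r\n"
    '\tname="bg_pattern.jpg"\r\n'
    "Content-Id: <abc123/bg_pattern.jpg>\r\n"
    "\r\n"
    "aGVsbG8=\r\n"
    "--BOUND--\r\n"
)

def save_attachments(raw, folder):
    """Write every named MIME part to folder, decoded, and return the paths."""
    saved = []
    for part in message_from_string(raw, policy=default).walk():
        name = part.get_filename()
        if name is None:
            continue  # not an attachment (container or body part)
        target = folder / Path(name).name  # keep the file name only
        target.write_bytes(part.get_payload(decode=True))  # undoes Base64
        saved.append(target)
    return saved

folder = Path(tempfile.mkdtemp(prefix="Tempmail"))
saved = save_attachments(RAW, folder)
print([p.name for p in saved])
```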

Step 2: getting the missing information

Unfortunately, the EmailAttachment class does not give us access to the content ID that we need, so we will have to find it ourselves.

With Apple's Mail, each attachment header looks like this:

--Apple-Mail-23--848320398
Content-Disposition: inline;
	filename=bg_pattern.jpg
Content-Transfer-Encoding: base64
Content-Type: image/jpeg;
	x-unix-mode=0664;
	x-apple-mail-type=stationery;
	name="bg_pattern.jpg"
Content-Id: <1CB71515-3D77-4D6A-B1C4-1F8DDB38B9D3/bg_pattern.jpg>

<-- Here comes the encoded data

So we need to:

  1. Scan the raw email content to find every line that begins with "Content-Id: <"
  2. Once we have found such a line, we must:
    • Go backward until we find a line beginning with "--" (which indicates the beginning of the attachment)
    • Go forward until we find an empty line, which separates the header from the data
  3. Inside this isolated block of text, collect the filename and the corresponding Content ID.

NOTE: The code below uses class extensions defined in String extensions (by SteffX)

Sub DisplayHTMLemail( eMail as EmailMessage, intoViewer as HTMLViewer )

  // <-- insert part 1 as above ...

  //==== PART 2 ====

  Dim lines() as string
  Dim a, b as integer
  Dim names(), cids() as string

  lines = Split( eMail.RawSource, EndOfLine.Windows )

  for i as integer=0 to ubound( lines ) //Scan the lines
    if lines( i ).BeginsWith( "Content-Id: <" ) then //We found the base line
      //We have a match. Find the first line of this header
      for j as integer=i downto 0 //Go backward starting from i
        if lines( j ).BeginsWith( "--" ) then //First line of header
          a = j //Store the line number
          exit //Exit the for...next loop
        end if
      next

      //Now find the LAST line of the header
      for j as integer=i+1 to UBound( lines ) //Go forward starting from i+1
        if lines( j )="" then //An empty line separates the header from the data
          b = j - 1 //Store the last line index
          exit //Exit the for...next loop
        end if
      next

      //So now we know that the full attachment header is between lines( a ) and lines( b ), inclusive

      //Find the ‘ name="..." ’ value
      for j as integer=a to b //Search inside the header
        if lines( j ).Instr( "name=""" ) <> 0 then //The line contains a ( name=" ) string
          //Store the value of name="..." into names() array
          names.Append NthField( lines( j ), """", 2 )
          //Also store the corresponding Content ID into cids() array
          cids.Append lines( i ).StringBetween( "Content-Id: <", ">" )
          exit //One name per attachment is enough; avoids duplicate entries
        end if
      next

    end if
  next
End Sub
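The same scan can be sketched in Python (not REALbasic) as a standalone function on raw text, which makes it easy to try on sample headers. This is only an illustration under a few assumptions: names_by_cid is a hypothetical helper, the header name is matched case-insensitively, either line-ending style is accepted, and only a quoted name="..." parameter is picked up.

```python
import re

# The Apple Mail attachment header shown above, followed by stand-in data.
SAMPLE = "\r\n".join([
    "--Apple-Mail-23--848320398",
    "Content-Disposition: inline;",
    "\tfilename=bg_pattern.jpg",
    "Content-Transfer-Encoding: base64",
    "Content-Type: image/jpeg;",
    '\tname="bg_pattern.jpg"',
    "Content-Id: <1CB71515-3D77-4D6A-B1C4-1F8DDB38B9D3/bg_pattern.jpg>",
    "",
    "aGVsbG8=",
])

def names_by_cid(raw):
    """Map each Content ID found in raw to the attachment's file name."""
    lines = raw.splitlines()  # splits on CRLF as well as bare LF or CR
    found = {}
    for i, line in enumerate(lines):
        match = re.match(r"content-id:\s*<([^>]*)>", line, re.IGNORECASE)
        if match is None:
            continue
        a = i  # walk backward to the "--" boundary line
        while a > 0 and not lines[a].startswith("--"):
            a -= 1
        b = i  # walk forward to the last line before the empty line
        while b + 1 < len(lines) and lines[b + 1] != "":
            b += 1
        for header_line in lines[a:b + 1]:
            # \b keeps "filename=" from matching as "name="
            name = re.search(r'\bname="([^"]*)"', header_line, re.IGNORECASE)
            if name:
                found[match.group(1)] = name.group(1)
                break
    return found

print(names_by_cid(SAMPLE))
```

A dictionary keyed by content ID plays the role of the parallel names() and cids() arrays.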

Step 3: transforming the HTML code

As we said above, inline pictures in the HTML use cid:content_ID_of_the_attachment as a reference, but we stored the attachments under their file names. Hence, we need to replace every cid reference with a plain reference to the corresponding file name.

Sub DisplayHTMLemail( eMail as EmailMessage, intoViewer as HTMLViewer )

  // <-- insert part 1 as above ...
  // <-- insert part 2 as above ...

  //==== PART 3 ====

  Dim html as string

  html = eMail.BodyHTML

  for j as integer=0 to Ubound( names ) //For each name we previously found
    //Replace every cid:..... by the corresponding file name
    html = ReplaceAll( html, "cid:" + cids( j ), names( j ) )
  next

  //For the last part, see below
End Sub
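One thing to watch for with plain search-and-replace is a content ID that is a prefix of another one. The Python sketch below (not REALbasic) shows one possible way around it, by replacing the longest IDs first. The function name replace_cids and the sample IDs are invented for the example.

```python
def replace_cids(html, names_by_cid):
    """Replace each cid:ID reference in html with the matching file name."""
    # Longest IDs first, so "cid:img1" cannot clobber part of "cid:img10".
    for cid in sorted(names_by_cid, key=len, reverse=True):
        html = html.replace("cid:" + cid, names_by_cid[cid])
    return html

# Invented sample data: the first ID is a prefix of the second.
page = '<img src="cid:img1"><img src="cid:img10">'
files = {"img1": "a.jpg", "img10": "b.jpg"}
print(replace_cids(page, files))
```

IDs in the Apple Mail style shown earlier are long and unique, so the simple loop is usually fine for them. The ordering only matters for short or sequential IDs.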

Step 4: displaying the email

The HTMLViewer class provides a LoadPage method that takes the HTML code as a string and a FolderItem representing the folder that contains the files referenced from the HTML code.

Sub DisplayHTMLemail( eMail as EmailMessage, intoViewer as HTMLViewer )

  // <-- insert part 1, 2 and 3 as above ...

  //==== THE BEST PART ====

  intoViewer.LoadPage html, TmpMail
End Sub

And it works! (Well, it should.)

Download

A REALbasic project using the code described above is available here: http://ufos-software.com/HTMLmail.rbp
