E-Mail Q&A


Telecooperation

Technische Universität Darmstadt

Copyrighted material; for TUD student use only

E-Mail Q&A

Telecooperation Group, TU Darmstadt


© Prof. Dr. M. Mühlhäuser, Telekooperation

Interoperability

• No need to implement everything from RFCs 2045-2047

– Way too much work

– A complete, correct implementation would be more standards-compliant than most common e-mail clients

• Your implementation should provide the following functionality:

– 7bit encoding

– Quoted-Printable and Base64 encoding with every charset Java can handle (i.e., every charsetName that does not throw an UnsupportedEncodingException)

– Multipart messages are recognized and decoded correctly

– Robustness: do not fail on unrecognized headers

• Programs will be tested with public test cases as well as secret ones

– The secret test cases also use only the functionality listed above
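The charset rule above maps directly onto a Java API: `new String(byte[], String charsetName)` throws an `UnsupportedEncodingException` exactly when Java does not know the charset. A minimal sketch, assuming the bytes have already been produced by a Base64 or Quoted-Printable decoder; the class name `CharsetCheck` and the null-on-failure convention are illustrative choices, not part of the assignment.

```java
import java.io.UnsupportedEncodingException;

public class CharsetCheck {
    /**
     * Decodes raw bytes with the charset named in the message.
     * Returns null if Java cannot handle that charset, i.e. the case
     * the assignment does not require you to support.
     */
    public static String decodeOrNull(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset: outside the required functionality
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE4}; // "ä" in ISO-8859-1
        System.out.println(decodeOrNull(latin1, "ISO-8859-1"));
        System.out.println(decodeOrNull(latin1, "no-such-charset"));
    }
}
```

Charset names are looked up case-insensitively, so "iso-8859-1" and "ISO-8859-1" behave the same.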


Headers

• Multiline headers

– Continuation lines start with “folding whitespace”, which may be a space or a tab (\t)

• Ignore every header you do not know

– If you want, you can also display additional headers such as BCC, but only those mentioned in milestone 3.1 are required

• Case sensitivity

– Header names are always case-insensitive

• cf. RFC 2822, section 1.2.2: “Characters will be specified […] by a case-insensitive literal value enclosed in quotation marks”

– Header values used in the assignment are usually case-insensitive as well, e.g. both Content-Transfer-Encoding: Base64 and Content-Transfer-Encoding: base64 are possible

• Exceptions: the multipart boundary and all header values displayed to the user
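A minimal sketch of the folding and case-insensitivity rules, assuming the header section has already been split into lines without their CRLF. Continuation lines are unfolded into the previous header, and a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER` makes lookups by header name case-insensitive. Unknown headers are simply stored and never looked up. `HeaderParser` is a hypothetical name, and joining folded lines with a single space is a simplification that suits display purposes.

```java
import java.util.Map;
import java.util.TreeMap;

public class HeaderParser {
    /**
     * Parses header lines up to the first empty line into a map whose keys
     * are compared case-insensitively. A repeated header keeps its last value.
     */
    public static Map<String, String> parse(String[] lines) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String name = null; // header that a continuation line belongs to
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // an empty line ends the header section
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && name != null) {
                // Folding whitespace: append to the previous header's value
                headers.put(name, headers.get(name) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line: skip it instead of failing
            }
            name = line.substring(0, colon).trim();
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }
}
```

With this map, `headers.get("content-transfer-encoding")` finds a header written as `Content-Transfer-Encoding`; the value itself keeps its original case, which matters for the multipart boundary.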


Date

• Look into the documentation of SimpleDateFormat

– No need to parse each field yourself; it even recognizes “GMT” and “UTC” as time zones

– Create the parser with Locale.US so that it can parse English names like “May”

• Output via DateFormat.getDateTimeInstance()

• Time zone

– Setting it via SimpleDateFormat or Calendar#setTimeZone is preferred to manual time manipulation

– Reason: DateFormat may be configured to display the time zone
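A sketch of the approach above, assuming dates of the form `Tue, 5 May 2009 14:30:00 +0200`. The pattern string and the class name `MailDate` are illustrative assumptions; real Date headers may omit the weekday or the seconds, which this single pattern does not cover. Parsing uses `Locale.US`, and the display time zone is set on the `DateFormat` instead of adding offsets by hand.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class MailDate {
    // Assumed pattern; variants without weekday or seconds are not handled here
    private static final String PATTERN = "EEE, d MMM yyyy HH:mm:ss Z";

    /** Parses a Date header value; Locale.US makes "Tue" and "May" recognizable. */
    public static Date parse(String headerValue) throws ParseException {
        SimpleDateFormat parser = new SimpleDateFormat(PATTERN, Locale.US);
        return parser.parse(headerValue.trim());
    }

    /** Formats for the user in the given time zone, without manual offset arithmetic. */
    public static String display(Date date, TimeZone zone) {
        DateFormat out = DateFormat.getDateTimeInstance();
        out.setTimeZone(zone);
        return out.format(date);
    }

    public static void main(String[] args) throws ParseException {
        Date d = parse("Tue, 5 May 2009 14:30:00 +0200");
        System.out.println(display(d, TimeZone.getDefault()));
    }
}
```

The exact output of `display` depends on the default locale and time zone of the machine, which is the point of using `getDateTimeInstance()`.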


Attachments

• Base64-encoded lines are always 76 characters wide; the only exception is the last line

• If the number of characters is not a multiple of 4 (numberofchars % 4 != 0), you may just throw an exception and terminate

• Do not use javax.mail.internet.MimeUtility or similar additional libraries for decoding

• Use the Content-Disposition header to suggest a file name for saving

• Attachments that are not of type text/… have no charset and do not need one

– Just treat them as a stream of bytes or a byte array
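One way to suggest a file name from Content-Disposition is a regular expression over the header value. A minimal sketch, assuming only the simple parameter forms `filename=report.pdf` and `filename="my report.pdf"`; RFC 2231 extended parameters (`filename*=...`) are not handled. `AttachmentName` and the fallback parameter are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttachmentName {
    // Either a quoted value (group 1) or an unquoted token up to ';' or whitespace (group 2)
    private static final Pattern FILENAME =
            Pattern.compile("filename=(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    /** Suggests a file name for saving, or returns the fallback if none is given. */
    public static String suggestName(String contentDisposition, String fallback) {
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) {
            return fallback;
        }
        String name = (m.group(1) != null) ? m.group(1) : m.group(2);
        return name.isEmpty() ? fallback : name;
    }

    public static void main(String[] args) {
        System.out.println(suggestName("attachment; filename=\"my report.pdf\"", "attachment.bin"));
    }
}
```

The decoded attachment itself stays a `byte[]` and can be written unchanged to a file with that name.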


Base64-Example

• Take a group of 4 characters: S W 4 g

• Decode each character according to the RFC

– S = 0x12; W = 0x16; 4 = 0x38; g = 0x20

– Decoding may be done by character range: A-Z: char - 'A'; a-z: char - 'a' + 26; 0-9: char - '0' + 26*2; '+' (62), '/' (63) and the padding character '=' must be treated separately

• Combine into a 24-bit number, shifting each value according to its index (big-endian)

– 0x12 << 18 | 0x16 << 12 | 0x38 << 6 | 0x20 << 0 = 0x496e20

• Split the number back into 8-bit blocks (also big-endian)

– Byte 0 = 0x496e20 >> 16 & 0xff = 0x49

– Byte 1 = 0x496e20 >> 8 & 0xff = 0x6e

– Byte 2 = 0x496e20 >> 0 & 0xff = 0x20
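The steps of this example translate almost one-to-one into code. A minimal sketch under the slide's assumptions (line breaks are removed first, and a length that is not a multiple of 4 is simply rejected); for brevity it does not check that '=' appears only at the very end. `Base64Decoder` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

public class Base64Decoder {
    /** Maps one Base64 character to its 6-bit value, range by range. */
    public static int sixBits(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 26 * 2;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new IllegalArgumentException("Not a Base64 character: " + c);
    }

    /** Decodes one group of 4 characters into 3 bytes (fewer if '=' padding is present). */
    public static byte[] decodeGroup(String group) {
        int bits = 0;
        int padding = 0;
        for (int i = 0; i < 4; i++) {
            char c = group.charAt(i);
            int value = 0; // padding contributes zero bits
            if (c == '=') {
                padding++;
            } else {
                value = sixBits(c);
            }
            bits |= value << (18 - 6 * i); // big-endian: first character = highest 6 bits
        }
        if (padding > 2) {
            throw new IllegalArgumentException("Too much padding: " + group);
        }
        byte[] out = new byte[3 - padding];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (bits >> (16 - 8 * i) & 0xff); // big-endian again
        }
        return out;
    }

    /** Decodes a whole Base64 body; line breaks between the encoded lines are dropped. */
    public static byte[] decode(String text) {
        String s = text.replace("\r", "").replace("\n", "");
        if (s.length() % 4 != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += 4) {
            byte[] b = decodeGroup(s.substring(i, i + 4));
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }
}
```

For the group "SW4g", `decodeGroup` yields the bytes 0x49 0x6e 0x20, which is the ASCII text "In ".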


Decoding

• Your own input stream

– An elegant way of decoding Base64 and Quoted-Printable data (you can do it differently; this is only a suggestion)

1. Extend java.io.InputStream

2. Take a character array of undecoded data as a parameter

3. Override read()

– Decode the character data when

– Return -1 if the end of the data is reached

4. Let an InputStreamReader deal with the nasty problem of decoding charsets

• The sample application has only 50 LoC for decoding Quoted-Printable and 100 LoC for Base64
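A minimal sketch of steps 1-4 for Quoted-Printable, assuming the encoded body is already available as a `char[]`. `read()` decodes `=XX` escapes on demand, skips soft line breaks (`=` at the end of a line) and returns -1 at the end of the data, while `InputStreamReader` turns the bytes into characters. The class name and the helper `decode` are hypothetical, and throwing an `IOException` on malformed escapes is just one possible policy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class QuotedPrintableInputStream extends InputStream {
    private final char[] data; // undecoded Quoted-Printable characters
    private int pos = 0;       // index of the next character to consume

    public QuotedPrintableInputStream(char[] data) {
        this.data = data;
    }

    @Override
    public int read() throws IOException {
        while (pos < data.length) {
            char c = data[pos];
            if (c != '=') {
                pos++;
                return c & 0xff; // literal (ASCII) character passed through as a byte
            }
            // Soft line break "=\r\n" or "=\n": skip it and keep reading
            if (pos + 1 < data.length && (data[pos + 1] == '\r' || data[pos + 1] == '\n')) {
                boolean crlf = data[pos + 1] == '\r' && pos + 2 < data.length && data[pos + 2] == '\n';
                pos += crlf ? 3 : 2;
                continue;
            }
            // Encoded byte "=XX" with two hex digits
            if (pos + 2 >= data.length) {
                throw new IOException("Truncated escape at position " + pos);
            }
            int hi = Character.digit(data[pos + 1], 16);
            int lo = Character.digit(data[pos + 2], 16);
            if (hi < 0 || lo < 0) {
                throw new IOException("Invalid escape at position " + pos);
            }
            pos += 3;
            return (hi << 4) | lo;
        }
        return -1; // end of data reached
    }

    /** Decodes Quoted-Printable text, letting InputStreamReader handle the charset. */
    public static String decode(String qp, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new QuotedPrintableInputStream(qp.toCharArray()), charsetName)) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }
}
```

A Base64 variant would keep the same structure and only change `read()`, decoding one group of four characters whenever its three-byte buffer runs empty.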


Regular Expressions

• Regular expressions are a convenient way of extracting substrings

• A bit like file name patterns (*, ?), but more powerful

– Letters and digits match themselves

– Punctuation characters usually have a special meaning; to match them literally, escape them with a \

• To match the character [, use \[

• Attention: the backslash itself must be escaped in Java string literals, so the regex \[ is written "\\["

– Alternatives: use [ ]

• [abc] matches a or b or c

• [A-Z] matches A or B or … or Z

• Negation: [^abc] matches everything but a or b or c

– Wildcard . matches any character (by default, except line terminators)

– Repetition

• * means “the previous element zero or more times”

• + means “the previous element one or more times”
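The building blocks above can be tried out quickly with `Pattern.matches(regex, input)`, which succeeds only if the whole input matches. This small demo class is illustrative; `RegexBasics` and `check` are hypothetical names.

```java
import java.util.regex.Pattern;

public class RegexBasics {
    /** Reports whether the whole input matches the regex. */
    public static String check(String regex, String input) {
        return regex + " vs \"" + input + "\": " + Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(check("\\[x\\]", "[x]"));  // escaped brackets: true
        System.out.println(check("[abc]", "b"));      // alternatives: true
        System.out.println(check("[A-Z]", "Q"));      // range: true
        System.out.println(check("[^abc]", "a"));     // negation: false
        System.out.println(check("a.c", "a-c"));      // wildcard: true
        System.out.println(check("ab*", "a"));        // zero or more b: true
        System.out.println(check("ab+", "a"));        // one or more b: false
    }
}
```

Note the doubled backslashes in the first line: the Java literal "\\[x\\]" is the regex \[x\].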


Regular Expressions with Java

• Part of java.util.regex

• First, compile the pattern to search for:

– Pattern p = Pattern.compile("charset=[^ ]*");

– The compile method has a variant that takes flags; use Pattern.CASE_INSENSITIVE for case-insensitive matching

• Next, create a Matcher for a String from it:

– Matcher m = p.matcher("Content-Type: text/plain; charset=\"us-ascii\"");

• Be sure to call the Matcher’s find method (it returns true if a match was found):

– m.find()

• m.group(0) now returns everything that matched:

– charset="us-ascii"
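Put together, the steps of this slide fit into one small method. A sketch using the slide's expression with the `CASE_INSENSITIVE` flag; `CharsetFinder` and `findCharsetParam` are hypothetical names, and returning null when nothing matches is an illustrative choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFinder {
    // The slide's expression, compiled with the CASE_INSENSITIVE flag
    private static final Pattern CHARSET =
            Pattern.compile("charset=[^ ]*", Pattern.CASE_INSENSITIVE);

    /** Returns the whole match (group 0), or null if there is no charset parameter. */
    public static String findCharsetParam(String header) {
        Matcher m = CHARSET.matcher(header);
        // find() must succeed before group() may be called;
        // otherwise group() throws IllegalStateException
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(findCharsetParam("Content-Type: text/plain; charset=\"us-ascii\""));
    }
}
```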


Grouping

• You need the part after “charset=”

– Solution 1: parse it yourself

– Solution 2: add groups to the expression

• Groups are marked by () and counted from 1:

– Pattern p = Pattern.compile("charset=([^ ]*)");

• After matching, group(1) returns "\"us-ascii\"", i.e. the value still includes the quotes
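To get rid of the quotes as well, the group can be placed between optional quote characters. This pattern is an assumed variation of the slide's expression, not the official solution: the value stops at a quote, a ';' or whitespace. `CharsetExtractor` is a hypothetical name; passing "us-ascii" as the fallback follows RFC 2045, which makes US-ASCII the default charset for text/plain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetExtractor {
    // Optional quotes around the value; group 1 holds the bare charset name
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name without quotes, or the fallback if none is present. */
    public static String charsetOf(String contentType, String fallback) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=\"us-ascii\"", "us-ascii"));
    }
}
```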


Debugging

• Mail clients should be able to connect to the server and fetch the mail

• Always helpful: connect to the POP server via telnet and issue POP commands manually

– For closer examination, you may unzip the JAR file and have a look at “mailbox.xml”
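The manual telnet check can also be scripted. A minimal sketch that reads the greeting, sends each command terminated by CRLF and prints the first response line; the host, port 110 (the standard POP3 port) and the credentials in `main` are hypothetical placeholders for your test server. Multi-line replies such as those to LIST or RETR are not read completely by this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Pop3Probe {
    /** Reads the greeting, then sends each command and collects the first response line. */
    public static List<String> session(BufferedReader in, Writer out, List<String> commands)
            throws IOException {
        List<String> responses = new ArrayList<>();
        responses.add(in.readLine()); // greeting, e.g. "+OK ..."
        for (String command : commands) {
            out.write(command + "\r\n"); // POP3 lines end with CRLF
            out.flush();
            responses.add(in.readLine()); // "+OK ..." or "-ERR ..."
        }
        return responses;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical host, port and credentials: replace with those of your test server
        try (Socket socket = new Socket("localhost", 110);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII)) {
            List<String> commands = Arrays.asList("USER student", "PASS secret", "STAT", "QUIT");
            for (String line : session(in, out, commands)) {
                System.out.println(line);
            }
        }
    }
}
```

Because `session` only needs a reader and a writer, it can also be exercised against canned server replies without any network connection.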
