Introduction
This document is a tutorial for encoding data into PDF417 bar codes. It is a step-by-step guide with a direct and practical approach. It will explain how we can convert data into a PDF417 bar code.
One primary objective during the development of PDF417 was to provide the users with an extremely flexible symbology that could satisfy a wide variety of unique requirements. PDF417, we believe, achieves that objective.
If you are a bar code hardware manufacturer, as you review the PDF417 Specification and this tutorial, please keep in mind the user-specified options and plan your hardware and software implementation accordingly. The user-specified options are x-dimension, y-dimension, symbol aspect ratio, and security level .
PDF417 Overview
PDF417, or Portable Data File 417, is a two-dimensional stacked bar code symbology capable of encoding over a kilobyte of data per label. This is important for applications where a bar code must be more than merely an identifier such as an index to reference a database.
The “portable data file” approach is well suited to applications where it is impractical to store item information in a database or where the database is not accessible when and where the item’s bar code is read. Because a PDF417 symbol can store so much data, item data such as the content of shipping manifest or equipment maintenance history can be carried on the item, without requiring access to a remote database.
Encoding data into a PDF417 bar code is a two-step process. First, data is converted into codeword values of 0 - 928, which represent the data. This is “high-level encoding.” Then the values are physically represented by particular bar/space patterns, which is “low-level encoding.” Decoding is the reverse process.
In addition, PDF417 is an error-correcting symbology designed for real-world applications where portions of labels can get destroyed in handling. It performs error correction by making calculations, if necessary, to reconstruct undecoded or corrupted portions of the symbol.
PDF417 Symbol Description
At first glance, a PDF417 symbol looks like a set of stacked bar codes. When we look closer to analyze how the symbol is put together, there are several key elements. These are rows, start patterns, stop patterns, codewords, and modules.
Symbology Characteristics Basic Characteristics
Additional Features
Basic Characteristics Encodable character set:
Symbol character structure:
Maximum number of data characters:
Symbol size:
Selectable error correction:
Non-data overhead:
Code type:
Character self-checking:
Bi-directionally decodable:
‘PDF417' is a multi-row symbology with the following basic characteristics:
Encodable character set:
Text Compaction mode permits all printable ASCII characters to be encoded, i.e. values 32 - 126 inclusive in accordance with ISO/IEC 646, as well as selected control characters.
Byte Compaction mode permits all 256 possible 8-bit byte values to be encoded. This includes all ASCII characters value 0 to 127 inclusive and provides for international character set support.
Numeric Compaction mode permits efficient encoding of long numeric data digit strings.
Up to 811 800 different character sets or data interpretations.
Various function codewords for control purposes.
Symbol character structure:
n, k, m symbology of 17 modules (n), 4 bar elements (k), with the largest element 6 modules wide (m).
Maximum number of data characters:
Per symbol (at error correction level 0) 925 data codewords which can encode:
Text Compaction mode:
1850 characters (at 2 data characters per codeword).
Byte Compaction mode:
1108 characters at 1.2 data characters per codeword.
Numeric Compaction mode:
2710 characters at 2.93 data characters per codeword.
Symbol size:
Number of rows: 3 to 90.
Number of columns: 1 to 30.
Width in modules: 90X to 583X including quiet zones.
Maximum codeword capacity: 928 codewords.
Maximum data codeword capacity: 925 codewords.
Since the number of rows and columns are selectable, the aspect ratio of a ‘PDF417' symbol may be varied when printing to suit the spatial requirements of the application.
Selectable error correction:
A user may define one of 9 error correction levels labeled levels 0 to 8. All error correction levels, except Level 0, not only detect errors but also can correct erroneously decoded or missing codewords. There are 2 (Level 0) to 512 (Level 8) codewords per symbol.
Non-data overhead:
1. Per row: 73 modules, including quiet zones.
2. Per symbol: a minimum of 3 additional codewords, represented as symbol characters.
Code type:
Continuous, multi-row two-dimensional.
Character self-checking:
Yes.
Bi-directionally decodable:
Yes.
Additional Features The following are additional features, which are inherent or optional in ‘PDF417':
Data Compaction:
Extended Channel Interpretations:
Macro ‘PDF417':
Edge to edge decodable:
Cross row scanning:
Error correction:
Data Compaction:
Inherent: These schemes are defined to compact a number of data characters into codewords. Generally data is not directly represented on a one character for one codeword basis.
Extended Channel Interpretations:
Optional: These mechanisms allow up to 811,800 different data character sets or interpretations to be encoded.
Macro ‘PDF417':
Optional: This mechanism allows files of data to be represented logically and consecutively in a number of ‘PDF417' symbols. Up to 99 999 different ‘PDF417' symbols can be so linked or concatenated and be scanned in any sequence to enable the original data file to be correctly reconstructed.
Edge to edge decodable:
Inherent: As an n, k symbology ‘PDF417' can be decoded by measuring elements from edge to similar edge.
Cross row scanning:
Inherent: The combination of the following three characteristics in ‘PDF417' that allow a single linear scan to cross a number of rows and achieve a partial decode of the data so long as at least one complete symbol character per row is decoded into its codeword. The decoding algorithm can then place the individual codewords into a meaningful matrix.
Being synchronized horizontally, or self clocking
row identification
being vertically synchronized, by using the cluster values to achieve local row discrimination
Error correction:
Inherent: A user may define one of 9 error correction levels. All error correction levels, except Level 0, not only detect errors but can correct erroneously decoded or missing codewords.
Symbol Structure
PDF417 Symbol Parameters
Row Parameters
Codeword Sequence
PDF417 Symbol ParametersEach PDF417 symbol consists of a stack of vertically aligned rows with a minimum of 3 rows (maximum 90 rows). Each row includes a minimum of 1 symbol character (maximum 30 symbol characters), excluding start, stop and row indicator columns. The symbol shall include a quiet zone on all four sides. Figure 1 illustrates a PDF417 symbol.
Figure 1
Row ParametersEach PDF417 row has the following parameters:
A leading quiet zone
A start character
A left row indicator symbol character
1 to 30 symbol characters (or columns)
A right row indicator symbol character
A stop character
A trailing quiet zone
NOTE: The number of symbol characters (or codewords) defined in item 'd' above identifies the number of columns in the PDF417 symbol.
Codeword SequenceA PDF417 symbol may contain up to 928 symbol characters or codewords. ‘Symbol character’ is the more appropriate term used to refer to the printed bar/space pattern; ‘codeword’ is the more appropriate term for the numeric value of the symbol character. The codewords follow this sequence:
The first codeword, the symbol length descriptor, always encodes the total number of data codewords in the symbol, including the symbol length descriptor itself, data codewords and pad codewords, but excluding the number of error correction codewords.
The data codewords follow starting with the most significant encodable character.
Function codewords may be inserted to achieve data compaction.
Pad codewords are subsequently added to enable the codeword sequence to be represented in a rectangular matrix. Pad codewords may also be used to fill additional complete rows to achieve the desired aspect ratio or as specified by an application.
An optional Macro PDF417 Control Block.
Error correction codewords for error detection and correction.
The codewords are arranged with the most significant codeword adjacent to the symbol length descriptor, and are encoded from left to right top row to bottom. Figure 2 illustrates in layout format the sequence for the symbol like that shown in Figure 1. In Figure 2 an error correction level of 1 has been used and one pad character was needed to completely fill the symbol matrix.
The rules and advice for structuring the matrix are included in the table below.
L1 d15 d14 R1
L2 d13 d12 R2
S L3 d11 d10 R3 S
T L4 d9 d8 R4 T
A L5 d7 d6 R5 O
R L6 d5 d4 R6 P
T L7 d3 d2 R7
L8 d1 d0 R8
L9 e3 e2 R9
L10 e1 e0 R10
PDF417 Example of Symbol Layout Schematic
where L, R, d and e are as defined in 3.2
d15 = symbol length descriptor (in this example, with a value of 16)
d14 to d1 = encoded representation of data
d0 = pad codeword
Lowest Level Structure of PDF417Codeword Structure
Start and Stop Pattern Structure
Row Structure
Overall Symbol Structure
Codeword StructureA codeword is the unit for encoding a value representing, or associated with, numbers, letters, or symbols. There are 929 codeword values defined in PDF417. These values are 0 - 928.
There are four bars and four spaces in each codeword. Individual bars or spaces can vary in width from one to six modules, but the combined total per codeword is always 17 modules. The width of one module, or narrowest element width, is the x-dimension of that symbol. Its height is the y-dimension.
Each codeword is defined by an 8 digit sequence, which represents the 4 sets of alternating bar and space widths within that codeword. This is called the x-sequence. For instance, in an x-sequence of 31111334, the first element is a bar three modules wide, followed by five elements one module wide each, one element three modules wide, and the last element six modules wide.
Focusing on element widths along the x-sequence (x0 ... x7), we might diagram another PDF417 codeword like this:
Start and Stop Pattern StructureStart and stop patterns announce where the symbol begins and ends. By doing this, they make reading the bar code possible whether the scan is left-to-right or right-to-left.
For these purposes, PDF417 uses unique start and stop patterns. The start pattern, or left end of the symbol, has the unique pattern, or x-sequence, of 81111113. The stop pattern, or right end of the symbol, has the unique x-sequence of 711311121. No character pattern other than start/stop uses more than six modules per element. For space termination, the stop pattern contains one extra bar (or ninth element), of width one, at the end.
Row StructureEach row consists of a start pattern, a left row indicator codeword, data codewords, a right row indicator codeword, and a stop pattern. Note that codewords are numbered from right to left.
Row indicators, discussed later, help synchronize the symbol's structure. They help define
which row a particular one is, which is the left and right side of that row, how many rows are in the symbol, what security level is encoded in the symbol, and how many data columns are in the rows.
Overall Symbol StructureAny PDF417 symbol is made up of at least 3 rows, and at most 90 rows. The minimum number of codewords in a row is 3; this includes the left row indicator codeword, 1 data codeword, and the right row indicator.
Every symbol contains 1 codeword (the first data codeword in row 0) indicating the total number of data codewords within the symbol, and at least 2 error-detection codewords.
High Level EncodingHigh Level Encoding
Clusters
Encoding Data
Modes
Text Compaction Mode
Byte Compaction Mode
Numeric Compaction Mode
Encoding Data in Numeric Compaction Mode
High Level EncodingNow that we have an idea of the physical layout of a PDF417 bar code, we need to know how to encode data.
All data is represented by codewords with values between 0 and 928. High-level encoding converts the data to be printed into corresponding codeword values. How we encode each codeword, described below, depends on what the codeword represents.
Low-level encoding, by contrast, involves physically converting the codeword values into their respective bar/space patterns.
PDF417 encodes values according to clusters, or mutually exclusive encodation sets. The bar-space pattern of each codeword depends on both the value to be encoded and the cluster used by that row.
ClustersThe entire set of 929 codewords in PDF417 is represented in three mutually exclusive symbol character sets, or clusters. Each cluster encodes the 929 available ‘PDF417' codewords into different bar-space patterns so that one cluster is distinct from another. The cluster numbers are 0, 3, 6. The cluster definition applies to all ‘PDF417' symbol characters, except for start and stop characters.
Each row uses only one of the three clusters (0, 3, or 6) to encode data, with the same cluster repeating sequentially every third row. Row 0 codewords use cluster 0, row 1 uses cluster 3, and row 2 uses cluster 6, etc. In general, cluster number = ((row number) mod 3) *3. To encode any codeword value, it is necessary first to know what cluster it must belong to.
Each cluster presents the 929 available values with distinct bar-space patterns so that one cluster cannot be confused with another. Because any two adjacent rows use different clusters, scan stitching can be used without confusing rows.
Encoding DataHigh level encoding converts the data characters into their corresponding codewords. Data compaction schemes are used to achieve high level encoding. Depending on data type, there are several ways to encode data in PDF417, which involve selecting a mode and
encoding codeword values.
ModesA mode is simply a method of compacting data. PDF417 provides three modes to encode data. In one PDF417 symbol it is possible to switch back and forth between modes as often as required.
The optimal mode for an application may be a combination of the three modes, each used in different portions of the symbol. By using mode latchesshifts you can move at will from mode to mode within the same symbol.
The three modes are defined below. Each mode defines a particular efficient mapping between user-defined data and codeword sequences:
Text Compaction mode
Byte Compaction mode
Numeric Compaction mode
900 codewords are available in each mode for data encodation and other functions within the mode. The remaining 29 codewords are assigned to specific functions independent of the current compaction mode.
Codewords 900 to 928 are assigned as function codewords as follows:
Switching between modes
Enhanced applications using Extended Channel Interpretations (ECIs)
Other enhanced applications
At present codewords 903 to 912 and 914 to 920 are reserved. Below is a table that defines the list of assigned and reserved function codewords.
Assignments of ‘PDF417' Function Codewords
Codeword Function
900 mode latch to Text Compaction mode
901 mode latch to Byte Compaction mode
902 mode latch to Numeric Compaction mode
903 to 912 reserved for future functions
913 mode shift to Byte Compaction mode
914 to 920 reserved for future functions
921 reader initialization
922 terminator codeword for Macro PDF control block
923 sequence tag to identify the beginning of optional
fields in the Macro PDF control block
924 mode latch to Byte Compaction mode (used differently from 901)
925 identifier for a user defined Extended Channel Interpretation (ECI)
926 identifier for a general purpose ECI format
927 identifier for an ECI of a character set or code page
928 Macro marker codeword to indicate the beginning of a Macro PDF Control Block
A mode latch codeword may be used to switch from the current mode to the indicated destination and remains in effect until another mode switch is explicitly brought into use. Codewords 900 to 902 and 924 are assigned to this function. Below is a table that defines the Mode Definition and Mode Switching Codewords function.
The mode shift codeword 913 causes a temporary switch from Text Compaction mode to Byte Compaction mode. This switch shall be in effect for only the next codeword, after which the mode shall revert to the prevailing sub-mode of the Text Compaction mode. Codeword 913 is only available in Text Compaction mode; its use is described in.
Mode Definition and Mode Switching Codewords
Destination Mode Mode Latch Mode Shift
Text Compaction 900
Byte Compaction 901/924 913
Numeric Compaction 902
‘PDF417' also supports the Extended Channel Interpretation system, which allows different interpretations of data to be accurately encoded in the symbol
To select the optimal mode, first select the mode that supports the characters that must be encoded. Next, when there is a choice among modes, consider the “efficiency” of the mode. This involves answering two questions:
1. Per mode, how many characters per codeword may be encoded?
2. If mode switching seems promising, what additional overhead (e.g., shift, latch characters) is needed, or is it more efficient to use one mode only?
Each available mode in PDF417 is described in the following sections. The issue of “efficiency” will become clear as we discuss each mode’s advantages and limitations.
Text Compaction ModeText Compaction mode permits all printable ASCII characters to be encoded, i.e. values 32
- 126 inclusive in accordance with ISO/IEC 646, as well as selected control characters.
The Text Compaction Mode has four sub-modes:
• Alpha (uppercase alphabetic)
• Lower (lowercase alphabetic)
• Mixed (numeric and some punctuation)
• Punctuation
Each sub-mode contains 30 characters, including sub-mode latch and shift characters.
The default compaction mode for ‘PDF417' in effect at the start of each symbol is always Text Compaction Mode Alpha sub-mode (uppercase alphabetic). A latch codeword from another mode to the Text Compaction mode will always switch to the Text Compaction Alpha sub-mode.
Byte Compaction ModeByte Compaction mode permits all 256 possible 8-bit byte values to be encoded. This includes all ASCII characters value 0 to 127 inclusive and provides for international character set support.
The Bite Compaction Mode lets you encode 256 international characters, including the full ASCII set, as well as any 8-bit value from 0 to 255. This mode encodes about 1.2 bytes per codeword. In terms of the breadth of encodable data, it is a powerful mode; in terms of the resulting size of the printed symbol, it is the least efficient mode.
Encoding Data in Byte Compaction Mode
Switching to Byte Compaction Mode
Compaction Rules for Encoding Longer Byte Compaction Character Strings (Using Mode Latch 924 or 901)
Do the Math
Encoding Data in Byte Compaction ModeThe Byte Compaction mode enables a sequence of 8-bit bytes to be encoded into a sequence of codewords. It is accomplished by a Base 256 to Base 900 conversion, which achieves a compaction ratio of six Byte Compaction characters to 5 codewords (1,2:1).
All the characters and their values (0 to 255) are defined in ______?. This is the default graphical and control character interpretation. When ECIs are invoked this interpretation may be defined as either ECI 000000 or ECI 000002.
Switching to Byte Compaction Mode Because the default mode for ‘PDF417' is Text Compaction mode, to switch to Byte Compaction mode, it is necessary to use one of the following codewords:
• mode latch 924 shall be used when the total number of Byte Compaction characters to be encoded is an integer multiple of 6
• mode latch 901 shall be used when the total number of Byte Compaction characters to be encoded is not a multiple of 6
• mode shift 913 can be used instead of codeword 901 when a single Byte Compaction character has to be encoded
If there is a need to encode only one Binary/ASCII Plus digit (8-bit), the Binary/ASCII Plus mode shift 913 may be used. When mode shift 913 is used, the next codeword is treated as a single 8 bit value.
Compaction Rules for Encoding Longer Byte Compaction Character Strings (Using Mode Latch 924 or 901) The following procedure shall be used to encode Byte Compaction character data:
1. Establish the total number of Byte Compaction characters.
2. If a perfect multiple of 6, mode latch 924 shall be used; or else mode latch 901 shall be used.
3. Sub-divide the number of Byte Compaction characters into a sequence of 6 characters, from left to right (the most to least significant characters). If less than 6 characters go to Step 7.
4. Assign the decimal values of the 6 data bytes to be encoded in Byte Compaction mode as b5 to b0 (where b5 is the first data byte).
5. Carry out a base 256 to base 900 conversion to produce a sequence of 5 codewords. Annex
Do the MathEncoding binary data using latches 901/924 requires converting the data from base 256 to base 900. The encoding process, specified by latch codeword 924, takes 6 base 256 numbers or values at a time, converting them to 5 base 900 codewords. The process continues until no more numbers or values need to be encoded.
If the number of digits to encode is not a multiple of 6, use mode 901. This just converts each group of 6 digits using the above algorithm and appends any remaining digits using 1 codeword per digit.
The following algorithm performs a base 256 to base 900 conversion:
To calculate the codewords, we must first define the following:
n = number of codewords (In this case, it is always 5.)
t = temporary variable
We then calculate t as follows:
t = d5*2565 + d4*2564 + d3*2563 + d2*2562 + d1*2561 + d0*2560
Next, we calculate each codeword as follows:
For each codeword ci = c0 ... cn-1
BEGIN
ci = t mod 900
t = t div 900
END
Where:
ci = codeword n
Remember: mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.
div is the integer division operator
For example:
Encode the numbers {1,2,3,4,5,6}
Calculate sum of data bytes to encode:
t = 1*2565 + 2*2564 + 3*2563 + 4*2562 + 5*2561 + 6*2560
= 1108152157446
Calculate codeword 0
c0 = 1108152157446 mod 900 = 846
t = 1108152157446 div 900 = 1231280174
Calculate codeword 1
c1 = 1231280174 mod 900 = 74
t = 1231280174 div 900 = 1368089
Calculate codeword 2
c2 = 1368089 mod 900 = 89
t = 1368089 div 900 = 1520
Calculate codeword 3
c3 = 1520 mod 900 = 620
t = 1520 div 900 = 1
Calculate codeword 4 (last codeword)
c4 = 1 mod 900 = 1
t = 1 div 900 = 0
The codeword sequence is (including Binary/ASCII Plus mode latch 924): 924,1,620,89,74,846
Example:
Encode the numbers 1,2,3,4,5,6,7,8,4
The mode shift character 901 is used since the number of digits to encode is not a multiple of 6. The above results apply for the first 6 digits. The remaining digits are appended as codewords, in order of their occurrence.
The codeword sequence is: 901,1,620,89,74,846,7,8,4
Numeric Compaction Mode The Numeric Compaction mode is a method for base 10 to base 900 data compaction and
should be used to encode long strings of consecutive numeric digits. The Numeric Compaction mode encodes up to 2.93 numeric digits per codeword.
Compaction Rules for Encoding Long Strings of Consecutive Numeric Digits
Latch to Numeric Compaction Mode
Encoding Data in Numeric Mode
Switching from Numeric Compaction Mode
Latch to Numeric Compaction Mode Numeric Compaction mode may be invoked when in Text Compaction or Byte Compaction modes using mode latch 902.
Encoding Data in Numeric Mode
Compaction Rules for Encoding Long Strings of Consecutive Numeric Digits
The following procedure is used to compact numeric data:
1. Divide the string of digits into groups of 44 digits. The last group may contain fewer digits.
2. For each group add the digit 1 to the most significant position to prevent the loss of leading zeros.
EXAMPLE:
original data 00246812345678
after step 2 100246812345678
NOTE: The leading digit 1 is removed in the decode algorithm.
Perform a base 10 to base 900 conversion as follows:
Starting from least significant codeword 0, calculate the following:
For each codeword ci = c0 ... cn-1
BEGIN
ci = x mod 900
x = x div 900
If x = 0, then
stop encoding
END
where:
x = numeric string to encode, preceded by number 1
Remember: mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.
div is the integer division operator.
Repeat from Step 2 as necessary.
For example:
Encode the numeric string 00021329000.
First precede the string by the number one:
100021329000
Next use the conversion algorithm to produce the desired codewords:
Start by setting x = 100021329000
Calculate codeword 0
c0 = 100021329000 mod 900 = 0
x = 100021329000 div 900 = 111134810
Calculate codeword 1
c1 = 111134810 mod 900 = 110
x = 111134810 div 900 = 123483
Calculate codeword 2
c2 = 123483 mod 900 = 183
x = 123483 div 900 = 137
Calculate codeword 3
c3 = 137 mod 900 = 137
x = 137 div 900 = 0
The codeword sequence then is (including numeric mode latch 902): 902,137,183,110, 0
The following rules can be used to determine the precise number of codewords in Numeric Compaction mode:
• Groups of 44 numeric digits compact to 15 codewords.
• For groups of shorter sequences of digits, the number of codewords can be calculated as follows:
Codewords = INT (number of digits / 3) +1
EXAMPLE:
For a 28 digit sequence
INT (28 / 3) + 1
= 9 + 1
= 10 codewords
Switching from Numeric Compaction Mode
Numeric Compaction mode may be terminated by the end of the symbol, or by any of the following codewords:
• 900 (Text Compaction mode latch)
• 901 (Byte Compaction mode latch)
• 902 (Numeric Compaction mode latch)
• 924 (Byte Compaction mode latch)
• 928 (Beginning of Macro ‘PDF417' Control Block)
• 923 (Beginning of Macro ‘PDF417' Optional Field)
• 922 (Macro ‘PDF417' Terminator)
The latter three codewords only occur within the Macro ‘PDF417' Control Block of a Macro ‘PDF417' symbol. Numeric Compaction mode is also affected by the presence of a reserved codeword.
Re-invoking Numeric Compaction mode (by using codeword 902 while in Numeric Compaction mode) serves to terminate the current Numeric Compaction mode grouping as described in??, and then to start a new grouping. This procedure may be necessary when an ECI assignment number needs to be encoded.
During the decode process for Numeric Compaction mode, the result of the base 900 to base 10 conversion results in a number whose most significant digit is a ‘1'. Otherwise, the symbol is invalid. The leading ‘1' is removed to produce the original number.
Encoding the Left and Right Row Indicators
The row indicators in a PDF417 symbol contain several key components: row number, number of rows, security level, and number of data columns.
Not every row indicator contains every component. The information is spread over several rows, and the pattern repeats itself every three rows. This pattern, the spreading and repeating below, makes the symbol as robust as possible.
Row 0: Left R.I. (Row #, # of Rows) Right R.I. (Row #, # of Columns)
Row 1: Left R.I. (Row #, Security Level) Right R.I. (Row #, # of Rows)
Row 2: Left R.I. (Row #, # of Columns) Right R.I. (Row #, Security Level)
For example, row 3's (or 6’s, 9’s, 12’s, etc.) right row indicator is calculated the same way as row 0's right row indicator; row 3's (or 6’s, 9’s, 12’s, etc.) left row indicator is calculated the same way as row 0's left row indicator, etc.
To make this clear, formulas and specific examples follow.
Left Row IndicatorsLeft row indicators are calculated as follows:
Row 0 30 * (row number div 3) + ((number of rows - 1) div 3)
Row 1 30 * (row number div 3) + security level * 3 + (number of rows - 1) mod 3
Row 2 30 * (row number div 3) + (number of columns - 1)
Notes: div is the integer division operator
mod is the integer remainder after division. If the remainder is negative, take the complement to get the correct result. For example, the remainder of -29160 divided by 929 is -361. The complement is 929 - 361, or 568.
Error Detection/Correction EncodingOne of PDF417's primary features is error detection and error correction. Each label has 2 codewords of error detection. The error correction capacity may be selected by the user, based on application needs, when the label is to be generated.
Error detection requires two error-detection codewords per symbol. The error correction scheme involves flexibility and user selection; it compensates for label defects and misdecodes.
At the time of printing a PDF417 bar code, the user specifies a security level in the range 0 - 8. This means, for example, that at level 6, a total of 126 codewords can be either missing or destroyed and the entire label can be read and decoded; up to that limit, for security level 6, the contents of the entire label can be recovered.
In addition to correcting for missing or destroyed data, PDF417 also can recover from misdecodes of codewords. Since it requires two codewords to recover from a misdecode, one to detect the error and one to correct for it, a given security level can support half the number of misdecodes that it can of undecoded codewords. Erasures + 2(Misdecodes) < Maximum Limit.
Recommended Error Correction Level The minimum level of error correction level should be as defined in Table E.1.
Table E.1: Recommended Error Correction Level
Number ofData Codewords
Minimum ErrorCorrection
Level
1 to 40 2
41 to 160 3
161 to 320 4
321 to 863 5
As a guide to using Table E.1, it should be noted that for normal data encodation in Text Compaction mode, a data codeword typically encodes 1,8 text characters, and in Numeric Compaction mode, a data codeword typically encodes 2,93 digits.
Higher levels of error correction should be used where significant symbol damage or degradation is anticipated. Lower than recommended error correction levels may be used in closed system applications.
Other User Consideration of the Error Correction Level
The objective in an application standard should be to make use of the features of error correction without sacrificing the data content capacity.
The following factors should be taken into account by the user in selecting an error correction level:
The recommended error correction level (see Table E.1) should be followed.
Since the maximum number of data codewords per symbol is fixed at 925, large numbers of data codewords limit the maximum level of error correction that can be implemented. More than 415 data codewords precludes Error Correction Level 8. More than 671 data codewords precludes Levels 7 and 8. More than 799 data codewords precludes Levels 6, 7 and 8.
Where ‘PDF417' symbols are likely to have missing or totally obliterated codewords, the Error Correction Level may be increased up to Error Correction level 8, or up to a level where the number of error correction codewords fills the maximum sized matrix appropriate for the application.
Instead of adopting a high error correction level it may be simpler, in the application specification, to specify particular substrates and materials which can protect the ‘PDF417' symbol.
Error Correction should not be used as a substitute for achieving adequate quality bar code symbols designed for the application or to compensate for a poor printing process.
The relation of security level to error correction capability is as follows:
Security Level Maximum Limit of E + 2(M)*
0 0
1 2
2 6
3 14
4 30
5 62
Security Level Maximum Limit of E + 2(M)*
6 126
7 254
8 510
* E = Erasures (Undecodable); M = Misdecodes
Calculating error correction/detection codewords is based on all data codewords except for the row indicators.
To calculate these codewords, we must first define the following:
n = number of codewords
s = user selectable security level
k = number of error detection/correction codewords or 2s+1
gk(x) = generator polynomial (x - 3)(x-32) ... (x-3k), where x is the unknown variable
ak = coefficient of powers of x taken from the generator polynomial gk(x)
ck = error detection/correction codeword* c0 .. ck-1
dn = data codeword d0 .. dn-1
t1, t2, t3 = temporary variables
* Notations c0 and c1 are used for error detection, while c2 to ck-1 are used for error correction, where k varies according to security level (s).
The generator polynomial coefficients are calculated as follows:
First, expand the equation into powers of x:
gk(x) = (x - 3)(x-32) ... (x-3k)
= ak0 + ak1x + ... + ak-1x k-1 + xk
Next, calculate the complement of the coefficients from above:
For ai = a0 ... ak-1
BEGIN
ai = ai mod 929
END
Example:
Calculate generator polynomial coefficients for security level 1.
s = 1 security level 1
k = 2s+1 = 4 number of error detection/correction codewords
g4(x) = (x-3) (x-32) (x-33) (x-34)
= 59049 - 29160x + 3510x2 - 120x3 + x4
a0 = 59049 mod 929 = 522
a1 = -29160 mod 929 = 568
a2 = 3510 mod 929 = 723
a3 = -120 mod 929 = 809
Note: Error Detection/Correction Coefficients contains all of the coefficient tables necessary to encode a PDF417 label with any security level.
The error detection/correction codewords are calculated as follows:
Initialize error detection/correction codewords c0, ..., ck-1 to 0.
Initialize temporary variables t1,t2, and t3 to 0.
For each codeword di = dn-1 ... d0,
BEGIN
t1 = (di + ck-1) mod 929
For each error detection/correction codeword cj = ck-1 ... c1,
BEGIN
t2 = (t1 * aj) mod 929
t3 = 929 - t2
cj = (cj-1 + t3) mod 929
END
t2 = (t1 * a0) mod 929
t3 = 929 - t2
c0 = t3 mod 929
END
Calculate the complement for each error detection/correction
codeword cj = c0 ... ck-1
BEGIN
if cj not 0
cj = 929 - cj
END
Example:
The string PDF417 represented as codewords is: 453,178,121,239
Next, precede the codewords by the symbol length descriptor, which is 5.
5,453,178,121,239
Then,
n = 5 number of codewords (including symbol length descriptor)
d4 = 5
d3 = 453
d2 = 178
d1 = 121
d0 = 239
Let's use a security level of 1. So,
s = 1
k = 2s+1 = 4
a0, ... ,a3 = 522,568,723,809
The calculations are:
Initialize c0, ... ,c3 to 0
t1 = (d4 + c3) mod 929 = (5 + 0) mod 929 = 5
t2 = (t1 * a3) mod 929 = (5 * 809) mod 929 = 329
t3 = 929 - t2 = 929 - 329 = 600
c3 = (c2 + t3) mod 929 = (0 + 600) mod 929 = 600
t2 = (t1 * a2) mod 929 = (5 * 723) mod 929 = 828
t3 = 929 - t2 = 929 - 828 = 101
c2 = (c1 + t3) mod 929 = (0 + 101) mod 929 = 101
t2 = (t1 * a1) mod 929 = (5 * 568) mod 929 = 53
t3 = 929 - t2 = 929 - 53 = 876
c1 = (c0 + t3) mod 929 = (0 + 876) mod 929 = 876
t2 = (t1 * a0) mod 929 = (5 * 522) mod 929 = 752
t3 = 929 - t2 = 929 - 752 = 177
c0 = t3 mod 929 = 177 mod 929 = 177
t1 = (d3 + c3) mod 929 = (453 + 600) mod 929 = 124
t2 = (t1 * a3) mod 929 = (124 * 809) mod 929 = 913
t3 = 929 - t2 = 929 - 913 = 16
c3 = (c2 + t3) mod 929 = (101 + 16) mod 929 = 117
t2 = (t1 * a2) mod 929 = (124 * 723) mod 929 = 468
t3 = 929 - t2 = 929 - 468 = 461
c2 = (c1 + t3) mod 929 = (876 + 461) mod 929 = 408
t2 = (t1 * a1) mod 929 = (124 * 568) mod 929 = 757
t3 = 929 - t2 = 929 - 757 = 172
c1 = (c0 + t3) mod 929 = (177 + 172) mod 929 = 349
t2 = (t1 * a0) mod 929 = (124 * 522) mod 929 = 627
t3 = 929 - t2 = 929 - 627 = 302
c0 = t3 mod 929 = 302 mod 929 = 302
t1 = (d2 + c3) mod 929 = (178 + 117) mod 929 = 295
t2 = (t1 * a3) mod 929 = (295 * 809) mod 929 = 831
t3 = 929 - t2 = 929 - 831 = 98
c3 = (c2 + t3) mod 929 = (408 + 98) mod 929 = 506
t2 = (t1 * a2) mod 929 = (295 * 723) mod 929 = 544
t3 = 929 - t2 = 929 - 544 = 385
c2 = (c1 + t3) mod 929 = (349 + 385) mod 929 = 734
t2 = (t1 * a1) mod 929 = (295 * 568) mod 929 = 340
t3 = 929 - t2 = 929 - 340 = 589
c1 = (c0 + t3) mod 929 = (302 + 589) mod 929 = 891
t2 = (t1 * a0) mod 929 = (295 * 522) mod 929 = 705
t3 = 929 - t2 = 929 - 705 = 224
c0 = t3 mod 929 = 224 mod 929 = 224
t1 = (d1 + c3) mod 929 = (121 + 506) mod 929 = 627
t2 = (t1 * a3) mod 929 = (627 * 809) mod 929 = 9
t3 = 929 - t2 = 929 - 9 = 920
c3 = (c2 + t3) mod 929 = (734 + 920) mod 929 = 725
t2 = (t1 * a2) mod 929 = (627 * 723) mod 929 = 898
t3 = 929 - t2 = 929 - 898 = 31
c2 = (c1 + t3) mod 929 = (891 + 31) mod 929 = 922
t2 = (t1 * a1) mod 929 = (627 * 568) mod 929 = 329
t3 = 929 - t2 = 929 - 329 = 600
c1 = (c0 + t3) mod 929 = (224 + 600) mod 929 = 824
t2 = (t1 * a0) mod 929 = (627 * 522) mod 929 = 286
t3 = 929 - t2 = 929 - 286 = 643
c0 = t3 mod 929 = 643 mod 929 = 643
t1 = (d0 + c3) mod 929 = (239 + 725) mod 929 = 35
t2 = (t1 * a3) mod 929 = (35 * 809) mod 929 = 445
t3 = 929 - t2 = 929 - 445 = 484
c3 = (c2 + t3) mod 929 = (922 + 484) mod 929 = 477
t2 = (t1 * a2) mod 929 = (35 * 723) mod 929 = 222
t3 = 929 - t2 = 929 - 222 = 707
c2 = (c1 + t3) mod 929 = (824 + 707) mod 929 = 602
t2 = (t1 * a1) mod 929 = (35 * 568) mod 929 = 371
t3 = 929 - t2 = 929 - 371 = 558
c1 = (c0 + t3) mod 929 = (643 + 558) mod 929 = 272
t2 = (t1 * a0) mod 929 = (35 * 522) mod 929 = 619
t3 = 929 - t2 = 929 - 619 = 310
c0 = t3 mod 929 = 310 mod 929 = 310
Finally calculate the complement of the results from above, to get the 4 error detection/correction codewords for the symbol PDF417 as follows:
c3 = 929 - c3 = 929 - 477 = 452
c2 = 929 - c2 = 929 - 602 = 327
c1 = 929 - c1 = 929 - 272 = 657
c0 = 929 - c0 = 929 - 310 = 619
Coefficient Tables for PDF417 Error Correction CalculationCoefficient Table for Security Level 0
Coefficient Table for Security Level 1
Coefficient Table for Security Level 2
Coefficient Table for Security Level 3
Coefficient Table for Security Level 4
Coefficient Table for Security Level 5
Coefficient Table for Security Level 6
Coefficient Table for Security Level 7
Coefficient Table for Security Level 8
Coefficient Table for Security Level 0
27 917
Coefficient Table for Security Level 1
522 568 723 809
Coefficient Table for Security Level 2
237 308 436 284 646 653 428 379
Coefficient Table for Security Level 3
274 562 232 755 599 524 801 132 295 116 442 428 295 42 176 65
Coefficient Table for Security Level 4
361 575 922 525 176 586 640 321 536 742 677 742 687 284 193 517
273 494 263 147 593 800 571 320 803 133 231 390 685 330 63 410
Coefficient Table for Security Level 5
539 422 6 93 862 771 453 106 610 287 107 505 733 877 381 612
723 476 462 172 430 609 858 822 543 376 511 400 672 762 283 184
440 35 519 31 460 594 225 535 517 352 605 158 651 201 488 502
648 733 717 83 404 97 280 771 840 629 4 381 843 623 264 543
Coefficient Table for Security Level 6
521 310 864 547 858 580 296 379 53 779 897 444 400 925 749 415
822 93 217 208 928 244 583 620 246 148 447 631 292 908 490 704
516 258 457 907 594 723 674 292 272 96 684 432 686 606 860 569
193 219 129 186 236 287 192 775 278 173 40 379 712 463 646 776
171 491 297 763 156 732 95 270 447 90 507 48 228 821 808 898
784 663 627 378 382 262 380 602 754 336 89 614 87 432 670 616
157 374 242 726 600 269 375 898 845 454 354 130 814 587 804 34
211 330 539 297 827 865 37 517 834 315 550 86 801 4 108 539
Coefficient Table for Security Level 7
524 894 75 766 882 857 74 204 82 586 708 250 905 786 138 720
858 194 311 913 275 190 375 850 438 733 194 280 201 280 828 757
710 814 919 89 68 569 11 204 796 605 540 913 801 700 799 137
439 418 592 668 353 859 370 694 325 240 216 257 284 549 209 884
315 70 329 793 490 274 877 162 749 812 684 461 334 376 849 521
307 291 803 712 19 358 399 908 103 511 51 8 517 225 289 470
637 731 66 255 917 269 463 830 730 433 848 585 136 538 906 90
2 290 743 199 655 903 329 49 802 580 355 588 188 462 10 134
628 320 479 130 739 71 263 318 374 601 192 605 142 673 687 234
722 384 177 752 607 640 455 193 689 707 805 641 48 60 732 621
895 544 261 852 655 309 697 755 756 60 231 773 434 421 726 528
503 118 49 795 32 144 500 238 836 394 280 566 319 9 647 550
73 914 342 126 32 681 331 792 620 60 609 441 180 791 893 754
605 383 228 749 760 213 54 297 134 54 834 299 922 191 910 532
609 829 189 20 167 29 872 449 83 402 41 656 505 579 481 173
404 251 688 95 497 555 642 543 307 159 924 558 648 55 497 10
Coefficient Table for Security Level 8
352 77 373 504 35 599 428 207 409 574 118 498 285 380 350 492
197 265 920 155 914 299 229 643 294 871 306 88 87 193 352 781
846 75 327 520 435 543 203 666 249 346 781 621 640 268 794 534
539 781 408 390 644 102 476 499 290 632 545 37 858 916 552 41
542 289 122 272 383 800 485 98 752 472 761 107 784 860 658 741
290 204 681 407 855 85 99 62 482 180 20 297 451 593 913 142
808 684 287 536 561 76 653 899 729 567 744 390 513 192 516 258
240 518 794 395 768 848 51 610 384 168 190 826 328 596 786 303
570 381 415 641 156 237 151 429 531 207 676 710 89 168 304 402
40 708 575 162 864 229 65 861 841 512 164 477 221 92 358 785
288 357 850 836 827 736 707 94 8 494 114 521 2 499 851 543
152 729 771 95 248 361 578 323 856 797 289 51 684 466 533 820
669 45 902 452 167 342 244 173 35 463 651 51 699 591 452 578
37 124 298 332 552 43 427 119 662 777 475 850 764 364 578 911
283 711 472 420 245 288 594 394 511 327 589 777 699 688 43 408
842 383 721 521 560 644 714 559 62 145 873 663 713 159 672 729
624 59 193 417 158 209 563 564 343 693 109 608 563 365 181 772
677 310 248 353 708 410 579 870 617 841 632 860 289 536 35 777
618 586 424 833 77 597 346 269 757 632 695 751 331 247 184 45
787 680 18 66 407 369 54 492 228 613 830 922 437 519 644 905
789 420 305 441 207 300 892 827 141 537 381 662 513 56 252 341
242 797 838 837 720 224 307 631 61 87 560 310 756 665 397 808
851 309 473 795 378 31 647 915 459 806 590 731 425 216 548 249
321 881 699 535 673 782 210 815 905 303 843 922 281 73 469 791
660 162 498 308 155 422 907 817 187 62 16 425 535 336 286 437
375 273 610 296 183 923 116 667 751 353 62 366 691 379 687 842
37 357 720 742 330 5 39 923 311 424 242 749 321 54 669 316
342 299 534 105 667 488 640 672 576 540 316 486 721 610 46 656
447 171 616 464 190 531 297 321 762 752 533 175 134 14 381 433
717 45 111 20 596 284 736 138 646 411 877 669 141 919 45 780
407 164 332 899 165 726 600 325 498 655 357 752 768 223 849 647
63 310 863 251 366 304 282 738 675 410 389 244 31 121 303 263
Putting It All TogetherLet’s now take all of the data from the other sections, to create an entire PDF417 label.
From the examples, we encoded the string PDF417. The following codewords were calculated:
Data codewords d4 ,..., d0: 5,453,178,121,239
Error Detection/Correction Codewords c3 ,..., c0: 452,327,657,619
Now we need to decide what type of aspect ratio is used for our label. Clearly, this is a user’s decision; the printer simply follows what the user specifies.
Since we have 9 codewords, we can arrange the bar code to be 3 rows of 3 codewords, or 9 rows of 1 codeword. If we need some other arrangement, we would have to pad the data codewords with null codewords (unused shift or latch characters at the end of the data).
Let’s make it 3 rows of 3 codewords.
The bar code layout would be:
Start Pattern
Lr Data Data Data Rr
Stop PatternLr Data Data ECC Rr
Lr ECC ECC ECC Rr
Where Lr = Left Row Indicators and Rr = Right Row Indicators and
ECC = Error Correction Codewords
From all of the sample calculations above, take the data, row indicators, and error detection/correction codewords. Substitute into the above layout to get:
Start Pattern
0 5 453 178 2
Stop Pattern5 121 239 452 0
2 327 657 619 5
Now the codewords must be converted to bar-space patterns (low-level encoding).
81111113
31111136 51111251 31312223 51211142 51111152
71131112141111315 41131115 42113231 22163111 51111125
11111246 12211532 32411132 12361121 11111345
The entire set of codewords is divided into three mutually exclusive encodation sets, or clusters. Each cluster encodes the 929 available PDF417 codewords with distinct bar-space patterns so that one cluster cannot be confused with another. A codeword must be encoded according to the cluster used by that row.
Each row uses only one of the three clusters (0, 3, or 6) to encode data, with the same cluster repeating sequentially every third row. Row 0 codewords use cluster 0, row 1 uses cluster 3, and row 2 uses cluster 6, etc. In general, cluster number = ((row number) mod 3) *3.
Look up each value from the appropriate cluster tables, add the start pattern 81111113 to
the left of the left row indicators, and add the stop pattern 711311121 to the right of the right row indicators.
The bar code bar space patterns would then be:
Codewords 5 453 178 121 239 452 327 657 619
Patterns 51111251 31312223 51211142 41131115 42113231 22163111 12211532 32411132 12361121
The resulting bar code itself is:
Definitions
A
B
Basic Channel Model: A standard system for encoding and transmitting bar code data where data message bytes are output from the decoder but no control information about the message is transmitted.. A decoder, complying to this model, operates in Basic Channel Mode
C
Cluster: One of three subsets of ‘PDF417' symbol characters, all of which are mutually exclusive. The symbol characters in a given cluster conform to particular structural rules which are used in decoding the symbology.
Codeword:A single group of bars and spaces (or "elements") representing one or more numbers, letters, or other symbols (i.e., codeword values for the data to be encoded). Each PDF417 codeword contains four bars and four spaces, for a total of 17 module widths. Each codeword starts with a bar and ends with a space.
Compaction Mode:The name given to one of three data compaction algorithms in ‘PDF417': Text, Numeric and Byte Compaction modes. These modes efficiently map 8-bit data bytes into ‘PDF417' codewords.
D
E
Error Correction:Mathematical calculations used to reconstruct undecoded or corrupted portions of a PDF417 symbol.
Error Correction Codeword:A codeword in a symbol which encodes a value derived from the error correction codeword algorithm to enable decode errors to be detected and, depending on the error correction level, to be corrected.
Extended Channel Interpretation:A procedure within some symbologies, including ‘PDF417', to replace the default interpretation with another interpretation in a reliable manner. The interpretation intended prior to producing the symbol can be retrieved after decoding the scanned symbol to recreate the data message in its original format.
Extended Channel Model:A system for encoding and transmitting both data message bytes and control information about the message. A decoder, complying to this model, operates in Extended Channel Mode. The control information is communicated using Extended Channel Interpretation (ECI) escape sequences.
F
Function Codeword: A codeword in a symbology which initiates a particular operation within the symbology, for example to switch between data encoding sets, to invoke a compaction scheme, to program the reader, to invoke Extended Channel Interpretations.
G
Global Label Identifier: A procedure in the ‘PDF417' symbology, which behaves in a similar manner to Extended Channel Interpretation. The GLI system was the symbology-dependent precursor to the symbology-independent ECI system.
H
I
J
K
L
Latch Character:A latch character causes a mode switch to another mode. That mode stays active until another latch or shift character occurs and activates a new mode.
M
Macro ‘PDF417': A procedure within the ‘PDF417' symbology to logically distribute data from a computer file across a number of related ‘PDF417' symbols. The procedure considerably extends the data capacity beyond that of a single symbol.
Micro PDF417:
Modes:Compaction schemes used to reduce the overall size of the PDF417 symbol.
Mode Latch Codeword: A codeword in ‘PDF417' which is used to switch from one mode to another mode, which stays in effect until another latch or shift codeword is implicitly or explicitly brought into use, or until the end of the label is reached.
Mode Shift Codeword: A codeword in ‘PDF417' that is used to switch from one mode to another for only one codeword.
Module:The nominal unit of measure that is the narrowest width of a bar or space in a bar code. In a PDF417 bar code, all bars or spaces are multiples (up to six times) of this width.
N
O
P
Q
R
Row:A lateral set of elements made up of a start pattern, codewords, and a stop pattern. Each PDF417 symbol must have at least 3 rows. In each row, between left and right row indicators, there may be from 1 to 30 data codewords. Collectively, among all rows, these codewords form data columns.
Row Indicator Codeword: A ‘PDF417' codeword adjacent to the start or stop character in a row, which encodes information about the structure of the ‘PDF417' symbol in terms of the row identification, total number of rows and columns, and the error correction level.
S
Security Level:The level of error correction used in the PDF417 symbol.
Shift CharacterA shift character causes a mode switch for only one character or codeword, returning immediately afterward to the previous mode.
Start pattern:A unique pattern of light and dark elements that indicates the leftmost part of a bar code label.
Stop pattern:A unique pattern of light and dark elements that indicates the rightmost part of a bar code label.
Symbol Aspect Ratio:The ratio of the height and width of a PDF417 symbol.
Symbol Length Descriptor: The codeword in a ‘PDF417' symbol that encodes the total number of data codewords in the symbol. The symbol length descriptor is always the first codeword in a ‘PDF417' symbol.
T
U
V
W
X
X-dimension:The width of the narrowest bar or space in a bar code symbol.
X-sequence: The sequence that represents the module widths of the elements of a symbol character.
Y
Y-dimension:The height of a row in PDF417.