ISE HOT Data User Guide
International Securities Exchange (ISE)
Historical Option Tick (HOT) Data User Guide
International Securities Exchange
60 Broad Street
New York, NY 10004
June 21, 2011 – v5
Contents
1. Product Definition
2. OPRA Background and Overview of Data Distribution
3. Tick Collection Process
4. Access to Daily Files
5. Access to Historical Files
6. Data Maintenance
7. ISE Support – 877 473-9989
8. Unplanned Interruptions of Service
9. History of Changes to OPRA Data and the ISE Options Data set
10. OPRA Data Distribution
11. OPRA Data Formats
11.1 ASCII OPRA Format – Beginning April 18, 2008
11.2 OPRA FAST Encoding
11.3 OPRA FAST 2.0 Encoding
12. Option Symbology and Mapping from Underlying to Option
12.1 Option Root Symbol
12.2 Option Expiration Month Code
12.3 Expiration Day of Month
12.4 Strike Price Code
12.5 Explicit Strike Price
13. Options Symbology Initiative (OSI)
14. Comparison of Pre and Post OSI Symbols
15. Processing the OPRA Tick Data
15.1 Hot Data Daily Retransmission Files
16. Steps for Processing Historical OPRA Tick Data
16.1 User Name and Password
16.2 Download
16.3 Integrity Check
16.4 OPRA Data Formats
16.5 Decoding OPRA ASCII FAST – As of April 24, 2008
16.6 Processing OPRA ASCII
17. Trouble Shooting FTP Access Queries
18. Frequently Asked Questions
Appendix A: Ensuring the integrity of data files
Appendix B: Working with the OPRA FAST Decoder for Unix/Linux Users
Appendix C - Working with the OPRA FAST Decoder for Windows Users
This document was produced specifically for HOT Data subscribers to provide
background on the data set and how to make use of the files. The authors are:
Richard Holowczak, Ph.D.
Associate Professor Computer Information Systems
Baruch College, City University of New York
Jeff Soule
Former Head of Market Data
International Securities Exchange
1. Product Definition
The ISE HOT Data set provides historical Options Price Reporting Authority (OPRA) data, which
includes trades and quotes for all US-listed equity, index and ETF options from June 1, 2005 to the
present. The offering consists of two separate products, each delivered as daily files: an end-of-day
summary file, and a file with the full day of tick data for each option series.
The full tick data file for each option series includes all trades and quotes from all option exchanges
and the OPRA-designated National Best Bid and Offer (NBBO). The end-of-day summary file is a
much smaller subset of data and includes summary details for each series, such as price ranges and
volume for the day from each exchange. For a complete list of available data fields, please refer to
Section 7, Field Descriptions, of the OPRA Data Recipient Interface Specification at
www.opradata.com/specs/data_recipient_interface.pdf.
2. OPRA Background and Overview of Data Distribution
The US market for listed equity, ETF and index options consists of nine exchanges as of January
2011. Each exchange manages quotes and orders from its market makers, brokers and other
members. Each exchange is responsible for computing its best bid and offer (BBO) in real time and
sending that data to the Securities Industry Automation Corp (SIAC), where the data are merged to
form what is commonly called the Options Price Reporting Authority (OPRA) Feed. SIAC also tags the
bid and ask prices of each series with a code indicating whether the price is currently the national best bid or
offer (NBBO). The logical flow of the market data is shown in Figure 1.
Figure 1 Logical flow of market data from exchanges to OPRA to redistributors
3. Tick Collection Process
The ISE, one of the largest equity options exchanges in the world, is responsible for the creation of
the historical files that are made available with this offering. The ISE has two fully redundant data
centers with separate and distinct connections to the real-time OPRA feed.
The ISE has a number of servers that listen to the OPRA real-time feed and capture the real-time
data across the redundant infrastructures. At the end of every day the ISE compresses the OPRA
files and uploads them to separate FTP servers at each data center. The tick collection process runs
on all weekdays except for US scheduled exchange holidays.
While the tick collection process is running each day, there is a separate process that monitors the
OPRA sequence numbers and identifies any gaps or missing sequence numbers. At the end of every
day another process runs that summarizes the gaps and then makes a request to SIAC for the
missing messages which are sent back to the ISE as retransmission files.
The sizes of the daily files vary, so it is not possible to give an exact time each day when the
data will be available for downloading. However, based on the average size of files for the previous 12
months, the ISE will commit to having the files available for download by 9:00pm EST for
subscribers that are existing OPRA redistributors. For subscribers that are not OPRA redistributors,
the files are available the next trading day after the market opens. This level of service is subject
to adjustment based on an annual review.
4. Access to Daily Files
Although there are separate FTP servers at each data center, subscribers need only a single
address to access the FTP servers at either data center to download the data. The ISE will maintain
up to five days of OPRA data on the servers. Subscribers can access the FTP servers 24x7x365,
except during the hours of scheduled maintenance or in the event of catastrophic failure. Fail-over
between sites would be transparent for the standard FTP process.
The ISE will provide a notification period of at least 60 days for major changes that will affect the
subscriber's ability to access the data under normal circumstances, e.g. a change to the URL used to
access the FTP servers. For any unplanned outages that are critical for the tick collection process, the
notification period may be less than 60 days.
5. Access to Historical Files
All historical data, other than the most recent five days which are on the FTP servers, will be
delivered to subscribers on portable external hard drives, which hold a terabyte or more of data.
Subscribers can choose a specific calendar period for a limited set of history or purchase the full
history. The full set of history will be shipped on the portable hard drives within two weeks of receipt
of the order form. Subsets of the complete data set will be delivered in less than two weeks; for
example, up to two months of data will be shipped in three business days. Should a hard drive fail within 60
business days of shipment, the ISE will process an order to replace the data onto another hard drive
upon receipt of the failed drive. Failed drives must be sent to:
Geralyn Endo / (212)897-8171 / [email protected]
International Securities Exchange
60 Broad Street
New York, NY 10004
6. Data Maintenance
Although the ISE creates the daily files, it has no responsibility or authority to edit or change the
data format. Changes and decisions to change the feed/file structures, including any additions,
deletions or changes to the interpretation of the data fields, application level protocols, and/or
documentation for this data are determined and controlled by OPRA. Hence this offering provides
standard native OPRA formatted data as a compressed file.
The subscribers are responsible for monitoring changes on OPRA's Web site at:
http://www.opradata.com/specs/data_recip.jsp. The ISE will make every effort to notify subscribers
of the changes when they are announced by OPRA.
7. ISE Support – 877 473-9989
The ISE provides support between the hours of 8:00am to 6:00pm EST on Mondays through Fridays,
with the exception of US exchange holidays. The contact number for the ISE's support desk is 877
473-9989 and the initial contact person will be Nick Piccirillo. In the event that an escalation is
required the contact person will be Dan Amar. In addition to calling, an email may be sent to
[email protected] and [email protected].
The calls will be directed to the appropriate department based on the nature of the call. The ISE will
provide support for inquiries on the following topics:
Access to Data
Data Content
Processing the Data
8. Unplanned Interruptions of Service
In the event of an unplanned interruption that materially or negatively impacts service, the ISE will
notify subscribers via email of the event and updates as they become available.
9. History of Changes to OPRA Data and the ISE Options Data set
Starting in June of 2005, the ISE began capturing and storing the real-time multicast OPRA feed.
This capture is the full OPRA feed with all quotes and trades from all participating exchanges
including the OPRA flagged NBBO. The data is captured as the raw OPRA stream and is not
massaged or reformatted prior to April 18, 2008. When OPRA switched to the OPRA FAST format on
April 18, 2008 a slight modification to the data was required (see section 16.5 of this guide for
details).
The following is a list of major OPRA changes since the initiation of this data set:
June 1, 2005 ISE data set begins with 8 transmission lines of OPRA ASCII data
April 5, 2006 ISE data set moves to 24 transmission lines of OPRA ASCII data
January 22, 2007 OPRA reassigns option root symbols across 24 transmission lines
March 5, 2007 OPRA reassigns option root symbols across 24 transmission lines
September 15, 2007 OPRA reassigns option root symbols across 24 transmission lines
April 18, 2008 Beginning this day the ISE data switched over to OPRA FAST data format
August 25, 2008 OPRA reassigns option root symbols across 24 transmission lines
November 24, 2008 ISE data set moved to OPRA FAST 2.0 data format (see section 11.3 for list of enhancements)
October 5, 2009 OPRA reassigns option root symbols across 24 transmission lines
February 12, 2010 Market participants must be prepared to utilize OSI compliant data elements
August 23, 2010 OPRA reassigns option symbols across 24 transmission lines
April 1, 2011 OPRA adds two new codes for category k quote messages
May 2, 2011 OPRA increases traffic distribution to 48 transmission lines
May 2, 2011 OPRA reassigns option symbols across 48 transmission lines
July 25, 2011 OPRA will employ a new symbol distribution with 6 characters
Additional events and changes are posted on the OPRA web site:
http://www.opradata.com/specs/data_recip.jsp
10. OPRA Data Distribution
Prior to May 2011, OPRA data was distributed across 8 or 24 transmission lines. The ISE data set
begins on June 1, 2005 and initially used 8 files to store the OPRA transmission lines 2 through 9 for equity,
ETF and index options (line 1 was used for foreign currency options and was not captured or
included in the data set due to minimal interest in this data). Starting with the April 5, 2006 trading
day, OPRA began disseminating data over 24 transmission lines so the ISE began to generate 24
files to capture all lines from OPRA. Beginning on May 2, 2011 OPRA began disseminating data over
48 transmission lines so ISE began to generate 48 files to capture all lines from OPRA.
The most recent OPRA Data Recipient Interface document produced by OPRA, which is available at
http://www.opradata.com/specs/data_recipient_interface.pdf, outlines the distribution of data across
the transmission lines according to the first letter or letters of the option root symbol. For example,
the Data Recipient Interface Specification Version 1.5 dated March 31, 2005 contains the following
table in Appendix B OPRA Traffic Distribution:
OPRA updates the traffic distribution approximately once or twice a year in order to balance the load
of traffic across the 8 transmission lines. When a rebalance is done the traffic across all lines will be
approximately the same.
The ISE data set for trading days in 2005 and up to April 4, 2006 has 8 files that would be named as
follows:
Symbol Distribution Line Routing File Name
H, I, O, R L2 feedcapture.224.0.2.227_53578
A, S L3 feedcapture.224.0.2.228_53580
B, F, L, N L4 feedcapture.224.0.2.229_53582
M, W L5 feedcapture.224.0.2.230_53584
C, E, J, P L6 feedcapture.224.0.2.231_53586
G, Q, T, Z L7 feedcapture.224.0.2.232_53588
D, X, Y L8 feedcapture.224.0.2.233_53590
K, U, V L9 feedcapture.224.0.2.234_53592
So for example, Microsoft option series using option root code MQF would be found on line 5 in file
feedcapture.224.0.2.230_53584.
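The lookup described above can be sketched in a few lines of Python. This is only an illustration built from the table for June 1, 2005 through April 4, 2006; the function name locate_root_8_lines and the table variable are hypothetical and not part of any ISE or OPRA tool.

```python
# Routing table for the 8-line period (June 1, 2005 to April 4, 2006),
# transcribed from the table above: (first letters, OPRA line, capture file).
EIGHT_LINE_TABLE = [
    ("HIOR", 2, "feedcapture.224.0.2.227_53578"),
    ("AS", 3, "feedcapture.224.0.2.228_53580"),
    ("BFLN", 4, "feedcapture.224.0.2.229_53582"),
    ("MW", 5, "feedcapture.224.0.2.230_53584"),
    ("CEJP", 6, "feedcapture.224.0.2.231_53586"),
    ("GQTZ", 7, "feedcapture.224.0.2.232_53588"),
    ("DXY", 8, "feedcapture.224.0.2.233_53590"),
    ("KUV", 9, "feedcapture.224.0.2.234_53592"),
]

# Expand to one entry per first letter for direct lookup.
LINE_BY_FIRST_LETTER = {
    letter: (line, file_name)
    for letters, line, file_name in EIGHT_LINE_TABLE
    for letter in letters
}


def locate_root_8_lines(option_root):
    """Return (OPRA line, capture file name) for an option root symbol."""
    first_letter = option_root[:1].upper()
    if first_letter not in LINE_BY_FIRST_LETTER:
        raise ValueError(f"no line assigned for root {option_root!r}")
    return LINE_BY_FIRST_LETTER[first_letter]


line, file_name = locate_root_8_lines("MQF")
print(line, file_name)
```

Because OPRA rebalances the distribution from time to time, a table like this is only valid for the date range it was taken from.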
Starting with trading day April 5, 2006, the ISE data set began using 24 data transmission lines. The
OPRA Data Recipient Interface Specification Version 1.7 dated March 29, 2006 contains the following
table in Appendix B OPRA Traffic Distribution.
The ISE data set for trading days from April 5, 2006 until the move to 48 lines on May 2, 2011 has
24 files named as follows:
Symbol Distribution Line Routing File Name
A 1 feedcapture.233.43.202.1_11101
B 2 feedcapture.233.43.202.2_11102
C 3 feedcapture.233.43.202.3_11103
D 4 feedcapture.233.43.202.4_11104
E 5 feedcapture.233.43.202.5_11105
F 6 feedcapture.233.43.202.6_11106
G 7 feedcapture.233.43.202.7_11107
H 8 feedcapture.233.43.202.8_11108
I 9 feedcapture.233.43.202.9_11109
J, Y 10 feedcapture.233.43.202.10_11110
K 11 feedcapture.233.43.202.11_11111
L 12 feedcapture.233.43.202.12_11112
M 13 feedcapture.233.43.202.13_11113
N 14 feedcapture.233.43.202.14_11114
O 15 feedcapture.233.43.202.15_11115
P 16 feedcapture.233.43.202.16_11116
Q 17 feedcapture.233.43.202.17_11117
R 18 feedcapture.233.43.202.18_11118
S 19 feedcapture.233.43.202.19_11119
T, Z 20 feedcapture.233.43.202.20_11120
U 21 feedcapture.233.43.202.21_11121
V 22 feedcapture.233.43.202.22_11122
W 23 feedcapture.233.43.202.23_11123
X 24 feedcapture.233.43.202.24_11124
Now the Microsoft option series using option root code MQF would be found on line 13 in file
feedcapture.233.43.202.13_11113.
Starting with trading day May 2, 2011, the ISE data set began using 48 data transmission lines. The
OPRA Data Recipient Interface Specification Version 1.19 dated May 27, 2011 contains the following
table in Appendix B OPRA Traffic Distribution; the file names have been added to this table. Each row
gives the first and last symbol of the range carried on that channel.
OPRA Channel Symbol Distribution as of May 2, 2011 File Name
1 A ADMZZ feedcapture.233.43.202.001_11101
2 ADN ALLZZ feedcapture.233.43.202.002_11102
3 ALM APAZZ feedcapture.233.43.202.003_11103
4 APB AZZZZ feedcapture.233.43.202.004_11104
5 B BGZZZ feedcapture.233.43.202.005_11105
6 BH BRCAA feedcapture.233.43.202.006_11106
7 BRD CCKZZ feedcapture.233.43.202.007_11107
8 CCL CMAZZ feedcapture.233.43.202.008_11108
9 CMB CORZZ feedcapture.233.43.202.009_11109
10 COS CVSZZ feedcapture.233.43.202.010_11110
11 CVT DHZZZ feedcapture.233.43.202.011_11111
12 DI DOAZZ feedcapture.233.43.202.012_11112
13 DOB EEMZZ feedcapture.233.43.202.013_11113
14 EEN ESMZZ feedcapture.233.43.202.014_11114
15 ESN FASZZ feedcapture.233.43.202.015_11115
16 FAT FSZZZ feedcapture.233.43.202.016_11116
17 FT GIKZZ feedcapture.233.43.202.017_11117
18 GIL GPZZZ feedcapture.233.43.202.018_11118
19 GQ HNZZZ feedcapture.233.43.202.019_11119
20 HO ICZZZ feedcapture.233.43.202.020_11120
21 ID IVZZZ feedcapture.233.43.202.021_11121
22 IW IYSZZ feedcapture.233.43.202.022_11122
23 IYT JZZZZ feedcapture.233.43.202.023_11123
24 K LLZZZ feedcapture.233.43.202.024_11124
25 LM MCDZZ feedcapture.233.43.202.129_16101
26 MCE MMMZZ feedcapture.233.43.202.130_16102
27 MMN MSZZZ feedcapture.233.43.202.131_16103
28 MT NDXZZ feedcapture.233.43.202.132_16104
29 NDY NVKZZ feedcapture.233.43.202.133_16105
30 NVL PABZZ feedcapture.233.43.202.134_16106
31 PAC PIZZZ feedcapture.233.43.202.135_16107
32 PJ PXBZZ feedcapture.233.43.202.136_16108
33 PXC QQQZZ feedcapture.233.43.202.137_16109
34 QQR RRBZZ feedcapture.233.43.202.138_16110
35 RRC SBUZZ feedcapture.233.43.202.139_16111
36 SBV SKMZZ feedcapture.233.43.202.140_16112
37 SKN SPXZZ feedcapture.233.43.202.141_16113
38 SPY SPYZZ feedcapture.233.43.202.142_16114
39 SPZ SWJZZ feedcapture.233.43.202.143_16115
40 SWK TISZZ feedcapture.233.43.202.144_16116
41 TIT TVZZZ feedcapture.233.43.202.145_16117
42 TW UPKZZ feedcapture.233.43.202.146_16118
43 UPL UYLZZ feedcapture.233.43.202.147_16119
44 UYM VYZZZ feedcapture.233.43.202.148_16120
45 VZ WLSZZ feedcapture.233.43.202.149_16121
46 WLT XHZZZ feedcapture.233.43.202.150_16122
47 XI XLZZZ feedcapture.233.43.202.151_16123
48 XM ZZZZZ feedcapture.233.43.202.152_16124
Now the Microsoft option series using option root code MSFT (post OSI) would be found on line 27 in
file feedcapture.233.43.202.131_16103.
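For the 48-channel distribution, each channel covers a range of symbols rather than a set of first letters, so the lookup becomes a search over range boundaries. The following Python sketch assumes plain lexicographic comparison of upper-case symbols against the upper bounds printed in the May 2, 2011 table (reproduced here exactly as printed) and derives the file name from the numbering pattern visible in that table. The function names are hypothetical, and the bounds must be replaced whenever OPRA publishes a new distribution.

```python
from bisect import bisect_left

# Last symbol of each channel's range, channels 1..48, as printed in the
# May 2, 2011 table. The list is in ascending order.
UPPER_BOUNDS = [
    "ADMZZ", "ALLZZ", "APAZZ", "AZZZZ", "BGZZZ", "BRCAA", "CCKZZ", "CMAZZ",
    "CORZZ", "CVSZZ", "DHZZZ", "DOAZZ", "EEMZZ", "ESMZZ", "FASZZ", "FSZZZ",
    "GIKZZ", "GPZZZ", "HNZZZ", "ICZZZ", "IVZZZ", "IYSZZ", "JZZZZ", "LLZZZ",
    "MCDZZ", "MMMZZ", "MSZZZ", "NDXZZ", "NVKZZ", "PABZZ", "PIZZZ", "PXBZZ",
    "QQQZZ", "RRBZZ", "SBUZZ", "SKMZZ", "SPXZZ", "SPYZZ", "SWJZZ", "TISZZ",
    "TVZZZ", "UPKZZ", "UYLZZ", "VYZZZ", "WLSZZ", "XHZZZ", "XLZZZ", "ZZZZZ",
]


def channel_for_root(option_root):
    """Return the OPRA channel (1..48) whose symbol range contains the root."""
    # The first upper bound that is >= the symbol identifies the channel.
    index = bisect_left(UPPER_BOUNDS, option_root.upper())
    if index == len(UPPER_BOUNDS):
        raise ValueError(f"{option_root!r} sorts after the last range")
    return index + 1


def capture_file_for_channel(channel):
    """Build the capture file name following the pattern in the table."""
    if 1 <= channel <= 24:
        return f"feedcapture.233.43.202.{channel:03d}_{11100 + channel}"
    if 25 <= channel <= 48:
        offset = channel - 24
        return f"feedcapture.233.43.202.{128 + offset}_{16100 + offset}"
    raise ValueError(f"channel out of range: {channel}")


channel = channel_for_root("MSFT")
print(channel, capture_file_for_channel(channel))
```

A binary search over the upper bounds keeps the lookup short and avoids storing both ends of every range; the trade-off is that it trusts the ranges to be contiguous and sorted.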
OPRA continues to update the traffic distribution approximately once or twice a year in order to
balance the load of traffic across the 48 transmission lines. Please visit the
www.opradata.com/specs/data_recip.jsp Web site for additional Data Recipient Notices that affect
the distribution of the data.
11. OPRA Data Formats
The OPRA data stored in the ISE data files are a copy of the multicast packets transmitted by OPRA.
From its inception in June 2005 until April 17, 2008, the data format is ASCII. Starting with April 18,
2008 the data format is OPRA FAST. From November 24, 2008 the data format is OPRA FAST 2.0.
Each of these formats is described below.
11.1 ASCII OPRA Format – Beginning April 18, 2008
The underlying format of OPRA data is an ASCII stream of transmission data blocks that correspond
to multicast data packets. One or more message blocks can be stored within a transmission block.
The ASCII format for OPRA is documented in the OPRA Data Recipient Interface Specification. The
current release of this document can be found on the OPRA web site (www.opradata.com) under the
menus for Specifications -> SIAC Specifications -> Data Recipient Interface. Currently this
document can also be accessed using this direct link:
http://www.opradata.com/specs/data_recipient_interface.pdf
An ASCII OPRA message consists of a message header followed by message specific data. According
to the Data Recipient Interface Specification, the header consists of:
Field Characters Description
Participant ID 1 A=AMEX, B=BOX, C=CBOE, I=ISE, etc.
Retransmission 1 Typically blank. Message should be ignored if non blank.
Message Identification 2 Message category and Type codes
Message Sequence Number 8 OPRA Sequence number
Time 6 Message Time in format hhmmss (no separators)
This gives the header a total of 18 bytes. Subsequent data formats expanded on the OPRA sequence
number and time fields.
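As an illustration of the 18-byte layout, the Python sketch below slices a header into its fields. It assumes the fields appear in the order listed in the table and that the Message Identification field holds the category character followed by the type character; the sample message is made up for the example, and the authoritative layout is the one in the Data Recipient Interface Specification.

```python
from typing import NamedTuple

HEADER_LENGTH = 18  # 1 + 1 + 2 + 8 + 6 characters


class OpraAsciiHeader(NamedTuple):
    participant_id: str
    retransmission: str
    category: str
    message_type: str
    sequence_number: int
    time_hhmmss: str


def parse_ascii_header(message):
    """Split the fixed-width header off the front of an ASCII OPRA message."""
    if len(message) < HEADER_LENGTH:
        raise ValueError("message is shorter than the 18-byte header")
    return OpraAsciiHeader(
        participant_id=message[0],
        retransmission=message[1],
        category=message[2],
        message_type=message[3],
        sequence_number=int(message[4:12]),
        time_hhmmss=message[12:18],
    )


# Hypothetical header: ISE, not a retransmission, category "k", type "F",
# sequence number 1234, time 09:30:15.
header = parse_ascii_header("I kF00001234093015")
print(header)
```

Message-specific data, if any, would start at offset 18 of the same string.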
The Message Identification field uses two characters: Message Category and Message Type. The
main Message Categories are Administrative, Open Interest, Quote, Trade and End of Day
messages. The Message Type character applies to the given Message Category. For example, if
Message Category is “k” (indicating a quote message) and Message Type is “F”, then we interpret
this message as a non-firm quote.
The Message Sequence Number deserves some additional attention. As discussed in a previous
section, the OPRA data is transmitted across 8, 24 or 48 multicast lines. Within one transmission
line, messages at the start of the day begin with OPRA sequence number 1 and continue to
increment for each message on that multicast line over the course of the day. Message Sequence
Numbers are not unique among multicast lines and the same sequence numbers may appear on
multiple lines. Since each multicast line carries data from different options series, the sequence
numbers used cannot be compared across lines.
Prior to November 24, 2008 the Message Sequence Number had a length of up to eight digits so
when the sequence number reached 99999999, it would “roll over” and start back with 1 again. This
sequence number “roll over” event typically took place in the afternoon. Beginning November 24,
2008 the Message Sequence Number was expanded to ten digits (refer to section 11.3).
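The per-line sequence numbering makes it possible to check a single line's file for missing messages. The sketch below is one way to do this in Python, under the assumptions that messages are examined in file order, that retransmitted messages have already been filtered out, and that a sequence number lower than expected means the eight-digit counter rolled over. The function name find_sequence_gaps is hypothetical.

```python
MAX_SEQUENCE_8_DIGITS = 99_999_999


def find_sequence_gaps(sequence_numbers, max_sequence=MAX_SEQUENCE_8_DIGITS):
    """Return a list of (first_missing, last_missing) ranges for one line.

    A jump from max_sequence back to 1 is treated as the normal roll over,
    not as a gap.
    """
    gaps = []
    previous = None
    for current in sequence_numbers:
        if previous is not None:
            expected = 1 if previous == max_sequence else previous + 1
            if current > expected:
                gaps.append((expected, current - 1))
            elif current < expected:
                # Assumed roll over with messages missing around it.
                gaps.append((expected, max_sequence))
                if current > 1:
                    gaps.append((1, current - 1))
        previous = current
    return gaps


print(find_sequence_gaps([1, 2, 5, 6]))
```

For data from November 24, 2008 onward, the same function could be called with a ten-digit maximum instead of the default.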
Further details of these fields and message-specific fields can be found in the Data Recipient
Interface Specification: http://www.opradata.com/specs/data_recipient_interface.pdf.
11.2 OPRA FAST Encoding
Starting with trading day April 18, 2008, the ASCII OPRA data has been encoded using the OPRA
FAST (FIX Adapted for STreaming) encoder. FAST is an algorithm that is applied to the data to
reduce or compress the size of the messages by approximately 60%. The data files can be stored in
their OPRA FAST encoded format. When OPRA switched to the OPRA FAST format on April 18, 2008
a slight modification to the data was required (see section 16.5 of this guide for details).
11.3 OPRA FAST 2.0 Encoding
Starting with trading day November 24, 2008, a new OPRA FAST format was introduced to provide
better data compression, expand the sizes of several of the fields and to introduce a new field – the
expiration day of month for the option series. This format is often referred to as OPRA FAST 2.0 or
“FAST For Symbology”. The message header has been expanded to the following 25 bytes:
Field Old Size New Size Description
Participant ID 1 1 A=AMEX, B=BOX, C=CBOE, I=ISE, etc.
Retransmission 1 1 Typically blank. Message should be ignored if non blank.
Message Identification 2 2 Message category and Type codes
Message Sequence Number 8 10 OPRA Sequence number
Time with milliseconds 6 9 Message Time in format hhmmssmmm (no separators) where mmm is milliseconds
Expiration Day of Month N/A 2 Expiration day of month introduced in OPRA FAST 2.0
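A minimal Python sketch of reading the expanded 25-byte header follows. It assumes the fields are laid out in the order shown in the table, with the expiration day as the last two characters, and converts the hhmmssmmm time into a datetime.time value. The sample header is invented for the example and the function name is hypothetical.

```python
from datetime import time

FAST2_HEADER_LENGTH = 25  # 1 + 1 + 2 + 10 + 9 + 2 characters


def parse_fast2_header(message):
    """Split a decoded OPRA FAST 2.0 message header into named fields."""
    if len(message) < FAST2_HEADER_LENGTH:
        raise ValueError("message is shorter than the 25-byte header")
    raw_time = message[14:23]  # hhmmssmmm
    return {
        "participant_id": message[0],
        "retransmission": message[1],
        "category": message[2],
        "message_type": message[3],
        "sequence_number": int(message[4:14]),
        "time": time(
            hour=int(raw_time[0:2]),
            minute=int(raw_time[2:4]),
            second=int(raw_time[4:6]),
            microsecond=int(raw_time[6:9]) * 1000,  # milliseconds
        ),
        "expiration_day": message[23:25],
    }


# Hypothetical header: CBOE, category "k", type "F", sequence 12345,
# time 09:30:15.250, expiration day 20.
sample = "C" + " " + "kF" + "0000012345" + "093015250" + "20"
print(parse_fast2_header(sample))
```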
12. Option Symbology and Mapping from Underlying to Option
Consider an “Option Series” to be an identifier of an option on an underlying instrument with specific
expiration date, strike price and right (put or call). Options series are represented in the OPRA data
feed using a combination of:
Option Root Symbol
Expiration Month Code (also determines if series is a Put or Call)
Expiration Day of Month (added starting November 24, 2008)
Strike Price Code
Explicit Strike Price
A description for each of these items follows.
12.1 Option Root Symbol
The OPRA code does not always use the underlying symbol for the option root symbol. Prior to
February 12, 2010, the option root symbol was limited to three characters and many U.S. OTC
symbols have more than three characters. OPRA does not include the underlying ticker symbol in the
OPRA feed itself. However it does include the option root symbol. For a given underlying instrument
there will typically be at least one root symbol for non-LEAP option series and one root symbol for
LEAP options series. Additional root symbols may be added to cover a wider range of strike prices
and/or expiration dates. The additional option root symbols could be introduced during an expiration
month as well as intra-day. If an underlying security experiences a large swing in price, new strike
prices can be added possibly requiring the introduction of a new option root symbol.
For example, on April 3, 2008, Microsoft (ticker MSFT) had 180 option series using four different
option root symbols associated with it:
Symbol Options Series
MQF April, July and October 2008 calls and puts at strike prices $10 through $20
MSQ April, May, July and October 2008 calls and puts at strike prices $22.50 through $50
VMF January 2009 calls and puts at strike prices $15 through $55
WMF January 2010 calls and puts at strike prices $20 through $50
The Options Clearing Corporation maintains the definitive record of the mapping between the
underlying instrument and the option root symbols used for that underlying's options. For most
trading days, prior to the OSI project, an underlying mapping file is also included with the HOT Data
daily files. This file will have a name that incorporates the trading date as follows:
underlyingopracodemap_YYYYMMDD.csv
Where YYYY is the 4 digit year, MM is the two digit month (with leading zero if necessary) and DD is
the two digit day of the month (with leading zero if necessary). The 'underlyingopracodemap' file has
two columns separated by a comma. The first column is the underlying ticker symbol. The second
column is the OPRA root symbol. There is no longer a need for this file post OSI.
There may be situations where OPRA root symbols are not mapped in the 'underlyingopracodemap'
file. This can happen when new option root symbols are introduced intra-day. Generally the symbols
will appear in the subsequent trading day's 'underlyingopracodemap' file.
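The mapping file can be loaded with a few lines of Python. The sketch below assumes the file has no header row and exactly the two columns described above; the sample rows reuse the Microsoft root symbols from the April 3, 2008 example, and the function names are hypothetical. Roots introduced intra-day may be absent, so callers should expect a missing key.

```python
import csv
import io
from datetime import date


def mapping_file_name(trading_day):
    """Build the mapping file name for a trading day."""
    return f"underlyingopracodemap_{trading_day:%Y%m%d}.csv"


def load_root_to_underlying(lines):
    """Read 'underlying,root' rows into a dict keyed by OPRA root symbol."""
    root_to_underlying = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        underlying, root = row[0].strip(), row[1].strip()
        root_to_underlying[root] = underlying
    return root_to_underlying


# In practice this would be open(mapping_file_name(day), newline="").
sample_file = io.StringIO("MSFT,MQF\nMSFT,MSQ\nMSFT,VMF\nMSFT,WMF\n")
roots = load_root_to_underlying(sample_file)

print(mapping_file_name(date(2008, 4, 3)))
print(roots.get("MQF"))   # underlying for a mapped root
print(roots.get("ZZZ"))   # None when a root is not in the day's file
```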
12.2 Option Expiration Month Code
Prior to February 12, 2010, OPRA assigned a code for the expiration month. The expiration month
code indicates the month of expiration for the option series as well as the right, that is,
whether the series is a put or call. The following table summarizes the expiration month codes and
their meanings:
Code Call Options Code Put Options
A JANUARY Call M JANUARY Put
B FEBRUARY Call N FEBRUARY Put
C MARCH Call O MARCH Put
D APRIL Call P APRIL Put
E MAY Call Q MAY Put
F JUNE Call R JUNE Put
G JULY Call S JULY Put
H AUGUST Call T AUGUST Put
I SEPTEMBER Call U SEPTEMBER Put
J OCTOBER Call V OCTOBER Put
K NOVEMBER Call W NOVEMBER Put
L DECEMBER Call X DECEMBER Put
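Because the codes run alphabetically, A through L for calls and M through X for puts, the table can be decoded arithmetically. The following Python sketch shows one way to do so; the function name is hypothetical.

```python
def decode_expiration_month_code(code):
    """Return (month number 1..12, "call" or "put") for a legacy month code."""
    if len(code) != 1 or not "A" <= code <= "X":
        raise ValueError(f"not an expiration month code: {code!r}")
    index = ord(code) - ord("A")           # A=0 ... X=23
    month = index % 12 + 1                 # A and M are January, L and X December
    right = "call" if index < 12 else "put"
    return month, right


print(decode_expiration_month_code("E"))
print(decode_expiration_month_code("Q"))
```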
12.3 Expiration Day of Month
For OPRA data prior to November 24, 2008, expiration date is assumed to be the day after the third
Friday of the expiration month. Starting on November 24, 2008 a new OPRA field was introduced
indicating the exact day of the month on which an option series will expire. The introduction of this
field allows for the transmission of weekly and bi-weekly option series. Initially all values for this
field were “00”. Each of the exchanges began filling in this field with real data at different times. In
general, if the value of this field is 00 then it can be assumed the expiration date is the third Friday
of the month.
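The date rules in this section can be sketched in Python as follows. The helper names are hypothetical. The sketch follows the text literally: legacy data uses the day after the third Friday, while a "00" value in the new field falls back to the third Friday itself.

```python
from datetime import date, timedelta


def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first_day = date(year, month, 1)
    days_to_friday = (4 - first_day.weekday()) % 7  # Monday=0 ... Friday=4
    return first_day + timedelta(days=days_to_friday + 14)


def legacy_expiration(year, month):
    """Expiration assumed for data prior to November 24, 2008."""
    return third_friday(year, month) + timedelta(days=1)


def resolve_expiration(year, month, day_field):
    """Use the Expiration Day of Month field, falling back when it is "00"."""
    if day_field == "00":
        return third_friday(year, month)
    return date(year, month, int(day_field))


print(legacy_expiration(2009, 8))
print(resolve_expiration(2010, 8, "20"))
```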
12.4 Strike Price Code
Prior to February 12, 2010, the Strike Price Code represents the price per share for which the underlying
security may be bought (call option) or sold (put option) by the holder of record upon exercise of the
option. The Data Recipient Interface Specification lists the following strike price codes for whole
number strikes:
The Data Recipient Interface Specification lists the following strike price codes for half strike
increment strikes:
12.5 Explicit Strike Price
The Explicit Strike Price represents the stated price per share for which the underlying security may
be bought (call option) or sold (put option) by the holder of record upon exercise of the option.
13. Options Symbology Initiative (OSI)
The OPRA symbology was developed over 25 years ago and typically used three to five letters to
identify a particular option series. Up to the first three characters identified the option root symbol,
one character represented the contract expiration month as well as whether the series was a put or
a call, and one character represented the option exercise or strike price. For example, IBMER is the
IBM 90 call option, where IBM is the option root symbol, 'E' represents a May call option and 'R'
represents the $90 strike price of the option.
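Since the last two characters of a legacy symbol are always the expiration month code and the strike price code, the symbol can be split from the right. The Python sketch below does only the split; it does not decode the strike price code, because that requires the strike price tables in the Data Recipient Interface Specification. The function name is hypothetical.

```python
def split_legacy_symbol(symbol):
    """Split a legacy OPRA symbol into (root, month code, strike code)."""
    if not 3 <= len(symbol) <= 5:
        raise ValueError(f"expected three to five characters: {symbol!r}")
    root = symbol[:-2]          # one to three characters
    month_code = symbol[-2]     # also encodes put or call
    strike_code = symbol[-1]
    return root, month_code, strike_code


print(split_legacy_symbol("IBMER"))
```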
Today this methodology poses several limitations for the industry, as the options market has evolved
over the years. First, limiting the symbol for the underlying security to three characters creates
inconsistencies with many of the U.S. OTC securities, whose symbols in general have more than three
characters. Second, when options were originally launched in the U.S. they expired the day after the third
Friday of the month. However, there are now flexible and weekly expiration dates, so a single
character for a monthly expiration is no longer effective. Finally, Long-term Equity AnticiPation Securities
(LEAPS®) have never been standardized and generally require a separate option symbol root.
In the summer of 2005, industry representatives began an initiative to develop a plan
to eliminate the use of OPRA codes and come up with a standard to ensure that all option strike
prices are represented in decimal format. The representatives came from exchanges, vendors,
broker dealers and the Options Clearing Corp. A plan was approved on December 5, 2006 and by the
end of January 2007 the record layouts to be used throughout the testing and implementation
phases of the project were approved. Detailed test scripts were designed, approved and published
in September 2008. A testing period was in progress from September 2009 through January 2010.
There was a mandatory cut-over to the new data elements for the record layouts on February 12,
2010 and the roll-out of the new symbology, based on a fixed number of securities in each tranche,
ran from March through May 2010.
The agreed-upon symbology key for options represents the minimum data requirements used in the
transmission of listed option contracts between exchanges, Options Clearing Corp and the
participants. No rules were defined as to how the minimum data requirements had to be displayed, so
the display can vary across different redistributors of OPRA data. An example of the Apple 200
call option that expires August 20, 2010 is as follows:
Symbol  Year  Month  Day  C/P  Strike Price  Price Decimal
AAPL    10    08     20   C    00200         000
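As an illustration of how the fields in the example above can be concatenated into a single compact key, here is a minimal sketch. The function name, the range checks and the choice to concatenate the fields without separators are assumptions made for illustration; as noted above, redistributors are free to display the key differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a compact key from the symbology fields: root, two-digit year,
 * month, day, 'C' or 'P', five-digit whole strike and three-digit strike
 * decimal. Returns 0 on success, -1 if a field is out of range or the
 * buffer is too small. (Hypothetical helper, not part of any OPRA API.)
 */
static int format_option_key(char *buf, size_t cap, const char *root,
                             int yy, int mm, int dd, char put_call,
                             int strike_whole, int strike_decimal)
{
    if (strlen(root) == 0)
        return -1;
    if (yy < 0 || yy > 99 || mm < 1 || mm > 12 || dd < 1 || dd > 31)
        return -1;
    if (put_call != 'C' && put_call != 'P')
        return -1;
    if (strike_whole < 0 || strike_whole > 99999 ||
        strike_decimal < 0 || strike_decimal > 999)
        return -1;

    int n = snprintf(buf, cap, "%s%02d%02d%02d%c%05d%03d",
                     root, yy, mm, dd, put_call, strike_whole, strike_decimal);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling `format_option_key(buf, sizeof buf, "AAPL", 10, 8, 20, 'C', 200, 0)` with the field values from the example fills `buf` with `AAPL100820C00200000`.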
14. Comparison of Pre and Post OSI Symbols
One of the objectives of the OSI was to create option symbols that would be more intuitive and less
complicated for market participants to read. Let's look at an example of the OPRA symbols before
and after the project.
Here is an example of a legacy OPRA symbol:
Apple 200 Call, expiring 08/22/09 – APVHT
Here is an example of the new OPRA symbol on Yahoo! Finance:
Apple 200 Call, expiring 08/20/10 - AAPL100821C00200000
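The new-style symbol shown above can be split back into its fields by working from the right-hand end, since the trailing 15 characters have fixed widths (six date digits, one C/P letter, eight strike digits). The following is a sketch under that assumption; the struct, the function names and the six-character limit on the root are illustrative choices, not taken from this guide.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define KEY_TAIL_LEN 15 /* YYMMDD + C/P + 8 strike digits */
#define MAX_ROOT_LEN 6  /* assumed maximum root length */

struct option_key {
    char root[MAX_ROOT_LEN + 1];
    int  yy, mm, dd;
    char put_call;
    long strike_thousandths; /* 00200000 -> 200000, i.e. 200.000 */
};

/* Convert two ASCII digits to an integer; caller has validated them. */
static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Parse a compact symbol such as AAPL100821C00200000. */
static bool parse_option_key(const char *sym, struct option_key *out)
{
    size_t len = strlen(sym);
    if (len <= KEY_TAIL_LEN || len > KEY_TAIL_LEN + MAX_ROOT_LEN)
        return false;

    size_t root_len = len - KEY_TAIL_LEN;
    const char *tail = sym + root_len;

    /* Position 6 of the tail is the C/P flag; everything else is a digit. */
    for (size_t i = 0; i < KEY_TAIL_LEN; i++) {
        if (i == 6) {
            if (tail[i] != 'C' && tail[i] != 'P')
                return false;
        } else if (!isdigit((unsigned char)tail[i])) {
            return false;
        }
    }

    memcpy(out->root, sym, root_len);
    out->root[root_len] = '\0';
    out->yy = two_digits(tail);
    out->mm = two_digits(tail + 2);
    out->dd = two_digits(tail + 4);
    out->put_call = tail[6];
    out->strike_thousandths = 0;
    for (size_t i = 7; i < KEY_TAIL_LEN; i++)
        out->strike_thousandths = out->strike_thousandths * 10 + (tail[i] - '0');
    return true;
}
```

A legacy symbol such as APVHT is rejected by this parser because it is shorter than the fixed-width tail.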
15. Processing the OPRA Tick Data
The data is delivered in the standard native OPRA format, so once subscribers have access to the
data they will need to understand how to process it. Most subscribers today have experience
in processing end of day data that is delivered as flat files.
This offering has two services: the full OPRA daily tick file and the OPRA end of day (EOD)
summary file. There are three separate directories on the FTP server and subscribers will be
entitled to access the directories that correspond to their subscription. If a subscriber had access to
all three directories they would see the following:
Up to five days of the historical OPRA tick data can be downloaded from an FTP server which is
intuitively labeled and located at ftp://reports.ise.com/. Subscribers will see the following
representation for each day:
As of May 2, 2011, there are 96 separate files for each day, made up of:
48 separate zipped files of the OPRA tick data
48 corresponding md5 hash files that are used for checking download integrity
A view of a subset of the 96 files for OPRA lines 1-12 would be as follows:
15.1 Hot Data Daily Retransmission Files
There is also another directory called 'HotData_Retrans', which is only available for the full tick data
service. If there were any processing interruptions during the day, the ISE will
request a retransmission of the missing messages after the market close. There will be a separate
retransmission file for each of the 48 OPRA lines, whether there were gaps or not. Subscribers
should download and process these files every day.
Each daily folder is intuitively labeled and within each folder there will be 96 separate files made up
of:
48 separate zipped files of the OPRA tick data
48 corresponding md5 hash files that are used for checking download integrity
A view of a subset of the 96 retransmission files for OPRA lines 1-15 would be as follows:
16. Steps for Processing Historical OPRA Tick Data
16.1 User Name and Password
New subscribers will need to execute appropriate paperwork and will then be assigned a user name
and password. The subscriber will then download the appropriate data after logging onto the FTP
server which is located at: ftp://reports.ise.com.
16.2 Download
Once the download is complete, the subscriber will need to uncompress the files. Some of the data
files may be larger than 2 GB, so unzip programs for UNIX or LINUX will need to have 64-bit
internal file pointers to successfully decompress the files. Windows users running WinZip will need
version 9.0 or higher. The built-in Windows decompression routines will not be able to uncompress
large files.
16.3 Integrity Check
These are large files, and the subscriber's connection could be interrupted while a file is
downloading. The ISE produces an MD5 hash file for each OPRA capture file
which can be used to verify the integrity of each file. Subscribers should therefore check the
integrity of each file to be sure they received the same data that was created and loaded onto
the FTP server. This is done by computing an MD5 hash value locally and
comparing it to the MD5 hash value provided by the ISE. If the two MD5 hash values match then no
data was lost during the download process. See Appendix A for steps to perform this check.
16.4 OPRA Data Formats
Once the files are uncompressed they will be in the native OPRA ASCII FAST or OPRA ASCII format
(data prior to April 2008 is OPRA ASCII and subsequent files are in OPRA ASCII FAST). The OPRA
FAST packets must be decoded before processing the OPRA ASCII messages. Fortunately OPRA
provides an off-the-shelf decoder.
16.5 Decoding OPRA ASCII FAST – As of April 24, 2008
To process the data the subscriber must first decode it. OPRA provides a FAST
decoder that will decode the OPRA FAST packets into the native OPRA ASCII messages. The decoder
needs to know how big each packet is, so the ISE affixes a two byte packet size before each packet
of data. The subscriber's application will first read the packet size and then read the packet in
order to decode the file. Once the FAST packets have been decoded, the resulting data can be
processed as described in the OPRA Data Recipient Interface Specification.
Since the ISE affixes the two byte packet size, the subscriber will need to make some minor edits to
the decoder to correctly decode the files provided by the ISE. The instructions and the link to
download the decoder for LINUX/UNIX users are in Appendix B. The instructions and the link to
download the decoder for Windows users are in Appendix C.
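The read-size-then-read-packet loop described in this section can be sketched as follows. This is not the OPRA decoder itself. The function name and return codes are hypothetical, and the two byte size is assumed to be in network (big-endian) byte order, which is what the `ntohs` call in the decoder edits in Appendices B and C implies.

```c
#include <stdio.h>

/*
 * Read one length-prefixed packet from a HOT Data capture file.
 * Assumption: the two byte size prefix is big-endian (network order).
 * Returns 1 when a packet was read, 0 at a clean end of file, and -1 on
 * a truncated file or a packet larger than the caller's buffer.
 */
static int read_next_packet(FILE *fp, unsigned char *buf, size_t cap,
                            size_t *out_len)
{
    unsigned char prefix[2];
    size_t got = fread(prefix, 1, sizeof prefix, fp);

    if (got == 0 && feof(fp))
        return 0;                      /* no more packets */
    if (got != sizeof prefix)
        return -1;                     /* file ends inside the prefix */

    size_t len = ((size_t)prefix[0] << 8) | prefix[1];
    if (len > cap)
        return -1;                     /* caller's buffer is too small */
    if (fread(buf, 1, len, fp) != len)
        return -1;                     /* file ends inside the packet */

    *out_len = len;
    return 1;
}
```

A caller would open the uncompressed file in binary mode, call this function in a loop until it returns 0, and hand each packet to the FAST decoder. A return of -1 suggests an incomplete download, in which case the MD5 check is worth repeating.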
16.6 Processing OPRA ASCII
OPRA data consists of messages (quotes, trades, open interest, etc.) made up of ASCII characters.
The format of each of the OPRA message types is given in the OPRA Data Recipient Interface
Specification. Subscribers will need to write a parser to be able to read the messages and should
refer to the OPRA Data Recipient Interface Specification:
http://www.opradata.com/specs/data_recipient_interface.pdf
17. Trouble Shooting FTP Access Queries
The following is a list of problems that customers sometimes have when trying to access the FTP
server and download the HOT Data files, together with suggested recommendations. ISE will update
this section if a description of the identified problem is sent to Geralyn Endo at gendo.com.
Problem description Suggested recommendation
1) Customer cannot connect to
FTP site
Customer must have an account set up and the FTP client must be configured
for “Passive FTP” mode.
If customer is in “Passive FTP” mode and still cannot connect to
FTP server, customer should contact the ISE's support desk at
877 473-9989. The support desk should make sure the customer
is trying to access ftp://reports.ise.com.
If the ISE can connect but the customer cannot connect to the
FTP server, the customer may have a network or firewall issue
and should seek internal support.
2) Customer cannot log onto
FTP server (gets login /
username prompt but there
is problem with username
and password)
The support desk will then confirm if they can connect to FTP
server using the ISE's username and password.
The support desk will then confirm if they can connect to FTP
server using the subscriber's username and password (see
section 16.1 of this guide).
If support desk can connect then there may be a problem with
the customer's account. The support desk will check the
permissions for the subscriber on ftp://reports.ise.com.
3) Customer can log in to FTP
site but cannot see any files
Customer may be looking in the wrong directory or there are
permission problems preventing customer from accessing the
assigned directories. Customer should contact the support desk.
4) Customer can see files but
cannot download file
Customer may not be issuing the correct ftp 'get' command or may be
using ASCII mode instead of BINARY transfer mode. It may also be
possible that the customer does not have enough disk space.
The support desk may ask Customer to try to connect and
download data from a different FTP site (such as ftp.ucsd.edu
or ftp.columbia.edu both of which provide anonymous access).
Also note that the files on ISE's web site are in mixed upper and
lower case letters.
Note the Windows FTP program will first save to a temp folder
then move/rename the file to the destination folder. This temp
folder is under the Windows profile in “c:\documents and
settings” and there can be problems if the file to be downloaded
is larger than 2GB.
5) Customer can download file
but cannot UNZIP the files or
read the files
A) When a customer cannot unzip a file it is generally due to the
size of the files, which can be larger than 2 GB. Recommend the
following:
For Windows customers:
The UNZIP program should be WinZip 9.0 or later. Ask
customers what version of WinZip they are running.
For Unix or Linux customers:
Unix or Linux users will need to have 64-bit internal file
pointers to successfully decompress the files. Alternatively,
try using commercial WinZip for Linux/Unix or try 7-zip.
http://www.7-zip.org/
B) Customer may be using the ASCII FTP transfer mode instead
of BINARY transfer mode.
C) Another issue associated with UNZIP problems can be that the
download was incomplete (e.g., terminated due to lack of disk
space or network failure, download was done using ASCII
transfer mode, etc.). Customers should compare their local MD5
hash with the MD5 hash file they downloaded to ensure their
download was complete (see Appendix A). If the MD5 hashes do
not match then customer should ensure they have enough disk
space to download files.
6) Customer can download and
unzip file but cannot see any
data
Prior to April 2008 the OPRA data format was ASCII. OPRA data
from April 2008 onward is encoded in OPRA FAST format and the data
files must first be decoded using the FAST decoder. Refer to
“OPRA FAST Encoding” in sections 11.2 and 11.3 of this
guide. Also refer to Appendix B for Unix/Linux users and
Appendix C for Windows users.
7) Customer can download,
unzip and decode FAST data
but still cannot see any data
or data comes out in one
long stream.
OPRA ASCII data has no carriage returns, line feeds or other end of
line characters and therefore cannot easily be displayed or
loaded into a database or spreadsheet. The OPRA ASCII data
must be parsed according to the OPRA Data Recipient Interface
Specification.
8) Customer can download,
unzip, decode OPRA FAST
and parse OPRA ASCII data,
but cannot match up OPRA
root symbols to underlying
messages.
For data prior to the completion of the OPRA Symbology Initiative (refer
to section 13 of this guide), the customer needs to use the
underlyingopracodemap.csv file to map OPRA root symbols to
underlying instruments. Refer to the “Option Root Symbol” part of
section 12.1 of this guide.
9) Customer cannot find any
underlying price data in the
OPRA feed.
OPRA does not contain underlying equity or ETF price data.
There is some underlying index data in the OPRA “Underlying
Value” messages but it is inconsistently used and should not be
relied upon.
18. Frequently Asked Questions
The following list of frequently asked questions (FAQs) addresses the majority of subscribers'
questions. ISE will update this list if frequently asked questions are sent to Geralyn Endo at
gendo.com.
18.1 Does HOT Data contain all OPRA data?
The HOT Data files include all OPRA trades and quotes except for foreign currency options (FCOs),
from all participating OPRA exchanges.
18.2 How far back does your historical tick data go?
The ISE currently provides OPRA tick history from June 1, 2005 to the present.
18.3 What is the actual data content?
The ISE collects the full OPRA A/B broadcast, except for foreign currency options, from
approximately 6:00 a.m. to 5:55 p.m. ET.
18.4 What is the delivery format?
The ISE collects and delivers data in the standard OPRA format. The data is split up over a number
of separate files in alphabetical order. Each file is compressed to reduce delivery bandwidth and
storage requirements. For a complete list of available fields, please refer to Section 7, Field
Descriptions, of the specification at www.opradata.com/specs/data_recipient_interface.pdf.
18.5 What is the size of the daily file?
For all of 2010, the average daily file size was approximately 40 GB compressed, or 75 GB
uncompressed. However, since OPRA began disseminating data over 48 lines in May 2011, the
average daily file size has been approximately 70 GB compressed, or 150 GB uncompressed.
18.6 What is the total size of all the historical data?
The complete set of historical files is in excess of 100 TB uncompressed. The annual totals from
June 2005 through December 2010 are as follows:
2005: 1.5TB compressed, 5.8TB uncompressed (June-December)
2006: 3.9TB compressed, 16.2TB uncompressed
2007: 5.9TB compressed, 24.2TB uncompressed
2008: 10.3TB compressed, 25.3TB uncompressed
2009: 8.5TB compressed, 15.8TB uncompressed
2010: 10.1TB compressed, 19.2TB uncompressed
The monthly totals are listed at www.ise.com/hotdata under the “Monthly File Sizes” tab.
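For storage planning, the annual totals above can be summed over whatever range of years is being ordered. The sketch below hard-codes the figures from the list in tenths of a terabyte so that the arithmetic stays in integers; the table and function names are hypothetical.

```c
#include <stddef.h>

/* Annual totals from the list above, in tenths of a TB (1.5 TB -> 15). */
struct year_size {
    int year;
    int compressed_tenths_tb;
    int uncompressed_tenths_tb;
};

static const struct year_size hot_sizes[] = {
    {2005,  15,  58},   /* June-December only */
    {2006,  39, 162},
    {2007,  59, 242},
    {2008, 103, 253},
    {2009,  85, 158},
    {2010, 101, 192},
};

/* Sum the totals for every listed year in [first_year, last_year]. */
static void total_sizes(int first_year, int last_year,
                        int *compressed, int *uncompressed)
{
    *compressed = 0;
    *uncompressed = 0;
    for (size_t i = 0; i < sizeof hot_sizes / sizeof hot_sizes[0]; i++) {
        if (hot_sizes[i].year >= first_year && hot_sizes[i].year <= last_year) {
            *compressed += hot_sizes[i].compressed_tenths_tb;
            *uncompressed += hot_sizes[i].uncompressed_tenths_tb;
        }
    }
}
```

For the full 2005 to 2010 range this gives 402 and 1065, that is, about 40.2 TB compressed and 106.5 TB uncompressed.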
18.7 How do I place an order for historical data?
Each new request requires an executed license agreement and order form. Subsequent requests will
only require an executed order form. Please contact Geralyn Endo ([email protected]) for the required
paperwork.
18.8 What are the delivery methods for the historical options tick data?
There are three methods:
1. A subscriber can download up to five days of the daily OPRA tick history files from an FTP
server.
2. For all OPRA tick history files that are not available on the FTP server, the data will be
delivered:
(a) on a portable hard drive to subscriber for a separate one-time fee
(b) over a cross connect for subscribers that have a direct connection to the ISE or are
collocated at our primary data center located in Secaucus, NJ (Equinix)
18.9 How long will it take to get access to the FTP server to begin downloading data?
A subscriber will receive a user name and password within 24 hours of the ISE receiving the
executed paperwork.
18.10 How long will it take to prepare and deliver the data for an ad-hoc request?
The delivery time for ad-hoc requests depends on the amount of data and when we receive the
order. If we receive an order before 10 a.m. (ET), the following delivery times can be expected: 1-3
months of data – approximately 4 business days; 4-6 months of data – approximately 6 business
days; 7-9 months of data – approximately 8 business days; the full data set – approximately two
weeks.
18.11 What delivery/storage mechanisms are used to deliver the data for ad-hoc
requests?
The ISE currently uses portable hard drives with eSATA and USB connections. There is a separate
one-time fee for each hard drive required for the order, and the hard drives are retained by the
subscribers for back-up.
18.12 How do I process this data?
Some OPRA feed handlers may have the capability to read this data. Keep in mind that a real-time
feed handler is configured to read multicast data but the HOT Data is delivered as flat files, so a
configuration change may be required for the real-time feed handler to read the flat file.
Alternatively, an OPRA parser can be written using the OPRA Data Recipient Interface Specification.
Subscribers should refer to this guide.
18.13 Is this data cleansed or filtered in any way?
We capture the raw OPRA feed and do not apply any judgmental cleansing, filtering criteria or
conflation to the data, since such processing can distort the results of back testing.
Appendix A: Ensuring the integrity of data files
Each of the data files corresponding to the 48 OPRA data lines is compressed using the ZIP
compression algorithm and the subscriber will need to uncompress the files. Because the data files
may each be larger than 2 GB, any unzip program for UNIX or LINUX will need to have 64-bit
internal file pointers to successfully decompress the files. Windows users running WinZip will need
version 9.0 or higher.
The integrity of the data within a ZIP file is ensured by a Cyclic Redundancy Check (CRC) code
that is created by the ZIP application and stored within the ZIP file. To assess the integrity of the
data within a ZIP file, WinZip or any other program that can unzip these kinds of files checks
the CRC code against the data stored inside the ZIP file and will report any inconsistencies. Both
creating the CRC while compressing and checking the data against the CRC while
decompressing are done automatically by the ZIP program.
The integrity of file transfers can be checked using the MD5 hash signature created by ISE on each
zipped file. Each zipped data file will have a corresponding MD5 hash file that is created by ISE.
These files will have a “.md5” filename extension. Clients should download both the ZIP file and the
MD5 file, compute their own local MD5 hash value on the downloaded ZIP file, and then compare
ISE's MD5 hash value with their locally computed MD5 hash value. If these match then the integrity
of the file transfer has been assured.
Below is an example set of steps that can be followed to ensure file transfer integrity:
1) Original data file created by ISE:
FeedCapture.233.43.202.9_11109__20091208__.dat
2) Zipped data file created by ISE:
FeedCapture.233.43.202.9_11109__20091208__.dat.zip
3) MD5 Hash of ZIP file created by ISE:
FeedCapture.233.43.202.9_11109__20091208__.dat.zip.md5
4) Contents of MD5 hash file created by ISE: 2e809aa4b314ef95054ccf3e38a37268
5) ZIP file and MD5 hash file transferred to client location
6) Client creates their own MD5 hash file on the ZIP file that has been downloaded
7) MD5 of downloaded ZIP file: FeedCapture.233.43.202.9_11109__20091208__.dat.zip.hash
8) Contents of local MD5 hash file: 2e809aa4b314ef95054ccf3e38a37268
9) Client compares their local MD5 hash with contents of MD5 hash file downloaded from the
ISE. If the two hash values match then integrity of the file transfer is assured.
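The final comparison in step 9 can be automated once both digests are available as text. The sketch below only compares two hex strings; it assumes the local digest has already been computed with an external tool of the subscriber's choosing, and the function name is hypothetical. It ignores letter case and leading whitespace, and accepts trailing text after the 32 hex digits as long as it is separated by whitespace, since some tools append a file name.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Compare two MD5 digests given as hex text. Each string must start
 * (after optional whitespace) with exactly 32 hex digits followed by
 * end of string or whitespace. Case is ignored.
 */
static bool md5_hex_equal(const char *a, const char *b)
{
    while (isspace((unsigned char)*a)) a++;
    while (isspace((unsigned char)*b)) b++;

    for (int i = 0; i < 32; i++) {
        /* A short string hits its terminating NUL here and is rejected. */
        if (!isxdigit((unsigned char)a[i]) || !isxdigit((unsigned char)b[i]))
            return false;
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    }

    bool a_ends = a[32] == '\0' || isspace((unsigned char)a[32]);
    bool b_ends = b[32] == '\0' || isspace((unsigned char)b[32]);
    return a_ends && b_ends;
}
```

If the function returns false for a downloaded file, the file should be downloaded again before it is unzipped or decoded.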
Appendix B: Working with the OPRA FAST Decoder for Unix/Linux Users
The OPRA FAST decoder will decode the OPRA FAST packets into the native OPRA ASCII messages.
We have tested the OPRA decoder with sample data. Since the decoder is designed to read multicast
traffic, we found that some small edits to the code are required to correctly decode the HOT Data™
files provided by the ISE.
1) Download the OPRA FAST decoder from:
http://www.opradata.com/specs/FASTforOPRA_Decode_2.zip
2) Use Unzip to decompress this file.
3) One of the files is named 'fast_main.c' and this is the file that requires several changes to
operate properly with the ISE data files.
4) Edit the 'fast_main.c' file and make the following three changes (a-c):
a) On line 150
Replace:
unsigned int rec_len = 0;
with
unsigned short rec_len = 0;
b) On line 178
Replace
while (fread(&rec_len, sizeof(unsigned int), 1, fp1) != 0)
with
while (fread(&rec_len, sizeof(rec_len), 1, fp1) != 0)
c) On line 180
Replace
rec_len = ntohl(rec_len);
with
rec_len = ntohs(rec_len);
In March, 2009 OPRA made some improvements to optimize the packet size which greatly reduced
the number of FAST packets. As a result there are two more changes to make to the decoder:
5) Edit the 'fast_api.h' file and make the following change:
On line 82
Replace:
#define MAX_MSG_SIZE 2048
with
#define MAX_MSG_SIZE 8196
6) Edit the 'fast_process.h' file and make the following change:
On line 16
Replace:
#define PACKET_SIZE 1024 // Opra Packet Size
with
#define PACKET_SIZE 8192 // Opra Packet Size
7) Compile the program using the supplied Makefile. To work with files larger than 2GB, consider
adding the -D_FILE_OFFSET_BITS=64 compiler flag.
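As a rough illustration of what the large-file flag in step 7 affects, the sketch below measures a file's size with the POSIX `fseeko` and `ftello` calls, which use `off_t` rather than `long` for offsets. Defining `_FILE_OFFSET_BITS` at the very top of a source file is an alternative to passing the compiler flag. The helper name is hypothetical and the snippet is not part of the OPRA decoder.

```c
/* Feature macros must come before the first #include to take effect. */
#define _FILE_OFFSET_BITS 64
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <sys/types.h>

/*
 * Return the size of an open file in bytes, or -1 on error. The current
 * read/write position is restored before returning.
 */
static off_t file_size(FILE *fp)
{
    off_t here = ftello(fp);
    if (here < 0 || fseeko(fp, 0, SEEK_END) != 0)
        return -1;

    off_t size = ftello(fp);
    if (fseeko(fp, here, SEEK_SET) != 0)
        return -1;
    return size;
}
```

On platforms where `long` is 32 bits, the plain `fseek` and `ftell` functions cannot represent offsets beyond 2 GB, which is why the `off_t` variants are used here.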
Appendix C: Working with the OPRA FAST Decoder for Windows Users
The OPRA FAST decoder will decode the OPRA FAST packets into the native OPRA ASCII messages.
We have tested the OPRA decoder with some sample data. Since the decoder is designed to read
multicast traffic, we found that some small edits to the code are required to correctly decode the
HOT Data™ files provided by the ISE.
1) Download the decoder from: http://www.opradata.com/specs/FASTforOPRA_Decode_2.zip
2) Use WinZip to decompress this file.
3) Extract the files into a separate folder. One of the files is named 'fast_main.c' and this is the file
that requires several changes to operate properly with the ISE data files.
4) Edit the 'fast_main.c' file and make the following three changes (a-c):
a) On line 150
Replace:
unsigned int rec_len = 0;
with
unsigned short rec_len = 0;
b) On line 178
Replace
while (fread(&rec_len, sizeof(unsigned int), 1, fp1) != 0)
with
while (fread(&rec_len, sizeof(rec_len), 1, fp1) != 0)
c) On line 180
Replace
rec_len = ntohl(rec_len);
with
rec_len = ntohs(rec_len);
March 30, 2009: OPRA made some improvements to optimize the packet size which greatly reduced
the number of messages so there are two more changes to make to the decoder:
5) Edit the 'fast_api.h' file and make the following change:
On line 82
Replace:
#define MAX_MSG_SIZE 2048
with
#define MAX_MSG_SIZE 8196
6) Edit the 'fast_process.h' file and make the following change:
On line 16
Replace:
#define PACKET_SIZE 1024 // Opra Packet Size
with
#define PACKET_SIZE 8192 // Opra Packet Size
7) On Windows platforms, several other changes will need to be made to various source code files:
a) Remove the #include <unistd.h> directives from included files
b) Certain variables were declared mid-function, despite this being a "C" program. Move
these declarations to the top of their respective functions.
c) The compiled program must be linked with WS2_32.lib
8) Compile the program.