
ISE HOT Data User Guide


International Securities Exchange (ISE)

Historical Option Tick (HOT) Data User Guide

International Securities Exchange

60 Broad Street

New York, NY 10004

June 21, 2011 – v5


Contents

1. Product Definition
2. OPRA Background and Overview of Data Distribution
3. Tick Collection Process
4. Access to Daily Files
5. Access to Historical Files
6. Data Maintenance
7. ISE Support – 877 473-9989
8. Unplanned Interruptions of Service
9. History of Changes to OPRA Data and the ISE Options Data set
10. OPRA Data Distribution
11. OPRA Data Formats
11.1 ASCII OPRA Format – Beginning April 18, 2008
11.2 OPRA FAST Encoding
11.3 OPRA FAST 2.0 Encoding
12. Option Symbology and Mapping from Underlying to Option
12.1 Option Root Symbol
12.2 Option Expiration Month Code
12.3 Expiration Day of Month
12.4 Strike Price Code
12.5 Explicit Strike Price
13. Options Symbology Initiative (OSI)
14. Comparison of Pre and Post OSI Symbols
15. Processing the OPRA Tick Data
15.1 Hot Data Daily Retransmission Files
16. Steps for Processing Historical OPRA Tick Data
16.1 User Name and Password
16.2 Download
16.3 Integrity Check
16.4 OPRA Data Formats
16.5 Decoding OPRA ASCII FAST – As of April 24, 2008
16.6 Processing OPRA ASCII
17. Trouble Shooting FTP Access Queries
18. Frequently Asked Questions
Appendix A: Ensuring the integrity of data files
Appendix B: Working with the OPRA FAST Decoder for Unix/Linux Users
Appendix C - Working with the OPRA FAST Decoder for Windows Users

This document was produced specifically for HOT Data subscribers to provide

background on the data set and how to make use of the files. The authors are:

Richard Holowczak, Ph.D.

Associate Professor Computer Information Systems

Baruch College, City University of New York

Jeff Soule

Former Head of Market Data

International Securities Exchange


1. Product Definition

The ISE HOT Data set provides historical Options Price Reporting Authority (OPRA) data, which includes trades and quotes for all US listed equity, index and ETF options from June 1, 2005 to the present. The offering consists of two separate products, each delivered as daily files: an end of day summary file, and a file with the full day of tick data for each option series.

The full tick data file includes all trades and quotes from all option exchanges and the OPRA designated National Best Bid and Offer (NBBO). The end of day summary file is a much smaller subset of data and includes summary details for each series, such as price ranges and volume for the day from each exchange. For a complete list of available data fields, refer to Section 7, Field Descriptions, of the OPRA Data Recipient Interface Specification at www.opradata.com/specs/data_recipient_interface.pdf.

2. OPRA Background and Overview of Data Distribution

The US market for listed equity, ETF and index options consists of nine exchanges as of January 2011. Each exchange manages quotes and orders from its market makers, brokers and other members. Each exchange is responsible for computing its best bid and offer (BBO) in real-time and sending that data to the Securities Industry Automation Corp (SIAC), where the data are merged to form what is commonly called the Options Price Reporting Authority (OPRA) Feed. SIAC also tags the bid and ask price of each series with a code indicating whether the price is currently the national best bid or offer (NBBO). The logical flow of the market data is shown in Figure 1.

Figure 1 Logical flow of market data from exchanges to OPRA to redistributors

3. Tick Collection Process

The ISE, one of the largest equity options exchanges in the world, is responsible for the creation of

the historical files that are made available with this offering. The ISE has two fully redundant data

centers with separate and distinct connections to the real-time OPRA feed.


The ISE has a number of servers that listen to the OPRA real-time feed and capture the real-time

data across the redundant infrastructures. At the end of every day the ISE compresses the OPRA

files and uploads them to separate FTP servers at each data center. The tick collection process runs

on all weekdays except for US scheduled exchange holidays.

While the tick collection process is running each day, there is a separate process that monitors the

OPRA sequence numbers and identifies any gaps or missing sequence numbers. At the end of every

day another process runs that summarizes the gaps and then makes a request to SIAC for the

missing messages which are sent back to the ISE as retransmission files.
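The gap monitoring described above can be illustrated with a minimal sketch. It assumes the sequence numbers of a single transmission line have already been extracted as integers; the function name find_sequence_gaps is hypothetical and this is not the ISE's actual monitoring process.

```python
from typing import Iterable, List, Tuple


def find_sequence_gaps(seqs: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges for one line.

    Assumption: numbers arrive in increasing order; duplicates and late
    arrivals (numbers below the next expected value) are simply skipped.
    """
    gaps = []
    expected = None  # next sequence number we expect to see
    for seq in seqs:
        if expected is not None and seq > expected:
            gaps.append((expected, seq - 1))
        if expected is None or seq >= expected:
            expected = seq + 1
    return gaps


print(find_sequence_gaps([1, 2, 3, 7, 8, 10]))
```

Each reported range corresponds to a block of messages that would have to be requested as a retransmission.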

The sizes of daily files vary so it would be impossible to provide an exact time each day when the

data is available for downloading. However, based on the average size of files for the previous 12

months, ISE will commit to having the files available for subscribers to download by 9:00pm EST for

subscribers that are existing OPRA redistributors. The files are available for non-OPRA redistributors

the next trading day after the market opens. This level of service is subject to adjustment based on

an annual review.

4. Access to Daily Files

Although there are separate FTP servers at each data center, subscribers will only need a single address to access the FTP servers at either data center to download the data. The ISE will maintain up to five days of OPRA data on the servers. Subscribers can access the FTP servers 24x7x365, except during the hours of scheduled maintenance or in the event of a catastrophic failure. Fail-over between sites would be transparent for the standard FTP process.

The ISE will provide a notification period of at least 60 days for major changes that will affect the subscriber's ability to access the data under normal circumstances, for example a change to the URL used to access the FTP servers. For any unplanned outages that are critical for the tick collection process, the notification period may be less than 60 days.

5. Access to Historical Files

All historical data, other than the most recent five days which are on the FTP servers, will be

delivered to subscribers on portable external hard drives, which hold a terabyte or more of data.

Subscribers can choose a specific calendar period for a limited set of history or purchase the full

history. The full set of history will be shipped on the portable hard drives within two weeks of receipt

of the order form. Subsets of the complete data set will be delivered in less than two weeks; for example, up

to two months of data will be shipped in three business days. Should a hard drive fail within 60

business days of shipment, the ISE will process an order to replace the data onto another hard drive

upon receipt of the failed drive. Failed drives must be sent to:

Geralyn Endo / (212)897-8171 / [email protected]

International Securities Exchange

60 Broad Street

New York, NY 10004


6. Data Maintenance

Although the ISE creates the daily files, it has no responsibility or authority to edit or change the data format. Decisions to change the feed or file structures, including any additions, deletions or changes to the interpretation of the data fields, application level protocols, and/or documentation for this data, are determined and controlled by OPRA. Hence this offering provides standard native OPRA formatted data as a compressed file.

Subscribers are responsible for monitoring changes on OPRA's Web site at: http://www.opradata.com/specs/data_recip.jsp. The ISE will make every effort to notify subscribers of the changes when they are announced by OPRA.

7. ISE Support – 877 473-9989

The ISE provides support from 8:00am to 6:00pm EST, Monday through Friday, with the exception of US exchange holidays. The contact number for the ISE's support desk is 877 473-9989 and the initial contact person will be Nick Piccirillo. In the event that an escalation is required the contact person will be Dan Amar. In addition to calling, an email may be sent to [email protected] and [email protected].

The calls will be directed to the appropriate department based on the nature of the call. The ISE will

provide support for inquiries on the following topics:

Access to Data

Data Content

Processing the Data

8. Unplanned Interruptions of Service

In the event of an unplanned interruption that materially or negatively impacts service, the ISE will

notify subscribers via email of the event and updates as they become available.

9. History of Changes to OPRA Data and the ISE Options Data set

Starting in June of 2005, the ISE began capturing and storing the real-time multicast OPRA feed. This capture is the full OPRA feed with all quotes and trades from all participating exchanges, including the OPRA flagged NBBO. For trading days prior to April 18, 2008, the data is stored as the raw OPRA stream and is not massaged or reformatted. When OPRA switched to the OPRA FAST format on April 18, 2008, a slight modification to the data was required (see section 16.5 of this guide for details).

The following is a list of major OPRA changes since the initiation of this data set:

June 1, 2005          ISE data set begins with 8 transmission lines of OPRA ASCII data
April 5, 2006         ISE data set moves to 24 transmission lines of OPRA ASCII data
January 22, 2007      OPRA reassigns option root symbols across 24 transmission lines
March 5, 2007         OPRA reassigns option root symbols across 24 transmission lines
September 15, 2007    OPRA reassigns option root symbols across 24 transmission lines
April 18, 2008        Beginning this day the ISE data switched over to OPRA FAST data format
August 25, 2008       OPRA reassigns option root symbols across 24 transmission lines
November 24, 2008     ISE data set moved to OPRA FAST 2.0 data format (see section 11.3 for list of enhancements)
October 5, 2009       OPRA reassigns option root symbols across 24 transmission lines
February 12, 2010     Market participants must be prepared to utilize OSI compliant data elements
August 23, 2010       OPRA reassigns option symbols across 24 transmission lines
April 1, 2011         OPRA adds two new codes for category k quote messages
May 2, 2011           OPRA increases traffic distribution to 48 transmission lines
May 2, 2011           OPRA reassigns option symbols across 48 transmission lines
July 25, 2011         OPRA will employ a new symbol distribution with 6 characters

Additional events and changes are posted on the OPRA web site:

http://www.opradata.com/specs/data_recip.jsp

10. OPRA Data Distribution

Prior to May 2011, OPRA data was distributed across 8 or 24 transmission lines. The ISE data set begins on June 1, 2005 and initially used 8 files to store OPRA transmission lines 2 through 9 for equity, ETF and index options (line 1 was used for foreign currency options and was not captured or included in the data set due to minimal interest in this data). Starting with the April 5, 2006 trading day, OPRA began disseminating data over 24 transmission lines, so the ISE began to generate 24 files to capture all lines from OPRA. Beginning on May 2, 2011, OPRA began disseminating data over 48 transmission lines, so the ISE began to generate 48 files to capture all lines from OPRA.

The most recent OPRA Data Recipient Interface document produced by OPRA, which is available at

http://www.opradata.com/specs/data_recipient_interface.pdf, outlines the distribution of data across

the transmission lines according to the first letter or letters of the option root symbol. For example,

the Data Recipient Interface Specification Version 1.5 dated March 31, 2005 contains the following

table in Appendix B OPRA Traffic Distribution:

OPRA updates the traffic distribution approximately once or twice a year in order to balance the load

of traffic across the 8 transmission lines. When a rebalance is done the traffic across all lines will be

approximately the same.

The ISE data set for trading days in 2005 and up to April 4, 2006 has 8 files that would be named as

follows:

Symbol Distribution   Line Routing   File Name
H, I, O, R            L2             feedcapture.224.0.2.227_53578
A, S                  L3             feedcapture.224.0.2.228_53580
B, F, L, N            L4             feedcapture.224.0.2.229_53582
M, W                  L5             feedcapture.224.0.2.230_53584
C, E, J, P            L6             feedcapture.224.0.2.231_53586
G, Q, T, Z            L7             feedcapture.224.0.2.232_53588
D, X, Y               L8             feedcapture.224.0.2.233_53590
K, U, V               L9             feedcapture.224.0.2.234_53592

So for example, Microsoft option series using option root code MQF would be found on line 5 in file

feedcapture.224.0.2.230_53584.
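The lookup in this example can be sketched in a few lines. The table below is copied from the 8-line distribution above; the names LINE_FILES_2005 and file_for_root_2005 are hypothetical, and the mapping is only an illustration for trading days that use this 8-line distribution.

```python
# Letters routed to each line and the matching capture file, as listed in
# the 8-line table (June 1, 2005 through April 4, 2006).
LINE_FILES_2005 = {
    2: ("HIOR", "feedcapture.224.0.2.227_53578"),
    3: ("AS", "feedcapture.224.0.2.228_53580"),
    4: ("BFLN", "feedcapture.224.0.2.229_53582"),
    5: ("MW", "feedcapture.224.0.2.230_53584"),
    6: ("CEJP", "feedcapture.224.0.2.231_53586"),
    7: ("GQTZ", "feedcapture.224.0.2.232_53588"),
    8: ("DXY", "feedcapture.224.0.2.233_53590"),
    9: ("KUV", "feedcapture.224.0.2.234_53592"),
}


def file_for_root_2005(root: str):
    """Return (line, file name) for an option root symbol by its first letter."""
    first = root[:1].upper()
    for line, (letters, file_name) in LINE_FILES_2005.items():
        if first and first in letters:
            return line, file_name
    raise KeyError(f"no transmission line for root symbol {root!r}")


print(file_for_root_2005("MQF"))
```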

Starting with trading day April 5, 2006, the ISE data set began using 24 data transmission lines. The

OPRA Data Recipient Interface Specification Version 1.7 dated March 29, 2006 contains the following

table in Appendix B OPRA Traffic Distribution.

The ISE data set for trading days from April 5, 2006 has 24 files named as follows:

Symbol Distribution   Line Routing   File Name
A                     1              feedcapture.233.43.202.1_11101
B                     2              feedcapture.233.43.202.2_11102
C                     3              feedcapture.233.43.202.3_11103
D                     4              feedcapture.233.43.202.4_11104
E                     5              feedcapture.233.43.202.5_11105
F                     6              feedcapture.233.43.202.6_11106
G                     7              feedcapture.233.43.202.7_11107
H                     8              feedcapture.233.43.202.8_11108
I                     9              feedcapture.233.43.202.9_11109
J, Y                  10             feedcapture.233.43.202.10_11110
K                     11             feedcapture.233.43.202.11_11111
L                     12             feedcapture.233.43.202.12_11112
M                     13             feedcapture.233.43.202.13_11113
N                     14             feedcapture.233.43.202.14_11114
O                     15             feedcapture.233.43.202.15_11115
P                     16             feedcapture.233.43.202.16_11116
Q                     17             feedcapture.233.43.202.17_11117
R                     18             feedcapture.233.43.202.18_11118
S                     19             feedcapture.233.43.202.19_11119
T, Z                  20             feedcapture.233.43.202.20_11120
U                     21             feedcapture.233.43.202.21_11121
V                     22             feedcapture.233.43.202.22_11122
W                     23             feedcapture.233.43.202.23_11123
X                     24             feedcapture.233.43.202.24_11124

Now the Microsoft option series using option root code MQF would be found on line 13 in file feedcapture.233.43.202.13_11113.

Starting with trading day May 2, 2011, the ISE data set began using 48 data transmission lines. The OPRA Data Recipient Interface Specification Version 1.19 dated May 27, 2011 contains the following table in Appendix B OPRA Traffic Distribution; the file names have been added to this table.

OPRA Channel   Symbol Distribution as of May 2, 2011   File Name
1              A - ADMZZ                               feedcapture.233.43.202.001_11101
2              ADN - ALLZZ                             feedcapture.233.43.202.002_11102
3              ALM - APAZZ                             feedcapture.233.43.202.003_11103
4              APB - AZZZZ                             feedcapture.233.43.202.004_11104
5              B - BGZZZ                               feedcapture.233.43.202.005_11105
6              BH - BRCAA                              feedcapture.233.43.202.006_11106
7              BRD - CCKZZ                             feedcapture.233.43.202.007_11107
8              CCL - CMAZZ                             feedcapture.233.43.202.008_11108
9              CMB - CORZZ                             feedcapture.233.43.202.009_11109
10             COS - CVSZZ                             feedcapture.233.43.202.010_11110
11             CVT - DHZZZ                             feedcapture.233.43.202.011_11111
12             DI - DOAZZ                              feedcapture.233.43.202.012_11112
13             DOB - EEMZZ                             feedcapture.233.43.202.013_11113
14             EEN - ESMZZ                             feedcapture.233.43.202.014_11114
15             ESN - FASZZ                             feedcapture.233.43.202.015_11115
16             FAT - FSZZZ                             feedcapture.233.43.202.016_11116
17             FT - GIKZZ                              feedcapture.233.43.202.017_11117
18             GIL - GPZZZ                             feedcapture.233.43.202.018_11118
19             GQ - HNZZZ                              feedcapture.233.43.202.019_11119
20             HO - ICZZZ                              feedcapture.233.43.202.020_11120
21             ID - IVZZZ                              feedcapture.233.43.202.021_11121
22             IW - IYSZZ                              feedcapture.233.43.202.022_11122
23             IYT - JZZZZ                             feedcapture.233.43.202.023_11123
24             K - LLZZZ                               feedcapture.233.43.202.024_11124
25             LM - MCDZZ                              feedcapture.233.43.202.129_16101
26             MCE - MMMZZ                             feedcapture.233.43.202.130_16102
27             MMN - MSZZZ                             feedcapture.233.43.202.131_16103
28             MT - NDXZZ                              feedcapture.233.43.202.132_16104
29             NDY - NVKZZ                             feedcapture.233.43.202.133_16105
30             NVL - PABZZ                             feedcapture.233.43.202.134_16106
31             PAC - PIZZZ                             feedcapture.233.43.202.135_16107
32             PJ - PXBZZ                              feedcapture.233.43.202.136_16108
33             PXC - QQQZZ                             feedcapture.233.43.202.137_16109
34             QQR - RRBZZ                             feedcapture.233.43.202.138_16110
35             RRC - SBUZZ                             feedcapture.233.43.202.139_16111
36             SBV - SKMZZ                             feedcapture.233.43.202.140_16112
37             SKN - SPXZZ                             feedcapture.233.43.202.141_16113
38             SPY - SPYZZ                             feedcapture.233.43.202.142_16114
39             SPZ - SWJZZ                             feedcapture.233.43.202.143_16115
40             SWK - TISZZ                             feedcapture.233.43.202.144_16116
41             TIT - TVZZZ                             feedcapture.233.43.202.145_16117
42             TW - UPKZZ                              feedcapture.233.43.202.146_16118
43             UPL - UYLZZ                             feedcapture.233.43.202.147_16119
44             UYM - VYZZZ                             feedcapture.233.43.202.148_16120
45             VZ - WLSZZ                              feedcapture.233.43.202.149_16121
46             WLT - XHZZZ                             feedcapture.233.43.202.150_16122
47             XI - XLZZZ                              feedcapture.233.43.202.151_16123
48             XM - ZZZZZ                              feedcapture.233.43.202.152_16124

Now the Microsoft option series using option root code MSFT (post OSI) would be found on line 27 in

file feedcapture.233.43.202.131_16103.
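Because each channel covers an alphabetical range of symbols, the channel for a symbol can be found with a binary search over the first symbol of each range. The sketch below assumes the May 2, 2011 distribution and the file naming pattern visible in the 48-channel table; the names CHANNEL_STARTS, channel_for_root and capture_file_for_channel are hypothetical, and the table must be replaced whenever OPRA rebalances the lines.

```python
from bisect import bisect_right

# First symbol carried by channels 1..48, copied from the May 2, 2011 table.
CHANNEL_STARTS = [
    "A", "ADN", "ALM", "APB", "B", "BH", "BRD", "CCL", "CMB", "COS", "CVT", "DI",
    "DOB", "EEN", "ESN", "FAT", "FT", "GIL", "GQ", "HO", "ID", "IW", "IYT", "K",
    "LM", "MCE", "MMN", "MT", "NDY", "NVL", "PAC", "PJ", "PXC", "QQR", "RRC", "SBV",
    "SKN", "SPY", "SPZ", "SWK", "TIT", "TW", "UPL", "UYM", "VZ", "WLT", "XI", "XM",
]


def channel_for_root(root: str) -> int:
    """Return the channel whose range starts at or before the symbol."""
    channel = bisect_right(CHANNEL_STARTS, root.upper())
    if channel == 0:
        raise ValueError(f"symbol {root!r} sorts before the first channel")
    return channel


def capture_file_for_channel(channel: int) -> str:
    """Rebuild the capture file name using the pattern seen in the table."""
    if not 1 <= channel <= 48:
        raise ValueError("channel must be between 1 and 48")
    if channel <= 24:
        return f"feedcapture.233.43.202.{channel:03d}_{11100 + channel}"
    return f"feedcapture.233.43.202.{104 + channel}_{16100 + channel - 24}"


channel = channel_for_root("MSFT")
print(channel, capture_file_for_channel(channel))
```

Only the starting symbol of each range is used, so the search does not depend on how the upper bounds are padded.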

OPRA continues to update the traffic distribution approximately once or twice a year in order to

balance the load of traffic across the 48 transmission lines. Please visit the

www.opradata.com/specs/data_recip.jsp Web site for additional Data Recipient Notices that affect

the distribution of the data.


11. OPRA Data Formats

The OPRA data stored in the ISE data files is a copy of the multicast packets transmitted by OPRA. From the inception of the data set in June 2005 until April 17, 2008, the data format is ASCII. Starting with April 18, 2008 the data format is OPRA FAST, and from November 24, 2008 the data format is OPRA FAST 2.0. Each of these formats is described below.

11.1 ASCII OPRA Format – Beginning April 18, 2008

The underlying format of OPRA data is an ASCII stream of transmission data blocks that correspond

to multicast data packets. One or more message blocks can be stored within a transmission block.

The ASCII format for OPRA is documented in the OPRA Data Recipient Interface Specification. The

current release of this document can be found on the OPRA web site (www.opradata.com) under the

menus for Specifications -> SIAC Specifications -> Data Recipient Interface. Currently this

document can also be accessed using this direct link:

http://www.opradata.com/specs/data_recipient_interface.pdf

An ASCII OPRA message consists of a message header followed by message specific data. According

to the Data Recipient Interface Specification, the header consists of:

Field                      Characters   Description
Participant ID             1            A=AMEX, B=BOX, C=CBOE, I=ISE, etc.
Retransmission             1            Typically blank. Message should be ignored if non-blank.
Message Identification     2            Message category and Type codes
Message Sequence Number    8            OPRA Sequence number
Time                       6            Message Time in format hhmmss (no separators)

This gives the header a total of 18 bytes. Subsequent data formats expanded on the OPRA sequence

number and time fields.

The Message Identification field uses two characters: Message Category and Message Type. The

main Message Categories are Administrative, Open Interest, Quote, Trade and End of Day

messages. The Message Type character applies to the given Message Category. For example, if

Message Category is “k” (indicating a quote message) and Message Type is “F”, then we interpret

this message as a non-firm quote.
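As an illustration of the 18-byte header layout, the sketch below slices a header out of a message held as a text string. It assumes the fields appear in the order listed in the table and that a blank Retransmission field is a space character; the names AsciiHeader and parse_ascii_header and the sample message are invented for this example. The Data Recipient Interface Specification remains the authority on the layout.

```python
from typing import NamedTuple


class AsciiHeader(NamedTuple):
    participant_id: str   # 1 character, e.g. "I" for ISE
    retransmission: str   # 1 character, typically blank
    category: str         # first character of Message Identification
    msg_type: str         # second character of Message Identification
    sequence_number: int  # 8 digits
    time_hhmmss: str      # 6 digits, no separators


def parse_ascii_header(message: str) -> AsciiHeader:
    """Split the fixed-width 18-character header into named fields."""
    if len(message) < 18:
        raise ValueError("message is shorter than the 18-byte header")
    return AsciiHeader(
        participant_id=message[0],
        retransmission=message[1],
        category=message[2],
        msg_type=message[3],
        sequence_number=int(message[4:12]),
        time_hhmmss=message[12:18],
    )


# Invented sample: ISE, not a retransmission, category "k", type "F".
header = parse_ascii_header("I kF00001234093000")
print(header)
```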

The Message Sequence Number deserves some additional attention. As discussed in a previous

section, the OPRA data is transmitted across 8, 24 or 48 multicast lines. Within one transmission

line, messages at the start of the day begin with OPRA sequence number 1 and continue to

increment for each message on that multicast line over the course of the day. Message Sequence

Numbers are not unique among multicast lines and the same sequence numbers may appear on

multiple lines. Since each multicast line carries data from different options series, the sequence

numbers used cannot be compared across lines.

Prior to November 24, 2008 the Message Sequence Number had a length of up to eight digits, so when the sequence number reached 99999999 it would “roll over” and start back with 1 again. This sequence number “roll over” event typically took place in the afternoon. Beginning November 24, 2008 the Message Sequence Number was expanded to ten digits (refer to section 11.3).
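When processing data from before November 24, 2008, it can help to convert the eight-digit sequence numbers of one line into a continuously increasing series. The sketch below is one possible approach, not part of the OPRA specification: it treats a large backward jump as a roll over, and the name unwrap_sequence_numbers and the half-range threshold are assumptions. Retransmitted or badly out-of-order messages around the roll over point would need separate handling.

```python
ROLLOVER_AT = 99_999_999  # largest eight-digit sequence number


def unwrap_sequence_numbers(seqs):
    """Yield increasing numbers from eight-digit values that may roll over.

    Assumption: a drop of more than half the range between consecutive
    messages is a roll over rather than a late message.
    """
    rollovers = 0
    previous = None
    for seq in seqs:
        if previous is not None and previous - seq > ROLLOVER_AT // 2:
            rollovers += 1
        previous = seq
        # After 99999999 the next number is 1, so each roll over adds 99999999.
        yield rollovers * ROLLOVER_AT + seq


print(list(unwrap_sequence_numbers([99_999_998, 99_999_999, 1, 2])))
```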

Further details of these fields and message-specific fields can be found in the Data Recipient

Interface Specification: http://www.opradata.com/specs/data_recipient_interface.pdf.

11.2 OPRA FAST Encoding

Starting with trading day April 18, 2008, the ASCII OPRA data has been encoded using the OPRA

FAST (FIX Adapted for STreaming) encoder. FAST is an algorithm that is applied to the data to

reduce or compress the size of the messages by approximately 60%. The data files can be stored in

their OPRA FAST encoded format. When OPRA switched to the OPRA FAST format on April 18, 2008

a slight modification to the data was required (see section 16.5 of this guide for details).

11.3 OPRA FAST 2.0 Encoding

Starting with trading day November 24, 2008, a new OPRA FAST format was introduced to provide

better data compression, expand the sizes of several of the fields and to introduce a new field – the

expiration day of month for the option series. This format is often referred to as OPRA FAST 2.0 or

“FAST For Symbology”. The message header has been expanded to the following 25 bytes:

Field                      Old Size   New Size   Description
Participant ID             1          1          A=AMEX, B=BOX, C=CBOE, I=ISE, etc.
Retransmission             1          1          Typically blank. Message should be ignored if non-blank.
Message Identification     2          2          Message category and Type codes
Message Sequence Number    8          10         OPRA Sequence number
Time with milliseconds     6          9          Message Time in format hhmmssmmm (no separators) where mmm is milliseconds
Expiration Day of Month    N/A        2          Expiration day of month introduced in OPRA FAST 2.0
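The expanded 25-byte header can be handled with a table-driven slicer. The sketch below assumes the message has already been decoded from FAST into its fixed-width text form and that the fields appear in the order shown in the table; FAST2_HEADER_LAYOUT, parse_fast2_header, millis_since_midnight and the sample message are all invented for illustration.

```python
# (field name, width in characters), in the order listed in the table.
FAST2_HEADER_LAYOUT = [
    ("participant_id", 1),
    ("retransmission", 1),
    ("message_identification", 2),
    ("sequence_number", 10),
    ("time_hhmmssmmm", 9),
    ("expiration_day", 2),
]
HEADER_LENGTH = sum(width for _, width in FAST2_HEADER_LAYOUT)  # 25


def parse_fast2_header(message: str) -> dict:
    """Slice the 25-character header into a dict of raw string fields."""
    if len(message) < HEADER_LENGTH:
        raise ValueError("message is shorter than the 25-byte header")
    fields, offset = {}, 0
    for name, width in FAST2_HEADER_LAYOUT:
        fields[name] = message[offset:offset + width]
        offset += width
    return fields


def millis_since_midnight(hhmmssmmm: str) -> int:
    """Convert an hhmmssmmm time field to milliseconds since midnight."""
    hh, mm, ss = int(hhmmssmmm[0:2]), int(hhmmssmmm[2:4]), int(hhmmssmmm[4:6])
    return ((hh * 60 + mm) * 60 + ss) * 1000 + int(hhmmssmmm[6:9])


sample = "C kF000001234509300012520"  # invented 25-character header
fields = parse_fast2_header(sample)
print(fields, millis_since_midnight(fields["time_hhmmssmmm"]))
```

Keeping the layout in a list makes it straightforward to swap in the older 18-byte layout for data captured before November 24, 2008.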

12. Option Symbology and Mapping from Underlying to Option

Consider an “Option Series” to be an identifier of an option on an underlying instrument with specific

expiration date, strike price and right (put or call). Option series are represented in the OPRA data

feed using a combination of:

Option Root Symbol

Expiration Month Code (also determines if series is a Put or Call)

Expiration Day of Month (added starting November 24, 2008)

Strike Price Code

Explicit Strike Price

A description for each of these items follows.


12.1 Option Root Symbol

The OPRA code does not always use the underlying symbol for the option root symbol. Prior to

February 12, 2010, the option root symbol was limited to three characters and many U.S. OTC

symbols have more than three characters. OPRA does not include the underlying ticker symbol in the

OPRA feed itself. However it does include the option root symbol. For a given underlying instrument

there will typically be at least one root symbol for non-LEAP option series and one root symbol for

LEAP options series. Additional root symbols may be added to cover a wider range of strike prices

and/or expiration dates. The additional option root symbols could be introduced during an expiration

month as well as intra-day. If an underlying security experiences a large swing in price, new strike

prices can be added possibly requiring the introduction of a new option root symbol.

For example, on April 3, 2008, Microsoft (ticker MSFT) had 180 option series using four different

option root symbols associated with it:

Symbol Options Series

MQF April, July and October 2008 calls and puts at strike prices $10 through $20

MSQ April, May, July and October 2008 calls and puts at strike prices $22.50 through $50

VMF January 2009 calls and puts at strike prices $15 through $55

WMF January 2010 calls and puts at strike prices $20 through $50

The Options Clearing Corporation maintains the definitive record of the mapping between the underlying instrument and the option root symbols used for that underlying's options. For most trading days, prior to the OSI project, an underlying mapping file is also included with the HOT Data daily files. This file will have a name that incorporates the trading date as follows:

underlyingopracodemap_YYYYMMDD.csv

Where YYYY is the 4 digit year, MM is the two digit month (with leading zero if necessary) and DD is the two digit day of the month (with leading zero if necessary). The 'underlyingopracodemap' file has two columns separated by a comma. The first column is the underlying ticker symbol. The second column is the OPRA root symbol. There is no longer a need for this file post OSI.

There may be situations where OPRA root symbols are not mapped in the 'underlyingopracodemap' file. This can happen when new option root symbols are introduced intra-day. Generally the symbols will appear in the subsequent trading day's 'underlyingopracodemap' file.
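A minimal sketch of working with the mapping file follows. It builds the file name for a trading date and groups root symbols by underlying ticker. It assumes the file has no header row and exactly the two columns described above; the function names map_file_name and load_root_map are hypothetical.

```python
import csv
from collections import defaultdict
from datetime import date


def map_file_name(trading_date: date) -> str:
    """Build underlyingopracodemap_YYYYMMDD.csv for a trading date."""
    return f"underlyingopracodemap_{trading_date:%Y%m%d}.csv"


def load_root_map(lines) -> dict:
    """Group OPRA root symbols by underlying ticker.

    `lines` is any iterable of text lines, such as an open file.
    """
    roots_by_underlying = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        underlying, root = row[0].strip(), row[1].strip()
        roots_by_underlying[underlying].append(root)
    return dict(roots_by_underlying)


print(map_file_name(date(2008, 4, 3)))
print(load_root_map(["MSFT,MQF", "MSFT,MSQ", "MSFT,VMF", "MSFT,WMF"]))
```

With a real file, load_root_map would be called on an opened file object, for example inside `with open(map_file_name(day), newline="") as f:`.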

12.2 Option Expiration Month Code

Prior to February 12, 2010, OPRA assigned a code for the expiration month. The expiration month code indicates the month of expiration for the option series as well as the right, that is, whether the series is a put or a call. The following table summarizes the expiration month codes and their meanings:

Code   Call Options      Code   Put Options
A      JANUARY Call      M      JANUARY Put
B      FEBRUARY Call     N      FEBRUARY Put
C      MARCH Call        O      MARCH Put
D      APRIL Call        P      APRIL Put
E      MAY Call          Q      MAY Put
F      JUNE Call         R      JUNE Put
G      JULY Call         S      JULY Put
H      AUGUST Call       T      AUGUST Put
I      SEPTEMBER Call    U      SEPTEMBER Put
J      OCTOBER Call      V      OCTOBER Put
K      NOVEMBER Call     W      NOVEMBER Put
L      DECEMBER Call     X      DECEMBER Put
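Since the codes run alphabetically through the twelve call months and then the twelve put months, the table can be decoded by position. The following is a small sketch; the function name decode_expiration_month_code is hypothetical.

```python
CALL_CODES = "ABCDEFGHIJKL"  # January..December calls
PUT_CODES = "MNOPQRSTUVWX"   # January..December puts


def decode_expiration_month_code(code: str):
    """Return (month number 1-12, "call" or "put") for a one-letter code."""
    code = code.upper()
    if len(code) == 1 and code in CALL_CODES:
        return CALL_CODES.index(code) + 1, "call"
    if len(code) == 1 and code in PUT_CODES:
        return PUT_CODES.index(code) + 1, "put"
    raise ValueError(f"not an expiration month code: {code!r}")


print(decode_expiration_month_code("E"))  # May call
print(decode_expiration_month_code("P"))  # April put
```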

12.3 Expiration Day of Month

For OPRA data prior to November 24, 2008, the expiration date is assumed to be the day after the third Friday of the expiration month. Starting on November 24, 2008 a new OPRA field was introduced indicating the exact day of the month on which an option series will expire. The introduction of this field allows for the transmission of weekly and bi-weekly option series. Initially all values for this field were “00”. Each of the exchanges began filling in this field with real data at different times. In general, if the value of this field is 00 then the series can be assumed to follow the standard monthly expiration tied to the third Friday of the month, as for data prior to November 24, 2008.
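The third-Friday rule can be computed directly from the calendar. The sketch below returns the third Friday of a month, the day after it, and a resolved date that uses the Expiration Day of Month field when it is not “00”. The function names are hypothetical, and the fallback to the day after the third Friday is an assumption that should be checked against the convention of the data being processed.

```python
from datetime import date, timedelta


def third_friday(year: int, month: int) -> date:
    """Return the third Friday of the given month."""
    first = date(year, month, 1)
    # Monday is weekday 0, so Friday is weekday 4.
    first_friday = first + timedelta(days=(4 - first.weekday()) % 7)
    return first_friday + timedelta(days=14)


def day_after_third_friday(year: int, month: int) -> date:
    """Return the Saturday that follows the third Friday."""
    return third_friday(year, month) + timedelta(days=1)


def resolve_expiration(year: int, month: int, expiration_day: str) -> date:
    """Use the explicit day when present, else fall back to the monthly rule."""
    if expiration_day == "00":
        return day_after_third_friday(year, month)  # assumed fallback
    return date(year, month, int(expiration_day))


print(third_friday(2008, 4), resolve_expiration(2008, 4, "00"))
```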

12.4 Strike Price Code

Prior to February 12, 2010, the Strike Price Code represented the price per share for which the underlying security may be bought (call option) or sold (put option) by the holder of record upon exercise of the option. The Data Recipient Interface Specification lists the following strike price codes for whole number strikes:

The Data Recipient Interface Specification lists the following strike price codes for half strike increment strikes:

12.5 Explicit Strike Price

The Explicit Strike Price represents the stated price per share for which the underlying security may

be bought (call option) or sold (put option) by the holder of record upon exercise of the option.

13. Options Symbology Initiative (OSI)

The OPRA symbology was developed over 25 years ago and typically used three to five letters to identify a particular option series. Up to the first three characters identified the option root symbol, one character represented the contract expiration month as well as whether the series was a put or a call, and one character represented the option exercise or strike price. For example, IBMER is the IBM 90 call option, where IBM is the option root symbol, 'E' represents a May call option and 'R' represents the $90 strike price of the option.
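Splitting a pre-OSI symbol into its parts follows directly from this layout: the last character is the strike price code, the one before it is the expiration month code, and the rest is the root. The sketch below only separates the three parts and does not translate the codes; the names PreOsiSymbol and split_pre_osi_symbol are hypothetical.

```python
from typing import NamedTuple


class PreOsiSymbol(NamedTuple):
    root: str         # one to three characters
    month_code: str   # expiration month and put/call
    strike_code: str  # strike price code


def split_pre_osi_symbol(symbol: str) -> PreOsiSymbol:
    """Split a three to five letter pre-OSI option symbol into its parts."""
    if not 3 <= len(symbol) <= 5:
        raise ValueError("pre-OSI symbols are three to five characters long")
    return PreOsiSymbol(symbol[:-2], symbol[-2], symbol[-1])


print(split_pre_osi_symbol("IBMER"))
```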

Today this methodology poses several limitations for the industry as the options market has evolved over the years. First, limiting the root to three characters creates inconsistencies with many U.S. OTC securities, whose symbols are generally longer than three characters. When options were originally launched in the U.S. they expired the day after the third Friday of the month. However, there are now flexible and weekly expiration dates, so a single character for a monthly expiration is no longer effective. Long-term Equity AnticiPation Securities (LEAPS®) have never been standardized and generally require a separate option root symbol.

In the summer of 2005, industry representatives began an initiative to develop a plan to eliminate the use of OPRA codes and to establish a standard ensuring that all option strike prices are represented in decimal format. The participants included representatives from exchanges, vendors, broker-dealers and the Options Clearing Corp. A plan was approved on December 5, 2006, and by the end of January 2007 the record layouts to be used throughout the testing and implementation phases of the project were approved. Detailed test scripts were designed, approved and published in September 2008. A testing period ran from September 2009 through January 2010. There was a mandatory cut-over to the new data elements for the record layouts on February 12, 2010, and the roll-out of the new symbology, based on a fixed number of securities in each tranche, ran from March through May 2010.


The agreed-upon symbology key for options represents the minimum data requirements used in the transmission of listed option contracts between the exchanges, the Options Clearing Corp and the participants. No rules were defined for how the minimum data requirements must be displayed, so the display can vary across redistributors of OPRA data. An example of the Apple 200 call option that expires August 20, 2010 is as follows:

Symbol  Year  Month  Day  C/P  Strike Price  Price Decimal
AAPL    10    08     20   C    00200         000

14. Comparison of Pre and Post OSI Symbols

One of the objectives of the OSI was to create option symbols that would be more intuitive and less

complicated for market participants to read. Let's look at an example of the OPRA symbols before

and after the project.

Here is an example of a legacy OPRA symbol:

Apple 200 Call, expiring 08/22/09 – APVHT

Here is an example of the new OPRA symbol on Yahoo! Finance:

Apple 200 Call, expiring 08/20/10 - AAPL100821C00200000
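A compact display symbol of this kind can be split back into the OSI key fields with a short Python sketch. It assumes the fixed-width layout shown in section 13 (two-digit year, month and day, one C/P character, five strike digits and three decimal digits), with everything before those 15 characters treated as the root symbol. The names `OsiSymbol` and `parse_compact_osi` are hypothetical, and other redistributors may display the key differently.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OsiSymbol:
    root: str
    year: int       # two-digit year as displayed
    month: int
    day: int
    call_put: str   # 'C' or 'P'
    strike: Decimal


def parse_compact_osi(symbol: str) -> OsiSymbol:
    """Parse a compact symbol such as 'AAPL100821C00200000'."""
    if len(symbol) <= 15:
        raise ValueError(f"symbol too short: {symbol!r}")
    # The last 15 characters are fixed width: YYMMDD + C/P + 5 + 3 digits
    root, tail = symbol[:-15].strip(), symbol[-15:]
    call_put = tail[6]
    if call_put not in ("C", "P") or not (tail[:6] + tail[7:]).isdigit():
        raise ValueError(f"unexpected symbol layout: {symbol!r}")
    # Whole-dollar part plus the three-digit decimal part (thousandths)
    strike = Decimal(tail[7:12]) + Decimal(tail[12:15]) / 1000
    return OsiSymbol(root, int(tail[0:2]), int(tail[2:4]), int(tail[4:6]),
                     call_put, strike)
```

Using `Decimal` rather than a float keeps strikes such as 132.5 exact, which matters when symbols are used as lookup keys.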

15. Processing the OPRA Tick Data

The data is delivered in the standard native OPRA format, so once subscribers have access to the data they will need to understand how to process it. Most subscribers today have experience processing end-of-day data that is delivered as flat files.

This offering has two services: the full OPRA daily tick file and the OPRA end-of-day (EOD) summary file. There are three separate directories on the FTP server, and subscribers are entitled to access the directories that correspond to their subscription. A subscriber with access to all three directories would see the following:

Up to five days of the historical OPRA tick data can be downloaded from an FTP server located at ftp://reports.ise.com/, where the daily folders are intuitively labeled. Subscribers will see the following representation for each day:


As of May 2, 2011, there will be 96 separate files made up of:

48 separate zipped files of the OPRA tick data

48 corresponding md5 hash files that are used for checking download integrity

A view of a subset of the 96 files for OPRA lines 1-12 would be as follows:

15.1 Hot Data Daily Retransmission Files

There is also another directory called 'HotData_Retrans', which is only available for the full tick data service. In the event that there were any processing interruptions during the day, the ISE will request a retransmission of the missing messages after the market close. There will be a separate retransmission file for each of the 48 OPRA lines, whether there were gaps or not. Subscribers should download and process these files every day.

Each daily folder is intuitively labeled and within each folder there will be 96 separate files made up

of:

48 separate zipped files of the OPRA tick data

48 corresponding md5 hash files that are used for checking download integrity


A view of a subset of the 96 retransmission files for OPRA lines 1-15 would be as follows:

16. Steps for Processing Historical OPRA Tick Data

16.1 User Name and Password

New subscribers will need to execute appropriate paperwork and will then be assigned a user name

and password. The subscriber will then download the appropriate data after logging onto the FTP

server which is located at: ftp://reports.ise.com.

16.2 Download

Once the download is complete, the subscriber will need to uncompress the files. Some of the data files may be larger than 2 GB, so unzip programs for UNIX or LINUX will need to have 64-bit internal file pointers to successfully decompress the files. Windows users running WinZip will need version 9.0 or higher. The built-in Windows decompression routines will not be able to uncompress large files.

16.3 Integrity Check

These are large files, and it is possible for a subscriber's connection to be interrupted while downloading a file. The ISE produces an MD5 hash file for each OPRA capture file, which can be used to verify the integrity of each file. Subscribers should therefore check each downloaded file to be sure that it is identical to the file that was created and loaded onto the FTP server. This is done by computing an MD5 hash locally and comparing it to the MD5 hash value provided by the ISE. If the MD5 hashes match, then no data was lost during the download process. See Appendix A for steps to perform this check.

16.4 OPRA Data Formats

Once the files are uncompressed they will be in the native OPRA ASCII FAST or OPRA ASCII format

(data prior to April 2008 is OPRA ASCII and subsequent files are in OPRA ASCII FAST). The OPRA

FAST packets must be decoded before processing the OPRA ASCII messages. Fortunately OPRA

provides an off-the-shelf decoder.

16.5 Decoding OPRA ASCII FAST – As of April 24, 2008

The subscriber will need to decode the data in order to process it, and OPRA provides a FAST decoder that will decode the OPRA FAST packets into the native OPRA ASCII messages. The decoder needs to know how big each packet is, so the ISE affixes a two-byte packet size before each packet of data. The subscriber's application must first read the packet size and then read the packet itself in order to decode the file. Once the FAST packets have been decoded, the resulting data can be processed as described in the OPRA Data Recipient Interface Specification.
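The read-size-then-read-packet loop can be sketched in Python as follows. This is an illustration of the file framing only and does not perform any FAST decoding. It assumes the two-byte size is an unsigned integer in network (big-endian) byte order, which is consistent with the ntohs() call used in the decoder edits in Appendix B; the function name `iter_packets` is hypothetical.

```python
import struct
from typing import BinaryIO, Iterator


def iter_packets(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each still-encoded FAST packet from an open capture file."""
    while True:
        header = stream.read(2)
        if not header:
            return  # clean end of file
        if len(header) < 2:
            raise ValueError("truncated packet size")
        # Assumed: two-byte unsigned size in network byte order
        (size,) = struct.unpack(">H", header)
        packet = stream.read(size)
        if len(packet) < size:
            raise ValueError("truncated packet body")
        yield packet
```

A caller would open the uncompressed capture file in binary mode and pass each yielded packet to a FAST decoder. A truncated size or body raises an error rather than being skipped silently, since it usually indicates an incomplete download.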

Since the ISE affixes the two byte packet size, the subscriber will need to make some minor edits to

the decoder to correctly decode the files provided by the ISE. The instructions and the link to

download the decoder for LINUX/UNIX users are in Appendix B. The instructions and the link to

download the decoder for Windows users are in Appendix C.

16.6 Processing OPRA ASCII

OPRA data consists of messages (quotes, trades, open interest, etc.) made up of ASCII characters. The format of each of the OPRA message types is given in the OPRA Data Recipient Interface Specification. Subscribers will need to write a parser to be able to read the messages and should refer to the OPRA Data Recipient Interface Specification:

http://www.opradata.com/specs/data_recipient_interface.pdf

17. Trouble Shooting FTP Access Queries

The following is a list of problems that customers sometimes encounter when trying to access the FTP server and download the HOT Data files, along with suggested recommendations. ISE will update this section if a description of the identified problem is sent to Geralyn Endo at gendo.com.

Problem description and suggested recommendation

1) Customer cannot connect to FTP site

Customer must have an account set up and must be configured for “Passive FTP” mode.

If customer is in “Passive FTP” mode and still cannot connect to the FTP server, customer should contact the ISE's support desk at 877 473-9989. The support desk should make sure the customer is trying to access ftp://reports.ise.com.

If the ISE can connect but the customer cannot connect to the FTP server, the customer may have a network or firewall issue and should seek internal support.

2) Customer cannot log onto FTP server (gets login / username prompt but there is a problem with username and password)

The support desk will first confirm whether they can connect to the FTP server using the ISE's username and password.

The support desk will then confirm whether they can connect to the FTP server using the subscriber's username and password (see section 16.1 of this guide).

If the support desk can connect then there may be a problem with the customer's account. The support desk will check the permissions for the subscriber on ftp://reports.ise.com.

3) Customer can log in to FTP site but cannot see any files

Customer may be looking in the wrong directory, or there are permission problems preventing customer from accessing the assigned directories. Customer should contact the support desk.

4) Customer can see files but cannot download file

Customer may not be issuing the correct ftp 'get' command or may be using ASCII mode instead of BINARY transfer mode. It may also be possible that the customer does not have enough disk space.

The support desk may ask the customer to try to connect to and download data from a different FTP site (such as ftp.ucsd.edu or ftp.columbia.edu, both of which provide anonymous access).

Also note that the file names on ISE's web site are in mixed upper and lower case letters.

Note the Windows FTP program will first save to a temp folder then move/rename the file to the destination folder. This temp folder is under the Windows profile in “c:\documents and settings” and there can be problems if the file to be downloaded is larger than 2 GB.

5) Customer can download file but cannot UNZIP the files or read the files

A) When a customer cannot unzip a file it is generally due to the size of the files, which can be larger than 2 GB. Recommend the following:

For Windows customers: The UNZIP program should be WinZip 9.0 or later. Ask customers what version of WinZip they are running.

For Unix or Linux customers: The unzip program will need to have 64-bit internal file pointers to successfully decompress the files. Alternatively, try using commercial WinZip for Linux/Unix or try 7-zip (http://www.7-zip.org/).

B) Customer may be using the ASCII FTP transfer mode instead of BINARY transfer mode.

C) Another issue associated with UNZIP problems can be that the download was incomplete (e.g., terminated due to lack of disk space or network failure, download was done using ASCII transfer mode, etc.). Customers should compare their local MD5 hash with the MD5 hash file they downloaded to ensure their download was complete (see Appendix A). If the MD5 hashes do not match then customer should ensure they have enough disk space to download files.

6) Customer can download and unzip file but cannot see any data

Prior to April 2008 the OPRA data format was ASCII. OPRA data from April 2008 onward is encoded in OPRA FAST format, and the data files must first be decoded using the FAST decoder. Refer to the section on “OPRA FAST Encoding” in sections 11.2 and 11.3 of this guide. Also refer to Appendix B for Unix/Linux users and Appendix C for Windows users.

7) Customer can download, unzip and decode FAST data but still cannot see any data or data comes out in one long stream

OPRA ASCII data has no carriage returns (line feeds or end of line characters) and therefore cannot easily be displayed or loaded into a database or spreadsheet. The OPRA ASCII data must be parsed according to the OPRA Data Recipient Interface Specification.

8) Customer can download, unzip, decode OPRA FAST and parse OPRA ASCII data, but cannot match up OPRA root symbols to underlying instruments

For data prior to the completion of the Options Symbology Initiative (refer to section 13 of this guide), the customer needs to use the underlyingopracodemap.csv file to map OPRA root symbols to underlying instruments. Refer to the “Option Root Symbol” section (12.1) of this guide.

9) Customer cannot find any underlying price data in the OPRA feed

OPRA does not contain underlying equity or ETF price data. There is some underlying index data in the OPRA “Underlying Value” messages but it is inconsistently used and should not be relied upon.
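Items 1 and 4 above (passive mode and BINARY transfer mode) can be illustrated with Python's standard ftplib module. This is a minimal sketch rather than an ISE-supplied tool: the function names are hypothetical, and the user name, password and file names passed in must come from the subscriber's own account.

```python
from ftplib import FTP


def download_binary(ftp, remote_name: str, local_path: str) -> None:
    """Fetch one file in passive mode using a binary transfer."""
    ftp.set_pasv(True)  # passive mode, as required in item 1
    with open(local_path, "wb") as out:
        # retrbinary() transfers in binary (image) mode, which avoids the
        # corruption that an ASCII-mode transfer causes for ZIP files
        ftp.retrbinary(f"RETR {remote_name}", out.write)


def fetch(host: str, user: str, password: str,
          remote_name: str, local_path: str) -> None:
    """Log in and download a single file; all arguments are caller-supplied."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        download_binary(ftp, remote_name, local_path)
```

Because the remote name is passed through unchanged, it must match the mixed upper and lower case of the file name exactly as it appears on the server.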

18. Frequently Asked Questions

The following list of frequently asked questions (FAQs) addresses the majority of subscribers' questions. ISE will update this list if frequently asked questions are sent to Geralyn Endo at gendo.com.

18.1 Does HOT Data contain all OPRA data?

The HOT Data files include all OPRA trades and quotes except for foreign currency options (FCOs),

from all participating OPRA exchanges.

18.2 How far back does your historical tick data go?

The ISE currently provides OPRA tick history from June 1, 2005 to the present.

18.3 What is the actual data content?

The ISE collects the full OPRA A/B broadcast, except for foreign currency options, from

approximately 6:00 a.m. to 5:55 p.m. ET.


18.4 What is the delivery format?

The ISE collects and delivers data in the standard OPRA format. The data is split up over a number of separate files in alphabetical order. Each file is compressed to reduce delivery bandwidth and storage requirements. For a complete list of available fields, please refer to Section 7, Field Descriptions, of the OPRA Data Recipient Interface Specification at www.opradata.com/specs/data_recipient_interface.pdf.

18.5 What is the size of the daily file?

For all of 2010, the average daily file size was approximately 40 GB compressed, or 75 GB uncompressed. However, since OPRA began disseminating data over 48 lines in May 2011, the average daily file size has been approximately 70 GB compressed, or 150 GB uncompressed.

18.6 What is the total size of all the historical data?

The complete set of historical files is in excess of 100 TB uncompressed. The annual totals from June 2005 through

December 2010 are as follows:

2005: 1.5TB compressed, 5.8TB uncompressed (June-December)

2006: 3.9TB compressed, 16.2TB uncompressed

2007: 5.9TB compressed, 24.2TB uncompressed

2008: 10.3TB compressed, 25.3TB uncompressed

2009: 8.5TB compressed, 15.8TB uncompressed

2010: 10.1TB compressed, 19.2TB uncompressed

The monthly totals are listed at www.ise.com/hotdata under the “Monthly File Sizes” tab.

18.7 How do I place an order for historical data?

Each new request requires an executed license agreement and order form. Subsequent requests will only require an executed order form. Please contact Geralyn Endo for the required paperwork.

18.8 What are the delivery methods for the historical options tick data?

There are three methods:

1. A subscriber can download up to five days of the daily OPRA tick history files from an FTP server.

2. For all OPRA tick history files that are not available on the FTP server, the data will be

delivered:

(a) on a portable hard drive to subscriber for a separate one-time fee

(b) over a cross connect for subscribers that have a direct connection to the ISE or are

collocated at our primary data center located in Secaucus, NJ (Equinix)

18.9 How long will it take to get access to the FTP server to begin downloading data?

A subscriber will receive a user name and password within 24 hours of the ISE receiving the

executed paperwork.

18.10 How long will it take to prepare and deliver the data for an ad-hoc request?

The delivery time for ad-hoc requests depends on the amount of data and when we receive the

order. If we receive an order before 10 a.m. (ET), the following delivery times can be expected: 1-3

months of data – approximately 4 business days; 4-6 months of data – approximately 6 business

days; 7-9 months of data – approximately 8 business days; the full data set – approximately two

weeks.


18.11 What delivery/storage mechanisms are used to deliver the data for ad-hoc requests?

The ISE currently uses portable hard drives with eSATA and USB connections. There is a separate one-time fee for each hard drive required for the order, and the hard drives are retained by the subscribers for back-up.

18.12 How do I process this data?

Some OPRA feed handlers may have the capability to read this data. Keep in mind that a real-time

feed handler is configured to read multicast data but the HOT Data is delivered as flat files, so a

configuration change may be required for the real-time feed handler to read the flat file.

Alternatively, an OPRA parser can be written using the OPRA Data Recipient Interface Specification.

Subscribers should refer to this guide.

18.13 Is this data cleansed or filtered in any way?

We capture the raw OPRA feed and do not impose any judgmental cleansing, filtering criteria or conflation on the data, since such processing can distort the results of back testing.


Appendix A: Ensuring the integrity of data files

Each of the data files corresponding to the 48 OPRA data lines is compressed using the ZIP

compression algorithm and the subscriber will need to uncompress the files. Because the data files

may each be larger than 2 GB, any unzip program for UNIX or LINUX will need to have 64-bit

internal file pointers to successfully decompress the files. Windows users running WinZip will need

version 9.0 or higher.

The integrity of the data within a ZIP file is ensured by a Cyclic Redundancy Check (CRC) code that is created by the ZIP application and stored within the ZIP file. To assess the integrity of the data, WinZip (or any other program that can unzip these files) checks the CRC code against the data stored inside the ZIP file and reports any inconsistencies. Creating the CRC while compressing, and checking the data against the CRC while decompressing, are both done automatically by the ZIP program.
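The CRC check can also be run explicitly, without extracting the data, using Python's standard zipfile module, which reads archives that use the ZIP64 extensions needed for members larger than 2 GB. This is a minimal sketch; the function name `first_corrupt_member` is hypothetical.

```python
import zipfile
from typing import Optional


def first_corrupt_member(zip_path: str) -> Optional[str]:
    """Return the name of the first member failing its CRC check, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and verifies its CRC and header
        return archive.testzip()
```

A return value of `None` means every member passed its CRC check. Note that testzip() reads the whole archive, so on a multi-gigabyte capture file it takes roughly as long as a full decompression.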

The integrity of file transfers can be checked using the MD5 hash signature created by ISE for each zipped file. Each zipped data file has a corresponding MD5 hash file that is created by ISE. These files have a “.md5” filename extension. Clients should download both the ZIP file and the MD5 file, compute their own local MD5 hash value on the downloaded ZIP file, and then compare ISE's MD5 hash with their locally computed MD5 hash. If these match, then the integrity of the file transfer has been assured.

Below is an example set of steps that can be followed to ensure file transfer integrity:

1) Original data file created by ISE:

FeedCapture.233.43.202.9_11109__20091208__.dat

2) Zipped data file created by ISE:

FeedCapture.233.43.202.9_11109__20091208__.dat.zip

3) MD5 Hash of ZIP file created by ISE:

FeedCapture.233.43.202.9_11109__20091208__.dat.zip.md5

4) Contents of MD5 hash file created by ISE: 2e809aa4b314ef95054ccf3e38a37268

5) ZIP file and MD5 hash file transferred to client location

6) Client creates their own MD5 hash file on the ZIP file that has been downloaded

7) MD5 of downloaded ZIP file: FeedCapture.233.43.202.9_11109__20091208__.dat.zip.hash

8) Contents of local MD5 hash file: 2e809aa4b314ef95054ccf3e38a37268

9) Client compares their local MD5 hash with contents of MD5 hash file downloaded from the

ISE. If the two hash values match then integrity of the file transfer is assured.
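Steps 6 through 9 can be sketched in Python with the standard hashlib module. The sketch assumes the “.md5” file holds the hexadecimal hash as its first whitespace-separated token, as in the example contents shown in step 4; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte files
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(zip_path: str, md5_path: str) -> bool:
    """Compare the local hash of zip_path with the hash stored in md5_path."""
    with open(md5_path, "r") as handle:
        tokens = handle.read().split()
    if not tokens:
        raise ValueError(f"no hash found in {md5_path!r}")
    # Assumed layout: the hex digest is the first token in the file
    return md5_of_file(zip_path) == tokens[0].lower()
```

If `matches_md5_file` returns `False`, the ZIP file should be treated as incomplete or corrupted in transfer.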


Appendix B: Working with the OPRA FAST Decoder for Unix/Linux Users

The OPRA FAST decoder will decode the OPRA FAST packets into the native OPRA ASCII messages.

We have tested the OPRA decoder with sample data. Since the decoder is designed to read multicast

traffic, we found that some small edits to the code are required to correctly decode the HOT Data™

files provided by the ISE.

1) Download the OPRA FAST decoder from:

http://www.opradata.com/specs/FASTforOPRA_Decode_2.zip

2) Use Unzip to decompress this file.

3) One of the files is named 'fast_main.c' and this is the file that requires several changes to operate properly with the ISE data files.

4) Edit the 'fast_main.c' file and make the following three changes (a-c):

a) On line 150

Replace:

unsigned int rec_len = 0;

with

unsigned short rec_len = 0;

b) On line 178

Replace

while (fread(&rec_len, sizeof(unsigned int), 1, fp1) != 0)

with

while (fread(&rec_len, sizeof(rec_len), 1, fp1) != 0)

c) On line 180

Replace

rec_len = ntohl(rec_len);

with

rec_len = ntohs(rec_len);

In March 2009, OPRA made some improvements to optimize the packet size, which greatly reduced the number of FAST packets. As a result, there are two more changes to make to the decoder:

5) Edit the 'fast_api.h' file and make the following change:

On line 82

Replace:

#define MAX_MSG_SIZE 2048

with

#define MAX_MSG_SIZE 8196


6) Edit the 'fast_process.h' file and make the following change:

On line 16

Replace:

#define PACKET_SIZE 1024 // Opra Packet Size

with

#define PACKET_SIZE 8192 // Opra Packet Size

7) Compile the program using the supplied Makefile. To work with files larger than 2GB, consider

adding the -D_FILE_OFFSET_BITS=64 compiler flag.

Appendix C: Working with the OPRA FAST Decoder for Windows Users

The OPRA FAST decoder will decode the OPRA FAST packets into the native OPRA ASCII messages.

We have tested the OPRA decoder with some sample data. Since the decoder is designed to read

multicast traffic, we found that some small edits to the code are required to correctly decode the

HOT Data™ files provided by the ISE.

1) Download the decoder from: http://www.opradata.com/specs/FASTforOPRA_Decode_2.zip

2) Use WinZip to decompress this file.

3) Extract the files into a separate folder. One of the files is named 'fast_main.c' and this is the file that requires several changes to operate properly with the ISE data files.

4) Edit the 'fast_main.c' file and make the following three changes (a-c):

a) On line 150

Replace:

unsigned int rec_len = 0;

with

unsigned short rec_len = 0;

b) On line 178

Replace

while (fread(&rec_len, sizeof(unsigned int), 1, fp1) != 0)

with

while (fread(&rec_len, sizeof(rec_len), 1, fp1) != 0)

c) On line 180

Replace

rec_len = ntohl(rec_len);

with

rec_len = ntohs(rec_len);

On March 30, 2009, OPRA made some improvements to optimize the packet size, which greatly reduced the number of messages, so there are two more changes to make to the decoder:

5) Edit the 'fast_api.h' file and make the following change:

On line 82

Replace:

#define MAX_MSG_SIZE 2048

with

#define MAX_MSG_SIZE 8196

6) Edit the 'fast_process.h' file and make the following change:

On line 16

Replace:

#define PACKET_SIZE 1024 // Opra Packet Size

with

#define PACKET_SIZE 8192 // Opra Packet Size

7) On Windows platforms, several other changes will need to be made to various source code files:

a) Remove the #include <unistd.h> directives from included files

b) Certain variables were declared mid-function, despite this being a "C" program. Move

these declarations to the top of their respective functions.

c) The compiled program must be linked with WS2_32.lib

8) Compile the program.