Query Issues in Continuous Reporting Systems

Graham Gal

Dept of Accounting and Information Systems
The Isenberg School of Management
University of Massachusetts
Amherst, MA 01003

[email protected]


Abstract

Investors have always argued that more and better information will improve their decisions. Regulators have responded to these requests by searching for ways to make information more readily available. One possible solution is to move toward real-time reporting. Current technology could certainly provide an expanded set of corporate information that is more current, or even continuously available. However, as the time between an event and the reporting of that event shrinks, several issues arise. These include the information to be disclosed, its level of detail, the time lag, and the methods available to query the information. In addition, information usage, information adequacy, and materiality emerge as concerns. Although these issues are not currently discussed in the accounting literature, they can benefit from lessons learned in querying statistical databases, such as those containing U.S. Census and hospital information. Methodologies that restrict queries through the use of inference channels, inductive learning, and query history are proposed as having implications for continuous reporting.


I Introduction

The U.S. Securities and Exchange Commission has become increasingly interested in improving financial reporting. One improvement under consideration involves greater use of technology to give investors access to more data on a timelier basis¹. The SEC's interest in interactive data (SEC, 2007) was followed by a committee report to the SEC (SEC, 2008) on improving financial reporting, which called for increased use of corporate websites to provide more current information. The ultimate goal is to provide better information in a more continuous reporting environment. The report also calls for increased use of XBRL tagging (XBRL International) "… to facilitate the ability of investors to more easily access comparative arrays of company information" (SEC, 2008, p. 3). The XBRL Global Ledger taxonomy (XBRL-GL, XBRL International) consists of tags for transaction-level information, while the XBRL-FR (Financial Reporting) taxonomies consist of tags for traditional financial statement elements. Tagging information down to the transaction level, as envisioned by XBRL-GL (XBRL International), would give investors access to information about corporate activities at the most detailed level. While current discussions do not call for information at this level of detail, several groups have called for corporate information to be reported in a form that takes advantage of emerging technologies (Litan & Wallison, 2000; Xiao, Jones, & Lymer, 2002). Because current technology can provide more detailed and more current information, it is incumbent on academics and practitioners to anticipate the issues that would arise under these different reporting environments. An SEC mandate would certainly provide a more immediate path toward financial information that is machine readable and more current.

¹ Section 409 of the Sarbanes-Oxley Act (SOX, 2002) seems to mandate this, and achieving this level of disclosure will probably require technological solutions.

The ability to provide a larger set of information about a firm on a timelier basis is now a realistic possibility. However, a number of issues must, at a minimum, be considered before such mandates are put in place. The purpose of this paper is to examine the implications of this technology for investors, auditors, and managers. Section II describes in more detail the characteristics that define continuous reporting environments. Section III discusses implications of this type of reporting environment, including information usage, audit concerns, and the disclosure of sensitive information. Section IV examines techniques the US Census Bureau has used to restrict access to sensitive detailed data. Although these methods protect certain types of sensitive information, some differences make them incompatible with the requirement to provide relevant information to investors, so Section V considers alternative approaches to protecting sensitive data. The paper concludes by discussing the implications of these methods and suggesting areas for further investigation.

II The Continuous Reporting / Assurance Environment

Characteristics

If corporate websites become an important, or even the predominant, vehicle for disclosing information about the firm, several variables could be used to define the important characteristics of the environment: the type of information disclosed, its level of detail, the time lag, and the method of obtaining the information. This section discusses each of these variables and its impact on how investors will obtain information and on the level of assurance that can be given to the information provided.

Expanded Information Set

Under the current reporting environment, the information to be disclosed is clearly defined, as is its location within the financial statements. Current reports contain information organized around categories with accepted structures and definitions; for instance, sales has a clear definition under GAAP. Other types of information also have defined locations, and there are rules for making them available. The report to the SEC (2008) discusses non-GAAP information and indicates that it is an issue that needs to be considered. Investors clearly obtain a great deal of information about firms from other sources, and placing this information on a corporate website would give it a different level of credibility and might imply some assurance about it.

Expanded Level of Detail

Related to the type of information is its level of detail. Even if the categories of disclosure were to remain the same (Sales, Accounts Receivable, Cost of Goods Sold, etc.), the level of detail could be expanded. A sales figure could be broken down by individual customer, date, or even invoice. The XBRL-GL taxonomy would presumably allow this level of detail to be constructed and would also allow the detailed information to be aggregated around investor-defined categories. This would put investors in a different position, as they could obtain not only financial statement information but also a complete breakdown of the individual items that make up each account. If this level of detail were available for an expanded set of information, investors would essentially have data about the operations of the firm. For instance, current reports give investors little information about raw material purchases. If the type of information were expanded and provided in more detail, investors could recreate the purchase activity for each raw material. Such a reporting environment does not exist today, but it could be supported with current technologies.
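To make the aggregation idea concrete, here is a minimal Python sketch that assumes hypothetical transaction-level sales records. The `SaleLine` fields, invoice numbers, customers, and amounts are invented for illustration and are not XBRL-GL element names. The sketch shows one detailed data set rolled up into the familiar statement total and into two investor-defined categories.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleLine:
    """Hypothetical transaction-level sales record (not an XBRL-GL schema)."""
    invoice: str
    customer: str
    day: date
    amount: int  # whole currency units, to keep the arithmetic exact


def aggregate(lines, key):
    """Roll detailed lines up around any investor-chosen category."""
    totals = defaultdict(int)
    for line in lines:
        totals[key(line)] += line.amount
    return dict(totals)


sales = [
    SaleLine("INV-001", "Acme", date(2024, 1, 5), 1200),
    SaleLine("INV-002", "Bolt", date(2024, 1, 5), 800),
    SaleLine("INV-003", "Acme", date(2024, 2, 9), 500),
]

statement_total = sum(line.amount for line in sales)           # 2500
by_customer = aggregate(sales, key=lambda line: line.customer)  # {'Acme': 1700, 'Bolt': 800}
by_month = aggregate(sales, key=lambda line: line.day.month)    # {1: 2000, 2: 500}
```

The statement-level figure is only one of many possible roll-ups. Once the detail is exposed, the choice of category passes from the firm to the investor.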

Time Lag

Another variable, arguably the primary characteristic of any continuous reporting environment, is the time lag between events and the ability to obtain information about them. Under the current reporting environment, financial information is reported at discrete points in time and contains aggregations of the events that occurred within the period covered by the report. A quarterly financial report therefore contains a sales figure that includes events from the previous three months. Even if the quarterly report were available on the last day of the quarter, it would contain events concluded ninety days before its release. Under a continuous reporting model, the time lag between an event and the availability of information about it would be reduced, in the limit to zero. If a zero time lag is combined with an increased level of detail, the result is a reporting environment in which investors could obtain information about individual sales or purchases as they occur. This truly continuous reporting environment has important implications for the information investors obtain.
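The ninety-day figure can be checked with a short Python sketch. It assumes day-level granularity and three invented event dates within the first quarter of 2024. It measures the age of every event included in a report produced on a given date, comparing one quarter-end report with a query made mid-quarter.

```python
from datetime import date


def report_lags(event_dates, as_of):
    """Age in days of each event included in a report produced at `as_of`."""
    return [(as_of - d).days for d in event_dates if d <= as_of]


# Hypothetical sales events within the quarter January 1 to March 31, 2024.
events = [date(2024, 1, 1), date(2024, 2, 15), date(2024, 3, 31)]

# Periodic model: one report released on the last day of the quarter.
quarterly = report_lags(events, as_of=date(2024, 3, 31))  # [90, 45, 0]

# Continuous model: a query on February 15 already sees every event to date,
# and the newest event it returns has a lag of zero.
mid_quarter = report_lags(events, as_of=date(2024, 2, 15))  # [45, 0]
```

Under the periodic model, the oldest event in the report is ninety days old when the report is released. Under the continuous model, the lag depends only on when the investor chooses to ask.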

Query Language

Such a truly continuous reporting environment would place investors in a very different position as they search for information to make their decisions. Under the current reporting environment, investors are provided with what has been considered a complete set of financial information. Investors are free to search for additional information, but the standard financial reports have a familiar content and format. If firms disclose information with a zero time lag between occurrence and availability, it would no longer make sense to rely on traditional financial statements, even if they reflected moment-by-moment changes. The traditional financial report would probably still be available, but a more detailed and expanded information set would require investors to formulate requests for information based on their own perceptions of relevance. This change from a standard report to a user-developed report would require corporate websites to provide a query language. Two distinct but related constructs of a query language will affect investors' ability to request information.

The first is the structure of the query language. The structure determines whether users can formulate their questions in a form close to natural language, for example "What were sales for the company?", or must use formulations closer to a language like SQL². The second construct is the language's capability to manipulate data. Statistical capabilities belong here: for example, "What are average sales for company X?" requires the query system to perform a statistical calculation. The query system would also need time constructs. Investors must be able to indicate a time frame, so references such as "last quarter" or even "yesterday" might be allowed³. These two constructs of a query language, structure and capabilities, are primary determinants of investors' ability to obtain information in a continuous reporting environment.

² SQL (Structured Query Language) is a standard language for obtaining information from databases. QBE (Query By Example) requests are translated into SQL statements. Even though QBE is a fairly restricted language, there are instances in which the translation to SQL becomes a problem.

³ Time parameters could come from a predetermined list, but then some of the "continuous" nature of events would be lost.
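The two constructs can be illustrated with a minimal Python sketch that uses the standard `sqlite3` module and an in-memory database. The `sales` table, its rows, and the `translate` function are hypothetical. The sketch only recognizes a tiny phrase vocabulary, not real natural language. It maps the word "average" to a statistical capability (`AVG`) and resolves the relative time phrases "last quarter" and "yesterday" against the date of the request.

```python
import sqlite3
from datetime import date, timedelta


def previous_quarter(today):
    """First and last day of the calendar quarter before the one containing `today`."""
    this_start = date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)
    end = this_start - timedelta(days=1)
    start = date(end.year, 3 * ((end.month - 1) // 3) + 1, 1)
    return start, end


def translate(request, today):
    """Map a tiny, hypothetical phrase vocabulary onto parameterized SQL."""
    text = request.lower()
    # Capability construct: which statistic the request asks for.
    # Only these fixed strings are ever placed into the SQL text.
    if "average" in text:
        statistic = "AVG(amount)"
    elif "sales" in text:
        statistic = "SUM(amount)"
    else:
        raise ValueError(f"cannot translate: {request!r}")
    # Time construct: resolve relative phrases against the request date.
    if "last quarter" in text:
        start, end = previous_quarter(today)
    elif "yesterday" in text:
        start = end = today - timedelta(days=1)
    else:
        raise ValueError(f"no time frame in: {request!r}")
    sql = f"SELECT {statistic} FROM sales WHERE sale_date BETWEEN ? AND ?"
    return sql, (start.isoformat(), end.isoformat())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (invoice TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("INV-001", "2024-01-05", 1200), ("INV-002", "2024-02-20", 800),
     ("INV-003", "2024-04-02", 500)],
)

today = date(2024, 4, 3)
for request in ("What were sales last quarter?",
                "What are average sales last quarter?",
                "What were sales yesterday?"):
    sql, params = translate(request, today)
    print(request, "->", conn.execute(sql, params).fetchone()[0])
```

Even this toy translator has to commit to interpretations, such as calendar quarters and the server's own date. Those choices are exactly where a richer natural-language front end could diverge from what the investor meant.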


This section has examined four variables that would define any new reporting environment: an expanded information set, more detailed transaction-level information, the time lag between an event and its availability, and query language constructs. The lag between an event and its availability is the central characteristic of any "continuous" reporting, but the types of information and the method used to obtain them are also important in describing this environment. The next section examines some of the audit and usage issues implied by changes in each of these characteristics.

III Issues for a Continuous Reporting Environment

Information Use

If corporate websites become the predominant source of financial information, the information they carry becomes critical. Investors would be substituting web-based reports for traditional ones. Even if the financial report itself were unchanged, a delay in disclosure would be critical, because even alternative sources would presumably obtain their information from the disclosures on the website. Under the current reporting model, even quarterly financial reports are anticipated and largely confirm investor expectations. A continuous reporting environment would not allow for this anticipation, because information is released ever closer to the underlying events. These changes in the way information is disclosed mean that concepts such as "insider trading", "materiality", and "adequate disclosure" would need to be redefined.

Adequate Disclosure and Insider Trading

An occasional problem with a firm's website is currently an annoyance that may result in lost sales or dissatisfied trading partners. When investors rely on timely data as part of their investment decisions, however, any delay in access to the site changes the disclosure of the information. As the time between the underlying business events and the availability of information about them shrinks, a person who is able to obtain the data would be in a very powerful position compared with someone who cannot, and the auditor faces a different set of problems. If financial information is delayed because the website is inaccessible, whether intentionally, as with a denial-of-service attack, or accidentally, as with a technical problem, a question of liability arises (Richardson & Scholz, 2000). Some questions that need thoughtful answers include:

• Has the firm taken adequate or reasonable steps to address technical problems that might hinder access to the website?

• If a person within the firm trades on information disclosed on the website and that information subsequently becomes unavailable, is that person guilty of insider trading?

• When has the requirement for adequate website disclosure been met? After ten minutes without interruption? After one day?

• Must the disclosure be uninterrupted for the entire period?

Clearly, these questions need to be answered to give auditors guidance. At a minimum, auditors will need to understand the methods that can be used to make a website inaccessible and then decide whether the firm has taken reasonable steps to protect against them. Even if the corporate website never has problems and the information is always available, the query system that responds to investor requests presents its own audit issues.
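The question of when an "uninterrupted" disclosure requirement has been met can be made concrete once a rule is chosen. The Python sketch below assumes a hypothetical availability log of (down, up) outage intervals and takes the required window as a parameter, such as the ten minutes or one day mentioned above. It computes the earliest moment at which a disclosure has been continuously reachable for that window. The log format and thresholds are illustrative assumptions, not an established standard.

```python
from datetime import datetime, timedelta


def disclosure_met_at(published, outages, window):
    """Earliest moment an item has been reachable for `window` without interruption.

    `outages` is a list of (down, up) datetime pairs from a hypothetical
    availability log; `window` is whatever minimum a policy might require.
    """
    clock = published  # start of the current uninterrupted run
    for down, up in sorted(outages):
        if up <= clock:
            continue  # outage ended before the current run began
        if down - clock >= window:
            break  # the run reached the window before this outage started
        clock = max(clock, up)  # run broken; restart when the site is back up
    return clock + window


published = datetime(2024, 4, 10, 9, 0)
outages = [(datetime(2024, 4, 10, 9, 5), datetime(2024, 4, 10, 9, 12)),
           (datetime(2024, 4, 10, 9, 30), datetime(2024, 4, 10, 9, 31))]

ten_minute_rule = disclosure_met_at(published, outages, timedelta(minutes=10))  # 09:22 same day
one_day_rule = disclosure_met_at(published, outages, timedelta(days=1))         # 09:31 next day
```

The same outage log satisfies a ten-minute rule within the hour but pushes a one-day rule into the following day. This illustrates why the choice of threshold matters to both auditors and insiders who trade on the information.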

Materiality Issues


Under the current reporting environment, firms clearly understand their requirement to produce accurate information that is free of material errors. Investors looking at this information can choose which parts to use, but the continuous reporting environment presents them with more options and opportunities. When an investor is confronted with a query system that allows (or requires) information requests to a corporate website, the ability to formulate questions, or queries, becomes an important issue.

Review of Queries

The constructs needed for a query language, structure and capabilities, were identified earlier. Together they give an investor the tools to request information from the firm's website. However, the auditor now faces new issues, because the query system is in effect producing financial reports. The primary problem arises because the investor's natural-language request must be translated into the actual query. This translation is not a trivial process, and it becomes more complicated as the query system allows requests that are closer to English statements. In a search engine⁵, an inappropriate translation of a request may only produce a response that contains more or fewer sites than a perfectly translated query would have returned. By contrast, an incorrect response to a query of a firm's continuously reported financial data may produce a data set that is materially different from what the investor wanted. With a move to corporate websites as the primary means of disclosure, there is the potential for a large number of queries and, in effect, a large number of individual reports. Under this continuous reporting environment, it is problematic for auditors to form opinions about each of these reports.

⁵ Search engines return relevant websites based on an interpretation of the request combined with algorithms, sometimes proprietary, that rank the sites matching the request.
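The translation problem can be illustrated with a minimal Python sketch built on the paper's own example of an investor who asks for "income" but means gross income. The ledger figures, measure names, and the two vocabularies are hypothetical. The sketch only shows that the same word, passed through two plausible translation tables, yields different numbers from the same underlying data.

```python
# Hypothetical ledger balances for one period (illustrative numbers only).
LEDGER = {"sales": 10_000, "cost_of_goods_sold": 6_000, "operating_expenses": 3_000}

# Formal measures the hypothetical query system knows how to compute.
MEASURES = {
    "gross income": lambda g: g["sales"] - g["cost_of_goods_sold"],
    "net income": lambda g: (g["sales"] - g["cost_of_goods_sold"]
                             - g["operating_expenses"]),
}


def answer(term, vocabulary, ledger=LEDGER):
    """Translate an investor's word via `vocabulary`, then compute the measure."""
    return MEASURES[vocabulary[term.lower()]](ledger)


intended = answer("income", {"income": "gross income"})  # what the investor meant: 4000
delivered = answer("income", {"income": "net income"})   # what the translator chose: 1000
gap = intended - delivered                               # 3000
```

Nothing in the ledger is wrong. The entire difference comes from the translation step, which is why that step, and not only the data, becomes a candidate object of review.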

Materiality Determination

An auditor who must make materiality judgments looks at the amounts disclosed in the context of the complete report. If investors are creating many individual reports, different levels of materiality would be implied. The information available to the investor depends on the specific characteristics of the continuous reporting environment: the expansion of information to include non-GAAP items, the level of detail available, and the time lag. As these characteristics are extended (more non-GAAP items, more detail, and a shorter time lag), the investor has more options for creating an individual report. The response to a query can be materially inaccurate in several ways:

• The translation of the investor's request into a specific query could be inaccurate because a term is translated inappropriately. For instance, an investor might ask for income when gross income was intended.

• The aggregation or time period requested might be misinterpreted: sales for the quarter instead of sales year to date, or "yesterday" when the request is made from a different time zone.

• With a time lag close to zero, certain events may be excluded from the response.

In each of these situations, it is possible to conclude that there is a material error in the match between the intent of the query and the results provided. This is clearly a different definition of materiality. With the potential for a large number of individual queries, the firm loses the ability to review all of the information released, and the auditor loses the ability to review the appropriateness of the response to each query. The auditor must then decide whether each query fairly represents the user's request and whether the response fairly represents the financial position of the firm with respect to the subset of data provided.
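The time-zone case in the second bullet can be sketched in Python with the standard `datetime` module. The sketch assumes hypothetical UTC-stamped sales, an investor on UTC-5, and a website that resolves dates in UTC. It shows one request for "yesterday" resolving to two different windows and therefore two different totals.

```python
from datetime import datetime, timedelta, timezone


def yesterday_window(asked_at, tz):
    """[start, end) of "yesterday" on the clock of `tz`, expressed in UTC."""
    local = asked_at.astimezone(tz)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(days=1)
    return start.astimezone(timezone.utc), midnight.astimezone(timezone.utc)


def total_in(sales, window):
    """Sum the amounts whose timestamps fall inside the half-open window."""
    start, end = window
    return sum(amount for when, amount in sales if start <= when < end)


# Hypothetical timestamped sales, recorded in UTC.
sales = [(datetime(2024, 4, 8, 12, 0, tzinfo=timezone.utc), 100),
         (datetime(2024, 4, 9, 12, 0, tzinfo=timezone.utc), 250)]

asked_at = datetime(2024, 4, 10, 2, 0, tzinfo=timezone.utc)  # one request
investor_tz = timezone(timedelta(hours=-5))  # assumption: investor on UTC-5
site_tz = timezone.utc                       # assumption: website resolves dates in UTC

investor_view = total_in(sales, yesterday_window(asked_at, investor_tz))  # 100
site_view = total_in(sales, yesterday_window(asked_at, site_tz))          # 250
```

Both answers are arithmetically correct for their own window. The error lies entirely in the match between the investor's intent and the query the system ran.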

Query Translation versus Database Materiality

With a large number of continuously generated reports an auditor would be faced with

determining materiality not of a set of numbers reported at a specific point in time, but a series

of numbers reported at different points in time to different investors. The traditional approach

of looking at the firm’s database at a point in time is no longer available. In the zero time -lag

continuous reporting model the corporate database is only current at the point of the request, not

at the end of the quarter or fiscal year. The concept of overall materiality is lost. If the match

between the users request and the information provided is not perfect, the auditor must first

decide if this difference is material; again in the context of the individual report. The auditor

must determine whether the problem resides in the way the user formulated their request, the

translation of the request into the formal query language, or the subset of information provided.

Significant differences could occur at each point and an auditor in a continuous reporting

environment could be faced with a materiality determination at any of them. Auditors would

now be faced with a requirement to review the specific query language used in conjunction with

the website; a very different level of responsibility. Firms would also be placed in a different

relationship to investors. Under the current reporting environment financial information is

disclosed after some level of review and interpretation by responsible individuals within the


firm. This continuous reporting of financial information also changes the responsibility of the firm when it is not possible to perform a traditional review.⁶

With the most radical change from the current reporting environment, one proposed by the Galileo Disclosure Model (Vasarhelyi and Alles), the amount of information disclosed becomes critical. This disclosure model provides access to more data, at a greater level of detail, with zero time lag, and it requires a continuous determination of the completeness of the information provided. However, this environment raises another important issue: how much information should be disclosed. The next section discusses this new problem of preventing too much information from being disclosed.

Amount of Information Disclosed

The XBRL-GL taxonomies aim to tag corporate information down to the transaction level, which would allow a reporting model in which detailed data is available. When transaction-level detail is combined with a zero time lag, a truly continuous reporting environment is achieved. Investors and analysts could probably make a case that investment decisions would improve if they had more detailed information available; however, there needs to be some review of the level of detail that should be available. If the available data expands to include all of the activities of the firm, investors who want to delve into the intricacies of the firm’s operations would benefit. The technology to move to such a situation is certainly within reach, but there needs to be a critical review of the amount of information that will be provided and of the query system that will provide the interface between investors and the underlying corporate data.

6 This problem was a major concern expressed by some corporate CFOs at the First Continuous Reporting Symposium, held at Rutgers on September 28, 2007.


Disclosure of Sensitive Information

If the reporting model changes to include more of the underlying detail, then a query system will have access to all of the firm’s data. Rather than simply responding to every request for information, such a system must be able to balance requests for information against the need to protect sensitive corporate information. The definition of “sensitive information” may differ depending on the group making the definition, but it is clear that knowledge about the critical activities of the firm needs protection to ensure the firm’s success (Bray, Chellappa, Konsynski, & Thomas, 2007).

Definition of Sensitive Information

Once a precise definition of sensitive information has been made, a related precise specification of when that sensitive information is released must also be made. This requires a complete understanding of what information is actually needed to deduce the sensitive information (Nelson, 1994). For instance, it may be critical to protect the exact sales for each of the company’s locations (sensitive information), while the level of sales for the company as a whole is not sensitive and should be available to investors. The problem is that, in responding to queries about sales grouped by states or regions (or other groupings), if enough queries are asked then the sales of a particular store can be determined (Jonge, 1983). Even an inappropriate response, such as a refusal, can provide unintended information (Sicherman, Jonge, & Van De Riet, 1983). A response of “That information cannot be provided” to a query asking whether a certain project is top secret has the unintended consequence of providing the requested information. This desire to protect “sensitive” information would also expand audit issues, as an auditor would need to be involved in determining what information should be termed “sensitive”, the methods used to determine when this information has been disclosed, and the appropriateness of the techniques used to withhold it.
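The refusal problem can be illustrated with a small sketch. The project names, the `classification` dictionary, and both responder functions are hypothetical; the point is only that a refusal issued only when the true answer is "yes" leaks that answer, while a refusal applied uniformly to every question of the same form does not.

```python
# Hypothetical project classifications held by the firm (the sensitive data).
classification = {"alpha": "top secret", "beta": "public", "gamma": "top secret"}


def naive_responder(project: str) -> str:
    """Answer 'Is <project> top secret?', refusing only when the true answer is yes.

    Because the refusal depends on the secret, the refusal itself leaks it.
    """
    if classification[project] == "top secret":
        return "That information cannot be provided."
    return "No."


def uniform_responder(project: str) -> str:
    """Refuse every classification question, whatever the true value.

    The response no longer varies with the secret, so nothing can be inferred.
    """
    return "That information cannot be provided."


def attacker_guess(response: str) -> bool:
    """An attacker who knows the naive policy reads any refusal as 'top secret'."""
    return response != "No."


# Against the naive responder, the attacker recovers every classification.
leaked = {p: attacker_guess(naive_responder(p)) for p in classification}
print(leaked)  # {'alpha': True, 'beta': False, 'gamma': True}

# Against the uniform responder, every project produces the same response.
responses = {uniform_responder(p) for p in classification}
print(len(responses))  # 1
```

The uniform policy protects the secret at the cost of answering nothing about classification, which is the kind of trade-off an auditor would have to assess.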


Similar problems of restricting the availability of publicly accessible information have been faced in other disciplines, so previous work may provide insights into this problem in a continuous reporting environment. The next section looks at query systems for statistical information, such as U.S. Census and hospital systems, and at the procedures used to keep this information secure when users are allowed to formulate their own information requests.

IV Learning from Queries on Statistical Data

The U.S. Census Bureau and hospitals in the United States collect a great deal of information that is statistical in nature. The requirements for both datasets are similar: the information is generally available, but there is also a pledge of confidentiality (Office of Civil Rights, 2007; U.S. Code Title 13 Chapter 1; Department of Health and Human Services, 2007). Each of these codes defines confidentiality as non-disclosure of individual responses. For census data, this means that an individual response should not be derivable with a specified degree of certainty from the presented data. For hospitals, HIPAA requires that individual information not be disclosed to unauthorized individuals. Each of these requirements has parallels with a corporation that does not want to disclose certain information.

Census and Hospital Data

For Census data reports, the requirement is that no procedure applied to the released data should be able to recover individual responses. Thus, for Census data, individual responses are considered sensitive, which means they should not be available through any query or combination of queries. Certain identifying elements are removed from the data before it is made accessible to the query system, but additional methods are used to protect individual


responses. In contrast, hospitals must be able to provide individual information to authorized

people, but the reporting of health statistics to generally interested parties should protect

individual responses. In each of these databases individual responses are defined as sensitive and

therefore protected.

While these requirements are similar, the fundamental difference is that a hospital user with a specific role can create queries for individual health information, which is considered sensitive, whereas the census data reporting system (American FactFinder; Hawala, Zayatz, & Rowland, 2004) must not permit this for any user. Another difference is that health statistics are reported much like traditional financial reports, while Census data is accessed through a query system that creates individual reports. Hospitals thus protect sensitive information using the approach currently employed in reporting financial information, producing predefined reports, so their methods for protecting sensitive data are not applicable to a continuous reporting environment.

Disclosure of Census Data

The disclosure of Census data is generally provided in a table with either magnitude or count data displayed in each cell. For instance, a census report could disclose average salaries for each county in a state. If there is a possibility that an individual response could be determined, then procedures are applied to prevent the disclosure.

Because Census data can be accessed by a large number of users, the U.S. Census Bureau has developed a number of techniques to protect sensitive data. Within American FactFinder there are a number of statistical disclosure controls (SDC) available to maintain the confidentiality of members in any cell. Generally, these SDC fall into forward or backward processes (Massell, 2003). Forward process controls include such methods as cell perturbation and


suppression. Perturbation involves altering the total value in one or more cells. Protection flow involves methods of perturbation in which the row or column totals are preserved (Massell, 2004 & 2005).

Statistical Disclosure Controls

Insert Figure 1

In Figure 1, a constant x might be added to cell A and subtracted from cell C. This perturbation would maintain the row total but would alter the column totals. To correct for this and maintain the column totals, the constant x could be subtracted from cell M and added to cell O. Other perturbation methods have been proposed that allow for analysis that maintains statistical relationships (Dinur & Nissim, 2003).
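The A, C, M, O adjustment can be sketched as a "rectangle" perturbation. Figure 1 is not reproduced here, so the layout is an assumption inferred from the text: A and C share a row, M and O share another row, A and M share a column, and C and O share a column. The table values and the constant x = 7 are hypothetical.

```python
from typing import List

Table = List[List[int]]


def row_totals(t: Table) -> List[int]:
    """Sum of each row."""
    return [sum(row) for row in t]


def col_totals(t: Table) -> List[int]:
    """Sum of each column."""
    return [sum(col) for col in zip(*t)]


def rectangle_perturb(t: Table, r1: int, r2: int, c1: int, c2: int, x: int) -> Table:
    """Perturb four corner cells so that every row and column total is unchanged.

    Each affected row and column gains x once and loses x once.
    """
    out = [row[:] for row in t]
    out[r1][c1] += x  # cell A
    out[r1][c2] -= x  # cell C (same row as A)
    out[r2][c1] -= x  # cell M (same column as A)
    out[r2][c2] += x  # cell O (same column as C)
    return out


# Hypothetical table; A=(0,0), C=(0,2), M=(2,0), O=(2,2) are assumed positions.
table = [
    [120, 80, 95],
    [60, 150, 70],
    [90, 40, 110],
]
perturbed = rectangle_perturb(table, r1=0, r2=2, c1=0, c2=2, x=7)
assert row_totals(perturbed) == row_totals(table)
assert col_totals(perturbed) == col_totals(table)
print(perturbed[0][0], perturbed[0][2], perturbed[2][0], perturbed[2][2])  # 127 88 83 117
```

The published margins stay exact, while the four interior cells no longer show their true values.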

The suppression of cells is generally done to protect confidentiality when a small number of respondents fit into a specific category (Hawala, 2003). For instance, if a request was made for the salaries of doctors with a particular specialty in the counties of a state, and the cell size for a particular county was small, then that cell would be suppressed and the table perturbed so the value could not be calculated. Each of these SDCs allows for a different level of protection in terms of the degree of uncertainty required for the resulting query. The level of uncertainty can also be altered for different types of data. For instance, if there are a small number of respondents in a particular category (microdata protection), then more uncertainty might be required for income than for real estate value. Backward processing is generally done prior to the release of the table data to ensure that the forward processes used achieved the desired level of protection.
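Cell suppression has two parts: hiding the small cell itself (primary suppression) and hiding enough other cells that the small value cannot be recovered from the published totals (complementary suppression). The sketch below uses a hypothetical table of respondent counts, a hypothetical threshold of 5, and a simple greedy rule. It is only illustrative; production disclosure-control software typically treats complementary suppression as an optimization problem.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def suppress(counts: List[List[int]], threshold: int) -> Set[Cell]:
    """Simplified primary + complementary cell suppression (greedy, illustrative).

    Primary: hide any cell whose count is positive but below `threshold`.
    Complementary: a row or column with exactly one hidden cell lets that value be
    recovered from the published total, so hide the smallest visible cell in that
    row or column as well, and repeat until no such row or column remains.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    hidden: Set[Cell] = {
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if 0 < counts[r][c] < threshold
    }

    # Every row and every column, as lists of cell coordinates.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]

    changed = True
    while changed:
        changed = False
        for line in lines:
            if sum(cell in hidden for cell in line) == 1:
                visible = [cell for cell in line if cell not in hidden]
                if visible:
                    extra = min(visible, key=lambda rc: counts[rc[0]][rc[1]])
                    hidden.add(extra)
                    changed = True
    return hidden


# Hypothetical respondent counts; only cell (0, 1) is below the threshold.
counts = [
    [12, 2, 30],
    [25, 18, 9],
    [40, 22, 15],
]
hidden = suppress(counts, threshold=5)
print(sorted(hidden))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The four hidden cells form a rectangle, so the published row and column totals no longer determine the small cell exactly.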

Inferences from Disclosed Data


Data in a transaction-level, zero time-lag continuous reporting environment has some parallels with census data. If the current reporting categories are used, then protection levels may not be as important; however, as more detailed information becomes available, there is a question of how much detail should be disclosed. Issues such as legal requirements (Weitzenboeck, 2001) or competitive reasons would help determine the information that should not be disclosed. While the direct disclosure of sensitive information could be blocked, the level of detail could allow inferences to be made even if the data is not provided in tables with row and column totals. Two approaches to making inferences from such data are time-based and intersecting-subset methods.

Time Based Inferences

An organization’s database changes over time as a result of both internal and external events. Someone familiar with the timing of these events could infer certain information without ever creating a query for that data. For instance, a person who is aware of an anticipated hiring event could query the total salaries for a department before and after the event. By comparing the two amounts, the person could infer the salary of the new hire. This inference would be possible even if the number of employees in the department were large enough that cell suppression was not required, and it does not require access to transaction-level detail. The same approach could be used to infer purchases of raw materials, again without access to transaction-level detail. As more detail becomes available, it would be possible to infer not just raw material purchases but purchases of specific items from specific vendors.
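The hiring example reduces to a single subtraction. In the sketch below the payroll figures, employee identifiers, and the minimum cell size `K` are hypothetical; both queried totals pass the cell-size rule, yet their difference is one person’s salary.

```python
from typing import Dict

Payroll = Dict[str, int]

# Hypothetical department payroll before and after a known hiring date.
payroll_before: Payroll = {
    "e01": 58_000, "e02": 61_500, "e03": 72_000,
    "e04": 49_000, "e05": 66_000, "e06": 54_500,
}
payroll_after: Payroll = {**payroll_before, "e07": 63_250}  # the new hire joins

K = 5  # hypothetical minimum cell size before a total may be published


def published_total(payroll: Payroll) -> int:
    """The only query allowed: the department total, released if the cell is large enough."""
    if len(payroll) < K:
        raise PermissionError("cell too small to publish")
    return sum(payroll.values())


before = published_total(payroll_before)  # queried just before the hiring date
after = published_total(payroll_after)    # queried just after it

# Two innocuous aggregate answers reveal an individual salary by differencing.
inferred_new_hire_salary = after - before
print(inferred_new_hire_salary)  # 63250
```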

Intersecting Subset Inferences

Another approach to obtaining the salary of an individual in a department would be to ask for the salary information of all employees, then of all employees except the managers, and so on. Basically, one might think of this as constructing a series of sets until the intersection becomes a single member. This ability to string together an unlimited number of queries has been shown to allow access to all underlying detail (Jonge, 1983). In a corporate setting this


process is assisted by knowledge such as the minimum salary for a manager, the fact that salaries cannot be negative, or the average number of employees in departments.⁷
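The "all employees, then all except the managers" sequence can be sketched with two aggregate queries whose query sets differ by exactly one member. The records, departments, and predicates below are hypothetical; the user is assumed to see only COUNT and SUM results, never individual rows.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Employee:
    dept: str
    role: str
    salary: int


# Hypothetical records; the user can run only COUNT and SUM queries over them.
EMPLOYEES: List[Employee] = [
    Employee("Sales", "manager", 98_000),
    Employee("Sales", "staff", 52_000),
    Employee("Sales", "staff", 55_000),
    Employee("Sales", "staff", 49_000),
    Employee("Ops", "manager", 91_000),
    Employee("Ops", "staff", 47_000),
]

Predicate = Callable[[Employee], bool]


def count(pred: Predicate) -> int:
    return sum(1 for e in EMPLOYEES if pred(e))


def total_salary(pred: Predicate) -> int:
    return sum(e.salary for e in EMPLOYEES if pred(e))


def in_sales(e: Employee) -> bool:
    return e.dept == "Sales"


def sales_non_managers(e: Employee) -> bool:
    return e.dept == "Sales" and e.role != "manager"


# The second query set is a subset of the first, and their sizes differ by one...
gap = count(in_sales) - count(sales_non_managers)
# ...so the difference of the two totals is that single member's salary.
manager_salary = total_salary(in_sales) - total_salary(sales_non_managers)
print(gap, manager_salary)  # 1 98000
```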

In both examples (time-based and intersecting-subset inferences), sensitive information could be obtained even though it was never directly requested. Users can keep track of the information provided and then use it to infer sensitive information. Therefore, restrictions must be placed on users to keep them from obtaining enough information to infer undisclosed sensitive information.

Restrictions to prevent inferences

The need to restrict queries to eliminate the ability to make inferences was recognized very early (Denning D. E., 1978; Denning & Denning, 1979; Schwartz, Denning, & Denning, 1979). The initial solution to maintain security with these types of queries, again depending on the level of security required, was not to respond to queries whose query set size fell outside the range [k, n − k], for k ≥ 0 and n the size of the database. The choice of k sets the count for legitimate queries and presumably eliminates responses to queries such as the Census data salary queries mentioned above.⁸ While this approach can prevent disclosure of certain information, the ability to string together queries, combined with some knowledge about the semantics of the corporate data, can allow inferences about data that was supposed to be secure (Dragovic & Crowcroft, 2004). In the case of small cell sizes, the values can either be hidden or perturbed. However, there is another problem with the perturbation techniques used with census data.
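The [k, n − k] restriction and one way of stringing queries together to get around it can be sketched as follows. The records, `K = 3`, and the predicates are hypothetical. The second half uses a construction known in the statistical-database literature as a general tracker: for a target set C and a tracker T, the four permitted sums satisfy q(C) = q(C ∨ T) + q(C ∨ ¬T) − q(T) − q(¬T).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical records: (employee_id, department, salary).
Record = Tuple[str, str, int]
DB: List[Record] = [
    ("e01", "Sales", 51_000), ("e02", "Sales", 48_000), ("e03", "Sales", 57_000),
    ("e04", "Sales", 62_000), ("e05", "Sales", 45_000), ("e06", "Ops", 53_000),
    ("e07", "Ops", 88_000), ("e08", "Ops", 47_000), ("e09", "Ops", 50_000),
    ("e10", "Ops", 46_000),
]
K = 3  # hypothetical minimum query-set size

Predicate = Callable[[Record], bool]


def restricted_sum(pred: Predicate) -> Optional[int]:
    """Answer SUM(salary) only if the query-set size lies in [K, n - K]; else refuse."""
    matches = [r for r in DB if pred(r)]
    if not K <= len(matches) <= len(DB) - K:
        return None  # refused
    return sum(r[2] for r in matches)


def target(r: Record) -> bool:  # the sensitive individual (query set of size 1)
    return r[0] == "e07"


def tracker(r: Record) -> bool:  # an innocuous-looking predicate with 5 members
    return r[1] == "Sales"


print(restricted_sum(target))  # None: the direct query is refused

# Four queries, each with a permitted query-set size...
parts = [
    restricted_sum(lambda r: target(r) or tracker(r)),      # 6 records
    restricted_sum(lambda r: target(r) or not tracker(r)),  # 5 records
    restricted_sum(tracker),                                # 5 records
    restricted_sum(lambda r: not tracker(r)),               # 5 records
]
assert all(p is not None for p in parts)

# ...combine into the refused answer.
inferred = parts[0] + parts[1] - parts[2] - parts[3]
print(inferred)  # 88000
```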

7 Knowledge of other characteristics, such as cardinalities (Zhang, Zhao, & Chen, 2004) and related data (Yazdanian & Cuppens, 2003), has been shown to allow inferences even when sensitive information is thought to be protected.

8 This is essentially the approach used to determine when a cell should not be disclosed for a Census data report.


There is a presumption that information beneficial to investors should be disclosed. There could, however, be a distinction between the information investors want and the set of information required to make decisions. This discussion needs to take place in conjunction with management’s specification of sensitive data. Current rules concerning disclosure preclude altering data, particularly in a way that would make it materially different from its true value. In fact, if data were altered in a way unknown to investors, continuously reported information would essentially be useless. This is an important distinction between mandates to keep information secure and any reporting mandate to provide useful information. It means that the methods used to secure Census data may not be appropriate in a detail-level, zero time-lag continuous reporting environment that uses corporate websites. The next section looks at alternative approaches that could prevent inferences from disclosed data and overcome the problems of the perturbation methods of SDC.

V Query Inferences and Continuous Reporting

The statistical databases discussed in the previous section have certain functional relationships contained within the data itself. For instance, a table of total salaries by department and division would have totals that are functionally dependent on the values of other cells. The cell perturbation methods discussed previously can prevent someone querying the database from using knowledge of these functional relationships to infer the value of certain cells. While this does prevent the disclosure of sensitive data, it can also prevent certain types of analysis that might interest potential users of the system. For instance, perturbing sales information might also prevent an analysis of average sales per store.

Any future mandate from the SEC for continuous reporting would probably look toward providing more data about the operations of the firm, as this would be of interest to investment


decision makers. Dinur and Nissim (2003), Cox (2005), and Steel (2004) present methods to preserve certain qualities of the data so that analysis can still be carried out. To allow such analysis, the types of requests users will make need to be determined in advance. The structure of financial reports does presume certain types of decisions and the usefulness of information for those decisions. However, as more detailed information becomes available, a much deeper understanding of the different inferences that can be made would be needed, so that those inferences are preserved while sensitive information is kept from being disclosed. This analysis would also allow for the development of more directed methods to protect sensitive information while still disclosing the information beneficial to investors. There are two distinct methods to determine the possible inferences from a set of queries: one is to maintain a query history, and the other is to formulate inference channels (Staddon, 2003; Woodruff & Staddon, 2004).
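For exact SUM queries, one classic way to use a query history is to refuse any new query that, together with the answers already released, would make some single record’s value a linear combination of those answers. The sketch below assumes a single shared history, a static database, and queries that are plain sums over subsets of records; the class and method names and the four-store example are hypothetical. It also does not address the point made earlier in the text that a refusal can itself be informative.

```python
from fractions import Fraction
from typing import List, Sequence, Set


def rank(rows: Sequence[Sequence[int]]) -> int:
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def discloses_individual(history: List[List[int]], n: int) -> bool:
    """True if some record's value lies in the span of the answered SUM queries."""
    base = rank(history)
    for i in range(n):
        unit = [1 if j == i else 0 for j in range(n)]
        if rank(history + [unit]) == base:  # unit vector adds nothing: derivable
            return True
    return False


class SumQueryAuditor:
    """Keeps the history of answered SUM queries (hypothetical design)."""

    def __init__(self, n_records: int) -> None:
        self.n = n_records
        self.history: List[List[int]] = []

    def ask(self, members: Set[int]) -> bool:
        """Record and answer the query if it is safe; otherwise refuse it."""
        vec = [1 if j in members else 0 for j in range(self.n)]
        if discloses_individual(self.history + [vec], self.n):
            return False
        self.history.append(vec)
        return True


auditor = SumQueryAuditor(n_records=4)  # four hypothetical stores
results = [
    auditor.ask({0, 1, 2, 3}),  # company-wide sales
    auditor.ask({0, 1}),        # a region
    auditor.ask({0, 1, 2}),     # a state: company total minus this reveals store 3
    auditor.ask({1, 2}),        # another grouping that stays safe
]
print(results)  # [True, True, False, True]
```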

Inference Channels

Staddon (2003) presents a method to develop inference channels and create encryption keys for objects within the channel. An inference channel can be conceptualized as a path of information that, when traversed, allows a previously unknown piece of information to be inferred. In this approach, users are provided with tokens to query the database. When a user performs a query and obtains an object in the inference channel, a token is used. The set of available tokens is reduced, and thus the information the user can obtain is also reduced. With an appropriate allocation of tokens to users, the inference channel will not be compromised, because the tokens will be used up before all of the objects in the channel are known. For instance, if eight pieces of information are required to acquire a certain piece of sensitive information, then users will be supplied with seven tokens. In the manager salary example, this would mean that a token would be used when total salaries were requested and another when the salary for a specific


department was requested, and so on; the tokens would be used up before the intersection became a single employee. This system can also be collusion resistant. Once a sufficient number of tokens have been used to query a specific object in the inference channel (where “sufficient” can be determined by the possible connections among users and the sensitivity of the underlying data), the object is considered to be in the public domain, and the token for this object is considered used by all users. Thus, a user cannot preserve certain tokens, query other objects in the channel, and share that information. An inference channel protection scheme must be able to keep users from working in collusion by keeping a record of the objects in the inference channel that have been queried. An inference protection scheme is considered c-collusion resistant if c users working in collusion are unable to query all the objects in an inference channel.
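The token idea and the public-domain rule can be sketched with plain counters. This is a hypothetical, non-cryptographic simplification, not Staddon’s construction, which uses encryption keys. Each user may be charged for at most one object fewer than the channel holds, an object queried by `public_after` distinct users becomes public, and with `charge_public=True` a public object counts as spent for every user. The channel, user names, and threshold are invented for illustration.

```python
from typing import Dict, List, Set


class ChannelGuard:
    """Simplified token accounting for one inference channel (no cryptography)."""

    def __init__(self, channel: List[str], users: List[str],
                 public_after: int, charge_public: bool = True) -> None:
        self.channel = set(channel)
        self.budget = len(channel) - 1  # tokens per user: one short of the channel
        self.public_after = public_after
        self.charge_public = charge_public
        self.own: Dict[str, Set[str]] = {u: set() for u in users}
        self.queriers: Dict[str, Set[str]] = {o: set() for o in channel}

    def public(self) -> Set[str]:
        """Objects queried by enough distinct users to be treated as public."""
        return {o for o, qs in self.queriers.items() if len(qs) >= self.public_after}

    def charged(self, user: str) -> Set[str]:
        """Objects counted against a user's tokens."""
        extra = self.public() if self.charge_public else set()
        return self.own[user] | extra

    def query(self, user: str, obj: str) -> bool:
        """Grant the query only if no affected user would exceed the token budget."""
        if obj in self.own[user]:
            return True  # already paid for
        becomes_public = len(self.queriers[obj] | {user}) >= self.public_after
        to_check = list(self.own) if (becomes_public and self.charge_public) else [user]
        if any(len(self.charged(u) | {obj}) > self.budget for u in to_check):
            return False
        self.own[user].add(obj)
        self.queriers[obj].add(user)
        return True

    def knowledge(self, user: str) -> Set[str]:
        """What a user actually knows: own answers plus public objects."""
        return self.own[user] | self.public()


def run(charge_public: bool) -> bool:
    """Replay a fixed query sequence; True if any user ends up knowing the whole channel."""
    g = ChannelGuard(["o1", "o2", "o3"], ["alice", "bob", "carol"],
                     public_after=2, charge_public=charge_public)
    for user, obj in [("alice", "o1"), ("alice", "o2"), ("bob", "o1"),
                      ("bob", "o2"), ("carol", "o3")]:
        g.query(user, obj)
    return any(g.knowledge(u) == g.channel for u in g.own)


print(run(charge_public=True))   # False: carol's query for o3 is refused
print(run(charge_public=False))  # True: carol gets o3 while o1 and o2 are public
```

Choosing `public_after` so that c colluding users cannot jointly cover the channel is the design question described in the text; this sketch only shows why public objects must be charged to everyone.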

Information Required for Inference Channels

To implement security based on inference channels, two different pieces of information must be determined. The first is the identification of protected or sensitive information: management must determine what knowledge needs protection and the level of protection. Second, the objects in the inference channel need specification. This requires a complete specification of the data which, once known, will allow a user to infer the value of a sensitive piece of information. This scheme also assumes that the underlying database is static. If one of the objects in the channel changes, a reevaluation of the tokens already used is required. For instance, if one of the objects in the inference channel was the commission rate paid to salespeople and the rate changes, then technically the person has not used the token for the new value of that object, and the token should be “refunded”. The inability to refund tokens and the requirement to define the inference channel or path are drawbacks of this approach. A final drawback of the inference channel approach arises from the functional relationships within a firm’s data.


Information in a corporate database is correlated in many ways. Changes in accounts receivable are correlated with the level of sales, which is correlated with the level of inventory, and so on. Thus there may be alternative paths to sensitive information. With the inference channel approach, all of these alternative paths to the information also need protection, and the method of assigning tokens to a channel would have to consider these correlated channels to protect any sensitive information. The required level of correlation would also need to be specified, since for certain inferences an approximate value may be quite sufficient. For instance, it may be sufficient to know planned purchases of raw materials to within a few weeks. An alternative approach, the history approach combined with an inductive learning system, may prove to be a better solution for protecting sensitive information in a continuous reporting environment.
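Alternative paths can be made concrete by writing the relationships among data items as inference rules and computing everything a user could derive (forward chaining to a fixed point). The rules and item names below are hypothetical, and the sketch treats only exact derivations; the partial-certainty inferences discussed in the text would need a probabilistic extension.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)

# Hypothetical inference rules describing correlated corporate data.
RULES: List[Rule] = [
    (frozenset({"sales_forecast", "inventory_policy"}), "production_schedule"),
    (frozenset({"production_schedule", "bill_of_materials"}), "planned_purchases"),
    (frozenset({"inventory_level", "reorder_point"}), "planned_purchases"),
]


def closure(known: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward chaining: apply rules until no new item can be derived."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


SENSITIVE = "planned_purchases"
user_a = {"sales_forecast", "inventory_policy", "bill_of_materials"}
user_b = {"inventory_level", "reorder_point"}
user_c = {"sales_forecast", "bill_of_materials"}

for name, facts in [("A", user_a), ("B", user_b), ("C", user_c)]:
    print(name, SENSITIVE in closure(facts, RULES))
# A True   (two-step path through production_schedule)
# B True   (a separate, correlated path)
# C False
```

Guarding only the channel through `production_schedule` would leave user B’s path open, which is the difficulty described above.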

Inductive Learning and Query History

Asking questions and then combining the information obtained is a very natural way in which humans learn about the world. Inductive learning systems combine information to formulate different theories about the underlying systems providing the answers. A corporate information system that allows continuous reporting can be viewed similarly. The match is even more evident when the activities of the firm are the source of information and the intent of an investor is to learn about the operations of the firm that produce the data. The fact that these activities can be described functionally adds further support to this view; production functions and purchasing decisions certainly fit into this model. An argument can be made that these functional relationships might be considered sensitive. For instance, the disclosure of a company’s production function would allow competitors to anticipate purchasing activities and understand limits on prices. A continuous reporting system with access to individual transactions could allow a user to learn the underlying functional relationships in a firm in much


the same way that inductive learning systems learn other types of functional forms.

Osherson et al. (1982) formally describe a learning system in which a passive learner gathers observations from a natural system. Under this model of learning, the observer repeatedly reformulates a hypothesis about the function that governs the behavior of the system. Although the observer can never prove that the function generating the data matches the hypothesized function, each observation that confirms the hypothesis makes the observer more certain that it is correct. A single observation that does not fit the hypothesis is enough to reject it or force its reformulation. The revision process (Sloan and Turán, 1999) requires the learner to reconcile each new observation with all previous observations. Stability occurs when new observations no longer cause the learner to revise the hypothesis (Martin and Osherson, 2003). In each case, the learner is attempting to discover the underlying functional relationships that generate the data. In this type of system, the observer must decide which data to observe and include when (re)formulating the hypothesis.
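A minimal sketch of this learner, assuming a hypothetical hypothesis space of linear production functions, uses identification by enumeration, a standard strategy in formal learning theory: the learner conjectures the first hypothesis in a fixed list that fits every observation seen so far. The conjecture changes only when an observation contradicts it, and the learner is stable once further observations cause no revisions. The names `linear_hypotheses`, `learn_by_enumeration`, and `revisions` are illustrative and are not taken from Osherson et al. (1982).

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Observation = Tuple[int, int]                  # (input, output), e.g. (units produced, material used)
Hypothesis = Tuple[str, Callable[[int], int]]  # (description, function)


def linear_hypotheses(max_coef: int = 5) -> List[Hypothesis]:
    """Hypothetical hypothesis space: y = a*x + b with small non-negative integers a, b."""
    return [
        (f"y = {a}x + {b}", lambda x, a=a, b=b: a * x + b)
        for a, b in product(range(max_coef + 1), repeat=2)
    ]


def learn_by_enumeration(
    observations: Sequence[Observation], hypotheses: List[Hypothesis]
) -> List[Optional[str]]:
    """Return the learner's conjecture after each observation.

    The conjecture is always the first hypothesis in the fixed enumeration that is
    consistent with every observation seen so far, so it is revised only when a
    new observation contradicts it.
    """
    seen: List[Observation] = []
    current: Optional[Hypothesis] = None
    conjectures: List[Optional[str]] = []
    for x, y in observations:
        seen.append((x, y))
        if current is None or current[1](x) != y:
            current = next(
                (h for h in hypotheses if all(h[1](sx) == sy for sx, sy in seen)), None
            )
        conjectures.append(current[0] if current else None)
    return conjectures


def revisions(conjectures: List[Optional[str]]) -> int:
    """Count how many times the conjecture changed; a stable learner stops changing."""
    return sum(1 for prev, nxt in zip(conjectures, conjectures[1:]) if prev != nxt)


# Data generated by a hidden production function y = 3x + 2.
history = learn_by_enumeration([(0, 2), (1, 5), (4, 14), (10, 32)], linear_hypotheses())
print(history)             # ['y = 0x + 2', 'y = 3x + 2', 'y = 3x + 2', 'y = 3x + 2']
print(revisions(history))  # 1
```

After the second observation the conjecture matches the hidden function and no later observation forces a revision, which is the stability described by Martin and Osherson (2003).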

Continuous Reporting and Inductive Learning

A person obtaining information from a continuous reporting system faces a problem similar to that of an inductive learning system. The assumption of a passive learner is discarded in favor of an active model in which the investor/learner requests specific data (observations) to support a hypothesis or investment decision. In a continuous reporting environment the investor/learner makes a series of requests and then incorporates the results. The history of these continuous reports represents what the investor/learner can infer. Although applying inductive learning to the query history does not require knowledge of the user’s intent, understanding that objective can help determine the level of uncertainty that must be preserved (Blum et al., 2008). For instance, someone interested in timing raw material purchases to take advantage of demand swings may


need a different level of certainty than someone interested in learning the quality level of a production process. The investor/learner will generally have a very specific view of the types of observations that will allow them to infer the underlying functional relationships, and will direct their search to those areas. Thus, a query history must look at what can be inferred rather than at which tokens have been used. This eliminates the need to understand all possible paths to sensitive information; only what can be inferred from previously obtained information matters. To decide whether to restrict the response to a query, the inductive learning approach examines the relationship of the requested information to sensitive data.⁹ Therefore, an intelligent investor who finds a unique inference channel cannot circumvent the protection management has placed on sensitive information. One deficiency of the inference channel approach was the problem of “refunding” tokens when the underlying information changed. This problem can be overcome by using strategies employed by dynamic search engines.
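The idea of judging a query by what it allows to be inferred, rather than by which tokens it consumes, can be illustrated with the simplest case studied in the query-auditing literature: SUM queries over individual records. In the sketch below, a hypothetical `SumQueryAuditor` (not a method proposed in this paper) refuses a query when its answer, combined with every previously answered query, would determine some sensitive record exactly, which happens when that record’s unit vector lies in the row space of the answered query vectors. The payroll figures are invented. The check inspects only which records each query covers, never their values; a practical auditor would also have to consider approximate inference and the information that a denial may itself leak.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Set


def matrix_rank(rows: List[List[Fraction]]) -> int:
    """Rank via Gaussian elimination over exact fractions (no rounding error)."""
    m = [row[:] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


class SumQueryAuditor:
    """Answers SUM queries over individual records, refusing any query that,
    combined with the answered history, would pin down a sensitive record."""

    def __init__(self, n_records: int, sensitive: Sequence[int]) -> None:
        self.n = n_records
        self.sensitive = list(sensitive)
        self.history: List[List[Fraction]] = []  # one 0/1 row per answered query

    def _indicator(self, ids: Set[int]) -> List[Fraction]:
        return [Fraction(1 if j in ids else 0) for j in range(self.n)]

    def _would_disclose(self, rows: List[List[Fraction]]) -> bool:
        # Record i is exactly inferable iff its unit vector lies in the row space.
        base = matrix_rank(rows)
        return any(matrix_rank(rows + [self._indicator({i})]) == base for i in self.sensitive)

    def ask(self, record_ids: Set[int], values: Sequence[int]) -> Optional[int]:
        row = self._indicator(record_ids)
        if self._would_disclose(self.history + [row]):
            return None  # denied: this answer plus the history reveals a record
        self.history.append(row)
        return sum(values[j] for j in record_ids)


# Hypothetical payroll table; record 3 is a newly hired employee.
salaries = [50_000, 62_000, 58_000, 71_000]
auditor = SumQueryAuditor(len(salaries), sensitive=range(len(salaries)))
print(auditor.ask({0, 1, 2}, salaries))     # 170000  (payroll before the hire)
print(auditor.ask({0, 1, 2, 3}, salaries))  # None    (difference would reveal record 3)
print(auditor.ask({1, 2, 3}, salaries))     # 191000
print(auditor.ask({0, 3}, salaries))        # None    (with the history, reveals record 0)
```

The second query is a differencing attack: payroll before and after a hire. The fourth shows why the whole history matters: {0, 3} looks harmless on its own, but with the first and third answers it yields record 0, since q₁ − q₃ + q₄ = 2x₀.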

Query Histories and Dynamic Data

As a firm operates over time, its data will change, but certain functional relationships may not. Total sales will change, but average sales or the gross margin percentage might not. Thus, inferences based on totals may no longer be valid, while those based on averages or percentages could remain valid. To identify which queries in the history are no longer valid, the relevant characteristics of the data already provided must be recalculated. To make a query-based continuous reporting system usable, these changes must be monitored and their impact on query histories determined.
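One way to read the requirement that characteristics of the data already provided be recalculated is to re-evaluate each released answer against the current data and expire those that no longer hold. The sketch below assumes a hypothetical transaction schema and three illustrative queries; after a new sale, the total and the average change while the gross margin percentage does not, mirroring the example in the text. Whether an expired entry can safely be forgotten remains a policy decision, since the investor still remembers the old answer and could combine it with a new one.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Transaction = Tuple[int, int]  # hypothetical schema: (sale amount, cost of goods sold)

# Hypothetical aggregate queries an investor might have asked.
QUERIES: Dict[str, Callable[[List[Transaction]], Fraction]] = {
    "total_sales": lambda t: Fraction(sum(s for s, _ in t)),
    "average_sale": lambda t: Fraction(sum(s for s, _ in t), len(t)),
    "gross_margin_pct": lambda t: Fraction(sum(s - c for s, c in t), sum(s for s, _ in t)),
}


class QueryHistory:
    """Remembers the answers already released so they can be re-validated."""

    def __init__(self) -> None:
        self.answered: Dict[str, Fraction] = {}

    def record(self, name: str, data: List[Transaction]) -> Fraction:
        answer = QUERIES[name](data)
        self.answered[name] = answer
        return answer

    def refresh(self, data: List[Transaction]) -> List[str]:
        """Drop entries whose released answer no longer holds on the current data."""
        expired = [name for name, old in self.answered.items() if QUERIES[name](data) != old]
        for name in expired:
            del self.answered[name]
        return expired


sales = [(100, 60), (200, 120), (300, 180)]
history = QueryHistory()
for name in QUERIES:
    history.record(name, sales)
sales.append((400, 240))          # the firm keeps operating
print(history.refresh(sales))     # ['total_sales', 'average_sale']
print(list(history.answered))     # ['gross_margin_pct']
```

Exact fractions are used so that an unchanged ratio is never reported as changed because of floating-point rounding.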

⁹ Keefe et al. (1989) examine creating sensitivity levels within a database and reviewing query histories on those sensitive elements.


Previous work on monitoring dynamic web pages (Pandey, Ramamritham, and Chakrabarti, 2003; Garg, Ramamritham, and Chakrabarti, 2004) uses probability functions to determine whether a web page needs to be revisited. These methods have been applied, for example, to tracking changes in a hurricane’s course by determining when weather websites need to be revisited. The stream of events that could change sales in a corporate database is certainly more deterministic than the stream of events that could change a hurricane’s course. Other approaches, such as adding security constructs to cookie policies, might also be able to determine when information is no longer valid (Shankar and Karlof, 2006). Partitioning corporate databases along certain dimensions could identify, with a high degree of certainty, the series of events that change information about sales or other sensitive objects. Including mechanisms like these to delete invalid queries from the history would allow a continuous reporting system to answer previously blocked queries, while maintaining the rest of the query history keeps sensitive information from being disclosed. In addition, the structure of the query system itself can be altered to augment the history approach and limit the types of inferences that can be made.
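As a deterministic counterpart to the probabilistic revisit policies cited above, a history can be indexed by the database partitions each answer depended on, so that a new event in a partition expires exactly the entries that read it. The sketch below assumes hypothetical (region, fiscal period) partitions and a hypothetical `PartitionedHistory` class; it is not the method of Pandey et al. (2003) or Garg et al. (2004). Expired entries no longer count against later queries, which is how previously blocked queries could again be answered.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

Partition = Tuple[str, str]  # hypothetical partition key: (region, fiscal period)


class PartitionedHistory:
    """Query history indexed by the database partitions each answer depended on."""

    def __init__(self) -> None:
        self.entries: Dict[str, Set[Partition]] = {}
        self.by_partition: DefaultDict[Partition, Set[str]] = defaultdict(set)

    def record(self, query_id: str, partitions: Set[Partition]) -> None:
        """Remember an answered query and every partition its answer read."""
        self.entries[query_id] = set(partitions)
        for p in partitions:
            self.by_partition[p].add(query_id)

    def on_event(self, partition: Partition) -> Set[str]:
        """A new transaction landed in `partition`: expire every answer that read it."""
        expired = set(self.by_partition.pop(partition, set()))
        for qid in expired:
            # Unlink the expired query from the other partitions it depended on.
            for p in self.entries.pop(qid, set()):
                if p != partition:
                    self.by_partition[p].discard(qid)
        return expired


h = PartitionedHistory()
h.record("q1", {("East", "2008Q1")})
h.record("q2", {("East", "2008Q1"), ("West", "2008Q1")})
h.record("q3", {("West", "2008Q2")})
print(sorted(h.on_event(("East", "2008Q1"))))  # ['q1', 'q2']
print(sorted(h.entries))                      # ['q3']
```

A sale recorded in the eastern region for the first quarter invalidates only the answers that covered that partition; the answer about the western region in the second quarter stays in the history.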

Altering the Capabilities of the Query System

The characteristics of the query system were presented earlier. Altering these characteristics can also restrict the ability of a learner/investor to discover the structure of the underlying system. Gasarch and Smith (1992) prove that query systems with certain capabilities have a learning potential (the ability to learn certain classes of functions) equivalent to that of passive inductive inference machines. They show that including (or removing) certain capabilities can increase (or decrease) the learning capability of a query system relative to that of a passive inductive inference machine. For example, eliminating time parameters from a query system would eliminate a rather large set of information that could be


obtained, and therefore a large number of inferences that could be made. Removing statistical capabilities would eliminate another large set of queries and again reduce the information that could be obtained. Another capability that could be removed is the ability to ask join queries, such as sales organized by state.¹⁰ Gasarch and Smith (1992) have shown that including recursive queries expands the learning capability of the system; the standard deviation of sales in a particular region for a particular time period is an example of such a query.¹¹ With each restriction on the capabilities of the query system, certain types of information cannot be obtained, and inferences that depend on this information are eliminated. Thus, combining an inductive learning query history approach with a query language that has a reduced set of capabilities can make it difficult¹² to obtain sensitive information.
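Restricting capabilities can be enforced before any history check runs, by rejecting queries that use a capability the system does not offer. The sketch below assumes a hypothetical structured `Query` type and a hypothetical `CapabilityPolicy` whose defaults remove the four capabilities discussed above: statistical functions beyond SUM and COUNT, joins, time parameters, and recursive queries. By convention, and again as an assumption, a `table.column` grouping attribute signals that a join is required. Because such restrictions make inference harder rather than impossible, the policy is meant to complement a query history, not replace it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical structured query submitted to a continuous reporting system."""
    aggregate: str                                 # e.g. "SUM", "AVG", "STDDEV"
    measure: str                                   # e.g. "sales"
    group_by: Tuple[str, ...] = ()                 # "table.column" means a join is needed
    time_window: Optional[Tuple[str, str]] = None  # (start date, end date)
    uses_prior_result: bool = False                # recursive: built on an earlier answer


@dataclass(frozen=True)
class CapabilityPolicy:
    """Which query capabilities the system exposes; everything else is refused."""
    allowed_aggregates: FrozenSet[str] = frozenset({"SUM", "COUNT"})
    allow_joins: bool = False
    allow_time_parameters: bool = False
    allow_recursive: bool = False

    def violations(self, q: Query) -> List[str]:
        """List every capability the query needs but the policy does not offer."""
        problems: List[str] = []
        if q.aggregate not in self.allowed_aggregates:
            problems.append(f"statistical function {q.aggregate} is not offered")
        if not self.allow_joins and any("." in column for column in q.group_by):
            problems.append("grouping by attributes of a joined table is not offered")
        if not self.allow_time_parameters and q.time_window is not None:
            problems.append("time parameters are not offered")
        if not self.allow_recursive and q.uses_prior_result:
            problems.append("queries built on earlier results are not offered")
        return problems


policy = CapabilityPolicy()
print(policy.violations(Query("SUM", "sales")))  # []
regional_sd = Query("STDDEV", "sales", group_by=("customer.state",),
                    time_window=("2008-01-01", "2008-03-31"), uses_prior_result=True)
print(len(policy.violations(regional_sd)))       # 4
```

A query is answered only when `violations` returns an empty list; relaxing a single flag, such as `allow_time_parameters=True`, shows directly which additional class of queries, and therefore of inferences, the system would expose.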

Implications of Query History and Inductive Learning for Continuous Reporting

Much of the work on how inferences are made focuses on building computational models of inductive learning. This work has examined the characteristics of learning systems (Osherson, Stob, and Weinstein, 1982), what it means to learn (Sloan and Turán, 1999), and what it means to know (Martin and Osherson, 2003). It has implications for restricting access to sensitive information in a continuous reporting environment. First, the investor has a goal of learning about the underlying systems in the corporation. Second, the investor tries to obtain information that supports this goal. A continuous reporting system becomes the vehicle through which the investor interacts with the corporate information system.

¹⁰ This is a join query because obtaining the state requires joining each sale to its customer record and then grouping the sales by the state in which the customer lives.

¹¹ This is recursive because the result of one query (the mean) is used in a subsequent query.

¹² It might be tempting to conclude that removing capabilities would make certain inferences impossible. However, research has shown that methods once thought to be completely secure turn out to be only partially secure.


Finally, the investor will ask for information until their view becomes stable, that is, until additional information no longer changes their understanding of the information system. There are, however, two fundamental differences between the inductive learning setting and continuous reporting. The first is that the inductive learning model does not assume any prior knowledge of the system to be learned; the learner could be learning a language for the first time. In a continuous reporting environment, by contrast, the user is assumed to have some knowledge of the company: investors understand revenues and expenses. The second difference is that inductive learning research is concerned with understanding the learning process, whereas the goal of a protected continuous reporting system is to stop learning before sensitive information is disclosed. Neither difference precludes using inductive learning to develop systems that protect sensitive information; in fact, this approach has been used with static databases (Blum, Ligett, and Roth, 2008; Nabar et al., 2006). Corporate databases present a special challenge because the underlying data change over time. Under the inference channel approach, this constant change makes it difficult to protect sensitive information while still providing new information. Another problem with inference channels is that not only the best path to the sensitive information must be understood, but also all possible paths. Query histories overcome these problems by looking at what can be inferred from the investor’s set of queries.

V Summary and Conclusions

In the current reporting environment, investors must combine information from many different sources and make conjectures about its validity. Investors continually seek more, and more timely, information. A continuous or real-time reporting environment promises to address some of these concerns, and technological advances continue to make such a vision feasible. Any enhanced reporting environment can be characterized by the amount of data disclosed, the level of detail, the time lag between an event and its


availability, and the query system that creates the reports. The move toward real-time reporting will affect all participants in the reporting process.

A continuous reporting environment places additional pressure on investors, auditors, and managers. Investors are affected because in this environment they must create their own reports rather than simply accepting traditional ones; how they use information will change, and that usage becomes a new area of concern. Auditors are affected because they must be prepared to review both the adequacy of the measures firms use to keep information continually available and materiality with respect to the system’s responses. Finally, managers are affected because they must be prepared to handle new types of disclosure of corporate information and will be required to determine what information should be considered sensitive.

A continuous reporting environment would allow users to make inferences about the operations of the firm in ways not envisioned in the current reporting environment. The ability to query operational data means that sensitive information could be inadvertently disclosed. The U.S. Census Bureau and hospitals face the same challenge when users query their underlying data: they anticipate the inferences possible from these queries and use statistical disclosure controls (SDCs) to place restrictions before sensitive information is disclosed. Several issues make these SDCs poorly suited to continuously reported corporate data, so other methodologies were explored here. Inference channels and inductive learning systems with query histories offer more advanced capabilities to protect sensitive information. Certain characteristics of corporate information, notably data that change over time and the many possible paths to a sensitive item, seem to make an inductive learning approach better suited to this task. The major challenge for practitioners and academics is to understand these issues and make informed choices about them


before the systems are used. Clearly, additional investigation is required to guide the adoption of real-time reporting.


Works Cited

Blum, Avrim, Katrina Ligett, and Aaron Roth. "A Learning Theory Approach to Non-Interactive Database Privacy." Proceedings of the 40th Annual ACM Symposium on Theory of Computing. Victoria, British Columbia, CA: ACM Press, 2008. 609-617.

Bray, David A, Ramnath K. Chellappa, Benn R. Konsynski, and Dominic M. Thomas. "Balancing Knowledge Sharing with Knowledge Protection: The Influence of Role-Criticality." Twenty-Eighth International Conference on Information Systems. Montreal, CA: Association for Information Systems, 2007. 1-10.

Cox, Lawrence H. "Quality-Preserving Controlled Tabular Adjustment: A Method for Resolving Confidentiality and Data Quality Issues for Tabular Data." Proceedings of Statistics Canada Symposium 2005: Methodological Challenges for Future Information Needs. Ottawa, CA: Statistics Canada, 2005.

De Jonge, Wiebren. "Compromising Statistical Databases Responding to Queries about Means." ACM Transactions on Database Systems, 1983: 66-80.

Denning, Dorothy E. "Are Statistical Data Bases Secure?" National Computer Conference. Washington, DC: ACM Press, 1978. 525-530.

Denning, Dorothy E., and Peter J. Denning. "Data Security." ACM Computing Surveys, 1979: 227-249.

Dinur, Irit, and Kobbi Nissim. "Revealing Information while Preserving Privacy." Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of Database Systems. San Diego, CA: ACM Press, 2003. 202-210.

Dragovic, Boris, and Jon Crowcroft. "Information Exposure Control through Data Manipulation for Ubiquitous Computing." Proceedings of the 2004 Workshop on New Security Paradigms. Nova Scotia, CA: ACM Press, 2004. 57-64.

Garg, Shaveen, Krithi Ramamritham, and Soumen Chakrabarti. "Web-CAM: Monitoring the Dynamic Web to Respond to Continual Queries." Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris: ACM Press, 2004. 927-928.

Gasarch, William I., and Carl H. Smith. "Learning via Queries." Journal of the Association for Computing Machinery, 1992: 649-674.


Hawala, Sam. Microdata Disclosure Protection Research and Experiences at the US Census Bureau. Research Report, Washington, DC: United States Census Bureau, 2003.

Hawala, Sam, Laura Zayatz, and Sandra Rowland. American FactFinder: Disclosure Limitation for the Advanced Query System. Research Report, Washington, DC: United States Census Bureau, 2004.

Litan, R.E., and P.J. Wallison. Corporate Disclosure in the Internet Age. Working Paper, AEI-Brookings Joint Center, 2000.

Martin, Eric, and Daniel Osherson. "Scientific Discovery from the Point of View of Acceptance." Inductive Logic. May 1, 2003. http://www.princeton.edu/~osherson/IL/essay1.pdf (accessed December 11, 2006).

Massell, Paul B. Comparing Statistical Disclosure Control Methods for Tables: Identifying Key Factors. Research Report, Washington, DC: United States Census Bureau, 2004.

_______. Comparing Ways of Using "Protection Flow" to Protect Magnitude Data Tables from Disclosures. Research Report, Washington, DC: United States Census Bureau, 2005.

_______. "Statistical Disclosure Control for Tables: Determining Which Method to Use." Symposium 2003: Challenges in Survey Taking for the Next Decade. Ottawa, CA: Statistics Canada, 2003. 2-12.

Nabar, Shubha U., Bhaskara Marthi, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. "Towards Robustness in Query Auditing." Proceedings of the 32nd International Conference on Very Large Data Bases. Seoul, South Korea: ACM Press, 2006. 151-162.

Nelson, Ruth. "What is a Secret and What does that have to do with Computer Security?" Proceedings of the 1994 workshop on New Security Paradigms. Little Compton, RI: ACM Press, 1994. 74-79.

Office of Civil Rights. "Medical Privacy - National Standards to Protect the Privacy of Personal Health Information." United States Department of Health and Human Services. 2007. http://www.hhs.gov (accessed November 11, 2007).

Osherson, Daniel, Michael Stob, and Scott Weinstein. Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists. Cambridge, MA: The MIT Press, 1982.

Pandey, Sandeep, Krithi Ramamritham, and Soumen Chakrabarti. "Monitoring the Dynamic Web to Respond to Continuous Queries." Proceedings of the 12th International Conference on the World Wide Web. Budapest, HU: ACM Press, 2003. 659-668.

Richardson, Vernon J., and Susan Scholz. "Corporate Reporting and the Internet: Vision, Reality, and Intervening Obstacles." Pacific Accounting Review, 1999/2000: 67-75.


Schwartz, M.D., Dorothy Denning, and Peter Denning. "Linear Queries in Statistical Databases." ACM Transactions on Database Systems, 1979: 156-167.

Shankar, Umesh, and Chris Karlof. "Doppelganger: Better Browser Privacy without the Bother." Proceedings of the 13th ACM Conference on Computer and Communications Security. Alexandria, VA: ACM Press, 2006. 154-167.

Sicherman, George L., Wiebren De Jonge, and Reind P. Van de Riet. "Answering Queries without Revealing Secrets." ACM Transactions on Database Systems, 1983: 41-59.

Sloan, Robert H., and György Turán. "On Theory Revision with Queries." Proceedings of the 12th Annual Conference on Computational Learning Theory. Santa Cruz, CA: ACM Press, 1999. 41-52.

Staddon, Jessica. "Dynamic Inference Control." ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA: ACM Press, 2003. 94-100.

Steel, Philip M. A New Estimation for the Number of Unique Population Elements based on the Observed Sample. Research Report, Washington, DC: United States Census Bureau, 2004.

Keefe, T.F., M.B. Thuraisingham, and W.T. Tsai. "Secure Query-Processing Strategies." IEEE Computer, 1989: 63-70.

United States Department of Health and Human Services. "Health Insurance Portability and Accountability Act." Department of Health and Human Services. September 2007. http://www.hhs.gov/ocr/hipaa/ (accessed November 11, 2007).

United States General Laws. Sarbanes-Oxley Act (SOX). Public Law no. 107-204, Washington, DC: Government Printing Office, 2002.

United States General Laws. U.S. Code Title 13 Chapter 1. Washington, DC: United States Government.

United States Securities and Exchange Commission. Interactive Data: Putting Technology to work for the American Investor. 2007. http://www.sec.gov/spotlight/xbrl/interactivedata.htm (accessed August 20, 2007).

United States Securities and Exchange Commission. Progress Report of the Advisory Committee on Improvements to Financial Reporting. Committee Report, Washington, DC: United States Securities and Exchange Commission, 2008.

Vasarhelyi, Miklos A., and Michael Alles. The Galileo Disclosure Model. http://raw.rutgers.edu/Galileo/ (accessed August 19, 2008).


Weitzenboeck, Emily M. "Enterprise Security: Legal Challenges and Possible Solutions." Proceedings of the Tenth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises. Los Alamitos, CA: IEEE Computer Society Press, 2001. 183-188.

Woodruff, David, and Jessica Staddon. "Private Inference Control." Proceedings of the 11th ACM Conference on Computer and Communications Security. Washington, DC: ACM Press, 2004. 188-197.

XBRL International. Global Ledger Taxonomy - An Introduction . http://www.xbrl.org/GLTaxonomy/ (accessed August 8, 2008).

XBRL International. XBRL Taxonomies. http://www.xbrl.org/Taxonomies/ (accessed August 18, 2007).

Xiao, Zezhong, Michael J. Jones, and Andrew Lymer. "Immediate Trends in Internet Reporting." European Accounting Review, 2002.

Yazdanian, Kioumars, and Frédéric Cuppens. "Neighborhood Data and Database Security." Proceedings of the 1992-1993 Workshop on New Security Paradigms. Little Compton, RI: ACM Press, 1993. 150-154.

Zhang, Nan, Wei Zhao, and Jianer Chen. "Cardinality-Based Inference Control in OLAP Systems: An Information Theoretic Approach." Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP. Washington, DC: ACM Press, 2004. 59-64.

Page 18: [1] Deleted gfgal 8/26/2008 4:46:00 PM

For instance, a query about total payroll by division or department would result in a table similar to one of the census data tables. To restrict discovery of individual salaries, the same cell-restriction methods could be applied to conceal salaries in departments with a small number of employees. Because corporate data differ in nature from census data, however, there are situations in which a user could still obtain information about individual data. For instance, if a user knew that a person was to be hired into a department, running the departmental payroll query before and after the hiring date would allow the user to infer the new hire's salary, even in a department large enough that cell perturbation was unnecessary. This is a problem with detailed, zero-time-lag data: someone with knowledge of the firm's activities can make inferences about sensitive data without ever requesting sensitive information.
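The payroll scenario can be made concrete with a short sketch. It assumes a hypothetical suppression threshold (`MIN_CELL_SIZE`) and invented salary figures; both queries individually pass the suppression rule, yet their difference exposes one salary.

```python
from __future__ import annotations

# Hypothetical suppression rule: totals for departments with fewer than
# MIN_CELL_SIZE employees are withheld, as in census-style cell suppression.
MIN_CELL_SIZE = 3


def payroll_query(payroll: dict[str, list[float]], dept: str) -> float | None:
    """Return a department's total payroll, or None if the cell is suppressed."""
    salaries = payroll.get(dept, [])
    if len(salaries) < MIN_CELL_SIZE:
        return None  # small cell: withheld to protect individual salaries
    return sum(salaries)


def infer_new_hire_salary(before: float | None, after: float | None) -> float | None:
    """Differencing inference by a user who knows exactly one person joined in between."""
    if before is None or after is None:
        return None
    return after - before


# Illustrative figures only.
payroll = {"Sales": [52_000.0, 61_000.0, 58_000.0, 70_000.0]}
total_before = payroll_query(payroll, "Sales")  # 4 employees: released
payroll["Sales"].append(66_500.0)               # the hire the user already knows about
total_after = payroll_query(payroll, "Sales")   # 5 employees: released
print(infer_new_hire_salary(total_before, total_after))  # 66500.0
```

Neither released total is sensitive on its own; the disclosure comes entirely from the user's outside knowledge of the timing of the hire.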

Page 18: [2] Comment [A34] Author 8/10/2008 11:37:00 AM Again, this whole thing is very interesting, but you ramble on with no organization. Break it into subsections with a couple of paragraphs each and show a simple, easy example for each, and it will be very good.

Page 18: [3] Deleted gfgal 8/27/2008 2:09:00 PM

Basically, one might think of this as constructing a series of sets until the intersection becomes a single member.
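A minimal sketch of the set-construction idea, assuming the user knows enough employee attributes to phrase predicates and that the system applies a hypothetical minimum query-set size (`MIN_SET_SIZE`). The employee records are invented. The sketch shows only the set logic; turning an isolated member into a disclosed value would take further aggregate queries, such as the differencing shown for payroll.

```python
from __future__ import annotations

from functools import reduce

MIN_SET_SIZE = 3  # hypothetical rule: only answer queries covering at least 3 people

# Invented employee attributes for illustration.
employees = {
    "e1": {"office": "Boston", "dept": "Marketing", "hired": 2007},
    "e2": {"office": "Boston", "dept": "Marketing", "hired": 2005},
    "e3": {"office": "Boston", "dept": "Finance", "hired": 2007},
    "e4": {"office": "Chicago", "dept": "Marketing", "hired": 2007},
    "e5": {"office": "Boston", "dept": "Finance", "hired": 2005},
    "e6": {"office": "Chicago", "dept": "Marketing", "hired": 2005},
    "e7": {"office": "Chicago", "dept": "Finance", "hired": 2007},
}


def query_set(attr: str, value: object) -> set[str]:
    """Members selected by a single-attribute predicate."""
    return {eid for eid, attrs in employees.items() if attrs[attr] == value}


# Each query on its own is broad enough to pass the size rule...
sets = [query_set("office", "Boston"), query_set("dept", "Marketing"), query_set("hired", 2007)]
assert all(len(s) >= MIN_SET_SIZE for s in sets)

# ...but intersecting the query sets narrows them to a single person.
isolated = reduce(set.intersection, sets)
print(sorted(isolated))  # ['e1']
```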

Page 27: [4] Deleted gfgal 8/26/2008 9:29:00 AM

Altering the Structure of the Query System

When an investor requests corporate information from a database that contains transaction-level data, certain data elements and functional relationships would be specified as sensitive. Maintaining a history of what has been disclosed makes it possible to determine all inferences that can be made from the given set of data.

However, altering the structure of the query system can also keep certain functional relationships from being divulged. Gasarch and Smith (1992) prove that query systems with certain capabilities have a learning potential (the ability to learn certain functions) equivalent to that of passive inductive inference machines. By including (or removing) certain capabilities, however, a query system can increase (or reduce) its learning capability relative to passive inductive learning machines. For instance, the ability to ask recursive queries expands the learning capability of the system; "What is the standard deviation of sales in a particular region for a particular time period?" is an example of such a query. In addition, restricting the types of functions available in the system can prevent certain relationships from being disclosed. This approach would be appropriate if all functional relationships within the set of secure information were known, so that the absence of these capabilities from the query system would be sufficient to keep the information secure. Even so, the query system would also need to review the history of responses to be certain that the security requirements are maintained, for instance, by checking whether sales by region had already been disclosed.
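As a small illustration of why adding a function can expand what a user learns, consider a hypothetical region with only two sales representatives. If the system already reports a mean, adding a population standard deviation lets the user recover both individual figures exactly. The figures are invented, and this is a textbook property of two-value samples, not an example taken from Gasarch and Smith.

```python
from __future__ import annotations

import statistics


def recover_pair(mean: float, pstdev: float) -> tuple[float, float]:
    """For exactly two values, the population SD is half their gap, so they are mean -/+ SD."""
    return (mean - pstdev, mean + pstdev)


# Hypothetical region with only two sales representatives.
region_sales = [50_000, 70_000]
mu = statistics.mean(region_sales)       # a capability the system already offers
sigma = statistics.pstdev(region_sales)  # the added standard-deviation capability
print(recover_pair(mu, sigma))  # (50000.0, 70000.0)
```

Withholding the standard deviation (or any function that, combined with the mean, yields a second independent equation) would block this particular inference.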

To keep certain information secure, the query system must be able to generate all functional relationships that can be derived from data that have already been provided. This history of information provided (the query results) determines what the user already knows about the underlying system. To keep information secure, a query system must be able to stop responding if the answer to the next query would allow the user to obtain information determined to be sensitive. Continuous reporting systems add another layer of complication for the learner trying to infer functional relationships. Previous models of query systems always assumed that the underlying data or functional relationships were static. In a firm, this assumption might not hold, and this plays a key role in using the query history to keep sensitive information from being disclosed.
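One way to make "stop responding" concrete for SUM queries is an audit over the query history, in the spirit of auditing approaches for statistical databases such as Chin and Ozsoyoglu (1981); the code below is an illustrative sketch, not their algorithm. Each answered query is stored as a 0/1 membership vector, and a new query is refused if, together with the history, it would make some single record's value a linear combination of released answers. The class and method names and the salary figures are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def rank(rows: list[list[int]]) -> int:
    """Matrix rank by Gaussian elimination over the rationals (exact, no rounding)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


class SumQueryAuditor:
    """Refuse a SUM query if, together with the history, it would pin down one record."""

    def __init__(self, n_records: int) -> None:
        self.n = n_records
        self.history: list[list[int]] = []  # answered queries as 0/1 membership vectors

    def _exposes_a_record(self, rows: list[list[int]]) -> bool:
        base = rank(rows)
        for i in range(self.n):
            unit = [1 if j == i else 0 for j in range(self.n)]
            # If adding e_i does not raise the rank, e_i is a linear combination
            # of the answered queries, so record i's value can be derived.
            if rank(rows + [unit]) == base:
                return True
        return False

    def answer(self, members: set[int], values: list[float]) -> float | None:
        row = [1 if j in members else 0 for j in range(self.n)]
        if self._exposes_a_record(self.history + [row]):
            return None  # refused; the history is left unchanged
        self.history.append(row)
        return sum(values[j] for j in members)


# Hypothetical salaries (in $000s) for a four-person department.
salaries = [52.0, 61.0, 58.0, 70.0]
auditor = SumQueryAuditor(len(salaries))
r1 = auditor.answer({0, 1, 2, 3}, salaries)  # whole department: answered
r2 = auditor.answer({0, 1, 2}, salaries)     # would reveal record 3 by subtraction: refused
r3 = auditor.answer({0, 1}, salaries)        # no single record becomes derivable: answered
print(r1, r2, r3)  # 241.0 None 113.0
```

The sketch assumes static data; with continuously changing values, earlier rows in the history may describe a different state of the firm, which is exactly the complication the text raises.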

Page 27: [5] Deleted gfgal 8/25/2008 11:41:00 AM

Query Histories and Dynamic Data

As the firm operates over time, there is a chance that not only the data but also certain functional relationships will change. This would mean that responses to previous queries may no longer be appropriate, and the inferences drawn from them might be invalid as well. To identify specific queries that are no longer valid, specific characteristics of the data already provided must be calculated. For instance, the total for a particular data element (sales) could change regularly, but other characteristics, such as the mean or standard deviation, might not change significantly, and therefore the data's support for a particular inference might still be valid. To provide security from disclosure, a query system must be able to determine not only what can be inferred, but also what has changed, the significance of the changes, and any changes to the certainty of values of sensitive information.
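A minimal sketch of tracking "specific characteristics of the data already provided": when a response is released, a snapshot of a few summary characteristics is stored; later, the current data are summarized the same way and compared. The snapshot fields, the 5% relative tolerance, and the sales figures are all assumptions for illustration.

```python
from __future__ import annotations

import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Characteristics of a data element recorded when a response was released (hypothetical)."""
    total: float
    mean: float
    stdev: float


def snapshot(values: list[float]) -> Snapshot:
    return Snapshot(sum(values), statistics.mean(values), statistics.pstdev(values))


def changed_characteristics(old: Snapshot, new: Snapshot, rel_tol: float = 0.05) -> set[str]:
    """Names of characteristics that moved by more than rel_tol (an assumed tolerance)."""
    return {
        name
        for name in ("total", "mean", "stdev")
        if not math.isclose(getattr(old, name), getattr(new, name), rel_tol=rel_tol)
    }


released = snapshot([100.0, 110.0, 90.0, 100.0])  # sales figures when a query was answered
current = snapshot([100.0, 110.0, 90.0, 100.0, 100.0, 100.0, 110.0, 90.0])  # after more sales
print(changed_characteristics(released, current))  # {'total'}
```

Here the total has doubled while the mean and standard deviation are unchanged, matching the case in the text where an inference resting on the mean or spread might still be supported.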

Previous work on monitoring dynamic web pages (Pandey, Ramamritham, & Chakrabarti, 2003; Garg, Ramamritham, & Chakrabarti, 2004) uses probability functions to determine whether a web page needs to be revisited. Rather than continually reviewing the underlying corporate data for changes, adopting this approach might yield better results in the continuous reporting environment. Certainly, the stream of events that could change an object's values in a corporate database is more available and deterministic than the stream of events that could change information on a web page. For instance, the possible ways in which sales information could change are better understood than the possible ways a hurricane might change course.
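A hedged sketch of probability-driven rechecking. It assumes, as a simplification, that events changing a corporate object arrive as a Poisson process whose rate is estimated from the recent event stream; this is a common modeling choice and not the specific method of Pandey, Ramamritham, and Chakrabarti. The object is rechecked once the probability that it has changed reaches a chosen threshold. The function names, the 30-day window, and the event count are hypothetical.

```python
import math


def estimate_rate(event_count: int, window_days: float) -> float:
    """Average rate of change events per day observed in a recent window."""
    return event_count / window_days


def change_probability(rate_per_day: float, days_elapsed: float) -> float:
    """P(at least one change since the last check), assuming Poisson-distributed events."""
    return 1.0 - math.exp(-rate_per_day * days_elapsed)


def days_until_recheck(rate_per_day: float, threshold: float) -> float:
    """Elapsed time at which change_probability first reaches the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return -math.log(1.0 - threshold) / rate_per_day


# Hypothetical: 12 sales-affecting events observed in the last 30 days.
rate = estimate_rate(12, 30.0)  # 0.4 events per day
print(round(days_until_recheck(rate, 0.5), 2))  # 1.73
```

Because corporate events are recorded as they occur, a real system could replace the estimated rate with the actual event log, which is the advantage over web monitoring that the text points out.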

Page 27: [6] Deleted gfgal 8/25/2008 11:41:00 AM

1. A query history could identify, with a high degree of certainty, the series of events that change information about sales. By identifying the information that is no longer valid, the system could allow responses to information requests that were previously blocked.
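The idea of releasing previously blocked requests can be sketched with simple version tracking. Each business event bumps the version of the data element it touches; a released answer stays valid only while the versions it used are unchanged; and a blocked request becomes eligible for re-audit (not automatic release) once at least one release it was blocked on has gone stale. The class, its fields, the element names, and this re-audit policy are all hypothetical assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryHistory:
    """Hypothetical bookkeeping: element versions, released answers, and blocked requests."""
    versions: dict[str, int] = field(default_factory=dict)
    released: dict[str, dict[str, int]] = field(default_factory=dict)  # query id -> versions used
    blocked: dict[str, set[str]] = field(default_factory=dict)         # blocked id -> blocking ids

    def release(self, query_id: str, elements: set[str]) -> None:
        """Record an answered query together with the versions of the data it used."""
        self.released[query_id] = {e: self.versions.get(e, 0) for e in elements}

    def block(self, query_id: str, because_of: set[str]) -> None:
        """Record a refused query and the earlier releases that made it dangerous."""
        self.blocked[query_id] = set(because_of)

    def apply_event(self, element: str) -> None:
        """A business event (e.g., a new sale) changes an element, bumping its version."""
        self.versions[element] = self.versions.get(element, 0) + 1

    def is_current(self, query_id: str) -> bool:
        """A release remains valid only if none of the elements it used have changed."""
        used = self.released[query_id]
        return all(self.versions.get(e, 0) == v for e, v in used.items())

    def eligible_for_reaudit(self) -> set[str]:
        """Blocked queries whose blocking inference relied on at least one stale release."""
        return {q for q, blockers in self.blocked.items()
                if any(not self.is_current(b) for b in blockers)}


history = QueryHistory()
history.release("Q1", {"sales_east", "sales_west"})
history.release("Q2", {"sales_west"})
history.block("Q3", {"Q1", "Q2"})  # Q3 plus Q1 and Q2 would expose a sensitive value
before = history.eligible_for_reaudit()
history.apply_event("sales_east")  # a new eastern sale makes Q1's answer stale
after = history.eligible_for_reaudit()
print(before, after)  # set() {'Q3'}
```

Re-auditing rather than releasing outright is a cautious choice: a stale release may still leak partial information, so the blocked query would go back through whatever inference check the system uses.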

Page 28: [7] Deleted gfgal 8/26/2008 11:42:00 AM

has indicated interest in using corporate websites as a vehicle to disclose corporate information. This, combined with the XBRL community's vision of a continuous reporting environment, requires academicians and practitioners to consider the implications of such an environment. One primary issue concerns the characteristics of such an environment. These characteristics include

Page 31: [8] Deleted gfgal 8/28/2008 3:30:00 PM

Bray, D. A., Chellappa, R. K., Konsynski, B. R., & Thomas, D. M. (2007). Balancing Knowledge Sharing with Knowledge Protection: The Influence of Role-Criticality. Twenty-Eighth International Conference on Information Systems. Atlanta, GA: Association for Information Systems.

Chin, F., & Ozsoyoglu, G. (1981). Auditing for secure statistical databases. Proceedings of the ACM '81 conference (pp. 53-59). New York: ACM Press.

1 This was the example used by Pandey, Ramamritham, & Chakrabarti (2003), and it certainly required a more complex approach to monitoring changes than changes to any corporate object would.

Cox, L. (2005). Quality-Preserving Controlled Tabular Adjustment: A Method for Resolving Confidentiality and Data Quality Issues for Tabular Data. Symposium 2005: Methodological Challenges for Future Information Needs. Ottawa, Canada: Statistics Canada.

Crawford, R., Bishop, M., Bhumiratana, B., Clark, L., & Levitt, K. (2006). Sanitization models and their limitations. Proceedings of the 2006 workshop on New security paradigms (pp. 41-56). ACM Press.

Denning, D. E. (1978). Are Statistical Data Bases Secure? National Computer Conference, (pp. 525-530). Washington, DC.

___________, & Denning, P. J. (1979). Data Security. ACM Computing Surveys, 227-249.

Department of Health and Human Services. (2007, September 13). Office for Civil Rights - HIPAA. Retrieved November 11, 2007, from United States Department of Health and Human Services: http://www.hhs.gov/ocr/hipaa/

Dragovic, B., & Crowcroft, J. (2004). Information exposure control through data manipulation for ubiquitous computing. Proceedings of the 2004 workshop on New security paradigms (pp. 57-64). Nova Scotia, Canada: ACM Press.

Garg, S., Ramamritham, K., & Chakrabarti, S. (2004). Web-CAM: monitoring the dynamic Web to respond to continual queries. Proceedings of the 2004 ACM SIGMOD international conference on Management of data (pp. 927-928). Paris, France: ACM Press.

Gasarch, W. I., & Smith, C. H. (1992). Learning Via Queries. Journal of the Association for Computing Machinery, 649-674.

Hawala, S. (2003). Microdata Disclosure Protection Research and Experiences at the US Census Bureau. Washington, DC: Bureau of the Census.

____________, Zayatz, L., & Rowland, S. (2004). American FactFinder: Disclosure Limitation for the Advanced Query System. Washington, DC: Bureau of the Census.

Jonge, W. D. (1983). Compromising Statistical Databases Responding to Queries about Means. ACM Transactions on Database Systems, 60-80.

Litan, R. E., & Wallison, P. J. (2000). Corporate Disclosure in the Internet Age. AEI-Brookings Joint Center Working Paper No. 00-07.

Martin, E., & Osherson, D. N. (2003, May 1). Scientific Discovery from the Point of View of Acceptance. Retrieved December 11, 2006, from http://www.princeton.edu/~osherson/IL/essay1.pdf

Massel, P. B. (2005). Comparing ways of using "Protection Flow" to Protect Magnitude Data Tables from Disclosure. Washington, DC: Bureau of the Census: Disclosure Limitation Research Group.

___________. (2004). Comparing Statistical Disclosure Control Methods for Tables: Identifying Key Factors. Washington, DC: Bureau of the Census: Disclosure Limitation Research Group.

___________. (2003). Statistical Disclosure Control for Tables; Determining Which Method to Use. Symposium 2003: Challenges in Survey Taking for the Next Decade (pp. 2-12). Ottawa, Canada: Statistics Canada.

Nelson, R. (1994). What is a secret—and—what does that have to do with computer security? Proceedings of the 1994 workshop on New security paradigms (pp. 74-79). Little Compton, Rhode Island, United States: ACM Press.

Office of Civil Rights. (2007). Medical Privacy - National Standards to Protect the Privacy of Personal Health Information. Washington, DC: United States Department of Health and Human Services.

Ogawa, H., Fu, K. S., & Yao, J. T. (1984). Knowledge representation and inference control of SPERIL-II. Proceedings of the 1984 annual conference of the ACM on The fifth generation challenge (pp. 42-49). ACM Press.

Osherson, D. N., Stob, M., & Weinstein, S. (1982). Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists. Cambridge, MA USA: The MIT Press.

Pandey, S., Ramamritham, K., & Chakrabarti, S. (2003). Monitoring the dynamic web to respond to continuous queries. Proceedings of the 12th international conference on World Wide Web (pp. 659-668). Budapest, Hungary: ACM Press.

Richardson, V. J., & Scholz, S. (2000). Corporate Reporting and the Internet: Vision, Reality, and Intervening Obstacles. Pacific Accounting Review.

Schwartz, M., Denning, D., & Denning, P. (1979). Linear queries in statistical databases. ACM Transactions on Database Systems, 156-167.

Sicherman, G. L., Jonge, W. D., & Van De Riet, R. P. (1983). Answering Queries Without Revealing Secrets. ACM Transactions on Database Systems, 41-59.

Sion, R. (2005). Query execution assurance for outsourced databases. Proceedings of the 31st international conference on Very large data bases (pp. 601-612). Trondheim, Norway: ACM Press.

Sloan, R. H., & Turán, G. (1999). On theory revision with queries. Proceedings of the twelfth annual conference on Computational learning theory (pp. 41-52). Santa Cruz, CA, USA: ACM Press.

Staddon, J. (2003). Dynamic Inference Control. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (pp. 94-100). San Diego, CA USA: ACM Press.

Steel, P. M. (2004). A new Estimation for the number of Unique Population Elements Based on the Observed Sample. Washington, DC: Bureau of the Census.

U.S. Code Title 13 Chapter 1. Title 13 Chapter 1. Washington, D.C.: U.S. Government.

U.S. Securities and Exchange Commission. (2007, August 10). Interactive Data: Putting Technology to Work for American Investors. Retrieved August 20, 2007, from U. S. Securities and Exchange Commission: http://www.sec.gov/spotlight/xbrl/interactivedata.htm

_____________. (2008). Progress Report of the Advisory Committee on Improvements to Financial Reporting to the United States Securities and Exchange Commission. February 14, 2008. U.S. Securities and Exchange Commission.

Weitzenboeck, E. M. (2001). Enterprise Security: Legal Challenges and Possible Solutions. Proceedings of the 10th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (pp. 183-188). Los Alamitos, CA USA: IEEE CS Press.

Woodruff, D., & Staddon, J. (2004). Private inference control. Proceedings of the 11th ACM conference on Computer and communications security CCS '04 (pp. 188-197). Washington DC, USA: ACM Press.

XBRL International. (n.d.). XBRL - GL Taxonomy. Retrieved from XBRL - GL: http://www.xbrl.org/GLTaxonomy/

Xiao, Z., Jones, M. J., & Lymer, A. (2002). Immediate Trends in Internet Reporting. European Accounting Review.

Yazdanian, K., & Cuppens, F. (1993). Neighborhood data and database security. Proceedings on the 1992-1993 workshop on New security paradigms (pp. 150-154). Little Compton, Rhode Island, United States: ACM Press.

Zhang, N., Zhao, W., & Chen, J. (2004). Cardinality-based inference control in OLAP systems: an information theoretic approach. Proceedings of the 7th ACM international workshop on Data warehousing and OLAP (pp. 59-64). Washington, DC, USA: ACM Press.