Query Issues in Continuous Reporting Systems
Graham Gal
Department of Accounting and Information Systems, The Isenberg School of Management
University of Massachusetts, Amherst, MA 01003
Abstract
Investors have long argued that more and better information will improve their decisions, and regulators have sought to answer these requests by finding ways to make information more readily available. One possible solution is a move toward real-time reporting. Current technology could certainly provide an expanded set of corporate information that is more current or even continuously available. However, as the time between an event and the reporting of that event shrinks, several issues arise. These include the information to be disclosed, its level of detail, the time lag, and the methods available to query the information. Information usage, information adequacy, and materiality also emerge as concerns. These issues, while not currently discussed in the accounting literature, can benefit from lessons learned in querying statistical databases such as those containing U.S. Census and hospital information. Methodologies that restrict queries through the use of inference channels, inductive learning, and query history are proposed as having implications for continuous reporting.
I Introduction
The U.S. Securities and Exchange Commission has become increasingly interested in improving financial reporting. One improvement under consideration involves greater use of technology to give investors access to more data on a timelier basis.¹ The SEC's interest in interactive data (SEC, 2007) was followed by a committee report to the U.S. Securities and Exchange Commission on improving financial reporting (SEC, 2008), which called for increased use of corporate websites to provide more current information. The ultimate goal is to provide better information in a more continuous reporting environment. The report also calls for increased use of XBRL tagging (XBRL International) “…to facilitate the ability of investors to more easily access comparative arrays of company information” (SEC, 2008, p. 3). The Global Ledger taxonomy (XBRL-GL, XBRL International) consists of tags for transaction-level information, while the XBRL-FR (Financial Reporting) taxonomies consist of tags for traditional financial statement elements. Tagging information down to the transaction level, as envisioned by XBRL-GL (XBRL International), would give investors access to information about corporate activities at the most detailed level. While current discussions do not call for information at this level of detail, some groups have called for reporting corporate information in a form that takes advantage of emerging technologies (Litan & Wallison, 2000; Xiao, Jones, & Lymer, 2002). Because current technology can provide more detailed and more current information, it is incumbent on academics and practitioners to anticipate the issues that would arise under these different reporting environments. Certainly an SEC mandate would provide a more immediate path toward financial information that is machine readable and more current.
1 Section 409 of the Sarbanes-Oxley Act (SOX, 2002) appears to mandate this and probably requires technological solutions to achieve this level of disclosure.
The ability to provide a larger set of information about a firm on a timelier basis is now a realistic possibility. However, a number of issues must, at a minimum, be considered before such mandates are put in place. The purpose of this paper is to examine the implications of this technology for investors, auditors, and managers. Section II takes a more detailed look at the characteristics that would define continuous reporting environments. Section III discusses implications of this type of reporting environment, including information usage, audit concerns, and the disclosure of sensitive information. Section IV examines techniques that have been used to restrict access to sensitive detailed information in US Census Bureau data. Although the Census Bureau's methods do protect certain types of sensitive information, certain differences make them incompatible with the requirement to provide relevant information to investors. The following section therefore considers alternative approaches to protecting sensitive data. The paper concludes by examining the implications of these methods and suggesting areas for further investigation.
II The Continuous Reporting / Assurance Environment
Characteristics
If corporate websites become an important, or even the predominant, vehicle for disclosing information about the firm, a number of variables could be used to define the important characteristics of the environment. These include the type of information disclosed, the level of detail, the time lag, and the method of obtaining the information. This section discusses each of these variables and its impact on the way investors will obtain information and on the level of assurance that can be afforded to the information provided.
Expanded Information Set
Under the current reporting environment, the information to be disclosed is clearly defined, as is its location within the financial statements. Current reports present information organized around categories that have accepted structures and definitions; for instance, sales has a clear definition under GAAP. Other types of information also have defined locations, and there are rules for making this information available. The report to the SEC (2008) discusses non-GAAP information and identifies it as an issue that needs to be considered. Investors clearly obtain a great deal of information about firms from other sources, and placing this information on a corporate website would give it a different level of credibility and might imply some assurance about it.
Expanded Level of Detail
Related to the type of information is the level of detail provided. Even if the categories of disclosure were to remain the same (Sales, Accounts Receivable, Cost of Goods Sold, etc.), the level of detail could be expanded. A sales figure could be broken down by individual customer, date, or even invoice. The XBRL-GL tagging taxonomy presumably would allow this level of detail to be constructed and would also allow the detailed information to be aggregated around categories defined by investors. This would put investors in a different position, as they could obtain not only financial statement information but also a complete breakdown of the individual items that make up each account. If this increased level of detail were available for an expanded set of information, investors would essentially have data about the operations of the firm. For instance, current reports give investors little information about raw material purchases. If the type of information were expanded and provided in more detail, investors could recreate the purchase activity for each raw material. This describes a reporting environment that is not currently available but could be supported with current technologies.
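To make investor-defined aggregation concrete, the following is a minimal Python sketch. It assumes transaction-level sales entries carrying a few tags (invoice, customer, date); the field names and amounts are hypothetical and are not actual XBRL-GL element names. The point is only that the same detailed records can roll up to the familiar single sales figure or to any grouping an investor chooses.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleEntry:
    # Hypothetical transaction-level tags; real XBRL-GL elements differ.
    invoice: str
    customer: str
    entry_date: date
    amount: float


def aggregate(entries, key):
    """Sum entry amounts by an investor-chosen key function."""
    totals = defaultdict(float)
    for e in entries:
        totals[key(e)] += e.amount
    return dict(totals)


# Invented sample data for illustration.
entries = [
    SaleEntry("INV-001", "Acme", date(2024, 1, 5), 1200.0),
    SaleEntry("INV-002", "Birch", date(2024, 1, 5), 800.0),
    SaleEntry("INV-003", "Acme", date(2024, 2, 11), 500.0),
]

total_sales = sum(e.amount for e in entries)  # the traditional single figure
by_customer = aggregate(entries, key=lambda e: e.customer)
by_month = aggregate(entries, key=lambda e: (e.entry_date.year, e.entry_date.month))

print(total_sales)  # 2500.0
print(by_customer)  # {'Acme': 1700.0, 'Birch': 800.0}
```

Passing the grouping as a function, rather than fixing it in advance, mirrors the shift the text describes: the firm supplies the tagged detail, and the investor decides how it is summarized.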
Time Lag
Another variable, arguably the primary characteristic of any continuous reporting environment, is the time lag between events and the ability to obtain information about them. Under the current reporting environment, financial information is reported at discrete points in time and contains aggregations of events that occurred within the period covered by the report. Thus a quarterly financial report contains a sales figure that includes events that occurred over the previous three months. Even if the quarterly report were available on the last day of the quarter, it would contain some events that concluded ninety days before the release. Under a continuous reporting model, the time lag between an event and the availability of information about it would be reduced, in the limit to zero. If a zero time lag is combined with an increased level of detail, the result is essentially a reporting environment in which investors could obtain information about individual sales or purchases as they occur. This truly continuous reporting environment has important implications for the information investors obtain.
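The time-lag variable reduces to simple date arithmetic. The Python sketch below is an illustration under stated assumptions: one reportable event on each day of the first quarter of 2024, and a quarterly report released on the quarter's last day with no filing delay (real filings come later, so actual lags are longer). It compares the lags under that quarterly model with the zero-lag limit of continuous reporting.

```python
from datetime import date, timedelta


def lags_in_days(event_dates, release_for):
    """Days between each event and the release that first discloses it."""
    return [(release_for(d) - d).days for d in event_dates]


# Hypothetical: one reportable event on each day of Q1 2024 (Jan 1 to Mar 31).
q1_events = [date(2024, 1, 1) + timedelta(days=i) for i in range(91)]

# Quarterly model: every event is disclosed on the quarter's last day.
quarterly = lags_in_days(q1_events, release_for=lambda d: date(2024, 3, 31))

# Continuous model in the limit: each event is disclosed as it occurs.
continuous = lags_in_days(q1_events, release_for=lambda d: d)

print(max(quarterly), sum(quarterly) / len(quarterly))  # 90 45.0
print(max(continuous))                                   # 0
```

Even in this best case for periodic reporting, the oldest event in the report is ninety days old and the average event is forty-five days old.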
Query Language
Such a truly continuous reporting environment would place investors in a very different position as they search for information to make their decisions. Under the current reporting environment, investors are provided with what has been considered a complete set of financial information. Investors are free to search for additional information, but the standard financial reports have a familiar content and format. If firms disclose information with a zero time lag between occurrence and availability, it would no longer make sense to obtain only traditional financial statements, even if they reflected moment-by-moment changes. A traditional financial report would probably still be available, but the availability of a more detailed and expanded information set would require investors to formulate requests for information based on their own perceptions of relevance. This change from a standard report to a user-developed report would require corporate websites to provide a query language. Two distinct but related constructs of a query language will affect investors’ ability to request information.
The first is the structure of the query language, which determines whether users can formulate their questions in a form close to natural language, for example “What were sales for the company?”, or must use formulations closer to a formal language such as SQL.² The second construct concerns the query language’s capabilities to manipulate data. Statistical capabilities fall within this construct; for example, “What are average sales for company X?” requires the query system to perform a statistical calculation. A second capability the query system would need is time constructs. Investors must be able to indicate a time frame, so references such as “last quarter” or even “yesterday” might be allowed.³ These two constructs, structure and capabilities, are primary determinants of investors’ ability to obtain information in a continuous reporting environment.
2 SQL (Structured Query Language) is a standard language for retrieving information from relational databases. QBE (Query By Example) requests are translated into SQL statements. Even though QBE is a fairly restricted language, there are instances in which the translation to SQL becomes a problem.
3 Time parameters could come from a predetermined list, but then some of the “continuous” nature of events would be lost.
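The two capability constructs, a statistical calculation and a relative time reference, can be illustrated with the standard-library `sqlite3` module. In this minimal Python sketch, the `sales` table, the function names, and the tiny vocabulary of time phrases are all hypothetical; a real query system would need a far richer grammar. The fixed vocabulary is itself the kind of predetermined list that footnote 3 cautions about. Note that “last quarter” and “yesterday” can only be resolved relative to the date of the request.

```python
import sqlite3
from datetime import date, timedelta


def quarter_start(d: date) -> date:
    """First day of the calendar quarter containing d."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def resolve_period(phrase: str, today: date) -> tuple[date, date]:
    """Map a hypothetical, fixed set of time phrases to half-open [start, end) ranges."""
    if phrase == "yesterday":
        return today - timedelta(days=1), today
    if phrase == "last quarter":
        this_quarter = quarter_start(today)
        return quarter_start(this_quarter - timedelta(days=1)), this_quarter
    raise ValueError(f"unsupported time phrase: {phrase!r}")


def average_sales(conn: sqlite3.Connection, phrase: str, today: date):
    """Statistical capability (AVG) combined with a time construct."""
    start, end = resolve_period(phrase, today)
    (avg,) = conn.execute(
        "SELECT AVG(amount) FROM sales WHERE sale_date >= ? AND sale_date < ?",
        (start.isoformat(), end.isoformat()),  # ISO dates compare correctly as text
    ).fetchone()
    return avg


# Invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-15", 100.0), ("2024-02-20", 300.0),
     ("2024-04-02", 50.0), ("2024-04-09", 70.0)],
)

request_day = date(2024, 4, 10)
print(average_sales(conn, "last quarter", request_day))  # 200.0
print(average_sales(conn, "yesterday", request_day))     # 70.0
```

Changing `request_day` changes the answer to the same words, which is why time constructs carry the audit concerns raised later in the paper.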
This section has examined four variables that would define any new reporting environment: an expanded information set, more detailed transaction-level information, the time lag between an event and its availability, and query language constructs. The lag between an event and its availability is a central characteristic of any “continuous” reporting, but the types of information and the methods used to obtain them are also important features of this environment. The next section examines some of the audit and usage issues implied by changes in each of these characteristics.
III Issues for a Continuous Reporting Environment
Information Use
If corporate websites become the predominant source of financial information, the information on those websites becomes critical: investors would be substituting web-based reports for traditional ones. Even if the financial report itself were unchanged, a delay in disclosure would be critical, because even alternative sources would presumably obtain their information from the website. Under the current reporting model, even quarterly financial reports are anticipated and confirm investor expectations. A continuous reporting environment would not allow for this anticipation, as information is released ever closer to the underlying events. These changes in the way information is disclosed mean that the definitions of concepts such as “insider trading,” “materiality,” and “adequate disclosure” would need to change.
Adequate Disclosure and Insider Trading
An occasional problem with a firm’s website is currently an annoyance that may result in lost sales or dissatisfied trading partners. When investors rely on the website to obtain data in time for investment decisions, however, any delay in accessing the site changes the disclosure of the information itself. As the time between the underlying business events and the availability of information about them shrinks, a person who can obtain the data is in a very powerful position compared with someone who cannot, and the auditor faces a different set of problems. If financial information is delayed because the website is inaccessible, whether through an intentional act such as a denial-of-service attack or by accident as with a technical problem, a question of liability arises (Richardson & Scholz, 2000). Questions that need thoughtful answers include:
• Has the firm taken adequate or reasonable steps to address technical problems that
might hinder access to the website?
• If a person within the firm trades on information disclosed on the website and
that information subsequently becomes unavailable, then is the person guilty of
insider trading?
• When has the requirement for adequate website disclosure been met? After ten
minutes without interruption? After one day?
• Must the disclosure be uninterrupted for the entire time period?
Clearly, these questions must be answered to provide auditors with guidance. At a minimum, auditors will need to understand the methods that can be used to make a website inaccessible and then decide whether the firm has taken reasonable steps to protect against them. Even if the corporate website never has problems and the information is always available, the query system that responds to investor requests presents its own audit issues.
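Several of the questions above turn on how “uninterrupted” availability is measured. Purely as an illustration, the Python sketch below assumes the firm keeps a log of outage intervals and treats a disclosure as made once it has been continuously available for a required duration. The duration is a parameter precisely because its appropriate value (ten minutes? one day?) is the open question; the function name, log format, and times are hypothetical, and the rule is not a legal or auditing standard.

```python
from datetime import datetime, timedelta


def disclosure_met_at(posted_at, outages, required):
    """Earliest time at which a disclosure posted at `posted_at` has been
    continuously available for `required`, given outages as (start, end) pairs.
    Hypothetical rule for illustration only."""
    window_start = posted_at
    for start, end in sorted(outages):
        if end <= window_start:
            continue  # outage ended before the current window began
        if start >= window_start + required:
            break  # the required window completed before this outage
        window_start = end  # outage interrupts the window; restart after it
    return window_start + required


# Invented outage log for one morning.
posted = datetime(2024, 4, 10, 9, 0)
outages = [
    (datetime(2024, 4, 10, 9, 5), datetime(2024, 4, 10, 9, 7)),
    (datetime(2024, 4, 10, 9, 20), datetime(2024, 4, 10, 9, 21)),
]

print(disclosure_met_at(posted, outages, timedelta(minutes=10)))  # 2024-04-10 09:17:00
print(disclosure_met_at(posted, outages, timedelta(minutes=15)))  # 2024-04-10 09:36:00
```

The same outage log yields different disclosure times under different thresholds, which is one way to see why the questions above need explicit answers before auditors can apply them.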
Materiality Issues
Under the current reporting environment, firms clearly understand their requirement to produce accurate information that is free of material errors. Investors looking at this information can choose which information to use, but a continuous reporting environment presents them with more options and opportunities. When investors are confronted with a query system that allows, or requires, them to request information from a corporate website, the ability to formulate questions or queries becomes an important issue.
Review of Queries
Section II identified two constructs of a query language, structure and capabilities. Together they would give an investor the tools to request information from the firm’s website. However, because the query system is now in effect producing financial reports, the auditor faces new issues. The primary problem arises because the investor’s natural-language request must be translated into the actual formal query. This translation is not a trivial process, and it becomes more complicated as the query system allows requests that are closer to English statements. In a search engine,⁵ an inappropriate translation of a request may only lead to a response that contains more or fewer sites than a perfectly translated query would have returned. In contrast, an incorrect response to a query of a firm’s continuously reported financial data may produce a data set that is materially different from what the investor wanted. With a move to corporate websites as the primary means of disclosure, there is the potential for a large number of queries and, in effect, a large number of individual reports.
5 Search engines return websites based on an interpretation of the request combined with ranking algorithms, sometimes proprietary, that order the sites matching the request.
Under this continuous reporting environment, it is problematic for auditors to form opinions about each of these reports.
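The translation problem can be made concrete with a toy example. The Python sketch below assumes a hypothetical vocabulary that maps investor terms to ledger accounts; all names and amounts are invented. It shows how a translator that silently resolves an ambiguous term such as “income” can return a figure far from what the investor intended, while a stricter translator refuses and forces the investor to be more specific.

```python
# Hypothetical ledger balances for one period (invented amounts).
LEDGER = {
    "gross_income": 4_000_000,
    "operating_income": 1_500_000,
    "net_income": 900_000,
}

# Hypothetical vocabulary mapping investor terms to ledger accounts.
# "income" alone is ambiguous, so it lists several candidates.
VOCABULARY = {
    "gross income": ["gross_income"],
    "operating income": ["operating_income"],
    "net income": ["net_income"],
    "income": ["net_income", "operating_income", "gross_income"],
}


class AmbiguousTerm(Exception):
    """Raised when a term maps to more than one account."""


def translate(term: str, strict: bool = True) -> str:
    candidates = VOCABULARY[term.lower()]
    if strict and len(candidates) > 1:
        raise AmbiguousTerm(f"{term!r} could mean any of {candidates}")
    # Non-strict mode silently picks the first candidate.
    return candidates[0]


def answer(term: str, strict: bool = True) -> int:
    return LEDGER[translate(term, strict)]


print(answer("gross income"))          # 4000000
print(answer("income", strict=False))  # 900000: net income, not the gross income intended
```

Neither mode removes the audit question: the silent default produces a materially different answer, and the strict mode shifts the burden of precise formulation back to the investor.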
Materiality Determination
An auditor required to make materiality judgments looks at amounts disclosed in the context of the complete report. If investors are creating many individual reports, different levels of materiality would be implied. The information available to the investor depends on the specific characteristics of the continuous reporting environment: the expansion of information to include non-GAAP items, the level of detail available, and the time lag. As these characteristics are extended (more non-GAAP items, more detail, and a shorter time lag), the investor has more options for creating an individual report. The response to a query can be materially inaccurate in a number of ways:
• The investor’s request could be translated inaccurately into a specific query because of an inappropriate translation of terms. For instance, an investor might ask for income when gross income was intended.
• The requested aggregation or time period might be misinterpreted: sales for the quarter instead of sales year to date, or “yesterday” interpreted incorrectly because the request is made from a different time zone.
• With the time lag close to, but not exactly, zero, events that have not yet been recorded when the query is answered are excluded from the response.
In each of these situations, one could conclude that there is a material error in the match between the intent of the query and the results provided. This is clearly a different definition of materiality. With the potential for a large number of individual queries, the firm loses the ability to review all of the information released, and the auditor loses the ability to review the appropriateness of the response to each query. The auditor must then decide whether each query fairly represents the user’s request and whether the response fairly represents the financial position of the firm for the subset of data provided.
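The second and third sources of error above can be illustrated in a few lines. The Python sketch below uses fixed UTC offsets from the standard `datetime` module rather than named time zones, to stay self-contained; the offsets, timestamps, and amounts are invented. It shows the same instant yielding different dates for “yesterday”, and a near-zero-lag query missing an event that occurred before the query but was recorded just after it.

```python
from datetime import datetime, timedelta, timezone


def yesterday_for(instant, utc_offset_hours):
    """Date meant by 'yesterday' for a requester at a fixed UTC offset."""
    local = instant.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.date() - timedelta(days=1)


def sales_as_of(events, query_time):
    """Sales a query can see: occurred AND recorded by query_time."""
    return sum(amount for occurred, recorded, amount in events
               if occurred <= query_time and recorded <= query_time)


def sales_occurred_by(events, query_time):
    """Sales that actually occurred by query_time, recorded or not."""
    return sum(amount for occurred, _, amount in events if occurred <= query_time)


utc = timezone.utc
request = datetime(2024, 4, 10, 2, 0, tzinfo=utc)
ny_yesterday = yesterday_for(request, -4)    # e.g., U.S. Eastern daylight time
tokyo_yesterday = yesterday_for(request, 9)  # e.g., Japan
print(ny_yesterday, tokyo_yesterday)         # 2024-04-08 2024-04-09

t = datetime(2024, 4, 10, 10, 0, tzinfo=utc)
events = [  # (occurred, recorded, amount)
    (t - timedelta(minutes=5), t - timedelta(minutes=4), 250.0),
    (t - timedelta(seconds=2), t + timedelta(seconds=3), 400.0),  # recorded after the query
]
print(sales_as_of(events, t), sales_occurred_by(events, t))  # 250.0 650.0
```

In both cases the system behaves consistently by its own rules, yet the response differs from what the investor would reasonably take the query to mean.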
Query Translation versus Database Materiality
With a large number of continuously generated reports, an auditor would be determining the materiality not of a set of numbers reported at a specific point in time, but of a series of numbers reported at different points in time to different investors. The traditional approach of examining the firm’s database at a point in time is no longer available: in the zero-time-lag continuous reporting model, the corporate database is current only at the moment of a request, not at the end of the quarter or fiscal year. The concept of overall materiality is lost. If the match between a user’s request and the information provided is not perfect, the auditor must first decide whether the difference is material, again in the context of the individual report. The auditor must then determine whether the problem resides in the way the user formulated the request, in the translation of the request into the formal query language, or in the subset of information provided. Significant differences could occur at each of these points, and an auditor in a continuous reporting environment could face a materiality determination at any of them. Auditors would therefore be required to review the specific query language used with the website, a very different level of responsibility. Firms would also stand in a different relationship to investors. Under the current reporting environment, financial information is disclosed after some level of review and interpretation by responsible individuals within the firm. Continuous reporting of financial information also changes the firm’s responsibility when a traditional review is not possible.⁶
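The shift from overall to per-report materiality can be illustrated with simple arithmetic. In the Python sketch below, the 5% benchmark and all amounts are hypothetical, and a single ratio is a deliberate simplification of real materiality judgments, which also weigh qualitative factors. The point is only that the same misstatement can be immaterial relative to a full financial statement yet material relative to the narrower subset of data one investor’s query returns.

```python
# Hypothetical benchmark and amounts, invented for illustration only.
MATERIALITY_THRESHOLD = 0.05  # 5% of the reported figure


def is_material(misstatement: float, reported_total: float) -> bool:
    """Judge a misstatement relative to the report it appears in."""
    return misstatement / reported_total >= MATERIALITY_THRESHOLD


misstatement = 50_000        # e.g., one misposted sale in a single region
company_sales = 10_000_000   # total in a traditional financial statement
regional_sales = 400_000     # total returned by one investor's regional query

print(is_material(misstatement, company_sales))   # False (0.5%)
print(is_material(misstatement, regional_sales))  # True (12.5%)
```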
The most radical departure from the current reporting environment, the one proposed by the Galileo Disclosure Model (Vasarhelyi and Alles), makes the amount of information disclosed critical. This disclosure model includes access to more data, at a greater level of detail, with zero time lag, and it requires a continuous determination of the completeness of the information provided. However, this environment presents another important issue: how much information should be disclosed. The next section discusses this new problem of preventing too much information from being disclosed.
Amount of Information Disclosed
The XBRL-GL taxonomies aim to tag corporate information down to the transaction level, which would allow a reporting model in which detailed data are available. When transaction-level detail is combined with a zero time lag, a truly continuous reporting environment is achieved. Investors and analysts could probably make a case that their investment decisions would improve if more detailed information were available; however, the level of detail that should be available needs review. If the available data expand to include all of the firm’s activities, investors who want to delve into the intricacies of the firm’s operations would benefit. The technology to support such a situation is certainly within reach, but the amount of information to be provided, and the query system that will serve as the interface between investors and the underlying corporate data, both need critical review.
6 This problem was a major concern expressed by some corporate CFOs at the First Continuous Reporting Symposium, held at Rutgers on September 28, 2007.
Disclosure of Sensitive Information
If the reporting model changes to include more of the underlying detail, a query system will have access to all of the firm’s data. Rather than simply responding to any request for information, such a system must be able to balance requests for information against the need to protect sensitive corporate information. The definition of “sensitive information” may differ depending on the group making the definition, but knowledge about the critical activities of the firm clearly needs protection to ensure the firm’s success (Bray, Chellappa, Konsynski, & Thomas, 2007).
Definition of Sensitive Information
Once sensitive information has been precisely defined, a similarly precise specification of when that information is released must also be made. This requires a complete understanding of what information is actually needed to know the sensitive information (Nelson, 1994). For instance, it may be critical to protect the exact sales for each of the company's locations (sensitive information), while the level of sales for the company as a whole is not sensitive and should be available to investors. The problem is that if enough queries about sales grouped by states, regions, or other groupings are answered, the sales of a particular store can be determined (Jonge, 1983). Even responding inappropriately can provide unintended information (Sicherman, Jonge, & Van De Riet, 1983): answering "That information cannot be provided" to a query asking whether a certain project is top secret effectively reveals the requested information. This desire to protect "sensitive" information would also expand audit issues, as an auditor would need to be involved in determining what information should be termed "sensitive", the methods used to determine when this information has been disclosed, and the appropriateness of the techniques used to withhold it.
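The store-sales problem can be sketched in a few lines of Python. The stores, figures, region labels, and the minimum-group-size rule below are all hypothetical; the point is only that a query system which withholds each small group on its own can still disclose a store's sales when two groupings overlap.

```python
from typing import Dict, List, Optional

# Hypothetical store-level sales (the sensitive detail); all figures are invented.
STORES: List[dict] = [
    {"store": "Boston", "state": "MA", "region": "North", "sales": 410_000},
    {"store": "Worcester", "state": "MA", "region": "North", "sales": 265_000},
    {"store": "Albany", "state": "NY", "region": "North", "sales": 190_000},
    {"store": "Newark", "state": "NJ", "region": "South", "sales": 320_000},
    {"store": "Trenton", "state": "NJ", "region": "South", "sales": 240_000},
]

MIN_GROUP_SIZE = 2  # assumed rule: withhold any group total covering fewer stores


def grouped_sales(by: str) -> Dict[str, Optional[int]]:
    """Total sales per group; groups smaller than MIN_GROUP_SIZE are withheld (None)."""
    totals: Dict[str, int] = {}
    counts: Dict[str, int] = {}
    for row in STORES:
        key = row[by]
        totals[key] = totals.get(key, 0) + row["sales"]
        counts[key] = counts.get(key, 0) + 1
    return {k: (v if counts[k] >= MIN_GROUP_SIZE else None) for k, v in totals.items()}


by_state = grouped_sales("state")    # {'MA': 675000, 'NY': None, 'NJ': 560000}
by_region = grouped_sales("region")  # {'North': 865000, 'South': 560000}

# NY (a single store) was withheld, but North = MA + NY, so one subtraction recovers it.
albany_sales = by_region["North"] - by_state["MA"]
print(albany_sales)  # 190000
```

Each response, taken alone, respects the suppression rule; the disclosure comes from combining two of them.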
Similar problems of restricting the availability of publicly accessible information have been faced in other disciplines, so previous work may provide insights into this problem as it arises in a continuous reporting environment. The next section looks at query systems for statistical information, such as the U.S. Census and hospital systems, and at the procedures used to keep this information secure when users are allowed to formulate their own information requests.
IV Learning from Queries on Statistical Data
The U.S. Census Bureau and hospitals in the United States collect a great deal of information that is statistical in nature. The requirements for both datasets are similar: the information is generally available, but there is also a pledge of confidentiality (Office of Civil Rights, 2007; U.S. Code Title 13 Chapter 1; Department of Health and Human Services, 2007). Each of these codes defines confidentiality as the non-disclosure of individual responses. For census data, this means that an individual response should not be derivable with a specified degree of certainty from the presented data. For hospitals, HIPAA requires that individual information not be disclosed to unauthorized individuals. Each of these requirements has parallels with a corporation that does not want to disclose certain information.
Census and Hospital Data
For Census data reports, the requirement is that no application of procedures should be able to recover individual responses. Thus, for Census data, individual responses are considered sensitive, and they should not be available to any query or combination of queries. Certain identifying elements are removed from the data even before it is made accessible to the query system, but additional methods are used to protect individual
responses. In contrast, hospitals must be able to provide individual information to authorized people, but the reporting of health statistics to generally interested parties should protect individual responses. In each of these databases, individual responses are defined as sensitive and are therefore protected.
While these requirements are similar, the fundamental difference is that a hospital user with a specific role can create queries for individual health information, which is considered sensitive, whereas the Census data reporting system (American FactFinder; Hawala, Zayatz, & Rowland, 2004) must not permit this for any user. Another difference is that health statistics are reported much like traditional financial reports, while Census data is accessed through a query system that creates individual reports. Thus hospitals protect sensitive information using the approach currently employed in reporting financial information, producing predefined reports, and so their methods for protecting sensitive data are not applicable to a continuous reporting environment.
Disclosure of Census Data
Census data is generally disclosed in a table with either magnitude or count data displayed in each cell. For instance, a census report could disclose average salaries for each county in a state. If there is a possibility that an individual response could be determined, procedures are applied to prevent the disclosure.
Because Census data can be accessed by a large number of users, the U.S. Census Bureau has developed a number of techniques to protect sensitive data. Within American FactFinder there are a number of statistical disclosure controls (SDC) available to maintain the confidentiality of the members of any cell. Generally, these SDC fall into forward or backward processes (Massell, 2003). Forward process controls include such methods as cell perturbation and
suppression. Perturbation involves altering the value in one or more cells. Protection flow involves methods of perturbation in which the row and column totals are preserved (Massell, 2004, 2005).
Statistical Disclosure Controls
[Insert Figure 1]
In Figure 1, a constant x might be added to cell A and subtracted from cell C. This perturbation would maintain the row total but would alter the column totals. To correct for this and maintain the column totals, the constant x could be subtracted from cell M and added to cell O. Other perturbation methods have been proposed that allow for analysis that maintains statistical relationships (Dinur & Nissim, 2003). Cells are generally suppressed to protect confidentiality when a small number of respondents fit into a specific category (Hawala, 2003). For instance, if a request was made for the salaries of doctors with a particular specialty in each county of a state, and the cell size for a particular county was small, then that cell would be suppressed and the table perturbed so that the value could not be calculated. Each of these SDC allows a different level of protection in terms of the degree of uncertainty required for the resulting query, and the level of uncertainty can be altered for different types of data. For instance, if there are a small number of respondents in a particular category (microdata protection), more uncertainty might be required for income than for real estate value. Backward processing is generally done prior to release of the table data to ensure that the forward processes that were used achieved the desired level of protection.
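The perturbation in Figure 1 and the suppression of small cells can be sketched as follows. Because Figure 1 is not reproduced here, the sketch assumes that cells A and C share a row, that M and O share a second row, and that A sits above M and C above O; the table values, respondent counts, and threshold are invented. A single, known constant x is trivially reversible, so this shows only why the margins survive, not how a statistical agency would choose its adjustments.

```python
from typing import List, Optional

Table = List[List[int]]


def row_totals(t: Table) -> List[int]:
    return [sum(row) for row in t]


def col_totals(t: Table) -> List[int]:
    return [sum(col) for col in zip(*t)]


def perturb_rectangle(t: Table, r1: int, r2: int, c1: int, c2: int, x: int) -> Table:
    """Shift x around the rectangle (r1,c1), (r1,c2), (r2,c1), (r2,c2).

    +x and -x appear once in each affected row and column, so every
    row total and every column total is unchanged.
    """
    out = [row[:] for row in t]
    out[r1][c1] += x  # cell A
    out[r1][c2] -= x  # cell C (same row as A)
    out[r2][c1] -= x  # cell M (same column as A)
    out[r2][c2] += x  # cell O (same column as C)
    return out


def suppress_small_cells(t: Table, counts: Table, min_count: int) -> List[List[Optional[int]]]:
    """Withhold (None) any cell with fewer than min_count respondents."""
    return [[v if n >= min_count else None for v, n in zip(vals, ns)]
            for vals, ns in zip(t, counts)]


# Hypothetical total salaries ($000s): rows = counties, columns = specialties.
true_table = [[120, 95, 80], [110, 130, 70]]
respondents = [[2, 14, 9], [11, 17, 6]]

published = perturb_rectangle(true_table, r1=0, r2=1, c1=0, c2=2, x=7)
assert row_totals(published) == row_totals(true_table)
assert col_totals(published) == col_totals(true_table)

released = suppress_small_cells(published, respondents, min_count=3)
# Subtracting the visible cells from the published row total recovers 127, not the true 120.
recovered = row_totals(published)[0] - sum(v for v in released[0] if v is not None)
print(released[0], recovered)  # [None, 95, 73] 127
```

Suppression alone would not protect the small cell, since it can be recomputed from the row total; perturbing it first means the recomputed figure is not the true one.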
Inferences from Disclosed Data
Data in a transaction-level, zero time-lag continuous reporting environment has some parallels with census data. If the current reporting categories are used, protection levels may not be as important; however, as more detailed information becomes available, there is a question of how much detail should be disclosed. Considerations such as legal requirements (Weitzenboeck, 2001) or competitive reasons would help determine the information that should not be disclosed. While the direct disclosure of sensitive information could be blocked, the level of detail could allow inferences to be made even if the data is not provided in tables with row and column totals. Two approaches to making inferences from data are time-based and intersecting-subset methods.
Time Based Inferences
An organization's database changes over time as a result of both internal and external events. Someone familiar with the timing of these events could infer certain information without actually creating a query for that data. For instance, a person who is aware of an anticipated hiring event could query the total salaries for a department before and after the event; by comparing the two amounts, the person could infer the salary of the new hire. This inference would be possible even if the number of employees in the department were large enough that cell suppression was not required, and it does not require access to transaction-level detail. The same approach could be used to infer purchases of raw materials, again without access to transaction-level detail. As more detail becomes available, it would be possible to infer not just raw material purchases but purchases of specific items from specific vendors.
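A minimal sketch of the before-and-after payroll comparison, assuming a hypothetical event log and a hypothetical aggregate-only query, `payroll_as_of`, that returns a department's total payroll on a given date. The analyst never asks for an individual salary.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class PayrollEvent(NamedTuple):
    effective: date
    dept: str
    annual_salary: int  # change to department payroll (a hire adds a salary)


def payroll_as_of(events: List[PayrollEvent], dept: str, as_of: date) -> int:
    """Aggregate-only query: total department payroll on a given date."""
    return sum(e.annual_salary for e in events
               if e.dept == dept and e.effective <= as_of)


# Hypothetical history: twelve engineers on staff, plus one hire whose date the analyst knows.
events = [PayrollEvent(date(2007, 1, 1), "Engineering", 90_000 + 1_000 * i) for i in range(12)]
events.append(PayrollEvent(date(2008, 3, 3), "Engineering", 118_500))

hire_date = date(2008, 3, 3)
before = payroll_as_of(events, "Engineering", hire_date - timedelta(days=1))
after = payroll_as_of(events, "Engineering", hire_date)
print(after - before)  # 118500: the new hire's salary, from two "safe" aggregates
```

With twelve employees already in the department, neither total would trip a small-cell rule; the disclosure comes entirely from the analyst's knowledge of when the event occurred.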
Intersecting Subset Inferences
Another approach to obtaining the salary of an individual in a department would be to ask for the salary information of all employees, then of all employees except the managers, and so on. One can think of this as constructing a series of sets until their intersection contains a single member. The ability to string together an unlimited number of queries has been shown to allow access to all underlying detail (Jonge, 1983). In a corporate setting this
process is assisted by knowledge such as the minimum salary for a manager, the fact that salaries cannot be negative, or the average number of employees in departments.⁷
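The intersecting-subset approach can be illustrated with a handful of aggregate queries over a hypothetical staff list (names, roles, salaries, and the assumed minimum manager salary are invented). Every query is phrased in terms of a group, yet the differences between them shrink the set of people until one remains.

```python
# Hypothetical staff list; names and salaries are invented.
STAFF = [
    {"name": "Diaz", "dept": "Finance", "role": "manager", "salary": 142_000},
    {"name": "Ellis", "dept": "Finance", "role": "analyst", "salary": 81_000},
    {"name": "Fong", "dept": "Finance", "role": "analyst", "salary": 77_000},
    {"name": "Grant", "dept": "Finance", "role": "clerk", "salary": 52_000},
    {"name": "Hale", "dept": "Audit", "role": "manager", "salary": 150_000},
    {"name": "Iqbal", "dept": "Audit", "role": "analyst", "salary": 84_000},
]


def total_salary(pred) -> int:
    """Aggregate-only query: sum of salaries for everyone matching pred."""
    return sum(p["salary"] for p in STAFF if pred(p))


# Each query describes a group; none names an individual.
everyone = total_salary(lambda p: True)
non_managers = total_salary(lambda p: p["role"] != "manager")
finance = total_salary(lambda p: p["dept"] == "Finance")
finance_non_managers = total_salary(lambda p: p["dept"] == "Finance" and p["role"] != "manager")

managers = everyone - non_managers                # 292000, still two people
finance_manager = finance - finance_non_managers  # 142000, the set has shrunk to one
print(managers, finance_manager)

# Background knowledge narrows things further: if managers are known to earn at least
# 140000 (an assumed figure), neither manager can earn more than 292000 - 140000.
print(managers - 140_000)  # 152000
```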
In each of the examples, time-based and intersecting subsets, sensitive information could be obtained even though the specific sensitive information was never requested. Users can keep track of the information provided and then use it to infer sensitive information. Therefore, restrictions must be placed on users to keep them from obtaining enough information to infer undisclosed sensitive information.
Restrictions to Prevent Inferences
The need to restrict queries to eliminate the ability to make inferences was recognized very early (Denning, D. E., 1978; Denning & Denning, 1979; Schwartz, Denning, & Denning, 1979). The initial solution for maintaining security with these types of queries, again depending on the level of security required, was not to respond to queries whose query-set size fell outside the range [k, n − k], for k ≥ 0 and n the size of the database. The choice of k sets the minimum size of a legitimate query set and presumably eliminates responses to queries such as the Census data salary queries mentioned above.⁸ While this approach can prevent disclosure of certain information, the ability to string together queries, combined with some knowledge about the semantics of the corporate data, can allow inferences about data that was supposed to be secure (Dragovic & Crowcroft, 2004). In the case of small cell sizes, the values can either be hidden or perturbed. However, there is another problem with the perturbation techniques used with census data.
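The query-set-size restriction, and one way to defeat it, can be sketched as below. `SizeRestrictedDB` is a hypothetical interface that answers sum queries only when the query set size lies in [k, n − k]. The circumvention uses a "general tracker", a technique from the same statistical-database literature: a predicate T whose set size lies between 2k and n − 2k lets four permitted queries be combined into the answer to a forbidden one, because q(C ∨ T) + q(C ∨ ¬T) = q(all) + q(C) and q(all) = q(T) + q(¬T). The payroll data is invented.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


class SizeRestrictedDB:
    """Answers SUM(salary) only when the query set size lies in [k, n - k]."""

    def __init__(self, records: List[Record], k: int) -> None:
        self.records = records
        self.k = k

    def sum_salary(self, pred: Predicate) -> int:
        matched = [r for r in self.records if pred(r)]
        n = len(self.records)
        if not (self.k <= len(matched) <= n - self.k):
            raise PermissionError(
                f"query set size {len(matched)} outside [{self.k}, {n - self.k}]")
        return sum(int(r["salary"]) for r in matched)


def tracker_sum(db: SizeRestrictedDB, target: Predicate, tracker: Predicate) -> int:
    """General tracker: q(C) = q(C or T) + q(C or not T) - q(T) - q(not T)."""
    return (db.sum_salary(lambda r: target(r) or tracker(r))
            + db.sum_salary(lambda r: target(r) or not tracker(r))
            - db.sum_salary(tracker)
            - db.sum_salary(lambda r: not tracker(r)))


# Hypothetical payroll: n = 10 employees, so with k = 2 only sets of size 2..8 are answered.
ROWS = [("Ames", "Sales", 61_000), ("Bell", "Sales", 58_000), ("Cruz", "Sales", 64_000),
        ("Dunn", "Sales", 70_000), ("Egan", "Sales", 55_000), ("Kim", "Engineering", 132_000),
        ("Lowe", "Engineering", 98_000), ("Moss", "Engineering", 101_000),
        ("Nash", "Engineering", 94_000), ("Ortiz", "Engineering", 105_000)]
db = SizeRestrictedDB([{"name": n, "dept": d, "salary": s} for n, d, s in ROWS], k=2)


def is_kim(r: Record) -> bool:
    return r["name"] == "Kim"


def in_sales(r: Record) -> bool:  # the tracker T: 5 members, within [2k, n - 2k] = [4, 6]
    return r["dept"] == "Sales"


try:
    db.sum_salary(is_kim)
    direct_refused = False
except PermissionError as err:
    direct_refused = True
    print("refused:", err)  # refused: query set size 1 outside [2, 8]

print(tracker_sum(db, is_kim, in_sales))  # 132000
```

Every one of the four tracker queries covers five or six employees, so each passes the size check on its own; the restriction fails because it looks at queries one at a time.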
7 Knowledge of other characteristics, such as cardinalities (Zhang, Zhao, & Chen, 2004) and related data (Yazdanian & Cuppens, 2003), has been shown to allow inferences even when sensitive information is thought to be protected.
8 This is essentially the approach used to determine when a cell should not be disclosed for a Census data report.
There is a presumption that information beneficial to investors should be disclosed. There could, however, be a distinction between the information investors want and the set of information that is required to make decisions, and this distinction needs to be drawn in conjunction with management's specification of sensitive data. Current rules concerning disclosure preclude altering data, particularly in a way that would make it materially different from its true value. In fact, if data were altered in a way unknown to investors, continuously reported information would essentially be useless. This is an important distinction between mandates to keep information secure and any reporting mandate to provide useful information. It means that the methods used to secure Census data may not be appropriate in a detail-level, zero time-lag continuous reporting environment that uses corporate websites. The next section looks at alternative approaches that could prevent inferences from disclosed data and overcome the problems of the perturbation methods of SDC.
V Query Inferences and Continuous Reporting
The statistical databases discussed in the previous section have certain functional relationships contained within the data itself. For instance, a table of total salaries by department and division would have totals that are functionally dependent on the values of other cells. The cell perturbation methods discussed previously can prevent someone querying the database from using knowledge of these functional relationships to infer the value of certain cells. While this does prevent disclosure of sensitive data, it can also prevent types of analysis that might be of interest to potential users of the system. For instance, perturbing sales information might also prevent an analysis of average sales per store.
Any future mandate from the SEC for continuous reporting would probably look toward providing more data about the operations of the firm, as this would be of interest to investment
decision makers. Dinur and Nissim (2003), Cox (2005), and Steel (2004) present methods to preserve certain qualities of the data so that analysis can still be carried out. To allow such analysis, the types of requests that will be made need to be understood in advance. The structure of financial reports does presume certain types of decisions and the usefulness of information for those decisions. However, as more detailed information becomes available, a much deeper understanding of the different inferences that can be made would be needed, to make sure those inferences are preserved while sensitive information is kept from being disclosed. This analysis would also allow for the development of more directed methods to protect sensitive information while still disclosing the information beneficial to investors. There are two distinct methods to determine what can be inferred from a set of queries: one is to maintain a query history, and the other is to formulate inference channels (Staddon, 2003; Woodruff & Staddon, 2004).
Inference Channels
Staddon (2003) presents a method to develop inference channels and create encryption keys for objects within the channel. An inference channel can be conceptualized as a path of information that, when traversed, allows a previously unknown piece of information to be inferred. In this approach, users are provided with tokens to query the database. When a user performs a query and obtains an object in the inference channel, a token is used; the set of available tokens is reduced, and thus the information the user can obtain is also reduced. With an appropriate allocation of tokens, the inference channel will not be compromised, because a user's tokens will be used up before all of the objects in the channel are known. For instance, if eight pieces of information are required to acquire a certain piece of sensitive information, then users will be supplied with seven tokens. In the manager salary example, this would mean that a token would be used when total salaries were requested, another when the salary for a specific
department was requested, and so on; the tokens would be used up before the intersection became a single employee. This system can also be collusion resistant. Once a sufficient number of tokens has been used to query a specific object in the inference channel (where "sufficient" can be determined by the possible connections among users and the sensitivity of the underlying data), the object is considered to be in the public domain, and the token for this object is considered used by all users. Thus, a user cannot preserve certain tokens, query other objects in the channel, and share that information. An inference channel protection scheme must be able to keep users from working in collusion by keeping a record of the objects in the inference channel that have been queried. An inference protection scheme is considered c-collusion resistant if c users working in collusion are unable to query all the objects in an inference channel.
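The token accounting and the public-domain rule can be sketched without any cryptography. `InferenceChannelGuard`, the object names, and the threshold are hypothetical; Staddon's (2003) scheme uses encryption keys, which this sketch does not attempt to reproduce. Its rule of refusing any release that would let some user's own-plus-public view cover the whole channel is a design choice of the sketch, and it does not by itself establish c-collusion resistance: colluders who each query different objects once can still pool them.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set


class InferenceChannelGuard:
    """Simplified token accounting for one inference channel (no cryptography)."""

    def __init__(self, channel: Iterable[str], public_threshold: int) -> None:
        self.channel: Set[str] = set(channel)
        self.tokens_per_user = len(self.channel) - 1  # one short of the whole channel
        self.public_threshold = public_threshold      # releases before an object is public
        self.times_released: Dict[str, int] = {obj: 0 for obj in self.channel}
        self.public: Set[str] = set()
        self.obtained: DefaultDict[str, Set[str]] = defaultdict(set)

    def tokens_left(self, user: str) -> int:
        # Public objects count as spent for every user, whether or not they queried them.
        return self.tokens_per_user - len(self.obtained[user] | self.public)

    def request(self, user: str, obj: str) -> bool:
        if obj not in self.channel:
            return True  # outside this channel: no token needed
        if obj in self.obtained[user] or obj in self.public:
            return True  # already charged to this user
        new_obtained = self.obtained[user] | {obj}
        goes_public = self.times_released[obj] + 1 >= self.public_threshold
        new_public = (self.public | {obj}) if goes_public else self.public
        # Refuse if any user's view (own objects plus public ones) would exceed the allowance.
        for u in set(self.obtained) | {user}:
            view = (new_obtained if u == user else self.obtained[u]) | new_public
            if len(view) > self.tokens_per_user:
                return False
        self.obtained[user] = new_obtained
        self.times_released[obj] += 1
        self.public = new_public
        return True


guard = InferenceChannelGuard(
    ["total_payroll", "finance_payroll", "finance_payroll_excl_managers"], public_threshold=2)
steps = [
    ("alice", "total_payroll"),
    ("alice", "finance_payroll"),
    ("alice", "finance_payroll_excl_managers"),  # would complete the channel: refused
    ("bob", "total_payroll"),                    # second release: becomes public
    ("bob", "finance_payroll"),                  # second release: becomes public
    ("carol", "finance_payroll_excl_managers"),  # carol's tokens went to public objects
]
outcomes = [guard.request(user, obj) for user, obj in steps]
print(outcomes)                    # [True, True, False, True, True, False]
print(guard.tokens_left("carol"))  # 0
```

Carol never queried anything, yet her allowance is exhausted because the two objects bob and alice both obtained are now treated as known to everyone.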
Information Required for Inference Channels
To implement security based on inference channels, two different pieces of information must be determined. The first is the identification of protected or sensitive information: management must determine what knowledge needs protection and the level of protection. Second, the objects in the inference channel need to be specified. This requires a complete specification of the data that, once known, will allow a user to infer the value of a sensitive piece of information. This scheme also assumes that the underlying database is static. If one of the objects in the channel changes, a reevaluation of the tokens already used is required. For instance, if one of the objects in the inference channel was the commission rate paid to salespeople and the rate changes, then technically the user has not used the token for the new value of that object, and the token should be "refunded". The inability to refund tokens and the requirement to define the inference channel or path in advance are drawbacks of this approach. A final drawback of the inference channel approach arises from the functional relationships within a firm's data.
Information in a corporate database is correlated in many ways. Changes to accounts receivable are correlated with the level of sales, which is correlated with the level of inventory, and so on. Thus there may be alternative paths to sensitive information, and with the inference channel approach all of these alternative paths also need protection. The method of assigning tokens to a channel would have to consider these correlated channels to protect any sensitive information. The level of correlation would need to be specified, because for certain inferences an approximate value may be quite sufficient. For instance, it may be sufficient to know planned purchases of raw materials to within a few weeks. An alternative approach, the history approach combined with an inductive learning system, may prove to be a better solution for protecting sensitive information in a continuous reporting environment.
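One such alternative path can be sketched with the receivables roll-forward, ending AR = beginning AR + credit sales − cash collected. The sketch assumes all sales are on credit and there are no write-offs, and the figures, item names, and the `query` guard are hypothetical. A guard that protects only the direct "credit_sales" object leaves the same value reachable through three unprotected ones.

```python
from typing import Dict, Set

# Hypothetical quarterly figures for one business unit (invented); all sales assumed on credit.
LEDGER: Dict[str, int] = {
    "credit_sales": 1_240_000,  # designated sensitive
    "ar_beginning": 310_000,
    "cash_collected": 1_180_000,
}
# Receivables roll-forward (no write-offs assumed): ending = beginning + credit sales - collections.
LEDGER["ar_ending"] = LEDGER["ar_beginning"] + LEDGER["credit_sales"] - LEDGER["cash_collected"]

SENSITIVE: Set[str] = {"credit_sales"}


def query(item: str) -> int:
    """A naive guard that blocks only direct requests for sensitive items."""
    if item in SENSITIVE:
        raise PermissionError(f"{item} is not disclosed")
    return LEDGER[item]


# The alternative path: rearrange the roll-forward and use only unprotected items.
inferred_sales = query("ar_ending") - query("ar_beginning") + query("cash_collected")
print(inferred_sales)  # 1240000
```

With write-offs or cash sales the relationship would be approximate rather than exact, which is where the required level of correlation mentioned above comes in.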
Inductive Learning and Query History
Asking questions and then combining the information obtained is a very natural way in which humans learn about the world. Inductive learning systems combine information to formulate different theories about the underlying systems providing the answers. A corporate information system that allows continuous reporting can be viewed similarly. The match is even more evident when the activities of the firm are the source of information and the intent of an investor is to learn about the operations of the firm that are producing the data. The fact that these activities can be described functionally adds further support to this view; production functions and purchasing decisions certainly fit this model. An argument can be made that these functional relationships might be considered sensitive. For instance, the disclosure of a company's production function would allow competitors to anticipate purchasing activities and understand limits on prices. A continuous reporting system with access to individual transactions could allow a user to learn the underlying functional relationships in a firm in much
the same way that inductive learning systems can learn other types of functional forms.
Osherson et al. (1982) formally describe a learning system in which a passive learner gathers observations from a natural system. Under this model of learning, the observer reformulates their representation (hypothesis) of the function that governs the behavior of the system. While the observer can never prove that the function generating the data matches their hypothesized function, as more observations confirm the hypothesis they become more certain that their view is correct. All it takes to reject the hypothesis (or to reformulate it) is a single observation that does not fit. The revision process (Sloan & Turán, 1999) requires the learner to incorporate previous observations with each new observation. Stability occurs when the learner no longer revises the hypothesis in response to new observations (Martin & Osherson, 2003). In each case, the learner is attempting to discover the underlying functional relationships that are generating the data. For this type of system, the observer must decide which data to observe and to include in the (re)formulation of the hypothesis.
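The revise-on-contradiction behavior described above can be sketched with a learner that uses identification by enumeration: it conjectures the first hypothesis in a fixed list that is consistent with every observation so far, and changes its conjecture only when contradicted. The linear "production function", the hypothesis space, and the observations are assumptions made for illustration; this is not the formal model of Osherson et al. (1982).

```python
from itertools import product
from typing import List, Optional, Tuple

Hypothesis = Tuple[int, int]  # (a, b) meaning: raw_material = a * units + b

# Assumed finite hypothesis space, searched in a fixed order (identification by enumeration).
HYPOTHESES: List[Hypothesis] = list(product(range(6), range(21)))


def consistent(h: Hypothesis, data: List[Tuple[int, int]]) -> bool:
    a, b = h
    return all(a * x + b == y for x, y in data)


def learn(stream: List[Tuple[int, int]]) -> List[Optional[Hypothesis]]:
    """Return the learner's conjecture after each observation."""
    data: List[Tuple[int, int]] = []
    conjecture: Optional[Hypothesis] = None
    conjectures: List[Optional[Hypothesis]] = []
    for obs in stream:
        data.append(obs)
        # Revise only when the current conjecture is contradicted by the evidence so far.
        if conjecture is None or not consistent(conjecture, data):
            conjecture = next((h for h in HYPOTHESES if consistent(h, data)), None)
        conjectures.append(conjecture)
    return conjectures


# Hypothetical observations (units produced, kg of raw material) generated by 3 * units + 12.
observations = [(2, 18), (5, 27), (7, 33), (10, 42)]
conjectures = learn(observations)
print(conjectures)  # [(0, 18), (3, 12), (3, 12), (3, 12)]
```

The learner revises once and is then stable; as the text notes, stability is evidence for the hypothesis rather than proof of it.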
Continuous Reporting and Inductive Learning
A person obtaining information from a continuous reporting system faces a problem similar to that of an inductive learning system. The assumption of a passive learner is discarded, however, in favor of an active model in which the investor/learner requests specific data (observations) to support their hypothesis or investment decision. In a continuous reporting environment the investor/learner makes a series of requests and then incorporates the results, so the history of these continuous reports represents what the investor/learner can infer. While inductive learning from the query history does not require knowledge of the users' intent, understanding the objective can help in determining the uncertainty required (Blum et al., 2008). For instance, someone interested in timing raw material purchases to take advantage of demand swings may
need a different level of certainty than someone interested in learning the quality level of a production process. The investor/learner will generally have a very specific view of the types of observations that will allow them to infer the underlying functional relationships and will direct their search in these areas. Thus, a query history must look at what can be inferred rather than at what tokens have been used. This eliminates the need to understand all possible paths; only what can be inferred from previously obtained information matters. To decide whether to restrict a response to a query, the inductive learning approach looks at the relationship of the information to sensitive data.⁹ Therefore, an intelligent investor who finds a unique inference channel cannot circumvent the protection management has placed on sensitive information. One deficiency of the inference channel approach was the problem of "refunding" tokens when the underlying information changed. This problem can be overcome by using strategies employed by dynamic search engines.
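The idea of judging a query by what the history would then allow a user to infer, rather than by tokens spent, can be illustrated with a simpler stand-in than inductive learning: exact auditing of sum queries. Each answered query is a 0/1 row over the records, and a sensitive value is exactly determined once its unit vector lies in the span of the answered rows. `SumQueryAuditor` and the store figures are hypothetical; the sketch only catches exact disclosure, not bounds or approximate inferences, and, as noted earlier, a refusal can itself convey information.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Set

Vector = List[Fraction]


def rank(rows: List[Vector]) -> int:
    """Matrix rank by Gaussian elimination with exact rational arithmetic."""
    m = [row[:] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


class SumQueryAuditor:
    """Refuses a SUM query if the answers given so far plus this one would
    determine a sensitive record exactly (its unit vector lies in the row space)."""

    def __init__(self, values: Sequence[int], sensitive: Set[int]) -> None:
        self.values = list(values)
        self.sensitive = sensitive
        self.history: List[Vector] = []

    def _unit(self, i: int) -> Vector:
        return [Fraction(int(j == i)) for j in range(len(self.values))]

    def _reveals(self, rows: List[Vector]) -> bool:
        base = rank(rows)
        return any(rank(rows + [self._unit(i)]) == base for i in self.sensitive)

    def ask(self, members: Set[int]) -> Optional[int]:
        row = [Fraction(int(j in members)) for j in range(len(self.values))]
        if self._reveals(self.history + [row]):
            return None  # refused; nothing is added to the history
        self.history.append(row)
        return sum(self.values[j] for j in members)


# Hypothetical store sales ($000s): 0 Boston, 1 Worcester, 2 Albany (sensitive), 3 Newark.
auditor = SumQueryAuditor([410, 265, 190, 320], sensitive={2})
north = auditor.ask({0, 1, 2})       # 865: answered
ma_only = auditor.ask({0, 1})        # None: North - MA would expose Albany
company = auditor.ask({0, 1, 2, 3})  # 1185: answered; Albany still undetermined
print(north, ma_only, company)
```

Because the test is on the span of everything answered, it catches any linear combination of sums, whatever path the user took to reach it.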
Query Histories and Dynamic Data
As a firm operates over time, its data will change, but certain functional relationships may not. Total sales will change, but the average sales or the gross margin percentage might not. Thus inferences based on totals may no longer be valid, while those based on averages or percentages could still be valid. To identify specific queries that are no longer valid, specific characteristics of the data already provided must be calculated. To make a query-based continuous reporting system usable, these changes must be monitored and their impact on query histories determined.
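A minimal sketch of pruning a query history when the data changes, assuming each history entry stores the statistic a user received and its value. After an update, an entry is kept only if the statistic would return the same answer on current data; here total sales changes while the gross margin percentage does not. The statistics, figures, and `prune_stale` are hypothetical, and whether an outdated answer is still sensitive, since it describes a past state, is a policy question the sketch leaves open.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Data = Dict[str, float]

# Hypothetical statistics the query system can answer.
STATISTICS: Dict[str, Callable[[Data], float]] = {
    "total_sales": lambda d: d["sales"],
    "gross_margin_pct": lambda d: 100 * (d["sales"] - d["cogs"]) / d["sales"],
}


@dataclass
class HistoryEntry:
    user: str
    statistic: str
    answer: float


def prune_stale(history: List[HistoryEntry], data: Data) -> List[HistoryEntry]:
    """Keep entries whose answer still holds on the current data; drop the rest."""
    return [e for e in history
            if math.isclose(STATISTICS[e.statistic](data), e.answer, rel_tol=1e-9)]


q1_data = {"sales": 500_000, "cogs": 300_000}
history = [HistoryEntry("investor_1", name, fn(q1_data)) for name, fn in STATISTICS.items()]

q2_data = {"sales": 550_000, "cogs": 330_000}  # sales grew; margin held at 40%
history = prune_stale(history, q2_data)
print([e.statistic for e in history])  # ['gross_margin_pct']
```

Dropping the stale total frees the user to ask about current sales again, while the still-valid margin answer keeps counting toward what the user can infer.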
9 Keefe et al. (1989) look at creating sensitivity levels within a database and reviewing query histories on those sensitive elements in the database.
Previous work on monitoring dynamic web pages (Pandey, Ramamritham, & Chakrabarti, 2003; Garg, Ramamritham, & Chakrabarti, 2004) uses probability functions to determine whether a web page needs to be revisited. These methods have been used to track changes in a hurricane's course and to decide when weather websites need to be revisited. Certainly, the stream of events that could change sales in a corporate database is more deterministic than the stream of events that could change a hurricane's course. Other approaches, such as adding security constructs to cookie policies, might also help determine when information is no longer valid (Shankar and Karlof, 2006). Partitioning corporate databases along certain dimensions could identify, with a high degree of certainty, the series of events that change information about sales or other sensitive objects. Including mechanisms like these to delete invalid queries from the history would allow a continuous reporting system to answer previously blocked queries. Maintaining the query history can keep sensitive information from being disclosed. However, the structure of the query system itself can also be altered to augment the history approach and limit the types of inferences that can be made.
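One way the partitioning idea might work is sketched below. It assumes each answered query can be tagged with the partitions it read (here, hypothetical region and quarter keys) and that any event posted to a partition invalidates those answers. This is a conservative, deterministic rule chosen for illustration; it is not the probabilistic revisit policy of Pandey, Ramamritham, & Chakrabarti (2003), and the names `PartitionedHistory`, `record`, and `on_event` are hypothetical.

```python
from collections import defaultdict


class PartitionedHistory:
    """Hypothetical query history that tracks which partitions each answer read."""

    def __init__(self):
        self.readers = defaultdict(set)  # partition key -> ids of answered queries that read it
        self.answered = {}               # query id -> set of partition keys it read

    def record(self, query_id, partitions):
        self.answered[query_id] = set(partitions)
        for p in partitions:
            self.readers[p].add(query_id)

    def on_event(self, partition):
        """A transaction posted to `partition` invalidates every answer that read it.

        Returns the ids removed from the history; queries that were blocked
        because of those answers could then be re-evaluated.
        """
        stale = self.readers.pop(partition, set())
        for q in stale:
            # Drop the stale query from the index of every other partition it read.
            for p in self.answered.pop(q, set()) - {partition}:
                self.readers[p].discard(q)
        return stale


h = PartitionedHistory()
h.record("q1", [("East", "2008-Q2")])
h.record("q2", [("West", "2008-Q2")])
h.record("q3", [("East", "2008-Q2"), ("West", "2008-Q2")])
stale = h.on_event(("East", "2008-Q2"))  # a new East sale is posted
print(sorted(stale))  # → ['q1', 'q3']
```

Answers that touch only unaffected partitions (here `q2`) stay in the history, so protection based on them is not lost.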
Altering the Capabilities of the Query System
The characteristics of the query system were presented earlier. Altering these characteristics can also restrict the ability of a learner/investor to discover the structures of the underlying system. Gasarch and Smith (1992) prove that query systems with certain capabilities have a learning potential (the ability to learn certain functions) equivalent to that of passive inductive inference machines. They show that by including (or removing) certain capabilities, the query system can improve (or reduce) its learning capability relative to that of a passive inductive learning machine. A simple example is that eliminating time parameters from a query system would eliminate a rather large set of information that could be
obtained and, therefore, a large number of inferences that could be made. Removing statistical capabilities would eliminate another large set of queries and again reduce the information that could be obtained. Another capability that could be removed is the ability to ask join queries, such as sales organized by state10. Gasarch and Smith (1992) have shown that including recursive queries expands the learning capability of the system. The standard deviation of sales in a particular region for a particular time period is an example of such a query11. Each restriction on the capabilities of the query system prevents certain types of information from being obtained and therefore eliminates the inferences that depend on that information. Thus, combining an inductive-learning query history approach with a query language that has a reduced set of capabilities can make it difficult12 to obtain sensitive information.
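The capability restrictions discussed above can be pictured as a filter applied before any query reaches the data. The sketch assumes a query can be described by a handful of flags (aggregate function, time parameter, joins, and reuse of a prior result); the `Query` and `Capabilities` types and the `violations` function are hypothetical and model only the capabilities named in the text, not the formal query languages analyzed by Gasarch and Smith (1992).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    measure: str                      # e.g. "sales"
    aggregate: str = "sum"            # "sum", "avg", "stddev", ...
    time_filter: bool = False         # restricted to a particular time period
    joins: tuple = ()                 # other tables joined in, e.g. ("customer",)
    uses_prior_result: bool = False   # recursive: built on an earlier query's answer


@dataclass(frozen=True)
class Capabilities:
    allowed_aggregates: frozenset = frozenset({"sum"})
    allow_time_filter: bool = False
    allow_joins: bool = False
    allow_recursion: bool = False


def violations(q, caps):
    """List the capabilities a query needs that the reporting system has switched off."""
    problems = []
    if q.aggregate not in caps.allowed_aggregates:
        problems.append(f"aggregate {q.aggregate}")
    if q.time_filter and not caps.allow_time_filter:
        problems.append("time parameter")
    if q.joins and not caps.allow_joins:
        problems.append("join")
    if q.uses_prior_result and not caps.allow_recursion:
        problems.append("recursive query")
    return problems


caps = Capabilities()
sales_by_state = Query("sales", joins=("customer",))
regional_stddev = Query("sales", aggregate="stddev", time_filter=True, uses_prior_result=True)
print(violations(sales_by_state, caps))   # → ['join']
print(violations(regional_stddev, caps))  # → ['aggregate stddev', 'time parameter', 'recursive query']
```

Such a filter only narrows what can be asked; as the footnote on partial security cautions, it would still be combined with the query history rather than relied on alone.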
Implications of Query History and Inductive Learning for Continuous Reporting
Much of the work in understanding how inferences can be made looks at how to build
computational models of inductive learning. This work has focused on characteristics of
learning systems (Osherson, Stob and Weinstein 1982), what it means to learn (Sloan and Turán,
1999), and what it means to know (Martin and Osherson, 2003). This work has implications for
restricting access to sensitive information in a continuous reporting environment. First, the
investor has a goal of learning about the underlying systems in the corporation. Second, the
investor tries to obtain information that supports this goal. A continuous reporting system
becomes the vehicle by which the investor interacts with the corporate information system.
10 This is a join query because the information about the state would require joining the sale to the customer and then grouping the sales by the state the customer lives in.
11 This is recursive because the results of a query are used in a subsequent query.
12 It might be tempting to conclude that removing capabilities would make certain inferences impossible. However, research has shown that methods thought to be completely secure have in the end proven to be only "partially" secure.
Finally, the investor will ask for information until their view becomes stable, that is, until additional information no longer changes their understanding of the information system. There are, however, two fundamental differences between inductive learning systems and a continuous reporting system. The first is that the inductive learning model does not assume any prior knowledge of the underlying system to be learned; the learner could be learning a language for the first time. In a continuous reporting environment, the assumption is that the user has some knowledge of the company: investors understand revenues and expenses. The other difference is that inductive learning research is concerned with understanding the learning process, whereas the goal of a continuous reporting system is to stop learning before sensitive information is disclosed. Neither of these differences would preclude the use of inductive learning to develop systems that protect sensitive information; in fact, this approach has been used in static databases (Blum, Ligett and Roth 2008; Nabar et al. 2006). Corporate databases do present a special challenge because the underlying database changes over time. This constant change makes it difficult to protect sensitive information and still provide new information under the inference channel approach. Another problem with inference channels is that not only the best path to the sensitive information must be understood, but also all possible paths. Query histories overcome these problems by looking at what can be inferred from the investor's set of queries.
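The goal of stopping learning before sensitive information is disclosed can be illustrated with a toy learner that narrows a set of candidate values one answer at a time. The sketch assumes a single sensitive integer, queries of the form "is the value at least t?", and a manager-chosen minimum of k remaining candidates; `ThresholdAuditor` and its parameters are hypothetical. The refusal rule considers both possible answers before looking at the true one, an idea known in the query-auditing literature as simulatable auditing, so that a denial does not itself leak information.

```python
class ThresholdAuditor:
    """Hypothetical auditor answering 'is the secret >= t?' while k or more candidates remain."""

    def __init__(self, secret, lo, hi, k):
        self.secret = secret
        self.candidates = set(range(lo, hi + 1))  # values still consistent with past answers
        self.k = k                                 # smallest candidate set management will tolerate

    def ask(self, t):
        yes = {c for c in self.candidates if c >= t}
        no = self.candidates - yes
        # Deny if EITHER possible answer would shrink the candidate set below k.
        # The decision never consults the true answer, so a denial reveals nothing.
        if any(0 < len(branch) < self.k for branch in (yes, no)):
            return None
        answer = self.secret >= t
        self.candidates = yes if answer else no
        return answer


auditor = ThresholdAuditor(secret=42, lo=0, hi=99, k=10)
print([auditor.ask(t) for t in (50, 25, 40, 45)])  # → [False, True, True, None]
```

Every answer the auditor gives leaves at least k candidates, so the investor's view can stabilize without ever pinning down the sensitive value.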
V Summary and Conclusions
Under the current reporting environment, investors must combine information from many different sources and make conjectures about its validity. Investors are continually interested in obtaining more, and more timely, information. A continuous or real-time reporting environment promises to address some of these issues, and technological advances that can implement these visions continue to emerge. Any enhanced reporting environment can be characterized by an increase in the amount of data, a higher level of detail, the time-lag between the event and its
availability, and finally the query system that creates the reports. The move toward real-time reporting will affect all participants in the reporting process.
This continuous reporting environment places additional pressure on investors, auditors, and managers. Investors are affected because in this environment they must create their own reports rather than simply accept traditional ones; how they use information is likely to change, and this becomes a new area of concern. Auditors are affected because they must be prepared to review both the adequacy of the measures firms use to keep information continually available and materiality with respect to the system's responses. Finally, managers are affected because they must be prepared to deal with different types of disclosure of corporate information and will be required to determine what information should be considered sensitive.
A continuous reporting environment would allow users to make inferences about the operations of the firm in ways not envisioned in the current reporting environment. The ability to ask questions of operational data means that sensitive information could be inadvertently disclosed. Both the U.S. Census Bureau and hospitals face this challenge as users query their underlying data. They anticipate the inferences possible from these queries and apply statistical disclosure controls (SDCs) before sensitive information is disclosed. Several issues make these SDCs poorly suited to continuously reported corporate data, and therefore other methodologies are explored. Inference channels and inductive learning systems with query histories offer more advanced capabilities to protect sensitive information. Certain characteristics of corporate information seem to make an inductive learning approach better suited to protecting sensitive information. The major issue for practitioners and academics is to understand and make informed choices about these issues
before the systems are used. Clearly, additional investigation is required to guide the adoption of real-time reporting.
Works Cited
Blum, Avrim, Katrina Ligett, and Aaron Roth. "A Learning Theory Approach to Non-Interactive Database Privacy." Proceedings of the 40th Annual ACM Symposium on Theory of Computing. Victoria, British Columbia, CA: ACM Press, 2008. 609-617.
Bray, David A, Ramnath K. Chellappa, Benn R. Konsynski, and Dominic M. Thomas. "Balancing Knowledge Sharing with Knowledge Protection: The Influence of Role-Criticality." Twenty-Eighth International Conference on Information Systems. Montreal, CA: Association for Information Systems, 2007. 1-10.
Cox, Lawrence H. "Quality-Preserving Controlled Tabular Adjustment: A Method for Resolving Confidentiality and Data Quality Issues for Tabular Data." Proceedings of Statistics Canada Symposium 2005: Methodological Challenges for Future Information Needs. Ottawa, CA: Statistics Canada, 2005.
De Jonge, Wiebren. "Compromising Statistical Databases Responding to Queries about Means." ACM Transactions on Database Systems, 1983: 66-80.
Denning, Dorothy E. "Are Statistical Data Bases Secure?" National Computer Conference. Washington, DC: ACM Press, 1978. 525-530.
Denning, Dorothy E., and Peter J. Denning. "Data Security." ACM Computing Surveys, 1979: 227-249.
Dinur, Irit, and Kobbi Nissim. "Revealing Information while Preserving Privacy." Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. San Diego, CA: ACM Press, 2003. 202-210.
Dragovic, Boris, and Jon Crowcroft. "Information Exposure Control through Data Manipulation for Ubiquitous Computing." Proceedings of the 2004 Workshop on New Security Paradigms. Nova Scotia, CA: ACM Press, 2004. 57-64.
Garg, Shaveen, Krithi Ramamritham, and Soumen Chakrabarti. "Web-CAM: Monitoring the Dynamic Web to Respond to Continual Queries." Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. Paris: ACM Press, 2004. 927-928.
Gasarch, William I., and Carl H. Smith. "Learning via Queries." Journal of the Association for Computing Machinery, 1992: 649-674.
Hawala, Sam. Microdata Disclosure Protection Research and Experiences at the US Census Bureau. Research Report, Washington, DC: United States Census Bureau, 2003.
Hawala, Sam, Laura Zayatz, and Sandra Rowland. American FactFinder: Disclosure Limitation for the Advanced Query System. Research Report, Washington, DC: United States Census Bureau, 2004.
Litan, R.E., and P.J. Wallison. Corporate Disclosure in the Internet Age. Working Paper, AEI-Brookings Joint Center, 2000.
Martin, Eric, and Daniel Osherson. "Scientific Discovery from the Point of View of Acceptance." Inductive Logic. May 1, 2003. http://www.princeton.edu/~osherson/IL/essay1.pdf (accessed December 11, 2006).
Massel, Paul B. Comparing Statistical Disclosure Control Methods for Tables: Identifying Key Factors. Research Report, Washington, DC: United States Census Bureau, 2004.
_______. Comparing Ways of Using "Protection Flow" to Protect Magnitude Data Tables from Disclosures. Research Report, Washington, DC: United States Census Bureau, 2005.
_______. "Statistical Disclosure Control for Tables: Determining Which Method to Use." Symposium 2003: Challenges in Survey Taking for the Next Decade. Ottawa, CA: Statistics Canada, 2003. 2-12.
Nabar, Shubha U., Bhaskara Marthi, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. "Towards Robustness in Query Auditing." Proceedings of the 32nd International Conference on Very Large Data Bases. Seoul, South Korea: ACM Press, 2006. 151-162.
Nelson, Ruth. "What is a Secret and What does that have to do with Computer Security?" Proceedings of the 1994 Workshop on New Security Paradigms. Little Compton, RI: ACM Press, 1994. 74-79.
Office of Civil Rights. "Medical Privacy - National Standards to Protect the Privacy of Personal Health Information." United States Department of Health and Human Services. 2007. http://www.hhs.gov (accessed November 11, 2007).
Osherson, Daniel, Michael Stob, and Scott Weinstein. Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists. Cambridge, MA: The MIT Press, 1982.
Pandey, Sandeep, Krithi Ramamritham, and Soumen Chakrabarti. "Monitoring the Dynamic Web to Respond to Continuous Queries." Proceedings of the 12th International Conference on the World Wide Web. Budapest, HU: ACM Press, 2003. 659-668.
Richardson, Vernon J., and Susan Scholz. "Corporate Reporting and the Internet: Vision, Reality, and Intervening Obstacles." Pacific Accounting Review, 1999/2000: 67-75.
Schwartz, M.D., Dorothy Denning, and Peter Denning. "Linear Queries in Statistical Databases." ACM Transactions on Database Systems, 1979: 156-167.
Shankar, Umesh, and Chris Karlof. "Doppelganger: Better Browser Privacy without the Bother." Proceedings of the 13th ACM Conference on Computer and Communications Security. Alexandria, VA: ACM Press, 2006. 154-167.
Sicherman, George L., Wiebren De Jonge, and Reind P. Van de Riet. "Answering Queries without Revealing Secrets." ACM Transactions on Database Systems, 1983: 41-59.
Sloan, Robert H., and György Turán. "On Theory Revision with Queries." Proceedings of the 12th Annual Conference on Computational Learning Theory. Santa Cruz, CA: ACM Press, 1999. 41-52.
Staddon, Jessica. "Dynamic Inference Control." ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA: ACM Press, 2003. 94-100.
Steel, Philip M. A New Estimation for the Number of Unique Population Elements based on the Observed Sample. Research Report, Washington, DC: United States Census Bureau, 2004.
Keefe, T.F., M.B. Thuraisingham, and W.T. Tsai. "Secure Query-Processing Strategies." IEEE Computer, 1989: 63-70.
United States Department of Health and Human Services. "Health Insurance Portability and Accountability Act." Department of Health and Human Services. September 2007. http://www.hhs.gov/ocr/hipaa/ (accessed November 11, 2007).
United States General Laws. Sarbanes-Oxley Act (SOX). Public Law no. 107-204, Washington, DC: Government Printing Office, 2002.
United States General Laws. U.S. Code Title 13 Chapter 1. Washington, DC: United States Government.
United States Securities and Exchange Commission. Interactive Data: Putting Technology to work for the American Investor. 2007. http://www.sec.gov/spotlight/xbrl/interactivedata.htm (accessed August 20, 2007).
United States Securities and Exchange Commission. Progress Report of the Advisory Committee on Improvements to Financial Reporting. Committee Report, Washington, DC: United States Securities and Exchange Commission, 2008.
Vasarhelyi, Miklos A., and Michael Alles. The Galileo Disclosure Model. http://raw.rutgers.edu/Galileo/ (accessed August 19, 2008).
Weitzenboeck, Emily M. "Enterprise Security: Legal Challenges and Possible Solutions." Proceedings of the Tenth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises. Los Alamitos, CA: IEEE Computer Society Press, 2001. 183-188.
Woodruff, David, and Jessica Staddon. "Private Inference Control." Proceedings of the 11th ACM Conference on Computer and Communications Security. Washington, DC: ACM Press, 2004. 188-197.
XBRL International. Global Ledger Taxonomy - An Introduction. http://www.xbrl.org/GLTaxonomy/ (accessed August 8, 2008).
_______. XBRL Taxonomies. http://www.xbrl.org/Taxonomies/ (accessed August 18, 2007).
Xiao, Zezhong, Michael J. Jones, and Andrew Lymer. "Immediate Trends in Internet Reporting." European Accounting Review, 2002.
Yazdanian, Kioumars, and Frédéric Cuppens. "Neighborhood Data and Database Security." Proceedings of the 1992-1993 Workshop on New Security Paradigms. Little Compton, RI: ACM Press, 1993. 150-454.
Zhang, Nan, Wei Zhao, and Jianer Chen. "Cardinality-Based Inference Control in OLAP Systems: An Information Theoretic Approach." Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP. Washington, DC: ACM Press, 2004. 59-64.
Page 18: [1] Deleted gfgal 8/26/2008 4:46:00 PM
For instance, a query about total payroll by division or department would result in a table
similar one of the census data tables. To restrict discovery of data about individual
salaries application of the same cell restriction methods could conceal salaries in
departments with a small number of employees, but because of the different nature of
corporate data, there are situations in which a user could still obtain information about
individual data. For instance, if a user[MV1] knew that a person was to be hired into a
department, then running the query about departmental payroll before and after the hiring
date would allow someone to infer the salary of the new hire even in a department where
the number of employees made cell perturbation unnecessary. This is a problem with
detailed zero time-lag data that someone with knowledge of the activities of the firm can
make inferences about sensitive data without ever requesting sensitive information.
Page 18: [2] Comment [A34] Author 8/10/2008 11:37:00 AM Again this whole thing is very interesting but you ramble on with no organization.. break it in sub sections with a couple of paragraphs and show a simple easy example for each and it will be very good
Page 18: [3] Deleted gfgal 8/27/2008 2:09:00 PM
Basically one might think of this as constructing a series of sets until the intersection
becomes a single member.
Page 27: [4] Deleted gfgal 8/26/2008 9:29:00 AM
Altering the Structure of the Query System
When an investor is requesting information corporate information from a database
that contains transaction level data there are certain data elements and functional
relationship that would be specified as sensitive. Maintaining a history of what has been
disclosed can determine all inferences that can be made with the given set of data.
However, altering the structure of the query system can also keep certain functional
relationships from being divulged. Gasarch and Smith (1992) prove that query systems
with certain capabilities have the learning potential (ability to learn certain functions) that
is equivalent to passive induction inference machines. However, by including (or
removing) certain capabilities the query system can improve (or reduce) the learning
capability as compared to passive inductive learning machines. For instance, the ability
to ask recursive queries expands the learning capability of the system. What is the
standard deviation of sales in a particular region for a particular time period is an
example of such a query. In addition restricting the types of functions available to the
system can prevent certain relationships from disclosure. This approach would be
appropriate if all functional relationships within the set of secure information were
known, and therefore the lack of these capabilities in the query system would be
sufficient to keep the information secure. However, the query system would also need to
review the history of responses to be certain that the security requirements are
maintained. For instance, the sales by region had already been disclosed.
To keep certain information secure the query system must be able to generate all
functional relationships that can be derived from data that have already been provided.
This history of information provided, query results, determines what the user already
knows about the underlying system. In order to keep information secure a query system
must be able to stop responding if the user could use information provided in the next
query to obtain information determined to be sensitive. Continuous reporting systems
add another layer of complication to the learner trying to infer functional relationships.
In previous models of query systems, there was always an assumption that the underlying
data or functional relationships were static. In a firm, this assumption might not hold,
and this plays a key role in using the query history to keep sensitive information from
being disclosed.
Page 27: [5] Deleted gfgal 8/25/2008 11:41:00 AM
Query Histories and Dynamic Data As the firm operates over time, there is a chance that not only will the
data change but also certain functional relationships will change. This would
mean that responses to previous queries may no longer be appropriate and
the inferences might be invalid as well. To identify specific queries that are
not longer valid specific characteristics of the data already provided must be
calculated. For instance, the total for a particular data element could change
regularly (sales), but other characteristics, such as mean or standard
deviation, might not change significantly and therefore the data’s support for
a particular inference might be still valid. To provide security from
disclosure a query system must not only be able to determine what can be
inferred, but also, what has changed, the significance of the changes, and any
changes to the certainty for values of sensitive information. [A2]
Previous work on monitoring dynamic web pages (Pandey, Ramamritham, & Chakrabarti, 2003; Garg, Ramamritham, & Chakrabarti, 2004) use certain probability functions to determine whether a web page needs to be revisited. Rather than continually, reviewing the underlying corporate data for changes this approach might yield better results in the continuous reporting environment. Certainly, the stream of events that could change an object’s values
in a corporate database is more available and deterministic than the stream of events that could change information on a web page. For instance, the possible ways in which sales information could change are better understood than the possible ways a hurricane might change course
Page 27: [6] Deleted gfgal 8/25/2008 11:41:00 AM
1. A query history could identify the series of events that change information
about sales with a high degree of certainty. By identifying, the information
that is no longer valid responses to information requests previously blocked
could be allowed.
Page 28: [7] Deleted gfgal 8/26/2008 11:42:00 AM
has indicated interest in using corporate websites as a vehicle to disclose corporate
information. This combined with a vision from the XBRL community for a continuous
reporting environment requires academicians and practitioner to consider the implications
of such an environment. On primary issue concerns the characteristics of such an
environment. These[MV3] characteristics include
Page 31: [8] Deleted gfgal 8/28/2008 3:30:00 PM
Bray, D. A., Chellappa, R. K., Konsynski, B. R., & Thomas, D. M. (2007). Balancing Knowledge Sharing with Knowledge Protection: The Influence of Role-Criticality. Twenty-Eighth International Conference on Information Systems. Atlanta, GA: Association for Information Systems.
Chin, F., & Ozsoyoglu, G. (1981). Auditing for secure statistical databases. Proceedings of the ACM '81 conference (pp. 53-59). New York: ACM Press.
1 This was the example used by Pandey, Ramamritham, & Chakrabarti ( 2003) and certainly required a more complex approach to monitoring changes than changes to any corporate object.
Cox, L. (2005). Quality-Preserving Controlled Tabular Adjustment: A Method for Resolving Confidentiality and Data Quality Issues for Tabular Data. Symposium 2005 : Methodological Challenges for Future Information Needs. Ottawa, Canada: Statistics Canada.
Crawford, R., Bishop, M., Bhumiratana, B., Clark, L., & Levitt, K. (2006). Sanitization models and their limitations. Proceedings of the 2006 workshop on New security paradigms (pp. 41-56). ACM Press.
Denning, D. E. (1978). Are Statistical Data Bases Secure? National Computer Conference, (pp. 525-530). Washington, DC.
___________ & Denning, P. J. (1979). Data Security. ACM Computing Surveys, 227-249.
Department of Health and Human Services. (2007, September 13). Office for Civil Rights - HIPAA. Retrieved November 11, 2007, from United States Department of Health and Human Services:
Dragovic, B., & Crowcroft, J. (2004). Information exposure control through data manipulation for ubiquitous computing. Proceedings of the 2004 workshop on New security paradigms (pp. 57-64). Nova Scotia, Canada: ACM Press.
Garg, S., Ramamritham, K., & Chakrabarti, S. (2004). Web-CAM: monitoring the dynamic Web to respond to continual queries. Proceedings of the 2004 ACM SIGMOD international conference on Management of data (pp. 927-928). Paris, France: ACM Press.
Gasarch, W. I., & Smith, C. H. (1992). Learning Via Queries. Journal of the Association for Computing Machinery, 649-674.
Hawala, S. (2003). Microdata Disclosure Protection Research and Experiences at the US Census Bureau. Washington, DC: Bureau of the Census.
____________, Zayatz, L., & Rowland, S. (2004). American FactFinder; Disclosure Limitation for the Advanced Query System. Washington, DC: Bureau of the Census.
Jonge, W. D. (1983). Compromising Statistical Databases Responding to Queries about Means. ACM Transactions on Database Systems, 60-80.
Litan, R. E., & Wallison, P. J. (2000). Corporate Disclosure in the Internet Age. AEI-Brookings Joint Center Working Paper No. 00-07.
Martin, E., & Osherson, D. N. (2003, May 1). Scientific Discovery from the Point of View of Acceptance. Retrieved December 11, 2006, from http://www.princeton.edu/~osherson/IL/essay1.pdf
Massel, P. B. (2005). Comparing ways of using "Protection Flow" to Protect Magnitude Data Tables from Disclosure. Washington, DC: Bureau of the Census: Disclosure Limitation Research Group.
___________. (2004). Comparing Statistical Disclosure Control Methods for Tables: Identifying Key Factors. Washington, DC: Bureau of the Census: Disclosure Limitation Research Group.
___________. (2003). Statistical Disclosure Control for Tables; Determining Which Method to Use. Symposium 2003: Challenges in Survey Taking for the Next Decade (pp. 2-12). Ottawa, Canada: Statistics Canada.
Nelson, R. (1994). What is a secret—and—what does that have to do with computer security? Proceedings of the 1994 workshop on New security paradigms (pp. 74-79). Little Compton, Rhode Island, United States: ACM Press.
Office of Civil Rights. (2007). Medical Privacy - National Standards to Protect the Privacy of Personal Health Information. Washington, DC: United States Department of Health and Human Services.
Ogawa, H., Fu, K. S., & Yao, J. T. (1984). Knowledge representation and inference control of SPERIL-II. Proceedings of the 1984 annual conference of the ACM on The fifth generation challenge (pp. 42-49). ACM Press.
Osherson, D. N., Stob, M., & Weinstein, S. (1982). Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists. Cambridge, MA USA: The MIT Press.
Pandey, S., Ramamritham, K., & Chakrabarti, S. (2003). Monitoring the dynamic web to respond to continuous queries. Proceedings of the 12th international conference on World Wide Web (pp. 659-668). Budapest, Hungary: ACM Press.
Richardson, V. J., & Scholz, S. (2000). Corporate Reporting and the Internet: Vision, Reality, and Intervening Obstacles. Pacific Accounting Review.
Schwartz, M., Denning, D., & Denning, P. (1979). Linear queries in statistical databases. ACM Transactions on Database Systems, 156-167.
Sicherman, G. L., Jonge, W. D., & Van De Riet, R. P. (1983). Answering Queries Without Revealing Secrets. ACM Transactions on Database Systems, 41-59.
Sion, R. (2005). Query execution assurance for outsourced databases. Proceedings of the 31st international conference on Very large data bases (pp. 601-612). Trondheim, Norway: ACM Press.
Sloan, R. H., & Turán, G. (1999). On theory revision with queries. Proceedings of the twelfth annual conference on Computational learning theory (pp. 41-52). Santa Cruz, CA, USA: ACM Press.
Staddon, J. (2003). Dynamic Inference Control. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (pp. 94-100). San Diego, CA USA: ACM Press.
Steel, P. M. (2004). A new Estimation for the number of Unique Population Elements Based on the Observed Sample. Washington, DC: Bureau of the Census.
U.S. Code Title 13 Chapter 1. Title 13 Chapter 1. Washington, D.C.: U.S. Government.
U.S. Securities and Exchange Commission. (2007, August 10). Interactive Data: Putting Technology to Work for American Investors. Retrieved August 20, 2007, from U. S. Securities and Exchange Commission: http://www.sec.gov/spotlight/xbrl/interactivedata.htm
_____________. (2008). Progress Report of the Advisory Committee on Improvements to Financial Reporting to the United States Securities and Exchange Commission. February 14, 2008. U.S. Securities and Exchange Commission.
Weitzenboeck, E. M. (2001). Enterprise Security: Legal Challenges and Possible Solutions. Proceedings of the 10th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (pp. 183-188). Los Alamitos, CA USA: IEEE CS Press.
Woodruff, D., & Staddon, J. (2004). Private inference control. Proceedings of the 11th ACM conference on Computer and communications security CCS '04 (pp. 188-197). Washington DC, USA: ACM Press.
XBRL International. (n.d.). XBRL - GL Taxonomy. Retrieved from XBRL - GL: http://www.xbrl.org/GLTaxonomy/
Xiao, Z., Jones, M. J., & Lymer, A. (2002). Immediate Trends in Internet Reporting. European Accounting Review.
Yazdanian, K., & Cuppens, F. (1993). Neighborhood data and database security. Proceedings on the 1992-1993 workshop on New security paradigms (pp. 150-154). Little Compton, Rhode Island, United States: ACM Press.
Zhang, N., & Zhao, W. C. (2004). Cardinality-based inference control in OLAP systems: an information theoretic approach. Proceedings of the 7th ACM international workshop on Data warehousing and OLAP (pp. 59-64). Washington, DC, USA: ACM Press.