Web Archiving for Compliance & eDiscovery ALEPH ARCHIVES Ltd. 600 Blv de Maisonneuve suite 1700 - Montréal, Québec (Canada) / chemin des Croix-Rouges 16 - 1007 Lausanne (Switzerland) [email protected] aleph-archives.com

Web Archiving Whitepaper Aleph Archives


Whitepaper on the CAMA web archiving platform by Aleph Archives. For more information, visit our website: www.aleph-archives.com


WEB ARCHIVING

INTRODUCTION

Quick access to digital data and electronic information stored online is a «must have» when it comes to developing strategies for litigation or statutory compliance disputes.

There are, however, many obstacles to providing and managing such access efficiently while taking into account both the frequent complexity of the dispute and the legal context that must be dealt with. It is often impossible, or too late, to obtain the relevant information when it is needed, such as during eDiscovery processes.

Aleph Archives is an IT service provider dedicated to companies with specific needs regarding Web-content preservation. Aleph Archives offers turnkey tools to easily and efficiently retrieve relevant data stored online.

According to recent research, the average life expectancy of a website is less than 75 days, and disputes over the content of websites are on the increase. In a number of countries, regulatory and archiving compliance rules (e.g. the Sarbanes-Oxley Act, the Health Insurance Portability and Accountability Act, the Gramm-Leach-Bliley Act and the Federal Rules of Civil Procedure in the US) govern the different industry sectors, and authorities (e.g. the SEC and FINRA in the US, the Financial Services Authority in the UK) supervise those sectors on that basis.

Through a unique cloud-based Web archiving platform named CAMA®, Aleph Archives provides «Web Preservation» services for regulatory compliance, litigation support and eDiscovery to help corporate entities and legal and governmental authorities collect, manage and archive their huge and growing Web content. CAMA® is the only platform that archives and keeps records of your websites, webpages, and web presence at large. CAMA® clearly evidences which website content was shown to a particular end user during a visit and, equally important, which content (and hence which data) was not.

Web archiving for the eDiscovery process is a recent "technological niche", as opposed to legacy eDiscovery, which has been used for years to preserve electronic data (e.g. email, files, etc.). The Web archiving eDiscovery process is based on three main features, as outlined by the Electronic Discovery Reference Model: thorough gathering of electronically stored information from websites, full access to and playback of any archived web content, and conversion to a form that allows full-text search.

Copyright © 2012 Aleph Archives. All Rights Reserved.


PRODUCTS & INNOVATION

CAMA® Web Archiving Platform

Aleph Archives is a pioneer in the domain of Web archiving. We offer high-quality archive accessibility and rendering. With CAMA®, Aleph Archives raises the web archiving process and the related quality assurance (QA) to a higher level by working with crawl engineering experts, dedicated QA teams and a powerful, yet easy to use, archive access technology¹.

Aleph Archives targets companies that need strict, reliable archiving processes to ensure compliance with SEC and FINRA regulations. The CAMA® Web archiving platform is more efficient and more reliable than the solutions of its main competitors. Aleph Archives offers a platform that is open (WARC, ISO 28500:2009²), adaptive (cloud-based computing) and innovative (scheduled crawls, export of Web archives as PDF/PNG, antivirus checks, CAMA® Appliance, real-time deduplication of results, multilingual search and translation, etc.).


1 Products demo at: http://www.youtube.com/user/alepharchives/

2 WARC ISO file format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717

[Figure: CAMA® in action: archived (07/04/2011) version of Toyota’s corporate website. Callouts: testimonials and videos; load the archived version with a click.]


CAMA® belongs to the category of « client-based & web-served » archiving solutions (refer to Appendix A and B for more details), which allow creating and maintaining stable, time-structured, verifiably authentic and independent versions of a corporate web presence, « social media » included.

Aleph Archives’s strategy aims at satisfying all of its clients: CAMA® offers high-quality archived websites (which can be filed as evidence in case of litigation), easy-to-use browsing and access tools, and a fully web-based service to reduce costs (refer to Appendix C).


[Figure: CAMA® in action: archived (05/10/2011) version of AerzteZeitung online, a German newspaper. Callout: play all embedded videos as usual.]


[Figure: today’s (08/02/2011) live version of the NY Daily newspaper compared with CAMA® in action: archived (10/05/2011) version of the NY Daily newspaper. Callouts: timeline; QR code, digital signing and timestamping; options pane.]


MARKET SECTORS: who is CAMA® suitable for?

Corporates

a. E-Discovery

Litigation Protection: Websites contain a growing proportion of business records that must be preserved for long periods of time. This content is frequently requested during discovery proceedings because of the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be made available for e-discovery purposes.

Legal Hold: When a hold on data is required, it is imperative that an organization immediately begin preserving all relevant data. Our web archiving platform CAMA® allows organizations to immediately place a hold on data when requested by a court or on the advice of legal counsel. If an organization is not able to adequately place a hold on data when it is obligated to do so, it can suffer a variety of serious consequences, ranging from embarrassment to major legal sanctions or heavy fines.

b. Regulatory Compliance

For just about every organization, there is a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:

• Sarbanes-Oxley Act of 2002
• Health Insurance Portability and Accountability Act of 1996 (HIPAA)
• Securities and Exchange Commission Rules (SEC)
• Financial Industry Regulatory Authority (FINRA)
• Model Requirements for the Management of Electronic Records (MoReq)

c. Maintain Corporate Memory & Knowledge Management

Web archiving can be very useful for maintaining a corporate record of what has been posted to a Web site, how long this content was maintained or when it was replaced. For example, a company may want a record of its Web site for historical purposes, or it may need an archive in order to re-use some of its content at a later date. Maintaining an accurate archive of Web content can significantly reduce the costs associated with recreating this content.


Government

Virtually all government agencies have regulatory obligations to preserve electronic content. Because your agency’s online content is increasing both in complexity and volume, and because governments are held accountable for the information they publish on the web, you need to employ a records retention policy.

The 2006 changes to the Federal Rules of Civil Procedure indicate that all organizations (including governments) must be able to find, capture, and produce electronically stored information that might be relevant to a judicial or regulatory request. This can’t be done with server backups, CMS revision control, or other outdated methods. You need a solution that can provide indisputable proof of the integrity and authenticity of your online records (as required by the Federal Rules of Evidence).

For example, 2010 saw the Executive Office of the President (EOP) issue a solicitation to:

« Provide the necessary services to capture, store, extract to approved formats, and transfer content published by EOP on publicly-accessible web sites, along with information posted by non-EOP persons on publicly-accessible web sites where the EOP offices under PRA maintains a presence, throughout the term of the contract. »

Other requirements come from:

• Presidential Records Act (PRA)
• National Archives and Records Administration (NARA)
• E-GOV - electronic records management initiatives
• Guidance on Managing Records in Web 2.0/Social Media Platforms, October 20th, 2010
• Library of Congress
• Federal Rules of Civil Procedure (FRCP)
• Department of Commerce
• Department of Energy
• Department of Justice
• Environmental Protection Agency
• Office of Management & Budget
• Securities and Exchange Commission Rules (SEC)
• Library & Archives Canada


Website and « social media archiving »³ is a good solution for e-discovery preparedness. Aleph Archives technology uses web bots (i.e. crawlers) that capture all web pages (including social media). The web pages are stored exactly as they are captured (including links, rich media, video, and Flash), which satisfies regulatory requirements for digital records. Aleph Archives also provides a digital timestamp and signature for each archived page, ensuring data integrity and authenticity. With this SaaS solution (no tedious installation or software), governments can sign up and begin archiving in less than an hour.
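As a rough illustration of how a per-page timestamp and signature can make later changes detectable, the sketch below hashes the captured bytes with SHA-256 and signs the URL, capture time and digest together. The function names, the key and the choice of HMAC are assumptions made for this example only; the whitepaper does not describe how CAMA® implements signing, and a production service would more likely use public-key signatures and an independent timestamping authority.

```python
import hashlib
import hmac

# Hypothetical signing key, for illustration only.
SIGNING_KEY = b"example-archive-key"


def seal_capture(url: str, body: bytes, captured_at: str) -> dict:
    """Return a record binding the captured bytes to a URL and capture time."""
    digest = hashlib.sha256(body).hexdigest()
    message = f"{url}|{captured_at}|{digest}".encode("utf-8")
    signature = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return {
        "url": url,
        "captured_at": captured_at,
        "sha256": digest,
        "signature": signature,
    }


def verify_capture(record: dict, body: bytes) -> bool:
    """Recompute the signature; a changed body, URL or timestamp fails."""
    expected = seal_capture(record["url"], body, record["captured_at"])
    return hmac.compare_digest(expected["signature"], record["signature"])
```

Verification succeeds only when the stored bytes, URL and timestamp all match what was signed at capture time, which is the property an archived page needs in order to be presented as unaltered.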

Adopting a web archiving policy is essential, and not just for big cities or the federal government. Aleph Archives’s pricing is competitive so that even small towns can stay prepared. The Internet will only continue to grow in scale and complexity, and governments are increasingly interested in how it can be used for civic growth and development. The issue of records retention must be addressed from the start, so that agencies can move forward confidently online.

« Government websites are public records and must be archived to comply with Public Records Laws. Start archiving now. »

Finance

Online marketing/communications can present a challenge for securities traders, investment advisors, banks, and others in the financial services industry. The benefits of advancing technologies must be weighed against the risks associated with non-compliance in the area of books and records retention. Failure to meet the demands of industry standards can result in hefty fines and bad publicity.

Multiple sets of rules for the financial industry (SEC, FDIC, FSA, SOX, FINRA, and others) demand the preservation of business records (both paper AND electronic) in such a way that the data can be reproduced for a regulator in a timely and complete manner. These requirements are now being extended to include newer tools such as social media platforms, and FINRA has advised that no compliance grace period will be in effect for these new technologies.

It’s critical that firms implement a robust records retention policy for their websites and « social media pages ». Should your corporate web presence be investigated or questioned, a perfect representation of your company’s online activity is a necessity, and that’s exactly what CAMA® provides.

« Website archiving is vital to fulfilling many key FINRA and SEC regulations. Start complying today. »


3 Twitter and Government Transparency


Food and Drugs Companies

In archiving their electronic data, publicly traded companies need to comply with the records management regulations of the Sarbanes-Oxley (SOX) Act.

The past year has seen a dramatic increase in the FDA’s enforcement of regulations that deal with product claims and labeling. In an effort to be more pro-active, the agency has been investigating companies for compliance with the FD & C Act, particularly section 403 A, which deals specifically with product descriptions and claims. As a result, a number of companies have received warning letters addressing the product claims made on their labels or websites. These letters are viewable online, which damages brand reputation.

Since most marketing now happens via websites, social media, and other Internet tools, it is of utmost importance for your company to have a reliable, accurate archive of all online activity. Should your claims be investigated or questioned, defensible evidence of your website’s precise content is a necessity, and that’s exactly what CAMA® provides.

Using crawling technology, we take automated snapshots of your website. Only new pages or changed pages are archived, saving storage space. The whole process is automatic: you don’t have to remember to do anything.
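One common way to archive only new or changed pages is to compare a content digest of each freshly crawled page with the digest recorded at the previous snapshot. The sketch below illustrates that idea; the function name and the in-memory dictionaries are assumptions for the example, not a description of how CAMA® detects changes.

```python
import hashlib


def pages_to_archive(snapshot: dict, known_digests: dict) -> list:
    """Return URLs whose content is new or changed since the last snapshot.

    snapshot maps URL -> page bytes from the current crawl.
    known_digests maps URL -> SHA-256 hex digest from earlier crawls
    and is updated in place.
    """
    changed = []
    for url, body in snapshot.items():
        digest = hashlib.sha256(body).hexdigest()
        if known_digests.get(url) != digest:
            changed.append(url)
            known_digests[url] = digest
    return changed
```

On the first crawl every page is reported; on later crawls only pages whose bytes differ, or that did not exist before, are returned for storage.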

« Have a reliable, accurate and defensible archive of all online activity. »

Law firms

Companies creating content online or law firms can use CAMA® to provide legal proof of intellectual property. CAMA® provides each page with a digital timestamp and a digital signature that cannot be altered without detection and, hence, creates legal proof of copyright. This trusted, non-refutable evidence stands up in a court of law if copyright ownership is ever questioned.

« Use websites as legal evidence in court. Have CAMA® create integral and authentic evidence with support for e-Discovery. »


“ This Court sees no reason to treat web sites differently than other electronic files. ”

Arteria Prop. Pty Ltd. vs. Universal Funding V.T.O., Inc


CAMA® for Social Media e-Discovery

Organizations and their employees are leveraging social media tools at unprecedented levels. With over 150 million blogs, an average of 140 million tweets every day, and more than 800 million users of social media sites worldwide (Facebook, LinkedIn, MySpace...), organizations are challenged to define usage policies and implement solutions to appropriately govern, discover and preserve relevant information from these complex and malleable data sources. Complicating the challenge of performing discovery on social media sites is the fact that these sites also include rich media such as audio and video, adding to an already complex environment. Legacy tools and manual processes cannot effectively manage the risk associated with social media sites and interactive content.

To successfully manage discovery of social media and protect themselves from potential risk, organizations must embrace new technologies to harness and understand the meaning of the social media content. Since social media content can be subject to legal hold if it contains relevant information, legal teams must be prepared to search, identify, preserve and collect this information. Social media sites must be managed as other enterprise data sources, as part of a comprehensive Social Media eDiscovery and information governance program. Given the complexity and volume of social media content, legal teams must be prepared with an automated solution that can understand meaning and cull through voluminous data sources to find relevant information.

According to a report issued by Gartner, Inc., a leading technology research and advisory firm, half of all companies will have been asked to produce material from social media sites for e-Discovery by the end of 2013. Debra Logan, vice president and distinguished analyst at Gartner, wrote:

« In e-Discovery, there is no difference between social media and electronic or even paper artifacts. The phrase to remember is if it exists, it is discoverable. Unique aspects of social media present additional challenges, but as with an overall information governance strategy, the key to avoiding or mitigating potential legal issues in the use of social media for business purposes is to have a governance framework, policy and user education. »

In addition to the challenge of meeting the legal hold and preservation obligation, organizations, including those in the Financial Services, Healthcare, and Pharmaceutical industries, must ensure that employees are not violating regulations by creating or posting non-compliant content. As regulators recognize the influence and risks associated with social media channels, they are beginning to require organizations to actively monitor and govern employees' social media interactions.

For instance, FINRA (Financial Industry Regulatory Authority) Regulatory Notice 10-06 requires member firms to supervise and archive content posted to social media sites. The Food and Drug Administration (FDA), Federal Trade Commission (FTC), and the National Futures Association (NFA) are also developing rules associated with the use of social media, and the Federal Courts have issued guidelines for monitoring and managing social media site usage (see Resources & Links section).

For example, if you don’t have an archiving system, you could be in trouble trying to find something you posted.

According to Facebook⁴: « Currently, you can only search for content that has been posted in the last 30 days. The range of the search history may be expanded in the future. »


4 The same applies to Twitter and LinkedIn; see « Archiving Social Media prepares you for e-Discovery ».

[Figure: CAMA® in action: archived (05/17/2011) version of the NYTimes newspaper page on Facebook. Callouts: all media types (Flash, photos, videos, posts...) are preserved in their native format; all links are clickable (browse the archived pages, play videos, load images...); loading archived version.]


Aleph Archives’s advanced web archiving platform for e-Discovery enables organizations to proactively manage, search for, identify and preserve any social media content. CAMA® enables organizations to take advantage of the power and business value of social networks, while ensuring FRCP and regulatory compliance.

Unique Selling Proposition

The main competitive advantages of the CAMA® platform are:

• superior technology to capture multiple web formats in dynamic websites,
• more comprehensive web archiving process with crawl engineering experts,
• high-quality archive accessibility and rendering,
• Universal Archives View (UAW) independent from OSes and browser types or versions,
• optimized full-text search engine tailored to very large web archive collections (billions of documents),
• deduplicated full-text search results in real-time,
• daily archiving capabilities,
• support of WARC ISO file format,
• dedicated quality assurance teams and processes,
• ability to be deployed over commodity machines,
• fault tolerant software design,
• high availability⁵

CAMA® is the only solution on the market that can provide access to the archives without an Internet connection and that can also be fully deployed « In-House » (i.e. inside the customer’s infrastructure). The « In-House » solution offers you the freedom of exploiting the full potential of CAMA® (training required).


5 See our Service Level Agreement (SLA)

DISASTER & DATA RECOVERY

« Your data safe and secure »

Aleph Archives’s “retention service” includes shadow copies of your archived data in geographically distinct locations (USA, Canada, Switzerland, France). This means that two copies of your web archives exist at any given time to provide high data availability and avoid data loss.


Pricing Model

Cloud-based solution

This section describes the implementation process for Aleph’s enterprise web archiving service and the pricing for the Set Up phases and for the provision of archive services thereafter.

Aleph may calculate the fees using one of two methods of estimation.

1. Where requirements are not fully defined, a simple overall price can be provided, which will be based on the size and scope of the archive policy in broad terms. A breakdown of these fees may be provided for transparency.

2. Where requirements are more fully defined, a more rigorous approach to estimating fees may be used. This will provide a price per URL (i.e. archived resource), which will be more accurate than the simple overall price, in that it is based on the specifics of an archive strategy defined by the more detailed requirements. Three parameters are involved here: the scope, the frequency, and the price per URL.

• The scope defines which URLs are "in" a particular crawl: the list of URLs the customer would like to archive.

• The archiving frequency for each scope can vary from daily, to weekly, to monthly to quarterly, to annually. Aleph Archives is the only web archiving company offering a daily archiving service.

• The price per URL is composed of:
  ‣ System administration charges;
  ‣ Archiving services fees;
  ‣ Infrastructure and storage costs (retention, data integrity, data security, etc.).
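To make the interplay of scope, frequency and price per URL concrete, the sketch below multiplies the three together. This formula, the function name and any figures fed into it are assumptions for illustration only: the whitepaper names the three parameters but does not publish how they are combined, nor any actual prices.

```python
# Captures per year for each archiving frequency named in the text.
CAPTURES_PER_YEAR = {
    "daily": 365,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "annually": 1,
}


def estimate_annual_fee(urls_in_scope: int, frequency: str, price_per_url: float) -> float:
    """Hypothetical estimate: every URL in scope is billed once per capture."""
    return urls_in_scope * CAPTURES_PER_YEAR[frequency] * price_per_url
```

Under these assumptions, a scope of 1,000 URLs archived monthly at a hypothetical 0.50 per URL would come to 1,000 × 12 × 0.50 = 6,000 per year; moving the same scope to weekly archiving multiplies the capture count, and so the estimate, by 52/12.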

InHouse solution

Customers interested in the InHouse version of CAMA® are welcome to contact us for a quote.


APPENDIX A.

Web Archiving Policy

A web archiving policy is the only means of creating and maintaining a stable, time-structured, verifiably authentic and independent version of the corporate web presence. « Independent » means that access to the content must be possible without requiring the original CMS version to be installed, configured and running. Having a web archiving policy is the only way the corporate Web-publishing infrastructure can evolve without threatening accessibility to legacy content. It is also the only way to avoid the continuous licensing and maintenance costs of legacy CMSs.

A substantial and enduring web archive can be achieved by generating a flat, stable and time-structured version of the published content, capturing authentic snapshots according to the corporate archiving policy. These snapshots must be taken as user-centric views of the content, i.e. accurately reflecting the user’s experience of that particular content. In addition, they must be stored and made accessible in precisely the same form, thereby meeting legal and compliance requirements as authentic copies. And they must enable discovery using familiar web paradigms such as full-text search, as well as more sophisticated e-discovery techniques including metadata, tagging, filters and complex search.

A1. How to choose your web archiving solution?

Web archiving has made significant progress during the last five to seven years. It now offers a choice of approach to both policy and supporting technology. These choices should be considered carefully against business objectives before the decision is made. The main differences lie in the capture and access methods used.

Three different methods exist to capture and archive web content:

a. client-side archiving
b. transaction archiving
c. server-side archiving


A2. Client-side Archiving

« Client-side archiving » uses an archival crawler, derived from search engine crawler technologies, with significant enhancements to ensure that complex and hard-to-reach content can be found and captured, as well as stored without change. Starting from seed pages or entry points, these tools automatically capture pages and parse them to extract all links. The process repeats and continues as long as newly discovered pages remain within the scope defined for the crawl. The captured web content and embedded files are stored unchanged (original and authentic copies, an exact equivalent of what the generic user would have received in their browser at the time) and preserved in a flat, standards-based and self-contained file format that can be confidently considered as future-proof. This is especially important within a legal context.
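The capture loop described above (start from seeds, store each page unchanged, extract its links, continue while pages stay in scope) can be sketched as a breadth-first crawl. This is a minimal illustration under stated assumptions: `crawl`, `same_host_scope` and the caller-supplied `fetch` function are hypothetical names, only `<a href>` links are followed, and a real archival crawler would also handle scripts, embedded files, politeness rules and error recovery.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_host_scope(host):
    """Scope rule for the example: a URL is in scope if it is on `host`."""
    return lambda url: urlparse(url).netloc == host


def crawl(seeds, fetch, in_scope):
    """Breadth-first capture from the seed URLs.

    fetch(url) returns the page bytes, or None if the page is unavailable.
    Returns a dict mapping URL -> bytes exactly as fetched.
    """
    archive = {}
    seen = set()
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or not in_scope(url):
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        archive[url] = body  # stored unchanged
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return archive
```

Passing `fetch` in as a parameter keeps the sketch testable against an in-memory site and makes it explicit that the stored bytes are never rewritten; only the parsing step works on a decoded copy.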

To be effective this method requires a crawler with excellent link extraction and path-finding algorithms that can work in a wide range of circumstances and site/page designs. In addition to client-side archiving, there are two alternative methods to capture web content. Both methods need to be operated from the server-side; require prior authorisation to services; and need access to both front-end and back-end servers.

A3. Transaction Archiving

The first of these alternative methods, called « transaction archiving », consists of the systematic capture and archiving of all browser/server exchanges (request/response pairs) resulting from the interaction of users with sites, regardless of their content type and how they are produced.

Transaction archiving enables tracking and recording of every actual instantiation of content in an authentic flat HTML form that is easy to maintain and preserve over time. Moreover, it can be used to archive hidden web content, provided this content is requested (i.e. read) by the website’s users during the capture period.

However, transaction archiving generates unnecessary duplicates of frequently-visited pages and raises serious privacy concerns as the method implicitly relies on usage tracking.
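The sketch below shows the shape of a transaction archive and why duplicates arise: every observed exchange is logged, so a popular page produces many identical responses. Keying stored bodies by digest is one possible mitigation, shown here as an assumption of the example rather than a feature the whitepaper attributes to any product. The class and field names are hypothetical, and the log deliberately omits any client identifier.

```python
import hashlib


class TransactionArchive:
    """Record request/response pairs; keep one stored copy per distinct body."""

    def __init__(self):
        self.log = []     # one entry per observed exchange
        self.bodies = {}  # SHA-256 hex digest -> response bytes, stored once

    def record(self, method: str, url: str, status: int, body: bytes) -> str:
        """Log one exchange and return the digest of its response body."""
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)
        self.log.append(
            {"method": method, "url": url, "status": status, "digest": digest}
        )
        return digest
```

Three requests for two distinct responses leave three log entries but only two stored bodies, so the exchange history is complete while storage grows only with distinct content.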


A4. Server-side Archiving

The second, and more obvious, alternative to client-side archiving is « server-side archiving ». This consists of directly copying files in the document folders to back-up servers. Although it might appear to be the simplest approach, it is in fact seriously flawed, from both the preservation and archive access points of view.

To make certain that any web content archived using this method can be properly restored, server-side archiving requires that all original CMSs, databases and other software are archived alongside the content or are actively maintained in an operational state; or that the content is migrated to newer CMSs, databases, etc. In any case, these activities will be required for the whole period of archive retention. Interestingly, IT backups essentially rely on this method in almost all cases, systematically failing to meet long-term preservation and access capabilities that are essential for legal and compliance requirements. However, for some types of hidden-web content, this method can prove to be useful, mainly in situations where it is required to archive parts of websites that a client-side crawler cannot reach.

A5. Comparison of Content Capture Methods

The following table summarises the main content capture methods, where: ✔ = fully supported and ● = possible/custom development.

Columns: Server-side, Transaction, Client-side

Content captured as user sees it, unchanged, and authentic ✔ ✔

Archive access independent of original publishing technology ✔ ✔

Able to capture interactive or query based content ✔ ✔ ●

Retains web URL space (not dependent on server link mapping) ✔ ✔

De-duplication possible ● ✔

Easily directed and scheduled capture ✔ ✔

Flexible archival scope, for a wide range of needs ✔ ✔

Able to capture browser/server exchanges (request/response pairs) ✔

Web server technology independence ✔

Archiving services can be centralized in one place ✔

Cost effective and efficient operations over time ✔

In most cases client-side archiving is the best approach for capturing content. The quality of the resulting archive will depend mainly on the capabilities of the crawler, particularly with respect to link extraction, even when links are encoded in scripts and executables. This is one of the key determinants for capture of all files in a consistent and timely manner.


APPENDIX B.

Accessing your Web Archives

Two different methods exist to provide access to archives:

a. website-copier approach
b. Web-served approach

The choice is largely determined by how the files are stored. This is critically important, because web URLs use different naming conventions to file systems, with different permissible and reserved characters, escaping rules, case sensitivity, etc.

B1. Website-copier Approach

Website copiers write all captured files directly to disk, and therefore need to modify names and links as they are stored in order to make the archive accessible. This results in an archive that is not an authentic version of the original server’s response stream.
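A small sketch makes the renaming problem concrete. The mapping below is a hypothetical, simplified rule of the kind a website copier needs (it is not taken from any particular tool): directory URLs get an `index.html`, the query string is folded into the file name, and characters that are reserved on common file systems are replaced.

```python
import re
from urllib.parse import urlparse


def url_to_local_path(url: str) -> str:
    """Map a URL to a filesystem-safe relative path (simplified, hypothetical rule)."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    if parts.query:
        path += "_" + parts.query
    # Replace characters that are reserved or awkward on common file systems.
    safe = re.sub(r'[<>:"\\|?*=&]', "_", path)
    return parts.netloc + safe
```

With this rule, `http://example.com/list?page=2&sort=asc` and `http://example.com/list_page_2_sort_asc` both map to `example.com/list_page_2_sort_asc`. The stored name no longer matches the original URL, two distinct resources can collide, and every link pointing at them has to be rewritten, which is why such a copy is not an authentic record of the server’s responses.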

B2. Web-served Approach

Archive web servers, on the other hand, store responses from the original server unchanged in container files. This ensures the content and server response stream are kept in an authentic form.

The emerging standard for web archive container files is WARC⁶ (the Web ARChive file format), published as ISO 28500:2009. It is already being adopted as the foundation for web archive storage and preservation. A WARC file records the sequence of harvested web files captured by the crawler, each preceded by a header containing metadata that briefly describes the harvested content, its length and checksum.

WARC ensures the preservation of the original naming scheme and linking, thereby providing archive storage of content in an authentic form, as well as providing the means for additional integrity checks during the entire period of custodianship.
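To show what “a header containing metadata, length and checksum” looks like in practice, the sketch below assembles a single, simplified WARC `response` record by hand. The header names are standard WARC 1.0 fields, but the function name is hypothetical and the record is deliberately minimal; it is an illustration of the layout, not a validated writer, and a dedicated WARC library should be preferred for real archives.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap unchanged HTTP response bytes in one simplified WARC record."""
    # Checksum of the stored block, Base32-encoded SHA-1 as commonly used in WARC files.
    digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    capture_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {capture_time}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Length: {len(http_response)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # The content block is followed by two CRLFs that terminate the record.
    return head + http_response + b"\r\n\r\n"
```

The original URI is carried in the header rather than encoded into a file name, and the response bytes are embedded untouched, so the digest can be recomputed at any later date to confirm that the stored block has not changed.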


6 WARC file ISO format: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717


B3. Comparison of Access Methods

The following table summarises the main archive access methods, where: ✔ = fully supported and ● = possible/custom development.

                                                      Website Copy   Web-served Archive
Searchable                                                 ✔                 ✔
Browsable                                                  ✔                 ✔
Content directly navigable from disk                       ✔                 ●
Content stored and accessed unchanged, and authentic                         ✔
Links independent of naming conventions                                      ✔
Storage and preservation of metadata                       ✔                 ✔
Access independent of file location                                          ✔
Standards-based archives                                                     ✔

There is a consensus today that the website-copier approach has serious limitations concerning authenticity of the archive, whereas the Web-served approach can ensure authenticity by design. In professional use therefore, especially where legal and regulatory obligations are business priorities, the Web-served approach is a necessity.


APPENDIX C.

Web Archiving as a Best Practice

The web has matured into a central communication channel for businesses and government agencies, with digital media (websites and other web-based content) all but replacing print media as the primary mode of communication with customers, constituents, prospects, investors, and others.

Organizations using the web must keep accurate records of web content: online communication is just as much of a liability as any other form of communication. As a recent case ruled: « This Court sees no reason to treat websites differently than other electronic files. »

Web archiving has become a best practice for any organization using the web to communicate. Organizations that neglect to retain accurate records of their web presence are placing themselves at unnecessary risk, from both a compliance and a litigation standpoint.

Protect your organization by regularly archiving web content with the Aleph Archives Web Archiving Platform CAMA®. We provide all the technology and services you need to archive your websites and web presence from any domain.


APPENDIX D.

ALEPH ARCHIVES’s CAMA® PLATFORM ARCHITECTURE OVERVIEW

More details about the architecture internals are available upon request.

APPENDIX E.

Elements of a Web Archiving Plan

Setup
Aleph Archives runs, tests, and calibrates the CAMA® robots to find the best rules for capturing your website(s) with the highest quality.

Capture
The cost of crawling the target URLs at a specified frequency, including the related crawl engineering.

Retention
The cost of annual storage and of retaining archives of the target websites. The standard plan calls for 7 years of retention.

Operation
Includes keeping the designated servers and machines up and running for CAMA®, archive access, retention, and quality assurance.

Quality Assurance (QA)
- QA Level 1: we check and verify one level deep (depth 1) from the website root (i.e. the home page).
- QA Level 2: we check and verify two levels deep from the root, and so on for QA Level 3 and QA Level 4. QA can go as far down in website depth as the client needs. In industry practice, QA Level 4 is sufficient for most enterprises for regulatory compliance, legal and operations purposes.
- Exhaustive QA: we check and verify all designated websites and levels, verifying every page to the website’s full depth. Exhaustive QA may be cost-prohibitive, depending on the customer’s requirements. Upon request, Aleph Archives will provide a price quotation for Exhaustive QA.
- Mixed QA: we combine a sampled QA per website level with an exhaustive QA to a certain level.
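The QA levels above amount to a depth-limited walk of the site's link graph. The Python sketch below is illustrative only and does not describe how CAMA® performs QA: the link graph is a hypothetical dictionary mapping each page to the pages it links to, the function names are invented for this example, and "Mixed QA" is interpreted here as exhaustive checking up to a given level plus random sampling of deeper pages.

```python
import random
from collections import deque


def pages_by_depth(links, root):
    """Breadth-first walk; returns {url: depth} with the root at depth 0."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit gives the shortest depth
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def qa_selection(links, root, level=None, sample_rate=0.0, seed=0):
    """Pick the pages to verify.

    level=None      -> exhaustive QA (every reachable page)
    level=n         -> QA Level n (every page up to depth n)
    sample_rate > 0 -> mixed QA (also sample pages deeper than `level`)
    """
    depth = pages_by_depth(links, root)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    chosen = []
    for url, d in sorted(depth.items(), key=lambda item: (item[1], item[0])):
        if level is None or d <= level or rng.random() < sample_rate:
            chosen.append(url)
    return chosen
```

For example, with a hypothetical site where the home page links to /a and /b and /a links to /a/1, qa_selection(site, "/", level=1) selects the home page, /a and /b, while omitting level selects every reachable page.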


APPENDIX F.

Aleph Archives provides the following CAMA® Plans:

FEATURE                                                        PROFESSIONAL   ENTERPRISE   PREMIUM
Crawl engineering team                                         ✔              ✔            ✔
WARC format (ISO 28500:2009) compliance                        ✔              ✔            ✔
Scheduled crawls                                               ✔              ✔            ✔
Archives summary pane                                          ✔              ✔            ✔
Document format handling (HTML, Word, PowerPoint, PDF, Flash …) ✔             ✔            ✔
Full text search                                               standard       advanced     advanced
Full text search history                                       ✔              ✔            ✔
Full text search queries import & export                       ✔              ✔            ✔
Automatic language detection                                   ✔              ✔            ✔
Documents metadata extraction and indexing                     ✔              ✔            ✔
Infinite archives retention                                    ✔              ✔            ✔
ARC to WARC batch migration                                    ✔              ✔            ✔
WARC to WARC batch conversion                                  ✔              ✔            ✔
Archives verification and repair tools                         ✔              ✔            ✔
Text summarizer                                                ✔              ✔            ✔
Audit trails identification and traceability                                  ✔            ✔
Deduplicated full text search                                                 ✔            ✔
Archived resources export (PDF, PNG)                                          ✔            ✔
Multi-core aware archives servers                                             ✔            ✔
Archives redundancy                                                           ✔            ✔
Load balancing for archives access                                            ✔            ✔
Antivirus checker                                                             ✔            ✔
Trusted archives (digital signatures)                                         ✔            ✔
SEC 17a-4 and FINRA compliance                                                ✔            ✔
Secured archives access (SSL Encryption)                                      ✔            ✔
Multilanguage instant translator                                              ✔            ✔
Custom Branding                                                               ✔            ✔
Archives compression                                                          ✔            ✔
Archived data processing and management                                                    ✔
CAMA® Appliance                                                                            ✔
CAMA® Appliance on USB pen drive                                                           ✔
CAMA® Kit (Access API)                                                                     ✔
CAMA® 64bits                                                                               ✔
Quality Assurance team (level)                                 basic          medium       high
Custom metadata limit                                          30             unlimited    unlimited
Collections limit                                              100            unlimited    unlimited
Accounts limit                                                 10             unlimited    unlimited
Crawled resources per month                                    up to 500K     up to 5M     unlimited
Archived resources per month                                   up to 500GB    up to 1TB    up to 2TB

A « Custom Plan » is also available via an online form which allows customers to choose product features that best suit their needs.


RESOURCES & LINKS

☞ Aleph Archives
- Website
- Products demo

☞ Records Management

Finance
- FINRA Regulation Notices
- FINRA Guidance
- FINRA Regulatory Notice 10-06 on Social Media
- Summary of NASD Rule 3110: Books and Records
- Federal Rules of Evidence 901: Data Integrity & Authenticity
- SEC: Division of Trading and Markets
- SEC: Division of Investment Management
- SEC Rule 17a-4: Books and Records
- Sarbanes-Oxley Act (SOX)
- Financial Services Authority (FSA) Handbook (Europe)
- FSA Handbook Section 3.2: see Records Requirements, Sec 3.2.20 (Europe)
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)

Food and Drug Administration
- Federal Rules of Evidence 901: Data Integrity & Authenticity
- FDA Guidance Documents: Food
- FDA Compliance & Enforcement: Food
- FDA Guidance Documents: Drugs
- Code of Federal Regulations (CFR) Title 21
- Model Requirements for the Management of Electronic Records (MoReq) (Europe)
- Pharma Social Media Wiki
- FDASM (Everything About the FDA, Internet, Social Media)
