Managing File Storage
Presented by Tony Asaro, Senior Analyst and Founder, INI Group
Content Chaos
• IT departments have no controls on file creation and storage
• There are no quotas
• IT has no authority to delete data
• 70-80% of all data is unused 90 days after its creation
• Companies have 10s of TBs, 100s of TBs, PBs and even 10s of PBs
Voices From the Field
“We literally have no visibility into what is being stored. We know we are storing a lot of files not being used, duplicate data and personal files.”
Voices From the Field
“Last year we had just over 4 PB of file storage and in less than a year we have 5.5 PB.”
No Classification
• All data is treated equally regardless of importance
• There are tons of exact duplicates, content duplicates and near duplicate data
• There are files that haven’t been accessed in over a year, two years, three years, etc.
• Important data is hard to find – what documents have intellectual property; confidential information; evidence?
Scanned Images
• There is still a ton of paper being used for business
• Applications, forms, contracts, etc.
• This information is scanned and becomes an “image”
Stories From the Field
“We discovered we had 125 scanned images of a take-out Chinese menu stored on our Tier 1 storage.”
Backup and Recovery
• File systems range between 500 GB and 2 TB because of backup data sets
• For 100 TB of file data that could mean 50-200 backup data sets
• For 1 PB of file data it equates to 500-2000 backup data sets
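The ranges on this slide follow from dividing total capacity by the largest and smallest file system size. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and one backup data set per file system; the function name `backup_data_sets` is illustrative, not from the presentation:

```python
# Decimal storage units, expressed in GB (an assumption of this sketch).
GB = 1
TB = 1000 * GB
PB = 1000 * TB


def backup_data_sets(total_capacity, min_fs_size=500 * GB, max_fs_size=2 * TB):
    """Return (fewest, most) backup data sets for a given capacity.

    Assumes one backup data set per file system and file systems
    sized between min_fs_size and max_fs_size.
    """
    fewest = -(-total_capacity // max_fs_size)  # ceiling division
    most = -(-total_capacity // min_fs_size)
    return fewest, most


print(backup_data_sets(100 * TB))  # (50, 200)
print(backup_data_sets(1 * PB))    # (500, 2000)
```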
Voices From the Field
“I was told that we can’t buy more storage and that we have to make do with what we have. All we are doing right now is figuring out how we can optimize our environment. I’ll be honest, we’ve had to play games to get more storage and put it under our services budget but we won’t be able to get away with that any more.”
Voices From the Field
“Power, cooling and floor space are huge issues for us. The cost is bad enough. It is three to four times my capital budget. But the bigger problem is I am running out of power and floor space in our data center.”
NAS Challenges
• There are only a handful of solutions
• NAS migrations are complicated, costly, unreliable and time consuming
• Block-level migration tools can transfer performance problems due to fragmentation
• No data verification
• Heterogeneous NAS migrations are even more complicated and difficult
Voices From the Field
“We have 40 NAS heads managing 400 TB of data.”
Voices From the Field
“I feel like I am being held hostage by my NAS vendor.”
Voices From the Field
“My vendor quoted us one million dollars and four months to migrate a few hundred TBs and about 30 NAS systems.”
File Servers and SANs
• There are a number of customers that use File Servers attached to their SANs
• Creates more points of control and is often harder to manage; more physical servers
• All of the same issues exist in terms of visibility, optimization, protection, etc.
Unstructured Data = Useless Data
• An abundance of documents have valuable and vital information within them; however, there is no way to easily access or extract it
• It often requires “us” to remember what is important content
• Scanned-in applications, forms, contracts, etc. are often essential to business, but the data is either manually inputted or wasted
Why It Matters
• Huge CapEx and OpEx impact
• More storage, floor space, power, cooling
• Backup and data protection challenges
• Puts a huge strain on IT departments
• The problem will only get worse over time as storage requirements continue to grow
• Unknown impact to the business
What is needed?
• Heterogeneous support for NAS, file servers and CMS solutions
• Requires flexible scalability
• Discovery and analysis tools
• Auto-classification
• Optimization based on use, duplicates and near duplicates
• OCR engines
• High performance, policy-based migration technology that is transparent and provides data validation with ongoing analytics and tiering
• Database integration
• Replication and recovery
Heterogeneous Support
• Support any system that works with NFS and CIFS
• Support proprietary platforms via APIs for CMS systems such as SharePoint and Documentum
• The ability to analyze, optimize, search and migrate data between these platforms transparently
Flexible Scalability
• The platform needs to be able to scale to 100s of TBs, PBs and even dozens of PBs
• Requires clustering, scale-out or grid technologies to add resources as needed
• An index will probably be needed, and it must be capacity-efficient, fast and highly available
Discovery and Analysis
• Wide range of metadata analysis including last access dates
• Discover duplicates, content duplicates and near duplicates
• Users and document types
• Reporting tools to provide insight to make key decisions
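As a rough illustration of the discovery step, the sketch below walks a directory tree, flags files whose last access time is older than a threshold, and groups exact duplicates by SHA-256 digest. The names `discover` and `file_digest` and the 90-day default are assumptions for this example; near-duplicate and content-duplicate detection are out of scope here, and access times may be unreliable on file systems mounted with options that limit atime updates.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def discover(root, stale_days=90, now=None):
    """Return (stale_paths, duplicate_groups) for all files under root."""
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400
    stale = []
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Read the access time before hashing, since reading may update it.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
            by_digest[file_digest(path)].append(path)
    duplicates = [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
    return sorted(stale), duplicates
```

Hashing every file is expensive at the scale the slides describe; a real tool would likely compare file sizes first and only hash candidates of equal size.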
Auto-classification
• Users
• Document types
• Last access dates
• Keywords – “confidential and restricted”
• Pattern matching
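Keyword and pattern-based classification can be sketched in a few lines. The keyword list comes from the slide; the `ssn` pattern (a US Social Security number format) and the function name `classify` are hypothetical examples added for illustration only.

```python
import re

# Keywords taken from the slide.
KEYWORDS = ("confidential", "restricted")

# Hypothetical pattern for illustration: US Social Security number format.
PATTERNS = {"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}


def classify(text):
    """Return a sorted list of labels that apply to the given document text."""
    labels = set()
    lowered = text.lower()
    for keyword in KEYWORDS:
        if keyword in lowered:
            labels.add(keyword)
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            labels.add(label)
    return sorted(labels)


print(classify("CONFIDENTIAL: employee 123-45-6789"))  # ['confidential', 'ssn']
print(classify("Take-out menu"))                       # []
```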
Act on the Information
• Delete or move duplicate data
• Delete or move near duplicates
• Delete or move infrequently accessed data
• Delete or move certain file types
OCR Engine
• Convert scanned images into searchable PDFs
• Enables you to make this “useless” content more useful
• Requires the ability to convert thousands to millions of files in a short window
• Analyze the scanned images for valuable content
File Migrations
• Heterogeneous
• High performance with linear scalability
• Online and transparent
• Scheduling, start/stop, throttling
• Data validation using MD5, SHA1, SHA256
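Data validation during migration amounts to hashing the source and the destination and comparing the results. A minimal sketch using Python's standard `hashlib` and `shutil`, assuming a simple file-to-file copy; `migrate_file` and `digest` are illustrative names, and a production tool would also handle scheduling, throttling and files that change mid-copy.

```python
import hashlib
import shutil


def digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks; algorithm may be 'md5', 'sha1' or 'sha256'."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate_file(src, dst, algorithm="sha256"):
    """Copy src to dst, then verify that both files hash to the same value."""
    shutil.copy2(src, dst)  # copies data and attempts to preserve metadata
    src_digest = digest(src, algorithm)
    dst_digest = digest(dst, algorithm)
    if src_digest != dst_digest:
        raise IOError(f"validation failed for {dst}")
    return dst_digest
```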
Database Integration
• Bridge the gap between structured and unstructured data
• Find valuable information within unstructured content and import into database applications
• Enables the ability to leverage database applications running reports and queries
Replication and Recovery
• Make replicated copies of data versus backup
• Requires a cost effective Tier 3 storage platform
• Schedule scans of changed data and update replicated copies
• Provide file level recovery and intelligent search and retrieval tools
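One way to picture "scan changed data and update replicated copies" is a pass that copies only files that are new, or whose size or modification time differs from the replica. The sketch below is an assumption-laden illustration (the name `sync_changed` is hypothetical, and the replica must live outside the source tree); it does not handle deletions or files modified during the scan.

```python
import os
import shutil


def sync_changed(source_root, replica_root):
    """Copy new or changed files from source_root to replica_root.

    Returns the sorted relative paths of the files that were updated.
    """
    updated = []
    for dirpath, _dirs, files in os.walk(source_root):
        rel_dir = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(replica_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            src_stat = os.stat(src)
            if os.path.exists(dst):
                dst_stat = os.stat(dst)
                unchanged = (dst_stat.st_size == src_stat.st_size
                             and dst_stat.st_mtime >= src_stat.st_mtime)
                if unchanged:
                    continue
            shutil.copy2(src, dst)  # preserves mtime, so the next scan skips it
            updated.append(os.path.relpath(src, source_root))
    return sorted(updated)
```

Because each file in the replica stays individually addressable, file-level recovery is a plain copy back from the replica tree.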
Summary
• File storage is already a management nightmare and the problem will only get worse unless acted upon
• This issue will become a problem for the data center for the next decade
• We need intelligence and management above the file system and storage system level