The Inside Scoop: How Microsoft Built a Scale Lab with 120 Million Items across two 15 TB Content Databases
Barry Waldbaum, FAST Architect, Microsoft Corporation
Paul J. Learning, Sr. Consultant, Microsoft Corporation
Paul Andrew, Sr. Technical Product Manager, Microsoft Corporation
SPC399
Session Objectives and Takeaways
Describe the very large scale test lab we did for SharePoint
  The test lab was shown in the keynote by Jeff Teper and Richard Riley
Present test results for 6 series of testing on the populated farm
Review architecture for SharePoint and FAST
Identify lessons learned from building a large-scale environment
Discuss tools leveraged to create and load content and to performance test
Project Overview and Results
Paul Andrew, Sr. Technical Product Manager
Scale Lab Test Goals
Demonstrate very large SharePoint farm
Example of new SharePoint boundaries and limits
Enterprise Content Management (ECM) document archive scenario
Use average Office document types
Largest scale limits are document archive focused
Scale out across multiple content databases
Adds scale out and scale up
Test SharePoint without limits on hardware or storage resources
Index content with FAST Search
Load test with 15,000 concurrent users
Test upgrading on a very large farm
Multiple SharePoint Content Databases
Content database 60 million
Scale out permits multiple
New docs saved to dropbox
Content routing rules
Separate content databases
Index all content with FAST
[Diagram: Documents, Drop Box Document Library, Content Routing, Archive Content Database(s), FAST Search Index]
Software Boundaries and Limits Impacted
New boundaries and limits for SharePoint released in July 2011
SharePoint can scale to any customer requirement
Partly thanks to this test lab
Up to 200GB supported as before
Up to 4TB supported for ALL scenarios with requirements guidance
Unlimited size supported for Document Archive scenarios with requirements guidance
New limit of 60 million items in a content database
5TB SQL Server database instance limit is removed
Remote Blob Storage (RBS) does not alter these limits
Value of Remote Blob Storage (RBS)
RBS allows Binary Large Objects to be stored outside SQL Server
  Reduces the size of the SQL Server database to metadata only
  This may be just 5% of the total SharePoint Content Database
RBS does not alter SharePoint content size limits
  Blob and metadata must be synchronized during backup/restore
  Storage must return TTFB under 20 ms
  RBS extensions must use supported SharePoint APIs and not do direct SQL database access
RBS Benefits
  Allows use of NAS (with iSCSI)
  ISVs adding tiered storage
  ISVs adding custom backup and restore and other management features
  Performance improvements have been seen with > 1 MB files
  Useful in write-once archive scenarios
We didn’t use RBS in this test lab
Announcing: Very Large Scale Lab Whitepaper
The report with all this detail was published on Monday
http://go.microsoft.com/fwlink/?LinkId=229493
or http://blogs.msdn.com/pandrew
Partner Contributions
NEC – Provided the Express5800 Servers
Intel – Provided Westmere Processors
EMC – Provided the VNX5700 SAN
Testing Baseline on 100 million items, 30 TB
Document Download 30%
Browse Library 40%
Search Query 30%
Think Time 10 seconds
Concurrent Users 10,000
Requests Per Second at Load 200
Test Duration 1 hour
Web Caching On
FAST Content Indexing Paused
Number Web Front End Servers 3
User Ramp 100 users/ 30 seconds
Test Agents 19
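The ramp figures in the baseline imply how long the test takes to reach full load. The sketch below works that out from the numbers above; the function name is hypothetical, and applying the same 100 users / 30 seconds ramp to the 4,000 and 15,000 user runs is an assumption, since the slides only state the ramp for the baseline. Whether the one-hour test duration includes the ramp is not stated either.

```python
import math

# Baseline ramp from the slide: 100 users added every 30 seconds.
RAMP_STEP_USERS = 100
RAMP_STEP_SECONDS = 30


def ramp_seconds(target_users, step_users=RAMP_STEP_USERS,
                 step_seconds=RAMP_STEP_SECONDS):
    """Seconds needed to reach target_users when adding step_users per step."""
    steps = math.ceil(target_users / step_users)
    return steps * step_seconds


if __name__ == "__main__":
    # 10,000 is the baseline; 4,000 and 15,000 assume the same ramp rate.
    for users in (4_000, 10_000, 15_000):
        minutes = ramp_seconds(users) / 60
        print(f"{users:>6} users: {minutes:.0f} minutes to full load")
```

For the 10,000 user baseline this gives 100 steps of 30 seconds, or 50 minutes of ramp.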
Load Test Series A – Vary User Load
4,000, 10,000, 15,000 users
Web Front Ends consistently used 2.5 GB RAM
CPU use on WFEs went down from 55% to 30% for 15,000
15,000 user load introduced some response delay
[Chart: Number of WFEs, Avg Page Time and Avg Response Time for A.1 4,000, A.2 10,000 and A.3 15,000 users]
[Chart: Avg CPU on PACNEC01, APP-1 and WFE-1 through WFE-6 for A.1 4,000, A.2 10,000 and A.3 15,000 users]
Load Test Series B – Vary SQL RAM
16GB, 32GB, 64GB, 128GB, 256GB, 600GB
No significant change in performance
Response time has a curve, but all under 1 second
[Chart: Avg Page Time and Avg Response Time for B.1 16GB, B.2 32GB, B.3 64GB, B.4 128GB, B.5 256GB and B.6 600GB]
[Chart: Avg CPU on WFE-1, WFE-2, WFE-3, PACNEC01 and APP-1 for B.1 16GB through B.6 600GB]
Load Test Series C – Vary Search Transaction Mix
15%, 30%, 40%, 50%, 50%, 75%
Maximum of about 75 Search Queries Per Second for this farm
Notice at 75% search we have exceeded the search capacity
[Chart: Avg RPS for C.1 15%, C.2 30%, C.3 40%, C.4 50%, C.5 50% and C.6 75%]
[Chart: Avg Page Time and Avg Response Time for C.1 15% through C.6 75%]
Load Test Series D – Vary Front End Server RAM
4GB, 6GB, 8GB, 16GB
No impact on Requests Per Second
Minimal impact on response time at 4GB
[Chart: Avg Page Time and Avg Response Time for D.1 4GB, D.2 6GB, D.3 8GB and D.4 16GB]
[Chart: Avg CPU on WFE-1, WFE-2, WFE-3, WFE-4, PACNEC01 and APP-1 for D.1 4GB through D.4 16GB]
Load Test Series E – Vary Number Web Front Ends
2, 3, 4, 5, 6 WFEs
2 WFEs was clearly not enough; RPS was also down a little for 2
Nice chart showing reducing CPU as the number of WFEs increases
[Chart: Avg CPU on WFE-1 through WFE-6, APP-1 and PACNEC01 for E.1 2 WFEs, E.2 3 WFEs, E.3 4 WFEs, E.4 5 WFEs and E.5 6 WFEs]
[Chart: Avg Page Time and Avg Response Time for E.1 2 WFEs through E.5 6 WFEs]
Load Test Series F – Vary SQL Server CPUs
4 CPUs, 6 CPUs, 8 CPUs, 16 CPUs, 80 CPUs
Impact on page response time when more resources are available
Minimal impact to RPS at 4 CPUs
[Chart: Avg Page Time and Avg Response Time for F.1 4 CPUs, F.2 6 CPUs, F.3 8 CPUs, F.4 16 CPUs and F.5 80 CPUs]
[Chart: Avg RPS for F.1 4 CPUs through F.5 80 CPUs]
Results from the Lab
Report published this week with all details of test farm and results
Published document generator and load tools
Published increased software boundaries and limits for SharePoint
120 million 256KB items loaded into 30TB SharePoint farm
FAST Search index to 100 million items
Farm renders pages and search results under load in 0.2 seconds
SharePoint Architecture
Paul Learning, MCS Sr. Consultant
Demo: Very Large SharePoint farm
Paul Learning, Sr. Consultant, Microsoft Consulting Services
Did you see the keynote?
Logical Architecture
Physical Architecture
Hardware in the Lab – Physical
SPDC01 (Domain Controller, DNS)
  4 CPU Core, 8GB RAM, 33GB Disk
PACNEC01 (SQL Host)
  NEC Express5800/A1080a Server
  80 CPU Core, 1TB RAM, 2x 8GB Fiber Optic HBAs
  SharePoint Service Application DBs and Content DBs, FAST Admin DBs
PACNEC02 (Hyper-V Host)
  NEC Express5800/A1080a Server
  64 CPU Core, 1TB RAM, 2x 8GB Fiber Optic HBAs
VNX5700 (Storage Area Network – SAN)
  250x 600GB 7200 RPM SAS and 75x 2TB 5400 RPM NL-SAS drives in RAID10 for 120 TB
  2x 8GB Fiber Optic HBAs
Hardware in the Lab – Virtual
35 Virtual Machines in total
Testrig1 through 20: VS Controller and Test Agents
APP-1: Central Administration, FAST SSA (Crawler, Query)
APP-2: Service Applications, FAST SSA, FAST Search Center
FAST-SSA-1/2: FAST Service and Administration
FAST-IS1/4: FAST Search Indexers (Index, Search, Web Analyzer)
WFE-CRAWL1: Dedicated FAST Search Crawl Target WFE
WFE-1/6: SharePoint WFEs
Hardware in the Lab – Storage Area Network (SAN)
Content segregation to unique LUNs by database type is CRITICAL for reliability, high-scale and high-performance!
Data Sizing Details
Each NEC 1080a had 8x 146 GB drives
Two Document Center Sizes reflected below
Corpus total was ~30TB content
LUN #  Description               Size (GB)  Server    Disk Pool #  Drive Letter
0      SP Service DB             1,024      PACNEC01  0            F
1      PACNEC02 extra space      5,120      PACNEC02  0
2      FAST Index 1              3,072      PACNEC02  0            F
3      FAST Index 2              3,072      PACNEC02  0            G
4      FAST Index 3              3,072      PACNEC02  0            H
5      FAST Index 4              3,072      PACNEC02  0            I
6      SP Content DB 1           7,500      PACNEC01  1            H
7      SP Content DB 2           6,850      PACNEC01  1            I
8      SP Content DB 3           6,850      PACNEC01  1            J
9      SP Content DB 4           6,850      PACNEC01  1            K
10     SP Content DB TransLog    2,048      PACNEC01  1            G
11     SP Service DB TransLog    512        PACNEC01  0            L
12     Temp DB                   2,048      PACNEC01  1            M
13     Temp DB Log               2,048      PACNEC01  0            N
14     SP Usage Health DB        3,072      PACNEC01  0            O
15     FAST Crawl DB / Admin DB  1,024      PACNEC01  1            P
16     Spare – not used          5,120      PACNEC01  2
17     Bulk Office Doc Content   3,072      PACNEC01  Extra        T
18     VMs Swap Files            1,024      PACNEC02  Extra        K
19     DB Backup 1               16,384     PACNEC01  Extra        R
20     DB Backup 2               16,384     PACNEC01  Extra        S
Pool #  Description  Drive Type  Capacity (GB)  Allocated (GB)
0       FAST         SAS         31,967         24,735
1       Content DB   SAS         34,631         34,081
2       Not used     NL SAS      58,586         5,261
SQL Content File   FileGroup  LUN  Size (TB)
SPCPrimary01.mdf   Primary    H:/  0.01
SPCData0102.ndf    SPCData01  I:/  3.67
SPCData0103.ndf    SPCData01  J:/  4.39
SPCData0104.ndf    SPCData01  K:/  3.47
SPCData0105.ndf    SPCData01  H:/  3.14
SPCData0106.ndf    SPCData01  O:/  0.01
Document Center 1 TOTALS:          14.68
SPCPrimary02.mdf   Primary    H:/  0.01
SPCData0202.ndf    SPCData02  I:/  3.02
SPCData0203.ndf    SPCData02  J:/  2.93
SPCData0204.ndf    SPCData02  K:/  3.23
SPCData0205.ndf    SPCData02  H:/  3.54
SPCData0206.ndf    SPCData02  O:/  2.32
Document Center 2 TOTALS:          15.04
CORPUS TOTAL:                      29.71
Data IOPS
LUN  LUN Description        Size (GB)  Reads IOPS (MAX)  Writes IOPS (MAX)  Total IOPS (MAX)  IOPS per GB  IOPS/GB from SQLIO  IOPS from SQLIO
G:   Content DBs TranLog    412        5,437             11,923             17,360            8.48         16.3
H:   Content DBs 1          6,850      5,203             18,546             23,749            3.47
I:   Content DBs 2          6,850      5,284             11,791             17,075            2.49
J:   Content DBs 3          7,500      5,636             11,544             17,180            2.29
K:   Content DBs 4          6,850      5,407             11,146             16,553            2.42
L:   Service DBs TranLog    0.7        5,285             10,801             16,086            31.42        61.25
M:   TempDB                 16         5,282             11,089             16,371            7.99         11.83
N:   TempDB Log             8.5        5,640             11,790             17,429            8.51         15.76
O:   Content DBs 5          2,388      5,400             11,818             17,218            5.60         10.26
P:   Crawl/Admin DBs        491        5,249             11,217             16,467            16.08        24.81
TOTAL:                      31,365     53,824            121,667            175,491                                            105,730
AVERAGE:                    3,136      5,382             12,167             17,549            5.6          22
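As a worked check on the IOPS per GB column, the sketch below recomputes the figure for the H: through K: content database LUNs and for the table totals, then compares each against the 2 IOPS per GB recommendation mentioned in the lessons learned. The function and constant names are hypothetical; the sizes and IOPS values are copied from the table.

```python
# (LUN, size in GB, total max IOPS) copied from the Data IOPS table.
CONTENT_LUNS = [
    ("H:", 6850, 23749),
    ("I:", 6850, 17075),
    ("J:", 7500, 17180),
    ("K:", 6850, 16553),
]

# Recommendation quoted in the SharePoint lessons learned.
RECOMMENDED_IOPS_PER_GB = 2.0


def iops_per_gb(total_iops, size_gb):
    """Total IOPS divided by LUN size in GB."""
    return total_iops / size_gb


def meets_recommendation(total_iops, size_gb,
                         minimum=RECOMMENDED_IOPS_PER_GB):
    """True when the LUN delivers at least the recommended IOPS per GB."""
    return iops_per_gb(total_iops, size_gb) >= minimum


if __name__ == "__main__":
    for lun, size_gb, total_iops in CONTENT_LUNS:
        ratio = iops_per_gb(total_iops, size_gb)
        ok = meets_recommendation(total_iops, size_gb)
        print(f"{lun} {ratio:.2f} IOPS/GB, meets 2 IOPS/GB: {ok}")
    # Totals row: 175,491 total IOPS over 31,365 GB.
    print(f"All LUNs: {iops_per_gb(175491, 31365):.1f} IOPS/GB")
```

The AVERAGE row's 5.6 IOPS per GB matches total IOPS divided by total size rather than the mean of the per-LUN ratios.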
Document Creation and Loading
BulkLoader Utility
  Up to 10 million unique Word, Excel, PowerPoint and HTML documents
  Variable size (250KB used in lab effort)
  .NET Framework 4.0, OpenXML 2.0 SDK and Wikipedia dump file required
  http://code.msdn.microsoft.com/Bulk-Loader-Create-Unique-eeb2d084
LoadBulk2SP Utility
  4 processes containing 16 threads each, targeting unique DL
  Mimics folder/file hierarchy from file system
  Loads using SPFileCollection.Add() method
  Top load achieved was 233 documents/second
  Average load achieved was 127 documents/second
  http://code.msdn.microsoft.com/Load-Bulk-Content-to-3f379974
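To put the load rates in perspective, the sketch below estimates how long 120 million items would take at the average and top rates quoted for LoadBulk2SP. It assumes one continuous, uninterrupted load at a constant rate, which the slides do not claim; the names are hypothetical.

```python
SECONDS_PER_DAY = 86_400

ITEMS = 120_000_000        # items loaded into the farm
AVERAGE_RATE = 127         # documents/second, average achieved
PEAK_RATE = 233            # documents/second, top rate achieved
WORKERS = 4 * 16           # 4 processes containing 16 threads each


def load_duration_days(item_count, docs_per_second):
    """Days needed to load item_count documents at a constant rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return item_count / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Loader threads in total: {WORKERS}")
    print(f"At average rate: {load_duration_days(ITEMS, AVERAGE_RATE):.1f} days")
    print(f"At top rate:     {load_duration_days(ITEMS, PEAK_RATE):.1f} days")
```

Under those assumptions the load is a matter of days either way, which helps explain the attention the deck gives to index maintenance during loading.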
Applying Service Pack 1 and June Cumulative Update
Server       SP1 (h:mm:ss)  June CU (h:mm:ss)  PSConfig (h:mm:ss)
APP-1        0:15:51        0:15:05            0:04:25
APP-2        0:13:24        0:14:53            0:01:56
WFE-1        0:08:11        0:08:07            0:01:36
WFE-2        0:07:23        0:08:01            0:01:34
WFE-3        0:07:39        0:07:47            0:01:34
WFE-4        0:07:24        0:07:49            0:01:36
WFE-5        0:08:18        0:08:34            0:01:36
WFE-6        0:07:15        0:07:40            0:01:35
WFE-CRAWL1   0:07:15        0:07:41            0:01:44
FAST-SSA-1   0:08:45        0:08:44            0:01:37
FAST-SSA-2   0:09:21        0:08:41            0:01:58
TOTAL TIME:  1:40:46        1:43:02            0:21:11
GRAND TOTAL: 3:44:59
Database Name  B/U Start           B/U End             Diff (h:mm:ss)  Size (TB)  Notes
SPContent01    7/10/2011 9:56:00   7/10/2011 23:37:00  13:41:00        14.40      Pre-SP1
SPContent01    7/29/2011 14:22:10  7/30/2011 4:28:00   14:05:50        14.40      Post-SP1 / June CU
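The two backup runs can be turned into an approximate throughput. The sketch below parses the start and end stamps from the table, recomputes the durations, and divides the 14.40 TB size by the elapsed hours. The function names are hypothetical, and it assumes the stamps are month/day/year in a single time zone.

```python
from datetime import datetime

STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

# (notes, start, end, size in TB) copied from the backup table.
BACKUPS = [
    ("Pre-SP1", "7/10/2011 9:56:00", "7/10/2011 23:37:00", 14.40),
    ("Post-SP1 / June CU", "7/29/2011 14:22:10", "7/30/2011 4:28:00", 14.40),
]


def backup_duration(start, end):
    """Elapsed time between two stamps written in STAMP_FORMAT."""
    return (datetime.strptime(end, STAMP_FORMAT)
            - datetime.strptime(start, STAMP_FORMAT))


def tb_per_hour(size_tb, duration):
    """Average backup throughput in TB per hour."""
    return size_tb / (duration.total_seconds() / 3600)


if __name__ == "__main__":
    for notes, start, end, size_tb in BACKUPS:
        elapsed = backup_duration(start, end)
        rate = tb_per_hour(size_tb, elapsed)
        print(f"{notes}: {elapsed} elapsed, about {rate:.2f} TB/hour")
```

Both runs work out to roughly 1 TB per hour, so patching made little difference to backup time.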
Full Farm Failover
SPC11 Keynote Demo
SQL Server “Denali” CTP3 Refresh
Windows Cluster Services
SQL Availability Group
Client Access Point with Clustered IP address
Reconstructed and connected to original SAN and Virtual Network
Lessons Learned for SharePoint
PURELY OUT OF BOX INSTALLATION FOR LARGE-SCALE LAB
  No caching enabled
  No Thresholds
  No Site Quotas
Provided adequate recommendation of 2 IOPS per GB
SPFileCollection.Add() vs. SPFolder.CopyTo()
  Add achieved max of 233 documents/second with 16 concurrent threads
  CopyTo achieved max of 31 documents/second
Loopback Check Registry Key
  Create registry key and set to DISABLE
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\DisableLoopbackCheck=1
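One way to apply the loopback check setting consistently across farm servers is a .reg file. The sketch below only renders the file text for the key and value named on the slide; the helper name is hypothetical, and storing the value as a DWORD is an assumption based on the "=1" shown above.

```python
def render_reg_dword(key_path, value_name, value):
    """Return .reg file text that sets one DWORD value under key_path."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        f'"{value_name}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # Key path and value name are the ones given on the slide.
    print(render_reg_dword(
        r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa",
        "DisableLoopbackCheck",
        1,
    ))
```

The rendered text can be saved with a .reg extension and imported on each server.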
Lessons Learned for SQL Server
SQL Server MAXDOP=1; default installation value=0
Multiple LUNs on SAN and one virtual CPU to each LUN
Database segregation to unique LUNs, spindles and CPUs
Reduced SQL Server RAM to 600 GB
Table Index Fragmentation (bulk load only)
  SP Timer Job (Health Analyzer) did not function correctly: Microsoft.SharePoint.Administration.Health.DatabasesAreFragmented
  Table indexes closely monitored during content loading
  Determined indexes most impacted by load and created SQL Stored Procedure to execute ALTER INDEX for dynamic rebuilds
  Stored Procedure also executes job to Update Statistics
  Procedure can be dynamically run at load start (Application Configuration)
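The lab's stored procedure is not reproduced in the slides. As an illustration of the idea only, the sketch below builds ALTER INDEX ... REBUILD statements for indexes above a fragmentation threshold, followed by UPDATE STATISTICS for each affected table. The 30% threshold, the data class, and the table and index names are all assumptions for illustration, not values from the lab.

```python
from dataclasses import dataclass

# Assumed threshold for illustration; the slides do not give one.
REBUILD_THRESHOLD_PCT = 30.0


@dataclass(frozen=True)
class IndexStat:
    schema: str
    table: str
    index: str
    fragmentation_pct: float


def quote(name):
    """Bracket-quote a SQL Server identifier."""
    return "[" + name.replace("]", "]]") + "]"


def maintenance_statements(stats, threshold=REBUILD_THRESHOLD_PCT):
    """Rebuild fragmented indexes, then update statistics on their tables."""
    statements = []
    touched_tables = []
    for stat in stats:
        if stat.fragmentation_pct >= threshold:
            target = f"{quote(stat.schema)}.{quote(stat.table)}"
            statements.append(
                f"ALTER INDEX {quote(stat.index)} ON {target} REBUILD;")
            if target not in touched_tables:
                touched_tables.append(target)
    statements.extend(f"UPDATE STATISTICS {t};" for t in touched_tables)
    return statements


if __name__ == "__main__":
    # Hypothetical table and index names with made-up fragmentation figures.
    sample = [
        IndexStat("dbo", "ExampleTable", "IX_Example_A", 45.0),
        IndexStat("dbo", "ExampleTable", "IX_Example_B", 5.0),
    ]
    for statement in maintenance_statements(sample):
        print(statement)
```

Generating the statements separately from running them makes it easy to review what a maintenance pass would touch before it starts at load time.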
FAST Search for SharePoint
Barry Waldbaum, MCS Architect
FAST Search Server 2010 for SharePoint
Built on SharePoint Search Center
  Leverages all of the innovations in SharePoint
  Open Web Parts, Federation, query suggestions, related queries, Did you mean?
Visual results connect users with content
  Thumbnails for Word and PowerPoint
  Visual Best Bets highlight premium content
  Preview in browser without leaving the results
[Screenshot callouts: Deep Refinement, Thumbnails, Previews, Sort on any field, Similar Results]
Why was this interesting to me?
Big goals
Access to big iron!
Virtualized hardware and storage
SharePoint topology
Crawling SharePoint vs File Share content
Monitoring at this scale
Screenshots
FAST Topology
2 physical nodes for document processing
4 VMs (16GB + 4 VCPUs)
  Index, Search, Web analyzer
  Disks:
    C: 128GB VHD (not expanded, < 40GB used)
    E: 3TB LUN
  IO observed: 100MB/s Reads, 100MB/s Writes, 1K IOPS
SharePoint Crawler Configuration
SharePoint topology for FAST Search
  2 Crawl components + 2 Query components
  VM specs: 4 Virtual CPUs @ 16GB of memory, C: 128GB VHD (not expanded, < 40GB used)
  Crawl Store database kept on a dedicated LUN
Registry settings on crawler nodes
  HKLM\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager
  FilterProcessMemoryQuota: default 100MB, changed to 200MB
  DedicatedFilterProcessMemoryQuota: default 100MB, changed to 200MB
Monitoring the crawler via perfmon <confirm>
  OSS FAST plugin: Batches Open, Ready, Submitted, Failed
Incremental crawl
  Can take an hour to kick off, high database load
With 120M items the crawldb stays under 600GB
Overall crawl rate around 70 DPS
FAST Search Lessons Learned
We can run on big iron
FAST can run on VMs, but physical nodes do have advantages
The SAN performed very well
Monitor the crawl at least 3 times a day
  SCOM
  SharePoint
  Perfmon
  FAST command line tools
Backup of the index is not recommended at scale
Monitoring inside FAST
FAST has lots of tools for monitoring what’s going on!
rc -r | select-string "# doc"
  How busy are the doc procs
Monitoring crawl queue size
  Use reporting or SQL studio to see MSCrawlURL
Indexerinfo -a doccount
  Make sure all indexers are reporting to see how many are indexed in 1000 seconds
Indexerinfo -a status
  Monitor the health of the indexers and partition layout
FAST Search Tips and tricks
The limit of document processors per node is 20
  Can be increased if procserver_21 is stopped
  50 ran successfully on the physical nodes
System maintenance during a crawl: pause the crawl
Do not ignore the capacity planning guide
  Make sure your hardware is spec’d to the minimums
Admin node makes a great VM!
References
Test Report (http://go.microsoft.com/fwlink/?LinkId=229493)
SharePoint Server 2010 capacity management: Software boundaries and limits (http://technet.microsoft.com/en-us/library/cc262787.aspx)
Estimate performance and capacity requirements for large scale document repositories in SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh395916.aspx)
Storage and SQL Server capacity planning and configuration (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/cc298801.aspx)
SharePoint Performance and Capacity Planning Resource Center on TechNet (http://technet.microsoft.com/en-us/office/sharepointserver/bb736741)
Best practices for virtualization (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/hh295699.aspx)
Best practices for SQL Server 2008 in a SharePoint Server 2010 farm (http://technet.microsoft.com/en-us/library/hh292622.aspx)
Best practices for capacity management for SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh403882.aspx)
Performance and Capacity Recommendations for FAST Search Server 2010 for SharePoint (http://technet.microsoft.com/en-us/library/gg702613.aspx)
Bulk Loader tool (http://code.msdn.microsoft.com/Bulk-Loader-Create-Unique-eeb2d084)
LoadBulk2SP tool (http://code.msdn.microsoft.com/Load-Bulk-Content-to-3f379974)
SharePoint Performance Testing Scripts (http://code.msdn.microsoft.com/SharePoint-Testing-c621ae38)
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Conference 2011, Anaheim, CA, October 3–6, 2011