No-Full Route Changed Our Lives @AS55394
Osamu Kurokochi, Data Center Team, Infrastructure Headquarters
Copyright GREE, Inc. All Rights Reserved.

Self Introduction
Name: Osamu Kurokochi
Dept.: Data Center Team, Infrastructure Headquarters, GREE, Inc.

Summer in 2012 (Configuration at the time)
There were 8 Border Gateway Protocol (BGP) routers.
All routers received full routes, and every BGP router was connected to every other as an iBGP peer in a full-mesh topology.
[Figure: GREE environment with 8 routers (R)]

Summer in 2012 (occurrence of fault)
On one occasion, a fault on the transit side caused peer sessions to go down.
Route convergence took a long time, and the router CPUs froze.
iBGP peers then began to go down as well, which caused chaos.
Outages of about 5 minutes recurred intermittently until convergence. (The only thing we could do was watch what happened.)

Summer in 2012 (cause of fault)
There were 3 main factors.
1. Insufficient hardware processing capability
2. Increased number of routes
3. Too many iBGP peers (not that many)
The fault was caused by one or a combination of the 3 factors above.

Breakthrough Solutions Considered at the Time
Solution 1. Reinforced hardware: Buy higher-performance hardware.
Solution 2. Configuration change: Introduce route reflectors (RR) and reduce the number of iBGP peers.
Solution 3. Decreased number of routes: Reduce the number of routes as a way to lower the load during BGP updates.
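
To make Solution 2 concrete, the sketch below compares the number of iBGP sessions in a full mesh with a route-reflector design. The figure of 8 routers comes from the slides; the choice of 2 reflectors and the single-cluster layout (reflectors meshed with each other, every other router peering with each reflector) are assumptions for illustration only.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions when every router peers with every other router."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int) -> int:
    """iBGP sessions when `reflectors` RRs are meshed with each other and
    every remaining router peers with each RR (assumed single-cluster design)."""
    clients = n - reflectors
    return full_mesh_sessions(reflectors) + reflectors * clients


if __name__ == "__main__":
    routers = 8  # number of BGP routers stated in the slides
    print(full_mesh_sessions(routers))           # 28
    print(route_reflector_sessions(routers, 2))  # 13 (2 RRs is an assumption)
```

Under these assumptions the session count drops from 28 to 13, which is why the parenthetical "not that many" on the cause slide is plausible: 8 routers is a small mesh to begin with.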

Key Judgment Point
Replacing the BGP routers at all sites was also considered, but it would have required too much effort.
Taking the whole process into account (procedure, verification, ordering, delivery, and maintenance arrangements), this remedy was too slow.
We judged that the problem would be difficult to solve by introducing new hardware.

Key Judgment Point
We narrowed the options down to Solution 3.
In our company's business model, 99% of accesses were from mobile devices.
We reconsidered whether full routes were necessary at all, with the following result:
Full route -> Partial route + Default route
*Partial route = 3 domestic mobile carriers and 5 ASs.

There Are Two Partial Routes
[Figure: two ways of obtaining partial routes between a transit router and our own router: (1) In-house filtering and (2) Transit filter method. Diagram labels: Full route, Default route, Partial route, RIB, FIB, Transparent. One of the two methods is marked "GREE adopts this solution."]
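
The idea behind a partial-route filter can be sketched as follows. This is only an illustration: the slides do not say how routes are selected, so matching on the origin AS is an assumption, and the ASNs and prefixes below are documentation-range placeholders, not the real carriers or ASs.

```python
import ipaddress

# Hypothetical origin ASNs standing in for the permitted carriers/ASs
# (documentation-range ASNs, not the real ones).
ALLOWED_ORIGIN_ASNS = {64496, 64497, 64498}

DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")


def keep_route(prefix: str, as_path: list[int]) -> bool:
    """Keep the default route and any route originated by an allowed AS."""
    if ipaddress.ip_network(prefix) == DEFAULT_ROUTE:
        return True
    # The origin AS is the last ASN in the AS path.
    return bool(as_path) and as_path[-1] in ALLOWED_ORIGIN_ASNS


def filter_routes(routes):
    """Return only the (prefix, as_path) pairs that pass keep_route."""
    return [(prefix, path) for prefix, path in routes if keep_route(prefix, path)]


if __name__ == "__main__":
    received = [
        ("0.0.0.0/0", [64500]),
        ("192.0.2.0/24", [64500, 64496]),     # allowed origin
        ("198.51.100.0/24", [64500, 64510]),  # other origin, dropped
        ("203.0.113.0/24", [64497]),          # allowed origin
    ]
    for prefix, _ in filter_routes(received):
        print(prefix)
```

Whether such a filter runs on one's own router or on the transit side is exactly the difference between the two methods named on the slide; the selection logic is the same either way.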

Summary of Solutions
Solution 1. Reinforced hardware: Buy higher-performance hardware. Verification is required, and delivery takes time.
Solution 2. Configuration change: Introduce RR to reduce the number of iBGP peers. Verification is required, delivery takes time, and there is no conclusive evidence that it would fix the problem.
Solution 3. Decreased number of routes: Reduce the number of routes as a way to lower the load during BGP updates. This can solve the problem in a short time and is reliable.

Solution of Problem
We actually tried the solution.
Number of routes: 400,000 at the time, reduced to approx. 2,600.
These have since been further reduced to approx. 1,800 routes.
Come on, line trouble!
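
As a rough sense of scale, the share of the full table still carried can be worked out from the route counts on this slide. The helper name `remaining_share` is hypothetical; the three counts are the approximate figures quoted above.

```python
FULL_ROUTES = 400_000  # full-route count at the time (from the slides)


def remaining_share(partial_routes: int, full_routes: int = FULL_ROUTES) -> float:
    """Percentage of the full table that is still carried."""
    return 100 * partial_routes / full_routes


if __name__ == "__main__":
    print(f"{remaining_share(2600):.2f}%")  # after the first reduction
    print(f"{remaining_share(1800):.2f}%")  # after the further reduction
```

Both results are well under 1% of the original table, which is the load reduction Solution 3 was aiming for.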

I thought...
Even without full routes, it is possible to continue our business.
Why not reconsider what it means for you to carry full routes?