A sane approach to microservices

@tobym, @TapadEng

A service-oriented architecture looks great as boxes and lines on a whiteboard, but what is it like in real life? Are the benefits of flexibility worth the overhead of administration? We've built a framework on top of Finagle that enables a simple approach to building and deploying a microservice with SBT and Scala.

Who am I?

Toby Matejovsky
Long-time Scala developer at Tapad
@tobym

What are we talking about?

vs

Gray boxes are applications; colored boxes are services. Monolithic, on the left, means different services all run in the same application. Service-oriented means each service runs in its own application. If you want to scale the red service, you can do so independently of the green and blue ones.

What is a microservice, anyway?

def doSearch(query: String): Seq[Result]

Just a function? Not sure about that. That’s a nanoservice :p


def doSearch(query: String): Future[Seq[Result]]

Ooh, use Futures! Nope, not that simple :)


A logically grouped set of functionality that is deployed independently of other functionality.

A logically grouped set of functionality that is deployed independently. Lines of code don’t really matter. Number of developers supporting the service doesn’t really matter. This is just good application design, plus a deployment strategy that allows individual parts to scale separately. I think SOA got a bad name because the services became the monoliths they were meant to replace, but the idea is good so it was rebranded.

Should I use microservices?

No!
Well, not until you need to :)

Don’t do it because it’s cool; do it because your application needs this strategy. Tapad started breaking up its monolithic application a few years ago and has embraced more services over the last one to two years. A small distinction: the original breakup split the monolith into independent applications with no RPC between them; the only connection was changed state in the DB. The newer services are shared across applications, so we use RPC.

So, when do you need to?

- You notice code smells like multiple startup flags that control behavior in (ostensibly) one application, or multiple boot-up paths.
- The deployment process feels unwieldy, e.g. two unrelated features are released on the same codebase, but they must be coordinated so the released version contains both. This also makes rollback more challenging. It is impractical to put everything behind feature flags.

Advantages

Scaling different parts differently
Organization

The Tapad Difference.A Unified View.

Advantages:
- More efficient to scale: a homogeneous cluster scales up by merely instantiating new instances of the service, and can be tuned for the service's unique usage patterns.
  - One real-world example: one application must read a large data file at boot time and store it off-heap with indexes for speedy lookups. Making a wholly unrelated change and redeploying is unnecessarily painful, even with a push-button, automated deploy process. It's not just speed; we don't want to have an application taking up gobs of memory that it doesn't need.
- Developer organization: changes to different services are deployed independently, avoiding awkwardly conflicting changesets.
- Support for multiple protocols: Thrift for the Scala applications, JSON/HTTP for easily poking around from the command line.

Disadvantages

More to keep track of
RPC is slower than in-process
Migration
Magic?


- More things to keep track of (picture of juggling).
- RPC is slower than an in-process function call.
- Protocols are trickier to migrate. In a single codebase, the compiler will tell you immediately if there is a problem. With a service-oriented architecture you must choose a wire protocol and be aware of which versions are in use by the downstream clients. In practice this has not been much of a pain; it just takes more diligence to keep things backwards compatible, then remove deprecated fields after everything else is upgraded. Having client and server in the same build helps, though, compared to having the only boundary be a stringly typed JSON thing.
- An RPC boundary can hide the fact that there are still shared resources. E.g. if a database behind a service is the bottleneck, spinning up more service instances won't help.
- Must handle failures. We let various exceptions bubble up (e.g. Timeout), rather than returning a Failure.
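The last point, failures surfacing as exceptions inside the returned Future, can be illustrated with a minimal sketch. It uses only the Scala standard library; `remoteGreet` and `withFallback` are hypothetical names invented for this example and are not part of the framework described in the talk.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object TimeoutHandlingSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for a remote call whose Future fails with a timeout.
  def remoteGreet(name: String): Future[String] =
    Future.failed(new TimeoutException(s"greet($name) timed out"))

  // The caller decides what a timeout means; here it substitutes a fallback value.
  // Any other exception is left to propagate unchanged.
  def withFallback(call: => Future[String], fallback: String): Future[String] =
    call.recover { case _: TimeoutException => fallback }

  // Blocks for the result; only for demonstration purposes.
  def demo(): String =
    Await.result(withFallback(remoteGreet("Tapad"), "Hello, Stranger"), 1.second)
}
```

Whether a fallback, a retry, or simply propagating the error is appropriate depends on the caller; the sketch only shows where that decision is made.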

Enough talk, more code!

SBT Plugin

class ServiceBuild(
    name: String,
    mainClassName: String,
    serviceSupportVersion: String,
    finagleVersion: String = "6.18.0") extends Build {

  lazy val root = Project(name, file("."), settings = BuildSettings).aggregate(server, client)

  lazy val client = configureClient(Project(...))

  lazy val embedded = (...).dependsOn(client)

  lazy val server = configureServer(Project(...)).dependsOn(client, embedded)
}

(Obviously code is truncated)
- SBT plugin that sets up a multi-module project (client, server, and embedded). It's just an SBT project so you can override anything, but this makes it very easy to set up a new project.
- This build actually looks fairly similar to the Remotely project from Runar, Paul Chiusano, Tim Perrett, and Stew O'Connor. Guess we’re on to something!

Client service definition in Thrift

namespace java com.tapad.service.sample.protocol

struct Greeting {
  1: string content;
}

service SampleService {
  Greeting greet(1: optional string name);
}

- The client project is relatively lightweight. It contains a Thrift definition of the service interface and data structures, a way to get a new client, and (rather importantly) the service version. This number is what allows a cluster to simultaneously run multiple versions of the server and client. This is a Finagle feature.
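To make the role of the version number concrete, here is a rough sketch of how a client might pick out only the server instances announced under its own service ID and version. This is not the framework's or Finagle's actual resolution logic; the `Instance` record, the exact-match rule, and the addresses are all assumptions made for illustration.

```scala
object VersionRoutingSketch {
  // Hypothetical record of one announced server instance.
  final case class Instance(serviceId: String, version: String, address: String)

  // Keep only instances matching the service and version the client was built against.
  // Exact string equality on the version is an assumption of this sketch.
  def eligible(all: Seq[Instance], serviceId: String, version: String): Seq[Instance] =
    all.filter(i => i.serviceId == serviceId && i.version == version)

  // Made-up registry contents: two versions of the same service running side by side.
  val registry: Seq[Instance] = Seq(
    Instance("sample-service", "1.0.0", "10.0.0.1:9000"),
    Instance("sample-service", "1.1.0", "10.0.0.2:9000"),
    Instance("sample-service", "1.0.0", "10.0.0.3:9000")
  )
}
```

With a rule like this, a 1.0.0 client keeps talking to 1.0.0 servers while 1.1.0 servers are rolled out next to them.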

Client bootstrap

implicit val ex = ExecutionContext.global

val client = ClientBootstrap[SampleService](
  zkHosts = "localhost:2181",
  clientId = "test-client",
  serviceId = "sample-service",
  version = "1.0.0")

val futureResponse = client.greet(Some("Tapad"))

Embedded project

class EmbeddedSampleService(implicit val executionContext: ExecutionContext)
    extends SampleService[Future] {

  def greet(name: Option[String]): Future[Greeting] = {
    val n = name.getOrElse("Stranger")
    Future.successful(Greeting(s"Hello, $n"))
  }
}

- The embedded project does most of the heavy lifting; this is where business logic lives. Sometimes this code is instantiated inside another application to avoid incurring the cost of a network call. Obviously this throws away all the benefits of service boundaries that I mentioned before, but sometimes speed is more important in that tradeoff.
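The embedded-versus-remote tradeoff can be sketched with a simplified interface. The `Greeter` trait and both implementations below are hypothetical stand-ins written for this example (the real project uses the Thrift-generated `SampleService`); the `send` function merely represents a network hop.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object EmbeddedVsRemoteSketch {
  // Hypothetical, simplified version of the service interface.
  trait Greeter {
    def greet(name: Option[String]): Future[String]
  }

  // In-process implementation: the business logic runs in the caller's JVM.
  class EmbeddedGreeter extends Greeter {
    def greet(name: Option[String]): Future[String] =
      Future.successful(s"Hello, ${name.getOrElse("Stranger")}")
  }

  // Stand-in for an RPC client; `send` represents the network call.
  class RemoteGreeter(send: Option[String] => Future[String]) extends Greeter {
    def greet(name: Option[String]): Future[String] = send(name)
  }

  // Calling code depends only on the interface, so either implementation can be wired in.
  def welcome(greeter: Greeter): String =
    Await.result(greeter.greet(Some("Tapad")), 1.second)
}
```

Because callers see only the interface, switching between the in-process and remote variants is a wiring decision rather than a code change.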

Server project

object SampleServiceServer {
  def main(args: Array[String]) {
    implicit val executionContext = ExecutionContext.global
    val server = ServerBootstrap[SampleService](
      bindTo = new InetSocketAddress("localhost", 9000),
      service = new EmbeddedSampleService,
      zkHosts = "localhost:2181",
      serviceId = "sample-service",
      version = "1.0.0"
    )
    Await.ready(server)
  }
}

- The server project is mostly a wrapper around the embedded project. It handles server-y things like accepting connections and translating between the wire protocol (e.g. Thrift or JSON) and internal Plain Old Scala Objects.

Operations :: sbt-release

ReleaseKeys.nextVersion := { ver =>
  Version(ver).map(_.bumpBugfix.asSnapshot.string).getOrElse(versionFormatError)
},
ReleaseKeys.releaseProcess := Seq[ReleaseStep](
  checkSnapshotDependencies,
  inquireVersions,
  runTest,
  setReleaseVersion,
  commitReleaseVersion,
  tagRelease,
  publishArtifacts,
  setNextVersion,
  commitNextVersion,
  pushChanges)

- The sbt plugin also brings in sbt-assembly and sbt-release.
- sbt-release makes it simple to make a release. This means checking that the repo is clean and the tests pass, then tagging the repo with the release version and pushing said changes back upstream.
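The `nextVersion` setting on the sbt-release slide bumps the bugfix number and marks the result as a snapshot. The effect can be approximated in plain Scala as below; this is a standalone illustration and does not use sbt-release's own `Version` class, and the function name is made up for the example.

```scala
object NextVersionSketch {
  // Matches "major.minor.bugfix" with an optional "-SNAPSHOT" suffix.
  private val VersionPattern = """(\d+)\.(\d+)\.(\d+)(-SNAPSHOT)?""".r

  // Bump the bugfix component and append "-SNAPSHOT".
  // Returns None when the input does not look like a three-part version.
  def nextVersion(released: String): Option[String] =
    released match {
      case VersionPattern(major, minor, bugfix, _) =>
        Some(s"$major.$minor.${bugfix.toInt + 1}-SNAPSHOT")
      case _ => None
    }
}
```

For example, releasing `1.0.0` would leave the build at `1.0.1-SNAPSHOT` for ongoing development.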

Operations :: sbt-assembly

sbtassembly.Plugin.assemblySettings ++ Seq(
  mainClass in assembly := Some(mainClassName),
  jarName in assembly := name + "-server.jar",
  mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
    {
      case "META-INF/MANIFEST.MF" => MergeStrategy.discard
      ...
)

- sbt-assembly compiles everything into a single fat jar, and gives you the tools to handle conflicting files with a mergeStrategy of discarding, concatenating, etc.

Operations :: sbt-native-packager

// Fat jar is the only file required in application's classpath.
scriptClasspath := {
  Seq(jarName.in(assembly).value)
},

// removes all jar mappings in universal and appends the fat jar
mappings in Universal := {
  // universalMappings: Seq[(File,String)]
  val universalMappings = (mappings in Universal).value
  val fatJar = (assembly in Compile).value
  // removing means filtering
  val filtered = universalMappings filter {
    case (file, name) => ! name.endsWith(".jar")
  }
  // add the fat jar
  filtered :+ (fatJar -> ("lib/" + fatJar.getName))
},

- Next, we use sbt-native-packager combined with sbt-assembly to put that jar into an RPM which of course uses the same version number as the application. This extra step is triggered when our CI server sees a newly tagged release, and it makes our ops team very happy, because we can upgrade/downgrade applications using tried and true tools like yum and System V (system five) scripts and the plethora of devops tools that build on top of them.

- sbt-native-packager has a lot of settings and is sort of confusing to use, but we liked the RPM aspect.
- Note that values from sbt-assembly are inserted into the config for sbt-native-packager.

Operations :: sbt-native-packager

// Use sample to create conf files, puppet will overwrite with correct conf per env
linuxPackageMappings += packageMapping({
  val props = sourceDirectory.value + "/main/resources/tapestry.sample.properties"
  file(props) -> "/usr/share/tapestry/conf/tapestry.properties"
}, {
  val log = sourceDirectory.value + "/main/resources/logback.xml.sample"
  file(log) -> "/usr/share/tapestry/conf/logback.xml"
}, {
  val jvmParams = sourceDirectory.value + "/main/resources/jvm-app.params.sample"
  file(jvmParams) -> "/etc/default/tapestry"
}) withConfig "noreplace" withGroup daemonGroup.in(Linux).value withUser daemonUser.in(Linux).value

Add the package mappings, but force "noreplace" because we use Puppet to manage all the application configuration; this ensures RPM will not clobber a file Puppet has put in place.

Operations :: sbt-buildinfo

buildInfoSettings ++ Seq(
  buildInfoPackage := "com.tapad",
  sourceGenerators in Compile <+= buildInfo,
  buildInfoKeys := Seq[BuildInfoKey](
    name,
    version,
    scalaVersion,
    sbtVersion,
    BuildInfoKey.action("buildTime") {
      // re-computed each time at compile
      System.currentTimeMillis
    },
    BuildInfoKey.action("buildHost") {
      (Process("bash" :: "-c" :: "hostname" :: Nil) !!).trim
    },
    BuildInfoKey.action("gitSha") {
      (Process("bash" :: "-c" :: "git rev-parse HEAD || echo None" :: Nil) !!).trim
    }))

- Use sbt-buildinfo and an internal-only admin endpoint so with a single curl command, you can see the application's version, scala version, sbt version, build time, build host, and git hash!
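As a rough idea of what such an admin endpoint might return, the sketch below renders a list of build fields as a JSON object. It is a plain-Scala illustration only: the talk does not show the endpoint, so the function names are invented here and the field values in the usage note are placeholders, not real build data.

```scala
object BuildInfoEndpointSketch {
  // Quote a string for JSON. Only backslashes and double quotes are escaped;
  // control characters are not handled in this sketch.
  def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render key/value pairs as a flat JSON object, preserving their order.
  def toJson(fields: Seq[(String, String)]): String =
    fields
      .map { case (key, value) => s"${quote(key)}:${quote(value)}" }
      .mkString("{", ",", "}")
}
```

A handler could call `toJson(Seq("name" -> "sample-service", "version" -> "1.0.0"))` with values taken from the generated build-info object and return the resulting string as the response body.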

Misc

Twitter Admin module
Monitoring
Load-balancing

- TwitterServer's admin module provides a way to easily inspect the running application; for example, we'll use the various pprof tools to sample the running threads.
- It goes without saying that we track lots of metrics with Graphite to keep a good understanding of what these services are doing.
- The Finagle client handles load balancing itself; there isn’t an LB in front of the services.
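To show what balancing on the client side means in general, here is a minimal round-robin sketch. Round-robin is used only because it is the simplest strategy to read; it is not a claim about which algorithm Finagle uses, and the class below is hypothetical.

```scala
import java.util.concurrent.atomic.AtomicLong

object ClientSideBalancingSketch {
  // Cycles through a fixed list of server addresses, one per call.
  final class RoundRobin(addresses: IndexedSeq[String]) {
    require(addresses.nonEmpty, "need at least one address")

    private val counter = new AtomicLong(0)

    // Thread-safe: each caller gets the next address in turn.
    // floorMod keeps the index non-negative even if the counter overflows.
    def next(): String = {
      val index = Math.floorMod(counter.getAndIncrement(), addresses.size.toLong).toInt
      addresses(index)
    }
  }
}
```

The point of the sketch is that the choice of server is made inside the client process, so no separate load balancer has to sit in front of the service.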

What’s next?

Make the plugin contain more standard configuration. Activator template so it’s even faster to start a project?

Thank You @tobym!

@TapadEng

Toby Matejovsky, Director of [email protected]
@tobym

Yes, we’re hiring
