35
021 ٿฦਫහഝቘຝ 1 ӳᮐ

&+ ø Ä R X

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: &+ ø Ä R X

0 2 1 1

Page 2: &+ ø Ä R X

2

2013

“ ” “ ” “”

2000

Page 3: &+ ø Ä R X

3

������

A

Page 4: &+ ø Ä R X

4

�������

I

N FP

I

I I P

U

I

���������������

���� ����

3

����

Page 5: &+ ø Ä R X

5

1.

2. -Flink

3.

4.Butterfly-Sql

Page 6: &+ ø Ä R X

CIF -CIF -

CIF HDFS

Mysql MongoDB

Canal Oplog (cif-oplog-sync)

cif-rest-server

Mongdb( )

Azkaban

DBV RestFul

Kafka

Flink

HDFSBase

Cif-trickle

HDFS( )

ODS2CIF

HBaseElasticsearch

Butterfly

UTCS WEB

SparkSql

cif.schema

Canal-analyzer oplog-analyzer

Page 7: &+ ø Ä R X

7

1.

2. -Flink

3.

4.Butterfly-Sql

Page 8: &+ ø Ä R X

8

1.oplog

2.consumer

3.

Page 9: &+ ø Ä R X

Flink

9

2008 Stratosphere0.6 Flink

2014 4 16 Apache ASFApache Software Foundation

Page 10: &+ ø Ä R X

Flink Ecosystem

10

localSingle JVM

ClusterStandalone,YARN

CloudGEC,EC2

RuntimeDistributed Streaming Dataflow

DataStream APIStream Processing

DataSet APIBatch Processing

CEPEvent Processing

Streaming TablesRelational

Batch TablesRelational

FlinkMLMachine Learning

GellyGraph Processing

Deploy

Core

API

Libraries

Flink Flink

Page 11: &+ ø Ä R X

Flink

11

Flink JobManger TaskManager Client JobManagerJobManager TaskManager TaskManager JobManagerTaskManager JVM

Client Job JobManager Job

Client Streaming

JobManager Job Task checkpoint Storm Nimbus Client Job JAR

Task TaskManager

TaskManager Slot slot Task Task JobManager Task

Netty Job JobManager Job Client

Streaming

Page 12: &+ ø Ä R X

Flink DAG

12

JobGraph StreamGraph JobGraph JobManager chain / /

StreamGraph Stream API

ExecutionGraph JobManager JobGraph ExecutionGraph ExecutionGraph JobGraph

JobManager ExecutionGraph Job TaskManager Task “ ”

Page 13: &+ ø Ä R X

Flink WaterMake

13

1.

2.

1. Event Time2. Ingestion Time Flink3. Processing Time Operator

Flink Google MillWheel WaterMarkEvent Time

Page 14: &+ ø Ä R X

Flink WaterMake

14

Event Time

WaterMark Flink WaterMark FlinkFlink WaterMark

Flink Flink WaterMarkWaterMark WaterMark

Page 15: &+ ø Ä R X

Flink

15

JVM , Hadoop, Spark,Hbase,Kafka .JVMJVM .

1.JVM OOM

2.Full GC

3.Java

Page 16: &+ ø Ä R X

Flink

16

Network Buffers: 32KB buffer TaskManager 2048

taskmanager.network.numberOfBuffers

Memory Manager Pool: MemoryManager MemorySegment Flink sort/shuffle/

join MemorySegment 70%

Remaining (Free) Heap: TaskManager

GC

MemorySegment: flink MemorySegment flinkMemorySegment cell cell 32kb flink

MemorySegment Flink java.nio.ByteBuffer Java byte[] ByteBuffer MemorySegment

Page 17: &+ ø Ä R X

Flink

17

//1.Personpublic class Person { public int id; public String name;}//Tuple3<age:Integer, height:Double, Person>(25,175.5,Person(1,"zhangsan"))

int 4 double 8 POJO headerPojoSerializer header serializer

Flink Schema Flink Java Scala Flink Flink Java Reflection Java Flink UDF (User Define Function) Scala Compiler

Scala Flink UDF TypeInformation TypeInformation

BasicTypeInfo: Java ( ) String BasicArrayTypeInfo: Java ( ) String WritableTypeInfo: Hadoop Writable TupleTypeInfo: Flink Tuple ( Tuple1 Tuple25) Flink tuples Java Tuple CaseClassTypeInfo: Scala CaseClass( Scala tuples) PojoTypeInfo: POJO(Java Scala)Java public getter/setter GenericTypeInfo:

1. Flink TypeSerializer 2. Flink Kryo

Page 18: &+ ø Ä R X

BackPressure( )

18

>=

<

Page 19: &+ ø Ä R X

Spark BackPressure( )spark streaming

19

Spark 1.5 Receiverspark.streaming.receiver.maxRate Spark Streaming v1.5 back-pressure ,

,spark.streaming.backpressure.enabled RateController, OnBatchCompleted

processingDelay schedulingDelay . Estimatorrate Receiver Input Stream rate ReceiverTracker

ReceiverSupervisorImpl BlockGenerator

Page 20: &+ ø Ä R X

Flink BackPressure( )

20

task1 task2 TaskManagertask task2

task 2 task1 task1 task1

task1 task2(TCP Channel)

TCP

Page 21: &+ ø Ä R X

Flink BackPressure( )

21

Flink Web Job Backpressure Sampling TaskJobManager 50ms Job Task 100

radio=0.01 100 1

Flink BackpressureOK: 0 <= Ratio <= 0.10LOW: 0.10 < Ratio <= 0.5HIGH: 0.5 < Ratio <= 1

Page 22: &+ ø Ä R X

Flink

22

1. At most once 2. At least one 3. Exactly once

Flink checkpoint

Lightweight Asynchronous Snapshots for Distributed Dataflows Flink Chandy-Lamport Flink

Flink snapshot1.2.

Page 23: &+ ø Ä R X

Flink checkpoint

23

Barrier Flink barrier barrier Barrier barrier

barrier ID barrier Barrier barrier

Asynchronous Barrier Snapshots

Flink Exactly once at least once

Page 24: &+ ø Ä R X

Flink savepoint

24

checkpoint Flink savepoint checkpoint checkpointFlink checkpoint savepoint checkpoint checkpoint

savepoint

flink list

flink savepoint job_id

flink cancel job_id

flink run -d -s hdfs://savepoint/1 ###.jar

Page 25: &+ ø Ä R X

25

1���� 1���� 15����

Page 26: &+ ø Ä R X

26

1.

2. -Flink

3.

4.Butterfly-Sql

Page 27: &+ ø Ä R X

27

Kafka Topic

Mongdb0-> Column11-> Column22-> Column3

N-1 ->ColumnN

Spark

hdfsrecord

Encoder

Parquet

ParquetStructField(“0”)StructField(“1”)StructField(“2”)

StructField(“N-1”)

Canal

ODS

CIF

Spark

ParquetStructField(“ C1”)StructField(“C2”)StructField(“C3”)

StructField(“Cn-1”)

HDFS

Json -> version1

Json -> version2Json -> version3

Flink

Trickle

Page 28: &+ ø Ä R X

28

Schema

+

cif

mysql/mongo

shema ()

parquet

Page 29: &+ ø Ä R X

29

1.

2. -Flink

3.

4.Butterfly-Sql

Page 30: &+ ø Ä R X

30

Scala scalaScala parser combinators

json sql

Scala parser combinators Scala “ ”

Scala parser combinators RegexParsers

StandardTokenParsers StandardTokenParsersSpark sql sql StandardTokenParsers

-Scala Parsers

DSL

Abstract Syntax Tree AST AST

AST AST

Page 31: &+ ø Ä R X

BNF

31

DSL BNF BNF

Backus Algol60 Backus-Naur Backus-Naur Form, BNF EBNF BNF

BNF ("word") double_quote    

< > :   [ ] :   { } : 0   |  : "OR"   ::= : “ ”   "..." :  

  [...] :   {...} : 0   (...) :   |   :  

BNF expr ::= term {'+' term | '-' term} term ::= factor {'*' factor | '/' factor} factor ::= floatingPointNumber | '(' expr ')'

{} 0 |  factor  floatingPointNumber  expr 

Page 32: &+ ø Ä R X

StandardTokenParsers

32

Scala EBNF Scala StandardTokenParsers

lexical.delimiters lexical.reservedDSL ”+","-","*","/","(",")"

lexical.reserve"if","then","SUM","COUNT" lexical.reserve

/

* Parser Parser * Parsers / / Parser * Reader / * ParseResult / / * Positional * Position

:Parser parser

* | Parser * ~ Parser * ~> ~ * <~ ~ * ^^ * opt(p) Some(x) None * rep repeat

Page 33: &+ ø Ä R X

33

Butterfly

Page 34: &+ ø Ä R X

34

ButterflyButterfly Sql Parser&AST, Table Engine

Sql Parser&AST : Sql Engine : AST Table 7 Table : Schema (MeteData)

Sql Query Sql Parser AST SelectStmt

projections: Seq[SqlProj]

relations: Option[Seq[SqlRelation]]

filter: Option[SqlExpr]

groupBy: Option[SqlGroupBy]

orderBy: Option[SqlOrderBy]

limit: Option[Int]

formFumc where groupBy aggre select orderBy limit

Table Table Table Table Table Table Table

Schema : Table Table : Row Row Map[String, MetaData[_]] LoadTableData loadSchema

34

Page 35: &+ ø Ä R X

35