Apache Cassandra tour (lab exercise)
COSC430—Advanced Databases David Eyers
Learning objectives
• You should be able to
  • understand the architecture of Cassandra and its replication strategies
  • explain how a distributed database works using Cassandra as an example
  • understand the installation and configuration of Cassandra
  • understand why Cassandra can provide high availability with no single point of failure
• There is no assessment for this lab
COSC430 Apache Cassandra lab exercise, 2020
What is Apache Cassandra?
• Apache Cassandra is a free and open-source distributed NoSQL DBMS designed to handle vast amounts of data across large clusters of commodity servers, providing high availability with no single point of failure
Cassandra uses peer-to-peer architecture
Elements in Cassandra:
• Cluster
• Data center(s)
• Rack(s)
• Server(s)
• Node(s)

• Uses a gossip protocol for communication between nodes
• Cassandra Query Language (CQL): many similarities to SQL
Data replication
• Nodes are logically structured in a ring topology
• Each data item is replicated at N nodes, where N is the replication factor
• Two replication strategies:
  • SimpleStrategy
    • use only for a single data centre and one rack
    • replicas are placed on the next nodes clockwise in the ring without considering topology (i.e., rack or data centre location)
  • NetworkTopologyStrategy
    • cluster can be deployed across multiple data centres
    • attempts to place replicas on distinct racks, because nodes in the same rack (or similar physical grouping) often fail at the same time
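The two placement rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Cassandra's implementation: each node holds a single integer token (no virtual nodes), there is one data centre, and the names `simple_strategy_replicas`, `rack_aware_replicas`, `ring`, and `racks` are hypothetical. The rack-aware function is a much reduced version of what NetworkTopologyStrategy does.

```python
from bisect import bisect_left


def _clockwise_walk(ring, key_token):
    """Return all nodes in ring order, starting at the node that owns key_token.

    ring is a list of (token, node) pairs sorted by token. The owner is the
    first node whose token is >= key_token, wrapping around to the start.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(len(ring))]


def simple_strategy_replicas(ring, key_token, replication_factor):
    """SimpleStrategy sketch: take the next N nodes clockwise, ignoring racks."""
    return _clockwise_walk(ring, key_token)[:replication_factor]


def rack_aware_replicas(ring, racks, key_token, replication_factor):
    """Simplified rack-aware sketch: walk clockwise, preferring unseen racks.

    racks maps node name -> rack name. Nodes on an already used rack are
    skipped at first and only used if there are too few distinct racks.
    """
    chosen, used_racks, skipped = [], set(), []
    for node in _clockwise_walk(ring, key_token):
        if len(chosen) == replication_factor:
            break
        if racks[node] in used_racks:
            skipped.append(node)
        else:
            chosen.append(node)
            used_racks.add(racks[node])
    # Fall back to skipped nodes when distinct racks run out.
    for node in skipped:
        if len(chosen) == replication_factor:
            break
        chosen.append(node)
    return chosen


# Hypothetical four-node ring with two racks.
ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
racks = {"A": "r1", "B": "r1", "C": "r2", "D": "r2"}

print(simple_strategy_replicas(ring, 30, 3))
print(rack_aware_replicas(ring, racks, 0, 2))
```

With a replication factor of 2 and a key token of 0, the topology-blind rule picks the two adjacent nodes A and B, which share rack r1, while the rack-aware rule skips B and picks a node on r2 instead. That difference is the reason the slide recommends SimpleStrategy only for a single data centre and one rack.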
Apache Cassandra’s data model
[Figure: data model. A keyspace (with its settings) contains column families; each column family (with its settings) contains columns; each column has a name, a value, and a timestamp.]
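The hierarchy in the data model figure (keyspace, column family, column) can be mirrored with plain Python data classes. This is a hedged sketch for intuition only: the class names, the `rows` dictionary keyed by a row key, and the `write` method are assumptions made for illustration and are not Cassandra's API. The timestamp comparison reflects the way Cassandra resolves conflicting writes to the same column, where the most recent timestamp wins.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: object
    timestamp: int  # write time, used to resolve conflicting writes


@dataclass
class ColumnFamily:
    name: str
    settings: dict = field(default_factory=dict)
    # Assumption for this sketch: row key -> {column name -> Column}
    rows: dict = field(default_factory=dict)

    def write(self, row_key, column):
        """Store the column unless a newer version is already present."""
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column


@dataclass
class Keyspace:
    name: str
    settings: dict = field(default_factory=dict)
    column_families: dict = field(default_factory=dict)


# Hypothetical usage: one keyspace, one column family, three writes.
lab = Keyspace("lab", settings={"replication_factor": 3})
users = ColumnFamily("users")
lab.column_families[users.name] = users

users.write("u1", Column("email", "first@example.org", timestamp=1))
users.write("u1", Column("email", "second@example.org", timestamp=2))
users.write("u1", Column("email", "stale@example.org", timestamp=0))

print(users.rows["u1"]["email"].value)
```

The third write carries an older timestamp than the stored column, so it is ignored and the value written at timestamp 2 remains.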
Virtualisation: abstracting over resources
• Virtualisation
  • Separation of a resource from the underlying hardware
  • An abstraction layer on top of the hardware
• Before virtualisation
  • Single OS per machine
  • Software and hardware tightly coupled
  • Underutilised resources
  • Inflexible and costly infrastructure
• After virtualisation
  • Hardware independent of OS and applications
  • Virtual machines can be provisioned to any system
  • OS and application encapsulated as a single unit in a virtual machine
[Figure 1: Virtualization, from a VMware white paper. Before virtualisation, an application and a single operating system run directly on the x86 hardware (CPU, memory, NIC, disk). After virtualisation, operating systems and their applications run on a VMware virtualisation layer above the same hardware.]
Docker and Vagrant
• Docker
  • Provides OS-level virtualisation, also known as containerisation
  • Packages an application and its dependencies in a virtual container that can be installed and run on any Linux server
  • Lightweight: a single server or virtual machine can run a large number of containers simultaneously
• Vagrant
  • An open-source software platform for managing virtual software development environments
  • Vagrant sits as a layer over the top of virtualisation software
Apache Cassandra lab exercise
• You can view a formatted version of the Markdown file containing the instructions at the following URL: https://altitude.otago.ac.nz/cosc430/cassandra-intro/-/blob/master/README.md
• In the past a PDF version of the instructions was provided. However, some people’s PDF viewers were copy/pasting commands with extra spaces, so I have removed the PDF version