Bioinformatics Course Day 3 MySQL

Bioinformatics Course Day 3 MySQL. Topics ● Databases ● MySQL ● SQL ● Permissions ● Usage ● Examples

Bioinformatics Course Day 3

MySQL

Topics

● Databases
● MySQL
● SQL
● Permissions
● Usage
● Examples

What are databases?

● DBMS (database management systems)
● Data storage and provision
● Software running on servers
● Designed for high-capacity, high-availability usage
● Relational, object-orientated, hierarchical, network model
● Examples: Oracle, PostgreSQL, MySQL, Sybase, DB2, dBASE, Microsoft SQL Server

What do they do?

● Record storing
● Indexing for quick access
● Data organization
● Data processing

Application areas

● BioPharma
● E-Commerce
● Education
● Energy
● Finance
● Government
● Media
● Retail
● Telecom
● Transport

Anywhere with large data volumes!

MySQL Customers

● Bayer
● Sanger
● Ensembl
● Google
● Yahoo
● Ticketmaster
● Deutsche Post
● State of New York
● UNICEF
● Yamaha
● Wikipedia
● BT
● Nokia
● Lufthansa

Why MySQL?

● World's most popular open source database (8 million active installations and 50,000 downloads per day)
● High-performance
● Reliable
● Ease of use
● Free!

What is SQL?

● Structured Query Language
● Create, modify, retrieve and manipulate data
● 1970s, IBM: Structured English Query Language ("SEQUEL"), later SQL
● Simple command set
● Intuitive

SQL Examples

● Most important: data selection

SELECT name,sequence FROM swissprot WHERE name = 'TLR4_HUMAN';

● Data update:

UPDATE installs SET version = 8 WHERE db = 'uniprot';

● Data insertion:

INSERT INTO blast VALUES('TLR4_HUMAN', 'TLR4_PANPA', 1e-104);

● Data deletion:

DELETE FROM blast WHERE expect > 1e-20;
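
The four statement types on this slide can be tried without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for MySQL. The column names (query, hit, expect) and the WEAK_HIT row are hypothetical, since the slides do not show the layout of the blast table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical layout for the blast table; the slides do not define it.
cur.execute("CREATE TABLE blast (query TEXT, hit TEXT, expect REAL)")

# Data insertion
cur.execute("INSERT INTO blast VALUES (?, ?, ?)",
            ("TLR4_HUMAN", "TLR4_PANPA", 1e-104))
cur.execute("INSERT INTO blast VALUES (?, ?, ?)",
            ("TLR4_HUMAN", "WEAK_HIT", 1e-5))  # hypothetical weak hit

# Data selection
cur.execute("SELECT hit FROM blast WHERE query = 'TLR4_HUMAN' ORDER BY hit")
hits_before = [row[0] for row in cur.fetchall()]

# Data update
cur.execute("UPDATE blast SET expect = ? WHERE hit = 'TLR4_PANPA'", (1e-100,))

# Data deletion: remove every hit with an E-value above 1e-20
cur.execute("DELETE FROM blast WHERE expect > 1e-20")

cur.execute("SELECT hit FROM blast ORDER BY hit")
hits_after = [row[0] for row in cur.fetchall()]

print(hits_before)
print(hits_after)
conn.close()
```

Only the strong hit should survive the DELETE, because its E-value stays below the 1e-20 cut-off after the update.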

MySQL setup

[Diagram: one MySQL server with six MySQL clients connected to it, via local and remote access]

MySQL accounts

● Administrator: root
● Users: kahokamp, guest1 (not necessarily the same as login names)
● Passwords: ******* (not necessarily the same as login passwords)

Permissions

● Assigned by administrator
● Multiple levels:
– Access
– Database usage
– Select, Insert, Update, Delete, Drop, ...

● May depend on host

Connection

Command line access:

$ mysql -h localhost -u guest -p

mysql: MySQL client program
-h localhost: server host
-u guest: user name
-p: password (prompted for)

Connection

$ mysql -h localhost -u guest -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 427 to server version: 5.0.18-standard-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

Connection

Remote command line access:

$ mysql -h bioinf.gen.tcd.ie -u guest -p uniprot

The trailing argument (uniprot) preselects a database.

Connection

Using Perl:

use strict;
use warnings;
use DBI;    # database connection module

# access details
my $user     = 'guest';
my $host     = 'bioinf.gen.tcd.ie';
my $password = '';
my $db       = 'uniprot';

# connection
my $dbh = DBI->connect("DBI:mysql:database=$db;host=$host", $user, $password)
    or die "Connection failed: $DBI::errstr";

# query
my $statement = "SELECT sequence FROM swissprot WHERE name = 'TLR4_HUMAN'";
my $sth = $dbh->prepare($statement);
my $rv  = $sth->execute;

unless ($rv >= 1) {
    die "No match!";
}

# data retrieval
my ($sequence) = $sth->fetchrow_array;

print "$sequence\n";

Connection

Using the Web (phpMyAdmin):

Orientation

Show what's available:

mysql> SHOW TABLES;
+-------------------+
| Tables_in_uniprot |
+-------------------+
| swissprot         |
+-------------------+
1 row in set (0.00 sec)

mysql>

Orientation

What other databases are there?

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
| uniprot            |
| uniprotKB8         |
+--------------------+
4 rows in set (0.00 sec)

mysql> USE TEST;
Database changed
mysql>

Organization

MySQL Server
    databases: uniprot, test
    tables: swissprot (in uniprot); test1, test2, test3, test4 (in test)

Permissions

● Creation of databases:
– Normally only by administrator (root)
● Creation of tables:
– All users with the appropriate permissions
● Special database 'test':
– Normally accessible by all users
● Special user 'guest':
– Limited access
– Empty password

Work flow

Create database

Create table(s)

Insert data

Query database

Table creation

Table columns need to be defined!

Column types:
– Text
– Numbers
– Dates
– Binary data
– Sets
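
To make the column definitions concrete, here is a small sketch that declares one column for each of the first four type families and stores a single row. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the table name protein is hypothetical. SQLite only records the declared types loosely and has no SET type, so this illustrates the syntax of a definition rather than MySQL's type checking. The values are taken from the GWA_SEPOF entry shown elsewhere in these slides.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table with one column per type family from the slide.
cur.execute("""
    CREATE TABLE protein (
        name    VARCHAR(20),  -- text
        length  INT,          -- number
        created DATE,         -- date
        raw     BLOB          -- binary data
    )
""")

# Insert data
cur.execute("INSERT INTO protein VALUES (?, ?, ?, ?)",
            ("GWA_SEPOF", 2, "2004-01-16", b"GW"))

# List the declared columns as (column name, declared type) pairs
cur.execute("PRAGMA table_info(protein)")
declared = [(row[1], row[2]) for row in cur.fetchall()]

# Query the table
cur.execute("SELECT length FROM protein WHERE name = 'GWA_SEPOF'")
length = cur.fetchone()[0]

print(declared)
print(length)
conn.close()
```

The same sequence (create the table, insert data, query it) mirrors the work flow slide, minus the database creation step that normally requires the administrator.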

SQL Examples

SELECT name,length FROM swissprot;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
.........
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
222289 rows in set (0.89 sec)

SQL Examples

SELECT name,length FROM swissprot LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
| 110KD_PLAKN |    296 |
| 11S2_SESIN  |    459 |
| 11S3_HELAN  |    493 |
| 11SB_CUCMA  |    480 |
| 128UP_DROME |    368 |
| 12AH_CLOS4  |     29 |
+-------------+--------+
10 rows in set (0.00 sec)

SQL Examples

SELECT name,length FROM swissprot LIMIT 222279,10;
+-------------+--------+
| name        | length |
+-------------+--------+
| ZYG12_CAEEL |    774 |
| ZYG1_CAEBR  |    709 |
| ZYG1_CAEEL  |    706 |
| ZYGBL_HUMAN |    766 |
| ZYGBL_MOUSE |    779 |
| ZYGBL_PONPY |    766 |
| ZYS3_CHLRE  |    371 |
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
10 rows in set (0.60 sec)

SQL Examples

SELECT name,length FROM swissprot ORDER BY length LIMIT 10;
+------------+--------+
| name       | length |
+------------+--------+
| GWA_SEPOF  |      2 |
| ACI_TRIGI  |      3 |
| GRWM_HUMAN |      3 |
| LUXE_VIBFI |      3 |
| TRH_BOMOR  |      3 |
| TRH_NOTVI  |      3 |
| TRH_PIG    |      3 |
| TRH_SHEEP  |      3 |
| ACH1_ACHFU |      4 |
| DCML_PSECH |      4 |
+------------+--------+
10 rows in set (0.00 sec)

SQL Examples

SELECT * FROM swissprot WHERE length = 2;
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
| name      | accession | version | dataset    | created    | modified   | prot_name        | component | type | lineage                                                                                                        | tax_id | organism                              | checksum         | length | seq_version | mass | seq_date   | sequence | keyword                                            |
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
| GWA_SEPOF | P83570    |      15 | Swiss-Prot | 2004-01-16 | 2006-02-07 | Neuropeptide GWa |           |      | Eukaryota; Metazoa; Mollusca; Cephalopoda; Coleoidea; Neocoleoidea; Decapodiformes; Sepioidea; Sepiidae; Sepia |   6610 | Sepia officinalis (Common cuttlefish) | 7378100000000000 |      2 |           1 |  261 | 2003-06-01 | GW       | Amidation; Direct protein sequencing; Neuropeptide |
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
1 row in set (0.00 sec)

SQL Examples

SELECT name,sequence,organism,prot_name FROM swissprot WHERE length = 2;
+-----------+----------+---------------------------------------+------------------+
| name      | sequence | organism                              | prot_name        |
+-----------+----------+---------------------------------------+------------------+
| GWA_SEPOF | GW       | Sepia officinalis (Common cuttlefish) | Neuropeptide GWa |
+-----------+----------+---------------------------------------+------------------+
1 row in set (0.00 sec)

SQL Examples

SELECT * FROM swissprot WHERE length = 2 \G
*************************** 1. row ***************************
       name: GWA_SEPOF
  accession: P83570
    version: 15
    dataset: Swiss-Prot
    created: 2004-01-16
   modified: 2006-02-07
  prot_name: Neuropeptide GWa
  component:
       type:
    lineage: Eukaryota; Metazoa; Mollusca; Cephalopoda; ...
     tax_id: 6610
   organism: Sepia officinalis (Common cuttlefish)
   checksum: 7378100000000000
     length: 2
seq_version: 1
       mass: 261
   seq_date: 2003-06-01
   sequence: GW
    keyword: Amidation; Direct protein sequencing; Neuropeptide
1 row in set (0.01 sec)

SQL Examples

SELECT name,length FROM swissprot ORDER BY length DESC LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| DIG1_CAEEL  |  13100 |
| SYNE1_HUMAN |   8797 |
| ANC1_CAEEL  |   8545 |
| UNC89_CAEEL |   8081 |
| OBSCN_HUMAN |   7968 |
| LGRC_BREPA  |   7756 |
| BPA1_MOUSE  |   7389 |
| R1AB_CVMJH  |   7180 |
| R1AB_CVMA5  |   7176 |
| R1AB_CVM2   |   7124 |
+-------------+--------+
10 rows in set (0.00 sec)
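
The ORDER BY and LIMIT variants from the last few slides can be compared side by side on a tiny table. This is a sketch under stated assumptions: Python's built-in sqlite3 module stands in for MySQL (it accepts the same "LIMIT offset,count" form), and the five rows are a hand-picked subset of entries shown in the slides, not the real swissprot table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE swissprot (name TEXT, length INT)")

# Small subset of (name, length) pairs that appear in the slides.
cur.executemany("INSERT INTO swissprot VALUES (?, ?)", [
    ("GWA_SEPOF", 2),
    ("ACI_TRIGI", 3),
    ("TLR4_HUMAN", 839),
    ("ZYX_HUMAN", 572),
    ("DIG1_CAEEL", 13100),
])

# Two shortest entries: ascending sort, keep the first 2 rows
cur.execute("SELECT name,length FROM swissprot ORDER BY length LIMIT 2")
shortest = cur.fetchall()

# Longest entry: descending sort, keep the first row
cur.execute("SELECT name,length FROM swissprot ORDER BY length DESC LIMIT 1")
longest = cur.fetchall()

# LIMIT offset,count: skip 3 rows, then return the next 2
cur.execute("SELECT name,length FROM swissprot ORDER BY name LIMIT 3,2")
last_two = cur.fetchall()

print(shortest)
print(longest)
print(last_two)
conn.close()
```

Without ORDER BY the row order is not guaranteed, so combining LIMIT with an explicit sort is the safer habit when a specific slice of the table is wanted.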

SQL Examples

SELECT name,length FROM swissprot WHERE length < 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| GWA_SEPOF   |      2 |
| ACI_TRIGI   |      3 |
| GRWM_HUMAN  |      3 |
| LUXE_VIBFI  |      3 |
.........
| UPA7_HUMAN  |      9 |
| XYLA_STRS8  |      9 |
| YBFR_AZOVI  |      9 |
+-------------+--------+
365 rows in set (0.01 sec)

SQL Examples

SELECT COUNT(*) FROM swissprot WHERE length < 10;
+----------+
| count(*) |
+----------+
|      365 |
+----------+
1 row in set (0.00 sec)

SQL Examples

SELECT DISTINCT length FROM swissprot WHERE length < 10;
+--------+
| length |
+--------+
|      2 |
|      3 |
|      4 |
|      5 |
|      6 |
|      7 |
|      8 |
|      9 |
+--------+
8 rows in set (0.00 sec)

SQL Examples

SELECT length, COUNT(length) FROM swissprot WHERE length < 10 GROUP BY length;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|      2 |             1 |
|      3 |             7 |
|      4 |            22 |
|      5 |            30 |
|      6 |            18 |
|      7 |            50 |
|      8 |           103 |
|      9 |           134 |
+--------+---------------+
8 rows in set (0.00 sec)
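
COUNT(*), DISTINCT and GROUP BY answer three related questions: how many rows match, which values occur, and how many rows share each value. The sketch below runs all three on a six-row sample. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the rows are a small subset of entries from the slides, so the counts differ from the full-table output above. The ORDER BY clauses are added only to make the result order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE swissprot (name TEXT, length INT)")

# Small subset of (name, length) pairs that appear in the slides.
cur.executemany("INSERT INTO swissprot VALUES (?, ?)", [
    ("GWA_SEPOF", 2),
    ("ACI_TRIGI", 3),
    ("GRWM_HUMAN", 3),
    ("ACH1_ACHFU", 4),
    ("DCML_PSECH", 4),
    ("TLR4_HUMAN", 839),
])

# How many entries are shorter than 10 residues?
cur.execute("SELECT COUNT(*) FROM swissprot WHERE length < 10")
short_count = cur.fetchone()[0]

# Which lengths below 10 occur at all?
cur.execute("SELECT DISTINCT length FROM swissprot "
            "WHERE length < 10 ORDER BY length")
distinct_lengths = [row[0] for row in cur.fetchall()]

# How many entries are there per length?
cur.execute("SELECT length, COUNT(length) FROM swissprot "
            "WHERE length < 10 GROUP BY length ORDER BY length")
per_length = cur.fetchall()

print(short_count)
print(distinct_lengths)
print(per_length)
conn.close()
```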

SQL Examples

CREATE TABLE test.splen SELECT length, COUNT(length) FROM swissprot GROUP BY length;

Query OK, 2717 rows affected (0.30 sec)
Records: 2717  Duplicates: 0  Warnings: 0

SQL Examples

SELECT * FROM test.splen ORDER BY `COUNT(length)` DESC LIMIT 10;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|    379 |          1004 |
|    146 |           921 |
|    141 |           749 |
|    156 |           694 |
|    148 |           633 |
|    207 |           591 |
|    155 |           590 |
|    152 |           579 |
|    215 |           573 |
|    119 |           570 |
+--------+---------------+
10 rows in set (0.01 sec)

SQL Examples

SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)

Wild-cards:
_ (single character)
% (multiple characters)

Escape with backslash (\)!

SQL Examples

SELECT name,organism FROM swissprot WHERE NAME LIKE 'tlr4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)

Case-insensitive (unless binary format)!

SQL Examples

SELECT name,organism FROM swissprot WHERE NAME = 'TLR__PANPA';
Empty set (0.00 sec)

SQL Examples

SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR__PANPA';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
+------------+---------------------------------+
1 row in set (0.00 sec)
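
The difference between a wild-card underscore and an escaped one is easiest to see with an entry that has another character in that position. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MySQL, with two caveats: SQLite needs an explicit ESCAPE clause where MySQL treats the backslash as the default escape character, and SQLite's LIKE is case-insensitive for ASCII letters only. The name TLR4XPATH is a made-up entry added purely to show the effect, and the helper function names_matching is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE swissprot (name TEXT)")
cur.executemany("INSERT INTO swissprot VALUES (?)", [
    ("TLR4_PANPA",),
    ("TLR4_PAPAN",),
    ("TLR4_HUMAN",),
    ("TLR4XPATH",),  # made-up entry, not a real Swiss-Prot name
])

def names_matching(condition):
    """Hypothetical helper: return the sorted names that satisfy a WHERE condition."""
    cur.execute("SELECT name FROM swissprot WHERE " + condition + " ORDER BY name")
    return [row[0] for row in cur.fetchall()]

# Unescaped underscore: matches any single character, including 'X'
unescaped = names_matching("name LIKE 'TLR4_PA%'")

# Escaped underscore: matches a literal '_' only; lower case still matches
escaped = names_matching(r"name LIKE 'tlr4\_pa%' ESCAPE '\'")

# '=' compares literally, so the underscores are not wild-cards here
exact = names_matching("name = 'TLR__PANPA'")

# LIKE treats each underscore as exactly one character
two_wildcards = names_matching("name LIKE 'TLR__PANPA'")

print(unescaped)
print(escaped)
print(exact)
print(two_wildcards)
conn.close()
```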

SQL Examples

SELECT name,length FROM swissprot WHERE NAME REGEXP '^TLR[4-9]\_HUMAN';
+------------+--------+
| name       | length |
+------------+--------+
| TLR4_HUMAN |    839 |
| TLR5_HUMAN |    858 |
| TLR6_HUMAN |    796 |
| TLR7_HUMAN |   1049 |
| TLR8_HUMAN |   1041 |
| TLR9_HUMAN |   1032 |
+------------+--------+
6 rows in set (0.00 sec)

Normalization

● Optimize database design
● Avoid duplication of data
● Least redundancy in tables

Normalization

Name        Keyword
TLR4_HUMAN  Direct protein sequencing; Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;
TLR4_MOUSE  Disease mutation; Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;
TLR4_BOVIN  Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;

Bad design! Repetition of entries, difficult to index and awkward to search

Normalization

Alternative design:

Name        Keyword1                   Keyword2         Keyword3
TLR4_HUMAN  Direct protein sequencing  Glycoprotein     Immune response
TLR4_MOUSE  Disease mutation           Glycoprotein     Immune response
TLR4_BOVIN  Glycoprotein               Immune response  Inflammatory response

Not optimal either:
– different number of keywords for each entry
– still very repetitive

Normalization

Normalized version:

Name        Sequence
TLR4_HUMAN  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK
TLR4_MOUSE  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK
TLR4_BOVIN  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK

ID  Keyword
1   Direct protein sequencing
2   Disease mutation
3   Glycoprotein
4   Immune response

ID1  ID2
1    TLR4_HUMAN
2    TLR4_MOUSE
3    TLR4_HUMAN
3    TLR4_MOUSE
3    TLR4_BOVIN

SELECT name,sequence FROM table1, table2, table3 WHERE keyword = 'Glycoprotein' AND ID = ID1 AND ID2 = name;
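
The normalized layout can be rebuilt and queried in a few lines. This sketch assumes Python's built-in sqlite3 module as a stand-in for MySQL. The mapping of table1, table2 and table3 to the sequence, keyword and link tables is an assumption, because the slide does not label them, and the sequence is a shortened placeholder rather than a real protein sequence. The ORDER BY clause is added only to make the result order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed mapping: table1 = sequences, table2 = keywords, table3 = links.
cur.execute("CREATE TABLE table1 (name TEXT, sequence TEXT)")
cur.execute("CREATE TABLE table2 (ID INT, keyword TEXT)")
cur.execute("CREATE TABLE table3 (ID1 INT, ID2 TEXT)")

SEQ = "MAREASDPDDFAAEKAEASK"  # shortened placeholder sequence
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("TLR4_HUMAN", SEQ), ("TLR4_MOUSE", SEQ), ("TLR4_BOVIN", SEQ),
])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (1, "Direct protein sequencing"), (2, "Disease mutation"),
    (3, "Glycoprotein"), (4, "Immune response"),
])
# Each link row connects one keyword ID to one protein name.
cur.executemany("INSERT INTO table3 VALUES (?, ?)", [
    (1, "TLR4_HUMAN"), (2, "TLR4_MOUSE"),
    (3, "TLR4_HUMAN"), (3, "TLR4_MOUSE"), (3, "TLR4_BOVIN"),
])

def proteins_with(keyword):
    """Return the sorted names of proteins linked to the given keyword."""
    cur.execute(
        "SELECT name,sequence FROM table1, table2, table3 "
        "WHERE keyword = ? AND ID = ID1 AND ID2 = name ORDER BY name",
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]

glyco = proteins_with("Glycoprotein")
mutated = proteins_with("Disease mutation")

print(glyco)
print(mutated)
conn.close()
```

Each keyword is stored once in table2, so adding a keyword to a protein means adding one short row to the link table instead of editing a long text field.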

More Info

● MySQL tutorials on the web

● Learning MySQL (O'Reilly)

● http://dev.mysql.com/doc/ (searchable and browsable on-line)