Hands-on (Crash) Introduction to C++ for (Forensic) Scientists


Outline

• C++: The standard for high-performance scientific computing

• Why bother learning C++?

• Web resources that help when you get stuck

• Getting a computer to do anything useful

• Compilers (Open-source ones at least…)

• The IDE we’ll use: Netbeans

• Common data types

• Structure of a basic C/C++ program

• Compiling and running your first program

• Useful headers, the STL

Why C++?

• We live in oceans of data. Computers are essential to record and help analyse it.

• Competent scientists speak C/C++, Java, MATLAB, Python, Perl, R and/or Mathematica

• Codes can easily be made available for review

• If you can speak C++, you can speak anything

• Sometimes you need the speed of C++

• Using C/C++ with R is now trivial with Rcpp

[Figure: Toolmarks (screwdriver striation profiles) from database; Biasotti-Murdock Dictionary; Consecutive Matching Striae (CMS)-Space]

• 1,786,980 “line” comparisons

• Took ~1 week on “beefy” Mac Pro

[Figure: Toolmarks (screwdriver striation profiles) from database; Biasotti-Murdock Dictionary; Approximate-Consecutive Matching Striae (CMS)-Space]

• Took ~3 min

• Intensive steps in C++

• Parallel “foreach” in R

• Took ~20 min on same Mac Pro

• Approximate intensive steps and code in C++

Web resources

• Google: “C++ how do I <your question here>”

• Often answered on the http://stackoverflow.com/ site

• www.learncpp.com/ (Great tutorials)

• http://www.cplusplus.com/ (Great reference)

• https://www.youtube.com/user/voidrealms (Nice, clear “bite-sized” video tutorial series)

Getting a computer to do anything useful

• All machines understand is on/off!

• High/low voltage

• High/low current

• High/low charge

• 1/0 binary digits (bits)

• To make a computer do anything, you have to speak machine language to it:

000000 00001 00010 00110 00000 100000

Add the values in registers 1 and 2. Store the result in register 6. (Example from Wikipedia)

Getting a computer to do anything useful

• Machine language is not intuitive and can vary a great deal between processor designs

• The basic operations, however, are the same, e.g.:

• Move data here

• Combine these values

• Store this data

• Etc.

• “Human readable” language for basic machine operations: assembly language

Getting a computer to do anything useful

• Assembly is still cumbersome for (most) humans

Assembly: MOV AL, 61h

A machine encoding: 10110000 01100001

Move the number 97 (61 in hexadecimal) over to “storage area” AL

Getting a computer to do anything useful

• Better yet is a more “Englishy”, “high-level” language

• Enter: C, C++, Fortran, Java, …

• Higher-level languages like these are translated (“compiled”) to machine language

• Not exactly true for Java, but it’s something analogous…

• From now on we will just talk in C++

Programming

• All you need to program is a text editor and a compiler

• There are both commercial and open-source compilers

• Open-source compilers are really good and have been around forever. They are the de facto standard in science

• GCC: GNU Compiler Collection (Unix/Linux, OS X)

• MinGW: a port of GCC for Windows

• Clang: From the LLVM Project (OS X, Unix/Linux)

Programming

• Minimalist programming sucks for beginners. It means working with just:

• a text editor (people like vim or emacs)

• a compiler

• maybe some Unix tools like make …

Programming in NetBeans

• To make life a tad easier, we’ll use an integrated development environment (IDE) called NetBeans.

[Figure: NetBeans screenshot with labels: Code Window, Console/Output, Compile button, Compile/Run button]

Common Data Types

• Computers are DUMB! We have to be explicit about the type of data we want to work with

• int: signed integer, 32 bit (4 byte) on most current platforms

• double: 64 bit (8 byte) floating point (decimal)

• std::string: a “string” of characters. Pretty high level representation however. More on this another day.

• Characters between " " form a string literal, which can be stored in a std::string. A single character between ' ' is a char, not a string.

Structure of a Basic C++ program

“Header” file include section

All stand-alone C++ programs must have an int main() function

Your program executes commands from the “main” code block

Comments are set off by // (one line each) or as a block: /* A block of text comments. Blah blah blah blah */

Common STL headers

• STL: C++ Standard Template Library

• Template: mechanism to handle data of any type

• Subject of templates goes VERY deep. For another day…

• Common/handy STL headers:

• <iostream>: Basic input/output

• <vector>: Common container for numbers

• <fstream>: Basic file handling

• <cstdlib>: The old C standard library of functions

• <chrono>: Timers for code performance measurement

Outline

• Pointers and References

• What’s the point??

• Arrays

• Basic memory management trivia and tips

• Basic control structures

• Functions

• Code files .cpp

• Header files .hpp

• Conditional blocks

• Looping

Pointers and References

• (An art-world analogy…..) In computing:

• Our “medium” is data.

• Our “pottery wheel” is hardware.

• We want to get the most utility from our hardware.

• Want to do a lot of work on data

• Don’t want the hardware to work too hard on each task

• Copying data from place to place:

• Is time intensive

• Wasteful of precious memory (RAM)

• We always want to minimize copying!

• Pointers and references deal with memory addresses

• REALLY handy for cutting down on copying

Pointers and References

• Pointers/references refer to (point at) where data is

• Some notation gymnastics:

• double x = 3.0; //A double

• &x; //The address of the data in x

• & is the address operator

• double *a_ptr; //A variable that will hold a memory address

• a_ptr is called a pointer

• It DOESN’T point at anything yet!

• a_ptr = &x; //a_ptr “points at” the data in x. It holds x’s memory address

Pointers and References

• a_ptr = &x; //a_ptr “points at” x.

• *a_ptr; //REFERS to the data in x. It is the same thing as x, 3.0 in this case.

• The * operator serves two purposes!

• double *a_ptr DECLARES a pointer to a double.

• *a_ptr REFERS to what a_ptr is pointing at (called de-referencing). NOTE there is no type name in front.

• The C++ kosher thing to do is point at the data as soon as the pointer is declared:

• double *a_ptr = &x;

Pointers and References

• So we learned about the * operator and the & operator:

• Getting used to using them requires practice!!

• int y = 3;

• int *a_ptr2 = &y;

• cout << "What gets printed?: " << *a_ptr2 << endl;

• y = 9;

• cout << "Now what?: " << *a_ptr2 << endl;

• *a_ptr2 = 14;

• cout << "Now what?: " << *a_ptr2 << endl;

• Why are these BAD?:

• int *a_ptr3 = 18;

• int &z = 9;

Arrays

• Pointers are great because they refer to data in-place.

• They prevent us having to copy data from place-to-place!

• This is very convenient when working with:

• files

• large vectors and matrices (arrays, STL containers, etc.)

• We usually (in memory) store related data together

• arrays

• STL containers

• CAREFUL THOUGH:

• The data a pointer is pointing at can be changed unintentionally!

• Memory should be FREED when you are done with it (more later)

• For arrays (a little more primitive, but they sometimes offer a speed advantage) we will declare/free with the important operators new and delete (written new[] and delete[] for arrays).

Arrays

[Figure: array code listing with the labels “Pointer arithmetic” and “Indexing!”, followed by the output from the code]

Arrays

• Matrices are allocated and freed in a similar way:

• NOTE: We’ll not typically do this however. Usually we’ll use STL containers or the wonderful modern templated C++ linear algebra libraries: Armadillo (http://arma.sourceforge.net/) and Eigen (http://eigen.tuxfamily.org/)

Functions

• A simple function called from main():

Define the function ABOVE where it is first used

Use the function in main

“arguments” to the function would go here

Functions

• Here is a function with arguments:

Dummy variables. They will be substituted with actual argument when the function is actually called

Functions

• You can define a function after it is used, but care must be taken:

Define the function

The function “signature” MUST be declared before it is used

Explicit dummy variable names are not necessary in the declaration.

Functions

• The C++ kosher way to organize a function. Use separate header and implementation files:

main.cpp

func.cpp (function implementation)

func_header_file.hpp (function signature)

Need to #include the header file here

Functions

• If the function implementation isn’t too complicated define it in the header file (cuts down on the C++ “bureaucracy”).

main.cpp

func_header_file.hpp

Conditional Statement Blocks

• If-else. These are equivalent:

Looping

• Repeat (essentially) the same actions over and over: for-loop

• Fill up a matrix

• The main loop:

• Fills up A

• This is the “matrix”

This is a lot of code to do a simple thing…

Later we’ll use libraries to cut down on the work and clean up the code!

Looping

• Repeat (essentially) the same actions over and over: for-loop

• Dot product between vectors

• The loop:

• Define a dot product function

Note the return type

• Call the function in main()


Outline

• Common C++ lingo

• Objects

• Classes

• Structure of your average class

• Inheritance/Polymorphism

• Operator overloading

• Basic templates

• Great libraries/Tools/Building blocks:

• Armadillo

• Eigen

• Qt

• Boost

• Rcpp, RInside

Objects

• An object: Pretty much any self-contained entity in the program.

• Common examples of objects:

• Variables (We saw these already)

• Functions (We saw these already)

• Classes (These are new!)

• Templates (These are new!)

• Arbitrarily “typed” versions and combinations of the above!

• Object oriented means we want to build a program out of these very general objects

• We will learn to think about a program in terms of interacting class objects

Classes

• A class encapsulates (somehow) related data and functions (methods) to interact with it.

• Let’s say we collect evidence from a crime scene:

• “Evidence” will be a class

• Associated with the evidence class will be:

• A case number

• Location of collection (a string)

• The evidence type

• Number of items collected

• These are the data of the class

• We need to set and access this data with functions for the user

• These are the methods of the class

Classes

Evidence class

Data members

• Case #

• Location

• Type

• # items

Method members

• Get a data member

• Set a data member

We can decide on the level of exposure these members have to other parts of the program

• Public

• Private

• Protected

Gets and Sets are common class methods

Classes

• What does this look like in C++ code???

class keyword declares a class

Constructors will create instances of the class

“override” default constructor/destructor by explicitly declaring a new one

Optional initialization list for class member variables

Copy constructor to copy an instance of the class. We can override with a custom one. Explicit declaration is optional.

Class methods. Usually public

Destructor to delete an instantiated class instance

Class variables. Usually private or protected

Class declaration header file

Classes

evidence.cpp implementation file

evidence:: prefix indicates a method of the class

• :: is the scope operator

Classes

Using the class

Create an instance of the class

#include header for class

Use public members of the class

When class “goes out of scope” (here when program ends) destructor automatically deallocates resources for it

Derived Classes

• Let’s say we collect evidence from a crime scene:

• “Evidence” will be a class

• XXXX

• XXXX