Introduction to Git

Surabhi Gupta

Git basics with notes

Fast, open-source, distributed source control system.

Client-Server vs Distributed models

[Figure: client-server model, where the VCS server holds Versions 1 to 3 and each client holds a single version, contrasted with the distributed model, where every copy of the repository holds Versions 1 to 3.]

To see what a distributed source control system looks like, let us contrast it with a client-server model. In a client-server model, you check out one snapshot: the state of a file or files at a particular point in time. In a distributed model, you check out everything locally, including the full history.

Advantages of Git over P4

Perforce (Client-Server) | Git (Distributed)
Version management system | Source control system
Slow due to network latency and increased dependency on server calls | Fast! Work locally, offline
Intermediate work cannot be easily saved to P4 | Various checkpoints for saving intermediate work
Difficult to experiment | Facilitates experimentation
A merger is typically responsible for merging between branches | The developer is responsible for merging their branch into master

The Perforce model is centered on being able to MANAGE branches. One can restrict branches, set up policies for checking in, etc. Since changing the history of a branch in P4 is an admin-only privilege and is virtually never done, Perforce is good at keeping an audit trail of your commits. Git, on the other hand, allows you to change the history of a branch completely, as we will see later on.

Why do people love Git? Almost all the work is done locally, which gives you lots of freedom while you are working.

Server for Git

❖ GitHub, Stash, CloudForge, etc. are code management and collaboration tools for Git repos.

❖ They provide fine-grained control over permissions and an audit of the commit history.

❖ The distributed model of Git facilitates open-source projects, since individuals can easily fork repos and merge the changes back in.

You may ask why we need a server in a distributed model. The central server is just another Git repo that everyone has access to and that the team uses to synchronize their work. It is mainly used for collaboration and is designated as the ‘source of truth’. It can easily be switched out for another repo.

An advantage of the distributed model for open-source projects: if the repo for a project is no longer being maintained by its owner but the community wants to keep it alive, someone can fork it. Over time, changes will be contributed to the fork and it will become the de facto new home for the project.

Scope of the talk

❖ Various roles require different levels of expertise in Git:

    ❖ Manager

    ❖ Software Engineer/QA Engineer

    ❖ Merger/Release Engineer: consumer of Git scripts

    ❖ Developer of scripts that extend Git functionality: deep dive into Git internals

❖ We will cover concepts and commands that will come in handy in your day-to-day work as a developer.

❖ This talk is a road map of the Git world. Hopefully, it will whet your appetite for exploring the trails.

Roles:
Managers: usage of Git will most likely be limited to checking out branches.
Developers: require a working knowledge of Git.
Mergers: consumers of Git scripts, such as those for bulk merging across releases.
Tool developers: develop tools to extend Git functionality, which calls for a deep dive into Git internals.

This talk is primarily designed for a developer.

Roadmap

❖ Content hashing

❖ Blobs to Branches

❖ Staging and committing

❖ Remotes and pull requests

❖ Merge conflicts

❖ Git resources

Roadmap for the presentation.

Content Hashing

❖ Contents are referenced using their hashes:

    sha1("blob " + fileSize + "\0" + fileContent)

    $ echo "foobar" > foo.txt
    $ git hash-object foo.txt
    323fae03f4606ea9991df8befbb2fca795e648fa

    This equals sha1("blob 7\0foobar\n").

❖ Fun fact: Renames are not stored in the repo. They’re computed by commands such as git diff, git merge, etc.

SHA-1 (Secure Hash Algorithm 1) is also commonly run on the contents of downloaded files to verify that the content is authentic.

sha1("blob 7\0foobar\n") = "323fae03f4606ea9991df8befbb2fca795e648fa"
$ echo "foobar" > foo.txt
$ git hash-object foo.txt
323fae03f4606ea9991df8befbb2fca795e648fa

This is a low-level concept, but it introduces you to the fundamental representations used by Git. It also helps you build intuition for the graph structures that we will cover in the following slides.

Renames are computed based on the similarity between the contents of a ‘deleted’ and an ‘added’ file.
$ mv force.txt fourth.txt
$ git add -A .
$ git status
Output: renamed: force.txt -> fourth.txt
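The hashing rule on this slide can be reproduced outside of Git. The following is a minimal Python sketch that only covers blobs; the function name `git_blob_hash` is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # The header is "blob <size in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes that `echo "foobar" > foo.txt` writes, newline included.
    print(git_blob_hash(b"foobar\n"))
```

This should print the same hash that `git hash-object foo.txt` reports on the slide. The trailing newline written by `echo` is part of the content, which is why the size in the header is 7 rather than 6.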

Blobs to trees

❖ A tree is an object that stores

    a) blobs

    b) subtrees

❖ Each of these entries contains metadata about its mode, type and name

❖ A tree object can contain objects of type “blob” or “tree”.

❖ Example modes: 100755 means it’s an executable file, 120000 specifies a symbolic link

Trees are analogous to directories on a file system. Let us build upon the notion of blobs and see how they come together to form trees.
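To make the tree idea concrete, here is a rough Python sketch of how a tree object can be hashed from its entries. The serialization used here (each entry stored as "mode name", a NUL byte, then the raw 20-byte object id) is not shown on the slides; it is how Git stores trees as far as I know. The helper names are invented for this example, and entry sorting is simplified.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    # Every object is hashed as "<type> <size>", a NUL byte, then the body.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries) -> str:
    """Hash a tree given (mode, name, object_id_hex) entries.

    Simplification: entries are sorted by plain name. Git compares
    directory names as if they ended in "/", which this sketch ignores.
    """
    body = b""
    for mode, name, object_id in sorted(entries, key=lambda entry: entry[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(object_id)  # raw 20 bytes, not hex text
    return git_object_hash("tree", body)


# Hypothetical directory layout, used only for illustration.
foo = git_object_hash("blob", b"foobar\n")
script = git_object_hash("blob", b"#!/bin/sh\necho hi\n")
docs = tree_hash([("100644", "foo.txt", foo)])
root = tree_hash([
    ("100644", "foo.txt", foo),    # regular file
    ("100755", "run.sh", script),  # executable file
    ("40000", "docs", docs),       # subtree, i.e. a directory
])
print(root)
```

Because a tree refers to its children by hash, changing any blob changes the hash of every tree above it, while untouched subtrees keep their ids.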

Git Internals: Tree

[Figure: a tree object referencing blobs.]

Commit from trees

❖ A commit is a pointer to a tree

❖ It points to zero or more parent commits

❖ It also contains metadata about its:

    1) Author

    2) Committer

Example description of a commit object:
tree 9acd01e7390a64900bde0b9749f462c53ccb3c65
parent 770479ca34ffd3450d406228f32aa1cb1d8564a0
author Joan Doe <[email protected]> 1421112508 -0800
committer John Doe <[email protected]> 1421112508 -0800

The author is the person who originally authored the commit. Anyone who patches the commit after creation is a ‘committer’.
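A commit is itself a hashed object, so its id depends on its tree, its parents and its metadata. The sketch below assumes the commit body is the text shown above followed by a blank line and the message. The e-mail addresses and the commit message are placeholders invented for this example, and `commit_hash` is not a Git API.

```python
import hashlib


def commit_hash(tree, parents, author, committer, message):
    """Hash a commit object built from its tree, parents and metadata."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]  # zero, one or many
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


TREE = "9acd01e7390a64900bde0b9749f462c53ccb3c65"
PARENT = "770479ca34ffd3450d406228f32aa1cb1d8564a0"
AUTHOR = "Joan Doe <joan@example.com> 1421112508 -0800"     # placeholder
COMMITTER = "John Doe <john@example.com> 1421112508 -0800"  # placeholder

with_parent = commit_hash(TREE, [PARENT], AUTHOR, COMMITTER, "Example")
without_parent = commit_hash(TREE, [], AUTHOR, COMMITTER, "Example")
print(with_parent != without_parent)  # True
```

The same tree committed on top of a different parent gets a different id, which is worth keeping in mind when history is rewritten later in the talk.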

Git Internals: Commit

[Figure: a commit and its parent commit. Labels: tree, tree’, blob, blob’.]

Commits to trees

[Figure: a parent commit and a commit, each pointing to a tree. Labels: tree, tree’, blob, blob’.]

Reuse of objects

[Figure: a parent commit and a commit whose trees share unchanged objects. Labels: tree, tree’, blob, blob’. Caption: reusing a blob/tree from elsewhere, or under-the-hood object sharing.]

Since only blob was changed to blob’ in this commit, other git objects (trees and blobs) can be reused.

Reuse of objects within a tree

[Figure: a tree whose blobs have the contents “A”, “B”, “C” and “A”; the second “A” blob is grayed out.]

Blobs can be shared within a single tree.

The contents of the blob that is grayed out are identical to those of another blob. These two will therefore share a common underlying object.
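Object sharing falls out of content hashing: an object stored under its hash can only exist once. The following toy object store is a sketch under that assumption. `ObjectStore` and the file names are invented for this example, and only flat directories are handled.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content is stored once."""

    def __init__(self):
        self.objects = {}  # object id -> (type, body)

    def put(self, obj_type, body):
        header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
        object_id = hashlib.sha1(header + body).hexdigest()
        self.objects[object_id] = (obj_type, body)  # no-op if already stored
        return object_id

    def put_tree(self, files):
        """Store a flat directory given as {name: content bytes}."""
        body = b""
        for name in sorted(files):
            blob_id = self.put("blob", files[name])
            body += b"100644 " + name.encode("utf-8") + b"\0"
            body += bytes.fromhex(blob_id)
        return self.put("tree", body)


store = ObjectStore()
first = store.put_tree(
    {"a.txt": b"A\n", "b.txt": b"B\n", "copy_of_a.txt": b"A\n"}
)
count_after_first = len(store.objects)   # 2 blobs + 1 tree

second = store.put_tree(
    {"a.txt": b"A\n", "b.txt": b"B changed\n", "copy_of_a.txt": b"A\n"}
)
count_after_second = len(store.objects)  # adds 1 blob + 1 tree

print(count_after_first, count_after_second)  # 3 5
```

The two files with content "A" share one blob inside a single tree, and the second snapshot only adds the changed blob and a new tree.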

Multiple parents

[Figure: a commit C with two parents, P1 and P2.]

Git fundamentally forms a directed acyclic graph.

Multiple parents

[Figure: commits P1, P2 and C, with trees T1, T2 and T3 and blobs B1, B2 and B3.]

Commits with multiple parents have a one-to-one relationship with trees, similar to commits with single parents.

Gain familiarity with the idea of a commit having two parents.
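As a small illustration of the graph view, the history around a merge commit can be written as a mapping from each commit to its parents. This is only a sketch: the root commit "A" is an assumption added so that P1 and P2 have a shared starting point.

```python
# Each commit maps to the list of its parents (names are illustrative).
PARENTS = {
    "A": [],             # assumed root commit, not shown on the slide
    "P1": ["A"],
    "P2": ["A"],
    "C": ["P1", "P2"],   # merge commit with two parents
}


def ancestors(commit, parents=PARENTS):
    """Return every commit reachable from `commit` by following parents."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("C")))  # ['A', 'P1', 'P2']
```

Edges only ever point from a commit to older commits, so no commit can be its own ancestor; that is what makes the graph acyclic.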

Branch - pointer to a commit

[Figure: the master branch label pointing to a commit.]

git branch

The branch pointer moves along with HEAD as you make additional commits. The git branch command shows all the local branches.

HEAD - pointer to the current commit

git checkout C

[Figure: before and after git checkout C. Labels: HEAD, master, C.]

The checkout command allows you to specify any reference to a commit, such as a commit SHA, a branch name, or even a relative reference such as HEAD~1.
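The pointer behaviour of branches and HEAD can be mimicked with a few dictionary updates. This is a toy model rather than how Git stores refs; the `Repo` class, its method names and the commit names A to D are all invented for this example.

```python
class Repo:
    """Toy model of Git refs; class and method names are illustrative."""

    def __init__(self):
        self.parent = {}                  # commit id -> parent commit id
        self.branches = {"master": None}  # branch name -> commit id
        self.current_branch = "master"    # None while HEAD is detached
        self.detached_at = None           # commit id while HEAD is detached

    def head(self):
        """Return the commit that HEAD currently resolves to."""
        if self.current_branch is not None:
            return self.branches[self.current_branch]
        return self.detached_at

    def commit(self, commit_id):
        self.parent[commit_id] = self.head()
        if self.current_branch is not None:
            # The checked-out branch moves forward together with HEAD.
            self.branches[self.current_branch] = commit_id
        else:
            self.detached_at = commit_id

    def checkout(self, target):
        if target in self.branches:
            self.current_branch, self.detached_at = target, None
        elif target in self.parent:
            # A raw commit id: HEAD points at it directly (detached HEAD).
            self.current_branch, self.detached_at = None, target
        else:
            raise KeyError(f"unknown ref: {target}")


repo = Repo()
for commit_id in ["A", "B", "C", "D"]:
    repo.commit(commit_id)
repo.checkout("C")
print(repo.head(), repo.branches["master"])  # C D
```

Checking out a commit moves only HEAD; master stays on D until HEAD is attached to it again and a new commit is made.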

All your codebase are belong to me

❖ git clone

❖ git log

[Figure: the Server/Remote, You and a Peer each hold Versions 1 to 3.]

Download a repo to your local machine using `git clone`.

Use `git branch -a` to see both local and remote branches. When a branch is checked out for the first time, a local copy of the branch is created.

There is nothing special about the repo hosted on the server from the perspective of Git. In fact, you could set up a remote that is another Git repo on your local machine and pull from or push to it just like you would here.

Our first commit

❖ echo "May the 4th" >> force.txt

❖ git status

❖ git add force.txt

❖ git diff --cached

❖ git commit -m "May the force be with you"

After creating a new file, we need to add it to the Git index before we can view the diff. Use git diff --cached to see the differences between HEAD and the staging area. Use git diff to see the differences between the staging area and the unstaged changes in the working directory.

[Figure: your repo (You) and the Remote, each with a chain of commits C1, C2, C3; one of the chains also has a new commit C4. Labels: master, remotes/master.]

git branch -a will show all the local and the remote branches. master is tracking remotes/master. master is a branch and therefore, as we make a new commit on this branch, the pointer moves forward. A tag is a pointer to a commit that cannot be moved, whereas a branch can.

git push

[Figure: after git push, both You and the Remote have commits C1 to C4. Labels: master, origin/master.]

You may ask: what if I made a mistake?

What if I made a mistake?

Undo unstaged changes

[Figure: three areas labeled Committed, Staging Area and Unstaged changes, with force.txt under Unstaged changes.]

echo "new" >> force.txt

git checkout -- force.txt

Unstage changes

[Figure: two diagrams, each with the areas Committed, Staging Area and Unstaged changes. Left: force.txt in the Staging Area. Right: force.txt under Unstaged changes.]

git add force.txt

git reset HEAD force.txt

git add actually adds the changes to the index. The add command should be interpreted as “add any new updates” rather than “add new file”. force.txt is already being tracked in the Git index; `git add` stages the new addition to the file, namely the word “new”.

Note: As mentioned previously, you can use `git diff --cached` to see the differences between HEAD and the staging area. It will output ‘+new’ for the diagram on the left and nothing for the diagram on the right. Use `git diff` to see the differences between the unstaged and the staged (or committed, if nothing is staged) versions of the file. It will output ‘+new’ for the diagram on the right and nothing for the diagram on the left.
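The behaviour described in these notes can be checked against a toy model of the three areas. This is a sketch, not Git's implementation: `ThreeAreas` and its methods are hypothetical names, files are plain lists of lines, and the diff helper only understands appended lines.

```python
class ThreeAreas:
    """Toy model: each area maps a file name to its list of lines."""

    def __init__(self, committed):
        self.head = {name: list(lines) for name, lines in committed.items()}
        self.index = {name: list(lines) for name, lines in committed.items()}
        self.worktree = {name: list(lines) for name, lines in committed.items()}

    def append(self, name, line):       # echo "<line>" >> <name>
        self.worktree[name] = self.worktree.get(name, []) + [line]

    def add(self, name):                # git add <name>
        self.index[name] = list(self.worktree[name])

    def reset_head(self, name):         # git reset HEAD <name>
        self.index[name] = list(self.head[name])

    def checkout_file(self, name):      # git checkout -- <name>
        self.worktree[name] = list(self.index[name])

    @staticmethod
    def _added(old, new, name):
        # Only handles appended lines, which is all this example needs.
        return ["+" + line for line in new[name][len(old[name]):]]

    def diff(self, name):               # git diff: index vs working tree
        return self._added(self.index, self.worktree, name)

    def diff_cached(self, name):        # git diff --cached: HEAD vs index
        return self._added(self.head, self.index, name)


areas = ThreeAreas({"force.txt": ["May the 4th"]})
areas.append("force.txt", "new")
areas.add("force.txt")
print(areas.diff_cached("force.txt"), areas.diff("force.txt"))  # ['+new'] []
areas.reset_head("force.txt")
print(areas.diff_cached("force.txt"), areas.diff("force.txt"))  # [] ['+new']
```

After `add`, the change shows up only in the cached diff; after `reset_head`, it shows up only in the plain diff, matching the left and right diagrams.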

Uncommit changes

[Figure: two diagrams, each with the areas Committed, Staging Area and Unstaged changes, showing force.txt moving between Committed and the Staging Area.]

git commit -m "Second commit"

git reset --soft HEAD^

Note: git reset --soft HEAD^ will not change your local working directory. It will merely move the changes from a committed state to a staged state. In contrast, git reset --hard HEAD^ will completely blow away all changes between your current HEAD and the reference you specify.

As we saw, there are a number of checkpoints in your Git workflow. If you use them wisely, you will never have to wonder what the last “working” state of your codebase was before you made some breaking changes.
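The difference between the two reset modes comes down to which areas get overwritten. The sketch below models a repository with a single file and two commits; the state layout and the function names are assumptions made for illustration only.

```python
import copy


def make_state():
    """Two commits; each snapshot is the full content of force.txt."""
    return {
        "commits": ["May the 4th\n", "May the 4th\nnew\n"],  # oldest first
        "head": 1,                         # position of the current commit
        "index": "May the 4th\nnew\n",     # staging area
        "worktree": "May the 4th\nnew\n",  # working directory
    }


def reset_soft(state):
    """Model of git reset --soft HEAD^: move HEAD and nothing else."""
    new_state = copy.deepcopy(state)
    new_state["head"] -= 1
    return new_state


def reset_hard(state):
    """Model of git reset --hard HEAD^: also overwrite index and worktree."""
    new_state = reset_soft(state)
    snapshot = new_state["commits"][new_state["head"]]
    new_state["index"] = snapshot
    new_state["worktree"] = snapshot
    return new_state


def has_staged_changes(state):
    """True when the index differs from the commit HEAD points to."""
    return state["index"] != state["commits"][state["head"]]


soft = reset_soft(make_state())
hard = reset_hard(make_state())
print(has_staged_changes(soft), has_staged_changes(hard))  # True False
```

After the soft reset the second commit's change is still present, now as a staged change; after the hard reset the working directory is back to the first commit's content.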

Typical workflow

Typically, if your team has more than one person, you wouldn’t commit to master directly. Recommended workflow:

1) Check out a private branch

2) Commit to the branch, and regularly push to remote.

3) When the work is complete, get a code review (likely via a pull request) and merge the branch into master

Also, regularly rebase over master, assuming you are working in a private branch.

Step 1: Create a new branch

git branch bugFix

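[Figure: before and after git branch bugFix. Before, HEAD points to master. After, master and bugFix point to the same commit, and HEAD still points to master.]

Check out said branch

git checkout bugFix

[Figure: before and after git checkout bugFix. After the checkout, HEAD points to bugFix, which is now the current branch.]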

Now your pointer is at bugFix. These two commands can be combined into one: git checkout -b bugFix. It is helpful to decompose a command when first learning git as it gives you a glimpse into the atomic actions being performed by git.

Step 2: Feature development

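[Figure: left (Local): bugFix with commits B and C on top of A, with HEAD on bugFix and master at A. Right (Remote): master has moved forward to D, while bugFix holds B and C on top of A.]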

If you want to experiment with an alternate codeline, you can easily do this in a new branch off of master:
git checkout master
git checkout -b newDirection

Let us assume that while you’ve been working on bugFix, someone else has committed their changes to the master branch, causing it to move forward. The common ancestor of bugFix and master is no longer the tip of master (diagram on the right).

Step 3: Merge into master

[Figure: left (Remote): bugFix (B, C) and master (D) have diverged from A. Right (Remote after merge): a new merge commit E joins C and D.]

gitk - show git graph

As we mentioned in the introduction, within the Git model it is the responsibility of the developer to merge their changes into the mainline, so it would be remiss not to mention merge conflicts. If there are no conflicts, you will be able to merge in your changes via a pull request, as shown in the right diagram. However, it is recommended that you rebase on top of master, especially if there are merge conflicts. In that case, you will need to resolve the conflicts and then run ‘git rebase --continue’. We will explore the graphical underpinnings of rebase in a couple of slides.

Can we do better?

[Figure: bugFix (B, C) and master (D) have diverged from A.]

We would like to modify the commit history to make it appear as if bugFix was based on commit D all along!

Rebase to the rescue

❖ Rebase allows you to replay a series of commits on top of a new base commit.

❖ Helps keep the commit history clean

Your changes were based off of commit A. Commit D was introduced in parallel. Rebase allows you to modify commit history to make it appear as if you were working on top of D all along!

Rebase in action

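git rebase master bugFix

[Figure: left: bugFix (B, C) and master (D) have diverged from A. Right: after the rebase, bugFix consists of B* and C* on top of D; the original B and C are no longer part of bugFix.]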

Note that commits B and C have been supplanted by B* and C* in the right diagram. If bugFix were a shared branch, you would not want to rebase it on top of master, since anyone who was working off of B or C would have the rug pulled out from under them. It is possible to recover from this by cherry-picking any changes made on top of B/C onto B*/C*. However, it is best to avoid such situations altogether.
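Why a rebase produces new commits (B*, C*) rather than moving the old ones can be seen in a small model where a commit id is derived from its parent and its change. This sketch handles linear history only, and `make_commit`, `history` and `rebase` are names invented for this example, not Git commands.

```python
import hashlib


def make_commit(store, parent, change):
    """Store a commit and return its id, derived from parent and change."""
    commit_id = hashlib.sha1(f"{parent}|{change}".encode("utf-8")).hexdigest()
    store[commit_id] = {"parent": parent, "change": change}
    return commit_id


def history(store, tip):
    """List commit ids from `tip` back to the root (single parents only)."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = store[tip]["parent"]
    return chain


def rebase(store, branch_tip, new_base):
    """Replay the commits unique to `branch_tip` on top of `new_base`."""
    base_history = set(history(store, new_base))
    to_replay = [c for c in history(store, branch_tip) if c not in base_history]
    tip = new_base
    for commit_id in reversed(to_replay):  # oldest first
        tip = make_commit(store, tip, store[commit_id]["change"])
    return tip


store = {}
a = make_commit(store, None, "A")
b = make_commit(store, a, "B")
c = make_commit(store, b, "C")  # bugFix points here
d = make_commit(store, a, "D")  # master points here

new_tip = rebase(store, c, d)
changes = [store[commit_id]["change"] for commit_id in history(store, new_tip)]
print(changes)  # ['C', 'B', 'D', 'A']
```

The replayed commits carry the same changes but have different ids, because their parents differ. The original B and C still exist in the store, which is why work based on them ends up stranded when a shared branch is rebased.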

Merge bugFix with master

[Figure: the rebased bugFix (B*, C*) on top of D, before and after it is merged into master with a merge commit E.]

Merging the rebased branch bugFix into master. This merge is typically triggered in the code management tool (GitHub, Stash, etc.) after a pull request is approved.

Note: the merge from a feature branch to the mainline (master) is usually done with an explicit “--no-ff” flag, which creates a merge commit even when a fast-forward is possible. The diagram on the right shows how this policy helps keep a one-to-one correspondence between commits on the mainline and features.
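The effect of the flag can be sketched with the same parent-mapping view of history. This is a toy model: `merge` and `first_parent_history` are hypothetical helpers, and the merge commit is named from its parents purely for readability.

```python
def is_ancestor(parents, maybe_ancestor, commit):
    """True if `maybe_ancestor` is reachable from `commit` (or equal to it)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge(parents, branches, target, source, no_ff=False):
    """Merge `source` into `target`; return the new tip of `target`."""
    ours, theirs = branches[target], branches[source]
    if is_ancestor(parents, theirs, ours):
        return ours                             # already up to date
    if is_ancestor(parents, ours, theirs) and not no_ff:
        branches[target] = theirs               # fast-forward: move the pointer
    else:
        merge_id = f"merge({ours},{theirs})"
        parents[merge_id] = [ours, theirs]      # merge commit with two parents
        branches[target] = merge_id
    return branches[target]


def first_parent_history(parents, tip):
    """Follow only the first parent of each commit, mainline style."""
    chain = [tip]
    while parents[chain[-1]]:
        chain.append(parents[chain[-1]][0])
    return chain


PARENTS = {"A": [], "D": ["A"], "B*": ["D"], "C*": ["B*"]}
BRANCHES = {"master": "D", "bugFix": "C*"}

ff_parents, ff_branches = dict(PARENTS), dict(BRANCHES)
ff_tip = merge(ff_parents, ff_branches, "master", "bugFix")

noff_parents, noff_branches = dict(PARENTS), dict(BRANCHES)
noff_tip = merge(noff_parents, noff_branches, "master", "bugFix", no_ff=True)

print(first_parent_history(ff_parents, ff_tip))      # ['C*', 'B*', 'D', 'A']
print(first_parent_history(noff_parents, noff_tip))  # ['merge(D,C*)', 'D', 'A']
```

With a fast-forward, the individual bugFix commits become part of the mainline. With no_ff, the mainline gains a single merge commit for the whole feature.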

Merge conflicts

❖ Situation: Conflicting modifications to a file that has changed since we checked it out

❖ Two options: merge, rebase

❖ On a private branch, it is recommended that you rebase.

❖ On a shared branch, merge is the way to go.

Let us take a moment to appreciate that a merge conflict cannot be automated away. There is no way for the source control system to know our intention.
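The decision behind a three-way merge can be sketched in a few lines. Real Git merges line by line; this simplified version treats the whole file as one unit so that the point at which a human has to step in stays visible. The function name and the sample contents are illustrative.

```python
def merge_file(base, ours, theirs):
    """Three-way merge of whole-file contents.

    Returns (merged_content, conflict).
    """
    if ours == theirs:
        return ours, False    # both sides agree
    if ours == base:
        return theirs, False  # only their side changed
    if theirs == base:
        return ours, False    # only our side changed
    # Both sides changed the same content differently: a human must decide.
    marked = f"<<<<<<< ours\n{ours}\n=======\n{theirs}\n>>>>>>> theirs\n"
    return marked, True


clean, clean_conflict = merge_file("foo bar", "foo bar bas", "foo bar")
print(clean, clean_conflict)  # foo bar bas False

_, conflict = merge_file("foo bar", "foo bar bas", "food bazaar")
print(conflict)  # True
```

When only one side has changed, the tool can pick that side safely. When both sides have changed the same content, nothing in the three versions says which intention should win.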

Changing the commit history

❖ “git commit --amend” rewrites your last commit with the current changes instead of creating a new commit

❖ Interactive rebase: git rebase -i

    ❖ Swiss army knife of modifying history

    ❖ Allows you to amend, squash, split, or skip commits as they're applied

Many roads, one destination

❖ There are often multiple ways to accomplish a task in Git, for example:

git branch <branchName> followed by git checkout <branchName>
    or: git checkout -b <branchName>

git checkout -b <branchName> <remoteName>/<remoteBranch>
    or: git branch --track <branchName> <remoteName>/<remoteBranch>

git fetch followed by git merge
    or: git pull

Git has lots of facades: actions that can be executed using one flag (or a combination of flags) in some command may also be pulled out into their own command. If you get into a bind, there is most probably a way to recover from the situation. Do not hesitate to seek help, for example on the git-users mailing list.

Give It a Try

Explore the topics discussed so far by creating a new Git repository. Let us assume it has one file, foo.txt, with the contents “foo bar”. Person A changes it to “foo bar bas” in the user/personA branch and creates a pull request to merge this change in. Meanwhile, person B changes the contents of foo.txt to “food bazaar”. This commit gets merged into master first. For the purposes of this exercise, person B can commit directly to master. (Keep in mind that in a real-life scenario, the conflicting change would typically be introduced by person B’s pull request getting merged into master before person A’s.) Person A’s pull request now has merge conflicts, which will need to be resolved using rebase.

Git Resources

❖ Learn by playing: http://pcottle.github.io/learnGitBranching/

❖ Atlassian tutorial: https://www.atlassian.com/git/tutorials/setting-up-a-repository/

❖ Free CodeSchool course on Git: https://www.codeschool.com/courses/git-real

❖ StackOverflow is a great resource: http://stackoverflow.com/questions/2706797/finding-what-branch-a-commit-came-from

❖ Pro Git by Scott Chacon and Ben Straub: http://git-scm.com/book/en/v2

Closing thoughts

❖ Git is a powerful source control tool designed to maximize the efficiency of the developer. Take full advantage of it!

❖ We’ve only explored the tip of the iceberg. May the power of Git be with you.