How to store large binary files in git repositories

Storing large binary files in Git repositories seems to be a bottleneck for many Git users.

Because of its decentralized nature, changes in large binary files can cause Git repositories to grow by the size of the file after every commit.
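
As a rough illustration of that growth, here is a back-of-the-envelope estimate in Python. It assumes the worst case, where every committed revision of the binary is stored in full because compression and delta encoding save little. The function name and the numbers are made up for the example.

```python
def estimated_history_size(file_size_mb, revisions):
    """Worst-case space taken by all revisions of one binary file, in MB.

    Assumption: each revision adds the full file size to the repository.
    """
    return file_size_mb * revisions


# A 50 MB asset committed 20 times could add up to about 1000 MB of history,
# and every clone carries that history with it.
print(estimated_history_size(50, 20))
```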

Luckily, there are multiple third-party workarounds that try to solve the problem.

Here are seven alternative approaches for handling large binary files in Git repositories.

1. Git Annex

Git-annex works by storing the contents of the files it tracks in a separate location. What is stored in the repository is a symlink to the key under that separate location.
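
The idea of replacing a file with a symlink to a key can be sketched in a few lines of Python. This is a simplified illustration, not git-annex's actual implementation: the `annex_file` function and the store directory are hypothetical, and the key format is only loosely modelled on checksum-based keys.

```python
import hashlib
import os
from pathlib import Path


def annex_file(path: Path, store: Path) -> Path:
    """Move a file's content into `store` and leave a symlink in its place."""
    data = path.read_bytes()
    # Hypothetical key: derived from the content's size and SHA-256 checksum.
    key = f"SHA256-s{len(data)}--{hashlib.sha256(data).hexdigest()}"
    store.mkdir(parents=True, exist_ok=True)
    target = store / key
    if target.exists():
        path.unlink()          # identical content is already in the store
    else:
        path.replace(target)   # move the content into the store
    # The working tree (and the repository) now only holds a small symlink.
    path.symlink_to(os.path.relpath(target, path.parent))
    return target
```

Because the key is derived from the content, two files with identical content end up pointing at the same stored object.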

To share the large binary files within a team, the tracked files need to be stored in a different backend.

Pros

• Supports multiple remotes in which you can store the binaries.

• Can be used without support from the hosting provider.

Cons

• Users need to learn separate commands for day-to-day work.

2. Git Large File Storage (Git LFS)

In Git LFS, instead of writing large blobs to a Git repository, only a pointer file is written. The blobs are written to a separate server using the Git LFS HTTP API. The API endpoint can be configured based on the remote, which allows multiple Git LFS servers to be used.
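
To give a feel for what such a pointer file contains, here is a minimal Python sketch that builds one from a blob's content. It follows the three-line layout of the Git LFS pointer format (spec version, SHA-256 object id, size in bytes). The `lfs_pointer` helper is a name made up for this example and is not part of Git LFS.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS style pointer for the given blob content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# The pointer stays a few lines long no matter how large the blob is.
print(lfs_pointer(b"\x00" * 10_000_000))
```

Only this small text file ends up in the Git history; the blob itself is uploaded to the LFS server.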

Git LFS requires a specific server implementation to communicate with. It uses Git filters, meaning that you only need to specify the tracked files with one command.

Pros

• Backed by GitHub.

• Ready-made binaries available for multiple operating systems.

• Easy to use.

• Transparent usage.

Cons

• Requires a custom server implementation to work.

• API not stable yet.

• Performance penalty.

3. Git-bigfiles

Git-bigfiles aims to make life bearable for people using Git on projects with very large files, while merging back as many changes as possible into upstream Git.

Git-bigfiles is a fork of Git; however, the project seems to have been untouched for some time.

Pros

• If the changes were to be backported, they would be supported by native Git operations.

Cons

• The project is dead.

• A fork of Git might cause compatibility issues.

• Only allows configuring a file size threshold when tracking a large file.

4. Git-fat

Git-fat works in a similar manner to Git LFS. Large files can be tracked using filters in the `.gitattributes` file. Large files are stored to any remote that can be connected to through rsync.
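
The filter approach can be sketched as a pair of Python functions: a "clean" step that swaps the large content for a small stub when a file is staged, and a "smudge" step that restores the content on checkout. This is only an illustration of the general technique. The stub format, the function names and the local store directory are assumptions, not git-fat's actual format or code.

```python
import hashlib
from pathlib import Path

STUB_PREFIX = b"#stub "  # hypothetical marker, not git-fat's real stub format


def clean(data: bytes, store: Path) -> bytes:
    """Run when staging: keep the content in `store`, hand Git a small stub."""
    digest = hashlib.sha256(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    (store / digest).write_bytes(data)
    return STUB_PREFIX + digest.encode("ascii") + b"\n"


def smudge(stub: bytes, store: Path) -> bytes:
    """Run on checkout: turn a stub back into the real content if we have it."""
    if not stub.startswith(STUB_PREFIX):
        return stub  # not a stub, pass the content through unchanged
    digest = stub[len(STUB_PREFIX):].strip().decode("ascii")
    obj = store / digest
    # If the object has not been fetched yet, leave the stub in place.
    return obj.read_bytes() if obj.exists() else stub
```

In a real setup the store directory would be synchronized with the remote, for example over rsync, and the two steps would be wired to file patterns through `.gitattributes`.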

Pros

• Transparent usage.

Cons

• Supports only rsync as a backend.

5. Git-media

Git-media is probably the oldest of the solutions available. It also uses a filter approach, and supports Amazon's S3, a local filesystem path, SCP, Atmos and WebDAV as the backend for storing large files.

Pros

• Supports multiple backends.

• Transparent usage.

Cons

• No longer developed.

• Ambiguous commands (e.g. `git update-index --really-refresh`).

• Not fully Windows compatible.

6. Git-bigstore

Git-bigstore was initially implemented as an alternative to git-media. It also works by storing a filter property in `.gitattributes` for certain file types.

Git-bigstore supports Amazon S3, Google Cloud Storage, or a Rackspace Cloud account as backends for storing binary files. Git-bigstore claims to improve stability when multiple people collaborate.

Pros

• Requires only Python 2.7+.

• Transparent usage.

Cons

• Only cloud-based storage is supported at the moment.

7. Git-sym

Git-sym is the newest player in the field, offering an alternative to how large files are stored and linked in git-lfs, git-annex, git-fat and git-media. Instead of calculating the checksums of the tracked large files, git-sym relies on URIs.

The benefits of git-sym are performance and the ability to symlink whole directories. Because it relies on URIs instead of checksums, its main downside is that it does not guarantee data integrity.

Git-sym is operated through separate commands. It also requires Ruby, which makes it more tedious to install on Windows.

Pros

• Better performance than solutions based on filters.

• Support for multiple backends.

Cons

• Does not guarantee data integrity.

• Complex commands.

How have you solved the problem of storing large files in Git repositories?