What is Git?


Opensource.com

Welcome to my series on learning how to use the Git version control system! In this introduction to the series, you will learn what Git is for and who should use it.

All the excitement and hype over Git tends to make things a little muddy. Can you only use Git to share your code with others, or can you use Git in the privacy of your own home or business? Do you have to have a GitHub account to use Git? Why use Git at all? What are the benefits of Git? Is Git the only option?

So forget what you know or what you think you know about Git, and let's take it from the beginning.

What is version control?

Git is, first and foremost, a version control system (VCS). There are many version control systems out there: CVS, SVN, Mercurial, Fossil, and, of course, Git.

Git serves as the foundation for many services, like GitHub and GitLab, but you can use Git without using any other service. This means that you can use Git privately or publicly.

If you have ever collaborated on anything digital with anyone, then you know how it goes. It starts out simple: you have your version, and you send it to your partner. They make some changes, so now there are two versions, and send the suggestions back to you. You integrate their changes into your version, and now there is one version again.

Then it gets worse: while you change your version further, your partner makes more changes to their version. Now you have three versions: the merged copy that you both worked on, the version you changed, and the version your partner changed.

As Jason van Gumster points out in his article, "Even artists need version control," this syndrome tends to happen in individual settings as well. In both art and science, it's not uncommon to develop a trial version of something: a version of your project that might make it a lot better, or that might fail miserably. So you create file names like project_justTesting.kdenlive and project_betterVersion.kdenlive, and then project_best_FINAL.kdenlive, but with the inevitable allowance for project_FINAL-alternateVersion.kdenlive, and so on.

Whether it's a change to a for loop or an editing change, it happens to the best of us. That is where a good version control system makes life easier.

Git snapshots

Git takes snapshots of a project, and stores those snapshots as unique versions.

If you go off in a direction with your project that you decide was the wrong direction, you can just roll back to the last good version and continue along an alternate path.
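Here is a minimal sketch of what rolling back can look like on the command line. The temporary directory, the file name story.txt, and the user name and email are all made up for this example, and restoring a file from the previous commit is only one of several ways to get back to a good version.

```shell
# Work in a throwaway directory so nothing real is touched
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Snapshot 1: the last good version
echo "first draft" > story.txt
git add story.txt
git commit -q -m "Good version"

# Snapshot 2: a direction that turns out to be wrong
echo "a direction I regret" >> story.txt
git commit -q -a -m "Experiment"

# Bring story.txt back as it was one commit ago,
# and record that as a new snapshot
git checkout -q HEAD~1 -- story.txt
git commit -q -m "Go back to the good version"

cat story.txt
git log --oneline
```

The history still contains the experiment, so nothing is lost; the newest snapshot simply matches the good version again.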

If you're collaborating, then when someone sends you changes, you can merge those changes into your working branch, and then your collaborator can grab the merged version of the project and continue working from the new current version.

Git isn't magic, so conflicts do occur ("You changed the last line of the book, but I deleted that line entirely; how do we resolve that?"), but on the whole, Git enables you to manage the many potential variants of a single work, retaining the history of all the changes, and even allows for parallel versions.
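As a rough illustration of parallel versions, the sketch below creates a branch, makes a change there, and merges it back. The names book.txt and alternate-ending, and the identity used for the commits, are invented for the example. No conflict occurs here because only one side changed the file.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

printf 'chapter one\n' > book.txt
git add book.txt
git commit -q -m "Start the book"

# Remember the name of the default branch, whatever it is called
trunk=$(git symbolic-ref --short HEAD)

# A parallel version: work on a branch without touching the original
git checkout -q -b alternate-ending
printf 'chapter two\n' >> book.txt
git commit -q -a -m "Add chapter two"

# Back on the original branch, merge the parallel work in
git checkout -q "$trunk"
git merge -q alternate-ending

cat book.txt
```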

Git distributes

Working on a project on separate machines is complex, because you want to have the latest version of a project while you work, make your own changes, and share your changes with your collaborators. The default method of doing this tends to be clunky online file sharing services, or old school email attachments, both of which are inefficient and error-prone.

Git is designed for distributed development. If you're involved with a project, you can clone the project's Git repository, and then work on it as if it were the only copy in existence. Then, with a few simple commands, you can pull in any changes from other contributors, and you can also push your changes over to someone else. Now there is no confusion about who has what version of a project, or whose changes exist where. It is all locally developed, and pushed and pulled toward a common target (or not, depending on how the project chooses to develop).
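The clone, push, and pull cycle described above can be tried entirely on one machine. In this sketch, a bare repository in a temporary directory stands in for the common target, and two directories named alice and bob stand in for two contributors. All of those names, and the identities, are assumptions made for the demonstration; a real project would usually clone from a server address instead.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository plays the part of the common target
git init -q --bare shared.git

# Alice clones it, adds a file, and pushes
git clone -q shared.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "Alice's notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin HEAD
cd ..

# Bob clones the same target, changes the file, and pushes
git clone -q shared.git bob
cd bob
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo "Bob's addition" >> notes.txt
git commit -q -a -m "Add to notes"
git push -q origin HEAD
cd ..

# Alice pulls and now has Bob's change as well
cd alice
git pull -q --ff-only
cat notes.txt
```

Each contributor works in a complete local copy; only the push and pull steps talk to the shared repository.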

Git interfaces

In its natural state, Git is an application that runs in the Linux terminal. However, as it is well-designed and open source, developers all over the world have designed other ways to access it.

It is free, available to anyone for $0, and comes in packages on Linux, BSD, Illumos, and other Unix-like operating systems. It looks like this:

$ git --version
git version 2.5.3

Probably the most well-known Git interfaces are web-based: sites like GitHub, the open source GitLab, Savannah, BitBucket, and SourceForge all offer online code hosting to maximise the public and social aspect of open source along with, in varying degrees, browser-based GUIs to minimise the learning curve of using Git. This is what the GitLab interface looks like:

GitLab graphical Git interface.

Additionally, it is possible that a Git service or independent developer may even have a custom Git frontend that is not HTML-based, which is particularly handy if you don't live with a browser eternally open. The most transparent integration comes in the form of file manager support. The KDE file manager, Dolphin, can show the Git status of a directory, and even generate commits, pushes, and pulls.

Dolphin

Sparkleshare uses Git as a foundation for its own Dropbox-style file sharing interface.

Sparkleshare screenshot

For more, see the (long) page on the official Git wiki listing projects with graphical interfaces to Git.

Who should use Git?

You should! The real question is when? And what for?

When should I use Git, and what should I use it for?

To get the most out of Git, you need to think a little bit more than usual about file formats.

Git is designed to manage source code, which in most languages consists of lines of text. Of course, Git doesn't know if you're feeding it source code or the next Great American Novel, so as long as it breaks down to text, Git is a great option for managing and tracking versions.

But what is text? If you write something in an office application like LibreOffice, then you're probably not generating raw text. Complex applications like that usually wrap the raw text in XML markup and then in a zip container, as a way to ensure that all of the assets for your office file are available when you send that file to someone else. Strangely, though, files that you might expect to be very complex, like the save files for a Kdenlive project, or an SVG from Inkscape, are actually raw XML files that can easily be managed by Git.

If you use Unix, you can check to see what a file is made of with the file command:

$ file ~/path/to/my-file.blah
my-file.blah: ASCII text
$ file ~/path/to/different-file.kra
different-file.kra: Zip data (MIME type "application/x-krita")

If unsure, you can view the contents of a file with the head command:

$ head ~/path/to/my-file.blah

If you see text that is mostly readable by you, then it is probably a file made of text. If you see garbage with some familiar text characters here and there, it is probably not made of text.

Make no mistake: Git can manage other formats of files, but it treats them as blobs. The difference is that in a text file, two Git snapshots (or commits, as we call them) might be, say, three lines different from each other. If you have a photo that has been altered between two different commits, how can Git express that change? It can't, really, because photographs are not made of any kind of sensible text that can just be inserted or removed. I wish photo editing were as easy as just changing some text from "<sky>ugly greenish-blue</sky>" to "<sky>blue-with-fluffy-clouds</sky>" but it truly is not.
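To see the difference for yourself, here is a small sketch that commits one text file and one binary file, changes both, and asks Git to summarise the changes. The file names and contents are invented for the example; picture.bin is not a real image, just a few bytes containing NUL characters, which is enough for Git to treat it as binary.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# One file made of text, one made of arbitrary bytes
printf 'line one\nline two\n' > notes.txt
printf 'fake\0image\0data' > picture.bin
git add notes.txt picture.bin
git commit -q -m "First snapshot"

# Change both files
printf 'line one\nline 2\n' > notes.txt
printf 'other\0image\0data' > picture.bin

# Columns: lines added, lines removed, file name
git diff --numstat
```

For notes.txt, Git reports how many lines were added and removed. For picture.bin, it prints a dash in both columns, because it can only tell that the blob changed, not how.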

People check in blobs, like PNG icons or a spreadsheet or a flowchart, to Git all the time, so if you're working in Git then don't be afraid to do that. Know that it's not sensible to do that with huge files, though. If you are working on a project that does generate both text files and large blobs (a common scenario with video games, which have equal parts source code to graphical and audio assets), then you can do one of two things: either invent your own solution, such as pointers to a shared network drive, or use a Git add-on like Joey Hess's excellent git-annex, or the Git-Media project.

So you see, Git really is for everyone. It is a great way to manage versions of your files, it is a powerful tool, and it is not as scary as it first seems.

Seth Kenlon
Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time.

12 Comments

Hello.

Thank you for the article!

Another FLOSS (Affero GPLv3+) and free of charge Git hosting software is at https://rocketgit.com

Disclaimer: I am the author.

Bravo! thanks for the referral. I wish I'd known about this prior to writing this series of articles. Maybe I'll be able to sneak in a mention in a later revision. Thanks for providing a hosting platform that is free and open source. I've signed up and look forward to trying it out.

In reply to by Catalin(ux) M. BOIE (not verified)

Is it safe to use Git-like software for developing code for financial products? Say, if we have to build something for cashoverflow.in.

Yes, of course it's safe to use git for financial software. Any "risk" involved is in how you manage your own code, and of course, the encryption between server and clients (but that has nothing to do with git).

Obviously you should not publish private key files in your public git repository; to protect yourself against that, use a strict .gitignore policy, and force code review prior to each push.

.gitignore is quite simple; read more about it here: https://git-scm.com/docs/gitignore

Forcing a code review and sign-off is something you can set up git hooks to do; I'll be covering git hooks in a later article in this series, so check back!

In reply to by Avinash (not verified)

If you have to ask that question then you should in no way at all be developing anything to do with financial services.

In reply to by Avinash (not verified)

We all have to start somewhere. There are no stupid questions. And so on.

In reply to by imbldin (not verified)

You mentioned this was a series. When will you be posting further content to this series?

I've noticed that most people when starting out have difficulty with branching. Knowing when, why and how to branch and when to merge branches back into the master. Then finding answers to questions like what should the master branch hold, should people develop on the master branch, how many people should have merge permissions.

Git is awesome, but simple tutorials need to be written that give new users some help. I've been using Git for about two years now, and I feel like I'm still a basic Git user because everyone I've ever collaborated with has been a new user as well.

Agreed. Branching is key when you work on a project with many collaborators; but can also be useful in a "standalone" project where you want to experiment with changes without impacting on your primary (master) code.

In reply to by coryhilliard

Branching is important! but this introduction covers cloning. I cover branching later in the series (third article, according to fgrep).

In reply to by Derek (not verified)

You mention git annex and Git Media for large files. GitHub also offers git lfs -- Git Large File Storage.

I didn't know about that one. Thanks!

Git-annex and git-media are discussed in (if my memory serves) the final article in this series. I won't have time to review git-lfs but it's good to know about, and I'll try to get a mention of it in the article.

In reply to by Duane Murphy (not verified)

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.