You are currently browsing the archives for the RCS tag.

The Top 5 Things I’ve Learnt about Git

January 31st, 2014

During the last couple of years, internally we’ve moved over to using Git as our Revision Control Old_Git_Wit___RGB___small_2System (RCS). It’s been an interesting exercise, especially where, like me, you’ve come from a traditional model (such as subversion or even back to good old SCCS). I’m sure you’ve all got your own “top 5” and I don’t necessarily expect you to agree with me, but here’s my key learning points:

#1 “Branch always, branch often”
At the outset this was certainly the biggest mindset change*. In many RCSs, branching is particularly “heavyweight” and merging branches (and branches of branches, and branches… you get the gist) can be, to say the least, challenging. So you could end up being counter-productive by trying to avoid branching unless absolutely necessary.

It couldn’t be more different using Git, and once I’d “got it”, then it is an instinct to always work on a branch. I guess it all fits in with the aim to keep work cycles too much shorter timeframes, so branching and merging must be trivial. There is an excellent tutorial to help learn branching on Git.

#2 “Don’t ignore your .gitignore
You know you’re going to need it, so do it before you start any real work. It’ll just save you a whole load of tidying up later on (ok so maybe this is an age thing? don’t procrastinate!). There are even a language specific .gitignore templates at https://github.com/github/gitignore, so no excuses.

It’s also worth getting to know the basics of the pattern matching that can be used in the .gitignore file, they’re well documented.

#3 “GitHub is not Git”
Maybe it’s my onset of senility, but initially it wasn’t exactly clear where Git ended and GitHub started. Most of the early stuff I looked at to help understand and learn Git almost always used GitHub to host your project. In hindsight it’s all obvious (well of course it’s just a web-based hosting service for software development projects that use Git), but at the time I can say my understanding was a little “fuzzy” around the edges.

Two things really helped; first the excellent book Pro Git written by Scott Chacon which you can find online at http://git-scm.com/book. I found it so useful to I also bought the eBook (Kindle) version.

Second, moving to BitBucket away from GitHub. In fact we use both, GitHub for our publicly hosted projects and BitBucket for internal work. This just made it plainly obvious where Git ended and GitHub/BitBucket started.

#4 “Quickly learn how to undo”
There are many ways to mess it up, if you really try. Committing without adding, modifying files without branching (see #1), committing unwanted files (see #2), not pushing changes (see #3) and the list goes on (and I’ve done them all).

Be assured, you can always fix it, from a simple commit-amend to, probably one of the coolest things, being able to create a branch from a stash. But I’d highly recommend spending a bit of time understanding unstaging as opposed to unmodifying a file.

Using Git is very much about understanding workflow and typically messing up means you’ve not quite got your workflow model running smoothly. The nice folks at Atlassian have put together a set of guides on workflow.

#5 “Initially do all your work on the command line”
For Linux people (and hopefully most programmers on a Mac), this doesn’t need saying as why would you do it any other way? However for all those Windows-based people (and in the embedded world this is the majority) then there is a tendency to use graphical tools. Both GitHub and BitBucket offer graphical frontends to managing Git local/remote repo’s but without understanding the underlying Git model (which you only really get from using it at the command line) then it’s easy to get lost.

I didn’t find the GitHub Windows client tool especially friendly, whereas SourceTree from Alassian I found a more intuitive client. Nevertheless I’d still rather work at the shell.

So that’s my top 5; What have I missed (remember you’ve got to drop one to add one)?

What would I do differently? certainly I’d read “Pro Git” cover-to-cover much sooner than I did. Otherwise, as usual, just throw yourself in and have a go.

Now what was that someone said about Mercurial being better…

* Actually the initial challenge was getting used to say “Git” outside of its usual context in the UK.

Revision Control

May 23rd, 2011

What is Revision Control?

Revision control is the management of multiple revisions of the same unit of information. The focus is on controlling access to the artefact; and recording the history of changes.

Revision Control is variously known as version control, source control, source code management and several other titles.

Revision control has its roots in the management of engineering blueprints and paper documents. Today, any practical application of Revision Control requires the use of specialist software tools.

Artefact Identification

The core to revision control is the artefact. An artefact is a unique unit of information that may change over time (or not – some artefacts are deliberately unchanging – see below). Every revision-controlled artefact is identified by a unique name made up of:

Artefact Name

The artefact’s name represents a human-readable identifier. This is typically how the artefact is referred to during the development process. The Artefact’s name is constant – that is, it does not change.

Revision identifier

The revision identifier is used to track the history of the information. It is usually a number (e.g. 1.1); the highest number is the most recent version (revision)

image

Figure 1 – The artefact identifier reflects the name of the unit of information and its history

As the artefact is modified (revised) the revision identifier is incremented and a new entity is created (see Figure 1). Thus, artefact MyDoc v1.0 is a different entity to MyDoc v1.1 since it contains different contents. One of the principles of revision control is that this process is hidden from the user, and they only see the most recent version, unless they explicity choose to access the artefact’s history (by accessing a previous revision).

Problems with ‘file system’ Revision Control

The simplest form of revision control is to use the directory system on your computer. Each revision of a product is kept as a separate directory on the disk (Figure 2). This is a simple technique but is rarely very effective.

image

Figure 2 – Storing file revisions using a file system is rarely effective

Since few file systems have a history facility built in, each revision must result in the storage of multiple (complete) copies of each artefact.

Normally, using a file system as a repository implies using a networked file server. Most modern file systems will recognise that a file is open for edited and restrict access (see File Locking, below). However, to improve responsiveness most users will make a local copy of their ‘working files’. This can lead to a problem known as ‘Latest-write wins’. This means, if multiple developers copy the same file to modify it (usually in different ways) then each copy their modified file back to the server this can lead to the loss of any previous changes; and only the last developer to save has their revisions included. Clearly, to control access to each file requires careful management to ensure consistency. Such systems, requiring process and procedures to retain consistency are easy to abuse, particularly in the heat of the rush to delivery.

Another problem comes when artefacts are shared between systems; especially if each product will make different demands, and require different modifications, to the artefact.

Using your file system for source control is, at best, adequate for one-man projects.

Revision Control Systems

For anything more than trivial revision control (for example, with more than one developer) then a Revision Control System (RCS) is required.

A Revision Control System (also known as a Version Control System – VCS) is a piece of software that acts as an archive for artefacts. The RCS stores the latest version of the artefact and all revisions. The developer can take latest revision (often called the ‘tip’) or any named revision.

Typically, the RCS stores first version of artefact, plus changes between one revision and the next (known as ‘deltas’). Artefact version numbering is usually automatically performed by the tool.

Access control

To ensure integrity of artefact revisions access control is required. Typically this involves locking the file against changes.

Acquiring the lock is referred to as Checking Out.

Committing the change (releasing the lock) is referred to as Checking In.

There are two common methods for achieving access control: File locking and Revision Merging

File locking

In a file locking system only one developer has write access to the artefact (Figure 4). Other developers will have read-only access to the current (stored) version. The file is only available again once it is checked back in.

image

Figure 4 – File locking allows only one developer at a time to modify an artefact.

File locking avoids complex merges due to large-scale changes since only one developer has access to the file and the ‘merges’ are essentially the new changes made by the developer.

Because a file may be checked out for a significant time developers may be tempted to simply circumvent the system

Revision Merging

Most systems allow multiple check-outs of the same file (see Figure 5).

If the file is checked-out multiple times the first developer to check-in always succeeds.

image

Figure 5 – Revision merging allows multiple check-outs on an object

Subsequent check-ins must merge their changes into the current revision. For simple merges this may be performed automatically by the RCS software. More complex merges may (and typically do) require human intervention.


Revision Control System Configurations

Revision control systems are designed to allow multiple users to access and modify artefacts from any location, either locally or across a network. There are two basic configurations of RCS – centralised and distributed. Each has its own merits; and the choice of RCS depends on the type of project being developed.

Centralised database systems

In a centralised system there is one master reference copy of all artefacts (Figure 6). Clients access the artefacts by making copies of a subset of central archive. This subset is a ‘view’ on the repository known as a Workspace. The workspace acts as an additional form of access control. The client is free to modify any artefact within their workspace, but may not modify anything not in their workspace (in fact, it should not be visible to them). Workspaces allow CM Managers to restrict access of staff to only the artefacts that are relevant to them.

image

Figure 6 – In a centralised system clients view a subset of the central repository

Centralised systems are best suited to geographically-close, commercial development projects, keeping all the company’s artefacts (which comprise their intellectual property) in a central location (for ease of back-up, etc.).

Distributed database systems

In a distributed system each client’s workspace is a bona fide repository. Each client has a full copy of the archive, complete with all revisions (Figure 7). The client is free to modify the database as they see fit.

image

Figure 7 – Multiple versions of the archive exist in a distributed system

Copies of the archives are kept synchronised with periodic updates (known as patches) sent between each of the RCSs.

Distributed database systems are widely used for open source software development. They are well suited to developments that may be geographically distant and being modify according to widely differing requirements.

%d bloggers like this: