Excel Hacks: Calculating Oligo Melting Temperature with the Wallace Rule

Calculating oligonucleotide melting temperatures with the Wallace-Itakura formula, Tm = 2*(number of A's and T's) + 4*(number of G's and C's) in °C, is a piece of cake in Excel. I rarely use it myself, as I typically use nearest neighbor with SantaLucia thermodynamic parameters for oligo design, but on occasion it does come up. Assuming your oligo sequence is in A1, this formula will calculate the Wallace Tm:
=2*(LEN(A1)-LEN(SUBSTITUTE(SUBSTITUTE(A1,"A",""),"T","")))+4*(LEN(A1)-LEN(SUBSTITUTE(SUBSTITUTE(A1,"G",""),"C","")))
The formula counts the A's and T's by making a copy of the oligo sequence with the A's and T's removed and comparing its length to that of the original - the difference is the number of A's and T's. G's and C's are counted in the same manner. Any character other than A, C, G or T is ignored in the calculation. Note that SUBSTITUTE is case-sensitive, so lowercase bases are ignored as well; wrap A1 in UPPER() if your sequences may be lowercase.
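If you want to cross-check the spreadsheet result outside Excel, here is a minimal Python sketch of the same counting rule. The function name wallace_tm is my own, and like the Excel formula it is case-sensitive and silently ignores anything that is not A, C, G or T.

```python
def wallace_tm(seq):
    """Wallace rule: 2 degrees per A/T plus 4 degrees per G/C."""
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# 5 A/T bases and 5 G/C bases: 2*5 + 4*5
print(wallace_tm("ACGTACGTAC"))  # 30
```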

Excel Hacks: Reverse Complement a Nucleotide Sequence

Reverse complementing a nucleotide sequence in Excel turns out to be really ugly - in fact, this may be one case where it's better to do some cutting and pasting into a web tool or finally go pick up that Perl book that's been sitting on your shelf. However, if you’re desperate and absolutely have to do it in Excel, here's how you can reverse complement short (~30 base) nucleotide sequences.



Complementing


Let's break reverse complementing into its two steps - reversing the DNA sequence and then taking its complement. We'll start with complementing:

= SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( A1, "A", 1), "C", 2), "G", 3),"T", 4), 1, "T"), 2, "G"), 3, "C"), 4, "A")

As in previous posts, we assume the sequence to be complemented is in cell A1.

This rather lengthy formula performs a two-step substitution. First, nucleotides are converted to numbers, e.g. “A” to “1”, “C” to “2”, etc. Next, the numbers are replaced with the complementary nucleotides, so “1” goes to “T”, “2” to “G” and so on. Doing the substitution in two steps lets us distinguish bases that have already been complemented from those that have not: if we replaced “A” with “T” directly, the later “T” to “A” substitution would turn those new T's right back into A's. The formula works correctly for nucleotides “A”, “C”, “G”, “T” and “N” (an “N” is its own complement, so it is simply left alone). Other degenerate IUPAC codes could be handled, but we’ll leave this as an exercise for the reader.
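To see why the detour through numbers matters, here is a small Python sketch that mirrors the nested SUBSTITUTE calls next to a naive one-step version. Both function names are made up for illustration; this is just the same idea outside Excel, not a replacement for a proper sequence library.

```python
def complement_two_step(seq):
    """Mirror of the nested SUBSTITUTE formula: bases -> digits -> bases."""
    # Step 1: replace each base with a placeholder digit.
    for base, digit in (("A", "1"), ("C", "2"), ("G", "3"), ("T", "4")):
        seq = seq.replace(base, digit)
    # Step 2: replace each digit with the complementary base.
    for digit, base in (("1", "T"), ("2", "G"), ("3", "C"), ("4", "A")):
        seq = seq.replace(digit, base)
    return seq


def complement_naive(seq):
    """Direct A->T then T->A swap; the second step undoes the first."""
    return seq.replace("A", "T").replace("T", "A")


print(complement_two_step("AACGTN"))  # TTGCAN
print(complement_naive("AT"))         # AA (wrong, should be TA)
```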

Reversing



Surprisingly, reversing the DNA sequence turns out to be even trickier. Most programming languages have a function to reverse text, but not Excel. Breaking out a macro in Excel might be worthwhile since writing a little Visual Basic to reverse text isn't too bad. Unfortunately, this is not possible for folks with Macs using Office 2008 since macro functionality was removed.

For a general method to reverse our sequence, our options are limited and ugly. Perhaps one of the less awful methods is repeated use of the MID function. Assuming that our complemented nucleotide sequence is in cell B1, we would do something like this:

= MID(B1,30,1) & MID(B1,29,1) & MID(B1,28,1) & MID(B1,27,1) & MID(B1,26,1) & MID(B1,25,1) & MID(B1,24,1) & MID(B1,23,1) & MID(B1,22,1) & MID(B1,21,1) & MID(B1,20,1) & MID(B1,19,1) & MID(B1,18,1) & MID(B1,17,1) & MID(B1,16,1) & MID(B1,15,1) & MID(B1,14,1) & MID(B1,13,1) & MID(B1,12,1) & MID(B1,11,1) & MID(B1,10,1) & MID(B1,9,1) & MID(B1,8,1) & MID(B1,7,1) & MID(B1,6,1) & MID(B1,5,1) & MID(B1,4,1) & MID(B1,3,1) & MID(B1,2,1) & MID(B1,1,1)

The formula constructs the reverse sequence base-by-base, up to 30 bases. Shorter sequences are fine, since MID returns an empty string for positions past the end of the text. Make certain that the sequence to reverse is 30 or fewer nucleotides; otherwise only the first 30 bases are reversed and the rest are dropped without warning. You could extend the formula to handle larger sequences, but clearly this method is only practical for short sequences.
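For comparison, here is a hedged Python sketch that imitates the chain of MID calls, including the 30-base cap, so you can see the silent truncation for yourself. The name reverse_like_mid and the max_len parameter are my own inventions for this illustration.

```python
def reverse_like_mid(seq, max_len=30):
    """Imitate MID(B1,30,1) & MID(B1,29,1) & ... & MID(B1,1,1)."""
    # seq[i-1:i] behaves like MID(seq, i, 1): it yields "" past the end.
    return "".join(seq[i - 1:i] for i in range(max_len, 0, -1))


print(reverse_like_mid("ACGT"))  # TGCA

# A 31-base input loses its last base without any warning.
too_long = "A" * 30 + "G"
print(len(reverse_like_mid(too_long)))  # 30
```

In Python itself you would normally just write seq[::-1], which has no length limit.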

Excel Hacks: Calculating GC%

As a quick hack, you can calculate the GC% of a DNA sequence in Excel without breaking out the macros. Here’s how you do it assuming your sequence is in Excel cell A1.

= (1 - LEN(SUBSTITUTE(SUBSTITUTE(A1,"G",""),"C",""))/LEN(A1)) * 100

Although the formula is a bit daunting, the principle is straightforward. First, use the SUBSTITUTE function to get a version of your sequence with all G’s and C’s removed. Determine the length of this new sequence using the LEN function and divide it by the length of the original sequence. This gives you the fraction of the sequence that is NOT a G or a C. To get the GC%, subtract this fraction from 1 and multiply by 100 to go from a fraction to a percent.

This will probably satisfy most folks, but it’s not perfect. For an accurate calculation, we need to take into account ambiguous IUPAC base codes. For example, an "N" can be any of the four bases, two of which are G or C, so it should be treated as 0.5 of a G-C base. A "D" (A, G or T) should count as 0.33. Although it’s possible to extend the hack above to handle these cases, it might be a good time to break out Perl or Python.
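As a rough idea of what the Python route could look like, here is a minimal sketch that weights each IUPAC code by the fraction of its possible bases that are G or C. The function name and the assumption that every base behind an ambiguity code is equally likely are mine; characters not in the table count as zero, and the lookup is case-sensitive like the Excel formula.

```python
# Fraction of the bases behind each IUPAC code that are G or C,
# assuming each possible base is equally likely (an assumption).
GC_WEIGHT = {
    "G": 1.0, "C": 1.0, "S": 1.0,            # S = C/G
    "A": 0.0, "T": 0.0, "W": 0.0,            # W = A/T
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,  # two-base codes, one is G or C
    "N": 0.5,                                # any base
    "B": 2 / 3, "V": 2 / 3,                  # C/G/T and A/C/G
    "D": 1 / 3, "H": 1 / 3,                  # A/G/T and A/C/T
}


def gc_percent(seq):
    """GC% with ambiguity codes counted as fractional G-C bases."""
    if not seq:
        raise ValueError("empty sequence")
    total = sum(GC_WEIGHT.get(base, 0.0) for base in seq)
    return 100.0 * total / len(seq)


print(gc_percent("GCAT"))  # 50.0
print(gc_percent("GN"))    # 75.0
```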

Excel Hacks: Plate Well to Number Conversions

Here’s a simple but useful transformation for going from a well “address” on a plate (e.g. “B5”) to a numbered location. For a 96-well plate, the numbered location will be a number from 1 to 96. Typically, this numbering goes first from left to right across the 12 columns and then top to bottom down the 8 rows. So, A1 would map to well 1, B1 to well 13 and H12 to well 96.

Let's assume the plate address you want to convert is in the Excel cell A1. The conversion is done in Excel as shown below:

= (CODE(LEFT(A1,1))-65)*12 + RIGHT(A1,LEN(A1)-1)

The formula assumes the well format is one uppercase letter denoting the row followed by one or more digits denoting the column. The left half of the formula before the '+' converts the letter into an integer using the CODE function. Since "A" has a code of 65, we subtract this off to get a zero-based row index and then multiply by the number of wells per row (12). The portion of the formula after the '+' simply extracts the numeric portion of the well address and adds it to get the well number. The formula works with padded (A01) and unpadded (A1) representations.
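The same arithmetic can be sketched in Python as a sanity check on the spreadsheet. The function name well_to_number and the n_cols parameter are illustrative only; as with the Excel formula, the row letter is assumed to be a single uppercase character.

```python
def well_to_number(well, n_cols=12):
    """Row-major well number, e.g. 'B5' -> 17 on a 96-well plate."""
    row_index = ord(well[0]) - ord("A")  # CODE(LEFT(A1,1)) - 65
    column = int(well[1:])               # RIGHT(A1, LEN(A1)-1); "01" parses as 1
    return row_index * n_cols + column


print(well_to_number("A1"))   # 1
print(well_to_number("B1"))   # 13
print(well_to_number("H12"))  # 96
print(well_to_number("A01"))  # 1
```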

To go back from a numbered address to the letter-number format, use the following:

= CHAR(INT((A1-1)/12)+65) & (MOD(A1-1, 12)+1)

The portion of the formula left of the '&' handles the row calculation. The CHAR function performs the opposite duty of CODE, taking a number and converting it back to a character. The portion to the right of the '&' calculates the integer (column) portion of the well address.

It's quite easy to do 384-well or other formats. Just replace every 12 in the formulas above with the correct number of columns. In the case of a 384-well format (16 rows by 24 columns), you'd replace it with 24.
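Here is a hedged Python sketch of the reverse conversion with the number of columns pulled out as a parameter, which makes the 96-well versus 384-well switch explicit. The function name number_to_well and the n_cols parameter are my own.

```python
def number_to_well(number, n_cols=12):
    """Row-major well address, e.g. 13 -> 'B1' on a 96-well plate."""
    # divmod gives INT((A1-1)/n_cols) and MOD(A1-1, n_cols) in one step.
    row_index, col_index = divmod(number - 1, n_cols)
    return chr(ord("A") + row_index) + str(col_index + 1)


print(number_to_well(13))              # B1
print(number_to_well(96))              # H12
print(number_to_well(384, n_cols=24))  # P24
```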

It is less common to see plate wells numbered from top to bottom and then left to right, although it does happen. In this case, B1 maps to well 2 and H2 to well 16 for a 96-well plate.

The conversion from plate well to number in this case is:

= (CODE(LEFT(A1,1))-65) + 1 + (RIGHT(A1,LEN(A1)-1)-1)*8

And to go back:

= CHAR(MOD((A1-1),8)+65) & (INT((A1-1)/8)+1)

Note that instead of using the number of columns (12), we're now using the number of rows (8). It's left as an exercise to the student to tweak the formula for 384-well formatted plates ;)
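If you'd like to check your answer to that exercise, here is one possible Python sketch of the top-to-bottom numbering with the number of rows as a parameter. The function names and the n_rows parameter are my own; a 384-well plate is assumed to have 16 rows (A through P).

```python
def well_to_number_by_column(well, n_rows=8):
    """Column-major well number, e.g. 'H2' -> 16 on a 96-well plate."""
    row_index = ord(well[0]) - ord("A")
    column = int(well[1:])
    return row_index + 1 + (column - 1) * n_rows


def number_to_well_by_column(number, n_rows=8):
    """Inverse of well_to_number_by_column."""
    col_index, row_index = divmod(number - 1, n_rows)
    return chr(ord("A") + row_index) + str(col_index + 1)


print(well_to_number_by_column("B1"))              # 2
print(well_to_number_by_column("H2"))              # 16
print(well_to_number_by_column("P24", n_rows=16))  # 384
print(number_to_well_by_column(17, n_rows=16))     # A2
```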