What is Deep Learning

Deep learning is something of a buzzword nowadays. It crops up everywhere, from recommending products on websites to image recognition and fraud protection in banks. But what is deep learning? What does it mean?

Deep learning is something of a rebranding of a concept called neural networks. The idea behind neural networks was that we could mimic the structure of the brain to teach computer programs to learn. We can have lots of weighted connections between variables, just like synapses in our brain. Over time, as we’re presented with new evidence, we can strengthen the connections that are important.

Neural Networks

The image below shows the concept behind a simple neural network. This particular network has 3 inputs. Let’s say that the 3 inputs are the time of a credit card transaction, the time zone in which the transaction took place, and the amount of money transferred.

Our network has a layer in the middle, which we call a “hidden layer”. Our hidden layer has 4 “nodes”, and each node combines the input variables using its own set of weights.

From the output of our neural network we want to know the probability that the transaction is fraudulent. Our network has two outputs. One output represents the chance that the transaction is fraudulent. The other represents the chance it was genuine.

To train the network we need to gather a large amount of data containing the time, time zone and amount of each transaction as inputs, along with whether or not the transaction was fraudulent.

Once the network is trained we can take information about a new transaction and make an educated guess about whether or not it is fraudulent.

Problems

The network that we specified in the previous section wouldn’t work very well. We only have one hidden layer, and the information that we have coming in is probably not even enough for a human to deduce whether a transaction was suspicious.

Also, the idea that we’re trying to mimic synapses is today seen as old hat and a little sci-fi. What is actually going on is well-known mathematics, such as matrix multiplication. We still use the term weights, but it generally refers to matrix elements rather than synapse strengths.
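To make the matrix-multiplication view concrete, here is a minimal sketch of the fraud example’s network: 3 inputs, 4 hidden nodes and 2 outputs, in plain Python. Each layer is a matrix-vector product plus a bias. The weights are random and untrained. The input values are hypothetical and already scaled. The hidden layer uses ReLU, which is one common activation function and a concept this post doesn’t cover in detail. A softmax then turns the two outputs into probabilities that sum to 1.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer: matrix-vector product plus a bias per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    # A common activation function: negative values become 0
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the max for numerical stability, then normalise to sum to 1
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    # Hypothetical untrained weights; training would adjust these
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
hidden = random_layer(3, 4, rng)   # 3 inputs -> 4 hidden nodes
output = random_layer(4, 2, rng)   # 4 hidden nodes -> 2 outputs


def predict(transaction):
    """Return (p_fraud, p_genuine) for [time, time_zone, amount] inputs."""
    h = relu(dense(transaction, *hidden))
    p_fraud, p_genuine = softmax(dense(h, *output))
    return p_fraud, p_genuine


# Hypothetical, pre-scaled inputs: time of day, time zone offset, amount
print(predict([0.9, 0.0, 0.35]))
```

With untrained weights the probabilities are meaningless. The point is only that each layer is a matrix-vector product.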

Improvements

What could we do to improve? We could add more inputs. Perhaps the average transaction amount for that credit card? What about the latitude and longitude of the current transaction vs the latitude, longitude and time of the last transaction? If someone spends $100 in one city, and 1 minute later spends $100 100 miles away then something is probably wrong.
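The “impossible journey” check above can be turned into a single input feature. This is a minimal sketch that assumes each transaction comes with a hypothetical (latitude, longitude, unix time in seconds) tuple. It computes the great-circle distance with the haversine formula, taking the Earth’s radius as 3958.8 miles, and divides by the time between the two transactions. Spending money 100 miles apart within 1 minute works out at 6,000 mph.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def implied_speed_mph(last, current):
    """Speed needed to travel between two transactions.

    last and current are hypothetical (lat, lon, unix_time_seconds) tuples.
    """
    miles = haversine_miles(last[0], last[1], current[0], current[1])
    # Treat anything under a second as one second to avoid dividing by zero
    hours = max(current[2] - last[2], 1) / 3600.0
    return miles / hours


# London, then Paris one minute later: far faster than any cardholder travels
print(implied_speed_mph((51.5074, -0.1278, 0), (48.8566, 2.3522, 60)))
```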

Now we have 9 inputs, so we’re going to need more nodes in our hidden layer to manage all these combinations, right?

Well, that used to be the paradigm, but in practice a single hidden layer, or a small number of hidden layers with many nodes each, doesn’t work very well. That is what is known as a “wide” network.

Deep Learning

So this is where deep learning comes in. It turns out that it works much better to have lots of hidden layers with a smaller number of nodes than it does to have few hidden layers with lots of nodes.
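One way to see the difference in structure is to describe each network as a list of layer sizes. This sketch uses hypothetical layer sizes for the 9-input fraud example and counts the weights and biases in a fully connected network. It says nothing about accuracy. It only shows that a deep network is a longer chain of narrower layers, and that it need not have more parameters than a wide one.

```python
def parameter_count(layer_sizes):
    """Weights plus biases in a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architectures for 9 inputs and 2 outputs (fraud / genuine)
wide = [9, 50, 2]          # one hidden layer with many nodes
deep = [9, 8, 8, 8, 8, 2]  # four hidden layers with fewer nodes each

print("wide:", parameter_count(wide))
print("deep:", parameter_count(deep))
```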

It’s quite a simple difference, but the improvement in performance of deep neural networks over wide neural nets can be huge.

To use our network with 9 inputs to generate a good indication of whether something is fraudulent or not we’d probably have to build and train a network that looked like the image below.

There is still no guarantee that our network will work well. We still need to ensure that we have good training data, the more the better. In the age of “big data” this is becoming much less of a problem.

Even given good data, we still need to pick the features that we use as inputs. That can be a hard task in itself, sometimes as hard as building and testing the network.

If you’d like to know more, there is a great website by Google with an interactive network builder. The website uses TensorFlow, which I used to build intelli.bet. There are some concepts we’ve not covered, such as activation functions, but it’s fun to play around with.

New Android App To Control Classroom Chatter

I’ve not blogged for quite a while now, despite saying I would make an effort to blog more often. Anyway, I decided to write a post to let people know about a new app that I’ve made.

My sister works as a Math teacher in a secondary school, teaching kids aged 11 – 16. She sometimes struggles to keep the class quiet, so she had the idea of sounding an alarm on her phone when the noise level got too high.
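The app’s source isn’t shown here, but the core idea of sounding an alarm when the noise level gets too high can be sketched as a loudness check on a buffer of audio samples. This is a hypothetical sketch, not Quiet Classroom’s code. It assumes 16-bit PCM samples and measures the RMS level in dBFS, where 0 dB is full scale. The function names and the -20 dBFS default threshold are illustrative assumptions.

```python
import math


def rms_dbfs(samples):
    """RMS loudness of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # silence
    return 20 * math.log10(rms / 32768.0)


def too_noisy(samples, threshold_dbfs=-20.0):
    # Hypothetical threshold: sound the alarm when the level goes above it
    return rms_dbfs(samples) > threshold_dbfs
```

On a phone, a loop would read a short buffer from the microphone, call too_noisy() and play the alarm when it returns True.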

I set to work, and a week or so later I had put together an app that did exactly what she wanted. The app is called Quiet Classroom, and you can download it from the Play Store. So far it’s been fairly well received. I linked to the app on Reddit, where it got a fairly decent response: 90% of people liked it, so I was pretty happy with that.

This was the first app that I’d tried charging people for. I decided to charge £0.69 for the app to recoup the $25 developer signup fee, and also to try to make a little profit, as I’d spent a good chunk of time working on it.

This was the first app I’d released in a couple of years, and I was disappointed with the changes to the Android Market. When I released my first ever app there was at least a “new releases” section that showcased some apps; my first app got 250 downloads on its first day because of this. Now, however, there is no “showcasing”, and it seems really difficult to get any downloads at all.

Despite the 90% likes and positive reviews to date, the app has only had 14 downloads. I don’t know how this stacks up against other apps, or apps for iOS etc., but I was really disappointed by it. I’d like to know what techniques other Android developers use to get their apps noticed. It would be good to start a discussion about this in the comments section.

A Ruby Gem for Parsing Finktank Soccer Predictions

I’ve recently been learning Ruby, as I have an idea for a web app I’d like to make which I think would be well suited to the Rails framework. To become familiar with Ruby on Rails I’ve completed the Rails for Zombies course and enrolled on an edX course.

I also did a short tutorial on using APIs with Ruby. This gave me the idea of generating an API for a soccer prediction website run by a group known as the Finktank. The Finktank’s predictions have been consistently more accurate than the bookmakers’ odds, which creates an opportunity for profit.

Deciding which games to bet on based on the Finktank’s odds is a cumbersome process. The odds are listed as percentage probabilities and don’t link to bookmakers, so you have to do some tedious calculations. That’s why I wanted to automate the system.
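The “tedious calculations” boil down to converting each percentage into odds and comparing the result with what a bookmaker offers. This is a minimal sketch. It assumes decimal (European) bookmaker odds and takes the Finktank percentage as the true probability. Fair odds are then 100 / percentage, and a bet has positive expected value when the bookmaker’s odds are longer than the fair odds. The function names are hypothetical.

```python
def fair_decimal_odds(pct):
    """Decimal odds implied by a percentage probability, e.g. 50 -> 2.0."""
    return 100.0 / pct


def expected_value(pct, bookmaker_decimal_odds, stake=1.0):
    """Expected profit of a stake, treating pct as the true win probability."""
    p = pct / 100.0
    win = p * (bookmaker_decimal_odds - 1) * stake
    lose = (1 - p) * stake
    return win - lose
```

For example, a 50% home win has fair odds of 2.0. A bookmaker offering 2.2 therefore gives an expected profit of about 0.10 per unit staked.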

To do this I took a look at the site’s source code. In it, the page calls a Perl script which runs on the server and generates JSON containing the game probabilities. The script can be passed various division IDs, so you can control which games you get predictions for.

Now to the code. This is really simple: all we have to do is send a request to the Dectech server and parse the JSON that it sends back. So here it is:

require 'open-uri'
require 'json'

# Perl script on the Dectech server that returns upcoming games as JSON
PREDICTIONS_URL = "http://www.dectech.org/cgi-bin/new_site/GetUpcomingGames_r.pl"

# Download url and return the array of games stored under the "games" key.
# URI.open is used because Kernel#open no longer opens URLs in Ruby 3.
def fetch_games(url)
  body = URI.open(url).read
  JSON.parse(body)["games"]
end

# Predictions for all divisions
def get_all_predictions
  fetch_games(PREDICTIONS_URL)
end

# Predictions for one division, selected by its divID
def get_league_predictions(league)
  fetch_games("#{PREDICTIONS_URL}?divID=#{league}")
end

def get_prem_predictions
  get_league_predictions(0)
end

def get_champ_predictions
  get_league_predictions(12)
end

def get_l1_predictions
  get_league_predictions(13)
end

def get_l2_predictions
  get_league_predictions(14)
end

def get_spl_predictions
  get_league_predictions(1)
end

There you go: you can now get an array of hashes containing all the predictions for the upcoming two weeks. Each hash has the following format:
{"home" => <home team>, "away" => <away team>, "homeID" => <home team ID (integer)>, "awayID" => <away team ID (integer)>, "date" => <match date>, "expGoalDiff" => <expected goal difference (positive in favour of home, negative in favour of away)>, "hpct" => <home win probability (%)>, "dpct" => <draw probability (%)>, "apct" => <away win probability (%)>}
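The hashes are easy to consume from other languages too. Here is a minimal Python sketch that parses a response body in the format above and picks the most likely result of each game. The sample body is made up for illustration. The percentages are converted with float() in case the script returns them as strings, which is an assumption.

```python
import json


def parse_games(body):
    """Return the list stored under "games" in the script's JSON response."""
    return json.loads(body)["games"]


def most_likely_outcome(game):
    """Return ("home" | "draw" | "away", probability in %) for one game."""
    outcomes = {
        "home": float(game["hpct"]),
        "draw": float(game["dpct"]),
        "away": float(game["apct"]),
    }
    label = max(outcomes, key=outcomes.get)
    return label, outcomes[label]


# Hypothetical response body in the documented format (not real data)
sample = '{"games": [{"home": "Team A", "away": "Team B", "hpct": 45, "dpct": 25, "apct": 30}]}'
for game in parse_games(sample):
    print(game["home"], "v", game["away"], most_likely_outcome(game))
```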

If you want to use this code, it’s really easy: I’ve published it on www.rubygems.org, so you can simply run gem install finktank. Once you’ve done that, you can quickly test it by running this file:

require 'finktank'

puts get_l2_predictions

And if you run this some time before October 12th, you’ll see that the mighty Morecambe FC have only a 33% chance of a win away at Bury this weekend!

Write Your First Rails 4 App

I’ve recently been trying to learn Ruby on Rails, as I’ve got an idea for an app that I’d like to develop. I completed the excellent Rails for Zombies series of tutorials, which gave me a good grasp of models, controllers, views and routes. However, I wasn’t sure how to stick all these things together from scratch.

After a lot of Googling I managed to put together the steps required to make a “Hello World” type app. I think a lot of tutorials have been made obsolete by updates to Rails, but this one will work as long as you’re using Rails version 4.

The first step is to install Rails. I used Ruby Version Manager (RVM) to install Rails on my Mac; there are good instructions on the RVM website to get you up and running quickly.

We’re going to make a really simple website where anyone can come along and write a message, and all the messages that have been left can be read.

To generate a new app, which we’re calling message_list, run the following command: rails new message_list. This will create a directory called message_list. Next, cd message_list and then run rails server.

If you point your browser to localhost:3000, you should see a Rails test page if everything has worked properly. If there are any problems at this stage, it’s probably best to go back to the RVM page and check your installation.

Now we make our model, which we’ll call Message. We do this by typing rails g model message. Among other files, this will create app/models/message.rb and db/migrate/<date>_create_messages.rb.

Next, edit the db/migrate/*_create_messages.rb file to look like this:

class CreateMessages < ActiveRecord::Migration
  def change
    create_table :messages do |t|
      t.string :name
      t.string :content

      t.timestamps
    end
  end
end

Now we need to build the database. cd to the top directory of the app and type:

rake db:create
rake db:migrate

Run rails console and then run these commands:

mes = Message.new
mes.attributes = { :name => "James", :content => "This is a test" }
mes.save
exit

Now edit app/models/message.rb so it looks like this:

class Message < ActiveRecord::Base
  # Both fields must be filled in before a message can be saved
  validates :name, :presence => true
  validates :content, :presence => true
end

Now that the model is done, we need a controller and views. To generate them, run this command from the top directory of the app:

rails g controller messages

Next, edit app/controllers/messages_controller.rb. We’re going to add the actions index, for showing all the messages, and new and create, for writing new messages. These actions should of course have corresponding view files, which we’ll get to later.

Here is what messages_controller.rb should look like:

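class MessagesController < ApplicationController
  # List every message
  def index
    @posts = Message.all
  end

  # Show the form for a new message
  def new
    @message = Message.new
  end

  # Save the message, or redisplay the form if validation fails
  def create
    @message = Message.new(mess_params)
    if @message.save
      redirect_to '/'
    else
      render 'new'
    end
  end

  # Show a single message (this tutorial doesn't create a show view)
  def show
    @mess = Message.find(params[:id])

    respond_to do |format|
      format.html
    end
  end

  private

  # Strong parameters: only let name and content through from the form
  def mess_params
    params.require(:message).permit(:name, :content)
  end
end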

We also have to add a private method, mess_params, which permits the create action to use the name and content parameters submitted by the form. This is the strong parameters feature, which was introduced in Rails 4.

Now that the model and controller are in place, let’s write the views.

We need two files, app/views/messages/index.html.erb and app/views/messages/new.html.erb. In index.html.erb we need the following:

<html>
<h1>Leave A Message!</h1>
<%= link_to "New Message", new_message_path %>
<h1>All Posts:</h1>
<table>
  <tr>
    <th>Name</th>
    <th>Post</th>
  </tr>

  <% @posts.each do |post| %>
  <tr>
    <td><%= post.name %></td>
    <td><%= post.content %></td>
  </tr>
  <% end %>
</table>
</html>

And in new.html.erb we need this:

<html>
<%= form_for :message, url: messages_path do |f| %>
  <p>
    <%= f.label :name %><br>
    <%= f.text_field :name %>
  </p>
  <p>
    <%= f.label :content, "Message" %><br>
    <%= f.text_area :content %>
  </p>
  <p>
    <%= f.submit %>
  </p>
<% end %>
</html>

So we have models, controllers and views; now we just need routes to complete the app. In config/routes.rb, add the following inside the routes.draw do ... end block:

root :to => "messages#index"
resources :messages

Now you have a complete Rails app. Run rails server and go to localhost:3000. You should see the message that we created earlier in the console, along with a link to write a new message. Click that link, write a message and click submit. You should be taken back to the list of messages, where you’ll see your message!

Hopefully that’s brought a few things together and you can see how to pull together the theory you learnt to build an app from scratch.

Makefile Tutorial

Introduction – Makefile Tutorial

I thought I’d write a brief Makefile tutorial, as this was a subject I had little understanding of at university. It wasn’t until I began my current job that I realised how powerful Makefiles are.

The main use of a Makefile is to simplify the process of compiling large multi-file projects. However, this is not their only use. They can also be used in a similar way to shell scripts, to perform other tasks for you.

Makefiles generally consist of targets, dependencies, commands and variables. A Makefile typically looks like this:

VARIABLE=something
target: dependency1 dependency2...
[tab]command to make target

Note: There must be a tab before each command! Spaces are not enough. If you indent a command with spaces, GNU make stops with a fairly cryptic “missing separator” error.

Create a fresh directory somewhere and create a file called “Makefile”. Then copy in the following, replacing each [tab] with a real tab character:

NAME=James

all: hello myname

hello:
[tab]echo "Hello"

myname:
[tab]echo "My Name Is $(NAME)"

mynamedep: hello myname
[tab]echo "This was written using dependencies"

Now, if you run make, you will see the commands in the Makefile and the result of their execution. There are four rules here, one for each of the targets all, hello, myname and mynamedep.

You can also give make a specific target to run just that rule. For instance, running make myname will execute the rule for the target myname.

If you don’t specify a target, make builds the first target in the Makefile, known as the default goal. By convention that target is called all and placed first, as it is here, but make doesn’t treat the name all specially. There are two targets that make use of dependencies: all and mynamedep.

You can list dependencies next to a target, and make will check whether each dependency needs to be built before it continues. This becomes useful for compiling later; the above example is a little contrived. You will notice that there is also a variable being used, much like you might see in a shell script. In Makefiles, however, a variable is referenced by wrapping its name in parentheses after a dollar sign, as in $(NAME). I’ll talk more about variables later.
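To illustrate how make walks the dependencies, here is a toy model of the example Makefile in Python. Each rule maps a target to its prerequisites, and a depth-first walk lists the order in which the recipes would run: prerequisites first, in the order they are listed. This is a simplified sketch, not how GNU make is implemented. It ignores timestamps and does not detect circular dependencies.

```python
def build_order(rules, target, done=None):
    """Return the order in which recipes run to build target."""
    if done is None:
        done = []
    # Build each prerequisite (and its own prerequisites) first
    for dep in rules.get(target, []):
        if dep not in done:
            build_order(rules, dep, done)
    if target not in done:
        done.append(target)
    return done


# The rules from the example Makefile: target -> prerequisites
rules = {
    "all": ["hello", "myname"],
    "hello": [],
    "myname": [],
    "mynamedep": ["hello", "myname"],
}

print(build_order(rules, "mynamedep"))
```

This prints ['hello', 'myname', 'mynamedep'], which is the same order the echo commands run in when you type make mynamedep.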

Compiling With a Makefile

Suppose that you had two files, main.c and functions.c. The first file main.c contains only the “main” function, but inside main, functions from functions.c are called. How would we write a Makefile for that?

Well, we need to compile functions.c to get our object file functions.o, then compile main.c to generate main.o, and finally link functions.o and main.o to create ourprogram.

A really simple file would look like this:

#ourprogram needs main.o and functions.o
ourprogram: main.o functions.o
        gcc main.o functions.o -o ourprogram

#main.o is the target and depends on main.c
main.o: main.c
        gcc -c main.c

#functions.o is the target and depends on functions.c
functions.o: functions.c
        gcc -c functions.c

The above Makefile would work just fine: when we call “make” our code would build, assuming there were no compiler errors. However, tweaking this Makefile would be laborious. Say we wanted to use a different compiler: we would have to find and replace every instance of gcc.
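What makes this Makefile worth having is that make only rebuilds what is out of date. A target is remade when it doesn’t exist, or when any of its prerequisites has a newer modification time. This Python sketch of that check uses only os.path, and the function name is hypothetical. Real make also rebuilds a target whenever one of its prerequisites has itself just been rebuilt.

```python
import os


def needs_rebuild(target, prerequisites):
    """Basic make rule: rebuild if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


# e.g. needs_rebuild("main.o", ["main.c"]) after editing main.c -> True
```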

Introducing Variables


We could use variables to make this easier to maintain:

CC=gcc
EXEC=ourprogram

#ourprogram needs main.o and functions.o
$(EXEC): main.o functions.o
        $(CC) main.o functions.o -o $(EXEC)

#main.o is the target and depends on main.c
main.o: main.c
        $(CC) -c main.c

#functions.o is the target and depends on functions.c
functions.o: functions.c
        $(CC) -c functions.c 

Now if we want to change the name of our program, we can change the EXEC variable, and we can change the compiler by just editing the CC variable. We could also have written the line $(CC) main.o functions.o -o $(EXEC) as $(CC) main.o functions.o -o $@. $@ is an automatic variable that expands to the name of the target of the rule.

A variable can also hold more than one thing. Its value is just text, so it can be a space-separated list, and the whole list is substituted wherever the variable is referenced. For example, we could store all of the objects we need in a variable:

CC=gcc
EXEC=ourprogram
OBJECTS=main.o functions.o

#ourprogram needs main.o and functions.o
$(EXEC): $(OBJECTS)
        $(CC) $(OBJECTS) -o $@

#main.o is the target and depends on main.c
main.o: main.c
        $(CC) -c main.c

#functions.o is the target and depends on functions.c
functions.o: functions.c
        $(CC) -c functions.c 

Pulling it Together


Now we are beginning to see why Makefiles are so powerful. They can do other really useful things, like working out which source file to compile to generate each object. This means that we can replace the final two rules with a single rule if we add another variable containing our source files, and one more containing compiler flags:

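CC=gcc
EXEC=ourprogram
SOURCES=main.c functions.c
# Derive the object list from the sources: main.o functions.o
OBJECTS=$(SOURCES:.c=.o)
CFLAGS=-Wall

$(EXEC): $(OBJECTS)
	$(CC) $(OBJECTS) -o $@

# Static pattern rule: each object is built from the .c file with the same stem
$(OBJECTS): %.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@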

 

There we have a complete Makefile that can easily be extended to incorporate more files, change the name of the executable, change the compiler, or change the compiler flags.

 

I hope that this has been a comprehensible tutorial and that you’ve managed to follow along. I remember that when I was new to Linux, tutorials could quickly get out of my depth and look a little daunting. There are plenty of other Makefile tutorials, but there is one book in particular I’d recommend if you are keen to learn more: 21st Century C.

Create a Simple Bash Script GUI

Introduction

I’ve been looking for a tool to create a Bash script GUI recently. I wanted to find something that wouldn’t add much extra code but would give my scripts a professional front-end.
I came across dialog, a tool that you can use in your scripts to do exactly that. Here is a simple “Hello World!” example that you can use:
dialog --title 'Message' --msgbox 'Hello, world!' 5 20
 
You can create menu pages, checklists, yes-no boxes, input-boxes and much more. 
 
To install it on Ubuntu, all you need to do is run sudo apt-get install dialog. You can also install it on Mac OS X, if you have Homebrew installed, using brew install dialog.

Example: Create a Bash Script GUI

There are a couple of things you need to know about dialog to get a simple app running. You get data back from the GUI either by reading stderr or from dialog’s return value. For instance, dialog --menu returns 0 if the user presses OK, 1 if the user presses Cancel, or 255 if Esc is pressed. If the user selects OK, the tag of the selected item (in the example below, its number) is written to stderr.
This is best illustrated by a very simple example app. This script doesn’t have much functionality but should illustrate the principles required to create a Bash script GUI:
#!/bin/bash

# File that dialog's stderr output (the user's choice) is written to
reader=/tmp/dialog-reader

while true; do
    dialog --cancel-label "Quit" --menu "Choose one:" 10 30 5 1 "Enter Name" 2 "Something Else" 2>"$reader"
    retval=$?

    # Quit (Cancel) returns 1 and Esc returns 255: leave the script
    if [[ $retval != 0 ]]; then
        exit 0
    fi

    choice=$(cat "$reader")

    if [[ $choice == 1 ]]; then
        dialog --inputbox "Enter your name:" 8 40 2>"$reader"
        name=$(cat "$reader")
        dialog --infobox "Your name is ${#name} characters long" 10 50
        sleep 2
    else
        dialog --infobox "Unfortunately I can't think of another example." 10 50
        sleep 2
    fi
done
If you run this script you will see a menu page like this:
 
 
 
Here you can see that stderr – file descriptor number 2 – is redirected to a file in /tmp. That is how the menu and input boxes are read. The return value of the menu page is checked to see if the user wants to quit. On the main page it should be possible to cancel or press escape to quit the script.
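For comparison, here is a sketch of the same return-code-plus-stderr pattern driven from Python with the standard subprocess module. The helper names are hypothetical. The sketch assumes dialog is installed and that it runs in a real terminal. dialog draws its interface on the terminal through stdout and writes the chosen tag to stderr, so only stderr is captured.

```python
import subprocess


def menu_args(prompt, items, height=10, width=30, menu_height=5):
    """Build a dialog --menu command line from (tag, label) pairs."""
    args = ["dialog", "--cancel-label", "Quit", "--menu", prompt,
            str(height), str(width), str(menu_height)]
    for tag, label in items:
        args += [tag, label]
    return args


def interpret(returncode, stderr_text):
    """0 means OK (stderr holds the chosen tag); 1 (Cancel) or 255 (Esc) mean quit."""
    if returncode == 0:
        return stderr_text.strip()
    return None


def run_menu(prompt, items):
    # stdout and stdin stay attached to the terminal; only stderr is captured
    proc = subprocess.run(menu_args(prompt, items),
                          stderr=subprocess.PIPE, text=True)
    return interpret(proc.returncode, proc.stderr)


# e.g. run_menu("Choose one:", [("1", "Enter Name"), ("2", "Something Else")])
```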
 
Another tool for building GUIs is Xdialog. Its syntax is largely compatible with dialog’s, but it displays X windows instead of a text-mode interface.

Playing with an Arduino

I’m making a conscious effort to blog more regularly, so I thought I’d write up what I learnt playing around with an Arduino Uno this weekend. My boss bought me one some time ago, and I’d kind of ignored it until I thought it might be useful for something work-related. I was really impressed, because these things are cool, so I thought I’d add my experiences to the massive amount of resources already out there.
There are tons of projects that people have shared online. If you need some ideas about what can be done, check this website out. There’s also an interesting TED talk by one of the co-founders of Arduino, who talks about some of the things people have done with the boards.
Getting started was really easy. I just needed a USB cable, which I took from an old external hard drive, and a 70 MB download. Within about 5 minutes I had one of the Arduino’s LEDs blinking. I can’t provide a better guide than the one on the official site.
Anyway, I’m interested in using the digital reading/writing capabilities for my work project, so I thought I’d experiment with these, along with the analog reading feature. The simplest test I could think of was writing a digital signal out, converting it to analog, and then reading it back in through an analog input pin.
To make a digital-to-analog converter (DAC), you just need some resistors and a breadboard. You can use something called an R-2R ladder to convert the signal to analog. Here’s the circuit diagram:

I only had a few resistors handy, so I made a 2-bit DAC. I used digital pins 2 and 3 to output the digital signal, and analog pin 0 to read the analog input.
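For an ideal R-2R ladder, the output voltage is the reference voltage scaled by the digital code: Vout = Vref × code / 2^bits. This sketch assumes a 5 V reference (the Uno’s digital HIGH level) and an unloaded output, so it gives the theoretical values for the 2-bit DAC. Real readings will differ depending on resistor tolerance, as noted after the serial output below.

```python
def r2r_output(code, bits, vref=5.0):
    """Ideal (unloaded) output voltage of an N-bit R-2R ladder DAC."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range for this many bits")
    return vref * code / 2 ** bits


# Theoretical outputs of the 2-bit DAC for codes 0..3
for code in range(4):
    print(code, r2r_output(code, bits=2))
```

This prints 0.0, 1.25, 2.5 and 3.75 V for codes 0 to 3.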

The code looks like this:

// Analog input pin 0 (A0) reads the DAC output
int analogPin = 0;
int val = 0;

// Digital pins driving the 2-bit R-2R ladder
int lsb = 2;
int msb = 3;

void setup()
{
  Serial.begin(9600);
  pinMode(lsb, OUTPUT);
  pinMode(msb, OUTPUT);
}

void loop()
{
  // Step through the four 2-bit values 0..3
  for (int num = 0; num < 4; num++) {
    flash(num);
    val = analogRead(analogPin);
    // Convert the 10-bit reading (0-1023) to volts: 5 V / 1024 is about 0.0049
    Serial.println(val * 0.0049);
    delay(2000);
  }
}

// Write num (0..3) to the two output pins; out-of-range values write 0
void flash(int num)
{
  if ((num < 0) || (num > 3)) {
    num = 0;
  }
  digitalWrite(lsb, (num & 1) ? HIGH : LOW);
  digitalWrite(msb, (num & 2) ? HIGH : LOW);
}

To test this code, copy the above into an Arduino sketch window, connect up the circuit and upload the code. Once it starts running, open the Serial Monitor and you should see output that looks like this:

0.00
0.88
2.14
3.04

The code will keep looping, and your values may differ from mine depending on the tolerance of your resistors, but the circuit seems to do a reasonable job of converting the values.
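The 0.0049 factor in the sketch converts analogRead() counts into volts. On the Uno, analogRead() returns a 10-bit value from 0 to 1023, and with the default 5 V reference each count is about 5 / 1024 ≈ 0.00488 V. Here is a small Python sketch of the conversion in both directions, assuming that default reference:

```python
ADC_COUNTS = 1024  # 10-bit converter: readings run from 0 to 1023
VREF = 5.0         # assumes the default 5 V analog reference


def counts_to_volts(counts):
    """Convert an analogRead() value to volts."""
    return counts * VREF / ADC_COUNTS


def volts_to_counts(volts):
    """Approximate analogRead() value expected for a given voltage."""
    return round(volts * ADC_COUNTS / VREF)
```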

Since playing around with this, I’ve been able to use its Serial.read() function to accomplish my task. Working with the Arduino has been quite refreshing. As an embedded software engineer I find myself googling obscure stuff, and any information I find is generally unreliable.

In contrast, there is so much reliable information out there for the Arduino, and it’s really quick to get things working. I can see why it’s taken off and been commercially successful, and I will certainly be extolling its virtues to anyone with a passing interest.

That’s all I’ve got to share today, hope you enjoy playing with your Arduino!

Free Live Python XML Soccer Scores

In this post I will present some Python code that you can use to get free live soccer scores from an XML feed. You can use it to log data, or maybe in a betting bot or something.

The Code


import datetime

import feedparser


def get_feed():
    url = "http://www.scorespro.com/rss/live-soccer.xml"
    feed = feedparser.parse(url)
    return feed


def extract_info(game):
    # The find() calls below expect titles that look like "#Home vs #Away: 2-1"
    title = game['title']
    game_info = {}

    start = title.find('#', 0)
    end = title.find('vs', 0)
    game_info['home'] = title[start + 1:end - 1]

    start = title.find('#', start + 1)
    end = title.find(':', start)
    game_info['away'] = title[start + 1:end]

    start = title.find(' ', end)
    end = title.find('-', start)
    game_info['home_goals'] = title[start:end].strip()

    game_info['away_goals'] = title[end + 1:].strip()

    game_info['summary'] = game['summary']
    game_info['timestamp'] = datetime.datetime.now()
    return game_info


def process_feed(feed):
    return [extract_info(game) for game in feed.entries]


def get_feed_processed():
    return process_feed(get_feed())


def print_in_play_games():
    for info in get_feed_processed():
        if info['summary'] != "Game Finished":
            print(info['home'] + " : " + info['home_goals'])
            print(info['away'] + " : " + info['away_goals'])
            print(info['summary'])
            print("")

So far the feed has proved pretty reliable for my purposes. You will also need the third-party feedparser module (pip install feedparser) for the above code to work. I hope this is helpful to some of you. Please leave a comment if you put this code to use!
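The find()-based slicing in extract_info() can break if a team name happens to contain “vs”, “#” or “-”. An alternative is a single regular expression. This sketch assumes titles look like “#Home vs #Away: 2-1”. That format is inferred from the find() calls above, not from any feed documentation. The function returns None for titles that don’t match.

```python
import re

# Assumed title layout, inferred from the slicing code above
TITLE_RE = re.compile(
    r"#(?P<home>.+?) vs #(?P<away>.+?):\s*(?P<home_goals>\d+)-(?P<away_goals>\d+)"
)


def parse_title(title):
    """Return home/away names and goals from a feed title, or None."""
    m = TITLE_RE.search(title)
    if m is None:
        return None  # title not in the expected format
    info = m.groupdict()
    info["home_goals"] = int(info["home_goals"])
    info["away_goals"] = int(info["away_goals"])
    return info
```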

Approximate String Matching

Recently I’ve been trying to find a way to do string matching between similar but non-identical strings.

The problem involved matching football teams (soccer teams, for the Americans) from an XML feed with teams from a bookmaker’s API feed. For example, a game from around a month ago appeared as Anzhi Makhachkala v Rostov in the XML feed and had to be matched with Anzhi v FC Rostov in the bookmaker’s API.

Useful Tools

I tried a few things to get good matches, and here are some tools that might be useful to you if this is a problem you’re coming up against:

  • Soundex – This algorithm encodes phonetic information about how the words would be pronounced in English. There’s a pretty cool example of how this can be used here. I had an idea that I could look for teams based on the same pronunciation, but this was quite unreliable.
  • Jaro-Winkler distance – This can be used to compare the similarity of two strings. It returns a value between 0 and 1, 1 being an identical match and 0 being a complete mismatch.
  • Bayesian Training – I broke the strings down into letters, trained a Bayesian guesser, and got the probability that the two strings were the same. This method is the one I finally chose to use as it provided me with the best results.
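To make the Jaro-Winkler bullet concrete, here is a self-contained sketch of the standard algorithm. It is written from the textbook definition, not taken from Jellyfish, which provides a ready-made version. Characters match within a window of max(len)/2 - 1 positions, each transposition counts as half, and common prefixes of up to 4 characters earn a bonus with scaling factor p = 0.1.

```python
def jaro(s1, s2):
    """Jaro similarity: 1.0 for identical strings, 0.0 for no matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matching if they are this close
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    s2_matched = [s2[j] for j in range(len2) if matched2[j]]
    k = transpositions = 0
    for i in range(len1):
        if matched1[i]:
            if s1[i] != s2_matched[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

On the classic MARTHA/MARHTA pair this gives a Jaro similarity of about 0.944 and a Jaro-Winkler similarity of about 0.961.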

As I was programming this in Python, I came across a couple of really useful libraries. For the Jaro-Winkler/Soundex stuff I used a library called Jellyfish, and for the Bayesian training method I used Reverend. I highly recommend them both, so check them out!

My String Matching Method

I made use of the Reverend Python module to calculate the probability that a match was correct. Here is my code:
from reverend.thomas import Bayes

guesser = Bayes()
teams = ['home', 'away']


def train(team_no, name):
    # Teach the guesser that each character of name belongs to this team
    for char in name:
        guesser.train(teams[team_no], char)


def untrain(team_no, name):
    # Undo train() so the guesser can be reused for the next game
    for char in name:
        guesser.untrain(teams[team_no], char)


def guess(name):
    # Average, over the characters of name, the probability given to each team
    home_guess = 0.0
    away_guess = 0.0

    for char in name:
        for label, prob in guesser.guess(char):
            if label == teams[0]:
                home_guess += prob
            elif label == teams[1]:
                away_guess += prob

    home_guess = home_guess / float(len(name))
    away_guess = away_guess / float(len(name))
    return [home_guess, away_guess]


def game_match(betfair_game_string, feed_home, feed_away):
    # Split "Home V Away" on the separator, not on the first capital V,
    # so names like "Aston Villa" are not cut short
    sep = betfair_game_string.find(' V ')
    home_team = betfair_game_string[:sep]
    away_team = betfair_game_string[sep + 3:]

    train(0, home_team)
    train(1, away_team)

    probs = [guess(feed_home)[0], guess(feed_away)[1]]

    untrain(0, home_team)
    untrain(1, away_team)

    return probs


print(game_match("Man Utd V Lpool", "Manchester United", "Liverpool"))

Improvements

The above code produced a probability of 44% that “Man Utd” matched “Manchester United”, and “Lpool” matching “Liverpool” was assigned a 55% probability. It then occurred to me that what I needed was the probability that at least one of these names was correct, so I used the following method:

def prob_match(probs):

    prob_not_home = 1.0 - probs[0]
    prob_not_away = 1.0 - probs[1]

    prob_not_home_and_away = prob_not_home*prob_not_away
    prob_home_and_away = 1.0 - prob_not_home_and_away

    return prob_home_and_away

This gives a probability of 75% that the String Matching was correct. I used this code and was able to get a reasonably large data set of matched games. It worked fairly well for me, but it could certainly be improved and I’d be interested to hear what suggestions you have.
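Plugging the two probabilities from the example into prob_match shows where the 75% comes from. Note that multiplying the two “no match” probabilities treats the home and away matches as independent events:

```latex
\begin{aligned}
P(\text{match}) &= 1 - (1 - p_{\text{home}})(1 - p_{\text{away}}) \\
                &= 1 - (1 - 0.44)(1 - 0.55) \\
                &= 1 - 0.56 \times 0.45 \\
                &= 1 - 0.252 = 0.748 \approx 75\%
\end{aligned}
```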

Tips From a Former PhD Student

I’ve been working as a software engineer for 7 months since finishing my PhD, and in that time I’ve learnt a lot that would have been really helpful to know when I started my PhD.

I studied Astrophysics, in particular solar physics and modelling the propagation of particles through the interplanetary magnetic field. This involved a good deal of mathematical modelling and also writing a pretty extensive simulation code.

Before I started my PhD, I had completed my BSc in Mathematics where I earned a first class degree. I only had a relatively small amount of coding experience, although at the time I didn’t realise how little.

I’d completed an introductory C++ course and also written a few different ODE solvers for my main project. I had absolutely no Linux experience at all, and had never heard of a Makefile.

I also had very little background knowledge of the subject I was going to be studying. I had to read up very quickly, ploughing through hundreds of papers.

Anyway, enough rambling about my experience. Here are my tips for anyone starting a PhD in a technical subject:

1. Read 21st Century C.

This book was suggested to me when I started my new job. What I found useful were the tips about Makefiles and the explanation of exactly how compilers and linkers work. I had pieced together this understanding over time, but it wasn’t clear in my mind.
In particular, I had tried to link to a few libraries when I developed my simulation code, but I didn’t know about pkg-config, which could have saved me a lot of time. There’s also a section on Autotools for packaging your projects, and material on version control using SVN, which leads me to my next tip…

2. Use Version Control, Preferably Git.  

Version control will save you a lot of hassle. It’s useful for both code and text such as papers and theses, particularly if you are collaborating on a paper with someone else. You can also keep your repositories in an online account, which gives you the added assurance of an off-site backup.
You will probably want to keep your work private but allow your supervisor and collaborators access to it. I would suggest Bitbucket, which allows unlimited private repositories and lets you set up teams.
Also, in terms of employability, Git seems to be the “cool kids’” version control system, and it’s used pretty extensively in industry.

3. Use the Sublime Text Editor.

Choosing a text editor was tricky when I switched to Linux. My supervisor suggested Emacs, which is a popular editor, but I found its key shortcuts hard to adapt to on top of all the other Linux things I was getting used to.
Sublime is a great text editor. It has loads of cool features, like multiple cursors, and it uses the traditional Ctrl-C/Ctrl-V copy and paste shortcuts, so the learning curve isn’t too steep. It’s also fully customisable and cross-platform.

4. Learn Shell Scripting.

Learn how to write Bash scripts, or scripts for any other shell. 21st Century C recommends zsh, which I’ve been using recently and which is also pretty cool.
Just taking one day out to do a tutorial like this will save you so much time in the long run. Once you’ve mastered shell scripting, you’ll rarely have to do boring, repetitive tasks by hand again.
Also learn about Makefiles, as 21st Century C recommends. This will save you lots of time.

5. Use Python Instead of IDL/Matlab/Octave.

If you use a language such as IDL, Matlab or Octave to analyse data, then I’d say use Python instead. Python is a growing language: it’s free, it has a huge number of libraries available, and knowing it will make you much more employable.
It’s really easy to learn, too; I used this website to get started. It has lots of nice features, and the string manipulation tools are great. Did I mention that it’s cross-platform too?

6. Start a Webpage.

If you have department webspace, then stick up a webpage explaining your research interests. Add a few things like links to your papers and your CV, and then make sure you track visits to the page with Google Analytics.
You’ll be surprised how many hits you get. Also, when you check the internet service providers of your visitors, you’ll often find people from other universities reading your page. This information could be useful when you consider applying for post-docs.

7. Don’t Get Bogged Down in the Details of Papers.

At the start you’re going to have to read lots of papers very quickly and take in a lot of information. When you have little subject knowledge, it’s hard to filter out what is important in the papers you’re reading.
Most of the key points are outlined first in the abstract and then in the discussion/conclusions section. The introduction gives the context of the work and lays out the issues the paper sets out to address. The rest of the paper is made up of technical details. These are important, but unless you aim to reproduce the results, the main thing is to take in the paper’s findings.

8. Block Facebook on Your University PC.

I spent a lot of time messing about on Facebook/Twitter/BBC Sport/various forums. I downloaded a tool to block URLs and set its password to a random number that I didn’t remember, so I couldn’t undo the block. This made me much more focussed during working hours.

9. Set Up SSH and Dropbox.

SSH allows you to connect to your University network. This means that you can use all the utilities on your University PC from home. This obviously improves your productivity and makes it feasible to work from home.
SSH isn’t perfect, though: it can be slow, especially with applications that use X Windows. I set up a Dropbox account, which I think initially gave me 2 GB of free cloud storage that automatically synced between my laptop and whichever other computers I installed Dropbox on. I’m now up to 11 GB of free storage, and I find it to be a really good service.

10. Manage Your Supervisor.

I was really lucky in that my supervisor was great, but I know some people who had problems with theirs. By “managing” your supervisor, I mean getting to know the things they are picky about and working around them.
For instance, if you know that your supervisor gets very picky about minute changes to the content of papers and won’t be satisfied until you’ve done 20 drafts, then make sure you start work well ahead of a conference deadline and turn around any corrections quickly.

That’s about all I can think of for now. I hope that these tips will come in handy for future PhD students. I wish I’d read this before I started my PhD.