M&E

Analyzing Survey Monkey Data in LongJump

I recently spent a few days analyzing the data from our global Survey Monkey survey of participatory local democracy for our project with UNDEF. It’s awkward. Both systems are wonderful for what they do (collect and hold data), but are not terribly user friendly at complex data manipulation.

(Update 8/14) The wonderful Alice Patrick at LongJump has let us know that there is a third way to do analysis in LongJump – formulas in fields – that will be ideal for us. see http://lj.platformatyourservice.com/wiki/Formula_Fields for details)

Here is the approach I took. I’m writing it up so that (a) those wiser than me can make suggestions and (b) those struggling with it for the first time can avoid my mistakes.

Mistakes we should have avoided with Survey Monkey

1) The only good way to handle the “I don’t know” option is to make the question non-required, and let “no answer” mean “I don’t know.” For example, if you give people the options:

How good is it? [ ] Poor, [ ] Mediocre, [ ] Good, [ ] Great, [ ] I don’t know

then when you download the data in numerical (rather than word) mode, you’ll get a score from 1-5 where “I don’t know” is 5.

2) Be very clear about single-answer choices and multi-answer check boxes. In 3 of our 4 languages, we asked for people’s employment sector with multi-answer check boxes, which results in one data column per option, rather than a simple 1 column answer. VERY tedious to analyze.

3) Doing multiple languages? Make SURE the columns align! We did surveys in four languages, and each one had a few subtle changes resulting in a different number of columns, making it a pain to combine the data into one database.

4) Using text boxes for numbers. We asked for people to fill in approximately how many days does it take to get a response from the government. Some people were obliging and put “7” or “150” – but some wrote “forever!!!” or “it depends” which is very hard to analyze mathematically. Anything with a range of numbers should be made into a multiple choice.

5) Using text boxes for indicating country. Similar problem. We should NEVER use text boxes in surveys. How many ways can people spell Senegal? Plenty! Things like name, country, email should be handled with their demographics tools.

6) Understanding the difference between complete and incomplete survey. I would think there would be a way to download ONLY completed surveys, but apparently not, so you need to manually bring the data first into Excel, delete lines that are incomplete, and then upload to the database.

Long Jump Import

You export data from SurveyMonkey (or IFormBuilder) as an excel spreadsheet, using the “compressed” and “data only” options.

Open the spreadsheet, and create database field names you want in the first row.

Survey Monkey data has two rows of headers, containing all the text for the survey. You will want to insert a new line for the database column names – one short word – I used the question number, like Q01, Q02a, Q02b etc to align simply with the form. If you’ve analyzed this survey before, you can simply copy and paste that first line into the new spreadsheet.

Longjump will ask you whether data are numbers, numbers with decimals, text, dates, emails, etc. Set each response column to number. If you are importing dates, they MUST be in American format (mm/dd/yyyy).

Long Jump Analysis

One thing I learned from 45 years of computer programming is “don’t be clever” – be boringly simple and consistent.

You analyze data in Long Jump be creating a report. Just as you can fill excel cells with formulas, you can put formulas into columns in longjump.

There are two places to do this FIELDS and COMPUTE. FIELDS is row by row, and COMPUTE is for GROUPS of rows (in our case, grouped by country).

FIELDS TAB

In the “fields” tab, you first pull all the database fields you will be using in calculations into the list, and then you click on the link that says “New Computed Field.” If you cannot use the raw response as it is, you can create a new version of the field. For example, in the “Good/Great/Don’t know” example above, I created a new field R5 with this calculation:

if(q05=5,0,q05)

which means if the person said I don’t know, treat it as zero, otherwise use what you’ve got.

You can use this powerfully with iFormBuilder if you forgot to set it up to pass numbers instead of words, like this

if(q05=’I don’t know’,0,if(q05=’Great’,4,if(q05=’Good’,3,if(q05=’Mediocre’,2,1))))

If the answers are long, you can use the “Contains” function to trigger numbers on key words

You need to create another computed field – the number of non-zero answers. Why? Because if you want to average the answers, you don’t want to include the zeros. For example, if two people say great and two don’t know, the average needs to be great (4) not mediocre (2).

LongJump really should offer the NZA (non-zero-average) function automatically – I’m lobbying for it – but until then, for any column like this, you need to create something like N5

if(q05=0||q05=5,0,1)

That is, if it’s zero or 5 don’t count it, else count it.

Good vs. Bad: some times our scales were for things that were bad for our index – it was at this level that I created new computed fields that turned everything into a similar positive scale.

COMPUTE TAB

After you tell it how to group the data, then you can tell what to compute. In my case, I wanted to create averages each cluster of answers. Let’s say the raw data runs from 1-4 in each of 5 answers (for a total range of 0-20), we would do the following calculation…

Cluster1:

5*(total(R5)/total(N5)+total(R6)/(total(N6)+total(R7)/total(N7)+total(R8)/total(N8)+total(R9)/(total(N9))

Unfortunately, LongJump won’t let you simply type this (yet, I’m lobbying…) – it gives them a long name like custom_column_r5_14 which you get from a pull-down menu. But it’s not bad – just annoying.

BUGS IN LONGJUMP

I found several instances where, after filling in formulas in the COMPUTE tab, if I needed to go back and correct something in the FIELDS tab, it wouldn’t let me do it. This doesn’t always happen. I had to copy the COMPUTE formulas on a piece of paper, delete them, fix the FIELDS, then go back and put the COMPUTE formulas back.

I also had trouble empyting and re-importing data. It creates indexes that need to get deleted after data is deleted – not sure how or why -and then it doesn’t automatically align columns – you have to create a mapping table using pull downs for every field, which is REALLY tedious.  Easier to just create a whole new object.