So, I'm going through the paper, and suddenly realize that it's going to look really, really flimsy without some data and analysis. Not to mention, it's mighty hard to write 40 pages of case studies based on Google searches. Ideally, I'd have created and distributed surveys at some point in the last year, analyzed them, and had something more substantial to report. But that's almost dissertation-level annoying.
So I rummage through some online lists of "College towns" and compile a list of 100 candidates. We're not really on the list, so I throw in Albany for some local appeal. But here's the catch: I'm limiting it to single towns, so notions like the "Capital District" won't work. We have a few colleges in Albany (students are about 20% of our population), but lots more within a 10-mile sweep of here. I'm not looking at any of those; including them would take the whole thing up to a new level of complexity. But, the beauty of research is that all the stuff you didn't have time or motivation to do can be mentioned in the section entitled "Future Research". (Not quite that easy, but close enough for now.)
I did collect simple stuff, like number of residents, number of students, and population density. And latitude/longitude, in case I have to go back and do some GIS, which I'd rather not. Worst case scenario: throw it into a database, do a Cartesian join to make adjacency pairs of all combinations, calculate the distance between them with the aforementioned coordinates, and then just sort out by proximity.
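For what it's worth, that worst case doesn't strictly need a database. Here's a rough sketch of the same idea in Python: build every pair of towns, compute the great-circle distance from the coordinates, and sort by proximity. The three towns and their coordinates are approximate placeholders for illustration, not my actual dataset, and the function names are made up for this example. It also assumes a spherical Earth, which should be plenty for "which towns are near each other".

```python
import math
from itertools import combinations

# Approximate coordinates, for illustration only (not the real dataset).
towns = {
    "Albany, NY": (42.65, -73.76),
    "Ithaca, NY": (42.44, -76.50),
    "Amherst, MA": (42.37, -72.52),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth (R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def pairs_by_proximity(places):
    """Return every unordered pair of places, nearest pair first."""
    rows = [
        (a, b, haversine_km(*places[a], *places[b]))
        for a, b in combinations(places, 2)
    ]
    return sorted(rows, key=lambda row: row[2])

for a, b, km in pairs_by_proximity(towns):
    print(f"{a} <-> {b}: {km:.0f} km")
```

A true Cartesian join would give every ordered pair, including each town paired with itself; `combinations` skips the self-pairs and the mirror-image duplicates, which is what you'd filter out of the join anyway.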
Not as bad as it sounds, just a little annoying. I did databases for a decade, which included lots of auditing / data mining. Unfortunately, I'm too lazy to do linear regression (I don't think it'll be productive here), and opt instead for eyeballing scatter plots and doing chi-square probabilities.
Except I haven't done chi-square analysis since the new batch of students I'm getting next week were wearing diapers. Somehow, I found my old stats book in the basement, despite my various moves over the past dozen years. It was written pre-Excel, so it took some more Googling to find explanations connecting the Greek letters to spreadsheet functions. However, it still took me way, way too long to figure out whether the =CHITEST() result of .0007 was good or bad for hypothesis bashing.
Turns out, it was bad. Crap, I have nothing to write about. I just threw out seven pages of explanation about my results. There are no patterns in the data, and the best outcome was that there's a 69% chance that the relationship between free student buses and there being a Geography and Planning department is randomly distributed. Ideally, you'd want that to be 5% or less.
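For anyone else stuck translating a pre-Excel stats book, here's a minimal sketch of what I understand a chi-square test of independence to be doing on a 2x2 table (say, "has a Geography and Planning department" against "has free student buses"). The counts below are made up purely for illustration and are not my data. The function name is my own, and I'm assuming the plain Pearson version with no continuity correction, which may or may not match =CHITEST() exactly.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied,
    and every row and column total is assumed to be nonzero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we'd expect in this cell if the two factors were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, where the upper-tail
    # probability of the chi-square distribution has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts: rows = department yes/no, columns = free buses yes/no.
made_up_counts = [[12, 18], [25, 45]]
stat, p = chi_square_2x2(made_up_counts)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
print("below the 5% cutoff" if p <= 0.05 else "not below the 5% cutoff")
```

The p-value is the probability of seeing counts at least this lopsided if the two factors really were independent, so a small value (the usual cutoff is 5%) is evidence against pure chance, and a large one means the data can't rule chance out.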
But, on this gross level, there is absolutely no relationship (like a 0.1% chance of one) between some of the other combinations of factors I was testing, like city population density, percentage of students in the city, and a bunch of other factors. I had planned to create a more elaborate survey about parking spaces, mean distance to campus, etc. But now I suspect that I can just scrap that whole line of inquiry. Maybe it just won't tell me anything.
Or maybe that is the real result of all this. Maybe those surveys I had thought about are just a waste of time. You'd think that having a Geography and Planning dept around to influence free student busing would mean there'd be more of it. But there's no establishing cause/effect relationships here, just seeing how likely it is that the distribution is random. Still, it'll give me some data to bash around when I try to plan a new research angle. Like a more elaborate data model, and how to add a time component, among other things.
Now I know why dissertations using surveys take so damn long: low response rates and inconclusive results. Now I have to do another pilot study. Dammit, and I'm back to page 1, though with a good ten pages of charts, graphs, and calculations...
