Saturday, August 25, 2007

College Town Scatter

I'm trying to finish the last of my papers this summer (well, I have one more to revise, but that'll be next week). Ironically, it's about transportation in college towns, now that all the college students are heading back into town for the start of classes on Monday.

So, I'm going through the paper, and suddenly realize that it's going to look really, really flimsy without some data and analysis. Not to mention, mighty hard to write 40 pages of case studies based on Google searches. Ideally, I'd have created and distributed surveys at some time in the last year, analyzed them, and have had something more substantial to report. But that's almost dissertation-level annoying.

So I rummage through some online lists of "College towns" and compile a list of 100 candidates. We're not really on the list, so I throw in Albany for some local appeal. But here's the catch: I'm limiting it to single towns, so notions like the "Capital District" won't work-- we have a few colleges in Albany (students are about 20% of our population), but lots more within a 10-mile sweep of here. I'm not looking at any of those. It just brings it up to a new level. But, the beauty of research is that all the stuff you didn't have time or motivation to do can be mentioned in the section entitled "Future Research". (Not quite that easy, but close enough for now.)

I did collect simple stuff, like number of residents, number of students, and population density. And latitude/longitude, in case I have to go back and do some GIS, which I'd rather not. Worst case scenario: throw it into a database, do a Cartesian join to make adjacency pairs of all combinations, calculate the distance between them with the aforementioned coordinates, and then just sort out by proximity.
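A minimal sketch of that worst-case plan in Python rather than SQL, assuming the haversine formula is accurate enough for town-to-town distances. The function names and the coordinates are illustrative assumptions, not data from the paper. `itertools.combinations` gives each unordered pair once, which is the useful half of a full Cartesian join.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def pairs_by_proximity(towns):
    """Return (distance_km, name_a, name_b) for every unordered pair, nearest first."""
    pairs = [
        (haversine_km(a[1], a[2], b[1], b[2]), a[0], b[0])
        for a, b in combinations(towns, 2)
    ]
    return sorted(pairs)

# Approximate coordinates, for illustration only.
towns = [
    ("Albany, NY", 42.65, -73.75),
    ("Davis, CA", 38.54, -121.74),
    ("Rochester, NY", 43.16, -77.61),
]
for dist, a, b in pairs_by_proximity(towns):
    print(f"{a} - {b}: {dist:.0f} km")
```

For 100 towns this is only 4,950 pairs, so brute force is fine.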

Not as bad as it sounds, just a little annoying. I did databases for a decade, which included lots of auditing / data mining. Unfortunately, I'm too lazy to do linear regression (I don't think it'll be productive here), and opt instead for eyeballing scatter plots and doing Chi Square probabilities.

Except I haven't done chi square analysis since my new batch of students next week were wearing diapers. Somehow, I found my old stats book in the basement, despite my various moves over the past dozen years. It was written pre-Excel, so some more Googling found connecting explanations between Greek letters and spreadsheet references. However, it still took me way, way too long to figure out whether the =CHITEST() result of .0007 was good or bad for hypothesis bashing.

Turns out, it was bad. Crap, I have nothing to write about. I just threw out seven pages of explanation about my results. There are no patterns in the data, and the best outcome was that there's a 69% chance that the relationship between free student buses and there being a Geography and Planning department is randomly distributed. Ideally, you'd want that to be 5% or less.
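For reference, here is a minimal sketch of the same kind of test outside a spreadsheet, assuming a 2x2 table (two yes/no factors) and the plain Pearson statistic with no continuity correction. The counts are made up for illustration and are not the paper's data. The returned p-value is the chance of seeing a table at least this lopsided if the two factors were unrelated, so a small value (5% or less) is what points to a relationship.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)), so no stats library is needed.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column factors were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical counts. Rows: has a planning department (yes, no).
# Columns: has free student buses (yes, no).
observed = [[12, 18],
            [25, 45]]
stat, p = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```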

But, on this gross level, there is absolutely no relationship (like a 0.1% chance of one) between some of the other combinations of factors I was testing, like city population density, percentage of students in the city, and a bunch of other factors. I had planned to create a more elaborate survey about parking spaces, mean distance to campus, etc. But now I suspect that I can just scrap that whole line of inquiry. Maybe it just won't tell me anything.

Or maybe that is the real result of all this. Maybe those surveys I had thought about are just a waste of time. Having a Geography and Planning dept to influence free student busing means there'll be more of it. But there's no setting cause/effect relationships here, just seeing how likely the distribution could be random. But, it'll give me some data to bash around when I try to plan a new research angle. Like a more elaborate data model, and how to add a time component, among other things.

Now I know why dissertations using surveys take so damn long: low response rates and inconclusive results. Now I have to do another pilot study. Dammit, and I'm back to page 1, though with a good ten pages of charts, graphs, and calculations...

Tuesday, August 21, 2007

Morning Coffee with GPS

Here's me drinking coffee with my loaner GPS Logger. Of course, it's not that simple to get this image. It involves:
  1. pushing button on GPS until it locks a satellite
  2. Buying coffee
  3. mixing in cream and sugar
  4. drinking coffee
  5. {wait 10 minutes}
  6. plug GPS into laptop
  7. start "Datalog" program
  8. download GPS data
  9. "save as" Google Earth KML file
  10. clicking on it
  11. Waiting a few minutes for everything to load
  12. zooming in on reading points, wondering where you were really standing when it shows you flying around all over the place
  13. trying to figure out what use you're really going to make of it all
Really, I had at least two large coffees, which might be another factor in the GPS "Error Budget" previously unaccounted for. Or, that I was sitting in a steel-roofed building with a big HVAC unit on top... Either way, the second coffee is probably why I'm still up at 3 am writing this. (Or maybe it was the fourth, or the diet coke that followed it.)

Monday, August 20, 2007

Fun with GPS - Getting your bearings

So after the last meeting in NYC for the GPS transit project, I found myself in Little Korea, since it's near Penn Station and I had over an hour to wait. And since we hadn't had it in a long time, I got Kimchee and ate a quick dinner at the pay-per-pound Korean cafeteria-type place on 32nd St, between 5th and 6th Ave.

Finally, 15 minutes before the train departure, I rushed down the escalator at Penn Station, carrying a bag of garlic and chili pickled Kimchee and a backpack full of GPS units. As I passed the heavily armed National Guardsmen, I realized that, maybe, I should slow down, and stop looking harried and sweating profusely while carrying lots of flashing lights in a backpack and a large, smelly bag of something largely unrecognizable.

So, upon returning, I found that the units hadn't been logging anything after all.

But I did go through old data, and took a shot at actually deciphering it. It's a trip, so to speak. Either these things are a bit inaccurate, or I'm Superman. Sometimes, it logs me flying around in circles, over buildings and trees. What I read is the location of where the signal hits, and if there's a building or tree, it reads where the signal bounced from, rather than where I am. And there are a lot of heavily overgrown trees in Albany, just as there are lots of tall buildings in NYC. But, once you accept that you're really on the sidewalk and not on top of a 30-story building, you're ok.

Then the fun part is this: the relational database doesn't work well for analyzing a lot of the data, since those are set-based, and you really need to know the readings in relation to adjacent records. You could do recursive joins based on prior records, but that's no picnic.
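Outside the database, pairing each reading with the one before it is much less painful. A minimal sketch, assuming the log has already been exported as a time-sorted list; the row layout (seconds, latitude, longitude) and the sample values are hypothetical.

```python
def pair_with_previous(readings):
    """Return a list of (previous, current) tuples for time-sorted readings."""
    return list(zip(readings, readings[1:]))

# Hypothetical log rows: (seconds since start, latitude, longitude).
log = [
    (0, 42.6526, -73.7562),
    (5, 42.6527, -73.7560),
    (10, 42.6529, -73.7557),
]
for prev, curr in pair_with_previous(log):
    elapsed = curr[0] - prev[0]
    print(f"{elapsed}s: ({prev[1]}, {prev[2]}) -> ({curr[1]}, {curr[2]})")
```

Each pair is the unit that any per-step calculation (elapsed time, distance, heading) would work on.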

Then the task of calculating "bearings" is a lot harder than at first glance. Namely, I want to know from a pair of readings whether I'm heading East, for instance. After some effort and the sad realization that my math sucks, I finally find it in Google. The formula is this, converted to Excel syntax:
=MOD(ATAN2(COS(LatA)*SIN(LatB) -SIN(LatA)*COS(LatB)* COS(LongB-LongA), SIN(LongB-LongA) *COS(LatB)),2*PI()) *180/PI()

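where LatA, LongA is the origin and LatB, LongB is the destination, all in radians (Excel's trig functions expect radians, so wrap degree values in RADIANS() first).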

Oh yea, the answers are also backwards, until you realize that Western hemisphere readings should be negative. Then North = 0, East = 90, South=180, West = 270.
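The same formula as a small Python function, as a sketch. The function name and the degree-based inputs are assumptions for illustration. One thing to watch when porting: Python's `math.atan2` takes (y, x), while Excel's ATAN2 takes (x, y), so the two arguments swap places.

```python
import math

def bearing_degrees(lat_a, lon_a, lat_b, lon_b):
    """Initial compass bearing from point A to point B, in degrees.

    Inputs are decimal degrees, with western longitudes negative.
    Result is in [0, 360): North = 0, East = 90, South = 180, West = 270.
    """
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    dlon = math.radians(lon_b - lon_a)
    y = math.sin(dlon) * math.cos(phi_b)
    x = (math.cos(phi_a) * math.sin(phi_b)
         - math.sin(phi_a) * math.cos(phi_b) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

# Heading from a point to one slightly north and east of it.
print(bearing_degrees(42.6526, -73.7562, 42.6536, -73.7552))
```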

Of course, the readings still skip all over the place, but when you realize that road alignments are going to be roughly at 90 degree angles, along some kind of offset, it's slightly easier. I guess.

Wednesday, August 15, 2007

Two down... one to go


I submitted my second paper for the summer. One left, to finish in the next two weeks. The long one, about comparing transportation systems in college towns. (The start of a new paper is the fun part for me, believe it or not. Maybe I'm just not a finisher by nature...)

UC Davis is really cool. They have a dome house commune. And a student-run bus service that they share with the rest of the community. And their own student wiki, no less, founded in 2004, not even a year before our own ill-fated department wiki at school.

The creator of the DavisWiki went on to make one about Rochester, NY. I never thought I'd have WikiEnvy. (Or bus envy, for that matter.)

But it makes me wonder: is a collaborative, activist culture that would start their own community bus service the same one that would actively embrace a wiki? When I did my original transit web site, transit information and a wiki seemed like a natural fit to build a community information resource that included transportation. Is this the case in other college towns?

Sunday, August 12, 2007

Overly Subjective Scholarship (and other alternatives to OSS)

Just trying to nail the lid on another research paper. It feels like an uphill fight, having to throw out sentences, move sections, and once in a while, I manage to write something I didn't already say two pages back. It's about Open Source Software.

I don't know why this is so hard to write.

I don't know why so many of my journal articles seem to miss the whole f-ing point about Open Source software. Then again, I used to write code for almost a decade. I have lots of first-hand knowledge about how much it sucks to code for The Man.

Seriously, it's really about trying to get the other guy to write some of the code for you, so you get something better for free (except for the labor and emotional costs). And sometimes, you just write the thing yourself, because other people aren't as smart as you, and other developers' bugs are a whole lot uglier than your own.

Seriously.

Thursday, August 09, 2007

Coffeehouse chat with one of my advisors

I was just thinking about the idea of insurance, government bailouts, and moral hazards. Normally one thinks of moral hazard as a bad thing, but given the costs of securing against low probability events, it may not always make sense to try to prevent property value destruction through more expensive construction, but rather spend money on responsiveness to failure / damage.

Can we anticipate up front all possible sources of threats? I worked in the WTC in the late 90's, where they were responding to the first WTC bombing by blocking road access, stationing guards, inspecting packages, securing elevators, removing all trash bins, bombproofing the mailboxes, and closing the underground parking garage, and had (I believe) upgraded their ventilation systems to rapidly dissipate a poison gas attack. (I was told they could change the air in the towers in 90 seconds). Then, hijacked planes were flown into the buildings.

Given the recent news about infrastructure vulnerabilities, information is highly critical, and investing more in smarter/faster responses to system failures might save more lives than simply trying to construct against all possible sources of failure.

Also, I wonder whether a distributed peer architecture for emergency notifications would make more sense. The message size to be transmitted is tiny in comparison to the number of requests, and I'm wondering if there's a good means of broadcast-only text communications to cell users within a geographic region. Part of the value of information is in reducing uncertainty, and reducing stress.

Maybe when I get more headway on my dissertation proposal, I'll have something more coherent to say about that...

Saturday, August 04, 2007

Overcaffeinated Brain Part II

It just occurred to me-- with the interest in geothermal heating and cooling for energy conservation, has it ever occurred to anyone to use the temperature difference between the outside air and water / sewage systems to help heat/cool houses? The fact that both systems are piped through the ground should help sink whatever waste heat that results from our use... and it won't require much change to the infrastructure.

If you use water in your building, who needs to pump water 400 ft underground to dissipate heat? You're already going to just flush waste water down the sewer anyway... why not use it to dissipate unwanted heat as well? It'll cool off underground if it needs to.

Aw- just a thought. That'll look really dumb when I read it again tomorrow.

Open Sourced Ranting

I just wrote this in some over-caffeinated stream-of-consciousness. I have to delete it from the paper, but WTH, I'll just post it here. No claim will be made regarding its suitability for any purpose. You're free to use it and modify it as you see fit, just so long as you don't charge money for it or obtain any other compensation without my permission. (i.e. you can go copyleft it.)

----

One famous case in complimentary software was Microsoft Corporation’s decision to bundle, or provide complimentary copies of, the web browser Internet Explorer with its Windows operating systems. Until this point, web browsers had been separate commercial products which were distributed for a charge. An anti-trust suit ultimately was initiated in part because of this bundling of a free (complimentary) product that directly competed with another commercial product released by Netscape Corporation, namely its Netscape browser.

However, Microsoft has in turn fought back against Open Source offerings that compete with its own products, such as Linux and OpenOffice, funded in part by its competitors IBM and Sun Microsystems, respectively. Though Microsoft asserted that Internet Explorer, which it had bundled (complimentarily) with its operating system, was essential to it, it has yet to clarify how an application whose primary purpose is to render HTML documents into a user-friendly presentation would be essential to the task of listing the content of file directories. (In fact, it accomplishes this task poorly, in that web browsers to this day are ill-suited to displaying the tree-structured data that is the essence of how PC file management systems have worked for nearly three decades.)

However, IBM, SAP, and Sun Microsystems have in turn supported Open Source software products that compete against the bread and butter of Microsoft’s revenue streams: operating systems, database servers, and Office software, though these products are just as essential to their project sponsors’ business strategies as Internet Explorer was to Microsoft’s.

Turnabout is fair play. Or: as you make your bed, so shall you sleep. Or: tough cookies.

Friday, August 03, 2007

GPS will be cool... someday

I finally managed to get my first set of logs from my GPS unit and plot them on Google Earth. Unfortunately, I didn't realize that you have to "Resume Logging" to collect more data, so I don't have anything for Albany yet, just lower Manhattan, and the Penn Station (Empire State Building) neighborhood. The GPS looks like a closed clamshell cell phone, but only has one button and three LED indicators. There's not much you can tell about the unit's status without plugging it into your computer and using the problematic software to extract data.

I suspect it'll take a while before a good code base for GPS units is built, to make it as easy to connect as a digital camera. But, there's already a bit of Open Source work being done to build universal drivers, so there's hope yet... The above (right) photo was taken from my first digital camera (circa Summer 2003, before moving back to Albany) that had some non-standard memory management and likewise had a really, really clunky driver/interface... I should check sometime whether there's a new OSS driver for it as well, when I'm not busy.

Like my camera, I'd like it to have a removable Memory Stick (or SD card if you're not a SONY household). That way, you'd be able to just swap cards when the memory is full, and read it from some other device-- not even necessarily your computer, but maybe a PDA or something more specialized. (Long Live SneakerNet!) The GPS is the most accurate time-keeping device you can have, so you can combine it with your digital photo timestamps, getting a better match between the place and time you took each photo. Or your call records. Or the nearest bus/taxi/mailbox/caffeine/whatever. Or a bunch of other stuff. Eventually, once the software gets a lot better-- if there are open (source) standards to allow interoperability.
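The photo-matching idea is easy to sketch once both devices export timestamps. A minimal example, assuming the track is a time-sorted list of (Unix seconds, latitude, longitude) and that the camera clock has already been corrected to GPS time; the function name and sample values are hypothetical.

```python
from bisect import bisect_left

def nearest_fix(track, photo_time):
    """Return the track point whose timestamp is closest to photo_time.

    track: non-empty list of (unix_seconds, lat, lon), sorted by time.
    """
    times = [t for t, _, _ in track]
    i = bisect_left(times, photo_time)
    # Only the points just before and just after photo_time can be nearest.
    candidates = track[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p[0] - photo_time))

# Hypothetical track and photo timestamp.
track = [
    (1000, 40.7506, -73.9935),
    (1060, 40.7484, -73.9857),
    (1120, 40.7478, -73.9870),
]
print(nearest_fix(track, 1050))
```

The binary search keeps each lookup cheap even for a long log, which matters if the unit records a point every few seconds.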

Now to work on the IT plumbing for my own tests: the MySQL database, and some interface for Google's KML files. I'm thinking about a web-based upload tool to dump in data remotely. Not like I have much else to do this month. :)
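As a starting point for the KML side, here is a minimal sketch that turns a list of points into a one-line-track KML document. The function name is made up, the namespace shown is the OGC KML 2.2 one (an assumption about which version the viewer expects), and the sample points are illustrative. The main gotcha is that KML wants longitude before latitude.

```python
from xml.sax.saxutils import escape

def track_to_kml(name, points):
    """Build a minimal KML document containing one LineString.

    points: iterable of (lat, lon) in decimal degrees.
    KML writes each coordinate as lon,lat, so the order is swapped here.
    """
    coords = " ".join(f"{lon},{lat}" for lat, lon in points)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        '  <Placemark>\n'
        f'    <name>{escape(name)}</name>\n'
        '    <LineString>\n'
        f'      <coordinates>{coords}</coordinates>\n'
        '    </LineString>\n'
        '  </Placemark>\n'
        '</kml>\n'
    )

# Hypothetical two-point track.
kml = track_to_kml("Test walk", [(42.65, -73.75), (42.66, -73.76)])
print(kml)
```

Saving the returned string to a `.kml` file should be enough for Google Earth to open it; a database export would only need to supply the point list.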

Wednesday, August 01, 2007

It's now August...

Looking over this, I noticed that somehow I got the month wrong... even the headline proudly declared June 22nd... when in fact it's no longer even July. Wow, the first time I was ever off by a month!

I took a daytrip to NYC on Monday for a meeting. I'm now on a research project for using GPS to monitor the travel habits of some volunteers. The point is to determine where people go, and how accurately they self-report their travels. When I coax the data out of my test GPS, I'll post it.

So far, the data it output reported me having breakfast in Taipei before the meeting near the South Seaport. But it did capture my little walk around Penn Station afterwards while I was waiting for my train. (There's a little Korean cafeteria/buffet on 32nd Street I like-- you pay by the pound, and you can get a pretty good variety for under $6. I used to do that after work when I was on my own for dinner.)

The GPS unit seems to have died after getting on the train back to Albany... there's no data after entering Penn Station. Today, I planned to take it out for another test run, but I'm staying home most of the day-- especially after realizing that it's not only not June, but now August, and my to-do list from May hasn't gotten much shorter.