   _____                   _                  _____            _____       _ 
  |     |___ _____ ___ _ _| |_ ___ ___ ___   |  _  |___ ___   | __  |___ _| |
  |   --| . |     | . | | |  _| -_|  _|_ -|  |     |  _| -_|  | __ -| .'| . |
  |_____|___|_|_|_|  _|___|_| |___|_| |___|  |__|__|_| |___|  |_____|__,|___|
  a newsletter by |_| j. b. crawford

>>> 2020-12-30 I trained a computer to do math

Here's an experiment: a briefer message about something of interest to me that's a little different from my normal fare.

A day or two ago I was reading about something that led me to remember the existence of this BuzzFeed News article, entitled "We Trained A Computer To Search For Hidden Spy Planes. This Is What It Found."

I have several niggles about this article, but the thing that really got me in a foul mood about it is their means of "training a computer." To wit:

Then we turned to an algorithm called the random forest, training it to distinguish between the characteristics of two groups of planes: almost 100 previously identified FBI and DHS planes, and 500 randomly selected aircraft.

The random forest algorithm makes its own decisions about which aspects of the data are most important. But not surprisingly, given that spy planes tend to fly in tight circles, it put most weight on the planes' turning rates. We then used its model to assess all of the planes, calculating a probability that each aircraft was a match for those flown by the FBI and DHS.

To describe this uncharitably: They wanted to identify aircraft that circle a lot, so they used machine learning, which determined that airplanes that circle a lot can be identified by how much they circle.

I try not to be entirely negative about so-called "artificial intelligence," but the article strikes me as a pretty depressing misapplication of machine learning techniques. They went into the situation knowing what they were looking for, and then used ML techniques to develop an over-complicated and not especially reliable way to run the heuristic they'd already come up with.

Anyway, this is an interesting problem for other reasons as well. The article "Police helicopters circling a city are environmental terrorism" makes a case for the harm caused by persistent use of aerial surveillance by police. Moreover, if you'd seen NextDoor around here, you'd know that the constant sound of the Albuquerque Police Department's helicopters is one of the greatest menaces facing our society. This increasingly common complaint has got some press, and although they're no longer keeping me up at night with loudspeaker announcements, the frequency with which helicopters circle over my house has been notably high.

Moreover, late last night I went for a walk and there was an APD helicopter circling over me the entire time. You know, being constantly followed by government helicopters used to be a delusion.

So, I decided to explore the issue a bit. I dropped a few dollars on FlightAware's API, which they excitedly call "FlightXML" even though it returns JSON by default[1], in order to retrieve the last week or so of flights made by all three of APD's aircraft[2]. I then trained a computer to identify circling.

No, actually, I wrote a very messy Python script that essentially follows the aircraft's flight track, dragging a 1.5 nm x 1.5 nm square (that's nautical miles) around as the aircraft bumps into the edges. Any time the aircraft spends more than six minutes in this moving bounding square, it deems the situation probable circling. Experimentally, I have found that these threshold values work well, although it depends somewhat on your definition of circling (I chose to tune it so that situations where the aircraft makes a single or only two revolutions are generally excluded). I plan to put this code up on GitHub but I need to significantly clean it up first or no one will ever hire me to do work on computers ever again.

On the upside, maybe recruiters will stop emailing me because they "loved what they saw on GitHub." Actually, maybe I should put it up right now, with a readme which declares it to be my best work and a specimen of what I can achieve for any employer who cold-calls me ever.
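In the meantime, the moving-box heuristic described above can be sketched in a few lines of Python. This is a minimal illustration and not the actual script. It makes several assumptions that the text does not spell out: the track is a time-sorted list of (unix seconds, latitude, longitude) tuples, positions are flattened to nautical miles with a simple local projection, the box starts centered on the first point, and the dwell timer restarts every time the box gets dragged. The names find_circling and slide are made up for this example.

```python
import math

BOX_NM = 1.5          # side of the square in nautical miles (from the text)
MIN_DWELL_S = 6 * 60  # dwell threshold in seconds (from the text)


def to_nm(lat, lon, ref_lat):
    """Flatten lat/lon to x/y in nautical miles (1 arc-minute of latitude is about 1 nm)."""
    return lon * 60.0 * math.cos(math.radians(ref_lat)), lat * 60.0


def slide(lo, value, size, eps=1e-9):
    """Slide the 1-D interval [lo, lo + size] just far enough to contain value.

    Returns the new lower edge and whether the interval had to move.
    """
    if value < lo - eps:
        return value, True
    if value > lo + size + eps:
        return value - size, True
    return lo, False


def find_circling(track, box_nm=BOX_NM, min_dwell_s=MIN_DWELL_S):
    """Return (start, end) time spans where the aircraft stayed inside the box.

    Assumption: a "stay" ends whenever the aircraft bumps an edge and drags
    the box, at which point the dwell timer restarts.
    """
    if not track:
        return []
    ref_lat = track[0][1]
    x, y = to_nm(track[0][1], track[0][2], ref_lat)
    # The box is stored as its lower-left corner, centered on the first point.
    box_x, box_y = x - box_nm / 2, y - box_nm / 2
    entered = last = track[0][0]
    spans = []
    for t, lat, lon in track[1:]:
        x, y = to_nm(lat, lon, ref_lat)
        box_x, moved_x = slide(box_x, x, box_nm)
        box_y, moved_y = slide(box_y, y, box_nm)
        if moved_x or moved_y:
            # The box was dragged: record the stay that just ended, if long enough.
            if last - entered >= min_dwell_s:
                spans.append((entered, last))
            entered = t
        last = t
    if last - entered >= min_dwell_s:
        spans.append((entered, last))
    return spans
```

Calling find_circling(track) on one flight's positions returns a list of (start, end) timestamps for probable circling. Both thresholds are keyword arguments, so they can be tuned to taste, which matters given how much the result depends on your definition of circling.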

You can see the result here. Incidents of circling actually seem more evenly distributed through the city than I had expected, although there is a notable concentration in the International District (which would be unsurprising to any Burqueño on account of longstanding economic and justice challenges in this area). Also interesting are the odd outliers in the far northwest, almost Rio Rancho, and the total lack of activity in the South Valley. I suspect this is just a result of where mutual aid agreements are in place: Bernalillo County has its own aviation department, but I don't think the Rio Rancho police do.

This is all sort of interesting, and I plan to collect more data over time (I only seem to be able to get the last week or so of tracks from FlightAware, so I'm just going to re-query every day for a few weeks to accumulate more). Maybe the result will be informative as to what areas are most affected, but I think it will match up with people's expectations.

On the other hand, it doesn't quite provide a full picture, as I've noticed that APD aircraft often seem to fly up and down Central or other major streets (e.g. Tramway to PdN) when not otherwise tasked. This may further complaints of low-flying helicopters from residents of the downtown area, but isn't quite circling. Maybe I need to train a computer to recognize aircraft flying in a straight line as well.

It would also be interesting to apply this same algorithm to aircraft in general and take frequent circling as an indicator of an aircraft being owned by a law enforcement or intelligence agency, which is essentially what BuzzFeed was actually doing. I made a slight foray into this; the problem is just that, as you would expect, it mostly identified student pilots. I need to add some junk to exclude any detections near an airport or practice area.
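The "junk" in question could be as simple as a distance check against a list of exclusion zones. The sketch below is one hypothetical way to do it, not something from the actual script: it assumes each detection has been reduced to a single (latitude, longitude) point, that each zone is a circle given as (latitude, longitude, radius in nautical miles), and it uses the standard haversine formula for distance. The function names and the zone format are invented for illustration.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius expressed in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in nautical miles (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def outside_zones(detections, zones):
    """Keep the (lat, lon) detections that fall outside every exclusion zone.

    zones is a list of (lat, lon, radius_nm) circles, for example one per
    airport or practice area (a hypothetical format for this sketch).
    """
    return [
        (lat, lon)
        for lat, lon in detections
        if all(distance_nm(lat, lon, zlat, zlon) > radius for zlat, zlon, radius in zones)
    ]
```

For example, a zone of roughly (35.04, -106.61, 3.0) would drop anything within about three nautical miles of the Albuquerque Sunport. Those coordinates are approximate and the radius is a guess that would need tuning. Practice areas are rarely neat circles, so a real filter would probably need polygons.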

Anyway, just a little tangent about something I've been up to (combined with, of course, some complaining about machine learning). Keep using computers to answer interesting questions, just please don't write a frustrating puff piece about how you've "trained a computer" to do arithmetic and branching logic[3].

With complete honesty, I hear a helicopter right now, and sure enough, it's APD's N120PD circling over my house. I need to go outside to shake my fist at the sky.

[1] I would have preferred to use ADSBExchange, but their API does not seem to offer any historical tracks. FlightAware has one of those business models that is "collect data from volunteers and then sell it," which I have always found distasteful.

[2] Some context here is that APD recently purchased a second helicopter (N125PD). This seems to have originally been positioned as a replacement for their older helicopter (N120PD), but in practice they're just using both now. This has furthered complaints since it feels a little bit like they pulled a ruse on the taxpayers by not selling the old one and instead just having more black helicopters on the horizon. This is all in addition to their fixed-wing Skylane (N9958H).

[3] I will, from here on, be proclaiming in work meetings that I have "trained a computer to rearrange the fields in this CSV file."