In this blog I’ve often tried to emphasise how important it is to grasp the basic principles of investigation and learn how to apply the right method to a problem instead of just relying on a few tools to do the job. There are many reasons for this but the main reason is that almost all the tools and techniques we like to use are relatively short-lived and become obsolete quickly. The demise of Facebook’s graph search is a well known recent example among many.
That said there are some very old techniques that aren’t likely to disappear soon and when combined with modern technology they are a useful part of any OSINT investigator’s arsenal. The mathematical logic behind Boolean Searching dates back to the 19th Century and it underpins modern computer technology, programming, and web searches, among other things. It also provides a logical structure that can be used to evaluate propositions and determine whether something is true or false, which is ultimately the purpose of investigations of any kind.
The rest of this post will look at how to use Boolean-style logic to solve investigative challenges and how it can also be applied to conduct highly targeted Google Image searches. I’ll be using Philipp Dudek’s recent Quiztime challenge as an example.
OR, AND, NOT
The goal of any investigation, no matter how big or small, is to arrive at a series of propositions that can be shown to be true, with any incorrect propositions having been eliminated and shown to be false. For example if I investigate a bank robbery and the witnesses describe the suspect as a white man, aged under 25, with a blue jacket and local accent, then I know that the suspect will be in a pool of people for whom all those conditions are true. If any of those conditions are not true of a suspect, then they can be eliminated. (This ignores the issue of witness reliability and other real-world factors, but I’m just illustrating a point).
So if I had a magic database that I could search to find who committed the bank robbery, I would use Boolean logic to set my parameters to find the suspect. The search parameters would be something like this:
white male AND under 25 AND blue jacket AND local accent
Assuming that every possible person was in my magical database, then the suspect would be in the pool of people that the search brought back. Anyone who did not meet those criteria could be eliminated.
In Google searches the AND operator (more commonly displayed as a + symbol) is used to achieve the same result. Searching for white male AND under 25 will bring back different search results to just white male under 25. Combining multiple AND queries will only bring back search results where all the criteria are true. It is easy to see how the more true criteria that can be applied to a search, the more likely it will be that we will find the specific information that we are looking for.
Now supposing that I spoke to another witness to the bank robbery. This witness does not think that the suspect wore a blue jacket, and she is fairly certain that it was a green jacket. This complicates my search a little bit, because now it is no longer possible to be certain that the suspect wore a blue jacket. This causes a little uncertainty and means that I have to change the search query on my magical database a little bit. Enter the OR operator.
The Boolean OR operator is helpful to construct search terms when there are several possible answers to a query. Now that there is a little bit of uncertainty about what colour jacket my suspect was wearing, I can amend the search by adding an OR operator:
white male AND under 25 AND local accent AND (blue jacket OR green jacket)
So now this search would return details for suspects who wore blue jackets as well as green jackets. For investigative purposes, OR operators allow a little bit of uncertainty to be factored in to search parameters.
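The blue and green jacket alternatives in that query are meant to be read together as a single condition. In most Boolean systems, and in Python, AND binds more tightly than OR, so it is safest to bracket the alternatives. Here is a minimal sketch of the difference. The person record and its field names are invented purely for illustration:

```python
# Hypothetical record: an older man with a non-local accent in a green jacket.
person = {
    "white_male": True,
    "under_25": False,
    "local_accent": False,
    "jacket": "green",
}

# The conditions that every witness agrees on.
base = person["white_male"] and person["under_25"] and person["local_accent"]

# Without brackets Python reads this as: (base AND blue) OR green
ungrouped = base and person["jacket"] == "blue" or person["jacket"] == "green"

# With brackets: base AND (blue OR green)
grouped = base and (person["jacket"] == "blue" or person["jacket"] == "green")

print(ungrouped)  # True: matched on the green jacket alone
print(grouped)    # False: fails the age and accent conditions
```

Without the brackets, anyone in a green jacket ends up in the pool regardless of age or accent, which is not what the witnesses described.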
A search like this would still leave too many people in my pool of suspects. I need to add more parameters to narrow down the results. The more conditions I can add that are true of the suspect, the higher my chance of finding him. I decide to question the witnesses again to get more information. They all agree that the suspect was very small in height for an adult man, certainly no more than 160cm tall. This allows me to tweak the search in my magical database a little more and introduce the Boolean NOT operator. This is used to exclude any results that match a given condition. I could apply it to the magic search engine query as follows:
white male AND under 25 AND local accent AND (blue jacket OR green jacket) NOT taller than 160cm
By adding the NOT operator to the OR operator I already have in place, I can eliminate every suspect over 160cm tall and make my pool of suspects even smaller (no pun intended).
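To make the idea concrete, here is a rough Python sketch of what the magic database query might look like as a filter. Everything in it is an assumption made up for illustration: the `Person` record, its field names, the `matches` function and the sample people do not come from any real system.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    white_male: bool
    age: int
    local_accent: bool
    jacket: str
    height_cm: int


def matches(p: Person) -> bool:
    """Return True only if every witness condition holds for this person."""
    return (
        p.white_male
        and p.age < 25
        and p.local_accent
        and (p.jacket == "blue" or p.jacket == "green")  # OR: either colour is acceptable
        and not p.height_cm > 160                        # NOT: exclude anyone taller than 160cm
    )


# Invented sample data standing in for the magic database.
people = [
    Person("A", True, 22, True, "blue", 158),
    Person("B", True, 23, True, "green", 182),  # too tall
    Person("C", True, 41, True, "green", 159),  # too old
    Person("D", True, 19, True, "green", 160),
    Person("E", True, 24, True, "red", 155),    # wrong jacket colour
]

pool = [p.name for p in people if matches(p)]
print(pool)  # ['A', 'D']
```

Each extra condition removes people from the pool, and the OR clause is the only place where two different answers are both allowed through.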
So the more facts we know must be true of a subject in an investigation, the more criteria we can apply to eliminate incorrect answers and find the correct one. Boolean operators are the best method for doing this, either when conducting search queries or more generally when thinking through a problem or conducting something like gap analysis. It helps keep an investigation focused and stops people from guessing or just making things up.
In the rest of this post I will show how to apply this idea and construct a Boolean search term that will find the answer to a photo geolocation challenge.
The Challenge
This geolocation challenge took the form of a short video recording posted by Philipp Dudek. Here’s the original tweet containing the link to the video:
Welcome to the #fridayquiz. Where was this video taken?
🤝 Reply all = collaboration
👍 Msg/reply to me = answers
🔃 Retweet = invite others
👣 Follow @quiztime = more #verification pic.twitter.com/6ymZoJiKcW— Philipp Dudek 🕴 (@dondude) August 30, 2019
It’s important to listen to the sound as well as watching the video before deciding what criteria must be true of this location, and therefore what Boolean search terms have to be used to find it. I should stress that this method is not the only way to solve this particular challenge, but if you understand the underlying logic it will be easier to apply it to other scenarios when you get a little stuck.
Which Search Criteria To Use?
At first it is not entirely clear what this is. Philipp is moving along a tunnel of some kind. Could it be a train tunnel, or an underground station? There is a rhythmic mechanical sound as he moves along – could it be a train, a lift, a travelator or an escalator perhaps?
The only visually distinguishing feature is the dot pattern all over the wall. Is this enough to geolocate Philipp? It will be in the end, but the more parameters we can add, the more certain we can be. From previous quizzes we know Philipp lives in Germany, so we could add ‘Germany’ as one of the search parameters too.
So a search term that takes all these variables into account would look something like this. I’ve substituted the + symbol for AND, but it means the same thing.
Germany + tunnel OR escalator OR elevator + dots
This search combines criteria about the location that we can be fairly certain are true (Germany, dots) and then uses the OR operator to search for things that are likely to be true of the correct location, but where there is still a little doubt (tunnel, elevator, escalator etc). The search results will show matches for these parameters, and therefore the correct result should be in there somewhere.
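If you build search terms like this often, the structure can be captured in a few lines of Python. This is only a sketch: `build_query` is a hypothetical helper written for this post, and it simply reproduces the layout of the search term above, with the certain terms joined by + and the uncertain alternatives joined by OR. How a given search engine actually interprets + and OR is up to that engine.

```python
def build_query(required, alternatives):
    """Lay out a search term as: first required term + alternatives + remaining required terms."""
    first, *rest = required
    parts = [first, " OR ".join(alternatives), *rest]
    return " + ".join(parts)


# Terms we are fairly sure of, and terms where there is still some doubt.
certain = ["Germany", "dots"]
uncertain = ["tunnel", "escalator", "elevator"]

query = build_query(certain, uncertain)
print(query)  # Germany + tunnel OR escalator OR elevator + dots
```

Keeping the certain and uncertain terms in separate lists makes it easy to swap a single term later when a hypothesis needs tightening.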
Google Images or Yandex?
The choice of search engine can make a difference when conducting Boolean image searches of this kind. Usually I prefer Yandex for image searches because it works very well for finding visual matches and so is useful for geolocation. Google Image search is generally better at determining what kind of object the image shows, and then suggesting matches based on the relevant keywords.
Consider this recent image location challenge by Paul Fennell:
When I conduct a reverse image search with Yandex, it immediately finds a match for the photo:
However Google does not find a match for the image. Instead it works out that the image is of a statue, and then presents some search results which it thinks are relevant based on the fact that the object is a statue:
This is a significantly different way of presenting image search results. It’s why Yandex is usually more reliable for direct reverse image matches, but in this case we are going to do an image search based on a Boolean keyword list, so Google Image search is a much better tool for finding the images we are looking for.
Results
So entering the search term Germany + tunnel OR escalator OR elevator + dots gives the following results:
On the first page there is a strong visual match for the location in Philipp’s video. The Elbphilharmonie in Hamburg:
It looks like it could be the right location. If Philipp was filming as he went along this escalator, it would be a pretty good match. But is there a way to make sure? At this point it would be possible just to find lots of images of the Elbphilharmonie for a direct comparison, but that is not always possible, and then it becomes necessary to tweak the Boolean search a little. The initial parameter in the search was “Germany”. What happens if we tighten it up a bit and replace “Germany” with “Hamburg”?
Even better. By refining the parameters to check the hypothesis that the location is in Hamburg, there are now additional matches for the Elbphilharmonie. The first result is from a video that we can watch to verify that this is the same place that Philipp was travelling along. Perfect!
I found this post interesting enough that I read it completely despite having had it come up quite tangentially, the unexpected result of my efforts to identify which stock photo services allow users to utilize Boolean searches.
In my case, I am looking for an image depicting the phrase “needle in a haystack” for my blog post, “The Needles Haven’t Gotten Any Smaller, the Haystacks Have Just Gotten Bigger: How the Quantity of Content Has Muddied the Waters of Quality.”
Unfortunately, the usual suspects are all returning photos of either needles or haystacks, but not of those two things together.
OK. A simple AND between the terms, and we should be good… Not so fast. I am in low-grade shock, but my modifier didn’t change my results!
Then I figured it was just dumb programming, so I tried “needle” AND “haystack” figuring that would surely do it. Nope.
It is unreal to me that these companies, which are essentially in business to manage giant databases, have not incorporated this SUPREMELY OBVIOUS tool into their efforts. In fact, as of this moment, I figure it must be user error at play, for though that is hard for me to believe/accept, that makes more sense to my rational brain than the explanation that the Adobe people have missed this.
Off to Google Image search, it seems. Wish me luck!
Cheers!
Kenneth Daniels
Keyway Insights
PS – Two quick things:
1. I didn’t know what OSINT was an initialism for, and since that ignorance did not compromise my understanding, I didn’t bother looking it up until just now. I wonder if the fact that it isn’t mentioned is an oversight, or if it is something y’all are fine with simply because more or less everyone who comes to the site already knows.
In any event, in my experience writing (I have a journalism degree and write professionally), the first mention would go like this… Open-Source Intelligence (OSINT).
2. I had an intriguing little idea flash across my brain recently, and it was gone as quickly as it had come, likely never to reemerge. But the above mention of Dr. Fennell’s “image location challenge” brought it back to me…
Do you know, are there any existing contests in which participants use Internet search tools to find a predetermined bit of obscure information in the least amount of time?
I have seen events in which people wrap unusual items with Christmas paper (not surprisingly a tape company was the sponsor) head-to-head, so it is not beyond possibility. In fact, were I a member of Microsoft’s marketing department, I would absolutely sponsor such a thing to promote Bing. Or if I was responsible for getting the relaunch of Dogpile some much needed attention, this would be a great way to go about it. And doing so would be fairly easy and inexpensive, after all, as it is an entirely digital affair.
The organizers could set it up such that it would be conducted internationally. They would simply model the event after existing sporting tournaments, holding qualifiers to be followed by the traditional bracketed match-ups, all leading to a live-streamed head-to-head final battle of the World’s Best Data-Diggers.
This would have wide appeal in our digital world, and be such a breeze to promote. It would produce oodles of statistics, and the “play at home” component is a marketer’s dream.
Come to think of it, the whole thing reminds me of the game from back in the Internet’s infancy where we would compete to see who could do a web search which resulted in a single result the quickest. Good times, though it would probably be a virtual impossibility in today’s world!
Anyway, seems to me this concept might be worth looking into. It might not be lucrative (as always that would depend on the execution), but if nothing else, it could be fun.
Maybe stop by and check out my blog. It has a number of interesting topics, at least I would like to think it does, and feedback & shares are always appreciated!
Musings on Marketing and More
https://kennethdaniels.substack.com/