One Search To Rule Them All – Boolean Searches For Images
2nd September 2019
In this blog I’ve often tried to emphasise how important it is to grasp the basic principles of investigation and learn how to apply the right method to a problem instead of just relying on a few tools to do the job. There are many reasons for this but the main reason is that almost all the tools and techniques we like to use are relatively short-lived and become obsolete quickly. The demise of Facebook’s graph search is a well known recent example among many.
That said there are some very old techniques that aren’t likely to disappear soon and when combined with modern technology they are a useful part of any OSINT investigator’s arsenal. The mathematical logic behind Boolean Searching dates back to the 19th Century and it underpins modern computer technology, programming, and web searches, among other things. It also provides a logical structure that can be used to evaluate propositions and determine whether something is true or false, which is ultimately the purpose of investigations of any kind.
The rest of this post will look at how to use Boolean-style logic to solve investigative challenges and how it can also be applied to conduct highly targeted Google Image searches. I’ll be using Philipp Dudek’s recent Quiztime challenge as an example.
OR, AND, NOT
The goal of any investigation, no matter how big or small, is to arrive at a series of propositions that can be shown to be true, with any incorrect propositions having been eliminated and shown to be false. For example if I investigate a bank robbery and the witnesses describe the suspect as a white man, aged under 25, with a blue jacket and local accent, then I know that the suspect will be in a pool of people for whom all those conditions are true. If any of those conditions are not true of a suspect, then they can be eliminated. (This ignores the issue of witness reliability and other real-world factors, but I’m just illustrating a point).
So if I had a magic database that I could search to find who committed the bank robbery, I would use Boolean logic to set my parameters to find the suspect. The search parameters would be something like this:
white male AND under 25 AND blue jacket AND local accent
Assuming that every possible person was in my magical database, then the suspect would be in the pool of people that the search brought back. Anyone who did not meet those criteria could be eliminated.
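The magic database does not exist, but the AND logic can be sketched in a few lines of Python. The records and field names here are invented purely for illustration; only the four criteria come from the witness description.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    white_male: bool
    age: int
    jacket: str
    local_accent: bool


# Hypothetical records, made up for this example.
PEOPLE = [
    Person("A", True, 22, "blue", True),
    Person("B", True, 31, "blue", True),    # too old
    Person("C", True, 19, "green", True),   # wrong jacket colour
    Person("D", True, 24, "blue", False),   # no local accent
]


def matches_all(p: Person) -> bool:
    # white male AND under 25 AND blue jacket AND local accent
    return p.white_male and p.age < 25 and p.jacket == "blue" and p.local_accent


# Only people for whom every condition is true stay in the pool.
suspects = [p.name for p in PEOPLE if matches_all(p)]
print(suspects)
```

Every person who fails even one condition drops out, which is exactly what chaining AND operators does.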
In Google searches the AND operator (more commonly displayed as a + symbol) is used to achieve the same result. Searching for white male AND under 25 will bring back different search results to just white male under 25. Combining multiple AND queries will only bring back search results where all the criteria are true. It is easy to see how the more true criteria that can be applied to a search, the more likely it will be that we will find the specific information that we are looking for.
Now supposing that I spoke to another witness to the bank robbery. This witness does not think that the suspect wore a blue jacket, and she is fairly certain that it was a green jacket. This complicates my search a little bit, because now it is no longer possible to be certain that the suspect wore a blue jacket. This causes a little uncertainty and means that I have to change the search query on my magical database a little bit. Enter the OR operator.
The Boolean OR operator is helpful to construct search terms when there are several possible answers to a query. Now that there is a little bit of uncertainty about what colour jacket my suspect was wearing, I can amend the search by adding an OR operator:
white male AND under 25 AND local accent AND (blue jacket OR green jacket)
So now this search would return details for suspects who wore blue jackets as well as green jackets. For investigative purposes, OR operators allow a little bit of uncertainty to be factored in to search parameters.
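One detail worth being careful about is grouping. The intended meaning is that the two jacket colours are alternatives to each other while all the other criteria still apply. The Python sketch below, using invented records, shows what happens when the alternatives are not grouped. Search engines have their own rules for how they group terms, so this only illustrates the general logic.

```python
# Hypothetical suspects; all values are invented for illustration.
people = [
    {"name": "A", "white_male": True, "age": 22, "local_accent": True, "jacket": "blue"},
    {"name": "C", "white_male": True, "age": 19, "local_accent": True, "jacket": "green"},
    {"name": "E", "white_male": True, "age": 50, "local_accent": False, "jacket": "green"},
    {"name": "F", "white_male": True, "age": 23, "local_accent": True, "jacket": "red"},
]


def grouped(p):
    # white male AND under 25 AND local accent AND (blue jacket OR green jacket)
    return (p["white_male"] and p["age"] < 25 and p["local_accent"]
            and (p["jacket"] == "blue" or p["jacket"] == "green"))


def ungrouped(p):
    # Without brackets Python evaluates "and" before "or", so this reads as
    # (all the other criteria AND blue jacket) OR green jacket.
    return (p["white_male"] and p["age"] < 25 and p["local_accent"]
            and p["jacket"] == "blue" or p["jacket"] == "green")


grouped_names = [p["name"] for p in people if grouped(p)]
ungrouped_names = [p["name"] for p in people if ungrouped(p)]
print(grouped_names)
print(ungrouped_names)
```

In the ungrouped version, person E gets through on the strength of a green jacket alone, despite being 50 years old with no local accent.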
A search like this would still leave too many people in my pool of suspects. I need to add more parameters to narrow down the results. The more true conditions I can add, the higher my chance of finding the suspect. I decide to question the witnesses again to get more information. They all agree that the suspect was very short for an adult man, certainly no more than 160cm tall. This allows me to tweak the search in my magical database a little more and introduce the Boolean NOT operator. This is used to exclude any results that match a given condition. I could apply it to the magic search engine query as follows:
white male AND under 25 AND local accent AND (blue jacket OR green jacket) NOT taller than 160cm
By adding the NOT operator to the OR operator I already have in place, I can eliminate every suspect over 160cm tall and make my pool of suspects even smaller (no pun intended).
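As a rough Python sketch, with invented records and heights, the NOT condition is simply one more test that every remaining suspect has to pass:

```python
# Hypothetical suspects; names and measurements are made up for this example.
people = [
    {"name": "A", "white_male": True, "age": 22, "local_accent": True,
     "jacket": "blue", "height_cm": 158},
    {"name": "C", "white_male": True, "age": 19, "local_accent": True,
     "jacket": "green", "height_cm": 175},
    {"name": "G", "white_male": True, "age": 24, "local_accent": True,
     "jacket": "green", "height_cm": 160},
]


def matches(p):
    return (p["white_male"] and p["age"] < 25 and p["local_accent"]
            and p["jacket"] in {"blue", "green"}   # blue jacket OR green jacket
            and not p["height_cm"] > 160)          # NOT taller than 160cm


remaining = [p["name"] for p in people if matches(p)]
print(remaining)
```

Person C met every earlier condition but is excluded by the height test, while someone who is exactly 160cm tall stays in the pool because they are not taller than 160cm.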
So the more facts we know must be true of a subject in an investigation, the more criteria we can apply to eliminate incorrect answers and find the correct one. Boolean operators are the best method for doing this, either when conducting search queries or more generally when thinking through a problem or conducting something like gap analysis. It helps keep an investigation focused and stops people from guessing or just making things up.
In the rest of this post I will show how to apply this idea and construct a Boolean search term that will find the answer to a photo geolocation challenge.
Welcome to the #fridayquiz. Where was this video taken?
🤝 Reply all = collaboration
👍 Msg/reply to me = answers
🔃 Retweet = invite others
👣 Follow @quiztime = more #verification pic.twitter.com/6ymZoJiKcW
— Philipp Dudek 🕴 (@dondude) August 30, 2019
It’s important to listen to the sound as well as watching the video before deciding what criteria must be true of this location, and therefore what Boolean search terms have to be used to find it. I should stress that this method is not the only way to solve this particular challenge, but if you understand the underlying logic it will be easier to apply it to other scenarios when you get a little stuck.
Which Search Criteria To Use?
At first it is not entirely clear what this is. Philipp is moving along a tunnel of some kind. Could it be a train tunnel, or an underground station? There is a rhythmic mechanical sound as he moves along – could it be a train, a lift, a travelator or an escalator perhaps?
The only visually distinguishing feature is the dot pattern all over the wall. Is this enough to geolocate Philipp? It will be in the end, but the more parameters we can add, the more certain we can be. From previous quizzes we know Philipp lives in Germany, so we could add ‘Germany’ as one of the search parameters too.
So a search term that takes all these variables into account would look something like this. I’ve substituted the + symbol for AND, but it means the same thing.
Germany + tunnel OR escalator OR elevator + dots
This search combines criteria about the location that we can be fairly certain are true (Germany, dots) and then uses the OR operator to search for things that are likely to be true of the correct location, but where there is still a little doubt (tunnel, elevator, escalator, etc.). The search results will show all matches that are true for these parameters, and therefore the correct result should be in there somewhere.
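If you build searches like this often, the same structure can be assembled programmatically. The following is a minimal Python sketch: `build_query` is a helper name I have made up, and the `tbm=isch` parameter is an assumption about how Google's image search URL is formed, which may change at any time.

```python
from urllib.parse import urlencode


def build_query(parts):
    """Join search parts with +; a list of alternatives becomes an OR group."""
    rendered = []
    for part in parts:
        if isinstance(part, str):
            rendered.append(part)               # a term we are fairly sure of
        else:
            rendered.append(" OR ".join(part))  # terms where there is some doubt
    return " + ".join(rendered)


query = build_query(["Germany", ["tunnel", "escalator", "elevator"], "dots"])

# Assumed URL format for a Google Images search; not guaranteed to stay valid.
url = "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})

print(query)
print(url)
```

Keeping the certain terms and the uncertain alternatives as separate parts makes it easy to see which assumptions a search depends on.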
Google Images or Yandex?
The choice of search engine can make a difference when conducting Boolean image searches of this kind. Usually I prefer Yandex for image searches because it works very well for finding visual matches and so is useful for geolocation. Google Image search is generally better at determining what kind of object the image shows, and then suggesting matches based on the relevant keywords.
Consider this recent image location challenge by Paul Fennell:
When I conduct a reverse image search with Yandex, it immediately finds a match for the photo:
However Google does not find a match for the image. Instead it works out that the image is of a statue, and then presents some search results which it thinks are relevant based on the fact that the object is a statue:
This is a significant difference in the way the two search engines present image search results. It’s why Yandex is usually more reliable for direct reverse image matches, but in this case we are going to do an image search based on a Boolean keyword list, so Google Image search is a much better tool for finding the images we are looking for.
So entering the search term Germany + tunnel OR escalator OR elevator + dots gives the following results:
On the first page there is a strong visual match for the location in Philipp’s video. The Elbphilharmonie in Hamburg:
It looks like it could be the right location. If Philipp was filming as he went along this escalator, it would be a pretty good match. But is there a way to make sure? At this point it would be possible just to find lots of images of the Elbphilharmonie for a direct comparison, but this is not always possible and sometimes it becomes necessary to tweak the Boolean search a little. The initial parameter in the search was “Germany”. What happens if we tighten it up a bit and replace “Germany” with “Hamburg”?
Even better. By refining the parameters to check the hypothesis that the location is in Hamburg, there are now additional matches for the Elbphilharmonie. The first result is from a video that we can watch to verify that this is the same place that Philipp was travelling along. Perfect!
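Testing a hypothesis in this way amounts to swapping one search parameter for a narrower one and leaving everything else alone. Here is a small Python sketch of that step; the `refine` helper and the list representation of the query are my own illustrative choices.

```python
def refine(parts, old, new):
    """Return a copy of the query parts with one term replaced by another."""
    return [new if part == old else part for part in parts]


def render(parts):
    # Lists are OR groups; plain strings are terms joined with +.
    return " + ".join(p if isinstance(p, str) else " OR ".join(p) for p in parts)


original = ["Germany", ["tunnel", "escalator", "elevator"], "dots"]
narrowed = refine(original, "Germany", "Hamburg")

print(render(original))
print(render(narrowed))
```

Because only one parameter changes between the two searches, any improvement in the results can be attributed to that change.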