Bay Area Computer Forensics Expert, Investigator & Witness
  • Home
  • Services
    • CLE
    • Intellectual Property Issues
    • Civil Litigation
    • Criminal Defense
  • About Us
    • Jon Berryhill
    • Katie Berryhill
    • Clients
    • Client Testimonials
  • FAQ
    • Hiring A Computer Forensics Expert
    • Resources
  • News
  • Contact

News & Computer Forensics Blog

Author Jon Berryhill

Computer Forensics Investigative Expert and Certified Expert Witness for Military, State and Federal Courts

What is a Hash Value?

7/15/2019

12 Comments

 
What is a hash value?
By Jon Berryhill

If you’ve encountered a matter involving computer evidence, you may have heard the term “hash value” and wondered what in the world a hash value is. A hash tag “#” (otherwise known as the pound symbol or, originally, an octothorpe), brought to you by Twitter in 2007, is not what this post is about. A hash value and a hash tag are two completely different things. Let’s take a quick dive into this somewhat esoteric term for a critical tool.
 
A hash value is a common feature used in forensic analysis as well as the cryptographic world. The best definition I’ve seen is that a hash is a function that can be used to map data of an arbitrary size onto data of a fixed size. The word “function” is used in its truest form from mathematics. The hash value is the result of the function. Standard hash algorithms are sets of complex but public mathematical steps. There is nothing secret about them.
 
Some people equate a hash value to a fingerprint. It provides a way of identifying and verifying a chunk of digital data. You can have a hash value for a single file, groups of files, or even an entire hard drive. A hash value is a harmless-looking string of hexadecimal characters, generally 32 to 64 characters long depending on the hash algorithm used (32 for MD5, 40 for SHA-1, 64 for SHA-256). There is absolutely nothing in a hash value that will tell you anything about what was hashed or how big it was. The way the algorithms work, the length of the hash value is always the same no matter how much data was processed.
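
The fixed-length property is easy to check for yourself. The short Python sketch below is only an illustration, not part of any forensic tool. It relies on the standard library's hashlib module, and the helper name digest_lengths is made up for this example.

```python
import hashlib

def digest_lengths(data: bytes) -> dict:
    """Return the number of hex characters each algorithm produces for `data`."""
    return {
        "md5": len(hashlib.md5(data).hexdigest()),
        "sha1": len(hashlib.sha1(data).hexdigest()),
        "sha256": len(hashlib.sha256(data).hexdigest()),
    }

# Inputs of very different sizes: empty, one byte, and about a megabyte.
for sample in (b"", b"a", b"x" * 1_000_000):
    print(len(sample), digest_lengths(sample))
```

Every input, whatever its size, reports the same three lengths: 32, 40 and 64 hexadecimal characters.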
 
So what do they look like?

f5fbace98ed8829dc705191f18321d18 C:\TEMP\file-110738171218L001.pdf
935a569281046198ec9256da83b5fcd4 C:\TEMP\file-110739171218L001.pdf
d852a07c1a3065d42be9b119fd92091e C:\TEMP\file-110751171218L002.pdf
eac04333af784bc2094d55bd0b233173 C:\TEMP\file-110766171218L001.pdf
76f5af6dc1a97facc1f830d7a66cfd35 C:\TEMP\file-144727171111L001 (1).pdf
76f5af6dc1a97facc1f830d7a66cfd35 C:\TEMP\file-144727171111L001 (2).pdf
76f5af6dc1a97facc1f830d7a66cfd35 C:\TEMP\file-144727171111L001.pdf
Above are the computed hash values for 7 files. Note that the last 3 files have different names but the hash values match. The content of these 3 files is exactly the same. In this case the hash values were computed with a standard algorithm called MD5 (the “MD” is short for Message Digest, the “5” is a version number).
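
For readers who want to reproduce the matching-hash effect, here is a minimal Python sketch. The file names and contents are invented stand-ins written to a temporary folder, and md5_of_file and demo are hypothetical helper names. It is not the tool that produced the listing above.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of_file(path: Path) -> str:
    """Hash the file's contents only; the file name never enters the calculation."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def demo() -> dict:
    """Write three small files to a temporary folder and hash each one."""
    files = {
        "report.pdf": b"same content",
        "report (1).pdf": b"same content",
        "other.pdf": b"different content",
    }
    hashes = {}
    with tempfile.TemporaryDirectory() as folder:
        for name, content in files.items():
            path = Path(folder) / name
            path.write_bytes(content)
            hashes[name] = md5_of_file(path)
    return hashes

for name, value in demo().items():
    print(value, name)
```

The two files with identical contents print the same value even though their names differ, and the third prints a different one.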
 
The same files can be processed with the SHA256 algorithm and the results look like this.
a23e46b2e341d2b9f9bf291a67c9e207c70d796d70d0c6973cf46b0c2156f5ee C:\temp\file-110738171218L001.pdf
285aea0e4e4605f28c89ea20253456e98c5fb999d3988084b8ad1ed82f36fb2e C:\temp\file-110739171218L001.pdf
62e0c4e16b9ed0d23354a9973783958bb93fd3d93524fa5f49ee88663d086ba2 C:\temp\file-110751171218L002.pdf
4985619b30c4ef8dd100cc76810d50dbed9e2ee568281a843b49f75812730420 C:\temp\file-110766171218L001.pdf
95df48581de075511e44aceb2417a0cc125c593dfbc904fcb9ceaa3fefbd30c5 C:\temp\file-144727171111L001 (1).pdf
95df48581de075511e44aceb2417a0cc125c593dfbc904fcb9ceaa3fefbd30c5 C:\temp\file-144727171111L001 (2).pdf
95df48581de075511e44aceb2417a0cc125c593dfbc904fcb9ceaa3fefbd30c5 C:\temp\file-144727171111L001.pdf
The hash value has nothing to do with the name of a file, and different hash algorithms produce different hash values even when processing the same files. A hash value by itself is useless unless you also know which hash algorithm was used to create it.
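
One way to keep the algorithm attached to the value is to record them together. The sketch below assumes a simple "algorithm:value" label, which is a convention invented for this example rather than any standard, and the function names are likewise hypothetical.

```python
import hashlib

def labeled_hash(data: bytes, algorithm: str) -> str:
    """Return the hash with its algorithm name attached, e.g. 'md5:9001...'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"

def matches(data: bytes, labeled: str) -> bool:
    """Recompute the hash with the algorithm named in the label and compare."""
    algorithm, _, expected = labeled.partition(":")
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()

data = b"abc"
print(labeled_hash(data, "md5"))     # 32 hex characters after the label
print(labeled_hash(data, "sha256"))  # 64 hex characters after the label
print(matches(data, labeled_hash(data, "sha256")))
```

The same three bytes give two unrelated-looking values, and the check only works because the label says which algorithm to rerun.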

How are hash values used?

In the forensic analysis community, if I provide a copy of a forensic image file set to another examiner, I also provide the hash value associated with it. The other examiner can compute the hash value for what they received and compare that to the provided hash value. If they match, we know that we are both looking at exactly the same thing. If the hash values don’t match, we know that something is different. The hash value provides no clues as to what is different.
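
The hand-off check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular forensic product: the file name image.dd, the choice of SHA-256, and the helper names are all made up, and a tiny temporary file stands in for a real image.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so a very large image never has to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def verify(path, expected_hex, algorithm="sha256"):
    """True only if the recomputed hash equals the one supplied with the evidence."""
    return hash_file(path, algorithm) == expected_hex.strip().lower()

def demo():
    """Hash a stand-in image, verify it, alter one byte, and verify again."""
    with tempfile.TemporaryDirectory() as folder:
        image = Path(folder) / "image.dd"  # hypothetical file name
        image.write_bytes(b"\x00" * 4096)
        provided = hash_file(image)
        intact = verify(image, provided)
        image.write_bytes(b"\x00" * 4095 + b"\x01")  # change a single byte
        altered = verify(image, provided)
        return intact, altered

print(demo())
```

The first check passes and the second fails. The failure says only that something changed; it gives no hint that the change was a single byte at the end of the file.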
 
In the security and cryptographic community, a well-designed system does not store your password. It stores a computed hash value of your password. If someone is trying to break into your account, it is exceedingly difficult for them to come up with a different password that results in the same hash value as yours. The hash values of passwords are far less sensitive than the passwords themselves, although they should still be protected, because an attacker who obtains them can hash lists of likely passwords and look for a match. The hash function itself cannot be run backward: in real terms you simply cannot reverse engineer a strong password from a given hash value.
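
As a rough sketch of the idea, the Python below stores a hash in place of the password and checks a login attempt by hashing it the same way. Two details go beyond what is described above and are assumptions about common practice: a random salt is mixed in, and a deliberately slow function (PBKDF2 from the standard library) is used instead of a plain hash. The iteration count and function names are illustrative only.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def check_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the attempt and compare in constant time."""
    _, derived = hash_password(attempt, salt)
    return hmac.compare_digest(derived, stored)

# The system keeps only `salt` and `stored`, never the password itself.
salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

The right password verifies and the wrong one does not, yet nothing saved by the system contains the password text.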
 
All that being said, some hash algorithms are more secure than others. In a lab setting, the MD5 hash has been “cracked.” It is possible, with a modest amount of computing power, to create two different files that produce the same MD5 hash value. This is called a hash collision. I know of no instance of a hash collision in the “wild.” That’s not to say the MD5 algorithm is useless. You simply have to understand its appropriate uses and limitations.
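
To see what a collision means without any special equipment, the toy Python sketch below searches for two different messages that share the same hash. To be clear about the assumption: this is not an MD5 collision. The hash is deliberately cut down to its first 4 hexadecimal characters (the made-up tiny_hash function) so that a match turns up almost instantly, and the full MD5 values of the two messages still differ.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> str:
    """A deliberately weak 'hash': only the first 4 hex characters of MD5."""
    return hashlib.md5(data).hexdigest()[:4]

def find_collision():
    """Try numbered messages until two different ones share a tiny_hash value."""
    seen = {}
    for number in count():
        message = f"message-{number}".encode()
        value = tiny_hash(message)
        if value in seen:
            return seen[value], message
        seen[value] = message

first, second = find_collision()
print(first, second, tiny_hash(first))
print(hashlib.md5(first).hexdigest())
print(hashlib.md5(second).hexdigest())
```

With only 65,536 possible truncated values, a match is guaranteed within 65,537 tries and usually appears after a few hundred. Real collision attacks on full MD5 rely on weaknesses in the algorithm itself rather than brute-force searching like this.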
 
One of the common uses of hash values in the forensics and law enforcement communities is in child pornography cases. Law enforcement maintains a database of hash values of known child pornography. This way they can share the hash values without having to share, transport or otherwise handle actual contraband material. An examiner can use tools to search seized evidence for files that have matching hash values. If there is a match the examiner can further examine the highlighted file. The benefit is that an examiner can automate much of the otherwise very tedious and time-consuming process of reviewing what could be millions of pictures or videos on a computer when searching for contraband. It’s not a perfect solution. It can miss contraband items, but it does save a lot of time and resources. There isn’t a danger of someone being arrested for a false positive because no case is made on just a matching hash value. Someone still has to look at any matches and decide if it is a valid hit or not. It’s just a tool.
 
Similarly, there are hash value sets of known files that can be used to filter out otherwise known or uninteresting files among groups of millions of files, so an examiner can focus on the unique data.
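
Both kinds of hash set lookup, flagging files for review and filtering out known uninteresting files, come down to the same operation. The Python sketch below is a simplified illustration: the hash sets and the "evidence" are made-up stand-ins held in memory, MD5 is used only to match the earlier examples, and triage is a hypothetical function name.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 hash of the given bytes as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

def triage(files: dict, flagged_hashes: set, ignorable_hashes: set) -> dict:
    """Sort files into 'review', 'ignore' and 'unknown' by looking up each hash."""
    result = {"review": [], "ignore": [], "unknown": []}
    for name, content in files.items():
        digest = md5_hex(content)
        if digest in flagged_hashes:
            result["review"].append(name)   # a person still has to look at it
        elif digest in ignorable_hashes:
            result["ignore"].append(name)   # known file, nothing new to learn
        else:
            result["unknown"].append(name)  # unique data worth examining
    return result

# Made-up stand-ins for real hash sets and seized files.
flagged = {md5_hex(b"known flagged content")}
ignorable = {md5_hex(b"stock operating system file")}
evidence = {
    "IMG_0001.jpg": b"known flagged content",
    "system.dll": b"stock operating system file",
    "notes.txt": b"something the examiner has never seen",
}
print(triage(evidence, flagged, ignorable))
```

Only the hash values are shared and compared, so the lists can be passed between agencies without the underlying files, and a hit in the review list is a prompt for a human to look, not a conclusion.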
 
There are many other uses of hash values in both the forensic and cryptographic communities, but these examples should give you an idea of some of what is going on the next time you hear “hash value” in reference to an item of digital evidence.
12 Comments
Rebecca Gardner
10/19/2020 11:45:02 am

Thanks for explaining that a hash value is like a fingerprint in that it identifies and verifies digital data. My brother might have to find a computer forensics service to investigate the behavior of an employee who was recently fired. I'll have to pass along this info in case it's helpful to him when working with a forensics service soon.

Steve Adams
10/22/2020 09:45:33 am

Thanks for sharing this information. It is quite important for ensuring quick and efficient discovery of evidence in cybercrime.

Curious
10/31/2021 11:26:38 am

I am here because I am curious, hexadecimal has finite limits on variables used to quantify its values. I believe a complete destruction of modern cryptography is approaching. Once this becomes reality we will have to use quantum cryptography. But seeing as that only utilizes a set of defined variables for its value, tis only a matter of time. even though time is not matter ; )

Click here
12/16/2021 12:10:08 pm

Can be thought of as fingerprints for files. The contents of a file are processed through a cryptographic algorithm, Thank you for the beautiful post!

Woodbridge
12/28/2021 12:07:46 pm

An examiner can use tools to search seized evidence for files that have matching hash values. Thank you for sharing your great post!

Cat
1/9/2022 02:17:50 am

Very helpful for my security class. Thank you.

Nandha Gopal
4/5/2022 04:30:16 am

Nice explanation, thanks for the detailed post. It gives us a better understanding of the hash value.

wondering
6/8/2022 02:20:14 am

Hi There,

I'm wondering if there are hash programs that a lay person to forensics can use to try and find a particular video or something like that? Also, will photos or videos taken around the same time period and on the same device have the same hash value or similar enough that it will come up as a potential match when doing the search? Thank you

Jon Berryhill
6/8/2022 09:56:54 am

A change of a single byte in a file will totally change the hash value. There can be no comparison to similar files based on hash values. Either they match and it's the same file, or they don't and it isn't. If you want to look for similar files, there are other tools to employ that do not involve hashing.

Wondering
6/8/2022 10:04:38 am

Thank you for the information and for getting back so quickly! I'm going to email you here shortly to see about getting a consultation appointment with you.

Dr Peter
6/30/2022 07:13:27 am

Thanks. It seems to me that putting a downloaded image file through a local "edit" operation would alter its hash value, by whatever hash method. Presumably an automatic batch-edit method could be devised for all downloaded files. How could you compensate for this? Best wishes.

Jon Berryhill
6/30/2022 11:18:15 am

It is the case that any edit to a file will change its hash value. Consider the three broad categories of uses of hash values. 1) If the hash is being used for authentication purposes, you wouldn't want to change the hash value, so this is not something that's going to happen. 2) If the edit is made in some attempt to hide the file from being discovered via a hash matching search, the file has already been downloaded, and most hash matching is done on the Internet side, so changing the value after it's been downloaded won't make any difference. 3) When broad hashing is done for investigative purposes, it is just one tool of many that can be used to streamline the analyst's work. We have many other tools and techniques, the use of which goes beyond the time and space available here on this topic.





demonstrated experience . proven results



Berryhill Computer Forensics, Inc.   TX 6-853-249  All Rights Reserved.
Text and content on this site may not be used without written permission.
Copyright © 1997-2023