Find Duplicate Files Using VB.NET and an MD5 Hash
This article was written by Pon Saravanan
on 08-Aug-09
Last modified on: 04-Nov-09
Comparing file names alone is not enough to find duplicate files. A copy may have been renamed and so look like a different file, yet comparing the contents shows that it is a duplicate. Hence we need to compare the file data.
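To find duplicates even after a rename, read the content of each file and compute an MD5 hash of its bytes. The resulting hash, converted to a string, can then be compared: files with identical content always produce the same hash. Two different files producing the same MD5 hash by accident is extremely unlikely, but MD5 is not collision-resistant against deliberately crafted files, so compare the bytes directly when that matters.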
The .NET Framework has rich cryptography support, including computing hashes and encrypting and decrypting data with a variety of algorithms. Use the ComputeHash() method to compute the MD5 hash; it could hardly be simpler.
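As a minimal sketch of the hashing step on its own, the console module below hashes in-memory bytes and compares the results. The names HashSketch and Md5Hex and the sample strings are illustrative assumptions, not part of the original article; MD5.Create(), ComputeHash() and BitConverter.ToString() are standard .NET calls.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashSketch
    ' Hypothetical helper: returns the MD5 hash of the given bytes
    ' as 32 lowercase hexadecimal characters.
    Function Md5Hex(ByVal data As Byte()) As String
        Using hasher As MD5 = MD5.Create()
            Dim hash As Byte() = hasher.ComputeHash(data)
            ' BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            Return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant()
        End Using
    End Function

    Sub Main()
        Dim first As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim second As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim other As Byte() = Encoding.UTF8.GetBytes("other content")

        Console.WriteLine(Md5Hex(first) = Md5Hex(second)) ' prints True
        Console.WriteLine(Md5Hex(first) = Md5Hex(other))  ' prints False
    End Sub
End Module
```

Converting the hash to hexadecimal keeps every byte of it, whereas decoding the hash as ASCII text would replace any byte above 127 and could make different hashes look equal.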
The same comparison can be applied in a recursive scan of a directory to find all the duplicates. Once all the files in the directory have been scanned and compared, deleting the duplicates found is straightforward.
Source Code
' Place these Imports at the top of the form's code file.
Imports System.IO
Imports System.Security.Cryptography

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    If CompareFiles("f:\Velli Malare.mp3", "f:\Mercury poove.mp3") Then
        MsgBox("duplicate")
    Else
        MsgBox("diff")
    End If
End Sub

' Returns True when both files produce the same MD5 hash.
Public Function CompareFiles(ByVal FirstFile As String, _
                             ByVal SecondFile As String) As Boolean
    Return ReadFile(FirstFile) = ReadFile(SecondFile)
End Function

' Returns the MD5 hash of the file's raw bytes as a hexadecimal string.
Private Function ReadFile(ByVal Path As String) As String
    ' Hash the stream directly. Reading a binary file as text, or decoding
    ' the hash bytes as ASCII, replaces bytes and can hide real differences.
    Using ReadFileStream As New FileStream(Path, FileMode.Open, FileAccess.Read)
        Using HashData As New MD5CryptoServiceProvider()
            Return BitConverter.ToString(HashData.ComputeHash(ReadFileStream))
        End Using
    End Using
End Function
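The recursive directory scan mentioned earlier is not shown in the article, so here is one possible sketch of it as a standalone console module. DuplicateScanSketch, HashFile, Scan and FindDuplicates are hypothetical names introduced for illustration, and grouping the paths in a dictionary keyed by hash is an assumed design, not necessarily the author's approach.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateScanSketch
    ' Returns the MD5 hash of a file's raw bytes as a hexadecimal string.
    Function HashFile(ByVal filePath As String) As String
        Using source As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Using hasher As MD5 = MD5.Create()
                Return BitConverter.ToString(hasher.ComputeHash(source))
            End Using
        End Using
    End Function

    ' Hashes every file in folder, then calls itself for each subfolder.
    Sub Scan(ByVal folder As String, ByVal filesByHash As Dictionary(Of String, List(Of String)))
        For Each filePath As String In Directory.GetFiles(folder)
            Dim key As String = HashFile(filePath)
            If Not filesByHash.ContainsKey(key) Then
                filesByHash(key) = New List(Of String)()
            End If
            filesByHash(key).Add(filePath)
        Next
        For Each subFolder As String In Directory.GetDirectories(folder)
            Scan(subFolder, filesByHash)
        Next
    End Sub

    ' Returns only the groups of paths that share a hash with another file.
    Function FindDuplicates(ByVal rootFolder As String) As List(Of List(Of String))
        Dim filesByHash As New Dictionary(Of String, List(Of String))()
        Scan(rootFolder, filesByHash)

        Dim duplicates As New List(Of List(Of String))()
        For Each paths As List(Of String) In filesByHash.Values
            If paths.Count > 1 Then duplicates.Add(paths)
        Next
        Return duplicates
    End Function

    Sub Main(ByVal args As String())
        ' Assumption: scan the folder given on the command line, else the current one.
        Dim root As String = Directory.GetCurrentDirectory()
        If args.Length > 0 Then root = args(0)

        For Each paths As List(Of String) In FindDuplicates(root)
            Console.WriteLine("Duplicate group:")
            For Each filePath As String In paths
                Console.WriteLine("  " & filePath)
            Next
        Next
    End Sub
End Module
```

Each file is hashed only once, so the scan avoids comparing every pair of files. The sketch only reports the groups; it does not handle folders it cannot read, and deleting anything is left to the caller, ideally after a byte-by-byte check of the files in a group.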