Find Duplicate Files Using VB.NET and an MD5 Hash


This article was written by Pon Saravanan on 08-Aug-09. Last modified on 04-Nov-09.




Find Duplicate Files Using VB.NET

Comparing files by name alone is not enough to find duplicates. A file may have been renamed so that it looks like a different file, yet comparing its content with the original shows that it is a duplicate. Hence the files need to be compared by their data.

To find duplicates even when they have been renamed, the content of each file has to be read and compared. Once the content is available as bytes, it can be hashed with the MD5 algorithm. Two files with identical content always produce the same hash, so the hash, converted to a string, can be used for the comparison.
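As a minimal sketch of this idea, the snippet below hashes byte arrays that stand in for file content and compares the resulting strings. The helper name ToMd5Hex and the sample strings are assumptions made for illustration; they are not part of the original listing.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashStringDemo
    ' Hypothetical helper: returns the MD5 hash of a byte array as a hex string.
    Function ToMd5Hex(ByVal Data As Byte()) As String
        Using Hasher As MD5 = MD5.Create()
            Return BitConverter.ToString(Hasher.ComputeHash(Data)).Replace("-", "")
        End Using
    End Function

    Sub Main()
        ' Stand-ins for the content of two files with different names.
        Dim Original As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim Renamed As Byte() = Encoding.UTF8.GetBytes("same content")
        Dim Other As Byte() = Encoding.UTF8.GetBytes("different content")

        ' Identical content gives identical hash strings.
        Console.WriteLine(ToMd5Hex(Original) = ToMd5Hex(Renamed))
        ' Different content is expected to give different hash strings.
        Console.WriteLine(ToMd5Hex(Original) = ToMd5Hex(Other))
    End Sub
End Module
```

A hex string is used here because every hash byte maps to two printable characters, so no information is lost when the hash is turned into text.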

The .NET Framework has rich support for cryptography, including computing hashes and encrypting and decrypting data with a variety of algorithms. Use the ComputeHash() method to compute the MD5 hash. It could hardly be simpler than this.

The same function can be called while walking a directory tree recursively to check for all duplicates. Once every file in the directory has been scanned and compared, it is much easier to delete the duplicate files found.
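One way the directory scan could look is sketched below. It is not the author's implementation: instead of an explicit recursive call it relies on Directory.GetFiles with SearchOption.AllDirectories to walk the subfolders, and it groups files by hash rather than comparing them pair by pair. The names FileMd5Hex and FindDuplicates are hypothetical, and the sketch assumes every file under the folder is readable; it only reports duplicates and deletes nothing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateScanDemo
    ' Hypothetical helper: MD5 hash of a file's raw bytes as a hex string.
    Function FileMd5Hex(ByVal FilePath As String) As String
        Using Hasher As MD5 = MD5.Create()
            Using FileData As New FileStream(FilePath, FileMode.Open, FileAccess.Read)
                Return BitConverter.ToString(Hasher.ComputeHash(FileData)).Replace("-", "")
            End Using
        End Using
    End Function

    ' Hypothetical helper: maps each hash to the files that share it,
    ' keeping only the hashes that occur more than once.
    Function FindDuplicates(ByVal Root As String) As Dictionary(Of String, List(Of String))
        Dim ByHash As New Dictionary(Of String, List(Of String))()
        For Each FilePath As String In Directory.GetFiles(Root, "*", SearchOption.AllDirectories)
            Dim Hash As String = FileMd5Hex(FilePath)
            If Not ByHash.ContainsKey(Hash) Then
                ByHash(Hash) = New List(Of String)()
            End If
            ByHash(Hash).Add(FilePath)
        Next

        Dim Duplicates As New Dictionary(Of String, List(Of String))()
        For Each Entry As KeyValuePair(Of String, List(Of String)) In ByHash
            If Entry.Value.Count > 1 Then
                Duplicates.Add(Entry.Key, Entry.Value)
            End If
        Next
        Return Duplicates
    End Function

    Sub Main(ByVal Args As String())
        ' Scan the folder given on the command line, or the current folder.
        Dim Root As String = If(Args.Length > 0, Args(0), Directory.GetCurrentDirectory())
        For Each Entry As KeyValuePair(Of String, List(Of String)) In FindDuplicates(Root)
            Console.WriteLine("Duplicates: " & String.Join(", ", Entry.Value.ToArray()))
        Next
    End Sub
End Module
```

Grouping by hash means each file is read only once, however many files are in the folder. Reviewing the printed groups before deleting anything is the safer choice.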

Source Code

' Place these Imports at the top of the form's code file.
Imports System.IO
Imports System.Security.Cryptography

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        If (CompareFiles("f:\Velli Malare.mp3", "f:\Mercury poove.mp3")) Then
            MsgBox("duplicate")
        Else
            MsgBox("diff")
        End If
    End Sub

    ' Two files are treated as duplicates when their MD5 hashes match.
    Public Function CompareFiles(ByVal FirstFile As String, _
        ByVal SecondFile As String) As Boolean
        Return ReadFile(FirstFile) = ReadFile(SecondFile)
    End Function

    ' Returns the MD5 hash of the file's raw bytes as a hexadecimal string.
    Private Function ReadFile(ByVal Path As String) As String
        ' Hash the stream directly. Decoding binary data such as an MP3 as
        ' ASCII text replaces bytes above 127 and can make different files
        ' appear equal.
        Using HashData As New MD5CryptoServiceProvider()
            Using ReadFileStream As New FileStream(Path, FileMode.Open, FileAccess.Read)
                Dim HashBytes As Byte() = HashData.ComputeHash(ReadFileStream)
                ' Hex keeps every hash byte intact, unlike ASCII decoding.
                Return BitConverter.ToString(HashBytes).Replace("-", "")
            End Using
        End Using
    End Function








