Java: How do I search a folder for duplicate files not only by name, but also by size and content?

Java

慕斯王 2023-03-02 15:14:08

I want to create a Java application that identifies duplicate files. So far I can only find duplicates by name, but I also need to match on size, file type, and perhaps content. This is my code so far, using a HashMap:

public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            String hash = f.getName() + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}

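4 answers

白衣染霜花

Contributed 1796 experience points, received over 10 likes

I used MessageDigest, checked some files, and found duplicates according to all the criteria I listed in the title and description. Thank you all.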

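// imports needed for this field:
// import java.security.MessageDigest;
// import java.security.NoSuchAlgorithmException;

private static MessageDigest messageDigest;

static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}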

This is the result after integrating it into the duplicate-search code:

public static void find(Map<String, List<String>> lists, File dir) {
    File[] children = dir.listFiles();
    if (children == null) {
        return; // not a directory, or it could not be read
    }
    for (File f : children) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            try {
                // java.nio.file.Files.readAllBytes reads the whole file;
                // a single InputStream.read(byte[]) call is not guaranteed to fill the array
                byte[] fileData = Files.readAllBytes(f.toPath());
                // Create a unique hash ID for the current file
                String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(hash);
                if (list == null) {
                    list = new LinkedList<String>();
                }
                // Add the path to the list
                list.add(f.getAbsolutePath());
                // Put the updated list back into the hash table
                lists.put(hash, list);
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
            }
        }
    }
}
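Once find has filled the map, every key whose list holds more than one path is a group of duplicates. The answer does not show that reporting step, so the following is only a minimal sketch of it. The class name DuplicateReport and both method names are hypothetical and not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the original answer.
final class DuplicateReport {

    /** Returns only the groups that contain more than one path, i.e. the actual duplicates. */
    static List<List<String>> duplicateGroups(Map<String, List<String>> lists) {
        List<List<String>> groups = new ArrayList<>();
        for (List<String> paths : lists.values()) {
            if (paths.size() > 1) {
                groups.add(paths);
            }
        }
        return groups;
    }

    /** Prints each duplicate group, one path per line. */
    static void print(Map<String, List<String>> lists) {
        for (List<String> group : duplicateGroups(lists)) {
            System.out.println("Duplicates:");
            for (String path : group) {
                System.out.println("  " + path);
            }
        }
    }
}
```

Calling DuplicateReport.print(lists) after find(lists, new File(...)) would list only the files that share a hash with at least one other file.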

Answered 2023-03-02

动漫人物

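Contributed 1815 experience points, received over 10 likes

If two files count as equal when they have the same extension and the same file size, it is just a matter of creating an object that represents this "equality". So you would do something like this: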

public class FileEquality {
    private final String fileExtension;
    private final long fileSize;

    // constructor, toString, equals, hashCode, and getters here.
}

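(And fill in all the missing boilerplate: constructor, toString, equals, hashCode, and getters. See Project Lombok's @Value if you want to simplify this.) You can get the file extension from the file name using fileName.lastIndexOf('.') and fileName.substring(lastIndex). With Lombok, you only need to write: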

@lombok.Value public class FileEquality {
    String fileExtension;
    long fileSize;
}
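For readers who do not use Lombok, the boilerplate mentioned above could be written by hand roughly as follows. This is a sketch, not the answerer's code: the static factory method of(...) is a hypothetical addition that applies the lastIndexOf/substring approach, and treating a name without a dot as having an empty extension is an assumption.

```java
import java.util.Objects;

final class FileEquality {
    private final String fileExtension;
    private final long fileSize;

    FileEquality(String fileExtension, long fileSize) {
        this.fileExtension = fileExtension;
        this.fileSize = fileSize;
    }

    /** Hypothetical factory: derives the extension (including the dot) from the file name. */
    static FileEquality of(String fileName, long fileSize) {
        int lastIndex = fileName.lastIndexOf('.');
        // Assumption: a name without a dot gets an empty extension.
        String extension = lastIndex < 0 ? "" : fileName.substring(lastIndex);
        return new FileEquality(extension, fileSize);
    }

    String getFileExtension() { return fileExtension; }

    long getFileSize() { return fileSize; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FileEquality)) return false;
        FileEquality other = (FileEquality) o;
        return fileSize == other.fileSize && fileExtension.equals(other.fileExtension);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fileExtension, fileSize);
    }

    @Override
    public String toString() {
        return "FileEquality(" + fileExtension + ", " + fileSize + ")";
    }
}
```

Because equals and hashCode agree, FileEquality.of(f.getName(), f.length()) can serve directly as a HashMap key.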

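Then use FileEquality objects as the keys in your hash map instead of strings. However, just because you have, say, 'foo.txt' and 'bar.txt' that both happen to be 500 bytes in size does not mean the two files are duplicates. So you also want to involve the content, but if you extend your FileEquality class to include the file's content, two things come up:

1. If you are going to check the content anyway, what do size and file extension matter? If the contents of foo.txt and bar.jpg are exactly identical, they are duplicates, aren't they? Why bother. You can carry the content as a byte[], but note that writing proper hashCode() and equals() implementations (which are required if you want to use this object as a hash map key) becomes a little tricky. Fortunately, Lombok's @Value gets this right, so I suggest you use it.
2. This means the entire file content sits in the JVM's process memory. Unless you are checking very small files, you will run out of memory. You can abstract this away by storing a hash of the content instead of the content itself. Search for how to compute the SHA-256 hash of a file in Java. Put this hash value in your FileEquality, and you avoid the memory problem. It is theoretically possible for two files with different contents to hash to exactly the same SHA-256 value, but the odds of that are astronomically small, and more to the point, SHA-256 is designed so that deliberately crafting two such files to mess with your application is not computationally feasible. Therefore, I suggest you just trust the hash :)

Of course, note that hashing an entire file requires reading the entire file, so if you run your duplicate finder on a directory containing 500GB of files, your application will need to read at least 500GB, which will take some time.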
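The answer leaves the SHA-256 computation to the reader. One way it might be done with only the standard library is to feed the file through MessageDigest in fixed-size chunks, so that only a small buffer is held in memory rather than the whole file. This is a sketch under assumptions: the class name FileHasher, the method name sha256Hex, and the 8192-byte buffer size are all illustrative choices, not from the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not from the original answer.
final class FileHasher {

    /** Returns the SHA-256 digest of the file as a 64-character lowercase hex string. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() returns the number of bytes actually read, or -1 at end of file
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        // Zero-pad to 64 hex digits so leading zero bytes are not dropped
        return String.format("%064x", new BigInteger(1, digest.digest()));
    }
}
```

The resulting string could be stored as a field of FileEquality, or used on its own as the map key. Creating a new MessageDigest per call also avoids sharing one mutable digest instance between callers.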

Answered 2023-03-02

偶然的你

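Contributed 1841 experience points, received over 3 likes

I built an application like this a long time ago, and I found some of its source code in case you want to learn from it.

This method works by comparing the bytes of the two files.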

public static boolean checkBinaryEquality(File file1, File file2) {
    if (file1.length() != file2.length()) return false;
    try (FileInputStream f1 = new FileInputStream(file1); FileInputStream f2 = new FileInputStream(file2)) {
        byte[] bus1 = new byte[1024],
               bus2 = new byte[1024];
        int n1;
        // compare the files chunk by chunk; any mismatch means they are not equal
        while ((n1 = f1.read(bus1)) >= 0) {
            // read() may return fewer bytes than requested, so fill bus2 with exactly n1 bytes
            int n2 = 0;
            while (n2 < n1) {
                int r = f2.read(bus2, n2, n1 - n2);
                if (r < 0) return false; // file2 ended early
                n2 += r;
            }
            // compare only the bytes that were actually read, not the whole buffer
            for (int i = 0; i < n1; i++)
                if (bus1[i] != bus2[i])
                    return false;
        }
        // passed
        return true;
    } catch (IOException exp) {
        // problems occurred so let's consider them not equal
        return false;
    }
}

Combine this method with name and extension checks and you are good to go.
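The answer does not show how a pairwise comparison would be applied to a whole folder. One possible approach, sketched here as an assumption rather than the answerer's design, is to group files by size first and run the byte comparison only within each same-size group. The class SizeThenBytes and its methods are hypothetical, and sameContent is a simplified stand-in for a byte comparison that reads one buffered byte at a time.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the original answer.
final class SizeThenBytes {

    /** Compares two files byte by byte; files of different length are never equal. */
    static boolean sameContent(File a, File b) throws IOException {
        if (a.length() != b.length()) return false;
        try (InputStream in1 = new BufferedInputStream(new FileInputStream(a));
             InputStream in2 = new BufferedInputStream(new FileInputStream(b))) {
            int x;
            while ((x = in1.read()) != -1) {
                if (x != in2.read()) return false;
            }
            return in2.read() == -1; // both streams must end together
        }
    }

    /** Groups files by size, then splits each group into sets of byte-identical files. */
    static List<List<File>> findDuplicates(List<File> files) throws IOException {
        Map<Long, List<File>> bySize = new HashMap<>();
        for (File f : files) {
            bySize.computeIfAbsent(f.length(), k -> new ArrayList<>()).add(f);
        }
        List<List<File>> result = new ArrayList<>();
        for (List<File> sameSize : bySize.values()) {
            List<List<File>> clusters = new ArrayList<>();
            for (File f : sameSize) {
                boolean placed = false;
                for (List<File> cluster : clusters) {
                    // compare against one representative of each cluster
                    if (sameContent(cluster.get(0), f)) {
                        cluster.add(f);
                        placed = true;
                        break;
                    }
                }
                if (!placed) {
                    List<File> fresh = new ArrayList<>();
                    fresh.add(f);
                    clusters.add(fresh);
                }
            }
            for (List<File> cluster : clusters) {
                if (cluster.size() > 1) result.add(cluster);
            }
        }
        return result;
    }
}
```

Files with a unique size are never opened at all, which keeps the amount of data read down when most files differ in size. Name or extension checks could be added as a further filter before the byte comparison.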

Answered 2023-03-02

慕码人8056858

Contributed 1803 experience points, received over 6 likes

A copy-paste example

Create a class that extends File

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class MyFile extends File {

    private static final long serialVersionUID = 1L;

    public MyFile(final String pathname) {
        super(pathname);
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        // null check added: obj.getClass() would otherwise throw a NullPointerException
        if (obj == null || this.getClass() != obj.getClass()) {
            return false;
        }
        final MyFile other = (MyFile) obj;
        // cheap checks first, so contents are read only when name and size already match
        // (File.getName() never returns null, so no null handling is needed)
        if (!this.getName().equals(other.getName())) {
            return false;
        }
        if (this.length() != other.length()) {
            return false;
        }
        return Arrays.equals(this.getContent(), other.getContent());
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = prime;
        // note: this reads the whole file every time the hash code is computed
        result = (prime * result) + Arrays.hashCode(this.getContent());
        result = (prime * result) + this.getName().hashCode();
        result = (prime * result) + (int) (this.length() ^ (this.length() >>> 32));
        return result;
    }

    private byte[] getContent() {
        try (final FileInputStream fis = new FileInputStream(this)) {
            // InputStream.readAllBytes() requires Java 9 or later
            return fis.readAllBytes();
        } catch (final IOException e) {
            e.printStackTrace();
            return new byte[] {};
        }
    }
}

Read the base directory

import java.io.File;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Vector;

public class FileTest {

    public static void main(final String[] args) {
        final Map<MyFile, List<MyFile>> duplicates = new HashMap<>();
        FileTest.handleDirectory(duplicates, new File("[path to base directory]"));
        final Iterator<Entry<MyFile, List<MyFile>>> iterator = duplicates.entrySet().iterator();
        while (iterator.hasNext()) {
            final Entry<MyFile, List<MyFile>> next = iterator.next();
            if (next.getValue().size() == 0) {
                // no duplicates were found for this file
                iterator.remove();
            } else {
                System.out.println(next.getKey().getName() + " - " + next.getKey().getAbsolutePath());
                for (final MyFile file : next.getValue()) {
                    System.out.println(" ->" + file.getName() + " - " + file.getAbsolutePath());
                }
            }
        }
    }

    private static void handleDirectory(final Map<MyFile, List<MyFile>> duplicates, final File directory) {
        final File[] files = directory.listFiles();
        // listFiles() returns null if the path is not a directory or cannot be read
        if (files == null) {
            return;
        }
        for (final File file : files) {
            if (file.isDirectory()) {
                FileTest.handleDirectory(duplicates, file);
                continue;
            }
            final MyFile myFile = new MyFile(file.getAbsolutePath());
            if (!duplicates.containsKey(myFile)) {
                // first time this name/size/content combination is seen
                duplicates.put(myFile, new Vector<>());
            } else {
                duplicates.get(myFile).add(myFile);
            }
        }
    }
}

Answered 2023-03-02
