Skip to content

Commit f5d77a3

Browse files
refactor 609
1 parent 20b5e9c commit f5d77a3

File tree

1 file changed

+20
-0
lines changed
  • src/main/java/com/fishercoder/solutions

1 file changed

+20
-0
lines changed

src/main/java/com/fishercoder/solutions/_609.java

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,4 +80,24 @@ public List<List<String>> findDuplicate(String[] paths) {
8080
return result;
8181
}
8282

83+
/**Answers to follow-up questions:
84+
* 1. Imagine you are given a real file system, how will you search files? DFS or BFS ?
85+
* A: Both BFS and DFS could do the work, but BFS will use extra memory, however, BFS takes advantage of memory locality, so BFS could be faster.
86+
*
87+
* 2. If the file content is very large (GB level), how will you modify your solution?
88+
* A: We'll fist map all files according to their sizes, since files with different sizes are guaranteed to be different, then
89+
* we can hash a small part of the files using MD5, SHA256, etc. Only when their md5 or sha256 is the same, we'll compare the contents byte by byte.
90+
*
91+
* 3. If you can only read the file by 1kb each time, how will you modify your solution?
92+
* A: This is not going to change the solution, we can hash this 1kb chunk, and then also only compare byte by byte when it's necessary.
93+
*
94+
* 4. What is the time complexity of your modified solution? What is the most time consuming part and memory consuming part of it? How to optimize?
95+
* A: O(n^2*k), in the worst time, we'll have to compare the file with every other file, k is the length of the file.
96+
* Comparing the file (by size, by hash and eventually byte by byte) is the most time consuming part.
97+
* Generating hash for every file will be the most memory consuming part.
98+
*
99+
* 5. How to make sure the duplicated files you find are not false positive?
100+
* A: Size comparision, hash detection, byte by byte check, etc. will pretty sure to rule out false positive.
101+
* */
102+
83103
}

0 commit comments

Comments
 (0)