
Hashing in DC++
Grey
post Mar 26 2010, 06:20 PM
Post #1
Player

Posts: 4
Joined: 21-June 08

I downloaded DC++ today to poke around in it in preparation for the next Respawn, and holy shit, hashing takes a long time. So this post is a warning: do not leave hashing until the last minute! I'll be sharing roughly 1.1 terabytes of 'legit backups' and it's predicting hashing will be done in...75 hours!

Just a word of caution :)

-Grey
Alioth
post Mar 26 2010, 07:36 PM
Post #2
Player

Posts: 233
Joined: 28-June 07
From: Geelong

So I was going to make some smart-arse reply to this thread, since the topic of hashing has been thoroughly covered in another thread, when to my surprise said thread seems to have disappeared. I'm talking about the big long one with the debate on which version of DC++ to use and the whole hashing thing. Something about DC++ and hashing should probably go in Tank's awesome Jumbo Q & A thread, if the lords and ladies of Respawn so wish :D

TL;DR Yes, we know about hashing. Use version 0.674 or higher plz.
deano
post Mar 26 2010, 08:49 PM
Post #3
Player

Posts: 47
Joined: 13-September 09
From: Regional Victoria

Alioth: Not all of us know about hashing.

If I see one person hashing at the event, you owe me pizza. Deal?
Alioth
post Mar 26 2010, 08:58 PM
Post #4
Player

Posts: 233
Joined: 28-June 07
From: Geelong

QUOTE (deano @ Mar 26 2010, 09:49 PM)
Alioth: Not all of us know about hashing.

If I see one person hashing at the event, you owe me pizza. Deal?


I owe you a kick in the nuts! :D
You didn't read all of my post, did you?...

----------------
Now playing: Bluejuice - Hunnamunnafeeb
R4N
post Mar 26 2010, 09:08 PM
Post #5
Hueg.

Posts: 634
Joined: 4-June 07
From: In a packing crate till next respawn.

The DC++ thread disappeared for a reason. Feel free to discuss the various methods of hashing, but if this thread delves IN ANY WAY into illegal file sharing, I'll delete it.

Fair warning :)


--------------------

Voodoo: i5 3570k, Asrock Z77, 16GB DDR3, Ati 280x 3GB, 120GB 520 SSD + 2x 2TB, Lian-Li PC A04, Win8.1.
Banshee: N2820 Intel NUC, 4GB DDR3l, 60GB Kingston V300 SSD, 1080P Projector :3
Rush: HP Microserver, 8GB DDR3, 4x 2TB
Grey
post Mar 27 2010, 01:47 AM
Post #6
Player

Posts: 4
Joined: 21-June 08

Well well, it seems poor predictive computing has struck again. After coming home from a night out, I found the hashing complete, only a few hours after I posted. I looked around for a DC++ thread, and there are quite a few links to dead forum posts, but I couldn't for the life of me find an active one. All in all, hashing ~1 TB across multiple drives takes only a few hours (about 5ish). Go go sticky DC++ topic!

-Grey
Chopz
post Mar 27 2010, 08:20 AM
Post #7
Player

Posts: 49
Joined: 24-February 09

Hashing all depends on the speed of your computer; for instance, my 1.4 TB took about 2 hours. Please use .674: it hashes, .750 doesn't, and therefore files can't be taken from multiple computers. With hashed files, if someone else has the same file and both copies are hashed, DC will download it from both places and give you a faster speed.
Redback
post Mar 27 2010, 08:56 AM
Post #8
Player

Posts: 87
Joined: 30-September 07

I usually just leave DC++ running, so my shares are always hashed.


Chopz
post Mar 27 2010, 04:25 PM
Post #9
Player

Posts: 49
Joined: 24-February 09

BTW, don't share the folder you are receiving files in; it makes your computer run slow, as you will always be hashing.
Andrewzor
post Mar 27 2010, 04:31 PM
Post #10
Player

Posts: 44
Joined: 2-July 09
From: Wantirna South

There's always bound to be someone who forgot to hash at the LAN :( I always do it a week or two before the LAN; I just leave it on overnight and it's usually done.

Yay :)


--------------------
Flawless Victory
Redback
post Mar 27 2010, 05:37 PM
Post #11
Player

Posts: 87
Joined: 30-September 07

QUOTE (Chopz @ Mar 27 2010, 05:25 PM)
BTW, don't share the folder you are receiving files in; it makes your computer run slow, as you will always be hashing.


DC is supposed to remember the hashes of the files it downloads.


priorax
post Mar 27 2010, 06:03 PM
Post #12
TF2 Comp Guy

Posts: 490
Joined: 23-August 09
From: Niddrie - Melb

QUOTE (Chopz @ Mar 27 2010, 09:20 AM)
Please use .674: it hashes, .750 doesn't


Umm, from my experience with .75 it hashes; it's the builds before .674 that don't.
oohms
post Mar 27 2010, 07:13 PM
Post #13
Server/Comp Gopher

Posts: 302
Joined: 13-June 07

673, which is 674 without an annoying CPU usage bug, is the best version to use, and can get files off people with both 306 era and 0.7xx clients.

If you don't have time to hash, or something breaks, you can use 306, and you only miss out on people with the new clients (but can still leech off people with 673/674).

(0.7xx users can't leech or share with 0.306 and vice versa, but 673/674 does both)
Maelstrom
post Mar 27 2010, 08:16 PM
Post #14
Player

Posts: 117
Joined: 17-June 09

I upgraded to 0.761

I figure most people have worked out by now to pre-hash :P

Plus it has some nice features, like being able to grab parts of the same Linux ISO from multiple people :D


--------------------
"The Maelstrom is a dark force that seeks the destruction of all imagination"
oohms
post Mar 28 2010, 10:29 AM
Post #15
Server/Comp Gopher

Posts: 302
Joined: 13-June 07

QUOTE (Maelstrom @ Mar 27 2010, 09:16 PM)
I upgraded to 0.761

I figure most people have worked out by now to pre-hash :P

Plus it has some nice features, like being able to grab parts of the same Linux ISO from multiple people :D


There are still a lot of people on 306... so I guess you'll have to weigh the new features against what variety of Linux ISOs you want :D
Redback
post Mar 28 2010, 04:49 PM
Post #16
Player

Posts: 87
Joined: 30-September 07

Should ban non-hashing clients from the hub, IMO.

Hash or die.


wolfmother
post Mar 28 2010, 04:54 PM
Post #17
Blame him.

Posts: 171
Joined: 14-April 07

QUOTE (oohms @ Mar 27 2010, 08:13 PM)
673, which is 674 without an annoying CPU usage bug, is the best version to use, and can get files off people with both 306 era and 0.7xx clients.

If you don't have time to hash, or something breaks, you can use 306, and you only miss out on people with the new clients (but can still leech off people with 673/674).

(0.7xx users can't leech or share with 0.306 and vice versa, but 673/674 does both)

0.401 will also read both styles of file list but won't hash, though I believe it has that CPU usage bug.


--------------------
http://www.prolapsoft.com - Wolfmother's AV & Net Software
wolfmother
post Mar 28 2010, 04:55 PM
Post #18
Blame him.

Posts: 171
Joined: 14-April 07

QUOTE (Redback @ Mar 28 2010, 05:49 PM)
Should ban non-hashing clients from the hub, IMO.

Hash or die.

To be honest, if anything, we should ban hashing clients. None of the advantages that it gives on the internet are applicable to LANs. I haven't yet seen a single hardcore proponent of hashed-only clients that fully understands why you want to hash in the first place.


Redback
post Mar 28 2010, 05:40 PM
Post #19
Player

Posts: 87
Joined: 30-September 07

If you are trying to get something off a busy sharer, you can often get some of it from someone else. End result is the busy sharers are less busy, spreading the load somewhat.


crenn
post Mar 28 2010, 07:14 PM
Post #20
Player

Posts: 31
Joined: 24-July 09

QUOTE (Redback @ Mar 28 2010, 06:40 PM)
If you are trying to get something off a busy sharer, you can often get some of it from someone else. End result is the busy sharers are less busy, spreading the load somewhat.

*selects users using 0.304-0.678, right clicks and selects match queue* Don't know what you mean
wolfmother
post Mar 28 2010, 08:12 PM
Post #21
Blame him.

Posts: 171
Joined: 14-April 07

Wolfmother's big DC++ hashing writeup
Background - What is hashing?
Hashing is a way of uniquely identifying the contents of any file. After processing the entirety of the file's contents, DC++ will spit out a small piece of text that can be used as a "fingerprint" for that file. This allows a lot of neat things like being able to verify the integrity of a file or being able to find other computers on the network that also have that file. However, it is not without its downsides.
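As a rough illustration of the "fingerprint" idea, here's a minimal Python sketch that streams a file through a hash function and then groups files whose fingerprints match, regardless of what they're called. It assumes SHA-256 as a stand-in (Python's hashlib doesn't ship Tiger, so this is not DC++'s actual TTH output), and `fingerprint` / `find_duplicates` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so huge files never sit in memory whole


def fingerprint(path: Path) -> str:
    """Hex digest of a file's full contents (SHA-256 standing in for Tiger)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: list[Path]) -> dict[str, list[Path]]:
    """Group files by fingerprint, keeping only groups with more than one file.

    Filenames play no part: identical contents give identical fingerprints.
    """
    groups: dict[str, list[Path]] = {}
    for p in paths:
        groups.setdefault(fingerprint(p), []).append(p)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

Two files named `a.iso` and `renamed.iso` with identical bytes end up in the same group, which is the property that lets a client find other copies of a file on the network.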

The primary problem is the fact that it takes a long time to process large amounts of files, often up to days for large shares. The other is that since hashing is very important on the internet, the developers of DC++ intentionally removed compatibility with non-hashing clients in newer versions. This becomes a problem, as often someone comes to a LAN without hashing or knowing that they should hash, and cannot share their files with people using the newer versions until they spend a few days hashing their files (by which time the LAN is often over).

About TTH
DC++ uses a method of hashing called TTH - Tiger Tree Hashing. Tiger is the name of the hashing algorithm used; it's optimized for high performance on desktop systems, and while it doesn't have as large a margin of security as many other popular systems, it has not been broken and there is no reason to believe it'll be broken in the near future. Instead of hashing the whole file in one go, DC++ will hash small (64 KB) pieces of the file and then hash the hashes together until it comes up with a final result. This takes about 10% longer than just hashing the file outright, but gives the advantage that it's possible to identify an incomplete part of a file.
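The "hash the pieces, then hash the hashes" structure can be sketched in a few lines of Python. This is a simplified Merkle tree following the THEX conventions TTH is built on (leaf hashes prefixed with byte 0x00, internal nodes with 0x01, an odd node promoted unchanged to the next level), but it assumes SHA-256 in place of Tiger and a single fixed leaf size, so its output won't match real TTH roots. `bad_pieces` shows the "identify an incomplete part" advantage: comparing leaf hashes pinpoints exactly which piece differs.

```python
import hashlib

LEAF_SIZE = 1024  # THEX base segment size; DC++ stores a coarser level of the tree


def _h(data: bytes) -> bytes:
    # SHA-256 stands in for Tiger, which hashlib doesn't provide
    return hashlib.sha256(data).digest()


def leaf_hashes(data: bytes, leaf_size: int = LEAF_SIZE) -> list[bytes]:
    # 0x00 prefix marks a leaf; an empty file still gets one (empty) leaf
    if not data:
        return [_h(b"\x00")]
    return [_h(b"\x00" + data[i:i + leaf_size]) for i in range(0, len(data), leaf_size)]


def root_hash(leaves: list[bytes]) -> bytes:
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # 0x01 prefix marks an internal node combining two children
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            # the odd node out is promoted unchanged to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def bad_pieces(data: bytes, expected: list[bytes], leaf_size: int = LEAF_SIZE) -> list[int]:
    """Indices of pieces whose contents don't match the expected leaf hashes."""
    got = leaf_hashes(data, leaf_size)
    return [i for i, (a, b) in enumerate(zip(got, expected)) if a != b]
```

Only the root needs to be shared to identify the file; the leaf hashes are what make it possible to re-fetch just the broken piece instead of the whole file.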

However, keeping track of all of those hashes (there can be tens of thousands for a high-definition video file) uses a substantial amount of memory, and every hash lookup request means looking through them. The hash data is saved in a file called HashData.dat, which will generally grow at a rate of about 50-60 MB per terabyte shared. It must be loaded into memory when DC++ is launched and stay there, and a lot of important operations like searching, downloading or uploading require frequent access to it. This can hamper performance somewhat on systems with slower CPUs or small amounts of RAM.
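The point of persisting hashes is that unchanged files shouldn't need re-hashing every time the client starts. Here's a minimal Python sketch of that idea, assuming a cache keyed on path, size and modification time and stored as JSON; `HashCache` and its file layout are hypothetical and bear no relation to the real HashData.dat format. A folder that keeps receiving new files keeps causing cache misses, which is the "always hashing" effect mentioned earlier in the thread.

```python
import hashlib
import json
from pathlib import Path


def _hash_file(path: Path) -> str:
    # SHA-256 stand-in for a real TTH root
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class HashCache:
    """Hypothetical on-disk cache: a file is re-hashed only when its size or
    modification time no longer matches what was recorded."""

    def __init__(self, db_path: Path):
        self.db_path = db_path
        self.entries: dict[str, dict] = {}
        if db_path.exists():
            self.entries = json.loads(db_path.read_text())
        self.hashed = 0  # how many files actually had to be read and hashed

    def get(self, path: Path) -> str:
        st = path.stat()
        key = str(path.resolve())
        entry = self.entries.get(key)
        if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime_ns:
            return entry["hash"]  # cache hit: no disk reads of the file itself
        digest = _hash_file(path)
        self.hashed += 1
        self.entries[key] = {"size": st.st_size, "mtime": st.st_mtime_ns, "hash": digest}
        return digest

    def save(self) -> None:
        self.db_path.write_text(json.dumps(self.entries))
```

Size plus modification time is a cheap "has this changed?" check; it can be fooled by a tool that preserves timestamps, which is a trade-off against re-reading every byte.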

How long will it take?
The amount of time it takes to hash files depends on the amount to be shared and the hardware your system is running. The read speed (from the hard disk) depends on the speed of your hard disk(s), hard disk controller(s), and motherboard. The hashing speed depends on the speed of your CPU and RAM. A rule of thumb I use for my server is about 6 hours per terabyte, but it can vary wildly depending on your setup.
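The rule of thumb above works out to a sustained rate of roughly 46 MB/s (10^12 bytes over 21,600 seconds). A back-of-the-envelope estimator in Python, assuming 1 TB = 10^12 bytes and that whichever is slower, disk reads or CPU hashing, sets the pace; the MB/s figures you feed in are your own measurements, not anything DC++ reports, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_TB = 10**12  # decimal terabyte, as drive makers count it
BYTES_PER_MB = 10**6


def hashing_hours(share_tb: float, disk_mb_s: float, hash_mb_s: float) -> float:
    """Rough hashing time: the slower of disk read and CPU hash rate is the
    bottleneck (assumes reading and hashing overlap perfectly)."""
    bytes_per_s = min(disk_mb_s, hash_mb_s) * BYTES_PER_MB
    return share_tb * BYTES_PER_TB / bytes_per_s / SECONDS_PER_HOUR


def implied_mb_s(hours_per_tb: float) -> float:
    """Throughput implied by an 'N hours per terabyte' rule of thumb."""
    return BYTES_PER_TB / (hours_per_tb * SECONDS_PER_HOUR) / BYTES_PER_MB
```

At the ~46 MB/s implied by 6 hours per terabyte, a ~1.1 TB share like Grey's comes out at about 6.6 hours, in the same ballpark as the ~5 hours it actually took and nowhere near the initial 75-hour prediction.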

DC++ has been designed to hash on a single core at a time, so it is not uncommon to see one core being used 100% while hashing in the Task Manager. This is normal and you can't really improve performance much more if this is what you're seeing.
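To see which side of the bottleneck you're on, you can time a single-threaded hash of in-memory data, which takes the disk out of the picture. A minimal Python sketch, assuming SHA-256 as the algorithm (Tiger generally isn't available through hashlib), so the result is a ballpark for your CPU rather than a DC++ figure; `single_core_hash_mb_s` is a hypothetical name.

```python
import hashlib
import time


def single_core_hash_mb_s(total_mib: int = 256, algo: str = "sha256") -> float:
    """Hash `total_mib` MiB of in-memory data on one thread and return MB/s.

    `algo` must be a name hashlib.new() accepts.
    """
    block = bytes(1 << 20)  # one zero-filled MiB, reused so allocation isn't timed
    h = hashlib.new(algo)
    start = time.perf_counter()
    for _ in range(total_mib):
        h.update(block)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return total_mib * len(block) / 10**6 / elapsed
```

If the number is comfortably above your drive's sequential read speed, the disk is the bottleneck; if it's below, the CPU is, and that's when you'd expect to see one core pinned at 100% in Task Manager.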

Why it's good on the internet
Hashing has a lot of advantages on the internet:
  • Fake files are much less of a problem.
  • You can download from multiple users at once without the risk of one of their copies being corrupt (even if they have different filenames).
  • Resuming incomplete downloads, particularly from different users
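To show why a corrupt copy isn't a risk when downloading from several users, here's a Python sketch that fetches each piece from a different source in turn and keeps a piece only if it matches the expected hash. It's a simplification under stated assumptions: SHA-256 instead of Tiger, a flat list of piece hashes rather than a full tree, a 4-byte piece size purely for readability, and in-memory byte strings standing in for network transfers. The user names and `assemble()` are hypothetical.

```python
import hashlib

PIECE = 4  # tiny piece size so the example stays readable; real clients use far larger blocks


def piece_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for a Tiger leaf hash


def assemble(expected: list[bytes], sources: dict[str, bytes]) -> tuple[bytes, dict[int, str]]:
    """Build the file piece by piece from whichever source has a good copy.

    `sources` maps a user name to that user's copy of the file; only the
    contents are compared, so filenames don't matter.
    Returns the assembled file and which user supplied each piece.
    """
    out: list[bytes] = []
    used: dict[int, str] = {}
    names = list(sources)
    for i, want in enumerate(expected):
        # start with a different user for each piece to spread the load
        for k in range(len(names)):
            name = names[(i + k) % len(names)]
            chunk = sources[name][i * PIECE:(i + 1) * PIECE]
            if piece_hash(chunk) == want:
                out.append(chunk)
                used[i] = name
                break
        else:
            raise ValueError(f"no source has a good copy of piece {i}")
    return b"".join(out), used
```

A bad piece from one user is simply rejected and fetched from someone else, and the rotation means no single sharer carries the whole transfer.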


Why LANs are very different to the internet
  • Fake/dodgy files are extremely uncommon at a LAN (compared to the total share size), and you can generally download a different copy in seconds anyway.
  • Bandwidth is much higher, so there is nowhere near as much unique data on the network. There are probably only a handful of unique copies of a given file but a few dozen mirrors of each with filenames intact, so you can just Match Queue and it'll probably work better than a hash request.
  • At any given time, a substantial number of people will be hitting the network's transfer limit*. Unless the person you're downloading off has a crapton of slots, you'll hit that bottleneck long before they max out their link.
  • Total share sizes and data throughput are much, much higher, meaning that the tiny CPU and memory hits that hashing causes suddenly become much, much bigger.
  • Bandwidth is so high that verifying and resuming incomplete files is usually a waste of time; it's about as quick to just download the file again from scratch.
  • And of course, there's the processing period at the start.


That said, those advantages are not completely moot at a LAN; they just aren't worth the downsides any more. DC's approach to hashing is quite ham-fisted and not suited to the environment at all.

*This is part of how the network is laid out; each table only gets a few gigabits to the core.
Version compatibilities
There are two types of file list: hashed and non-hashed (they're actually different formats). However, actually choosing a version of DC++ is not that simple: there are some versions that can read one but not the other, and there is no version which will generate both. If someone wants to download a hashed file list from you and your version only generates a non-hashed file list, they won't be able to see your files and thus can't download from you. However, it's possible to be running a version which will read their hashed file list even if it doesn't generate one, in which case you can download from them.

DC++ is open source, so there are a lot of variations of it, and variations of those variations, available. Without getting into the complexities of ApexDC, StrongDC, IceDC etc., what really matters is the version of DC++ they're based on. Here's a list of the major versions of DC++:

  • Before .306 - generates only non-hashed, reads only non-hashed.
  • .306 - generates only non-hashed, reads only non-hashed.
  • .401 - generates only non-hashed, reads both hashed and non-hashed. Has a bug involving large file sizes.
  • .673 - generates only hashed, reads both hashed and non-hashed.
  • .674 - generates only hashed, reads both hashed and non-hashed. Has a CPU usage bug.
  • After .674 - generates only hashed, reads only hashed.


Also, there's a version called LANDC++ which I believe is based on a fairly recent version of IceDC++. It generates a file list in the same format as a hashed file list, but without the hashes. Since no version of DC++ will read such a list, it's not compatible with any of the above versions. Avoid.
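The list above boils down to one rule: you can download from someone if your client can read the type of file list theirs generates. A Python sketch that encodes the list as data and applies that rule; the version labels, `Fmt`, `CLIENTS` and `can_download` are hypothetical names, and LANDC++ is given no readable formats only because the post describes it as incompatible with everything above.

```python
from enum import Enum


class Fmt(Enum):
    NON_HASHED = "non-hashed"
    HASHED = "hashed"
    HASHED_NO_HASHES = "hashed-format list without hashes"  # LANDC++, per the post


# client -> (format it generates, formats it can read), transcribed from the list above
CLIENTS = {
    "pre-0.306": (Fmt.NON_HASHED, {Fmt.NON_HASHED}),
    "0.306": (Fmt.NON_HASHED, {Fmt.NON_HASHED}),
    "0.401": (Fmt.NON_HASHED, {Fmt.NON_HASHED, Fmt.HASHED}),
    "0.673": (Fmt.HASHED, {Fmt.NON_HASHED, Fmt.HASHED}),
    "0.674": (Fmt.HASHED, {Fmt.NON_HASHED, Fmt.HASHED}),
    "post-0.674": (Fmt.HASHED, {Fmt.HASHED}),
    "LANDC++": (Fmt.HASHED_NO_HASHES, set()),  # described as compatible with none of the above
}


def can_download(me: str, them: str) -> bool:
    """`me` can browse and download from `them` if `me` reads the list `them` generates."""
    _, i_read = CLIENTS[me]
    they_generate, _ = CLIENTS[them]
    return they_generate in i_read


if __name__ == "__main__":
    # print who can download from whom
    for me in CLIENTS:
        ok = [them for them in CLIENTS if them != me and can_download(me, them)]
        print(f"{me:>11} can download from: {', '.join(ok) or 'nobody'}")
```

Note the asymmetry: a 0.673 user can browse a 0.306 user, but not the other way round, which is exactly the "read but not generate" situation described above.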


Also
No, hash requests do not put any extra load on the network. Also, you should have a minimum of one upload slot per hard disk, plus one extra; any fewer and you'll make baby Jesus cry.


R4N
post Mar 28 2010, 08:50 PM
Post #22
Hueg.

Posts: 634
Joined: 4-June 07
From: In a packing crate till next respawn.

I think that's enough of an explanation to keep everyone informed :P

case closed

