Showing posts with the label mining

Accelerating cryptocurrency Mining with Intel ISPC

Daniel Lemire and Maxime Chevalier just had a great exchange on Twitter about the state of compilers and being able to automatically vectorize code, such as a scalar product. Of course, hoping the compiler can correctly vectorize your code can be a bit fragile, and as Maxime points out, writing raw intrinsics results in an unreadable pain in the editor, and a GPU-style implementation in CUDA or OpenCL might be more scalable and maintainable. A few years ago, some folks at Intel wrote a compiler called ISPC, the Intel SPMD Program Compiler. A possibly unfairly simple way to describe ISPC is that it's OpenCL-style programming for x86 vector units. You can write code that looks like this: export uniform float scalar_product(uniform float a[], uniform float b[], ...

Minting Money with Monero ... and CPU vector intrinsics

I woke up on May 28th, 2014, on vacation with my family in the middle of the desert, to find a copy of my private source code plastered across the bitcointalk message board. Announced as a "new optimized version" of the Monero currency miner, it was enthusiastically adopted by cryptocurrency miners across the world. In the process, my profit from the Monero Mining Project dropped by over five thousand dollars per day. But let's start at the beginning, when I started getting into a loose collaboration with three people I've never met---one whose name I'm not even confident I really know---with hundreds of thousands of dollars worth of a nebulous new cryptocurrency at stake. It started with a cryptic note from someone I'd met online, with a link to a bitcointalk.org message board discussion for a new currency called "bitmonero". His note said only: "this looks interesting." From prior collaborations with him...

A Public Review of Cuckoo Cycle

In the ongoing experiment that is crypto-currencies, several "Proof-of-Work" functions are commonly used as mechanisms to decentralize control of the currency. These PoW functions, as they are typically referred to, require a participant to prove that they expended (in expectation) a certain amount of computational effort, such as finding a partial collision in a cryptographically strong hash function. Many have been introduced to counter the gradual ASIC-ification of the dominant PoW, partial collisions in SHA256, used in Bitcoin. I've discussed several in prior posts that try to require random memory accesses in order to favor general-purpose CPUs over specialized devices such as GPUs or ASICs. In this post, I'll take a look at one called Cuckoo Cycle that combines two of my interests: Cuckoo Hashing and memory-hard PoW functions. Its author and a few other people asked me to take a look, and it finally seemed like enough people were looking at it t...
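As a rough illustration of the "partial collision" idea mentioned in the excerpt above, here is a minimal Python sketch. It is a simplification under stated assumptions, not Bitcoin's actual scheme: it uses a single SHA-256 pass and counts leading zero bits, and the names `leading_zero_bits` and `mine`, the 8-byte nonce encoding, and the `max_nonce` bound are all hypothetical choices made for this example.

```python
import hashlib


def leading_zero_bits(digest):
    """Count how many leading bits of the digest are zero."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mine(header, difficulty_bits, max_nonce=1 << 24):
    """Try nonces in order until the hash has enough leading zero bits.

    Simplified stand-in for a proof-of-work search: the expected number
    of attempts is about 2**difficulty_bits.
    Returns (nonce, digest), or None if no nonce below max_nonce works.
    """
    for nonce in range(max_nonce):
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest
    return None
```

Verifying a claimed solution takes one hash, while finding one takes many attempts in expectation; that asymmetry is what lets a participant prove they expended effort.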

Fast prime cluster search - or building a fast Riecoin miner (part 1)

Introduction

In the wake of Bitcoin's success, many "alternative" cryptocurrencies, or "alt-coins", have emerged that change various aspects of the coin in an attempt to improve upon Bitcoin. (We'll ignore the many that have emerged with the goal of simply duplicating it to put money in their creators' pockets.) I've been particularly interested in variants that change something with the "proof-of-work" function. This sits at the heart of Bitcoin's decentralization: If a node in the Bitcoin network wishes to be the one permitted to sign a block of transactions, and receive payment for doing so, it must prove that it expended a certain amount of computational effort. This mechanism spreads the authority for the transaction history over all of the participants in the protocol in rough proportion to their computational horsepower. The goal of this mechanism is to ensure that no one entity can control the currency without spending an eno...

Blockchain explorations: The Riecoin Miner Arms Race

I've continued to stay interested in alt-currencies, and, recently, a new one caught my eye: Riecoin. Named after Riemann, the proof-of-work in this currency is finding dense clusters of primes starting at a number n, where there are six primes "in a row": n, n+4, n+6, n+10, n+12, and n+16. To solve a block, you do some hashing of the block contents just as in Bitcoin, and use it to generate a "target" number. The proof-of-work is then to find a prime chain within a certain numerical distance above the target number. I plan on a more technical post about the process of optimizing the search for these (you can see some discussion of it on the Riecoin bitcointalk thread if you really want). It's quite fun, but at this point, it still boils down to relatively straightforward prime sieving for huge numbers (1200-1300 bits). There's one very important thing to understand about such sequences, though, before going further: They only appear ...
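The six-prime pattern from the excerpt above can be checked with a small Python sketch. This is only an illustration and not Riecoin miner code: it uses naive trial division, which works for tiny numbers but is hopeless at the 1200-1300 bit sizes mentioned, and the names `is_prime`, `is_sextuplet`, and `first_sextuplet_at_or_above` are hypothetical.

```python
# Offsets of the six primes "in a row": n, n+4, n+6, n+10, n+12, n+16.
OFFSETS = (0, 4, 6, 10, 12, 16)


def is_prime(n):
    """Naive trial-division primality test (small numbers only)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_sextuplet(n):
    """True if n plus every offset in OFFSETS is prime."""
    return all(is_prime(n + k) for k in OFFSETS)


def first_sextuplet_at_or_above(target, distance):
    """Scan [target, target + distance) for the start of a sextuplet."""
    for n in range(target, target + distance):
        if is_sextuplet(n):
            return n
    return None
```

For example, `is_sextuplet(7)` holds because 7, 11, 13, 17, 19, and 23 are all prime, and the next starting point after that is 97.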

Gaining Momentum: Duplicate Detection in CUDA for better mining

I've been having fun finding "memory-hard" proof-of-work functions and engineering better solutions to solving them, with a focus on doing so for Nvidia's cards. (Why? Because I have some of them, because CUDA's more fun to program in, and because Nvidia's cards are widely regarded as the underdog in the crypto-currency game because of their slower shift and rotate instruction throughput. It's fun to have a challenge and it's nice to use good tools.) I recently stumbled across a new one, called "Momentum", which requires as its proof of work finding a collision in a 50-bit hash function, given a 26-bit nonce space to play in for each attempt. In other words, to mine a Momentum coin:

    for i = 0; i < 2^26; i++ {
        h = hash(i)
        noncelist = nonces[h]
        noncelist = append(noncelist, i)
        nonces[h] = noncelist
    }
    for h, noncelist := range nonces {
        if len(noncelist) ...
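The bucketing loop in the excerpt above can be sketched in runnable Python. This is a toy version under loud assumptions: `toy_hash` is a truncated SHA-256 stand-in, not Momentum's actual hash function, the `seed` parameter is hypothetical, and the bit widths are parameters so that the example can run with far fewer than 2^26 nonces.

```python
import hashlib
from collections import defaultdict


def toy_hash(seed, nonce, hash_bits):
    """Stand-in hash (assumption): SHA-256 truncated to hash_bits bits."""
    digest = hashlib.sha256(seed + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << hash_bits) - 1)


def find_collisions(seed, nonce_bits, hash_bits):
    """Group every nonce by its hash value and keep the groups that collide.

    Mirrors the pseudocode: append each nonce to the list stored under
    its hash, then report hash values that were reached by 2+ nonces.
    """
    buckets = defaultdict(list)
    for nonce in range(1 << nonce_bits):
        buckets[toy_hash(seed, nonce, hash_bits)].append(nonce)
    return {h: nonces for h, nonces in buckets.items() if len(nonces) > 1}
```

Calling `find_collisions(b"block", 10, 8)` hashes 1024 nonces into only 256 possible values, so collisions are guaranteed by the pigeonhole principle. At the real sizes the table of nonces no longer fits in fast memory, which is where the "memory-hard" part comes from.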

Scrypt mining changes incorporated into CudaMiner

Just a quick followup to the previous two posts about scrypt-based mining on Nvidia GPUs: The improvements I made have been incorporated (extremely rapidly!) into the existing full-featured mining client CudaMiner. I'm leaving my keplerminer code up on Github, because it serves my pedagogical purpose of helping to explain the parallelization of scrypt more simply than having to understand a more complex miner codebase, but for most people who just want code that runs fast now, go grab CudaMiner and enjoy. Kudos to +Christian Buchner for getting it all incorporated so quickly, which required updating the rest of the CudaMiner codebase to CUDA 5.5. Some quotes from folks on the bitcointalk.org message board: "With a 670 GTX and the same settings as before, I bumped from 160ish khash to 190 khash" "Went from 152 per card to 212!!! Actually uses the max power limit now" (GTX 660) "Looks like my GTX580 doesn't like the latest build"...

Inside a better CUDA-based scrypt miner

In my previous post, I discussed how I'd written a more-efficient NVidia-based scrypt coin miner and took advantage of the competitive advantage it conferred to (briefly) mine profitably on Amazon EC2 instances. In today's post, I'll break down the algorithmic and engineering details of that improved mining. You can read along in the code that I've released on github. Some terms: GPUs are big vector processors, but they can, at high expense, let the individual items in the vector "diverge" and take different paths through the code. As a result, NVidia refers to this as a "SIMT" machine: Single Instruction, Multiple Threads. The "Kepler" architecture GPUs my code targets execute 32 threads at a time (in groups of 192 in total) on a single vector unit. The ideal candidate GPU code looks something like:

    process_vector(vec) {
        for i := 0; i < vecsize; i++ {
            do_something_expensive(vec[i]) ...