Batch ECDSA in SNARKs

September 12, 2022

by John Guibas and Uma Roy. With gratitude to Yi Sun for writing the original spec, Jonathan Wang for circuit design help, and the Optimism & Arbitrum teams for fruitful discussions.

We present circom-batch-ECDSA (link to repo), a proof-of-concept implementation that allows for dramatically faster verification of a batch of ECDSA signatures in a single SNARK. It builds on top of circom-ECDSA (previous work done by others in the 0xPARC community), with inspiration from halo2-batch-ECDSA.

Introduction

ECDSA signatures are a widely used cryptographic primitive that allows one to be convinced that a message could only have come from someone who owns a specific public key. In the context of blockchains, ECDSA signatures are often used to verify that submitted transactions came from whoever owns the account emitting the transaction.

The circom-ECDSA project by 0xPARC implements ECDSA signature verification inside a zkSNARK (circom-ecdsa blog post). This primitive has led to a small explosion of applications that leverage ECDSA verification in a SNARK to anonymously verify that you control one Ethereum address in a set of addresses. Such projects include cabal, zkmessage, and heyanon.

circom-batch-ECDSA implements a primitive allowing for optimized verification of a batch of ECDSA signatures. Like circom-ECDSA, circom-batch-ECDSA has a wide variety of applications ranging from speeding up ZK rollups to enabling new identity primitives. Batch ECDSA could also potentially be used to reduce calldata costs in applications like optimistic rollups in the future--with the current gas schedule, rough benchmarks suggest that replacing ECDSA signatures with SNARK proofs could reduce calldata costs by up to 18%.

Benchmarks

The table below compares our optimized batch ECDSA circuit with serial circom-ECDSA verification in a SNARK for batch verification of $B$ ECDSA signatures. circom-ecdsa requires ~1.5 million constraints to verify 1 signature, so naive use of circom-ECDSA for $B$ signatures would result in roughly $1.5B$ million constraints.

The relevant rows to pay attention to are constraints, witness generation time and proving time. Because generating a zkSNARK proof is very computationally expensive (and scales linearly with the number of constraints and total number of signals), reducing the constraint count means that it takes less time to generate a proof.

We also explicitly measure the proof generation time (which is the sum of the witness generation time and proving time).

As the table below demonstrates, as the batch size $B$ increases, the speedup (typically) increases as well[1].

|                                   | verify2 | verify4 | verify8 | verify16 | verify32 |
|-----------------------------------|---------|---------|---------|----------|----------|
| Constraints                       | 1.9M    | 2.8M    | 4.6M    | 8.1M     | 15.3M    |
| Witness generation                | 44s     | 68s     | 130s    | 211s     | 436s     |
| Proving time                      | 4s      | 7s      | 14s     | 40s      | 86s      |
| Total proof generation time       | 48s     | 75s     | 144s    | 251s     | 522s     |
| Proof verification time           | 1s      | 1s      | 1s      | 1s       | 1s       |
| Naive witness generation          | 83s     | 167s    | 328s    | 663s     | 1360s    |
| Naive proving time                | 35s     | 42s     | 110s    | 211s     | 332s     |
| Total naive proof generation time | 118s    | 209s    | 438s    | 874s     | 1692s    |
| Speedup                           | 2.45x   | 2.78x   | 3.04x   | 3.48x    | 3.24x    |

Circuit metadata statistics

In the table below, we provide some information about the time taken to compile the circuit and to generate the proving and verifying keys.

|                                      | verify2 | verify4 | verify8 | verify16 | verify32 |
|--------------------------------------|---------|---------|---------|----------|----------|
| Circuit compilation                  | 162s    | 186s    | 345s    | 579s     | 1101s    |
| Trusted setup phase 2 key generation | 641s    | 923s    | 1485s   | 2715s    | 5352s    |
| Trusted setup phase 2 contribution   | 120s    | 196s    | 366s    | 498s     | 987s     |
| Proving key size                     | 1.1GB   | 1.8GB   | 2.9GB   | 4.8GB    | 9.0GB    |
| Proving key verification             | 709s    | 1050s   | 1769s   | 3198s    | 6450s    |

Motivation: Reducing Transaction Fees for Optimistic Rollups?

In this section we detail one of the most compelling potential use cases of batch ECDSA verification--helping L2 rollups save on gas costs! Note that we have not implemented this with our circuit, but this use-case was a large motivation behind implementing this primitive.

Optimistic rollups, such as Optimism or Arbitrum, process batches of transactions offline with a "sequencer", then post state root updates to Ethereum mainnet periodically. As part of their update, they post all the transactions in this batch to calldata so that this information is available to all users (and can be used by people wanting to challenge a state update by issuing a fraud proof). You can see the contracts which implement this for Optimism and Arbitrum respectively. The key to reducing costs on optimistic rollups is to compress this data as much as possible.

Our key insight is that ECDSA signatures are "unforgeable" data. If we can prove that someone knows a valid signature for a message, this proof can replace the signature itself. Our idea is to replace all the signatures on a batch of transactions with a single batch-ECDSA SNARK that proves that the sequencer knows valid signatures for all these transactions. Alongside the SNARK proof, we also have to include the sender address for each transaction, since the sender is normally recovered from the signature itself (via ecrecover) and we are no longer posting the signatures.

ECDSA signatures are 64 bytes each, whereas Groth16 proofs are 131 bytes. If we are able to replace 50 ECDSA signatures with 1 batch ECDSA SNARK and 50 addresses (20 bytes each), then we replace 3200 bytes with 1131 bytes, resulting in significant cost savings. In general, we can replace $N$ 64-byte signatures with $N$ 20-byte addresses by incurring a fixed overhead of 131 bytes (the size of the SNARK proof).

In each commitment to the Canonical Transaction Chain, Optimism puts ~200 ECDSA signatures, which costs $16 \cdot 64 \cdot 200 \approx 200{,}000$ gas in calldata. If we replace this with a single SNARK proof (131 bytes) and sender addresses (20 bytes per transaction), this gas cost is reduced to $\approx 66{,}000$ gas. At one Canonical Transaction Chain block per 5 minutes and a 30 gwei gas price, this is ~1.1 ETH per day in savings!
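For concreteness, the arithmetic above can be sketched in a few lines of Python, using the rough figures quoted in this section (16 gas per non-zero calldata byte, 64-byte signatures, 20-byte addresses, a 131-byte proof, ~200 signatures per batch, one batch every 5 minutes, 30 gwei); actual on-chain numbers will differ.

# Back-of-the-envelope calldata savings, using the rough figures quoted above.
GAS_PER_BYTE = 16                 # calldata gas per non-zero byte
SIGS_PER_BATCH = 200
SIG_BYTES, ADDR_BYTES, PROOF_BYTES = 64, 20, 131

naive_gas = GAS_PER_BYTE * SIG_BYTES * SIGS_PER_BATCH                   # ~200,000 gas
snark_gas = GAS_PER_BYTE * (PROOF_BYTES + ADDR_BYTES * SIGS_PER_BATCH)  # ~66,000 gas

batches_per_day = 24 * 60 // 5    # one batch every 5 minutes
gas_price_eth = 30e-9             # 30 gwei
print((naive_gas - snark_gas) * batches_per_day * gas_price_eth)        # roughly 1.2 ETH/day, the same ballpark as above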

While the size of a SNARK proof and its verification cost stay the same no matter how complex the circuit is, having an optimized circuit is helpful for meeting the latency requirements of rollups, as a SNARK proof has to be generated for each block.

Optimizoooooors

Having motivated the need for an efficient batch ECDSA verification scheme inside a SNARK and demonstrated our benchmarks, we dive into some of the theoretical foundations behind the optimizations that we implemented.

Mathematical Recap

We give a quick mathematical recap of ECDSA signature generation and verification (mostly following terminology in the Wikipedia article). This is important context to have to understand the optimizations which make circom-batch-ECDSA more efficient than wrapping the circom-ECDSA circuit in a for loop.

We fix an elliptic curve group with generator $G$ of order $n$. We let the private key be $d$ and the corresponding public key be $Q = dG$ (where $dG$ denotes $G$ added to itself $d$ times). Then, given a message $m$, we generate a signature on $m$ under the key pair $(d, Q)$ as follows:

  • Calculate $H(m)$ as the hash of the message $m$, where we assume $H(m) \in [1, n)$
  • Select a random $k \in [1, n)$
  • Calculate the curve point $(x_1, y_1) = kG$
  • Calculate $r \equiv x_1 \bmod n$
  • Calculate $s \equiv k^{-1}(h + rd) \bmod n$, where $h = H(m)$; the signature is the pair $(r, s)$

To verify a signature $(r, s)$ for message $m$ and public key $Q$, we have to check the following (a toy Python sketch of signing and verification follows this list):

  • $Q$ is a valid public key
  • Verify that if $(x, y) = (H(m)s^{-1})G + (rs^{-1})Q$, then $(x, y) \neq O$ (i.e. it is not the point at infinity) and $r \equiv x \bmod n$.
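The following is a self-contained toy Python sketch of ECDSA signing and (single-signature) verification over secp256k1, matching the equations above. It is for building intuition only: it skips hashing real messages, input validation, and constant-time arithmetic, and it is of course not the circom circuit itself.

import random

# Toy secp256k1 in affine coordinates; None represents the point at infinity O.
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A6855419 9C47D08FFB10D4B8.__class__ and
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None                                          # P + (-P) = O
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p     # tangent slope
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p    # chord slope
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def scalar_mult(k, P):
    R = None                                                 # double-and-add
    while k:
        if k & 1:
            R = point_add(R, P)
        P = point_add(P, P)
        k >>= 1
    return R

def sign(d, h):
    k = random.randrange(1, n)                               # nonce (toy randomness)
    r = scalar_mult(k, G)[0] % n
    s = pow(k, -1, n) * (h + r * d) % n
    return r, s

def verify(Q, h, r, s):
    s_inv = pow(s, -1, n)
    X = point_add(scalar_mult(h * s_inv % n, G), scalar_mult(r * s_inv % n, Q))
    return X is not None and r == X[0] % n

d = random.randrange(1, n)       # private key
Q = scalar_mult(d, G)            # public key Q = dG
h = 0x1234                       # stand-in for H(m)
assert verify(Q, h, *sign(d, h))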

For batch ECDSA, we will also need to be familiar with a variant of ECDSA known as ECDSA*, in which the user provides an $r'$ (in addition to $r, s$) such that $R = (r, r') = (x_1, y_1)$, where the verification conditions are as follows:

  • $Q$ is a valid public key
  • Verify that $R = (H(m)s^{-1})G + (rs^{-1})Q$.

In ECDSA*, the user is essentially simplifying the verification process by providing the $y$-coordinate of the elliptic curve point whose $x$-coordinate is $r$. Given a regular ECDSA signature (with only $r$), it is possible to derive what $r'$ is from the equation of the elliptic curve. From now on, we let $R$ denote the full elliptic curve point in our ECDSA signatures, as we will need this for batch verification. Note that if the signature is valid, such an $r'$ making up $R = (r, r')$ always exists, so the existence of $R$ is guaranteed.
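Concretely, recovering a candidate $r'$ from $r$ only requires a modular square root on the curve equation; for secp256k1, $p \equiv 3 \pmod 4$, so this is a single modular exponentiation. A minimal Python sketch (returning both square roots; the correct one is whichever satisfies the ECDSA* equation):

# Recover the two candidate y-coordinates for a given r (treated as an x-coordinate).
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F  # secp256k1 prime

def recover_r_prime(r):
    y_squared = (pow(r, 3, p) + 7) % p      # secp256k1: y^2 = x^3 + 7
    y = pow(y_squared, (p + 1) // 4, p)     # square root, valid since p % 4 == 3
    assert y * y % p == y_squared, "r is not the x-coordinate of a point on the curve"
    return y, p - y                         # the two candidates for r'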

Batch ECDSA Verification

Naive Method

Naive ECDSA batch verification takes in a batch (of size $B$) of ECDSA signatures $(r_i, s_i)$ and verifies that all of the signatures are valid for messages $m_i$ and public keys $Q_i$ respectively. A naive method for batch verification is simply to loop through all ECDSA signatures in the batch, individually verify each signature by computing the equations above, and then return true if all signatures verify. With the circom-ecdsa library, we can naively implement batch ECDSA verification in a SNARK just by looping over all $B$ signatures and applying the ECDSA verification primitive to each of them serially.

Can we do better?

It turns out that we can! As outlined in this note, it is in fact possible to reduce the verification of $B$ ECDSA signatures to checking a single algebraic equation while retaining proper security guarantees. Writing $h_i = H(m_i)$ and choosing $t$ as a random field element, verification of a batch of signatures is equivalent to checking the following elliptic curve point equality (Equation (1)):

$$\sum_i t^i R_i = \left(\sum_i t^i (h_i s_i^{-1})\right) G + \sum_i \left(t^i (r_i s_i^{-1})\right) Q_i$$

As a reminder, $R_i$ denotes $(r_i, r_i')$, i.e. the full curve point from the ECDSA* signature.

Why is this equivalent to verifying all B signatures individually? Consider the following polynomial equality (Equation (2)):

$$\sum_i x^i R_i = \sum_i x^i \left((h_i s_i^{-1}) G + (r_i s_i^{-1}) Q_i\right)$$

Notice that this polynomial equality holds if and only if all $B$ signatures are valid (as the coefficients of $x^i$ on the LHS and RHS are simply the LHS and RHS of ECDSA* verification). But by the Schwartz-Zippel lemma, if we evaluate two polynomials at some random $t$ and the evaluations are equal, then the polynomials are in fact identical with very high probability. You'll notice that Equation (1) is simply the evaluation of Equation (2) at the random point $t$ (with some distributing of coefficients). Thus by the Schwartz-Zippel lemma, if we choose a random $t$ and show the equality in Equation (1), then it is equivalent to verifying all $B$ signatures.
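To see the random-evaluation idea in isolation, here is a tiny Python sketch of probabilistic polynomial-identity testing over a prime field (this is just the Schwartz-Zippel argument; it has nothing to do with elliptic curves):

import random

q = 2**61 - 1                                      # a prime; the field we sample t from

def eval_poly(coeffs, t):
    # coeffs[i] is the coefficient of x^i
    return sum(c * pow(t, i, q) for i, c in enumerate(coeffs)) % q

def probably_equal(f, g):
    t = random.randrange(q)                        # random challenge point
    return eval_poly(f, t) == eval_poly(g, t)      # false-positive prob <= degree / q

f = [3, 1, 4, 1, 5]
assert probably_equal(f, list(f))                  # identical polynomials always agree
assert not probably_equal(f, [3, 1, 4, 1, 6])      # distinct ones disagree w.h.p.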

Note that $t$ must be "random" for the Schwartz-Zippel lemma to hold! In a SNARK circuit, there is no "native" access to randomness (like we have on a CPU), so we construct a deterministic pseudorandom $t$ by hashing all of the inputs to the SNARK.
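For intuition, here is one way such a deterministic challenge could be derived outside the circuit: hash every public input and reduce modulo the curve order $n$. This is only a sketch with SHA-256 as a stand-in hash; inside a circuit, a SNARK-friendly hash is the natural choice, and the exact construction used by the implementation may differ.

import hashlib

n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # secp256k1 order

def derive_t(signatures, msg_hashes, pubkeys):
    # Bind t to every public input so it cannot be chosen after the fact.
    hasher = hashlib.sha256()
    for (r, s), h, (qx, qy) in zip(signatures, msg_hashes, pubkeys):
        for value in (r, s, h, qx, qy):
            hasher.update(int(value).to_bytes(32, "big"))
    return int.from_bytes(hasher.digest(), "big") % n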

Window Method & Amortized Doubling

In the context of implementing these cryptographic protocols in SNARKs, scalar multiplications and additions of elliptic curve points are quite expensive, since the elliptic curves that we work with for ECDSA signatures in Ethereum (e.g., secp256k1) are "out of field" for the arithmetic done in SNARKs.

While computing the algebraic expression in Equation (1) may look expensive, there are several fairly effective optimizations for computing linear combinations of elliptic curve points. To compute a scalar multiple $dP$ of an elliptic curve point $P$ efficiently, we can use the windowed method.

In the windowed method, one selects a window size $w$ and precomputes all $2^w$ values of $dP$ for $d = 0, \ldots, 2^w - 1$. The algorithm then uses the representation $(d_0, \ldots, d_m)$ of $d$ in base $2^w$ to compute $dP$ efficiently:

Q = 0  # the identity (point at infinity)
for i in reversed(range(m + 1)):       # process digits from most to least significant
    Q = point_double_repeat(Q, w)      # Q = 2^w * Q
    if d[i] > 0:
        Q = point_add(Q, d[i]*P)       # d[i]*P is looked up from the precomputed table
return Q

Note that point_double_repeat here computes $2^w Q$ by doubling $Q$ iteratively $w$ times.
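As a quick illustration of the digit decomposition itself (not circuit code), the base-$2^w$ digits $(d_0, \ldots, d_m)$ can be peeled off least-significant first; the pseudocode above then consumes them from index $m$ down to 0:

def base_2w_digits(d, w):
    digits = []                            # d == sum(digits[i] * 2**(w*i))
    while d:
        digits.append(d & ((1 << w) - 1))  # low w bits
        d >>= w
    return digits or [0]

assert base_2w_digits(0xBEEF, 4) == [0xF, 0xE, 0xE, 0xB]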

We can modify this algorithm to compute a linear combination of elliptic curve points $d^{(1)}P_1 + \ldots + d^{(B)}P_B$ while keeping the number of calls to point_double_repeat the same as for a single scalar multiplication (a runnable toy version follows the pseudocode):

Q = 0  # the identity (point at infinity)
for i in reversed(range(m + 1)):
    Q = point_double_repeat(Q, w)          # shared doublings across all B points
    for j in range(B):
        if d[i][j] > 0:
            Q = point_add(Q, d[i][j]*P[j]) # using the precomputed multiple of P[j]
return Q
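To sanity-check the control flow of the batched version, here is a runnable Python sketch that swaps the elliptic curve for a stand-in group (integers modulo a prime under addition, where "doubling" is just multiplication by 2). The point is the shared doubling loop, not the curve arithmetic; with $B = 1$ it reduces to the single-scalar algorithm above.

q = 2**61 - 1                                           # stand-in group: integers mod q under addition

def point_add(P, Q): return (P + Q) % q
def point_double_repeat(P, w): return (P << w) % q      # "2^w * P" in the stand-in group

def windowed_linear_combination(scalars, points, w=4):
    m = (max(scalars).bit_length() + w - 1) // w - 1    # highest digit index
    # d[i][j] is the i-th base-2^w digit of the j-th scalar
    d = [[(s >> (w * i)) & ((1 << w) - 1) for s in scalars] for i in range(m + 1)]
    # table[j][c] = c * points[j], the per-point precomputed multiples
    table = [[c * P % q for c in range(1 << w)] for P in points]
    Q = 0                                               # group identity
    for i in reversed(range(m + 1)):
        Q = point_double_repeat(Q, w)                   # doublings shared across all B points
        for j in range(len(points)):
            if d[i][j] > 0:
                Q = point_add(Q, table[j][d[i][j]])
    return Q

scalars, points = [123456789, 987654321], [11, 22]
assert windowed_linear_combination(scalars, points) == sum(s * P for s, P in zip(scalars, points)) % q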

Intuition

It is helpful to have an intuitive understanding of which operations in the above algorithms are "expensive". It turns out that both point_add and point_double_repeat are expensive, with the latter being roughly $w$ times more expensive than the former: if point_add takes around $X$ constraints, then point_double_repeat takes $\approx wX$ constraints, because it requires doubling an elliptic curve point $w$ times.

If we were to implement the naive method of verifying each signature individually, we'd have to do $m$ adds and $m$ calls to point_double_repeat (i.e., $mw$ individual doublings) for each signature, which means that we would need $mB$ adds and $mwB$ doublings in total. But with this optimization, because the additions occur in an inner loop while the doublings are shared, we get away with only $mw$ doublings for the linear combination of $B$ points. This leads to a savings of $(B-1)mw$ point doublings, which is substantial!
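To make the counting concrete with purely illustrative parameters (not necessarily those used in the circuit): with 256-bit scalars and a window size of $w = 4$, each scalar has $m \approx 64$ digits, so verifying $B = 8$ signatures naively costs about $mwB = 64 \cdot 4 \cdot 8 = 2048$ doublings, while the batched linear combination needs only $mw = 256$, a savings of $(B-1)mw = 1792$ doublings; the $mB$ additions are unchanged.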

Our benchmarks show that this optimization can save up to 68% in terms of circuit size (# of constraints) when compared to a naive for-loop wrapper around circom-ecdsa.

Code

The code for our circuits can be found here.


  1. Note that this pattern only holds up to a point: the speedup for verify32 is slightly lower than the speedup for verify16 (3.24x vs. 3.48x). As the circuit gets larger, the amount of memory required for proof generation also grows, and we hypothesize that spilling over into swap on our profiling machine played a role in this. ↩︎