Batch ECDSA in SNARKs
September 12, 2022
Batch ECDSA in SNARKs
by John Guibas and Uma Roy. With gratitude to Yi Sun for writing the original spec, Jonathan Wang for circuit design help, and the Optimism & Arbitrum teams for fruitful discussions.
We present circom-batch-ECDSA (link to repo), a proof-of-concept implementation building on top of circom-ECDSA (previous work done by others in the 0xPARC community) with inspiration from halo2-batch-ECDSA that allows for dramatically faster verification of a batch of ECDSA signatures in a single SNARK.
Introduction
ECDSA signatures are widely-used cryptographic primitive that allows for one to be convinced that a message could have only come from someone who owns a specific public key. In the context of blockchains, ECDSA signatures are often used to verify that submitted transactions came from whoever owns the account emitting the transaction.
The circom-ECDSA project by 0xPARC implements ECDSA signature verification inside a zkSNARK (circom-ecdsa blog post). This primitive has led to a small explosion of applications that leverage ECDSA verification in a SNARK to anonymously verify that you control one Ethereum address in a set of addresses. Such projects include cabal, zkmessage, and heyanon.
circom-batch-ECDSA implements a primitive allowing for optimized verification of a batch of ECDSA signatures. Like circom-ECDSA, circom-batch-ECDSA has a wide variety of applications ranging from speeding up ZK rollups to enabling new identity primitives. Batch ECDSA could also potentially be used to reduce calldata costs in applications like optimistic rollups in the future--with the current gas schedule, rough benchmarks suggest that replacing ECDSA signatures with SNARK proofs could reduce calldata costs by up to 18%.
Benchmarks
The below table compares our optimized batch ECDSA circuit with serial circom-ECDSA verification in a SNARK for batch verification of circom-ecdsa requires ~1.5 million constraints for verifying 1 signature, so naive use of circom-EDSA for
The relevant rows to pay attention to are constraints, witness generation time and proving time. Because generating a zkSNARK proof is very computationally expensive (and scales linearly with the number of constraints and total number of signals), reducing the constraint count means that it takes less time to generate a proof.
We also explicitly measure the proof generation time (which is the sum of the witness generation time and proving time).
As the table below demonstrates, as batch size
| verify2 | verify4 | verify8 | verify16 | verify32 | |
|---|---|---|---|---|---|
| Constraints | 1.9M | 2.8M | 4.6M | 8.1M | 15.3M |
| Witness generation | 44s | 68s | 130s | 211s | 436s |
| Proving time | 4s | 7s | 14s | 40s | 86s |
| Total proof generation time | 48s | 75s | 144s | 251s | 522s |
| Proof verification time | 1s | 1s | 1s | 1s | 1s |
| Naive Witness Gen | 83s | 167s | 328s | 663s | 1360s |
| Naive Proving Time | 35s | 42s | 110s | 211s | 332s |
| Total naive proof generation time | 118s | 209s | 438s | 874s | 1692s |
| Speedup | 2.45x | 2.78x | 3.04x | 3.48x | 3.24x |
Circuit metadata statistics
In the table below we provide some information about time taken to compile the circuit and generate the proving and verifying key.
| verify2 | verify4 | verify8 | verify16 | verify32 | |
|---|---|---|---|---|---|
| Circuit compilation | 162s | 186s | 345s | 579s | 1101s |
| Trusted setup phase 2 key generation | 641s | 923s | 1485s | 2715s | 5352s |
| Trusted setup phase 2 contribution | 120s | 196s | 366s | 498s | 987s |
| Proving key size | 1.1GB | 1.8GB | 2.9GB | 4.8GB | 9.0GB |
| Proving key verification | 709s | 1050s | 1769s | 3198s | 6450s |
Motivation: Reducing Transaction Fees for Optimistic Rollups?
In this section we detail one of the most compelling potential use cases of batch ECDSA verification--helping L2 rollups save on gas costs! Note that we have not implemented this with our circuit, but this use-case was a large motivation behind implementing this primitive.
Optimistic rollups, such as Optimism or Arbitrum, process batches of transactions offline with a "sequencer", then post state root updates to Ethereum mainnet periodically. As part of their update, they post all the transactions in this batch to calldata so that this information is available to all users (and can be used by people wanting to challenge a state update by issuing a fraud proof). You can see the contracts which implement this for Optimism and Arbitrum respectively. The key to reducing costs on optimistic rollups is to compress this data as much as possible.
Our key insight is that ECDSA signatures are "unforgeable" data. If we can prove that someone knows of a valid signature for a message, this can replace the signature itself. Our idea is to replace all the signatures on a batch of transactions with a single batch-ECDSA SNARK that proves that sequencer knows of valid signatures for all these transactions. Alongside the SNARK proof, we have to also include the sender address for each transaction, as ecrecover typically relies on the presence of the transaction to recover the sender.
ECDSA signatures are 64 bytes each, whereas groth16 proofs are 131 bytes. If we are able to replace 50 ECDSA signatures with 1 batch ECDSA SNARK and 50 addresses (20 bytes each), then we replace 3200 bytes with 1131 bytes, resulting in significant cost savings. In general, we can replace
In each commitment to the Canonical Transaction Chain, Optimism puts ~200 ECDSA signatures, which costs
While the size of a SNARK proof and verification cost is always the same no matter the complexity of the circuit, having an optimized circuit is helpful for meeting latency requirements of rollups, as a SNARK proof has to be generated for each block.
Optimizoooooors
Having motivated the need for an efficient batch ECDSA verification scheme inside a SNARK and demonstrated our benchmarks, we dive into some of the theoretical foundations behind the optimizations that we implemented.
Mathematical Recap
We give a quick mathematical recap of ECDSA signature generation and verification (mostly following terminology in the Wikipedia article). This is important context to have to understand the optimizations which make circom-batch-ECDSA more efficient than wrapping the circom-ECDSA circuit in a for loop.
We fix an elliptic curve group with generator
- Calculate
as the hash of the message , where we assume - Select a random
- Calculate the curve point
- Calculate
mod - Calculate
mod
To verify a signature
is a valid public key - Verify that if
then (i.e. is not the point at infinity) and mod .
For batch ECDSA, we will also need to be familiar with a variant of ECDSA known as ECDSA*, in which the user provides an
is a valid public key - Verify that
.
In ECDSA*, the user is essentially simplifying the verification process by providing the
Batch ECDSA Verification
Naive Method
Naive ECDSA batch verification takes in a batch (of size
Can we do better?
It turns out that we can! As outlined in this note, it is in fact possible to reduce the verification of
As a reminder,
Why is this equivalent to verifying all
Notice that this polynomial equality holds if and only if all
Note that
Window Method & Amortized Doubling
In the context of implementing these cryptographic protocols in SNARKS, scalar multiplies and additions on elliptic curve points are quite expensive, since the elliptic curves that we work with for ECDSA signatures in Ethereum (e.g., secp256k1) are "out of field" for the arithmetic done in SNARKs.
While computing the algebraic expression in Equation (1) may look expensive, there are several fairly effective optimizations for computing linear combinations of elliptic curve points. To compute the scalar multiple
In the windowed method, one selects a window size
Q = 0
for i in reversed(range(m+1)):
Q = point_double_repeat(Q, w)
if d[i] > 0:
Q = point_add(Q, d[i]*P) # using pre-computed value of d[i]*P
return QNote that point_double_repeat here computes
We can modify this algorithm to compute the linear combination of elliptic curve points point_double_repeat:
Q = 0
for i in reversed(range(m+1)):
Q = point_double_repeat(Q, w)
for j in range(B):
if d[i,j] > 0:
Q = point_add(Q, d[i,j]*P) # using pre-computed value of d[i][j]*P
return QIntuition
It is helpful to have an intuitive understanding of what operations in the above algorithms are "expensive". It turns out that both point_add and point_double_repeat are expensive, with the latter being point_add. point_add takes around X constraints, point_double_repeat takes
If we were to implement a naive method of verifying each signature individually, we'd have to do
Our benchmarks show that this optimization can save up to 68% in terms of circuit size (# of constraints) when compared to a naive for-loop wrapper around circom-ecdsa.
Code
The code for our circuits can be found here.
Note that this pattern only holds up to a point: the speedup for
verify32is slightly lower than the speedup forverify16(3.24x vs. 3.48x). As the circuit gets larger, the amount of memory required for proof generation also grows, and we hypothesize overflowing into swap on our profiling machine played a role in this. ↩︎