zkPairing (Part 2): Technical Explainer

August 24, 2022

by Jonathan W., Vincent H., and Yi Sun

In a previous post, we presented zkPairing, a proof-of-concept implementation of elliptic curve pairings in Circom. This post explains some of the techniques and optimizations which went into our work.

What are elliptic curve pairings?

Pairing-based cryptography relies on the construction of a special function known as a bilinear pairing. A pairing is a function e(P,Q) which takes two points P and Q on an elliptic curve and outputs a scalar. To be useful for cryptography, we ask for two additional properties on the pairing which mean it is compatible with arithmetic on the elliptic curve:

  • bilinearity: e(uP, vQ) = e(P, Q)^{uv} for elliptic curve points P, Q and non-zero integers u, v.
  • non-degeneracy: e(P, Q) ≠ 1 unless P = Q = O.

These properties imply the decisional Diffie-Hellman property

e(uP, vP) = e(P, Q) if and only if Q = uvP.

Most elliptic curves used in cryptography are assumed to satisfy the computational Diffie-Hellman assumption: given any point P and the points uP and vP, it is hard to compute uvP. This means that pairings can be used to verify knowledge of uvP, though uvP itself remains hard to compute.
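
As a quick sanity check outside of circuits, the bilinearity property can be verified numerically with an off-the-shelf pairing library. The sketch below assumes the Python py_ecc library (not something zkPairing itself uses) and its BLS12-381 module; it checks that e(uP, vQ) = e(P, uvQ) for a pair of small scalars.

```python
# A minimal bilinearity check, assuming the py_ecc library is installed.
# Note: py_ecc's pairing() takes its G2 argument first and its G1 argument second.
from py_ecc.bls12_381 import G1, G2, multiply, pairing, curve_order

u, v = 7, 11
lhs = pairing(multiply(G2, v), multiply(G1, u))         # e(uP, vQ)
rhs = pairing(multiply(G2, (u * v) % curve_order), G1)  # e(P, uvQ)
assert lhs == rhs  # both equal e(P, Q)^(uv) by bilinearity
```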

Luckily, pairings satisfying these properties exist on many elliptic curves used in cryptography. In this post, we will be interested in the BLS12-381 curve and the Tate and optimal Ate pairings on it. The construction of these pairings originates in algebraic geometry (the study of solutions to polynomial equations) and is quite involved. You can find the gory details here, but at a high level the optimal Ate pairing involves the following operations:

  • Miller loop: Using a recursive algorithm to compute a function f_{λ,Q}(P) of the two input points.
  • Final exponentiation: Raising f_{λ,Q}(P) to the large power (q^12 - 1)/r.

How do BLS signatures work?

The BLS signature scheme is a signature scheme based on elliptic curve pairings. BLS signatures have several nice properties. First, unlike RSA or ECDSA signatures, they are unique and deterministic: given a key and message, there is only one possible valid signature. Second, BLS signatures can be efficiently adapted to create aggregate and threshold signatures, which involve checking that some number of people from a group have signed messages.

On an elliptic curve with group of points of order r, a BLS private key is simply a number x between 0 and r, and the corresponding public key is xG for some base point G. To sign a message m with this private key, the signer computes a hash H(m) of the message onto the elliptic curve and takes s=xH(m) to be the signature.

To verify a signature s against a public key xG, the verifier checks the pairing equality

e(G,s)=e(xG,H(m)).

By the decisional Diffie-Hellman property discussed earlier, this implies that s=xH(m). Because BLS signature verification only requires computing two pairings, SNARK-ing pairings naturally extends to SNARK-ing BLS signature verification.
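
To make the verification equation concrete, here is a minimal sketch of BLS signing and verification in Python, assuming the py_ecc library (not part of zkPairing). The hash-to-curve step is replaced by an insecure stand-in, since proper hashing to the curve is discussed later in this post.

```python
# A toy BLS sign/verify sketch, assuming py_ecc is installed.
# WARNING: toy_hash_to_G2 is NOT a real hash-to-curve and is insecure;
# it only serves to produce a deterministic G2 point for the demo.
import hashlib
from py_ecc.bls12_381 import G1, G2, multiply, pairing, curve_order

def toy_hash_to_G2(m: bytes):
    h = int.from_bytes(hashlib.sha256(m).digest(), "big") % curve_order
    return multiply(G2, h)

x = 0x1234567890ABCDEF % curve_order   # private key
pk = multiply(G1, x)                   # public key xG

msg = b"zkPairing"
h_m = toy_hash_to_G2(msg)
sig = multiply(h_m, x)                 # signature s = x * H(m)

# Verification: e(G, s) == e(xG, H(m)).
# py_ecc's pairing() takes the G2 point first, so both sides are written that way.
assert pairing(sig, G1) == pairing(h_m, pk)
```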

Circuits and Techniques

Finite field extensions

Earlier in the post, we mentioned that the arguments to an elliptic curve pairing are points (x, y) on an elliptic curve y^2 = x^3 + ax + b. Though x and y are often taken to be integers modulo a prime q, for pairing-based cryptography we will need to choose x and y to lie in a more complicated mathematical object called a finite field extension.

The easiest way to understand finite field extensions is with an example. Consider the integers modulo 3, which form the set {0, 1, 2}. This is denoted by F_3 and called a finite field. For any elements x, y in F_3, we can add or multiply them to produce another element of F_3, e.g. 2 + 2 = 1, meaning F_3 is self-contained under addition and multiplication.

However, there is no element x ∈ F_3 with x^2 + 1 = 0; we can easily check this explicitly: 0^2 + 1 = 1, 1^2 + 1 = 2, 2^2 + 1 = 2. So to solve the equation x^2 + 1 = 0, we must define a new element i outside of F_3 with i^2 + 1 = 0, and then we can extend the field F_3 to a larger finite field F_{3^2} consisting of the nine elements a + bi for 0 ≤ a, b < 3. (It's referred to as F_{3^2} because there are 3^2 = 9 elements.) This is analogous to how, within the real numbers R, there is no solution to x^2 + 1 = 0, so we augment R with some element i satisfying i^2 + 1 = 0 to generate the complex numbers C consisting of the elements a + bi for a, b ∈ R.
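
A minimal sketch of this F_{3^2} example in Python (plain integers, nothing curve-specific) shows the arithmetic rules: an element a + bi is stored as the pair (a, b), and multiplication uses i^2 = -1.

```python
# F_{3^2} arithmetic on pairs (a, b) representing a + b*i, with i^2 = -1 and all
# coefficient arithmetic done mod 3.
P = 3

def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)

def mul(x, y):
    a, b = x
    c, d = y
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return ((a * c - b * d) % P, (a * d + b * c) % P)

i = (0, 1)
assert mul(i, i) == (2, 0)            # i^2 = -1 = 2 in F_3
assert add((2, 0), (2, 0)) == (1, 0)  # 2 + 2 = 1 in F_3
assert len([(a, b) for a in range(P) for b in range(P)]) == 9  # nine elements
```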

The above example shows how, for a prime q, we can go from the finite field F_q to the field extension F_{q^2}. For pairings on BLS12-381, we must consider points on the curve with coordinates in an even larger field F_{q^12}, which is itself an extension of F_{q^2}. As usual, q differs from the baby jubjub prime p native to Circom, so we must extend the BigInteger arithmetic circuits from zk-ECDSA to apply to these field extensions. Because of the complexity of this arithmetic, we also adopt the technique of lazy reduction from zk-ECDSA.
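
As a rough illustration of what this non-native arithmetic looks like, the sketch below represents a big integer as limbs of n bits and multiplies limb-wise without carrying, which is the flavor of "lazy" computation the circuits rely on before a single final carry and reduction. The limb sizes here are illustrative choices, not necessarily the parameters used in zkPairing.

```python
# Illustrative limb representation for non-native field arithmetic.
N_BITS, K_LIMBS = 55, 7  # enough limbs for a 381-bit value; illustrative only

def to_limbs(x, n=N_BITS, k=K_LIMBS):
    return [(x >> (n * i)) & ((1 << n) - 1) for i in range(k)]

def from_limbs(limbs, n=N_BITS):
    return sum(limb << (n * i) for i, limb in enumerate(limbs))

def mul_no_carry(a, b):
    # Multiply limb vectors without carrying or reducing: output limbs may
    # overflow n bits, and carries / reduction mod q are deferred ("lazy").
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

x = 0x1234567890ABCDEF << 200
y = 0xFEDCBA0987654321 << 100
assert from_limbs(mul_no_carry(to_limbs(x), to_limbs(y))) == x * y
```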

Miller loop

Let q be the 381-bit prime order of the coordinate field of BLS12-381, and let r be the 255-bit order of the group of points on BLS12-381. The goal of the Miller loop is to compute an element f_{λ,Q}(P) of F_{q^12} which is the evaluation of a certain rational function f_{λ,Q} at the point P, where λ is the remainder when q is divided by r.

The rational function f_{λ,Q} is defined recursively via Miller's algorithm, which constructs rational functions f_{i,Q} for 0 < i ≤ λ by the recursion

f_{i+j,Q} = f_{i,Q} · f_{j,Q} · l_{i,j,Q} / v_{i+j,Q},

where l_{i,j,Q} denotes the equation of a line passing through iQ, jQ and v_{i+j,Q} denotes the equation of a vertical line passing through (i+j)Q. If we start with f_{1,Q} = 1, this allows us to iteratively compute larger functions f_{i,Q}. This can be done most efficiently by considering the binary representation of i: for instance, to compute f_{11,Q} = f_{1011_2,Q}, we can use the recursive relation to compute f_{1_2,Q}, f_{10_2,Q}, f_{100_2,Q}, f_{101_2,Q}, f_{1010_2,Q}, f_{1011_2,Q} in order.
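
The order in which the f_{i,Q} get built up is just the square-and-multiply schedule on the bits of λ. A small sketch (plain Python, indices only, no line functions) reproduces the f_{11,Q} example above:

```python
# Square-and-multiply schedule for Miller's algorithm: walk the bits of lambda
# from most to least significant, doubling the index at each bit (apply the
# recursion with j = i) and adding 1 when the bit is set (apply it with j = 1).
def miller_index_chain(lam):
    bits = bin(lam)[2:]
    chain, i = [1], 1
    for bit in bits[1:]:
        i *= 2
        chain.append(i)
        if bit == "1":
            i += 1
            chain.append(i)
    return chain

# For lambda = 11 = 1011 in binary, this is exactly 1, 10, 100, 101, 1010, 1011:
assert [bin(i)[2:] for i in miller_index_chain(11)] == ["1", "10", "100", "101", "1010", "1011"]
```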

Our Miller loop circuit performs this process with a few optimizations also present in non-ZK implementations:

  • Instead of computing the points iQ directly, which requires elliptic curve arithmetic over F_{q^12}, we use a twist to an elliptic curve over F_{q^2}.
  • Because we will raise f_{λ,Q}(P) to the (q^12 - 1)/r power in the final exponentiation, we omit all of the vertical line evaluations v_{i+j,Q}(P), as they end up becoming 1 when exponentiated.

Final exponentiation

The optimal Ate pairing is given by raising the outcome f_{λ,Q}(P) ∈ F_{q^12} of the Miller loop to the (q^12 - 1)/r power, namely

e(P, Q) = f_{λ,Q}(P)^{(q^12 - 1)/r}.

Because q is 381-bit and r is 255-bit, (q^12 - 1)/r is over 4000 bits, so naive exponentiation via repeated multiplication is extremely inefficient. Instead, we employ various optimizations that are again present in non-ZK implementations of the optimal Ate pairing.

We break the computation of f^{(q^12 - 1)/r} into two steps, based on the factorization

(q^12 - 1)/r = (q^6 - 1) · (q^2 + 1) · (q^4 - q^2 + 1)/r,

where r divides q^4 - q^2 + 1.
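
These identities are easy to check numerically. The sketch below derives q and r from the standard BLS12-381 generator parameter z = -0xd201000000010000 and verifies the factorization, the divisibility claim, and the size of the exponent.

```python
# Sanity-check the final-exponentiation factorization with the actual
# BLS12-381 parameters (derived from the standard curve parameter z).
z = -0xD201000000010000
r = z**4 - z**2 + 1                  # 255-bit subgroup order
q = (z - 1) ** 2 * r // 3 + z        # 381-bit prime of the coordinate field

assert q.bit_length() == 381 and r.bit_length() == 255
# (q^6 - 1) * (q^2 + 1) * (q^4 - q^2 + 1) = q^12 - 1
assert (q**6 - 1) * (q**2 + 1) * (q**4 - q**2 + 1) == q**12 - 1
# r divides q^4 - q^2 + 1, so (q^12 - 1)/r splits as in the text
assert (q**4 - q**2 + 1) % r == 0
# the full exponent really is over 4000 bits
assert ((q**12 - 1) // r).bit_length() > 4000
```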

Easy part (Frobenius operator)

In the easier first step, we compute f_easy = f^{(q^6 - 1)(q^2 + 1)}. Computing f_easy is "easy" because it turns out that computing f^q is relatively cheap, due to certain special properties of working with finite fields. This operator f ↦ f^q is known as the Frobenius operator. By precomputing certain fixed parameters, the cost of a Frobenius operator is the same as a single F_{q^12} multiplication, as opposed to log_2(q) of them. This allows us to constrain f_easy at the cost of only a few multiplications.
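
The cheapness of Frobenius is easiest to see one level down, in a quadratic extension: there, x ↦ x^q is just conjugation. Below is a toy check with a small prime, using the same u^2 = -1 construction as the F_{3^2} example earlier; the same phenomenon, with precomputed constants, is what makes Frobenius on F_{q^12} cost about one multiplication.

```python
# In F_{p^2} = F_p[u]/(u^2 + 1) with p = 3 mod 4, the Frobenius map x -> x^p
# is just conjugation a + bu -> a - bu. Check this exhaustively for p = 7.
p = 7

def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def naive_pow(x, e):
    out = (1, 0)
    for _ in range(e):
        out = mul(out, x)
    return out

for a in range(p):
    for b in range(p):
        cheap_frobenius = (a, (-b) % p)              # one negation, no multiplications
        assert naive_pow((a, b), p) == cheap_frobenius
```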

Hard part (cyclotomic subgroup)

We complete the exponentiation by computing f_hard = f_easy^{(q^4 - q^2 + 1)/r}. The number (q^4 - q^2 + 1)/r is still 1268 bits, so a direct computation is suboptimal. The main idea behind the optimization here is again to use the Frobenius operator:

  • We write (q^4 - q^2 + 1)/r = λ_0 + λ_1 q + λ_2 q^2 + λ_3 q^3 for λ_i small using [Section 5, HHT]. Then we compute the f_easy^{λ_i} and use the Frobenius operator to get f_hard = (f_easy^{λ_0}) · (f_easy^{λ_1})^q · (f_easy^{λ_2})^{q^2} · (f_easy^{λ_3})^{q^3}.

We use one further optimization to speed up computation of the f_easy^{λ_i}. Because of how we defined f_easy, f_easy belongs to the cyclotomic subgroup (meaning that f_easy^{q^4 - q^2 + 1} = 1). By [Karabina], elements in the cyclotomic subgroup can be compressed into a format that uses less data, and squaring can be done more cheaply in the compressed format. We use this compressed format to perform optimized exponentiation via the ordinary square-and-multiply method.

BLS Signatures: Glued Miller loop and hashing to curve

Remember from earlier in the post that BLS signature verification requires checking that

e(G,s)=e(xG,H(m)).

The obvious way to do this is to compute two pairings: one on the left and one on the right. However, we can do better: the bilinearity property of pairings tells us that e(xG, H(m))^{-1} = e(-xG, H(m)), so we can equivalently check whether

e(G, s) · e(-xG, H(m)) = 1.

The left hand side is equal to (f_{λ,s}(G) · f_{λ,H(m)}(-xG))^{(q^12 - 1)/r}, so we only need to perform one final exponentiation at the end. Moreover, in the recursion step of the Miller loop, we do not need to keep track of f_{i,s}(G) and f_{i,H(m)}(-xG) individually; we only care about their product. This saves us an F_{q^12} multiplication at each step of the "glued" Miller loop.
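
Continuing the earlier py_ecc sketch, the rearranged check looks like this (assuming py_ecc also exposes FQ12 and neg from its bls12_381 module); in the circuits, the two Miller loops inside this product are the ones that get glued together.

```python
# Rearranged BLS verification: e(G, s) * e(-xG, H(m)) == 1 in F_{q^12}.
# Assumes py_ecc; the toy hash below is NOT a real hash-to-curve.
import hashlib
from py_ecc.bls12_381 import FQ12, G1, G2, multiply, neg, pairing, curve_order

def toy_hash_to_G2(m: bytes):
    h = int.from_bytes(hashlib.sha256(m).digest(), "big") % curve_order
    return multiply(G2, h)

x = 0xC0FFEE % curve_order
pk = multiply(G1, x)
h_m = toy_hash_to_G2(b"zkPairing")
sig = multiply(h_m, x)

# One product of two pairings compared against 1, instead of two separate
# pairings compared against each other.
assert pairing(sig, G1) * pairing(h_m, neg(pk)) == FQ12.one()
```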

Hashing to curve

Up to now we have described how to verify BLS signatures given a hashed message H(m). However we have not talked about how to get from the message m, which is typically a byte string, to H(m). It turns out this is quite complicated -- there is an entire Internet Draft about how to do it. We refer to BLS12-381 For The Rest Of Us for an in-depth overview, but we will provide a summary for convenience:

  1. hashToField: Starting with m, one hashes it to two elements u[0], u[1] of F_{q^2} using a hash like SHA-256. This step depends on a DST (domain separation tag), which is highly implementation dependent, so be sure to check you are using the right one!
  2. Optimized simplified SWU map for BLS12-381: [Section 4, Wahby-Boneh] provide an optimized method to go from elements u in F_{q^2} to E_2(F_{q^2}), where E_2 is the twist of BLS12-381. This is based on a simplified version of the Shallue-van de Woestijne-Ulas (SWU) mapping. (Footnote: the sgn0 function described in Wahby-Boneh is different from the one in the Internet Draft.)
  3. Cofactor clearing: We now have an element P in E_2(F_{q^2}). However, it does not necessarily lie in the subgroup G_2 of elements of order r. The size of E_2(F_{q^2}) is h_2 · r, where the cofactor h_2 is quite large. To map P to G_2, the naive thing to do is to scalar multiply P by h_2. Since h_2 is huge, this is extremely expensive. Instead, there is a different, faster method to map P to G_2. This method uses an endomorphism ψ by Budroni-Pintore which is closely related to the Frobenius operator.

Subgroup checks

For full security, we also need to check that the public key xG belongs to the subgroup G_1 and the signature s belongs to the subgroup G_2. The naive ways to do this are expensive, and there are various optimizations involving the endomorphism ψ in the same spirit as in hashing to curve. We use the latest (2021) optimizations of Scott.

Outlook and roadmap

Adapting to other elliptic curves

Although our circuits target the BLS12-381 curve, they can be easily adapted to other curves. We have recently modified our circuits to work for the BN254 curve, which Ethereum supports with precompiles for elliptic curve arithmetic and pairings. This required the following modifications (a parameter sketch follows the list):

  • Field extension representation: The base field F_q of BN254 differs from that of BLS12-381, so we must change our representation for the field extensions F_{q^2} and F_{q^12} accordingly.
  • Curve coefficients: BLS12-381 and BN254 have different coefficients a and b in the Weierstrass normal form y^2 = x^3 + ax + b, so we must update these.
  • Optimal Ate definition: BN254 lies in the Barreto-Naehrig family of curves, for which we must modify the λ used for f_{λ,Q}(P) in the definition of the optimal Ate pairing.
  • Final exponentiation optimization: Because of the smaller size of the base field F_q and the higher Hamming weight of a certain parameter involved in defining q and the curve order r for BN254, we must reassess whether compressing to the cyclotomic subgroup still offers an advantage over the direct square-and-multiply method.
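
For reference, the relevant BN254 parameters can be derived from the Barreto-Naehrig polynomial family. The sketch below assumes the standard BN254 parameter z = 4965661367192848881 and checks the facts the modifications above rely on; the comparison against Circom's native prime is what makes the recursion discussed next possible.

```python
# BN254 parameters from the Barreto-Naehrig family q(z) = 36z^4 + 36z^3 + 24z^2 + 6z + 1,
# r(z) = 36z^4 + 36z^3 + 18z^2 + 6z + 1, with the standard BN254 parameter z.
z = 4965661367192848881
q = 36 * z**4 + 36 * z**3 + 24 * z**2 + 6 * z + 1   # base field prime
r = 36 * z**4 + 36 * z**3 + 18 * z**2 + 6 * z + 1   # curve order

assert q.bit_length() == 254 and r.bit_length() == 254
# r still divides q^4 - q^2 + 1, so the same final exponentiation structure applies
assert (q**4 - q**2 + 1) % r == 0
# the curve order of BN254 is exactly the prime field native to Circom
assert r == 21888242871839275222246405745257275088548364400416034343698204186575808495617
# for BN curves the optimal Ate Miller loop length is governed by 6z + 2 rather
# than the BLS loop parameter
loop_param = 6 * z + 2
```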

Recursive verification / proof aggregation

Implementing the optimal Ate pairing on BN254 opens up the possibility of implementing zkSNARK verification within a zkSNARK. For the Groth16 proof system, verifying a proof π=(A,B,C) involves checking that

e(A, B) = e(g^α, h^β) · e(∏_{i=1}^{k} S_i^{a_i}, h^γ) · e(C, h^δ),

where e is the optimal Ate pairing on BN254, (g^α, h^β, h^γ, h^δ, {S_i}_{i=1}^{k}) form the verification key, and {a_i}_{i=1}^{k} are the public inputs. With the help of Nalin Bhardwaj, we have recently implemented ZK circuits for Groth16 verification by checking this equality. In a future blog post, Nalin will explain the implementation in more detail.
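
Outside of circuits, this check is straightforward to express with a pairing library. The sketch below is a structural outline against py_ecc's bn128 module (an assumption, and not how our circuits are written); the verification key and proof points would come from an actual proving system such as snarkjs.

```python
# Structural sketch of the Groth16 pairing check, assuming py_ecc is installed.
from py_ecc.bn128 import add, multiply, pairing

def groth16_verify(vk, proof, public_inputs):
    """vk = (alpha_g1, beta_g2, gamma_g2, delta_g2, S) with S a list of G1 points,
    proof = (A, B, C) with A, C in G1 and B in G2,
    public_inputs = [a_1, ..., a_k] (assumed non-empty)."""
    alpha_g1, beta_g2, gamma_g2, delta_g2, S = vk
    A, B, C = proof

    # acc = sum_i a_i * S_i, i.e. the product of the S_i^{a_i} in additive notation
    acc = None
    for a_i, S_i in zip(public_inputs, S):
        term = multiply(S_i, a_i)
        acc = term if acc is None else add(acc, term)

    # e(A, B) == e(alpha, beta) * e(acc, gamma) * e(C, delta)
    # (py_ecc's pairing() takes the G2 argument first)
    lhs = pairing(B, A)
    rhs = pairing(beta_g2, alpha_g1) * pairing(gamma_g2, acc) * pairing(delta_g2, C)
    return lhs == rhs
```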

Recursive verification and aggregate verification with techniques like SnarkPack allow us to construct zkSNARKs in a modular fashion. To give an example, suppose that Alice wants to generate a zkSNARK for the composition h(x) = f(g(x)) of two function calls. She can either (1) generate a SNARK for the large combined function h(x), or (2) generate a SNARK proof π for the statement y = g(x) and a second SNARK proof for the two statements z = f(y) and "π is a valid proof for y = g(x)". The latter approach, enabled by recursion, allows Alice to prove that z = h(x) while only proving SNARKs smaller than the one for h(x). This is particularly useful in size-constrained settings where the composed SNARK has more constraints than the current largest trusted setup for Groth16 (~256M constraints).

PLONK and KZG commitment verification

Many of the most efficient modern ZKPs such as PLONK are based on the KZG polynomial commitment. Batch opening the values of a committed polynomial at several points in the KZG commitment scheme requires checking a single equality involving elliptic curve pairings. Our techniques for pairings thus open a path to verifying KZG commitments, and hence to verifying PLONK-type SNARKs based on batch openings of KZG commitments.
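
To show how a KZG opening reduces to a single pairing equality, here is a toy end-to-end example of a single opening over BN254, again assuming py_ecc (batch opening combines several such openings into one pairing check). The "trusted setup" secret τ is generated in the clear, which is of course only acceptable for illustration.

```python
# Toy KZG commit / open / verify over BN254, assuming py_ecc is installed.
from py_ecc.bn128 import G1, G2, add, multiply, neg, pairing, curve_order

def eval_poly(coeffs, x):
    # Horner evaluation mod the curve order; coeffs are low-degree-first.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % curve_order
    return acc

# Insecure toy setup: powers of tau in G1 and [tau] in G2.
tau = 123456789
srs_g1 = [multiply(G1, pow(tau, i, curve_order)) for i in range(4)]
tau_g2 = multiply(G2, tau)

# Commit to p(X) = 3 + 2X + X^2.
p = [3, 2, 1]
commitment = None
for i, c in enumerate(p):
    term = multiply(srs_g1[i], c)
    commitment = term if commitment is None else add(commitment, term)

# Open at z = 5: y = p(z), and the quotient (p(X) - y)/(X - z) = X + (z + 2).
z = 5
y = eval_poly(p, z)
quotient = [(z + 2) % curve_order, 1]
proof = add(multiply(srs_g1[0], quotient[0]), multiply(srs_g1[1], quotient[1]))

# Verify the single pairing equality: e(C - [y]_1, [1]_2) == e(proof, [tau - z]_2).
lhs = pairing(G2, add(commitment, neg(multiply(G1, y))))
rhs = pairing(add(tau_g2, neg(multiply(G2, z))), proof)
assert lhs == rhs
```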

Circuit auditing and verification

As discussed in earlier posts on zk-ECDSA, these circuits are unaudited and not ready for production use. Due to the complexity and size of these circuits, ensuring their correctness is particularly difficult. We have implemented some basic unit tests for witness correctness, and a few further directions we are exploring are:

  • Testing for witness uniqueness using automated tools like Ecne. Our circuits are substantially larger than previous circuits Ecne has been applied to, and we are excited to see whether such tools can scale to our use case.
  • We have created extensive documentation for each component of our circuits, including specifying pre- and post-conditions on each template. In addition to facilitating a human audit, this can form the first step towards a specification for formal verification of our circuits.

Conclusion

We invite you to check out our repo, and we'd love to hear your feedback or ideas for improvement. We are also actively exploring further applications of this work -- please reach out if you have an application of zkSNARKs for pairings!

This project was built during 0xPARC's ZK Identity Working Group with the support of the ZKxZK Gitcoin grant.

We use many optimization techniques from circom-ecdsa and xJsnark for big integer and elliptic curve arithmetic. We would also like to thank Jordi Baylina for his crucial help in teaching us how to use the Circom / snarkjs toolstack for large ZK circuits.