Patterns of Proofs of Infinity

At the core, the concept of infinity relies on the fact that no matter how many numbers we have, there’s always at least one more. This is formally expressed in the principle of induction.

Induction is the first and most widely-used method of proving that there is an infinite amount of some thing. Strong induction is a sub-pattern, where all the things that we have so far are taken together to show that there must be one we don’t have yet. Strong induction is the method used by Euclid to show there are infinitely many primes: take all the primes we have, multiply them all together, and add one to the result. This number must be relatively prime to all of those we have, and therefore its prime factors are not among those we already had, so we add these prime factors to our list. Now we are at the beginning again, and this process can be repeated indefinitely.

Another pattern is to “acknowledge the properties of a set.” Consider the set of Fermat numbers, defined by F_n = 2^2^n+1.  Each member of this set of numbers (for non-negative n) is pairwise relatively prime with all the others, and this property is relatively easy to prove.  At once we can see that if we consider the set of prime factors of all of the Fermat numbers, these must also be pairwise relatively prime with each other, and therefore they are all distinct.  We can immediately follow up by noting that there are an infinite number of Fermat numbers (as many as there are non-negative integers), and therefore an infinite number of primes.

Still another pattern is to show that the set of numbers we are considering has a count that is bounded below by a given sum, and then follow by showing that this sum goes to infinity.  This is the pattern used by Dirichlet in the proof that every arithmetic progression ax+b with (a,b)=1 and a=/=0 contains an infinite number of primes.

The pattern used by the recent proof that there are infinitely many twin prime pairs is a complicated sub-pattern of induction.  It is possible to argue that all proofs of infinity are sub-patterns of induction, but some are more immediate descendants than others.  This pattern uses the following logic:

  • Two functions f,g:Z^2 to Z+ with f having range A and g having range B are given with the following properties:
    1. Given u in Z+, we have |A^[u+1, 2u]|>=|B^[u+1,2u]|
    2. there exist a,b in Z : f(a,b) = 2x if and only if there exist c,d in Z : g(c,d) = x
    3. there exists z in Z+ : for all a,b in Z, f(a,b) =/= z
  • Property (1) allows half of an induction.  Property (2) allows the other half, where we can jump from one interval to the next one that is doubled (and non-overlapping).  Property (3) says that we can start the induction.

Other patterns that were attempted in the making of the twin primes theorem proof include making an “arrangement” of the known primes and their congruence classes around certain numbers to prove that examples can exist; creating “lattices” of integer-valued polynomials to offer comparisons between similar polynomials; showing that for all polynomials of the same degree having the same number of polynomial factors and constant factors, there exists one example in the “positive-x” direction.

This last attempt may continue to be a viable option.  It uses standard induction, but instead of being given a starting point, the starting point is “continuously created.”  In particular, if f(x) is a given polynomial where for a value x_0 > 0, f(x_0) produces an “example”, then g(x)=f(x_0+x) is also a polynomial.  If we have proven that all polynomials “similar” to f(x) have “examples” in the positive-x direction, and if we prove that this g(x) is “similar” to f(x), then we know that there exists x_1 > x_0 such that g(x_1)=f(x_0+x_1) is an example.  This particular approach has not borne fruit as of yet, but continues to show promise.


Claims and Proofs

Edit: it appears that the original paper has been updated with many of these claims directly addressed.  This post will remain as is for posterity.


It appears that many of the statements made in a recent linked post had little to no proof to back them up. Let’s try to give some extra details here…

The first couple sections don’t seem to have very many claims, or the claims present are either cited or reasonably-well shown.  Starting with section 3.3, there is the “single-function version” of the Twin Primes sieve, and there is very little in the way of proof for the statements made.

– Claim 1: x-1 and x+1 can represent a twin prime pair

  • proof: the twin primes are those pairs of primes having an absolute difference of 2, so p and p+2 or p and p-2 are common representations of these primes.  Adding 1 to each of p, p-2 or subtracting one from each of p, p+2 yields p-1, p+1, and changing the letter designation has no effect

– Claim 2: the smaller of two factors a, b of a number n can be no larger than the square root of n

  • proof: if a and b are factors of a number n such that a * b = n, then this is true, since if a and b are both larger than sqrt(n), then we have (sqrt(n)+q)*(sqrt(n)+r) = sqrt(n)*sqrt(n)+sqrt(n)(q+r)+q+r = n+sqrt(n)(q+r)+q+r.  So this claim would be true if it were clarified to apply only to when a*b=n

The rest of this section is hand-waving until we get to the end:

– Claim 3: all twin prime pairs get missed by the sieve represented by the function f(s,t)=(s+1)(s^2+s+t-1)

  • proof: let x-1, x+1 be given such that both are prime, and suppose that there exist an s and t such that x^2-1=(s+1)(s^2+s+t-1).  Then we have that s+1 must divide either x-1 or x+1.  Since s and t are positive integers, s+1 is never greater than s^2+s+t-1, which means that s+1 must divide x-1; since x-1 is prime and s+1 must be greater than 1, this forces that s+1=x-1, and that s+3 = x+1 and thus that s+3=s^2+s+t-1.  Continuing the algebra, we get s^2+t=4.  The only positive integer solutions here are s=1 and t=3, giving us the values x-1=2, x+1=4.  These are not a twin prime pair, contradicting our assumption that there are twin prime pairs which are picked up by the sieve.

– Claim 4: s+1 goes through every possible factor

  • proof: the not-clearly-stated claim here is that there are no pairs of integers with at least one composite which are different by 2 such that the sieve (s+1)(s^2+s+t-1) does not catch them, so assume that there is an x with either x-1 or x+1 non-prime such that for all positive integers s and t, (s+1)(s^2+s+t-1) does not equal x^2-1.  First, suppose that x-1 is non-prime with factors a,b where a*b = x-1 such that 1 is less than a is less than b.  Let s+1=a, then we have s^2+2s+1 = a^2 and s^2+s+t-1 = a^2-a+t-1.  But t can be any positive integer, and a^2-a-1 must be less than b*(x+1) since a is less than b and a is less than x+1, therefore s=a-1, t = b*(x+1)-a^2+a+1 is a solution and the sieve does catch this pair.  The same argument applies when x+1 is the composite number.

Section 3.4 appears to be hand-waving statements again, and does not appear to directly affect the primary claims in later sections.  Section 4.1 starts with a citation, and the algebra to change (2a+1)(2b+1) =/= 2k+1 into 2ab+a+b =/= k is very straightforward.  Then there is more hand-waving, and we are shown a related sieve for the twin primes.  There is another unspoken claim here:

– Claim 5: s^3+2s^2+st+t =/= 4x^2 if and only if 2s^3+2s^2+2st+t =/= x^2

  • proof: the given statement that (s+1)(s^2+s+t-1)=x^2-1 has the same odd-even parity as (s+1)(t-1)=x^2-1 is trivially true; it takes only a short leap to notice that x^2-1 has the same parity as x-1, giving us the form of the odd-even theorem

Section 4.2 has more hand-waving, and the side track of the so-called “modular images“.  It seems that these could be interesting, but all the details have been completely glossed over.  Section 5.1 begins with some more interesting things:

– Claim 6: when a_1 from a_1b_1+a_1+b_1 and a_2 from 2a_2b_2+a_2+b_2 are such that a_1+1=2a_2+1, then the count of numbers “sieved away” by a_1b_1+a_1+b_1 within a finite interval can be at most one different from the count of numbers “sieved away” by 2a_2b_2+a_2+b_2

  • proof: let a=a_1+1=2a_2+1, then we have arithmetic sequences ab_1+a-1 and ab_2+/-[(a+1)/2].  Then a-1 is greater than +/-[(a+1)/2] modulo a.  By the pigeonhole principle, among every a numbers a-1 is one of those numbers and [(a+1)/2] is another.  These are arithmetic sequences, so the a-1 value and the [(a+1)/2] value each always occur in the same location within each a consecutive integers.  Given finite interval [q,ra+s] for positive integers q,r,s, one of a-1, [(a+1)/2] occurs “first” within the interval, then the two congruence classes alternate in appearance until the end of the interval.  This alternating appearance causes there to never be more than a difference of 1 in the count of members of each congruence class appearing within a given interval

– Claim 7: there is no need to evaluate the count of results which are common to both sieves

  • proof: this claim should be self-evident given that the differences in the counts are being explicitly evaluated

– Claim 8: a_1+1=2 gives a naive result count of 2^(m-1)

  • proof: the interval length was identified (straightforward) as 2^m, and our sieve is currently limited to a_1=1 which “sieves away” every other number, either all the evens or all the odds.  Since the sieve offset is a_1, it is all the odds that are “sieved away”, and since an even number of consecutive integers contains an equal number of even and odd numbers, the number that we have “sieved away” is exactly half of the length of the interval, which is 2^(m-1) in this case

– Claim 9: the bound on the number of sequences which produce appropriate results is given by 2a_2+1<sqrt(2^(m+1))

  • proof: if we continue with the assumption that 2a_2+1 is the smaller of the factors of a hypothetical odd composite number, then we can consider that if q is the larger factor, then there was some a_2 such that (2a_2+1)q is the value of this composite number, and this number was already “sieved away” by this value of a_2.  In a prior claim it was established that the smaller factor is no larger than the square root of the original number, and in this case, the original number is even while the factor is odd, meaning that the factor cannot be the square root of the number, and therefore must be strictly less than the square root

Next there is a line about “determining whether this mapping is onto”, and right up until the end, it isn’t clear which sieve’s results provide the domain and which provide the range, up until the end of the paragraph where it appears that the (a_1+1) sieve’s results are mapping onto results from the (2a_2+1) sieve.

– Claim 10: the non-overlaps correspond to the set of primes from the previous interval multiplied by 2

  • proof: any composites from the previous interval must have at least one factor less than the square root of the endpoint of the current interval, meaning that such a factor would also be involved in a sieve arithmetic sequence that overlaps with some value in the a_1+1=2 arithmetic sequence.  The set of primes within the previous interval were missed by all arithmetic sequences, and therefore the multiples of two of each of these primes are now missed by these same sequences within the current interval

The number of primes within a given interval, while not necessarily accurate by the n/log n estimate, should be fairly close by the prime number theorem.  Then there is more hand-waving about having proven the infinity of primes by using a result that already shows there is an infinite number of primes.  This statement assumes that the claims highlighted here are plainly evident to the reader, and that the flow of an induction should be an understood part of the narrative.  If the induction were to be stated explicitly, we would need the “first element” bound prior to finding a first example, so this part makes sense why it would appear so long after many claims and statements.  The next piece would be the inductive step.  The text seems to imply that the inductive step relies on the “odd-even theorem” and the fact that a result in one sieve directly corresponds to a result in another sieve, then that the “onto” mapping implies that this same result implies the existence of a new result in the first sieve.  In short, “have a first one; if one is at hand, prove there is one after it.”

– Claim 11: s^2 == 1 mod s+1 and s^2 == +/-[(s+1)/2] mod 2s+1

  • proof: s == -1 mod s+1, and (-1)^2 == 1 mod k for all k > 0.  For the other result, consider whether s is odd or even with s=2t or s=2u-1: (2t)^2 == -t mod 4t+1, and (2u-1)^2 = 4u^2-4u+1 == u mod 4u-1

– Claim 12: by claim 11 we know that each sieve covers two numbers per consecutive p numbers when a_1+1=p and 2a_2+1=p

  • proof: this claim would be false unless p were taken as prime.  With prime p we know that there are at most two solutions to the congruence x^2 == a mod p, and as each congruence from the previous claim begins as a square, we know that there exist two solutions x in each of the congruences x^2 == 1 mod s+1 and x^2 == +/- [(s+1)/2] mod 2s+1, keeping in mind that the expression +/-[(s+1)/2] mod 2s+1 is only referring to a single congruence class, chosen based on the value of s^2 mod 2s+1

– Claim 13: claim 12 together with claim 11 means that the count of results per prime could be absolutely different by 2

  • proof: using the same logic as with a prior claim, there are a maximum of four results per consecutive p numbers in a single arithmetic sequence, and two are from each of the two sieves, showing that there can be a maximum difference in the result counts of 2

The arithmetic counting the number of squares in a given interval is straightforward.  The upper bound identified on s is also straightforward, having the same form as before.  The count of “extras”, being no more than the number of possible values of s multiplied by the maximum result count difference, is straightforward as well.  The formula for the count of non-overlaps in the s_1+1=2 arithmetic sequence is short on details, but really only needs to specify a few more primes to establish the pattern, as 2^(n-1) * (1-2/3)*(1-2/5)*(1-2/7)*…

– Claim 14: the count of non-overlapped values appearing in the arithmetic sequence identified by (s_1+1)(s_1^2+s_1+t_1-1)=x^2-1 where s_1+1=2 within the specified interval is given by 2^(n-1) * (1-2/3)*(1-2/5)*(1-2/7)*…

  • proof:  if 2^(n-1) is the count of results from s_1+1=2, then we can multiply by 1-2/3 to get the count that does not overlap with the results from s_1+1=3 since there are two results per p consecutive values for each prime p.  Then we can multiply by 1-2/5 to get the count that also does not overlap with the results from s_1+1=5, and we can continue through each prime in this way.  Since we only sieve over the prime numbers, we never need to consider whether there are extra solutions to our congruence x^2 == s^2 mod {s+1, 2s+1}

The “accounting for extras” bit seems like it is unnecessary due to the manner in which the overlaps are counted, but it makes the 4^((n+4)/3) term work out a bit more cleanly, and this term is not reduced by this accounting, so it should be okay.  The evaluation of the product should be straightforward given the citation, and we are only left needing to prove that the constants work out in the manner described.  Fortunately, said constants provide a near 4:1 ratio against the other numerator and denominator terms, making the bound 2^(n-1)/(n+1)^2 entirely appropriate.  With this done, the entire section is complete with the caveats listed above.

The rest of the content hand-waves through with a passing “the rest of the details are left as an exercise” and other similar statements.  We’ll save those for next time.

Correction to Polynomial Lattice

In a previous posting, I created a “polynomial lattice” out of structures found within the Pascal Triangle.  I have found some errors and boundary gaps within the definition I created.

Rather than post differences between the two, I claim that this post has Precedence.  The previous posting is now superseded.

Given a polynomial p(x) specified by a vector _b_ in Z^n as p(x) = b_0 + b_1(x | 1) + b_2(x | 2) + … + b_n-2(x | n-2) + b_n-1(x | n-1), the integer polynomial lattice surrounding p(x) is simply specified as q(x; y) = p`(x) + p(y); r(x; z) = p(x) + p`(z).  This lattice is simple, covers all the necessary bases, and achieves exactly what is required of it:  it is a lattice interleaving one set of polynomials with another; the balance of a polynomial and its parallels and the lattice-pairing of each of these is perfect; no gaps are admitted within the given parameters of the lattice, including boundaries such as when _b_ is an almost-zero vector.

The previous lattice definition allowed for many possibilities of error, and was also incorrectly specified.  Under this definition, it is possible to have _b_ = {0, 0, …, 0, 0, 1}, and to also have a viable lattice useful for determining many properties of p(x).


What is it that we hear every day?  Do we hear the radio, the voices from a favorite TV serial, the sounds of a commute along extended stretches of road, the familiar voices of loved ones?

What happens when that changes?  Do we experience subtle shifts in perspective?  Does the world turn upside down in protest?  If the radio is off during a particularly long commute, would that change everything?

Having recently been introduced to the first couple chapters of Born to Run, a journalist’s novel covering his experiences with a native tribe of Tamahumara in Mexico, and later finding that the tribe is turning from its roots in running from our modern society and every other influence that may cause annihilation amongst its people, I find that I wish they had never been discovered.

There is no possible way that such a culture could have truly remained both alive and hidden from the rest of the world permanently.  As the aforementioned book chronicled, the Mexican government was already seeking ways to put roads through the best portions of the Tamahumara hiding places.  Drug cartels already laid claim to much of the territory.  Eventually, their fate was almost guaranteed to be the same as that of the bushmen, and many other aboriginal cultures.

What does our modern culture offer to them?  Sugar, iPods, jeans, a sense of ownership of things of this world…  Noise.

To a Tamahumaran, I would imagine that a typical day, more than five years ago, would have consisted of a very different set of noises.  Mountain air, desert creature sounds, the feel of simple homespun cloth, the taste of iskiate and corn in every meal.  The culture of intimately knowing your neighbors, despite the long trek required for a visit.

Now, hearing news that this culture has been turning modern, wearing of jeans, eating sugar as a dietary staple, the younger generations adopting the common language of Mexico, being more receptive to outside influence, I wonder if the modernization arrived too quickly.  I wonder if anyone else regrets the losses that are so intangible, yet so central to what was.

I turned off my car radio six months ago.  I have experienced six months’ worth of relatively silent hour-long or longer commutes to job sites.  I have never experienced a greater amount of progress, innovation, or clarity of mind than the past six months.  I have (re-)invented designs for the internal combustion engine, started my own mathematics/coding blog, progressed farther than I ever thought possible along a train of mathematical thought, and become a much better husband and father.  I have begun to set aside those distractions that weighed me down, including the most significant, which was gaming.

I am not saying that total silence, total Gnosticism, total avoidance of the things that make this world our home is the best way.  I claim that research and guru-ism and the previous way of life of the Tamahumara all have things to teach us about how we should live.

In our human history, certainly for at least ~6000 years of it, there has been a great deal of silence.  Silence in the form of lack of advertising, silence in the form of lack of alternative spices, silence in the form of lack of alternative materials for clothing, silence in the form of lack of common entertainment.

150 years ago, if you wanted to see a show, listen to a musician or band, or taste a foreign cuisine, you would have to go Somewhere Else.  When you got There, the object of your journey would be the entire fulfillment of your adventure.  There would be silence along the journey, and silence on the return.  You would return to your common portion of existence.  You would have your memories of the experience, and any discussion with friends or family as reminders.

Today, you wouldn’t have to go There.  You could Download a copy of the song, or watch the show on YouTube.  You could get a recipe Online, and find the ingredients at a local grocery store.  You wouldn’t have to feel a bumpy road, whether through shoes, moccasins, stage-coach, or horseback.

Our modern society is not bad.  But the society we left behind has good in it as well.  We often forget the silence we have been bred into under the vast majority of our human existence.  This modern age of noises is new and foreign to our DNA.  We must not forget that as we transition ourselves and our fellow cultures into it.


Just as the dictionary is only as good as itself, this post will only survive as long as those who read it give its words definition within their own minds.

Comedians have often exposed the inconsistencies within our society; things like having an invisible fence, why not have an invisible dog?

Here is another inconsistency: definition.  Is it high definition, standard definition, normal definition, medium definition, or low definition?

Advertising is a major force attempting to compel common language.  Who would ever choose low definition, or even standard definition, given the option of high definition?  Why would anyone choose a payment up front, given the choice of delayed repayments, however frequent or inconvenient?  Why would anyone choose freedom, given the option of security or luxury or peace?

It’s not surprising that “high definition” is (becoming) a way of life.  We want the best.  As humans, we strive for the best.  Advertisements seek to entice us with “the best.”  And the culture of being advertised to also influences every decision.  It isn’t just about who our friends or parents or schoolfellows or coworkers are anymore.  It’s about who advertises to us.  This has been true as long as “Mad Men” has been a reality, and probably before that.  It’s about what Hollywood advertises, as much as what our families think of that perspective.  It’s about what our politicians and neighbors with political interest advertise, as much as about the real issues at hand.

But what *is* the best?  How shall we know that we really have what is best for us?  Is it possible to quantify the idea that “now, I really have what is best for me”?  If so, why would we seek anything else?  Expense (monetary, familial, friendship, etc.)?  Laziness?

In my opinion, it is always the latter.  We, as humans, are always lazy.  It does not matter whether we are type A, type B, or type Z.  Everyone is lazy.  It is always easier to commit to something familiar than what is unfamiliar.  It is always easier to listen to the advertisement voices that have been shouting for generations than to listen to the still, small, inner voices that have been there for eternity.  It is always easier to grasp the fad and the common thread than to search, and seek, and find the truth of every contrary voice that arrives.  It is always easier to “give in” than to strive.

So what is our definition?  What does it mean to be human?  Is it reasonable to give in, and allow something that is not “the best” to be a part of our experience?  Is it possible that something better, but much more difficult, is actually worth our time and consideration?  Do we have time and consideration to offer to something or someone that might be “the best” for us?  Will we someday “arrive,” or have we been attempting to “arrive” for centuries, and as humans we can never “arrive” on our own?

If it isn’t the best, why strive for it?  If it is better than anything else you can see or imagine, why not put forth every effort to make it a reality?

Integer Polynomial Lattice

We now know that any integer polynomial specified by a vector _a_ in Z^n as

(1) p(x) = a_(n-1) x^(n-1) + a_(n-2) x^(n-2) + … + a_2 x^2 + a_1 x + a_0

can also be specified by a vector _b_ in Z^n as

(2) p(x) = b_0 + b_1 x + b_2 (x | 2) + b_3 (x | 3) + … + b_(n-2) (x | n-2) + b_(n-1) (x | n-1)

where b_i = p^[i](0), the value at zero of the i-th pseudo-derivative of p(x).  It is also possible to demonstrate integer polynomials for any given _b_ for which there is no corresponding _a_, for example, _b_ = [0, 0, 1] in Z^3, i.e., p(x) = (x | 2) = x(x-1)/2.

Now we shall construct a polynomial lattice to demonstrate some of the continuity between polynomials of degree n and polynomials of the next higher or lower degree.

Begin with a given vector _b_ in Z^n, specifying a polynomial p(x) as in (2) above.  We further specify some new notation as follows:

_b`_ (read as “vector b downshift”) is a vector shift notation with values as follows: b`_0 = b_1, b`_1 = b_2, …, b`_n-3 = b_n-2, b`_n-2 = b_n-1, b`_n-1 = 0.

_b^(-k)_ (read as “vector b chopped by k”) contains values as follows: b^(-k)_0 = b_0, b^(-k)_1 = b_1, …, b^(-k)_n-k-1 = b_n-k-1, b^(-k)_n-k = 0, b^(-k)_n-k+1 = 0, …, b^(-k)_n-2 = 0, b^(-k)_n-1 = 0

With the _b`_ notation, we will also apply the _b^[i]_ notation to denote the i-th vector-shift of _b_.

The _b`_ or _b^[i]_ and _b^(-k)_ notations may be used simultaneously as _(b^[i])^(-k)_.  The order does not matter, the last k elements will be chopped to 0 prior to the downshift.

Let p(x; _a_, b, c) be an integer polynomial in x parametrized by _a_, b and c which is specified as follows:

p(x; _a_, b, c) = a_0 + a_1 (x | 1) + a_2 (x | 2) + … + a_n-2 (x | n-2) + a_n-1 (x | n-1) + b(c | n)

We use p(x; _b_, 0, 0) as the starting point of the lattice.

First, we have r(x; y) = p(x, _b^(-1)_, b_n-1, y) is the set of polynomials such that for any given x_0, p(x_0; _b_, 0, 0) = r(x_0; x_0).  This creates one axis of our lattice.

Next, we choose a value k to parametrize a second axis as follows:

q(x; k) = p(x; _b_ + (k | 1) _(b^[1])^(-1)_ + (k | 2) _(b^[2])^(-1)_ + … + (k | k-2) _(b^[k-2])^(-1)_ + (k | k-1) _(b^[k-1])^(-1)_ + _(b^[k])^(-1)_, 0, 0)

This specification identifies q(x; k) as having the same value as r(x+k; x).

Given these definitions, it is possible to identify every value of p(x; _b_, 0, 0) using polynomials of degree n-2, namely r(x; y), and it is further possible to specify the values of each of those polynomials by the values of q(x; k).  r(x; y) is degree n-2 and q(x; k) is degree n-1, demonstrating that the two sets of polynomials cover the result set differently, but in the end they both cover exactly the same result space.