One Trillion Is a Big Number

Published on:
May 13, 2019

Some of the most interesting questions I’ve ever encountered come from kids, and this is no exception. Recently a co-worker told me his kids wanted to know how long it would take a computer to count to a trillion. I’m exactly the kind of nerd he needed to dig in and find his kids an answer. As it turns out the answer is many orders of magnitude different than I thought it would be, but let’s not get ahead of ourselves.

If you’re not interested in reading a bunch of numbers and stuff, it takes about 30 minutes for my computer to count to 1,000,000,000,000.

Step One: Research


1,000,000,000,000 is a big number. It is, literally, astronomically large. I googled “how long to count to 1 trillion,” and the only answers were for how long it would take a human to count to 1 trillion (31 years, apparently!). Googling “how long for a computer to count to 1 trillion” leads to an interesting blog post, but the program used in the blog was in Java. Whatever the fastest way for a computer to count to a trillion is, it’s probably not Java. That blogger's program took over 4 hours. This blogger thinks he can do better.

Here's how to count to 1 trillion.

Step Two: The Bad Solutions


The fastest way I know to get a computer to count is to write a loop that increments a stack variable in C and doesn't do anything else. Here, note that we have to declare our variable a "long long" so that we have enough bits. 1 trillion is a big number.

trillion1.c:

#include <stdio.h>
int main(void) {
    for (long long i = 0; i < 1000000000000L; i++)
        ;
    printf("1 Trillion!\n");
}

That will definitely count to 1 Trillion, but I’m very impatient. Without knowing whether that’s going to be running for minutes or years I’m not sitting around staring at an empty terminal. I need to see something happen to keep my millinial attention.

trillion2.c

#include <stdio.h>
int main(void) {
    for (long long i = 0; i < 1000000000000L; i++)
        printf("%lld\n", i);
    printf("1 Trillion!\n");
}

Better, but all that printing will keep the program blocked for I/O very often. After about a minute that program was still in the low 10 millions. Unacceptable. I needed to separate the printing from the incrementing.

Step Three: Threading


Here I had to make a decision. What does it mean to count to 1 trillion? If 4 separate threads each count to 250 billion is that the same thing as counting to 1 trillion? I decided that doesn’t count. If me and four other people count to 5, none of us has counted to 20.

In this case, we can use one thread to increment a variable, and another one to print it repeatedly until the value hits 1 trillion. This example uses 1 posix thread to increment a global variable while the main thread continuously prints its value.

trillion.cpp

                           
#include <stdio.h>
#include <pthread.h>

long long i;
void *run(void *param);
using namespace std;

int main(void) {
    pthread_t tid;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    i = 0;
    void *nothing = nullptr;
    pthread_create(&tid, &attr, run, nothing);

    while(i < 1000000000000L)
        printf("%lld\n\n", i);

    pthread_join(tid, nullptr);

    printf("%lld\n", i);
}

void *run(void *param) {
    while (i < 1000000000000L)
        i++;
    pthread_exit(0);

The run function gets called when the new thread is created, and its sole purpose is to increment i until it hits 1 trillion. Meanwhile our main thread keeps repeatedly printing i until it hits 1 trillion. This baby counts to a trillion in about a half an hour.

Can We Do Better?


You might be thinking “If you use one thread for incrementing, why not more?” When you compile the code above with g++'s -S flag and examine the assembly generated by the compiler, you'll notice this section.

trillion.s routine L11:

.L11:
	movl    i(%rip), %rdx
	movabsq $999999999999, %rax
	cmpq    %rax, %rdx
	jg      .L10
	movq    i(%rip), %rax
	addq    $1, %rax
	movq    %rax, i(%rip)
	jmp     .L11

This is the loop that increments the variable i. This loop loads the variable i into the rdx register, loads the constant (1trillion - 1) into the rax register, and compares them. If the rdx is greater it jumps to the routine that handles exiting the thread. It then moves the value of i into rax, adds one to rax, and moves the resulting value into i before repeating the loop.

This compilation does NOT work with multiple threads trying to use this value. If one thread has to wait on CPU time between adding 1 to the rax register and storing back into i, it could be overwriting another thread's work and slowing the whole process down. It is possible, though, to modify it to the following:

.L11:
	movq    i(%rip), %rdx
	movabsq $999999999999, %rax
	cmpq    %rax, %rdx
	jg      .L10
	incq    i(%rip)
	jmp     .L11

This increments i directly where it is in memory, but this really just shifts the same problem one level down. On the microcode level, the value stored at i still has to be loaded into the processor, incremented, and written back into memory. It might make it happen a little faster so race conditions show up less often, but 1 trillion is a big number and these collisions will still happen often enough to gum up the works.

So the answer to “Can we do better?” is no. Or at least I can’t. If you’re reading this and have an improvement I don’t know about please let me know!

Conclusion


It takes my computer about half an hour to count to 1 trillion the fastest way I could figure out how to make it. If you’ve read this far and have another question you’d want to read about, send me an email. I’d love to do way more work than necessary to answer your simple question.