Optimizing Google’s Compression Algorithm Zopfli (Stage 2)

Test Building

Post Notes: All tests and optimizations can be downloaded from my GitHub repo.

Re-cap

Zopfli, a compression algorithm library written in C designed with performance over speed.

I went back to the drawing board on my original “time benchmark” and felt I needed to implement something a little bit more flexible. To start off I created the following files to store my test declarations and implementations:

zopfli/src/zopfli/functionTimer.h
zopfli/src/zopfli/functionTimer.c

Which contains the following functions:

void setTotalTime(double tt); //sums the total execution time after each funciton call
void getFunctionTimerStats(); //displays the results of the tests
void startTimer(); //starts the timer
void stopTimer(); //stops the timer

In order to keep the testing as consistent and simple as possible, I had executed my tests on two identical copies of Zopfli.
One named zopfli (master branch) and the other zopfli_hacked (optimization_testing branch).

I used a 10MB test file from engineerhammad.com. This site is great for downloading files of various sizes to use during your testing.

Each test was run by compressing to a new file as opposed to appending an existing .gz. This ensures that the same routes are executing during testing.

To test my newly created tests (test inception) I wrapped a call to the ZopfliResetHash function with startTimer() and stopTimer().

Source:marcobeltempo/zopfli/blob/optimization_tester/src/zopfli/squeeze.c

if (instart == inend) return 0;
startTimer();
ZopfliResetHash(ZOPFLI_WINDOW_SIZE, h);
stopTimer();

In order to get the test results, execute:
time ./zopfli filename
Which will display every instance of the function and a summary of results upon succesfull compression of the file.

Below is the benchmark of the unmodified Zopfli program.

[mabeltempo@aarchie zopfli]$ time ./zopfli ../test1Mb.db
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 1 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 2 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 3 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 4 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 5 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 6 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 7 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 8 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 9 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 10 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 11 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 12 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 13 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 14 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 15 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 16 -[EXEC TIME#]: 0.000232 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 17 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 18 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 19 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 20 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 21 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 22 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 23 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 24 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 25 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 26 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 27 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 28 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 29 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 30 -[EXEC TIME#]: 0.000231 sec.
[TESTING FUNCTION]-(ZopfliResetHash){ORIGINAL}-[EXEC#]: 31 -[EXEC TIME#]: 0.000231 sec.
-------------RESULTS--------------
Average Execution Time: 0.000231 sec.
----------------------------------
Total Time: 0.007176 sec.

real 0m0.947s
user 0m0.926s
sys 0m0.020s

If we look at the results we can see that each function call would execute between 0.000231 - 0.000232 sec.Total time spent executing in the program was 0.007176 sec. I ran this multiple times and yielding fairly consistent results.

Optimizations

Now for the purpose of these test’s we’ll be focusing on the following function:

Source:marcobeltempo/zopfli/blob/master/src/zopfli/hash.c#L45-L89

/* Resets all fields of ZopfliHash. */
void ZopfliResetHash(size_t window_size, ZopfliHash* h) {
size_t i;

h->val = 0;
for (i = 0; i < 65536; i++) { h->head[i] = -1; /* -1 indicates no head so far. */
}
for (i = 0; i < window_size; i++) { h->prev[i] = i; /* If prev[j] == j, then prev[j] is uninitialized. */
h->hashval[i] = -1;
}

#ifdef ZOPFLI_HASH_SAME
for (i = 0; i < window_size; i++) { h->same[i] = 0;
}
#endif

#ifdef ZOPFLI_HASH_SAME_HASH
h->val2 = 0;
for (i = 0; i < 65536; i++) { h->head2[i] = -1;
}
for (i = 0; i < window_size; i++) { h->prev2[i] = i;
h->hashval2[i] = -1;
}
#endif
}

The purpose of this function is to ensure that the hash is in a safe state before continuing compression.

At first glance, we can pinpoint a few potential optimizations.
This loop simply loops through a static number of 65536 and resets all the elements in head to -1.


for (i = 0; i < 65536; i++) { h->head[i] = -1; /* -1 indicates no head so far. */ }

If we work our way down we notice that this same method is repeatedly used.

Since this value is statically declared, why not create a constant #define?


#define HASH_SHIFT 5 #define HASH_MASK 32767 #define HASH_LENGTH 65536

Now since we’ve already confirmed that this head array is of a fixed size, is it really necessary to loop through each element?

I replaced the loop with a memset function that sets the newly defined constant size to -1“`

memset(h->head, -1, sizeof(int) * HASH_LENGTH);


Now let's rebuild with the program using the `make` command and run our tests to see if we've gained any performance improvements.

—————-RESULTS—————–
Original Average Execution Time: 0.000231 sec.

Hacked Average Execution Time: 0.000184 sec.

Original Total Time: 0.007187 sec.

Hacked Total Time: 0.005693 sec.

Difference of : 0.001494 sec.

Optimized by: 20.79%


WOW! Just from those two minor changes, we were able to optimize execution time by 21%! At this point, I was curious to see what would happen if applied this same change to the other loops within the ZopfliResetHash function. To my surprise, it actually reduced performance around 3%. This could be due to duplicate calls since `memset` is called at the beginning of the function. Another optimization candidate that I discovered was in the <a href="https://github.com/google/zopfli/blob/master/Makefile#L4-L5" rel="noopener">Makefile</a>. By default, Zopfli was being built using the `-O2` optimization flag.

CFLAGS = -W -Wall -Wextra -ansi -pedantic -lm -O2 -Wno-unused-function
CXXFLAGS = -W -Wall -Wextra -ansi -pedantic -O2



-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options. Source
Let's see what happens if we bump it up to `-O3` along with our recent changes.

CFLAGS = -W -Wall -Wextra -ansi -pedantic -lm -O3 -Wno-unused-function
CXXFLAGS = -W -Wall -Wextra -ansi -pedantic -O3


I honestly couldn't believe the results at this point...

—————-RESULTS—————–
Original Average Execution Time: 0.000231 sec.

Hacked Average Execution Time: 0.000079 sec.

Original Total Time: 0.007187 sec.

Hacked Total Time: 0.002439 sec.

Difference of : 0.004748 sec.

Optimized by: 66.06%

“`

Simply changing the compile flag along with our previous optimizations, we experience a 66% increase!

Results

Being able to analyze a project such as Zopfli and pinpoint areas for potential optimization as definitely a valuable learning experience. This gave me the opportunity to explore unfamiliar code and combine various optimization methods to enhance overall performance.  I also learned that not all “enhancements” have to be huge changes. The smallest tweaks can make all the difference. Successfully optimizing Zopfli without hindering its functionality came with a sense of pride and reward. Considering there are plenty of other opportunities for optimization,  I plan to continue my research on Zopfli and experiement with new ideas.

Related Posts

Categories