Compress Strings

Using a Pairing Function, specifically the Cantor Paring Function, I was able to create a simple compression library:
https://snap.berkeley.edu/snap/snap.html#present:Username=joecooldoo&ProjectName=Cantor%20Compress

Unfortunately, you are limited to about the 750th or so Unicode because of how the custom block works. Maybe someone can write a base64 encoder/decoder (but it would only be useful for this project if it didn't modify the length of the string).

Very interesting :slight_smile: I'd never heard of the idea of number pairing before :slight_smile:

[edit] Info for others since stuff I've found on T'Iinternet is very mathy

So basically you can replace a pair of numbers with a single unique number number i.e

0 and 0 gives 0
0 and 1 gives 2
1 and 0 gives 1
1 and 1 gives 4

and this way you can reduce a pair of numbers to a single number (via a formula) so effectively compressing your data.

The compression is reversible so you can always get original pair of numbers back again

Glad you learned something from this! :slight_smile:

But the new numbers are wider! In your example the two input numbers are one bit wide, but the output number is three bits wide. (It'd be two bits if you turned (1,1) into 3 instead of 4. Then the number of bits would be exactly unchanged.)

Edit: Okay, I read the article. It seems to be interested, not in reducing the size of a particular finite string, but in setting an upper limit on the size of a particular infinite set by reducing its number of dimensions. For example, a rational number is a pair of integers. If we can exhibit an invertible function that turns a pair of integers into a single integer, then we know that the number of rational numbers is no larger than the number of integers. As far as compression is concerned, I think this will be helpful only if the original representation is very wasteful of space; in particular, if you take a text string and instead of representing each character as a byte you represent each character as a 32-bit integer, you can compress the string by putting four characters in each integer instead.