Compression Library

joecooldoo · May 5, 2022, 7:59pm

Compress and decompress strings!

owlsss · May 5, 2022, 8:33pm

uhm

joecooldoo · May 5, 2022, 8:39pm

That's supposed to happen.

pumpkinhead · May 5, 2022, 8:41pm

Yes, I’m guessing it’s supposed to be a binary string.

Is your algorithm some sort of LZW related thing?

joecooldoo · May 5, 2022, 8:42pm

Sorry, I don't understand. What is LZW?

pumpkinhead · May 5, 2022, 9:17pm

It's a compression algorithm. I don't know how it works though.

bh · May 5, 2022, 10:54pm

Nice project, and a clever idea for compression.

I see two problems. One is that you can't decompress a text unless you have the COMPRESSION LIBRARY variable left over from having compressed it in the same instance of the project. In general, that's not the case; you save a compressed text in a file somewhere, and then a month later someone else comes along and wants to decompress it. In order for that to work, you have to include the library in the compressed file along with the compressed text. This is okay in principle; many real-world compression algorithms do save dictionaries along with compressed texts. But in your algorithm the size of the dictionary is roughly the same as the size of the text, which eliminates the size reduction.
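A minimal sketch of what "include the library in the compressed file" could look like, in Python rather than Snap!. The function names `pack` and `unpack` and the JSON layout are hypothetical, chosen only for illustration; the project itself may store things differently.

```python
import json

def pack(dictionary, data):
    """Bundle the dictionary and the compressed text into one JSON string."""
    return json.dumps({"dictionary": dictionary, "data": data})

def unpack(blob):
    """Recover (dictionary, data) from a string produced by pack()."""
    obj = json.loads(blob)
    return obj["dictionary"], obj["data"]

# Hypothetical example: four dictionary entries and four compressed characters.
dictionary = ["th", "e ", "ca", "t"]
data = "\u0100\u0101\u0102\u0103"

blob = pack(dictionary, data)
restored_dictionary, restored_data = unpack(blob)
```

The packed string is self-contained, so someone else can decompress it later, but its length now counts the dictionary as well as the compressed text.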

The second problem is about Unicode. Although every Unicode character has a code, the reverse isn't true; not every Unicode code corresponds to a character. So I think compressing may not work.
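To make the Unicode concern concrete, here is a small Python sketch. It assumes the compressor turns a list index directly into a code point, which is a guess about the project and not something confirmed in this thread. Code points 0xD800 through 0xDFFF are surrogates, not characters, and cannot be encoded as UTF-8 on their own.

```python
def can_encode_utf8(code_point):
    """Return True if the code point can be written out as UTF-8 text."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised for lone surrogates (0xD800 to 0xDFFF).
        return False

def is_surrogate(code_point):
    """Surrogates are reserved for UTF-16 pairs and are not characters."""
    return 0xD800 <= code_point <= 0xDFFF

letter_ok = can_encode_utf8(0x41)         # 'A'
surrogate_ok = can_encode_utf8(0xD800)    # first surrogate
after_range_ok = can_encode_utf8(0xE000)  # first code point past the surrogates
```

Under that assumption, a dictionary would need more than 55,000 entries before an index landed in the surrogate range.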

joecooldoo · May 5, 2022, 11:02pm

A. I made this based on the idea of sending data over the cloud without taking up space.
B. You store the data in the variable, so when you save and load the project, that data is still there.
C. Yeah, you can only have a certain number of items on the list, but after that, some strings can't be compressed.

joecooldoo · May 5, 2022, 11:04pm

Sorry if this doesn't make sense, I'm working on hashing passwords, creating files, and all that hullabaloo for a server. And I haven't used Snap! in a while.

Snippet

import hashlib
import os

# Assumes `app` is an existing Flask application object.
@app.route('/account/create/<username>/<password>')
def newAccount(username, password):
  print('Checking for existing accounts...')
  path = "accounts/" + username
  # Check for the file directly instead of relying on a bare except.
  if os.path.exists(path):
    print("Account already exists!")
    print("Action Completed Successfully.")
    return "Unsuccessful"
  print("Account name is open.")
  print("Creating account...")
  # hexdigest() returns a plain hex string; str(digest()) would store the bytes repr.
  key = hashlib.sha3_512(password.encode()).hexdigest()
  with open(path, "w") as file:
    file.write(key)
  print("Account Created.")
  print("Action Completed Successfully.")
  return "Successful"

bh · May 6, 2022, 12:38am

Ah, I see. That makes more sense then. Someone was talking about LZW compression in another thread and that got me thinking you wanted a general purpose compression algorithm.

There’s no limit on the size of a list; we routinely use lists of 100K to 1M items in data science projects. Of course there’s a browser-imposed limit on the total memory in use, but if you’re pushing that limit, compressing strings probably won’t be enough to save you anyway. So, don’t worry about limits to the size of the list.

So, I tried actually running your example. You compressed ≈10K characters to ≈5K, which is a 50% reduction. But the impressive part is that your dictionary’s size is ≈500 items, only 10% of the length of the compressed string. I didn’t expect that; I underestimated how much benefit you get from the commonness of common digrams. So I take it back about the size reduction.
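For readers following along, here is a rough Python sketch of the kind of digram scheme those numbers suggest: each two-character chunk is replaced by one character whose code point encodes the chunk's position in a shared list. This is a reconstruction under assumptions, not the project's actual blocks; `OFFSET`, `compress`, and `decompress` are names made up for the example.

```python
OFFSET = 0x100  # assumed starting code point, chosen to skip control characters

def compress(text, dictionary):
    """Replace each two-character chunk with a single character.

    New chunks are appended to `dictionary`, so the same list must be
    available when decompressing.
    """
    out = []
    for i in range(0, len(text), 2):
        chunk = text[i:i + 2]
        if chunk not in dictionary:
            dictionary.append(chunk)
        out.append(chr(OFFSET + dictionary.index(chunk)))
    return "".join(out)

def decompress(data, dictionary):
    """Look each character up in the dictionary to recover its chunk."""
    return "".join(dictionary[ord(ch) - OFFSET] for ch in data)

dictionary = []
text = "the cat sat on the mat " * 10
packed = compress(text, dictionary)
restored = decompress(packed, dictionary)
```

In this sketch the compressed string is half the length of the input, and repeated text keeps the dictionary much shorter than the compressed string, which matches the ratios reported above.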

Am I also wrong about the Unicode problem?

joecooldoo · May 6, 2022, 12:41am

I meant that you can run out of Unicode numbers to represent an item in the list. It’s really hard to do that by accident, though.

bh · May 6, 2022, 12:50am

Oh I see. Yes, and I'd worry about it if you didn't get such great compression of the dictionary. But I bet the size of the dictionary grows sublinearly in the length of the text. Maybe beyond a certain point it just doesn't grow at all; there are only 26*26 digraphs of letters, and probably fewer than 500*500 digraphs of commonly-used Unicode symbols altogether.
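A quick way to check the bound, sketched in Python. The test text here is uniformly random lowercase letters, which is an assumption made only to exercise the 26*26 limit; real prose is far more skewed toward common pairs, so its dictionary should level off sooner.

```python
import random

def distinct_chunks(text):
    """Count the distinct two-character chunks taken at even offsets."""
    return len({text[i:i + 2] for i in range(0, len(text), 2)})

rng = random.Random(0)  # fixed seed so the run is repeatable
letters = "abcdefghijklmnopqrstuvwxyz"
text = "".join(rng.choice(letters) for _ in range(100_000))

# Dictionary size for growing prefixes of the same text.
lengths = (1_000, 10_000, 100_000)
sizes = [distinct_chunks(text[:n]) for n in lengths]
upper_bound = len(letters) ** 2  # 26 * 26 = 676 possible letter pairs
```

However long the text gets, `sizes` can never exceed `upper_bound`, so past a certain length the dictionary stops growing while the text keeps getting longer.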

erikkoevsnap · May 13, 2022, 3:05pm

what have i done 0_0
apparently constantly decompressing and recompressing text results in gibberish

owlsss · May 13, 2022, 3:54pm

etdsgs idsyo dgeotyodsn!

what

erikkoevsnap · May 14, 2022, 12:33pm

i literally said, that's the result of

which apparently results in gibberish

joecooldoo · May 14, 2022, 7:54pm

You did decompress two times in a row. Maybe you should check your code before saying there is a problem.

As you can see, it works perfectly fine:

scratchmodification · May 14, 2022, 10:03pm

It is gibberish for me just doing it once:
Edit: Apparently it works the first time you do it...