In search of a (decent) analyst programmer, project: The Unlimited Compressor
The goal is to take a homogeneous WinZip-type file and create from it a heterogeneous file, which will then be eligible for another round of WinZip compression, even with a gain as low as 1-2%.

What is a homogeneous file?
It is a file where all byte values are represented at roughly the same rate, say 1/180 to 1/300 (the ideal would be 1/256, if the occurrences of each value were perfectly uniform).

What is a heterogeneous file?
It is a file where some values appear more often than others; text files, for example, are essentially composed of occurrences of values from 32 to 128 (or so).

So, here's my method.

First, some statistics: find the value with the most occurrences. As noted above, the most frequent value will occur at a rate of, for example, 1/180. That means you can store the offset to each occurrence of that value in a single byte: since the average gap is around 180, it should rarely exceed 255 (the largest value a byte can hold). Still, you may sometimes end up with a gap of 280, 400, or even 850, so you add a simple escape rule: a stored byte of 255 means the offset is 254 plus another byte of 0 to 254, and if that next byte is again 255, the offset is 254 + 254 + another byte, and so on. The only problem with chaining too many escape bytes is that it adds to the length of the output file (though starting with the most frequent value mitigates this somewhat).
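Here is a minimal sketch of that escape rule in Python, just to pin the idea down (the function names are mine, purely for illustration):

[CODE]
# Minimal sketch of the 254/255 escape rule described above.
# A gap of 0..254 fits in one byte; a byte of 255 means "254 plus
# whatever the next byte says", and the rule chains.

def encode_gap(gap):
    """Encode one gap as a sequence of bytes."""
    out = bytearray()
    while gap > 254:
        out.append(255)   # continuation marker: add 254 and keep reading
        gap -= 254
    out.append(gap)       # final byte, 0..254
    return bytes(out)

def decode_gap(stream):
    """Read one gap back from an iterator over bytes."""
    gap = 0
    b = next(stream)
    while b == 255:
        gap += 254
        b = next(stream)
    return gap + b

# 850 = 254 + 254 + 254 + 88, so it encodes as 255, 255, 255, 88
assert decode_gap(iter(encode_gap(850))) == 850
[/CODE]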
OK, here's the idea: instead of byte values, you store offsets for each byte value, working from the most frequent value down to the least frequent. As you emit offsets, you flag each position of the original file you have just consumed; then on every later pass (there are 256 values from 0 to 255, so you do the job up to 256 times), any flagged position you encounter simply does not count toward the offset for the current value. A Python sketch of this pass structure follows the worked example below.

A quick example, for a file of 20 bytes composed of 6 values (39, 44, 11, 18, 74, 78):

01 39
02 44
03 39
04 11
05 18
06 18
07 11
08 78
09 39
10 11
11 44
12 39
13 18
14 11
15 11
16 11
17 74
18 44
19 78
20 39

First, the statistics (value, count):

39 5
44 3
11 6
18 3
74 1
78 2

Sorted by count (and written to the output file):

1 11
2 39
3 44
4 18
5 78
6 74

First pass, value 11 (a + marks each position scanned):

01 39 +
02 44 +
03 39 +
04 11 + ( offset is 04 - 00 = 4 )
05 18 +
06 18 +
07 11 + ( offset is 07 - 04 = 3 )
08 78 +
09 39 +
10 11 + ( offset is 10 - 07 = 3 )
11 44 +
12 39 +
13 18 +
14 11 + ( offset is 14 - 10 = 4 )
15 11 + ( offset is 15 - 14 = 1 )
16 11 + ( offset is 16 - 15 = 1 )
17 74
18 44
19 78
20 39

So the output file starts with:

4
3
3
4
1
1

Next value (39); a . marks a flagged position, which no longer counts:

01 39 +1
02 44 +
03 39 +2
04 11 . ( here's a flag )
05 18 +
06 18 +
07 11 .
08 78 +
09 39 +4 ( 09 - 03 = 6, minus 1 for the flag at 04 and 1 for the flag at 07 )
10 11 .
11 44 +
12 39 +3-1 = 2
13 18 +
14 11 .
15 11 .
16 11 .
17 74 +
18 44 +
19 78 +
20 39 +8-3 = 5

Adding to the output file:

1
2
4
2
5

Next value (44):

01 39 .
02 44 1
03 39 .
04 11 .
05 18 +
06 18 +
07 11 .
08 78 +
09 39 .
10 11 .
11 44 4
12 39 .
13 18 +
14 11 .
15 11 .
16 11 .
17 74 +
18 44 3
19 78
20 39

Adding to the output file:

1
4
3

Next value (18):

01 39 .
02 44 .
03 39 .
04 11 .
05 18 1
06 18 1
07 11 .
08 78 +
09 39 .
10 11 .
11 44 .
12 39 .
13 18 2
14 11
15 11
16 11
17 74
18 44
19 78
20 39

Adding to the output file:

1
1
2

And so on.
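Here is the promised sketch of those passes in Python, as I understand them; the names are mine, and the only claim is that it reproduces the numbers from the example above:

[CODE]
from collections import Counter

def heterogenize(data):
    """Flag-skipping gap transform: for each byte value, most frequent
    first, emit the gap to each occurrence, counting only positions not
    yet flagged by an earlier pass. Returns (value_order, gaps)."""
    order = [v for v, _ in Counter(data).most_common()]
    flagged = [False] * len(data)
    gaps = []
    for value in order:
        gap = 0
        for pos, byte in enumerate(data):
            if flagged[pos]:
                continue            # flagged positions don't count
            gap += 1
            if byte == value:
                gaps.append(gap)    # distance in unflagged positions
                flagged[pos] = True
                gap = 0
    return order, gaps

# The 20-byte file from the example above:
data = bytes([39, 44, 39, 11, 18, 18, 11, 78, 39, 11,
              44, 39, 18, 11, 11, 11, 74, 44, 78, 39])
order, gaps = heterogenize(data)
print(order)  # [11, 39, 44, 18, 78, 74]
print(gaps)   # [4, 3, 3, 4, 1, 1, 1, 2, 4, 2, 5, 1, 4, 3, 1, 1, 2, 1, 2, 1]
[/CODE]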
[QUOTE="le Redoutable, post: 9293442, member: 7031865"] the goal is to create from a homogeneous Winzip-type file a heterogeneous file which will then be eligible to a new Winzip-file, with a gain even as low as 1-2%; what is a homogeneous file ? it is a file where all Byte values are represented with a ratio of say 1/180 to 1/300 ( the ideal would be 1/256 if occurences of each value were perfectly homogeneous ) what is a heterogeneous file ? it is a file where some values appear more often than some others; for example, text files are essentially composed of occurences of values from 32 to 128 ( or so ) so, here's my method : first, some statistics : find the value with the most occurences; as I printed above, the most occurent value will give a ratio ( for example ) of 1/180; that means you can use offsets for each occurence of that value within a Byte ( because statistics say offsets shouldn't exceed 180, then 255 ( the max value you can print within a Byte ) should rarely get exceeded; still sometimes you may end up with an offset of ( 280, 400, or even 850 ) , so you can easily rule that , if you print an offset of 255 it means the offset is equal to 254 + another Byte of 0 to 254 , which again if equal to 255 means you have an offset of 254+254+ another Byte etc The only problem with appending too much offset values is it adds to the length of the output file ( well, beginning with the most occurent Value somehow mitigates this problem ) ok. here's the idea : in lieu of Byte values you use offsets for each Byte value ( in the order of from the most occurent Byte Value down to the less occurent Byte Value ) then, as you print offsets you put a flag where in the original file you located the said offset; then, each time you check for occurences ( that is, because there are 256 values from 0 to 255 , you will do 256 times the job ) , each time you find a flag you don't add to the offset for the n-th value quickly an example for a file of 20 Bytes , composed of 6 Values ( 39, 44, 11, 18, 74, 78 ): 01 39 02 44 03 39 04 11 05 18 06 18 07 11 08 78 09 39 10 11 11 44 12 39 13 18 14 11 15 11 16 11 17 74 18 44 19 78 20 39 first, the statistics : 39 5 44 3 11 6 18 3 74 1 78 2 sorted ( and printed to the output file ) : 1 11 2 39 3 44 4 18 5 78 6 74 now look at this : 01 39 + 02 44 + 03 39 + 04 11 + ( offset is 04 - 00 = 4 ) 05 18 + 06 18 + 07 11 + ( offset is 07 - 04 = 3 ) 08 78 + 09 39 + 10 11 + ( offset is 10 - 07 = 3 ) 11 44 + 12 39 + 13 18 + 14 11 + ( offset is 14 - 10 = 4 ) 15 11 + ( offset is 15 - 14 = 1 ) 16 11 + ( offset is 16 - 15 = 1 ) 17 74 18 44 19 78 20 39 so the output file looks like : 4 3 3 4 1 1 next value ( 39 ) : 01 39 +1 02 44 + 03 39 +2 04 11 . (here's a flag ) 05 18 + 06 18 + 07 11 . 08 78 + 09 39 +4 ( 09 - 03 , -1-for-flag-at-04 ,-1-for-flag-at-07 ) 10 11 . 11 44 + 12 39 +3-1 = 2 13 18 + 14 11 . 15 11 . 16 11 . 17 74 + 18 44 + 19 78 + 20 39 +8-3 = 5 adding to the output file : 1 2 4 2 5 next value ( 44 ) : 01 39 . 02 44 1 03 39 . 04 11 . 05 18 + 06 18 + 07 11 . 08 78 + 09 39 . 10 11 . 11 44 4 12 39 . 13 18 + 14 11 . 15 11 . 16 11 . 17 74 + 18 44 3 19 78 20 39 adding to the output file : 1 4 3 next value ( 18 ) : 01 39 . 02 44 . 03 39 . 04 11 . 05 18 1 06 18 1 07 11 . 08 78 + 09 39 . 10 11 . 11 44 . 12 39 . 
Note that as you advance to the less common values, the offsets become small, and that's exactly what the program is for: in a huge file you should end up with far more small values (1, 30, 50, and so on) than big ones (190, 220, and so on). That is what I call a heterogeneous file :)

If my vision is correct, you will be able to WinZip the output file, giving birth to a new zip file, which in turn can be re-heterogenized, for even a 1% gain each time, but repeated 1,000 times (or 1,000,000 times if you want to transfer a 1 GB file to a 720 KB floppy, lol).

So, where am I wrong?
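If anyone wants to test the loop, here is a quick experiment sketch using zlib's DEFLATE as a stand-in for WinZip. The container layout (one length byte, then the value table, then escape-coded counts and gaps) is only a guess of mine, not part of the method; it reuses heterogenize and encode_gap from the sketches above:

[CODE]
import os, zlib
from collections import Counter

data = zlib.compress(os.urandom(4096))   # an already-compressed, "homogeneous" input
for step in range(5):
    order, gaps = heterogenize(data)
    counts = Counter(data)
    blob = (bytes([len(order) - 1]) + bytes(order)
            + b"".join(encode_gap(counts[v]) for v in order)
            + b"".join(encode_gap(g) for g in gaps))
    data = zlib.compress(blob, 9)
    print(step, len(blob), len(data))    # watch whether the sizes keep shrinking
[/CODE]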