It now uses custom alloc_aligned() wrapper for all allocations,
therefore all allocations are larger than expected by (64 + sizeof(void*)).
Further, they are seen as allocations of 1 element. Relevant calculations
were adjusted to reflect this, and state allocation is now protected
with a flag to avoid misinterpreting other allocations as the zlib
deflate_state allocation.
Further, it no longer forces window bits to 13 on compression level 1,
so the comment was adjusted to reflect this.