Enterprise Forever
Topic started by: ssr86 on 2015.November.10. 22:05:13
-
I thought I could post here some examples of drawing "16 color mode packed sprites" that I once wrote with the CPC in mind; most of them should still be valid for the Enterprise too...
I don't think anyone will ever need to use the "techniques" described here, but nevertheless it was fun to do...
I'll try to put them in separate posts for better readability...
We can divide the methods by how many bits we use per pixel... In 16 color modes each pixel is 4 bits, and we could cut that down to 3, 2 or 1 bit per pixel...
Drawing will be considerably slower but we can save some memory... Although most of the time speed is the priority, so... not very useful I guess.
;;--------------------------------------------------
;; 3BPP PACKED SPRITES
;;--------------------------------------------------
For the 3bpp format we assume that the sprite data is stored in three-byte packages.
One package contains the pixel data needed for drawing four pixel bytes.
This way we save 25% of the memory needed to store the sprite data.
We zero one bit pair of each pixel byte (the bit 3 pair, in bits 1 and 0), so the sprite graphics can use only eight different colors (inks 0-7).
If we count how many of the three significant bit pairs of the four packed pixel bytes we store in each byte of a package, we get the following combinations:
1. |3-1|3-1|3-1|
2. |3-1|2-2|3-1|
3. |1-1-1-1|2-2|2-2|
4. |1-1-1-1|1-1-1-1|1-1-1-1|
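Where, for example, 1-1-2 would mean that in this particular byte we store one bit pair from one pixel byte, one bit pair from another pixel byte and two bit pairs from yet another pixel byte.
In the package format description, pij means the j-th bit pair of the i-th pixel byte, i.e. bit j of its left pixel and bit j of its right pixel. As the code shows, a pixel byte holds its pairs in the order pi0_pi2_pi1_pi3 from bit 7 down to bit 0, so with the bit 3 pair zeroed an unpacked pixel byte is pi0_pi2_pi1_00.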
I present only the actual unpacking/drawing code...
These are not complete sprite routines: you would have to add the loops and the next_line code.
Note that for the 3bpp and 1bpp packing the sprite's width (in bytes) should be divisible by four, and for 2bpp it should be divisible by two.
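For reference, here is a small Python model of a 16-color pixel byte, handy for checking the package formats on a PC. It assumes the bit order the routines in this thread rely on (bit 7 = left pixel ink bit 0, bit 6 = right pixel ink bit 0, then the bit 2, bit 1 and bit 3 pairs); the helper names are made up for illustration.

```python
# Bit position of each pair inside a 16-color pixel byte (order assumed from
# the routines in this thread): pair j holds bit j of the left pixel (upper
# bit of the pair) and bit j of the right pixel (lower bit of the pair).
PAIR_SHIFT = {0: 6, 2: 4, 1: 2, 3: 0}


def make_pixel_byte(left: int, right: int) -> int:
    """Build a pixel byte from the inks (0..15) of its left and right pixel."""
    byte = 0
    for j, shift in PAIR_SHIFT.items():
        pair = (((left >> j) & 1) << 1) | ((right >> j) & 1)
        byte |= pair << shift
    return byte


def get_pair(byte: int, j: int) -> int:
    """Return bit pair j of a pixel byte as a 2-bit value (left bit, right bit)."""
    return (byte >> PAIR_SHIFT[j]) & 0b11


def split_pixel_byte(byte: int) -> tuple:
    """Inverse of make_pixel_byte: return (left ink, right ink)."""
    left = right = 0
    for j in range(4):
        pair = get_pair(byte, j)
        left |= (pair >> 1) << j
        right |= (pair & 1) << j
    return left, right


# With inks 0..7 the bit 3 pair (bits 1 and 0) is always zero, which is why
# the 3bpp formats only have to store pairs 0, 1 and 2.
assert all(make_pixel_byte(l, r) & 0b11 == 0 for l in range(8) for r in range(8))
print(hex(make_pixel_byte(5, 0)))  # 0xa0
```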
;;-------------------------------------
;; 3BPP - STORING METHOD: |1-1-1-1|2-2|2-2|
;;-------------------------------------
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p30_p20_p10_p00 || p02_p01_p12_p11 || p22_p21_p32_p31 |
;;
;; hl=^sprite
;; de=^screen
;; b - used as an auxiliary register
;; c - preloaded with 11110000b bit mask
;;
ld b,(hl) ; get first byte of the package
; this byte contains all bit 0 pairs of the four pixel bytes
inc hl ; go to next byte of data
; extract first pixel byte
ld a,(hl) ; get second byte of package
; this byte contains bit 2 and 1 pairs of first two pixel bytes
and c ; mask to get only the data for the first pixel byte
rrc b ; now get the bit 0 pair for this pixel byte
; from the first byte of the package (stored in b)
; and move it into its place in the accumulator:
; rotating the b register right
; loads its least significant bit into the carry flag
rra ; then rra loads the carry flag into
; the most significant bit of the accumulator
rrc b ; we have to do this twice
rra ; to move the whole bit pair
ld (de),a ; save to screen
inc de ; go to next screen position
; extract second pixel byte
ld a,(hl) ; reload the second byte of package (we need to extract from it the bit pairs of the second pixel)
rlca ;
rlca ; rotate four times to swap the nibbles
rlca ; and get the pairs p12_p11 into place
rlca ;
and c ; mask to isolate bits for this pixel byte
rrc b ;
rra ; extract the bit 0 pair from the first byte of package
rrc b ; and load it to accumulator to complete the second pixel byte
rra ;
ld (de),a ; save to screen
inc de ; go to next screen position
inc hl ; go to next data byte
;
; for pixels three and four we repeat all the
; operations for the first two pixel bytes
;
; extract third pixel byte
ld a,(hl)
and c
rrc b
rra
rrc b
rra
ld (de),a
inc de
; extract fourth pixel byte
ld a,(hl)
rlca
rlca
rlca
rlca
and c
rrc b
rra
rrc b
rra
ld (de),a
inc de
inc hl
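Since the package format has to match what the code actually extracts, a host-side packer is a good way to check it. The Python sketch below (function names are hypothetical) packs four pixel bytes into the |1-1-1-1|2-2|2-2| format as the code above reads it: the first byte holds p30_p20_p10_p00, the second p02_p01_p12_p11 and the third p22_p21_p32_p31. The unpacker follows the and c, rlca and rrc b / rra steps, so a round trip confirms the layout.

```python
PAIR_SHIFT = {0: 6, 2: 4, 1: 2, 3: 0}  # pair j -> bit position in a pixel byte


def pair(byte, j):
    """Bit pair j of a pixel byte as a 2-bit value (left bit, right bit)."""
    return (byte >> PAIR_SHIFT[j]) & 0b11


def pack_1111_22_22(px):
    """Pack four pixel bytes (bit 3 pair zero) into one 3-byte package."""
    p = [[pair(b, j) for j in range(3)] for b in px]  # p[i][j] is pij
    b1 = p[3][0] << 6 | p[2][0] << 4 | p[1][0] << 2 | p[0][0]  # p30_p20_p10_p00
    b2 = p[0][2] << 6 | p[0][1] << 4 | p[1][2] << 2 | p[1][1]  # p02_p01_p12_p11
    b3 = p[2][2] << 6 | p[2][1] << 4 | p[3][2] << 2 | p[3][1]  # p22_p21_p32_p31
    return bytes([b1, b2, b3])


def unpack_1111_22_22(pkg):
    """Follow the Z80 code: and c / four rlca, then two rrc b / rra pairs."""
    b = pkg[0]
    out = []
    for src in pkg[1:]:
        swapped = ((src << 4) | (src >> 4)) & 0xFF  # four rlca swap the nibbles
        for a in (src & 0xF0, swapped & 0xF0):      # and c, with c = 11110000b
            for _ in range(2):
                carry = b & 1
                b = (b >> 1) | (carry << 7)         # rrc b
                a = (a >> 1) | (carry << 7)         # rra
            out.append(a)
    return out


print(pack_1111_22_22([0xFC] * 4).hex())  # ffffff
```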
-
;;-------------------------------------
;; 3BPP - STORING METHOD: |3-1|2-2|3-1|
;;-------------------------------------
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p00_p02_p01_p10 || p12_p11_p32_p31 || p20_p22_p21_p30 |
;;
;; hl=^sprite
;; de=^screen
;; b, c used as auxiliary registers
;;
; unpack first pixel
ld c,(hl) ; load c with first byte of package
ld a,c ; we'll need it later;
and 11111100b ; mask to isolate the bits of the first pixel byte;
ld (de),a ; save to screen;
inc de ; next screen position;
inc hl ; go to next data byte;
; unpack second pixel
ld b,(hl) ; get second byte of package and
ld a,b ; store it in b for later use;
and 11110000b ; mask to isolate the bit pairs of the second pixel byte;
rrc c ; next four instructions
rra ; combine p10 from the first byte (stored in c)
rrc c ; with p12_p11 from the second byte (in a)
rra ; to get the second pixel byte;
ld (de),a ; save to screen;
inc de
inc hl
; unpack third pixel
ld c,(hl) ; get third byte of package and
ld a,c ; store in c for later;
and 11111100b ; mask to isolate bits of the third pixel byte;
ld (de),a ; save to screen;
inc de
; unpack fourth pixel
ld a,b ; load a with the second byte
rlca ; rotate four times to swap the nibbles
rlca ; of the byte
rlca
rlca
and 11110000b ; isolate the bits of the fourth pixel;
rrc c ; move p30 from third byte
rra ; into bits 7 and 6 in accumulator
rrc c ; which contained bit pairs p32_p31;
rra ;
ld (de),a
inc de
inc hl
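A similar Python sketch for the |3-1|2-2|3-1| format (hypothetical helper names). Since the code takes the first and third bytes with a plain and 11111100b, both have to hold their three pairs in pixel-byte order, so the third byte is p20_p22_p21_p30.

```python
PAIR_SHIFT = {0: 6, 2: 4, 1: 2, 3: 0}  # pair j -> bit position in a pixel byte


def pair(byte, j):
    return (byte >> PAIR_SHIFT[j]) & 0b11


def pack_31_22_31(px):
    """Four pixel bytes with a zero bit 3 pair -> one 3-byte package."""
    b1 = (px[0] & 0xFC) | pair(px[1], 0)                 # p00_p02_p01_p10
    b2 = (pair(px[1], 2) << 6 | pair(px[1], 1) << 4
          | pair(px[3], 2) << 2 | pair(px[3], 1))        # p12_p11_p32_p31
    b3 = (px[2] & 0xFC) | pair(px[3], 0)                 # p20_p22_p21_p30
    return bytes([b1, b2, b3])


def rotate_pair_in(a, c):
    """Two rrc c / rra pairs: bits 1,0 of c end up in bits 7,6 of a."""
    for _ in range(2):
        carry = c & 1
        c = (c >> 1) | (carry << 7)   # rrc c
        a = (a >> 1) | (carry << 7)   # rra
    return a


def unpack_31_22_31(pkg):
    b1, b2, b3 = pkg
    swapped = ((b2 << 4) | (b2 >> 4)) & 0xFF         # four rlca
    return [b1 & 0xFC,                               # 1st pixel byte
            rotate_pair_in(b2 & 0xF0, b1),           # 2nd: p10 + p12_p11
            b3 & 0xFC,                               # 3rd pixel byte
            rotate_pair_in(swapped & 0xF0, b3)]      # 4th: p30 + p32_p31
```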
-
;;-------------------------------------
;; 3BPP - STORING METHOD: |3-1|3-1|3-1|
;;-------------------------------------
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p00_p02_p01_p21 || p10_p12_p11_p22 || p20_p30_p32_p31 |
;;
;; hl=^sprite
;; de=^screen
;; b, c used as auxiliary registers
;;
; extract first pixel byte
ld c,(hl) ; load c with the first byte of the package
ld a,c ; copy to accumulator
and 11111100b ; mask to isolate only the bits of the first pixel byte;
ld (de),a ; save to screen;
inc de ; next screen position;
inc hl ; go to next data byte;
; extract second pixel byte
ld b,(hl) ; load b with the second byte of the package
ld a,b ; copy to accumulator
and 11111100b ; mask to get only the bits of the second pixel byte;
ld (de),a ; save to screen;
inc de ; next screen position;
inc hl ; go to next data byte;
; extract third pixel byte
xor a
rrc c: rra: rrc c: rra ; get bit 1 pair from the first byte
rrc b: rra: rrc b: rra ; get bit 2 pair from the second byte
ld b,(hl) ; load b with the third byte of package
sla b: rra: sla b: rra ; get bit 0 pair from the third byte
; (sla zeroes the rightmost bit pair of b; unlike rrc/rra,
; sla/rra swaps the two bits of the pair, so p20 must be stored swapped)
ld (de),a ; save to screen;
inc de ; next screen position;
; extract fourth pixel byte
ld a,b ; copy the (now) rotated third byte to accumulator
ld (de),a ; save to screen;
inc de ; next screen position;
inc hl ; go to next data byte;
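A Python sketch of a packer for this |3-1|3-1|3-1| format (names are hypothetical). One detail matters when generating the data: rrc r / rra moves bits 1,0 of a register into bits 7,6 of a in the same order, but sla r / rra swaps them (bit 7 ends up in bit 6 and bit 6 in bit 7). So p21 and p22 are stored as they are, while p20, which is extracted with sla b, has to be stored with its two bits swapped.

```python
def swap_pair(p):
    """Exchange the two bits of a 2-bit pair."""
    return ((p & 1) << 1) | (p >> 1)


def pack_31_31_31(px):
    """Four pixel bytes with a zero bit 3 pair -> one 3-byte package."""
    p20 = (px[2] >> 6) & 3    # bit 0 pair of the third pixel byte
    p22 = (px[2] >> 4) & 3    # bit 2 pair
    p21 = (px[2] >> 2) & 3    # bit 1 pair
    b1 = (px[0] & 0xFC) | p21                  # p00_p02_p01_p21
    b2 = (px[1] & 0xFC) | p22                  # p10_p12_p11_p22
    b3 = swap_pair(p20) << 6 | px[3] >> 2      # p20 (swapped)_p30_p32_p31
    return bytes([b1, b2, b3])


def unpack_31_31_31(pkg):
    """Follow the Z80 code step by step."""
    c, b, third = pkg
    out = [c & 0xFC, b & 0xFC]              # 1st and 2nd pixel bytes
    a = 0                                   # xor a
    for _ in range(2):                      # rrc c / rra, twice
        carry = c & 1
        c = (c >> 1) | (carry << 7)
        a = (a >> 1) | (carry << 7)
    for _ in range(2):                      # rrc b / rra, twice
        carry = b & 1
        b = (b >> 1) | (carry << 7)
        a = (a >> 1) | (carry << 7)
    b = third                               # ld b,(hl)
    for _ in range(2):                      # sla b / rra, twice
        carry = b >> 7
        b = (b << 1) & 0xFF
        a = (a >> 1) | (carry << 7)
    return out + [a, b]                     # 3rd and 4th pixel bytes
```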
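This could be done 2 nops quicker and without using the b register. It needs the p21 and p22 pairs at the top of the first two bytes, extracted with sla: with rrc they would be rotated back into the low bits of c, so c would not be a valid pixel byte, while sla shifts them out and leaves the low bit pair zeroed:
;;-------------------------------------
;; 3BPP - STORING METHOD: |3-1|3-1|3-1|
;;-------------------------------------
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p21_p00_p02_p01 || p22_p10_p12_p11 || p20_p30_p32_p31 |
;;
;; hl=^sprite
;; de=^screen
;; c used as an auxiliary register
;;
; unpack first pixel
ld c,(hl) ; c = p21_p00_p02_p01
xor a
sla c ; shift p21 out of c
rra ; and into the top of the accumulator
sla c ; (a = p21-0-0-0, c = p00_p02_p01_0)
rra
ex de,hl
ld (hl),c ; save 1st pixel byte to screen
ex de,hl
inc de
inc hl
; unpack second pixel
ld c,(hl) ; c = p22_p10_p12_p11
sla c
rra
sla c ; (a = p22-p21-0-0, c = p10_p12_p11_0)
rra
ex de,hl ; now hl=^screen and de=^sprite
ld (hl),c ; save 2nd pixel byte to screen
inc de ; next sprite byte
inc hl ; next screen position
; unpack third pixel
ex de,hl ; back to hl=^sprite and de=^screen
ld c,(hl) ; c = p20_p30_p32_p31
sla c
rra
sla c ; (a = p20-p22-p21-0, c = p30_p32_p31_0)
rra
ld (de),a ; save 3rd pixel byte to screen
inc de
; unpack fourth pixel
ld a,c
ld (de),a ; save 4th pixel byte to screen
inc de
inc hl
But a little later I found that it could be done another 2 nops quicker by writing the first two pixel bytes to the screen inside a single ex de,hl pair.
However this time we make use of the b register...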
;;-------------------------------------
;; 3BPP - STORING METHOD: |3-1|3-1|3-1|
;;-------------------------------------
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p21_p00_p02_p01 || p22_p10_p12_p11 || p20_p30_p32_p31 |
;;
;; hl=^sprite
;; de=^screen
;; b, c used as auxiliary registers
;;
; unpack first pixel
ld c,(hl) ; load c with the first byte of package
xor a ; zero the accumulator
sla c ; ...
rra ; shift bit pair p21 into the leftmost bit pair of the accumulator
sla c ; (a = p21-0-0-0, c = p00_p02_p01_0)
rra ; ...
inc hl ; go to next data byte;
; unpack second pixel
ld b,(hl) ; load b with the second byte of package
sla b ; ...
rra ; shift bit pair p22 into the leftmost bit pair of the accumulator
sla b ; (a = p22-p21-0-0, b = p10_p12_p11_0)
rra ; ...
inc hl ; go to next data byte;
; unpack third pixel
ex de,hl
ld (hl),c ; save 1st pixel byte to screen
inc hl ; go to next screen position
ld (hl),b ; save 2nd pixel byte to screen
inc hl ; go to next screen position
ex de,hl
ld c,(hl) ; load c with the last byte of package
sla c ; ...
rra ; shift bit pair p20 into the leftmost bit pair of the accumulator
sla c ; (a = p20-p22-p21-0, c = p30_p32_p31_0)
rra ; ...
ld (de),a ; save 3rd pixel byte to screen;
inc de ; next screen position;
; unpack fourth pixel
ld a,c ;
ld (de),a ; save last pixel byte to screen;
inc de ; next screen position;
inc hl ; go to next data byte;
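In this format all three extracted pairs (p21, p22 and p20) go through sla / rra, which swaps the two bits of a pair, so all three have to be stored swapped. A Python sketch of a matching packer and of an unpacker that follows the code (hypothetical names):

```python
def swap_pair(p):
    """Exchange the two bits of a 2-bit pair."""
    return ((p & 1) << 1) | (p >> 1)


def pack_31_31_31_sla(px):
    """Four pixel bytes (bit 3 pair zero) -> p21_p00_p02_p01 | p22_p10_p12_p11
    | p20_p30_p32_p31, with the pair in bits 7,6 stored swapped."""
    third = px[2]
    tops = [(third >> 2) & 3, (third >> 4) & 3, (third >> 6) & 3]  # p21, p22, p20
    rests = [px[0], px[1], px[3]]
    return bytes(swap_pair(t) << 6 | r >> 2 for t, r in zip(tops, rests))


def unpack_31_31_31_sla(pkg):
    """Two sla r / rra pairs per package byte, as in the Z80 code."""
    a = 0                                   # xor a
    shifted = []
    for r in pkg:
        for _ in range(2):
            carry = r >> 7                  # sla r
            r = (r << 1) & 0xFF
            a = (a >> 1) | (carry << 7)     # rra
        shifted.append(r)                   # r is now a finished pixel byte
    # shifted holds the 1st, 2nd and 4th pixel bytes; a holds the 3rd
    return [shifted[0], shifted[1], a, shifted[2]]
```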
-
;;--------------------------------------------------
;; 3BPP - STORING METHOD: |1-1-1-1|1-1-1-1|1-1-1-1|
;;--------------------------------------------------
;;
;; package format:
;; 1st byte 2nd byte 3rd byte
;; | p31_p21_p11_p01 || p32_p22_p12_p02 || p30_p20_p10_p00 |
;;
;; hl'=^sprite
;; hl=^screen
;; b', c' and d' used as auxiliary registers
;;
; load b,c,d with the package bytes
exx
ld b,(hl)
inc hl
ld c,(hl)
inc hl
ld d,(hl)
inc hl
; extract first pixel byte
xor a
rrc b: rra: rrc b: rra
rrc c: rra: rrc c: rra
rrc d: rra: rrc d: rra
exx
ld (hl),a
inc hl
; extract second pixel byte
exx
xor a
rrc b: rra: rrc b: rra
rrc c: rra: rrc c: rra
rrc d: rra: rrc d: rra
exx
ld (hl),a
inc hl
; extract third pixel byte
exx
xor a
rrc b: rra: rrc b: rra
rrc c: rra: rrc c: rra
rrc d: rra: rrc d: rra
exx
ld (hl),a
inc hl
; extract fourth pixel byte
exx
xor a
rrc b: rra: rrc b: rra
rrc c: rra: rrc c: rra
rrc d: rra: rrc d: rra
exx
ld (hl),a
inc hl
This one is very slow...
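For this |1-1-1-1|1-1-1-1|1-1-1-1| format, pixel byte i uses bits 2i+1 and 2i of each package byte; the first byte carries the bit 1 pairs, the second the bit 2 pairs and the third the bit 0 pairs. A Python sketch of a packer and of an unpacker that repeats the rrc / rra sequence (hypothetical names):

```python
def pack_1111x3(px):
    """Four pixel bytes (bit 3 pair zero) -> p31_p21_p11_p01 |
    p32_p22_p12_p02 | p30_p20_p10_p00."""
    b1 = b2 = b3 = 0
    for i, p in enumerate(px):
        b1 |= ((p >> 2) & 3) << 2 * i   # pi1
        b2 |= ((p >> 4) & 3) << 2 * i   # pi2
        b3 |= ((p >> 6) & 3) << 2 * i   # pi0
    return bytes([b1, b2, b3])


def unpack_1111x3(pkg):
    """Each pixel byte: xor a, then rrc/rra twice on b', c' and d' in turn."""
    regs = list(pkg)                    # b', c', d'
    out = []
    for _ in range(4):
        a = 0
        for r in range(3):
            for _ in range(2):
                carry = regs[r] & 1
                regs[r] = (regs[r] >> 1) | (carry << 7)   # rrc
                a = (a >> 1) | (carry << 7)               # rra
        out.append(a)
    return out
```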
-
;;--------------------------------------------------
;; 1BPP PACKED SPRITES
;;--------------------------------------------------
Here we limit the number of colors used by the sprite to two.
This way, all we need for one pixel byte is one bit pair, so we save 75% of the memory.
So the "packages" are one byte and each stores data needed for four pixel bytes.
;;--------------------------------------------------
;; 1BPP
;;--------------------------------------------------
;; package format:
;; | p00_p10_p20_p30 |
;;
;; use inks 0 and 1 for sprite
;;
;; hl=^sprite
;; de=^screen
;; c - used as an auxiliary register
;;
; extract 1st pixel byte
xor a
ld c,(hl)
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 2nd pixel byte
xor a
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 3rd pixel byte
xor a
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 4th pixel byte
ld a,c
ld (de),a
inc de
inc hl
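A Python sketch for generating 1bpp packages for this routine (hypothetical names). The first three pixel bytes are extracted with sla c / rra, which swaps the two bits of each pair, while the fourth is taken directly from c (ld a,c), which keeps their order, so the packer stores the first three pairs swapped and the last one as is.

```python
def swap_pair(p):
    """Exchange the two bits of a 2-bit pair."""
    return ((p & 1) << 1) | (p >> 1)


def pack_1bpp(pixels):
    """pixels: 8 values (0 or 1), left to right -> one package byte."""
    byte = 0
    for i in range(4):
        pair = pixels[2 * i] << 1 | pixels[2 * i + 1]   # (left, right)
        if i < 3:
            pair = swap_pair(pair)        # undone by sla c / rra
        byte |= pair << (6 - 2 * i)
    return byte


def unpack_1bpp(c):
    """Follow the Z80 code; returns the four pixel bytes (inks 0 and 1)."""
    out = []
    for _ in range(3):
        a = 0                             # xor a
        for _ in range(2):
            carry = c >> 7                # sla c
            c = (c << 1) & 0xFF
            a = (a >> 1) | (carry << 7)   # rra
        out.append(a)
    out.append(c)                         # ld a,c
    return out
```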
We could preload b with a "color offset", and then we could "choose" which of the possible color pairs we want to use for the sprite.
These possible pairs are:
0-1, 2-3, 4-5, 6-7, 8-9, a-b, c-d, e-f
But if we wanted to use transparency, we would have to set 0=2=4=6=8=a=c=e, which is not possible on the Enterprise because inks 8-15 come from FIXBIAS...
So only 8 different colors then.
;;--------------------------------------------------
;; 1BPP
;;--------------------------------------------------
;; package format:
;; | p00_p10_p20_p30 |
;;
;; hl=^sprite
;; de=^screen
;; b = "palette chooser" [possible values: 2,4,6,8,10,12,14]
;; c - used as an auxiliary register
; extract 1st pixel byte
ld a,b
ld c,(hl)
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 2nd pixel byte
ld a,b
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 3rd pixel byte
ld a,b
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 4th pixel byte
ld a,c
or b
ld (de),a
inc de
inc hl
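Working out the value of b by hand is error-prone, so here is a Python sketch (hypothetical names) for the pixel bytes built with ld a,b followed by two sla c / rra pairs. The two rra instructions shift the loaded value two bits to the right, so b has to hold the offset's pixel-byte bit pattern shifted two bits to the left; the numbers 2, 4, ..., 14 are ink offsets rather than register values. As in the routine, the pair taken from c arrives with its two bits swapped.

```python
PAIR_SHIFT = {0: 6, 2: 4, 1: 2, 3: 0}  # ink bit j -> position of its pair


def make_pixel_byte(left, right):
    byte = 0
    for j, shift in PAIR_SHIFT.items():
        byte |= ((((left >> j) & 1) << 1) | ((right >> j) & 1)) << shift
    return byte


def chooser_value(offset):
    """Value for b so the routine draws inks offset (clear) / offset+1 (set)."""
    assert offset % 2 == 0 and 0 <= offset <= 14
    pattern = make_pixel_byte(offset, offset)   # its bit 0 pair is zero
    return (pattern << 2) & 0xFF                 # the two rra undo this shift


def emulate_pixel_byte(b, stored_pair):
    """ld a,b / sla c / rra / sla c / rra with the pair in bits 7,6 of c."""
    a, c = b, stored_pair << 6
    for _ in range(2):
        carry = c >> 7                # sla c
        c = (c << 1) & 0xFF
        a = (a >> 1) | (carry << 7)   # rra
    return a


print([hex(chooser_value(o)) for o in range(0, 16, 2)])
```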
Here's an idea for a 2 nops per package optimization of the first 1bpp example, but we must sacrifice another ink...
;;--------------------------------------------------
;; 1BPP
;;--------------------------------------------------
;; package format:
;; | p00_p10_p20_p30 |
;;
;; hl=^sprite
;; de=^screen
;; c - used as an auxiliary register
;;
;; use inks 0 and 1 for sprite and set ink 8 = ink 1: the 2nd pixel byte
;; ends up in the bit 3 pair (so ink 8 is sacrificed for this optimization)
;;
; extract 1st pixel byte
xor a
ld c,(hl)
sla c
rra
sla c
rra
ld (de),a
inc de
; extract 2nd pixel byte
sla c ; adc a,a shifts the 1st pixel pair
adc a,a ; out of the accumulator, so no xor a is needed
sla c
adc a,a
ld (de),a
inc de
; extract 3rd pixel byte
sla c ; rra shifts the 2nd pixel pair
rra ; out of the accumulator, so again no xor a
sla c
rra
ld (de),a
inc de
; extract 4th pixel byte
ld a,c
ld (de),a
inc de
inc hl
All of the presented examples might be made slightly faster by using the stack as the data source...
-
;;--------------------------------------------------
;; 2BPP PACKED SPRITES
;;--------------------------------------------------
NOTE: This was written with the CPC "dual-playfield mode 0" in mind, so its validity for the Enterprise should be checked (because of FIXBIAS)...
Here's a method for 2-bits-per-pixel-packing of 16-color mode sprites. Saves half of the memory needed for the sprites, but (as always the case) at the cost of some speed...
I don't think it will be much useful for any of you, but here goes...
1. Configure the palette so that bits 0,1 or 2,3 of the ink number will be used for the background inks.
These would be the "SSBB" or the standard "BBSS" configuration [see next post].
I'll use the standard "BBSS" configuration.
2. Store sprite data in a 2-bits-per-pixel-packed and pre-rotated (one rlca/rrca) form.
This means storing 2 pixel pairs in one byte, so the memory needed for the sprite data is halved.
We use 0, 4, 8, c for background inks, so for sprite data pixels, ink bits 2 and 3 are always zero.
So if we have two sprite pixel bytes XY and ST of the bit form x0-y0-x2-y2-x1-y1-x3-y3 and s0-t0-s2-t2-s1-t1-s3-t3, then, after zeroing the redundant bits, they become x0-y0-00-00-x1-y1-00-00 and s0-t0-00-00-s1-t1-00-00.
We combine them into one byte and get x0-y0-s0-t0-x1-y1-s1-t1.
Then we rotate all the bytes by one bit, either right (like rrca) or left (like rlca). The two are not equivalent: after a right rotation, rrca + and %11001100 would give back ST with its ink bits 0 and 1 swapped, so we use the left rotation, and our packed byte pair becomes y0-s0-t0-x1-y1-s1-t1-x0.
To get the left byte (XY) use rrca + and %11001100.
To get the right byte (ST) use rlca + and %11001100.
Storing sprites in such form saves us half the memory needed for the data and we can unpack the byte pairs during drawing.
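The packing can be checked with a short Python sketch (hypothetical names). This one rotates left when storing, one of the two options above; with that choice rrca + and %11001100 gives back XY and rlca + and %11001100 gives back ST, for all 256 combinations of sprite pixel bytes.

```python
MASK = 0b11001100   # sprite ink bits 0 and 1: bits 7,6 and 3,2 of a pixel byte


def rlca(x):
    return ((x << 1) | (x >> 7)) & 0xFF


def rrca(x):
    return ((x >> 1) | (x << 7)) & 0xFF


def pack_pair(xy, st):
    """Combine XY and ST into x0-y0-s0-t0-x1-y1-s1-t1, stored rotated left."""
    assert xy & ~MASK & 0xFF == 0 and st & ~MASK & 0xFF == 0
    return rlca(xy | st >> 2)


def unpack_pair(packed):
    """Return (XY, ST) the way the drawing code extracts them."""
    return rrca(packed) & MASK, rlca(packed) & MASK
```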
Here's an example of a sprite routine doing this:
;; de=^sprite
;; hl=^screen
ld b,%11001100 ;; preload b with bitmask
ld iyh,sprite_height
draw_looph:
ld iyl,sprite_width/2 ;; packed bytes per line (sprite_width in screen bytes)
draw_loopw:
ld a,(de) ;; get the byte pair in byte-packed form
ld c,a ;; save for later
rrca ;; rotate to get them in place
and b ;; get left pixel bit pairs
or (hl) ;; combine with background
ld (hl),a ;; save to screen memory
inc hl ;; go to next screen position
ld a,c ;; restore the packed byte
rlca ;; rotate to get them in place
and b ;; get right pixel bit pairs
or (hl) ;; combine with background
ld (hl),a ;; save to screen memory
inc hl ;; go to next screen position
inc de ;; next sprite data byte
dec iyl
jp nz,draw_loopw
;;
;; get next line address code
;;
dec iyh
jp nz,draw_looph
ret
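A Python model of the drawing loop above, useful for testing packed data on a PC (hypothetical names). It assumes that sprite_width counts screen bytes, so the inner loop runs sprite_width/2 times with one packed byte per two screen bytes, and that the data is stored rotated left, so rrca yields the left screen byte. The elided next-line code is replaced by a fixed line_step on a linear buffer, which a real screen layout may not have.

```python
MASK = 0b11001100


def rlca(x):
    return ((x << 1) | (x >> 7)) & 0xFF


def rrca(x):
    return ((x >> 1) | (x << 7)) & 0xFF


def draw(screen, pos, line_step, data, width, height):
    """OR a packed 2bpp sprite into a bytearray standing in for the screen.

    width is in screen bytes (even); each packed byte yields two screen bytes.
    line_step is a hypothetical stand-in for the next-line address code.
    """
    src = 0
    for _ in range(height):
        hl = pos
        for _ in range(width // 2):
            packed = data[src]
            screen[hl] |= rrca(packed) & MASK       # left byte, or (hl)
            screen[hl + 1] |= rlca(packed) & MASK   # right byte, or (hl)
            hl += 2
            src += 1
        pos += line_step
    return screen
```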
Storing the data in a prerotated state isn't necessary, but it makes the code somewhat more elegant: one rlca/rrca for each half-byte.
You could store the data normally and have no rlca/rrca for one half-byte and two rlca/rrca for the other half.
However, if you use this method for flipping or shifting, the prerotated storage is better because both versions of the sprite then take the same time to draw.
We could use a similar approach for flipping/shifting dual-playfield mode 0 sprites.
Instead of storing two consecutive pixel bytes in one byte, store the sprite bits of one byte of the normal form and of the matching byte of the flipped/shifted form.
So the XY above would be a normal form byte and ST could be a byte of the flipped/shifted form.
Note that you wouldn't even have to change the direction of storing/loading data.
draw_normal:
;; de=^sprite
;; hl=^screen
ld c,%11001100 ;; preload c with the bit mask
ld iyh,sprite_height
dnorm_looph:
ld b,sprite_width ;; load b with loop count
dnorm_loopw:
ld a,(de)
rrca
and c ;; get left pixel pair bits - the normal sprite version bits
or (hl)
ld (hl),a
inc hl
inc de
djnz dnorm_loopw
;;
;; get next line address code
;;
dec iyh
jp nz,dnorm_looph
ret
draw_flipped: ;; or shifted
;; de=^sprite
;; hl=^screen
ld c,%11001100 ;; preload c with the bit mask
ld iyh,sprite_height
dflip_looph:
ld b,sprite_width ;; load b with loop count
dflip_loopw:
ld a,(de)
rlca
and c ;; get right pixel pair bits - the flipped (shifted) version bits
or (hl)
ld (hl),a
inc hl
inc de
djnz dflip_loopw
;;
;; get next line address code
;;
dec iyh
jp nz,dflip_looph
ret
Note that the only difference between the two routines is the direction of rotation (rlca/rrca)
The possibility to pack two versions of the same sprite (for flipping/shifting) in one seems especially interesting...
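A Python sketch of packing a normal and a horizontally flipped version of one sprite line into the same bytes (hypothetical names). Flipping a line means reversing its byte order and swapping the left and right pixel inside each byte, i.e. the two bits of every pair. It assumes the packed byte is stored rotated one bit left, so rrca + mask returns the normal half, as in draw_normal, and rlca + mask returns the flipped half, as in draw_flipped. A shifted version would be packed the same way, only with a different second line.

```python
MASK = 0b11001100


def rlca(x):
    return ((x << 1) | (x >> 7)) & 0xFF


def rrca(x):
    return ((x >> 1) | (x << 7)) & 0xFF


def mirror_row(row):
    """Flip one line of pixel bytes: reverse the bytes, swap the pixels."""
    return [((b & 0xAA) >> 1) | ((b & 0x55) << 1) for b in reversed(row)]


def pack_two_versions(row):
    """Normal bytes go to the XY half, flipped bytes to the ST half."""
    return [rlca(n | f >> 2) for n, f in zip(row, mirror_row(row))]


def draw_line(packed, flipped):
    """draw_normal uses rrca, draw_flipped uses rlca (background omitted)."""
    rot = rlca if flipped else rrca
    return [rot(p) & MASK for p in packed]


row = [0x80, 0x0C]          # pixels: ink 1, ink 0, ink 2, ink 2
packed = pack_two_versions(row)
print(draw_line(packed, flipped=False), draw_line(packed, flipped=True))
# [128, 12] [12, 64]
```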
-------------------
List of possible palette configurations:
legend:
"BBSS" means that bits 3 and 2 are background ink bits and bits 0,1 are sprite ink bits
LBn - left pixel background ink bit n
RBn - right pixel background ink bit n
LSn - left pixel sprite ink bit n
RSn - right pixel sprite ink bit n
n=0,1
1. BBSS
- sprite data bytes of the form LS0 - RS0 - LB0 - RB0 - LS1 - RS1 - LB1 - RB1
- inks 0, 4, 8, c for background inks
- inks 1 = 5 = 9 = d, 2 = 6 = a = e and 3 = 7 = b = f for sprite (0 as transparent)
- use and %00110011 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 S1 S2 S3 B1 S1 S2 S3 B2 S1 S2 S3 B3 S1 S2 S3
2. BSBS
- screen data bytes of the form LS0 - RS0 - LS1 - RS1 - LB0 - RB0 - LB1 - RB1
- inks 0, 2, 8, a for background
- inks 1 = 3 = 9 = b, 4 = 6 = c = e and 5 = 7 = d = f for sprite
- use and %00001111 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 S1 B1 S1 S2 S3 S2 S3 B2 S1 B3 S1 S2 S3 S2 S3
3. BSSB
- screen data bytes of the form LB0 - RB0 - LS1 - RS1 - LS0 - RS0 - LB1 - RB1
- inks 0, 1, 8, 9 for background
- inks 2 = 3 = a = b, 4 = 5 = c = d and 6 = 7 = e = f for sprite
- use and %11000011 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 B1 S1 S1 S2 S2 S3 S3 B2 B3 S1 S1 S2 S2 S3 S3
4. SBBS
- screen data bytes of the form LS0 - RS0 - LB1 - RB1 - LB0 - RB0 - LS1 - RS1
- inks 0, 2, 4, 6 for background
- inks 1 = 3 = 5 = 7, 8 = a = c = e and 9 = b = d = f for sprites
- use and %00111100 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 S1 B1 S1 B2 S1 B3 S1 S2 S3 S2 S3 S2 S3 S2 S3
5. SSBB
- screen data bytes of the form LB0 - RB0 - LS0 - RS0 - LB1 - RB1 - LS1 - RS1
- inks 0, 1, 2, 3 for background
- inks 4 = 5 = 6 = 7, 8 = 9 = a = b and c = d = e = f for sprites
- use and %11001100 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 B1 B2 B3 S1 S1 S1 S1 S2 S2 S2 S2 S3 S3 S3 S3
6. SBSB
- screen data bytes of the form LB0 - RB0 - LB1 - RB1 - LS0 - RS0 - LS1 - RS1
- inks 0, 1, 4, 5 for background
- inks 2 = 3 = 6 = 7, 8 = 9 = c = d and a = b = e = f for sprites
- use and %11110000 for erasing sprite bytes
ink: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
B0 B1 S1 S1 B2 B3 S1 S1 S2 S2 S3 S3 S2 S2 S3 S3
The remaining possibilities:
The following four give you 2 colors for background (you could use xor for drawing/erasing) and 8 colors for sprites:
7. BSSS
[this and SBBB seem to be the only "CPC dual-playfield-like" modes possible on the Enterprise computers... (because of FIXBIAS)]
8. SBSS
9. SSBS
10. SSSB
and the following four give you 8 colors for background and 2 colors for sprites:
11. SBBB
12. BSBB
13. BBSB
14. BBBS
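The ink lists and erase masks above can be generated from the configuration name. A Python sketch (hypothetical names), using the legend's convention that the letters describe ink bits 3, 2, 1, 0 from left to right, and the pixel-byte order used throughout this thread:

```python
PAIR_SHIFT = {0: 6, 2: 4, 1: 2, 3: 0}   # ink bit -> position of its pair


def ink_bits(config):
    """Split ink bit numbers into background and sprite bits, lowest first."""
    roles = {3 - i: ch for i, ch in enumerate(config)}   # "BBSS": bit 3 first
    bg = [bit for bit in range(4) if roles[bit] == "B"]
    sp = [bit for bit in range(4) if roles[bit] == "S"]
    return bg, sp


def ink_table(config):
    """Label every ink 0..15 as background color Bn or sprite color Sn."""
    bg, sp = ink_bits(config)
    labels = []
    for ink in range(16):
        s = sum(((ink >> bit) & 1) << n for n, bit in enumerate(sp))
        b = sum(((ink >> bit) & 1) << n for n, bit in enumerate(bg))
        labels.append(f"S{s}" if s else f"B{b}")
    return labels


def erase_mask(config):
    """AND mask that keeps the background bits of a screen byte."""
    bg, _ = ink_bits(config)
    return sum(3 << PAIR_SHIFT[bit] for bit in bg)


for name in ("BBSS", "BSBS", "BSSB", "SBBS", "SSBB", "SBSB"):
    print(name, format(erase_mask(name), "08b"), " ".join(ink_table(name)))
```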
Once more: note that this may be impossible on the Enterprise...
I know I should have checked it before posting, but... I don't want to sacrifice more time on it. I know it's not very useful... so sorry...
However maybe someone has other ideas on packing a sprite for unpacking during the drawing process?
-
What I will do in my games is the opposite.
I'll target even extended memory configurations if that is a must, and unroll the sprite drawing code into sequences with the graphics data inside them.
And the speed will be very low, in this way, too.
-
And you are absolutely right :D.
For games to run at a decent speed and have more or bigger sprites, you have to use compiled sprites, which take the most memory. If you run out of memory, you use compressed sprites. However, compiled and compressed sprites can't be clipped/shifted/flipped as easily as standard sprites... I use standard sprites for the frames that need clipping... If you have a lot (and I mean a lot) of memory, you could use preclipped frames, but that would be memory overkill. Not all games need clipping, though, or you could just make the sprites disappear when they reach the edge of the screen like some games do (or still have a few "preclipped" stages for that). But that still leaves shifting and flipping of sprites...
Btw, when drawing the sprites normally (top-to-bottom or bottom-to-top) you can clip compiled sprites vertically easily and quickly, but horizontal clipping is impossible...
But if you use the video setup described here https://enterpriseforever.com/programming/video-config-that-allows-to-draw-sprites-vertically/, then you could clip them horizontally the same way you clip them vertically in the normal setup... But the memory needed for such a setup is huge...
Like I think I wrote at the beginning, these "packed sprites" aren't really useful. They're more of a curiosity. However, on the CPC with a dual-playfield mode 0 palette setup and 2bpp packing you could "hide" two frames of a single sprite in one (for example, have a flipped or shifted version of the sprite in the same memory area)... There's rarely a need to save memory so badly that you'd use these ;)