@IstvanV I saw that in your sprite example you were able to count the exact time of the drawing code. I hope it's not a stupid question to ask you for the principles of applying the waits to instruction timings... I guess it's based on the source/destination memory segment (if it has wait states) or is there another factor?
Wait states generated by DAVE (if enabled on port BFh) are simple: they add 1 cycle either to all memory accesses (00h) or only to M1 reads (04h). M1 reads are the first byte of simple instructions with no prefix, or the first two bytes of instructions with a CB/DD/ED/FDh prefix. Also, according to tests by Zozosoft, it seems that DAVE waits 2 cycles on turbo machines. But in programs where speed is important, wait states are disabled, so none of the above matters.
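As a rough illustration of the DAVE rule above, here is a small Python sketch for a non-turbo machine. The function name and the "all"/"m1"/"off" mode labels are my own for this example; you have to supply the base cycle count and the number of accesses yourself from the Z80 documentation.

```python
def cycles_with_dave_waits(base_cycles, m1_reads, other_accesses, mode):
    """Instruction time in Z80 cycles with DAVE wait states added.

    mode is a hypothetical label, not a port value:
      "all" - wait on every memory access (port BFh = 00h)
      "m1"  - wait on M1 reads only (port BFh = 04h)
      "off" - wait states disabled
    Assumes a non-turbo machine, i.e. 1 extra cycle per waited access.
    """
    if mode == "all":
        return base_cycles + m1_reads + other_accesses
    if mode == "m1":
        return base_cycles + m1_reads
    if mode == "off":
        return base_cycles
    raise ValueError("unknown mode: " + repr(mode))


# LD (HL),A takes 7 cycles: one M1 read and one memory write.
print(cycles_with_dave_waits(7, 1, 1, "all"))  # 9
print(cycles_with_dave_waits(7, 1, 1, "m1"))   # 8
print(cycles_with_dave_waits(7, 1, 1, "off"))  # 7
```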
Video memory and NICK I/O port timing are more complex: within each 889846 Hz "slot" (1 character), the Z80 is allowed access to one byte at a specific time, and it always needs to wait at least a certain amount of time (a few ns more than 5/16 of a character, and there is also a difference of a few ns depending on the type of access: normal memory, M1 memory, or I/O). This is implemented by the NICK chip halting the Z80's clock, which it can do in steps of 1/2 clock cycle (1/8000000 s on normal non-turbo machines). There is a test run on real machines here that measures the average time between video memory accesses at various intervals in Z80 cycles.
For the calculations in the sprite code, I used a simplified model: add 1.5 cycles to the time between the video memory accesses, and then round up the result to an integer multiple of 4.5 cycles, and that is the real amount of time, with the wait states (stopped Z80 clock) added. In other words, the best case is 1.5 cycles of wait, and the worst case is 5.5 cycles. This is not entirely accurate, but it gives a reasonable estimate.
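The simplified model can be written as a few lines of Python. This is only a sketch of the estimate described above, not of the real NICK timing; the function name is made up for the example, and the arithmetic is done in half cycles so that the 1.5 and 4.5 values stay exact integers.

```python
def video_interval_with_waits(cycles_between):
    """Estimated real time (in Z80 cycles) between two video memory
    accesses, given the time between them without wait states.

    Simplified model: add 1.5 cycles, then round up to a multiple of
    4.5 cycles. Input is expected to be a multiple of 0.5 cycles.
    """
    half_cycles = round(cycles_between * 2)  # work in 1/2 cycle units
    padded = half_cycles + 3                 # + 1.5 cycles
    slots = -(-padded // 9)                  # ceiling division by 4.5 cycles
    return slots * 9 / 2


# Best case: 3.0 cycles apart becomes 4.5, i.e. 1.5 cycles of wait.
print(video_interval_with_waits(3.0))  # 4.5
# Worst case: 3.5 cycles apart becomes 9.0, i.e. 5.5 cycles of wait.
print(video_interval_with_waits(3.5))  # 9.0
```

The wait itself is the returned value minus the input, so it always falls between 1.5 and 5.5 cycles in this model.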
When calculating video memory timing, it should also be taken into account exactly when the accesses occur within the instruction. This can be found in the Z80 documentation. For example, an LD (HL), A instruction is two memory accesses, an M1 read at 2 cycles, and a normal write at 6.5 cycles.