lgb, your description of the data transfer and your code snippets agree with my current understanding (perhaps not surprising, as we have read the same datasheet!!) but I hope to be able to confirm it for real very soon.
Well, with my talent for misunderstanding, it would be no wonder if I ended up with a completely different idea even after reading the same spec as you

It's a bit disappointing: at first glance at the datasheet, with its auto-incrementing register, I thought we'd be able to use INIR etc. but on closer inspection it seems not.
Hmm, maybe I missed something, but I don't know of any auto-incrementing stuff. Or do you mean the internal counter of the FIFO registers, maybe? I wouldn't say that not having INIR is such a big problem. INIR is kinda slow, something like 21 cycles per byte (well, I can be wrong about the cycles!). On the other hand, a plain IN A,(n) is 11, assuming we use constant ports and unrolled loops for big chunks. So the read itself is faster this way, but of course you need to _store_ the data too, so maybe not
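To put numbers on that comparison, here is a rough tally in Python. The T-state values are the standard documented Z80 ones as far as I know (INIR: 21 per repeated byte, 16 for the last one; IN A,(n): 11). The store sequence (a 7-cycle register-indirect LD plus a 4-cycle 8-bit INC) is only my assumption of the cheapest possible store, and any wait states the machine adds are ignored.

```python
# Z80 T-state counts, assuming no extra wait states from the hardware.
T_INIR_REPEAT = 21  # INIR, each iteration while B != 0 afterwards
T_INIR_LAST = 16    # INIR, final iteration (B becomes 0)
T_IN_A_N = 11       # IN A,(n) with a constant port
T_LD_IND_A = 7      # LD (BC),A  (assumed store instruction)
T_INC_R = 4         # INC C      (assumed pointer update, no page crossing)


def inir_cycles(count):
    """T-states for one INIR run transferring `count` bytes (1..256)."""
    return (count - 1) * T_INIR_REPEAT + T_INIR_LAST


def unrolled_cycles(count):
    """T-states for `count` unrolled IN A,(n) / LD (BC),A / INC C triples."""
    return count * (T_IN_A_N + T_LD_IND_A + T_INC_R)


if __name__ == "__main__":
    for n in (16, 256):
        print(n, inir_cycles(n), unrolled_cycles(n))
```

So per byte it is 21 against 22, which is close enough that the choice is not really about speed.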

If we optimize this so that the EP memory buffer sits on a fixed 256-byte boundary, the store can be e.g. an LD (BC),A (7 cycles?) plus a plain INC C (4 cycles, as long as no 256-byte boundary is crossed), so we have 11+7+4 = 22 ... well, one cycle slower than INIR, and it needs just too many "good" conditions to be fast. Well, better than nothing ...

The other possibility would be memory-mapped I/O: not real Z80-style I/O, but a memory area used for the purpose, as the SD card does too. In that case a bigger area (even 16K, though really it would be fine to decode just a smaller part of the EPNET ROM area for this, of course) can be used to address IDM_DR0 or IDM_DR1, selected by the lowest bit of the address. The other stuff can stay on I/O ports. This sounds insane, but consider this: the w5300 seems to be indirect for FIFO TX/RX transfers even in direct mode. Register addressing is still indirect and needs IDM_AR on I/O ports (and MR too, of course). But once that is set up, IDM_DR0 and IDM_DR1 can be accessed as "normal RAM", so you can use LDIR etc. for the actual transfer: as it increments the source address, the toggling A0 line in our "DR segment" would read the IDM_DR0, IDM_DR1 sequence just as we should! Only some extra decoding is needed. I am not sure if it is worth it anyway
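Just to convince myself that the A0 trick gives the right byte order, here is a tiny Python model of it. Everything in it is hypothetical: FakeW5300Fifo is a made-up stand-in for the chip, and it assumes that IDM_DR0 carries the high byte, IDM_DR1 the low byte, and that the FIFO moves on to the next 16-bit word after the low byte has been read. That is how I read the datasheet, but it is not tested on hardware.

```python
class FakeW5300Fifo:
    """Made-up model of the RX FIFO seen through IDM_DR0/IDM_DR1."""

    def __init__(self, words):
        self.words = list(words)  # 16-bit words waiting in the FIFO
        self.pos = 0

    def read(self, a0):
        word = self.words[self.pos]
        if a0 == 0:
            # A0 low selects IDM_DR0: high byte, FIFO does not advance
            return (word >> 8) & 0xFF
        # A0 high selects IDM_DR1: low byte, then step to the next word
        self.pos += 1
        return word & 0xFF


def ldir_from_window(chip, src, count):
    """Mimic the read side of LDIR: `count` reads from an incrementing
    source address, where only address bit 0 reaches the chip."""
    out = bytearray()
    for _ in range(count):
        out.append(chip.read(src & 1))
        src = (src + 1) & 0xFFFF
    return bytes(out)


if __name__ == "__main__":
    fifo = FakeW5300Fifo([0x1234, 0xABCD])
    print(ldir_from_window(fifo, 0x4000, 4).hex())
```

Note that in this model the transfer has to start on an even address and move an even number of bytes, otherwise the DR0/DR1 pairing gets out of step.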

Sounds strange, but I've already started to like it

What I see here is that, from the w5300's point of view, even in indirect mode those are just signals. Nothing says we can't map the w5300 address pins in an "insane" way

One other thing I had felt was missing before but hadn't fully realized: there is no way on the w5300 to detect the link status, speed, etc

If the EPNET ROM supported something like DHCP, it would just sit waiting for the timeout even when no cable is plugged in. I don't like this. But I have an idea: the w5300 has outputs to drive LEDs for some of the attributes we are interested in. What would happen if we also interpreted those as logic levels, routing them onto a buffer IC which is enabled when reading the I/O port right after the w5300 IDM ones? Then we would basically be able to "read the status of the LEDs", or so
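On the software side this would be something like the Python sketch below. All of it is hypothetical: which LED output ends up on which data bit depends on how the buffer IC gets wired, and I am assuming the LED outputs are active low (pulled low when the LED is lit), which would need checking against the datasheet.

```python
# Hypothetical bit assignment of the "LED status" port; the real wiring
# of the buffer IC would decide this.
LINK_BIT = 0
SPEED_BIT = 1
DUPLEX_BIT = 2


def decode_led_status(raw, active_low=True):
    """Turn the raw byte read from the status port into named flags."""
    if active_low:
        raw = ~raw & 0xFF  # a lit LED reads as 0, so invert first
    return {
        "link": bool(raw & (1 << LINK_BIT)),
        "speed_100": bool(raw & (1 << SPEED_BIT)),
        "full_duplex": bool(raw & (1 << DUPLEX_BIT)),
    }


def should_try_dhcp(raw):
    """Skip DHCP (and its timeout) entirely when there is no link."""
    return decode_led_status(raw)["link"]


if __name__ == "__main__":
    print(decode_led_status(0xFF))  # all LED lines high: nothing lit
    print(should_try_dhcp(0xFE))    # link line low: cable plugged in
```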

But we can see the link status, speed, etc, from software too!