I'm a bit confused about the concept of memory alignment. Here's my doubt: the text says that if you want to read 4 bytes of data starting from an address that is not divisible by 4, you have an unaligned memory access. For example, if I want to read 10 bytes starting at address 05, this is termed an unaligned access (http://www.mjmwired.net/kernel/Documentation/unaligned-memory-access.txt).
Is this specific to a 4-byte word-addressable architecture, or does it also hold for a byte-addressable architecture? If the above case is unaligned for a byte-addressable architecture, why is that so?
As a general rule, bit 0 in memory is gated onto a bus and bit 0 of that bus is connected to bit 0 of every register. It goes on like this until bit 31. There may be special hardware that directs each byte (bits 15:8, 23:16, and 31:24) onto the low order byte, bits 7:0. (When you get to bit "32", it's actually bit 0 of the 4-byte word at address 4.)
However, in the nominal case there is not any special hardware that moves bytes to any position other than the one they are nominally connected to in the natural order and, maybe, byte lane 0.
Imagine a simple memory chip with 32 data pins and a simple CPU with 32 data pins. A given data pin on each chip is wired to the corresponding one on the other, and only to that one. There simply is no way for a simple CPU to do the misaligned read at all.
So, consider a read from 0. The next 4 bytes all fall into a register as wired, and this also happens for a read from address 4. But what if you read (32 bits) from address 1? Or 2? Or 3? Although the read cannot be done directly in hardware, a fancy controller can cause a whole lot of things to happen:
All of these things take extra time.
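As a rough illustration of that extra work, here is a minimal sketch in C of one way an unaligned 32-bit read could be assembled from two aligned ones. The memory model (an array of 32-bit words, little-endian byte order) and the function name `read32_unaligned` are assumptions made for this example, not a description of any particular controller.

```c
#include <stdint.h>

/* Hypothetical model: memory is an array of 32-bit words and the byte
 * order is little-endian. The caller must make sure words[index + 1]
 * exists whenever byte_addr is not a multiple of 4. */
static uint32_t read32_unaligned(const uint32_t *words, uint32_t byte_addr)
{
    uint32_t index = byte_addr / 4;        /* aligned word holding the first byte */
    uint32_t shift = (byte_addr % 4) * 8;  /* bit offset of that byte in the word */
    uint32_t lo = words[index];

    if (shift == 0)
        return lo;                         /* aligned: a single read is enough */

    uint32_t hi = words[index + 1];        /* unaligned: a second read is needed */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The aligned case costs one read; the unaligned case costs two reads plus shifting and merging, which is where the time penalty comes from.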
Note. In reality the data bus is typically a multiple of 32-bits and so is the memory. Special hardware may exist for realigning objects. But even then, because it's an abnormal case, it may not get the pipeline optimizations that properly aligned reads get, and even with special hardware there is probably a time penalty for running operands through it.
Alignment has to do with data size and addressing. In most instruction sets and software, addresses are in units of bytes: 0, 1, 2, and 3 are all valid byte addresses. Assuming the memory system or peripheral you are accessing is "byte addressable" (basically, you can write individual bytes to it), you normally have instructions that allow you to use any address value. Alignment comes into play when you access more than one byte. For two bytes, aligned means the least significant bit of the address is zero, and unaligned means it is one. For four bytes (32-bit quantities), aligned means the lower two bits are both zero, and unaligned means one or both are nonzero, and so on. You can think of it as a modulo: an address where address modulo 4 = 0 is aligned on a 4-byte boundary.
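The low-bits rule can be written as a one-line check. This is a minimal sketch in C; the helper name `is_aligned` is made up for this example, and it assumes the access size is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if addr is a multiple of size.
 * Assumes size is a power of two (1, 2, 4, 8, ...), so that
 * size - 1 is a mask of exactly the low bits that must be zero. */
static bool is_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For instance, `is_aligned(0x6, 2)` is true while `is_aligned(0x6, 4)` and `is_aligned(0x5, 4)` are false.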
Normally, as a software engineer, you would not intentionally put yourself in a situation where you need to get at 10 bytes at address 5. You would probably use 12 bytes at 0x4 or 16 at 0x0, or something along those lines; even if you only use 10 of them, you would align them more logically. External influences (network packets, file systems, shared memory, hardware, etc.) can force the issue: any time you cross a compile domain, you might have to deal with this and act accordingly.
The 10-byte case is semi-interesting, and it depends on whether you are trying to copy these bytes to another equally bad address, or just read them, or write them. If reading, you probably just want to read 12 bytes at address 0x4 and be done with it. If writing, you can do all 10 a byte at a time, in a loop or unrolled. Alternatively, you can write one byte at 0x5, two at 0x6, four at 0x8, two at 0xC, and one at 0xE; or one byte at 0x5, then four 16-bit values starting at 0x6 (in a loop or unrolled), then one byte at 0xE; and so on.
Since you said reading, you could read three 32-bit quantities starting at 0x4 or two 64-bit quantities starting at 0x0. It depends heavily on what you plan to do with the data, what instruction set you are using, etc. A loop of 10 byte reads might be the cleanest, simplest, and easiest to read and maintain.
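When the data comes from one of those external sources and you are working in C, one common portable way to pull a multi-byte value out of an arbitrarily aligned buffer is to copy it with `memcpy` and let the compiler pick the access pattern. This is only a sketch; the helper name `load_u32` is invented for this example, and the result is in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from p, which may have any alignment.
 * memcpy copies byte by byte as far as the language is concerned,
 * so no unaligned 32-bit access is ever requested in the source.
 * The value comes back in host byte order (an assumption to keep
 * in mind for network packets or file formats). */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A call such as `load_u32(buf + 5)` is then valid even though `buf + 5` is not on a 4-byte boundary.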
If you are wondering about aligned vs. unaligned, then it is as I mentioned above with the writes. You CAN do:
8-bit access at 0x5
16-bit access at 0x6
32-bit access at 0x8
16-bit access at 0xC
8-bit access at 0xE
As I keep saying, though, for reads that might not be the most efficient. For writes, you can read-modify-write in 32-bit or 64-bit quantities, or use the combinations I mentioned above.
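The split into aligned pieces can be computed mechanically. Here is a minimal sketch in C that greedily picks the largest naturally aligned access (4, 2, or 1 bytes) at each step; the `struct access` type and the `split_aligned` function are hypothetical names for this example, and the 4-byte maximum is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One aligned access: where it starts and how many bytes it covers. */
struct access {
    uintptr_t addr;
    size_t size;
};

/* Splits the byte range [addr, addr + len) into naturally aligned
 * accesses of at most 4 bytes each. Writes up to max_out entries
 * into out and returns how many were written. */
static size_t split_aligned(uintptr_t addr, size_t len,
                            struct access *out, size_t max_out)
{
    size_t n = 0;
    while (len > 0 && n < max_out) {
        size_t size = 4;
        /* Halve the size until it fits the remaining length and the
         * current address is a multiple of it; size 1 always works. */
        while (size > len || (addr % size) != 0)
            size /= 2;
        out[n].addr = addr;
        out[n].size = size;
        n++;
        addr += size;
        len -= size;
    }
    return n;
}
```

Calling `split_aligned(0x5, 10, out, 8)` yields five accesses: 1 byte at 0x5, 2 at 0x6, 4 at 0x8, 2 at 0xC, and 1 at 0xE.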