The Intel 80321 CPU, as used in the Iyonix, has a built-in feature known as the Application Accelerator. This device can access main memory while the CPU is doing other things: it can either fill a block of memory with a constant value or copy the contents of one block to another.
Not only does this device have access to the full bandwidth of the Iyonix's memory (about 600 megabytes per second), but the CPU can also be processing something else while the memory operation is underway.
Obviously there are drawbacks. The Application Accelerator operates on physical addresses, whereas RISC OS uses logical addresses, so there is an overhead in translating logical to physical addresses for each operation. There is also an overhead from having to mark the memory being copied or filled as uncacheable for the duration of the operation.
The AppAcc module takes care of all this for you, and provides an API that is nearly as simple to use as memcpy() or memset().
Here are some benchmark figures for memory copying, from an example application that is included with this distribution. Figures are in megabytes per second.
| Block size | memcpy() | AppAcc_Copy |
|---|---|---|
| 1k | 31 | 6.3 |
| 4k | 32 | 24 |
| 16k | 50 | 76 |
| 256k | 32 | 269 |
| 1024k | 32 | 309 |
I think the spike in memcpy() performance for 16k blocks is to do with the CPU's cache size. Notice how, for the smaller blocks (1k and 4k), memcpy() performs better because of the overhead in setting up an AppAcc operation. But as the block size increases, the relative impact of the overheads in using the AppAcc decreases, until for 256k blocks and above the Application Accelerator is operating at pretty much the maximum theoretical bandwidth. Memory bandwidth is about 600 MB/s, but a copy has to both read and write every byte, so the ceiling for a copy is roughly 300 MB/s.
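The crossover in the table suggests picking the copy routine by block size. The following is only a minimal sketch of that idea, not the module's actual interface: `accel_copy_fn`, `hybrid_copy`, `fake_accel_copy`, `copy_ceiling_mb_per_s` and the 16k `ACCEL_THRESHOLD` are all hypothetical names and values chosen for illustration. The threshold is simply an assumption read off the figures above, where memcpy() wins at 4k and loses at 16k.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for an accelerated copy routine.
   The real AppAcc interface may look quite different. */
typedef void (*accel_copy_fn)(void *dest, const void *src, size_t len);

/* Assumed crossover point, taken from the benchmark table:
   memcpy() is faster at 4k, the accelerator is faster at 16k. */
#define ACCEL_THRESHOLD ((size_t)16 * 1024)

/* Upper bound on copy throughput: every byte copied costs one
   read and one write, so the ceiling is half the bus bandwidth. */
double copy_ceiling_mb_per_s(double bus_mb_per_s)
{
    return bus_mb_per_s / 2.0;
}

/* Copy len bytes, using the accelerated routine only for blocks
   large enough to repay its setup cost.
   Returns 1 if the accelerated routine was used, 0 for memcpy(). */
int hybrid_copy(void *dest, const void *src, size_t len, accel_copy_fn accel)
{
    assert(dest != NULL && src != NULL);

    if (accel != NULL && len >= ACCEL_THRESHOLD) {
        accel(dest, src, len);
        return 1;
    }
    memcpy(dest, src, len);
    return 0;
}

/* Stand-in for the accelerated routine so the sketch can be
   exercised on any machine; it just calls memcpy(). */
void fake_accel_copy(void *dest, const void *src, size_t len)
{
    memcpy(dest, src, len);
}
```

In a real program the stand-in would be replaced by a wrapper around whatever call the module provides, and the threshold would be worth measuring on the target machine rather than taking from this table.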
Full source code to the module and some example applications is provided.
Download Application Accelerator 0.03 (dated 16 November 2006)
Application Accelerator is fully Iyonix compatible.