PoC.cache.mem

This unit provides a cache (PoC.cache.par2) together with a cache controller which reads / writes cache lines from / to memory. It has two PoC.Mem Interface interfaces:

  • one for the “CPU” side (ports with prefix cpu_), and
  • one for the memory side (ports with prefix mem_).

Thus, this unit can be placed into an already available memory path between the CPU and the memory (controller). If you want to plugin a cache into a CPU pipeline, see PoC.cache.cpu.

Configuration

Parameter Description
REPLACEMENT_POLICY Replacement policy of embedded cache. For supported values see PoC.cache_replacement_policy.
CACHE_LINES Number of cache lines.
ASSOCIATIVITY Associativity of embedded cache.
CPU_ADDR_BITS Number of address bits on the CPU side. Each address identifies one memory word as seen from the CPU. Calculated from other parameters as described below.
CPU_DATA_BITS Width of the data bus (in bits) on the CPU side. CPU_DATA_BITS must be divisible by 8.
MEM_ADDR_BITS Number of address bits on the memory side. Each address identifies one word in the memory.
MEM_DATA_BITS Width of a memory word and of a cache line in bits. MEM_DATA_BITS must be divisible by CPU_DATA_BITS.
OUTSTANDING_REQ Number of oustanding requests, see notes below.

If the CPU data-bus width is smaller than the memory data-bus width, then the CPU needs additional address bits to identify one CPU data word inside a memory word. Thus, the CPU address-bus width is calculated from:

CPU_ADDR_BITS=log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS

The write policy is: write-through, no-write-allocate.

The maximum throughput is one request per clock cycle, except for OUSTANDING_REQ = 1.

If OUTSTANDING_REQ is:

  • 1: then 1 request is buffered by a single register. To give a short critical path (clock-to-output delay) for cpu_rdy, the throughput is degraded to one request per 2 clock cycles at maximum.
  • 2: then 2 requests are buffered by PoC.fifo.glue. This setting has the lowest area requirements without degrading the performance.
  • >2: then the requests are buffered by PoC.fifo.cc_got. The number of outstanding requests is rounded up to the next suitable value. This setting is useful in applications with out-of-order execution (of other operations). The CPU requests to the cache are always processed in-order.

Operation

Memory accesses are always aligned to a word boundary. Each memory word (and each cache line) consists of MEM_DATA_BITS bits. For example if MEM_DATA_BITS=128:

  • memory address 0 selects the bits 0..127 in memory,
  • memory address 1 selects the bits 128..256 in memory, and so on.

Cache accesses are always aligned to a CPU word boundary. Each CPU word consists of CPU_DATA_BITS bits. For example if CPU_DATA_BITS=32:

  • CPU address 0 selects the bits 0.. 31 in memory word 0,
  • CPU address 1 selects the bits 32.. 63 in memory word 0,
  • CPU address 2 selects the bits 64.. 95 in memory word 0,
  • CPU address 3 selects the bits 96..127 in memory word 0,
  • CPU address 4 selects the bits 0.. 31 in memory word 1,
  • CPU address 5 selects the bits 32.. 63 in memory word 1, and so on.

A synchronous reset must be applied even on a FPGA.

The interface is documented in detail here.

Warning

If the design is synthesized with Xilinx ISE / XST, then the synthesis option “Keep Hierarchy” must be set to SOFT or TRUE.

Entity Declaration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
entity cache_mem is
  generic (
    REPLACEMENT_POLICY : string   := "LRU";
    CACHE_LINES        : positive;
    ASSOCIATIVITY      : positive;
    CPU_DATA_BITS      : positive;
    MEM_ADDR_BITS      : positive;
    MEM_DATA_BITS      : positive;
    OUTSTANDING_REQ    : positive := 2
  );
  port (
    clk : in std_logic; -- clock
    rst : in std_logic; -- reset

    -- "CPU" side
    cpu_req   : in  std_logic;
    cpu_write : in  std_logic;
    cpu_addr  : in  unsigned(log2ceil(MEM_DATA_BITS/CPU_DATA_BITS)+MEM_ADDR_BITS-1 downto 0);
    cpu_wdata : in  std_logic_vector(CPU_DATA_BITS-1 downto 0);
    cpu_wmask : in  std_logic_vector(CPU_DATA_BITS/8-1 downto 0) := (others => '0');
    cpu_rdy   : out std_logic;
    cpu_rstb  : out std_logic;
    cpu_rdata : out std_logic_vector(CPU_DATA_BITS-1 downto 0);

    -- Memory side
    mem_req   : out std_logic;
    mem_write : out std_logic;
    mem_addr  : out unsigned(MEM_ADDR_BITS-1 downto 0);
    mem_wdata : out std_logic_vector(MEM_DATA_BITS-1 downto 0);
    mem_wmask : out std_logic_vector(MEM_DATA_BITS/8-1 downto 0);
    mem_rdy   : in  std_logic;
    mem_rstb  : in  std_logic;
    mem_rdata : in  std_logic_vector(MEM_DATA_BITS-1 downto 0)
    );
end entity;

See also

PoC.cache.cpu