CMPE 401 - Computer Interfacing

Assignment #4 Model Solutions

Due: In the CMPE 401 assignment box at 15:45 on Monday, Nov. 21, 2005


  1. One high-speed scanner and two laser printers are to be connected to a single networked computer. The scanner has a peak data rate of 350 KB/s, while each laser printer has a peak data rate of 210 KB/s. (Recall that 1 KB = 1024 bytes.) Assume that the scanner produces data blocks containing 2048 bytes each. There is an overhead of 130 clock cycles to handle each data block using direct memory access (DMA). Once a DMA transfer begins, the cost of each two-byte word transfer is two clock cycles. Each printer receives data from the CPU using word-sized MOVE instructions, and each MOVE instruction requires eight CPU clock cycles to transfer two bytes of data. At the start of each print job there is an overhead of 300 CPU clock cycles, plus a pause time of 10 milliseconds. Print jobs typically take multiple seconds to complete. What would be the slowest allowable CPU operating frequency (rounded up to the nearest one third of a megahertz) to ensure that the peak work load caused by the scanner and the two printers is less than 10%?

    [15 marks]
    Scanner:
    peak data rate = 350 KB/s
    peak block rate = 350 KB/s / 2 KB/block = 175 blocks/s
    block overhead load = 175 blocks/s x 130 cycles/block = 22750 cycles/s
    data load = 350 KB/s * 2 cycles / 2 bytes = 350 x 1024 x 1 = 358400 cycles/s
    total load due to scanner = 22750 + 358400 = 381150 cycles/s

    One laser printer: peak data rate = 210 KB/s
    peak data load = 210 KB/s * 8 cycles/2 bytes = 210 x 1024 x 4 = 860160 cycles/s

    peak load during start of print job = 300 cycles / 10 ms = 30000 cycles/s
    Conclusion: the load during the start of a print job is insignificant
    compared to the peak data load for the printer.

    Min. CPU freq. = 10 * (381150 + 860160 + 860160) = 21.0147 MHz
    Rounded up min. CPU freq. = 21.3333 MHz
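
    The arithmetic above can be checked with a short script. This is only a sketch of the same calculation; the variable names are illustrative and, as in the solution, the print-job start-up overhead is neglected.

```python
import math

KB = 1024  # bytes, as stated in the question

# Scanner (DMA): 2048-byte blocks, 130 cycles of overhead per block,
# and 2 cycles for each 2-byte word transferred.
scanner_rate = 350 * KB                    # bytes/s
blocks_per_s = scanner_rate / 2048         # 175 blocks/s
scanner_load = blocks_per_s * 130 + (scanner_rate / 2) * 2   # cycles/s

# One printer (MOVE instructions): 8 cycles for each 2-byte word.
printer_load = (210 * KB / 2) * 8          # cycles/s

# One scanner plus two printers must use less than 10% of the CPU.
total_load = scanner_load + 2 * printer_load
min_freq_mhz = total_load * 10 / 1e6

# Round up to the next multiple of one third of a megahertz.
rounded_mhz = math.ceil(min_freq_mhz * 3) / 3

print(scanner_load, printer_load, min_freq_mhz, rounded_mhz)
```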

  2. The M68000-series microprocessors from Motorola Semiconductor (now called Freescale Semiconductor) provide both user interrupts and autovectored interrupts. How could an "intelligent" peripheral interface chip, one with many different possible sources of interrupts, exploit the user interrupt mechanism to ensure faster interrupt handling?

    [10 marks]
    In user interrupts, the interrupting device provides an "exception vector number" to the CPU during the "Interrupt Acknowledge Cycle" (IACK). The CPU uses this number to select one Exception Vector in the system's Exception Vector Table. This exception vector is the first address of the corresponding exception handling routine (also called the "Interrupt Service Routine", or ISR). Thus an intelligent interrupting device has the opportunity to direct the CPU to use different ISRs for different interrupting conditions simply by supplying different exception vector numbers during the IACK cycle. This technique could be used to slightly speed up the processing of multiple interrupts since the selection of the appropriate software routine would be done automatically in the hardware of the interrupting device, not by a single ISR running on the CPU. An ISR that handles multiple sources of interrupts must first determine the identity and handling order of the active interrupts, and this takes time.
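
    The difference between the two approaches can be modelled in a few lines. This is a rough software sketch, not 68000 code: the vector numbers (64 to 66), the status bit masks, and the handler names are hypothetical. The only hardware fact used is that each vector occupies four bytes in the Exception Vector Table.

```python
# Hypothetical user interrupt vectors supplied by the device during IACK.
HANDLERS = {64: "rx_ready_isr", 65: "tx_empty_isr", 66: "timer_isr"}

def vector_table_address(vector_number, table_base=0):
    """Address of the table entry: each vector is a 4-byte address."""
    return table_base + 4 * vector_number

def vectored_dispatch(vector_number):
    """User interrupt: the device has already identified the source."""
    return HANDLERS[vector_number]

def polled_dispatch(status):
    """Single shared ISR: test the (hypothetical) status bits in turn."""
    checks = [(0x01, "rx_ready_isr"), (0x02, "tx_empty_isr"), (0x04, "timer_isr")]
    for mask, name in checks:
        if status & mask:
            return name
    return None

print(hex(vector_table_address(64)), vectored_dispatch(66), polled_dispatch(0x04))
```

    Both paths reach the same routine, but the polled version spends instructions examining status bits before any useful work begins.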

  3. Briefly explain why direct memory access (DMA) is a fast method for transferring large blocks of data. What are the alternatives to using DMA? Why does DMA become less attractive as the number of transferred bytes gets smaller?

    [10 marks]
    DMA is a fast method for transferring large blocks of data because it replaces multiple separate MOVE instructions (each of which would need to be separately fetched, decoded, and executed) with a single block move operation that is performed by a DMA Controller (DMAC). The DMA move operation is faster because the overhead of many MOVE instructions is avoided.

    The alternative to using DMA is to use multiple MOVE instructions. To speed up the data movement process it would be preferable to use long-word-sized MOVEs instead of byte-sized MOVEs, because the instruction overhead is similar in both cases but a long-word MOVE transfers four times as many bytes.

    As the number of bytes to be moved decreases, the time overhead of setting up the DMA starts to become a factor. (Recall that to perform a DMA transfer, the CPU must first initialize registers in the DMAC. Also, at the end of the DMA transfer, the CPU is typically interrupted, and the resulting interrupt service routine requires time to execute.) Therefore there is usually a minimum block size which should be handled by DMA; data transfers of smaller than that minimum size are best handled using MOVE instructions.
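
    The minimum worthwhile block size can be estimated with a simple cost model. The sketch below borrows the cycle counts from Question 1 (130 cycles of DMA overhead per block, 2 cycles per word by DMA, 8 cycles per word by MOVE) purely as an illustration; real set-up and interrupt costs depend on the system.

```python
import math

def dma_cycles(words, overhead=130, per_word=2):
    """Total cycles to move a block by DMA, including fixed overhead."""
    return overhead + per_word * words

def move_cycles(words, per_word=8):
    """Total cycles to move the same block with MOVE instructions."""
    return per_word * words

def break_even_words(overhead=130, dma_per_word=2, move_per_word=8):
    """Smallest word count for which DMA is strictly cheaper than MOVEs."""
    return math.floor(overhead / (move_per_word - dma_per_word)) + 1

n = break_even_words()
print(n, dma_cycles(n), move_cycles(n))
```

    With these assumed numbers, blocks of fewer than 22 words are moved more cheaply with MOVE instructions.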

  4. Double buffers and first-in first-out (FIFO) queues are two different ways of decoupling a producer of data from a consumer of data. What are the principal differences between the two methods? In which situations should one choose one method over the other? (Be sure to consider scenarios that are favourable to double buffers and to FIFO queues.)

    [15 marks]
    The principal difference between the two methods is that a double buffer gives both sides (the data producer and the data consumer) random access to two separate equal-sized regions of memory. In a FIFO, the data producer must store the next datum in one automatically determined next write location in memory; similarly, the data consumer must read the next datum from one automatically determined next read location in memory. Thus a double buffer provides more address order flexibility in the writing and reading processes. However, both a double buffer and a FIFO provide useful data rate decoupling between a data producer and a data consumer.

    The FIFO is a good choice for applications where the data is produced and consumed in exactly the same order. The FIFO will then enforce the data handling order in hardware, which might be a safe constraint to have imposed. A double buffer is a good choice if the data order on the producer side can differ from the data order on the consumer side. This increased flexibility in the data order comes at a price, however: both buffers in the double buffer must have the logic that is required to decode arbitrary addresses. In a FIFO, an address decoder can be entirely omitted by using bit "pointers" that rotate through two different registers.
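
    The contrast in access patterns can be illustrated with a small software model. This is only a sketch with hypothetical class and method names; a hardware FIFO or double buffer would implement the same behaviour in logic rather than in code.

```python
class Fifo:
    """Fixed-size FIFO: the queue, not the caller, picks each location."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0    # next read position
        self.tail = 0    # next write position
        self.count = 0

    def put(self, item):
        if self.count == len(self.slots):
            raise OverflowError("FIFO full")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        item = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item


class DoubleBuffer:
    """Two equal-sized buffers; each side has random access to its own."""

    def __init__(self, size):
        self.buffers = [[None] * size, [None] * size]
        self.write_index = 0   # buffer currently owned by the producer

    def write(self, offset, item):
        self.buffers[self.write_index][offset] = item

    def read(self, offset):
        return self.buffers[1 - self.write_index][offset]

    def swap(self):
        """Hand the filled buffer to the consumer and take the other one."""
        self.write_index = 1 - self.write_index


fifo = Fifo(4)
for value in (1, 2, 3):
    fifo.put(value)
fifo_out = [fifo.get() for _ in range(3)]    # same order as written

db = DoubleBuffer(4)
db.write(2, "c")                             # producer writes in any order
db.write(0, "a")
db.swap()
db_out = [db.read(0), db.read(2)]            # consumer reads in any order
print(fifo_out, db_out)
```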

  5. Consult the on-line documentation for the TPU (available by following the link from the course homepage) to determine how the "Period Measurement with Additional/Missing Transition Detect" built-in function works (function code $B). Briefly explain the capabilities of this function in your own words.

    [20 marks]
    This function is intended to be used in applications where there is a regular sequence of signal transitions (produced by pulses) that is perturbed at fixed intervals by either the presence of an extra transition, or the disappearance of an expected transition. These two situations could arise with a rotating geared wheel which either (a) has an extra tooth at one angular position, or (b) is missing a tooth at one angular position. The choice between extra transition detect, or missing transition detect, is made by loading the correct bit pattern into the Host Sequence Register. The function is convenient for measuring the rotational period of a wheel.

    In both modes, the function measures the time interval (in TCR1 clock periods) between successive occurrences of the extra pulse (or the missing pulse). The number of such occurrences is recorded by incrementing a 23-bit clock (TCR2). The TCR2 will be "cleared" to $FFFF if either (1) the TCR2 reaches a software-programmed limit, or (2) a byte flag parameter is nonzero and the next event occurs.

    An extra transition is detected if a transition occurs within some programmed fraction of the previous period between transitions. A missing transition is detected if the next transition is delayed by more than some programmed fraction of the previous period between transitions.
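
    The two detection rules amount to comparing each new period against a fraction of the previous one. The sketch below is a simplified software model of that comparison, not the TPU microcode: the ratio values are hypothetical, and the real function operates in only one of the two modes at a time.

```python
def classify(prev_period, period, extra_ratio=0.5, missing_ratio=1.5):
    """Classify a new period relative to the previous one.

    extra_ratio and missing_ratio stand in for the programmed fractions;
    the values used here are assumptions chosen for illustration.
    """
    if period < extra_ratio * prev_period:
        return "extra"      # transition arrived much too early
    if period > missing_ratio * prev_period:
        return "missing"    # transition arrived much too late
    return "normal"

print(classify(100, 40), classify(100, 210), classify(100, 105))
```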

  6. In the lectures we saw that the registers of the DUART can, depending on the implemented memory map, be located either on (a) subsequent even addresses, (b) subsequent odd addresses, or (c) packed adjacent addresses. On the other hand, the registers of the TPU are only located at even addresses. Briefly explain why the three possibilities arise for the DUART register offsets, and also explain why the TPU register offsets are even.

    [15 marks]
    The DUART registers are eight bits wide, and the most efficient memory mapping would arrange the registers into a packed block of 8-bit registers starting at the DUART's base address. This can be easily done with 68000-family processors, such as the MC68332, which have a least significant address line A0. The simplest connections would be A0 to RS1, A1 to RS2, A2 to RS3, and A3 to RS4.

    However, the original 68000 did not have an address line A0. Instead it had address lines A1,...,A23 and two data strobes, Upper Data Strobe (UDS) and Lower Data Strobe (LDS). For microprocessors that have UDS and LDS instead of A0, it is most natural to connect A1 to RS1, A2 to RS2, A3 to RS3, and A4 to RS4 when interfacing to an 8-bit peripheral chip like the DUART. The eight data pins of the DUART would then be connected either to databus lines D0 to D7 (for odd word alignment) or to databus lines D8 to D15 (for even word alignment). With odd- or even-word alignment, the 8-bit peripheral chip registers will only be present on alternating addresses, not on adjacent addresses.

    For peripherals (such as the TPU) that have 16-bit registers, the most efficient memory mapping is to arrange the registers into a packed block of word-sized locations beginning at the peripheral's base address. The registers must be located at even-numbered addresses to conform with the even-alignment convention adopted by all 68000-family microprocessors.
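
    The three DUART layouts differ only in how a register index maps to a byte address. The sketch below assumes 16 register locations (four register-select lines) and a hypothetical, even base address of $FFF000; neither value comes from a specific memory map.

```python
def duart_register_addresses(base, layout, count=16):
    """Byte addresses of the DUART registers for a given layout.

    base is assumed to be even; count=16 follows from four select lines.
    """
    if layout == "packed":   # select lines driven from A0..A3
        return [base + i for i in range(count)]
    if layout == "even":     # A1..A4, data on D8 to D15
        return [base + 2 * i for i in range(count)]
    if layout == "odd":      # A1..A4, data on D0 to D7
        return [base + 2 * i + 1 for i in range(count)]
    raise ValueError("layout must be 'packed', 'even', or 'odd'")

BASE = 0xFFF000  # hypothetical base address
for layout in ("packed", "even", "odd"):
    addrs = duart_register_addresses(BASE, layout)
    print(layout, [hex(a) for a in addrs[:4]])
```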

  7. Assume that the TPU is being used with a TCR1 clock frequency of 16.78 MHz, divided down by a factor of 64. Assume further that there are to be 12 different stepping rates. Calculate the corresponding values of the STEP_CNTL0 and STEP_CNTL1 parameters if the slowest stepping rate is to be 16 steps/second and the maximum stepping rate is to be 240 steps/second.

    [15 marks]
    From the course notes (or the TPU documentation) we know that:
    Minimum stepping rate = TCR1 / ( STEP_CNTL1 - 2*STEP_CNTL0) = 16
    Maximum stepping rate = TCR1 / ( STEP_CNTL1 - (12+1)*STEP_CNTL0) = 240

    Through algebraic manipulation we obtain:
    STEP_CNTL1 = STEP_CNTL0 * ((240*13)-(2*16)) / (240-16) = STEP_CNTL0 * 13.7857

    We are given that TCR1 = 16.78 MHz / 64 = 2^24 / 2^6 = 2^18 = 262144 Hz

    But, TCR1 = 16 * (STEP_CNTL1 - 2*STEP_CNTL0)
    So, 262144 = 16 * STEP_CNTL0 * 11.7857
    And thus, STEP_CNTL0 = 1390.16, or 1390 when rounded
    STEP_CNTL1 = STEP_CNTL0 * 13.7857 = 19164.32 (using the unrounded STEP_CNTL0), or 19164 when rounded
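
    The same derivation can be reproduced numerically, which also shows the stepping rates that result from the rounded parameters. This is a sketch that takes the two rate formulas above as given; the variable names are illustrative.

```python
tcr1_hz = 2**24 / 64            # 16.78 MHz / 64 = 262144 Hz
n_rates = 12
min_rate, max_rate = 16, 240    # steps/second

# From min_rate*(S1 - 2*S0) = max_rate*(S1 - (n_rates + 1)*S0):
ratio = (max_rate * (n_rates + 1) - 2 * min_rate) / (max_rate - min_rate)

# From tcr1_hz = min_rate * (S1 - 2*S0) with S1 = ratio * S0:
step_cntl0 = tcr1_hz / (min_rate * (ratio - 2))
step_cntl1 = step_cntl0 * ratio

s0, s1 = round(step_cntl0), round(step_cntl1)

# Stepping rates actually obtained with the rounded integer parameters.
achieved_min = tcr1_hz / (s1 - 2 * s0)
achieved_max = tcr1_hz / (s1 - (n_rates + 1) * s0)

print(s0, s1, achieved_min, round(achieved_max, 2))
```

    With the rounded values the slowest rate is exactly 16 steps/second, while the fastest rate comes out slightly below the 240 steps/second target.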