In the first part of this series, Measuring an RMS value on a PSoC5, Part 1: Signal Acquisition, I described the way to rectify the incoming signal and how to connect up the ADC (analog to digital converter). The next step in generating the RMS value is to square the readings and accumulate them. We need to descend an abstraction level to understand how to do that.
One of the reasons that I was interested in this project was that I wanted to delve deeper into the workings of the PSoC. Cypress has allowed users to create their own components using Verilog or a technique called “datapaths” and so I girded myself to create this project using those techniques. As I poked around I came across some components created by the PSoC community (data sheet) and in there I found an “Integer Square Root Calculator” as well as “Multiply and Accumulate using the DFB”.
All my plans for learning Verilog went out the window. I went through the datasheet of the latter and decided that it wouldn’t quite meet my needs. However it held promise, for not only would it square the input, it would accumulate the squares as well which is exactly what I needed. The data sheet for the Multiply and Accumulate component spoke about “Q23” formatted numbers (which didn’t mean much to me at the time) and seemed to preclude the general number multiplication that I wanted. It also posed a problem as to how to write the same number twice to get the square. But I thought I could tweak the design to my own needs. “Tweak” was not quite the word.
According to the DFB (Digital Filter Block) datasheet, the “Digital Filter Block is a 24-bit fixed-point, programmable limited-scope DSP with a 24*24 Multiply and Accumulate Unit (MAC), a multifunction Arithmetic Logic Unit (ALU), and data routing, shifting, holding, and rounding functions.” That’s a mouthful, but it sounds simple enough. I was amazed at quite how convoluted the design is. I googled for DFBs in the FPGA realm, but nothing came anywhere near as complex as this.
The first movie I ever saw at a drive-in theatre was “The Five Pennies” with Danny Kaye and Louis Armstrong. One of the songs that stuck with me is called “The Music Goes Round and Round”. I suggest you listen to it before continuing, to get a flavor of what I was feeling. Each time I read the data sheet all I could think was “the data goes round and round and it comes out there”, and then I would point at the hold registers at the output. Figure 2-1 gives you a partial view of the structure.
I could write (and originally did before I edited it down) several blogs just on the issues and problems of the DFB datasheet. Let me just quote Cypress FAE Frank Kees (DFB assembler and (Significantly) Improved Simulator component):
“We were doing some development with the DFB and came to realize what a frustrating experience it was to learn and program the DFB. The documentation needs a significant overhaul, but in the meantime, a much better simulator was an absolute must …”
As far as I can tell the DFB documentation has not yet been upgraded. But since I managed to get it working in the end, let me move on.
The first issue was that on PSoC Creator version 3.3, portions of the simulator window for the DFB module are somehow visually stretched to the point of being unusable. I am told that version 3.2 does not have the problem, but I developed on Creator version 3.1 SP3, which definitely worked. Components get modified as Creator evolves, so you may be prompted to replace the components. As long as you don’t want to go back to an earlier revision of Creator, it is fine to continue with the replacement.
Once you start working with the code for the DFB let me give you an important hint: always and frequently click on the “Apply” or “OK” buttons at the bottom of the window. Using the “X” at the upper right to close the window may lose you a lot of work (and yes, I speak from bitter experience) since there is no confirmation required for this action, even if something has changed.
This is a screenshot from the DFB data sheet. To simplify my explanation I edited out the parts that don’t impinge on what I describe. I hope Cypress won’t mind the copyright violation.
The DFB works with a Very Long Instruction Word (VLIW) format. Each word comprises instructions for the four elements of the DFB: the multiplexers, the ACU (Address Calculation Unit), the ALU (Arithmetic Logic Unit) and the MAC (Multiplier/Accumulator), with a few modifiers (my word) that include jumps, shifts, writes and input register selection. The ALU has pipelining to add to the complexity. It’s just like assembly programming but much more primitive. For instance there is no way that I can see to implement a loop counter, and it has some arcane rules, like a label must be preceded by a jump instruction, no doubt something to do with the state machine that encapsulates the code.
The whole DFB works on 24 bits of data (including a sign bit), but when you multiply two numbers together you get 48 bits of data including the sign bit. However, it is only possible to access the upper 24 bits of that result. (The data is formatted as “Q23” at the input and “Q1.22” at the output (see here for some information on the Q number format), but we can sort of ignore that.) At the suggestion of Cypress tech support, I discovered that I could multiply two 12 bit numbers and still get an accessible result if I scaled up my 12 bit input by a factor so that, when multiplied, the results would land in the upper 24 bits. Obviously the results would have to be adjusted to compensate for that factor.
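To make the "upper 24 bits" idea concrete, here is a small Python model. It is my own sketch, not DFB code: it treats the multiplier as a plain integer multiply whose result can only be read as the product shifted right by 24, and it ignores the Q-format details. The function names are made up for illustration.

```python
def upper24(a: int, b: int) -> int:
    """Upper 24 bits of the 48-bit product of two signed 24-bit integers."""
    return (a * b) >> 24  # Python's >> is an arithmetic shift, so the sign survives


def fits_signed24(value: int) -> bool:
    """True if value can be held in a 24-bit two's complement word."""
    return -(1 << 23) <= value <= (1 << 23) - 1


raw = 1000
print(upper24(raw, raw))        # 0: the whole square sits in the unreadable lower half

scaled = raw << 12              # scale the input up before multiplying
print(upper24(scaled, scaled))  # 1000000: the square now lands in the readable half

# A full 12 bit reading scaled by 2**12 no longer fits next to the sign bit,
# which is why the input width has to be traded off against the scale factor.
print(fits_signed24(4095 << 12))  # False
```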
In order for a squared number to show up in the least significant bit of the upper 24 bits (bit 2^24), the least significant bit of the number would have to be 2^12, i.e. the square root of 2^24. That means I would have to shift the incoming data 12 times so that the least significant bit of my input would appear at the 12th bit location to make it into the calculation result. Squaring a 12 bit number would result in a 24 bit number, and so it was apparent that I would have to be satisfied with less than a 12 bit number. But it gets a bit more complex because, aside from the sign bit, there is also the accumulation of the squares, and that will also increase the number of bits in the answer. I came to a compromise: use a 9 bit value as an input and 16 samples, which translates into a (9 x 2) + 4 = 22 bit result.
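The bit budget is easy to check with a few lines of Python. This is just the arithmetic from the paragraph above written out; the function name is mine.

```python
def sum_of_squares_bits(input_bits: int, samples: int) -> int:
    """Bits needed for the largest possible sum of `samples` squared inputs."""
    largest_input = (1 << input_bits) - 1
    return (samples * largest_input ** 2).bit_length()


print(sum_of_squares_bits(9, 16))   # 22: fits below the sign bit of a 24 bit word
print(sum_of_squares_bits(12, 16))  # 28: too wide for the 24 bits we can read
```

Squaring doubles the width (9 x 2 = 18) and adding 16 values costs another 4 bits, which is where the 22 comes from.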
Based on this analysis I had to shift the incoming 12 bit number 9 times so that d3 of the incoming data ends up at d12 of the data that will be squared. The least significant three bits would be truncated, well, almost. The result will actually be a little better than 18 bits, because the lesser bits still form part of the multiplication and the carries will pass on up.
The listing is taken from the code tab that appears when you double click on the DFB icon in the schematic in Creator.
The code is shown in Figure 2, right side. Note that there are several “NOP”s and dummy steps to allow for the pipelining of the DFB. One of the limitations of the shift function of the ALU is that it only allows left shifts of one or two positions, so getting to nine needed some thought. I started off by clearing the accumulator in the MAC (lines 4 and 5). I wanted a completely clear MAC, and since the MAC(clra) instruction actually causes a multiplication, I used this alternate approach.
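Here is a rough Python comparison of the two cases: squaring the 9-times-shifted input and reading the upper 24 bits, versus simply throwing away the bottom three bits first. It is a simplified integer model of my own (it ignores the Q-format offset), and the function names are hypothetical.

```python
SHIFT = 9  # number of left shifts applied to the 12 bit ADC reading


def square_via_upper24(reading: int) -> int:
    """Square the pre-shifted reading and keep only the upper 24 of 48 bits."""
    scaled = reading << SHIFT
    return (scaled * scaled) >> 24


def square_of_truncated(reading: int) -> int:
    """What you would get if the bottom three bits were dropped before squaring."""
    return (reading >> 3) ** 2


for reading in (15, 4095):
    print(reading, square_via_upper24(reading), square_of_truncated(reading))
# 15 3 1
# 4095 262016 261121
```

In this model the first column of results is just the true square divided by 64, so the low three bits of the reading still nudge the answer upwards.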
Once the data is received (line 25) from a DMA transfer in, the input data is passed to the ALU output. I then shift the ALU output twice and store it to RAM-A (line 31); it could just as easily have been RAM-B. That RAM data is then passed to the ALU output (line 34). The ALU output is again shifted twice and then written back to RAM-A (line 40). This sequence is repeated twice more, and then once more with the data being shifted once instead of twice, for a total of 9 shifts. The 9 times shifted result is also stored into RAM-B. To me it would be logical that once you had shifted data it would remain in the shifted state, but this is not the case, so there has to be a shift (line 50) to write to RAM-B as well as the original write to RAM-A (line 48).
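The round trip through RAM-A can be pictured with this little Python sketch. It only models the arithmetic (each pass shifts the 24 bit word left by one or two places and stores it again); it says nothing about the pipelining or the actual DFB instructions, and the function names are my own.

```python
WORD_MASK = 0xFFFFFF  # the DFB data path is 24 bits wide


def plan_shifts(total: int) -> list:
    """Split a left shift of `total` places into passes of 2, plus a final 1 if odd."""
    return [2] * (total // 2) + [1] * (total % 2)


def shift_in_passes(value: int, passes: list) -> int:
    """Apply each pass in turn, as if writing back to RAM between passes."""
    for places in passes:
        if places not in (1, 2):
            raise ValueError("the ALU shifter only does one or two places per pass")
        value = (value << places) & WORD_MASK
    return value


passes = plan_shifts(9)
print(passes)                      # [2, 2, 2, 2, 1]
print(shift_in_passes(5, passes))  # 2560, the same as 5 << 9
```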
See what I mean-
The data goes round and round o-o-o-o …
The contents in RAM-A and RAM-B (the same number) are passed to the MAC and multiplied to get the square (line 56). The actual result is shifted left by one (appearing to be multiplied by 2), apparently as a result of the number formatting.
And it comes out squared!
Each subsequent input number is squared, added to the number in the accumulator and retained there. After 16 such calculations I want to output the accumulated total and clear the MAC. I managed to do that using an external counter (I will get to that), but trying to figure out how to conditionally jump in the DFB code based on the inputs is yet another exercise in frustration: the inputs to the module are variously called “interrupts”, “global inputs” or “in_1”, and there is not really an explicit conditional test. The DFB will not execute a jump if the input is not set, as you can see in line 64. This only works under the right conditions: a certain flag must be set to enable the feature (see the alu(englobals,001) in line 58).
When the input signal is low (the sense is inverted externally) the jump in line 64 is not executed and the accumulated number in the MAC is fed to the output of the ALU (line 68), then shifted right to correct the shift described above and written to the HOLD A register for use by the external world. Then the process repeats starting with the MAC initialization (line 4). Just an afterthought: you also have to remember to turn off the jump feature, or it will affect the next jump you execute (line 69).
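Pulling it together, this is roughly what the whole loop computes, written as a Python model rather than DFB code. Several things here are my assumptions for the sketch: the accumulator keeps the full-width sum and only the upper 24 bits are read at the end, the external counter is modelled as a simple sample count, and the one-bit offset of the product and its correcting right shift are left out because they roughly cancel. The names are hypothetical.

```python
SHIFT = 9               # left shifts applied to each 12 bit reading
SAMPLES_PER_BLOCK = 16  # the external counter's period


def sum_of_squares_blocks(readings):
    """Return one accumulated sum of squares per complete block of 16 readings."""
    outputs = []
    accumulator = 0  # assumed full-width MAC accumulator
    count = 0        # stands in for the external counter
    for reading in readings:
        scaled = reading << SHIFT
        accumulator += scaled * scaled       # multiply and accumulate
        count += 1
        if count == SAMPLES_PER_BLOCK:       # counter fires: fall through the jump
            outputs.append(accumulator >> 24)  # only the upper 24 bits are readable
            accumulator = 0                  # clear the MAC for the next block
            count = 0
    return outputs


print(sum_of_squares_blocks([64] * 32))    # [1024, 1024]
print(sum_of_squares_blocks([4095] * 16))  # [4192256]
```

Even with every reading at full scale the total stays at 22 bits, so it fits in the HOLD A register with room for the sign bit.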
That’s a long enough blog for now. Next time I will add the square root function and configure the two functions to operate together. You can find my PSoC Creator development here. Please be aware that you will need to download the Square Root component before you will be able to generate and compile the project without an error. I will describe how to do that in my next blog on the square root generation.