fpu.rar_VHDL FPU_fpu_fpu verilog_fpu.rar_double precision

12 files in total

vhd: 10

txt: 1

pdf: 1

80 views, uploaded 2022-09-22 20:02:15, 119 KB, RAR

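The title, "fpu.rar_VHDL FPU_fpu_fpu verilog_fpu.rar_double precision", indicates that this archive contains design material for an FPU (floating-point unit) with an emphasis on double-precision arithmetic. The title tags mention both VHDL and Verilog, but the archive itself contains only VHDL sources (10 .vhd files), plus a Readme.txt and a PDF. The name "fpu_d" in the description most likely refers to this double-precision FPU, which is written in VHDL and intended for use in real designs.

A floating-point unit is the hardware block in a computer system that performs floating-point operations such as addition, subtraction, multiplication and division, and in some designs more complex mathematical functions. Double precision matters in fields such as digital signal processing, scientific computing and graphics rendering because it offers higher precision, at the cost of speed and power.

VHDL (VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit) is a hardware description language used to design and verify digital electronic systems. A developer can describe the logic of an FPU in VHDL (arithmetic datapaths, registers, control logic), simulate and synthesize it, and implement the result on an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). Verilog is another widely used hardware description language with a similar purpose and a different syntax. The same design could be described in Verilog, but no Verilog sources are included here.

The "fpu" folder holds the complete design, split into submodules that each handle one operation or function: fpu_add.vhd, fpu_sub.vhd, fpu_mul.vhd and fpu_div.vhd implement the four operations, fpu_round.vhd implements rounding, fpu_exceptions.vhd handles exceptions, and fpu_double.vhd is the top level.

Studying these sources shows how floating-point arithmetic works at the hardware level and how it can be expressed in a hardware description language. That makes the archive useful both to digital designers and to students of computer architecture and digital logic. The code shows how floating-point numbers are encoded (IEEE 754), which steps each operation goes through, and how exceptional cases such as overflow, underflow and division by zero are handled. It also illustrates hardware optimizations such as pipelining and parallel computation.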

fpu.rar (12 files)

fpu

fpu_exceptions.vhd 15KB

fpu_round.vhd 6KB

comppack.vhd 5KB

fpu_double.PDF 100KB

fpu_div.vhd 15KB

Readme.txt 6KB

fpu_double.vhd 11KB

fpu_add.vhd 7KB

fpupack.vhd 3KB

fpu_double_TB.vhd 22KB

fpu_mul.vhd 10KB

fpu_sub.vhd 8KB

The following describes the IEEE-Standard-754 compliant, double-precision floating point unit, written in VHDL. The module consists of the following files:

1. fpu_double.vhd (top level)
2. fpu_add.vhd
3. fpu_sub.vhd
4. fpu_mul.vhd
5. fpu_div.vhd
6. fpu_round.vhd
7. fpu_exceptions.vhd
8. fpupack.vhd
9. comppack.vhd

A testbench file is also included, containing 50 test-case operations:

1. fpu_double_TB.vhd

This unit has been extensively simulated, covering all 4 operations, rounding modes, exceptions like underflow and overflow, and even the obscure corner cases, like when overflowing from denormalized to normalized, and vice-versa.

The floating point unit supports denormalized numbers, 4 operations (add, subtract, multiply, divide), and 4 rounding modes (nearest, zero, + inf, - inf). The unit was synthesized with an estimated frequency of 185 MHz, for a Virtex5 target device. The synthesis results are below.

fpu_double.vhd is the top-level module, and it contains the input and output signals of the unit. The input and output signals are the following:

1. clk (global)
2. rst (global)
3. enable (set high, then low, to start operation)
4. rmode (rounding mode, 2 bits, 00 = nearest, 01 = zero, 10 = pos inf, 11 = neg inf)
5. fpu_op (operation code, 3 bits, 000 = add, 001 = subtract, 010 = multiply, 011 = divide, others are not used)
6. opa, opb (input operands, 64 bits, Big-endian order, bit 63 = sign, bits 62-52 exponent, bits 51-0 mantissa)
7. out_fp (output from operation, 64 bits, Big-endian order, same ordering as inputs)
8. ready (goes high when output is available)
9. underflow
10. overflow
11. inexact
12. exception - see IEEE 754 definition
13. invalid - see IEEE 754 definition

The unit was designed to be synchronous with one global clock, and all of the registers can be reset with a synchronous global reset.
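The operand layout and the rmode and fpu_op encodings listed above can be checked in software before driving the testbench. The following is a minimal Python sketch, not part of the original archive: the helper names (double_to_bits, split_fields, classify) are made up for illustration, and only the bit positions and code values are taken from the signal list.

```python
import struct

# Encodings copied from the signal list above (rmode: 2 bits, fpu_op: 3 bits).
RMODE = {0b00: "nearest", 0b01: "zero", 0b10: "pos inf", 0b11: "neg inf"}
FPU_OP = {0b000: "add", 0b001: "subtract", 0b010: "multiply", 0b011: "divide"}


def double_to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an integer (bit 63 = sign)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def split_fields(bits: int):
    """Split a 64-bit operand into (sign, exponent, mantissa) fields."""
    sign = (bits >> 63) & 0x1            # bit 63
    exponent = (bits >> 52) & 0x7FF      # bits 62-52
    mantissa = bits & ((1 << 52) - 1)    # bits 51-0
    return sign, exponent, mantissa


def classify(bits: int) -> str:
    """Name the IEEE 754 class of a 64-bit pattern."""
    _, exponent, mantissa = split_fields(bits)
    if exponent == 0x7FF:
        return "nan" if mantissa else "infinity"
    if exponent == 0:
        return "denormalized" if mantissa else "zero"
    return "normalized"


if __name__ == "__main__":
    for value in (1.0, -2.5, 5e-324, float("inf")):
        bits = double_to_bits(value)
        print(f"{value!r}: 0x{bits:016X} {split_fields(bits)} {classify(bits)}")
```

A pattern produced this way can be pasted into a testbench as the opa or opb stimulus. The classify helper is handy for picking denormalized or NaN inputs on purpose.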
When the input signals (a and b operands, fpu operation code, rounding mode code) are available, set the enable input high, then set it low after 2 clock cycles. When the operation is complete and the output is available, the ready signal will go high. To start the next operation, set the enable input high.

Each operation takes the following number of clock cycles to complete:

1. addition: 20 clock cycles
2. subtraction: 21 clock cycles
3. multiplication: 24 clock cycles
4. division: 71 clock cycles

This is longer than other floating point units, but supporting denormalized numbers requires more signals and logic levels to accommodate gradual underflow. The supported clock speed of 185 MHz makes up for the large number of clock cycles required for each operation to complete. If you have a lower clock speed, the code can be changed to reduce the number of registers and latency of each operation. I purposely increased the number of register (pipeline) stages to get the code to synthesize to a faster clock frequency, but of course, this led to longer latency. I guess it depends on your application what is more important.

The following output signals are also available: underflow, overflow, inexact, exception, and invalid. They are compliant with the IEEE-754 definition of each signal. The unit will handle QNaN and SNaN inputs per the standard.

I'm planning on adding more operations, like square root, sin, cos, tan, etc., so check back for updates.

Multiply:

The multiply module is written specifically for a Virtex5 target device. The DSP48E slices can perform a 25-bit by 18-bit Twos-complement multiply (24 by 17 unsigned multiply). I broke up the multiply to fit these DSP48E slices. The breakdown is similar to the design in Figure 4-15 of the Xilinx User Guide Document, "Virtex-5 FPGA XtremeDSP Design Considerations", also known as UG193. You can find this document at xilinx.com by searching for "UG193".
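The idea of breaking a wide multiply into pieces that fit a 24 by 17 unsigned multiplier can be illustrated in software. The Python sketch below is an assumption-laden illustration, not the partition used in fpu_mul.vhd: it tiles both 53-bit operands uniformly, which gives 12 partial products rather than the arrangement from UG193, and the function names are hypothetical.

```python
A_CHUNK = 24  # unsigned width of the wide multiplier input, per the README
B_CHUNK = 17  # unsigned width of the narrow multiplier input, per the README


def split_chunks(value: int, width: int, total_bits: int):
    """Split value into little-endian chunks of `width` bits covering total_bits."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, width)]


def chunked_multiply(a: int, b: int, total_bits: int = 53) -> int:
    """Multiply two unsigned operands using only 24x17-bit partial products."""
    a_parts = split_chunks(a, A_CHUNK, total_bits)
    b_parts = split_chunks(b, B_CHUNK, total_bits)
    result = 0
    for i, a_part in enumerate(a_parts):
        for j, b_part in enumerate(b_parts):
            # Each partial product is weighted by the position of its chunks.
            result += (a_part * b_part) << (i * A_CHUNK + j * B_CHUNK)
    return result


if __name__ == "__main__":
    a = (1 << 52) | 0x000F_FFFF_FFFF_FFFF  # implicit 1 plus a 52-bit fraction
    b = (1 << 52) | 12345
    print(chunked_multiply(a, b) == a * b)
```

On a device with different multiplier widths, only the two chunk constants would change in a sketch like this. In hardware, the narrow leftover chunks are where a hand-tuned partition can save multipliers.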
Depending on your device, the multiply can be changed to match the bit-widths of the available multipliers. A total of 9 DSP48E slices are used to do the 53-bit by 53-bit multiply of 2 floating point numbers.

If you have any questions, please email me at: davidklun@gmail.com

Thanks,
David Lundgren

-----

Synthesis Results:

Performance Summary
*******************

Worst slack in design: -2.049

                   Requested     Estimated     Requested     Estimated                Clock        Clock
Starting Clock     Frequency     Frequency     Period        Period        Slack      Type         Group
----------------------------------------------------------------------------------------------------------------------
fpu_double|clk     300.0 MHz     185.8 MHz     3.333         5.382         -2.049     inferred     Inferred_clkgroup_0
======================================================================================================================

---------------------------------------
Resource Usage Report for fpu_double

Mapping to part: xc5vsx95tff1136-2
Cell usage:
DSP48E          9 uses
FD              3 uses
FDE             21 uses
FDR             587 uses
FDRE            3767 uses
FDRS            8 uses
FDRSE           51 uses
GND             6 uses
MUXCY           20 uses
MUXCY_L         598 uses
MUXF7           2 uses
VCC             6 uses
XORCY           497 uses
XORCY_L         5 uses
LUT1            187 uses
LUT2            742 uses
LUT3            1591 uses
LUT4            847 uses
LUT5            589 uses
LUT6            2613 uses

I/O ports: 206
I/O primitives: 205
IBUF           135 uses
OBUF           70 uses
BUFGP          1 use

I/O Register bits:                  0
Register bits not including I/Os:   4437 (7%)

Global Clock Buffers: 1 of 32 (3%)

Total load per clock:
   fpu_double|clk: 4446

Mapping Summary:
Total LUTs: 6569 (11%)

Mapper successful!
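The figures in the performance summary are related by simple arithmetic, which can be reproduced to sanity-check the report. In this Python sketch the periods come from the table above and the cycle counts from the README; the function names are illustrative, and the periods are assumed to be in nanoseconds, which is consistent with the MHz figures.

```python
REQUESTED_PERIOD_NS = 3.333   # from the performance summary (300.0 MHz request)
ESTIMATED_PERIOD_NS = 5.382   # from the performance summary (185.8 MHz estimate)

# Clock cycles per operation, as listed in the README.
CYCLES = {"addition": 20, "subtraction": 21, "multiplication": 24, "division": 71}


def frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def slack_ns(requested_ns: float, estimated_ns: float) -> float:
    """Timing slack: negative when the estimated period exceeds the request."""
    return requested_ns - estimated_ns


def latency_ns(cycles: int, period_ns: float) -> float:
    """Latency of one operation at the given clock period."""
    return cycles * period_ns


if __name__ == "__main__":
    print(f"estimated frequency: {frequency_mhz(ESTIMATED_PERIOD_NS):.1f} MHz")
    print(f"slack: {slack_ns(REQUESTED_PERIOD_NS, ESTIMATED_PERIOD_NS):.3f} ns")
    for name, cycles in CYCLES.items():
        print(f"{name}: {latency_ns(cycles, ESTIMATED_PERIOD_NS):.2f} ns")
```

The negative slack is measured against the 300 MHz request. It only means that the request was not met, and the design is still estimated to run at about 185.8 MHz.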
