Post on 05-Jul-2020
Jennifer Dworak, Xi Shen, Micah Thornton, Ted Manikas, Mitch Thornton (SMU), Al Crouch (ASSET), Chad Augisnash (FDLTCC), Kundan Nepal (University of St. Thomas), Iris Bahar (Brown University)
Why 3D?
• Higher Performance:
  • Vertical distance from chip to chip is ~30 microns
  • Often smaller than a route across a chip
  • Much smaller than a connection to another chip on a board
• Smaller Form Factor:
  • Can combine multiple technologies into a single stack with different chips
• More Connections:
  • May have 10,000 through-silicon vias (TSVs) in a stack vs. a much smaller number of pins available on a board
3D Architectures
• Homogeneous: Multiple copies of the same chip stacked on top of each other.
• Heterogeneous:
  • Single company for several, if not all, die
  • A single design can be partitioned across multiple layers for optimal performance
• Competitive Socket:
  • Each die in the stack serves a distinct purpose that could be fulfilled by die from multiple companies
  • E.g., get a DSP from Texas Instruments or Analog Devices…
Example Competitive Socket 3D Stack
[Figure: stack diagram with layers labeled Microprocessor, Memory, Memory, Interposer, ASIC, Analog/RF]
What makes 3D Stacks Unreliable?
• The die could be defective before assembly
  • Test escape
  • Environmentally sensitive
• Die-to-die, die-to-wafer, wafer-to-wafer
• TSVs could be manufactured with defects
• Grinding the die to expose the TSVs could damage the die or TSVs
• Probes testing the die can damage the TSVs
• Aging, wearout, extra heat, etc. may cause failures in the field
What is the effect of normal test escapes on the stack?
[Figure: defective stacks per million (y-axis, 0 to 9000) vs. number of die in the stack (x-axis, 0 to 10), with one curve each for die-level rates of 100, 300, 500, and 1000 dppm]
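A rough way to reason about the question this slide asks, as a minimal sketch: if each die escapes test independently at the same rate (an assumption; the slides do not state the model behind the plot), a stack is defective whenever at least one of its die is, so the stack-level rate is 1 - (1 - p)^n. The function name `stack_dppm` is illustrative.

```python
def stack_dppm(die_dppm: float, num_die: int) -> float:
    """Defective stacks per million, assuming independent die-level escapes."""
    p_die = die_dppm / 1_000_000            # per-die escape probability
    p_stack_good = (1.0 - p_die) ** num_die  # every die in the stack is good
    return (1.0 - p_stack_good) * 1_000_000


if __name__ == "__main__":
    for rate in (100, 300, 500, 1000):
        row = ", ".join(f"{n} die: {stack_dppm(rate, n):6.0f}" for n in (2, 4, 6, 8))
        print(f"{rate:>4} dppm -> {row}")
```

For small escape rates the result is close to n times the die-level rate, so under this model an 8-die stack at 1000 dppm lands just under 8000 defective stacks per million.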
Costs of some potential die in the stack
• Blackfin 400 MHz 16-bit Fixed Point DSP: $19
• Micron 128 MB Flash: $3.43
• Alliance Memory SRAM (8 MB × 4): $68
• Xilinx FPGA: $25-$50
• Intel Atom Processor: $90
So…what does this mean?
Unlike on a board, we can’t simply de-solder and replace a defective die. We need to throw out the whole stack. The more die there are in the stack, the more expensive this is likely to be…
We need some way of either identifying a defect early, or repairing it…
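To put a number on "the more die, the more expensive", the list prices from the cost slide can be summed for a hypothetical stack that contains one of each part. The stack composition is an assumption made only for illustration, the SRAM entry is treated as a single $68 line item, and packaging, assembly, and test costs are ignored.

```python
from typing import Dict, Tuple

# (low, high) price in USD for each die, taken from the cost slide.
DIE_PRICES: Dict[str, Tuple[float, float]] = {
    "Blackfin DSP": (19.00, 19.00),
    "Micron 128 MB Flash": (3.43, 3.43),
    "Alliance Memory SRAM": (68.00, 68.00),   # treated as one line item
    "Xilinx FPGA": (25.00, 50.00),
    "Intel Atom Processor": (90.00, 90.00),
}


def stack_cost_range(prices: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (low, high) die cost of a stack holding one of each part."""
    low = sum(lo for lo, _ in prices.values())
    high = sum(hi for _, hi in prices.values())
    return low, high


if __name__ == "__main__":
    low, high = stack_cost_range(DIE_PRICES)
    print(f"Die cost lost when one stack is scrapped: ${low:.2f} to ${high:.2f}")
```

With these prices, scrapping one such stack loses roughly $205 to $230 in die alone, whereas on a board the loss would be bounded by the price of the single defective part.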
FPGAs to the rescue!
OK…they might not solve everything, but they can help!
But will FPGAs really be present in the stack?
Why put an FPGA in a 3D Stack?
• Good for anything that may need to be updated in the field
  • E.g., communication protocols
• Often cheaper than an ASIC
• Can be used to provide hardware acceleration
• Can be used to aid in test, self repair, and fault tolerance
• Ultimately, you could make the entire stack from FPGAs…
Real FPGAs with TSVs
• Xilinx is making FPGAs with Stacked Silicon Interconnect: Virtex-7 7V1500T, 7V2000T, 7VH290T, 7VH580T, 7VH870T
Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency, by Kirk Saban, WP380 (v1.1), October 21, 2011
A passive silicon interposer interconnects multiple FPGA SLRs (super logic regions).
Advantages of Xilinx Virtex 7 SSI Solutions for multi-FPGA designs
• Overcomes the I/O pin limitations
  • 1200 I/O pins on a package vs. more than 10,000 die-to-die connections
• TSV drivers do not have to deliver the same currents and handle the same voltages as chip-to-chip I/Os
• Pin-to-pin delays are much less
• Time division multiplexing is not needed
• Power penalty is much less than for standard I/Os
All of these advantages can be used to help harness FPGAs for reliability…
FPGA Controlled Test
• When should die be tested?
  • Before assembly
  • During stack assembly
  • After stack assembly
• The earlier I know that a die is bad, the better
• Testing the stack as each die is added is difficult
  • Expensive
  • Not all of the functionality is there yet, making functional test difficult
• Solution: FPGAs can be reprogrammed multiple times to serve as an embedded tester.
FPGA as an embedded tester
• If we add an FPGA to the stack first, it can serve to test the devices it has connections with…
  • Memory BIST
  • Functional patterns for the microprocessor from the non-existent board/ASIC/analog
  • LBIST/scan for ASIC
  • Pseudo-functional for ASIC
  • Bus communication protocols for each layer
[Figure: example stack with layers labeled Microprocessor, Memory, Memory, Interposer, ASIC, Analog/RF, and FPGA]
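The slide lists memory BIST without naming an algorithm. As one illustration of what an FPGA-hosted memory tester might run, the sketch below applies March C-, a widely used march test, to a software memory model. The choice of March C-, the `FaultyMemory` class, and the single stuck-at fault it models are all assumptions for illustration and do not come from the slides.

```python
from typing import Iterable, List, Optional


class FaultyMemory:
    """Hypothetical 1-bit-wide memory with an optional stuck-at cell."""

    def __init__(self, size: int, stuck_addr: Optional[int] = None, stuck_value: int = 0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value

    def write(self, addr: int, bit: int) -> None:
        self.cells[addr] = bit

    def read(self, addr: int) -> int:
        # A stuck-at cell always returns its stuck value, whatever was written.
        return self.stuck_value if addr == self.stuck_addr else self.cells[addr]


def march_c_minus(mem: FaultyMemory, size: int) -> List[int]:
    """Run March C- and return the sorted addresses that failed a read."""
    failures = set()

    def element(addrs: Iterable[int], expect: Optional[int], write: Optional[int]) -> None:
        for a in addrs:
            if expect is not None and mem.read(a) != expect:
                failures.add(a)
            if write is not None:
                mem.write(a, write)

    up = range(size)
    down = range(size - 1, -1, -1)
    element(up, None, 0)    # any order: w0
    element(up, 0, 1)       # ascending: r0, w1
    element(up, 1, 0)       # ascending: r1, w0
    element(down, 0, 1)     # descending: r0, w1
    element(down, 1, 0)     # descending: r1, w0
    element(up, 0, None)    # any order: r0
    return sorted(failures)


if __name__ == "__main__":
    print(march_c_minus(FaultyMemory(16), 16))
    print(march_c_minus(FaultyMemory(16, stuck_addr=5, stuck_value=1), 16))
```

In a real stack the `read` and `write` calls would be replaced by accesses over whatever bus or TSV connections the FPGA has to the memory die.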
Some issues that must be addressed to do this in 3D…
• Appropriate test architecture: IEEE 1149.1, IEEE 1687, etc. for scan test and to access instruments in die
• Possibly different voltage levels on different die must be appropriately converted
• FPGA must have access to data/address buses to perform functional test
• FPGA placement will significantly impact what it can test
  • Pass-through FPGAs on upper die are likely to be needed
• Tools are needed to efficiently operate the FPGA tester for multiple test sessions of different types
Test is good…but what do you do if you find a problem?
FPGAs may be able to help with this, too.
Throwing out an entire 3D stack is expensive…how do we repair die?
• ASIC solution: Provide multiple copies of your die, or multiple identical cores on a die, as spares.
  • Potentially quite expensive as well: need one or more spares for everything you might want to repair
  • Some overhead/planning required to enable a switch to a spare
  • Spare will be an almost perfect replica of the original and should give same/similar performance
Throwing out an entire 3D stack is expensive…how do we repair die?
• FPGA solution: Identify the portion of the original die that is defective and replace it with circuitry in the FPGA.
  • May significantly lower performance
  • Overhead/planning required to enable a switch to the FPGA; partitioning must be decided ahead of time
  • Programming for the spare implemented in the FPGA must be stored somewhere
  • Diagnosis required to determine what to replace
Conceptually, what might this look like?
[Figure: die to be repaired, drawn as several partitions (each labeled "Partition 1" in the source) plus one defective partition, alongside the FPGA]
Implementation Notes
• A single set of TSVs may be used to potentially repair one of several partitions.
  • Use a one-hot decoder to select the defective partition by driving the enable signals on the tristate buffers and the select inputs on the muxes.
• If the FPGA and the ASIC are not at the same voltage, this must be handled when passing the signals.
• We may want to shut off power to the defective partition.
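The one-hot selection described in these notes can be modeled in a few lines. This is a behavioral sketch only: the function names, the active-high polarity of the enables, and the per-partition 2:1 mux arrangement are assumptions, not details given on the slide.

```python
from typing import List, Optional


def one_hot_select(defective: Optional[int], num_partitions: int) -> List[int]:
    """One-hot repair-select lines; all zeros means no partition is replaced."""
    if defective is None:
        return [0] * num_partitions
    if not 0 <= defective < num_partitions:
        raise ValueError("defective partition index out of range")
    return [1 if i == defective else 0 for i in range(num_partitions)]


def tristate_enables(select: List[int]) -> List[int]:
    """A partition drives the shared lines only while it is NOT selected for repair."""
    return [1 - s for s in select]


def mux_outputs(original: List[int], fpga_value: int, select: List[int]) -> List[int]:
    """Per-partition 2:1 mux: take the FPGA's value where select is 1."""
    return [fpga_value if s else o for o, s in zip(original, select)]


if __name__ == "__main__":
    sel = one_hot_select(defective=2, num_partitions=4)
    print(sel)                                     # [0, 0, 1, 0]
    print(tristate_enables(sel))                   # [1, 1, 0, 1]
    print(mux_outputs([10, 11, 12, 13], 99, sel))  # [10, 11, 99, 13]
```

Because at most one select line is high, one shared set of TSVs can carry the replacement signals for whichever partition turns out to be defective.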
TSV connections within the partition may be tricky
• If it is a bus that is already connected to the FPGA and other things, just need to tristate signals in the defective partition
• Otherwise, a routable interposer could possibly be used.
[Figure: two stack arrangements, each with layers labeled FPGA, ASIC, Interposer, ASIC]
So…how do we decide on partitions?
• Space available in the FPGA
• Performance loss
  • FPGAs are often slower than ASICs
  • May not always be true if they are implemented in different technology nodes
• How to handle I/O. Should only flip-flops/registers be partitioning points?
• What functionality is worth protecting?
• Should we partition to minimize the number of TSVs?
Experiment on Performance Loss
• Timing comparison for FPGA vs. ASIC
• ISCAS 89 circuits
  • Different sizes
  • Consider them to be part of a larger circuit
• “ASIC” analysis:
  • Synopsys Design Compiler/Synopsys PrimeTime
  • 90 nm and 32 nm libraries
• “FPGA” analysis:
  • Xilinx ISE
  • Compiled for Artix7 -3
  • “Balanced” optimization
Xilinx Artix 7 FPGA
• 28 nm process
• High performance, low power, low cost
• 65% lower static power and 50% lower total power compared to 45 nm devices
• Suggested for 3D TV, automotive applications, handheld communication, digital SLR cameras, medical devices, industrial monitor and control
Comparison between 90nm and FPGA delay for circuits with registered I/O
[Figure: % increase in clock period (y-axis, 0 to 70) for ISCAS89 circuits with registered I/O: s420, s641, s713, s820, s832, s838, s953, s1196, s1238]
Comparison between 32nm and FPGA delay for circuits with registered I/O
[Figure: % increase in clock period (y-axis, 0 to 600) for ISCAS 89 circuits with registered I/O: s420, s641, s713, s820, s832, s838, s953, s1196, s1238]
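The metric on these charts, "% increase in clock period", is presumably the FPGA clock period measured relative to the ASIC clock period. The slides do not spell out the formula, so the definition below is an assumption, and the example numbers are made up for illustration rather than taken from the experiment.

```python
def percent_increase(t_asic_ns: float, t_fpga_ns: float) -> float:
    """Percent increase in clock period when moving logic from ASIC to FPGA.

    A negative result means the FPGA implementation is the faster one.
    """
    if t_asic_ns <= 0:
        raise ValueError("ASIC clock period must be positive")
    return (t_fpga_ns - t_asic_ns) / t_asic_ns * 100.0


if __name__ == "__main__":
    # Illustrative values only, not measurements from the slides.
    print(percent_increase(t_asic_ns=2.0, t_fpga_ns=3.0))   # 50.0
    print(percent_increase(t_asic_ns=0.5, t_fpga_ns=3.0))   # 500.0
```

The same FPGA delay produces a much larger percentage against a faster ASIC library, which is one reason the 32 nm chart spans a far wider range than the 90 nm chart.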
Performance Conclusions
• Some circuits are more amenable to replacement in FPGA at low performance loss than others
  • Partitioning should take this into account
• We can also devise ways to mitigate the impact of the performance loss in an individual piece of circuitry
  • Especially true for certain types of circuits and the functionality of certain partitions
Error Detection with FPGAs
• Online detection of errors is also possible with an FPGA in a 3D stack
• Portions of the design may be instantiated in the FPGA in either the original or a simplified form to provide logic duplication.
• If complete coverage of all clock cycles is not needed, the speed differential between the two technologies (FPGA and ASIC) can be mitigated.
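One way to read the last point is that a slower FPGA duplicate can check only every k-th cycle of the ASIC logic. The sketch below models that idea for stateless (combinational) logic. The sampling scheme, the function names, and the toy logic function are assumptions for illustration, not a scheme described on the slides.

```python
from typing import Callable, Iterable, List


def sampled_duplicate_check(
    inputs: Iterable[int],
    asic_fn: Callable[[int], int],
    fpga_fn: Callable[[int], int],
    sample_every: int,
) -> List[int]:
    """Compare ASIC and FPGA outputs on every `sample_every`-th cycle.

    Returns the cycle numbers at which a mismatch was observed.
    """
    if sample_every < 1:
        raise ValueError("sample_every must be at least 1")
    mismatches = []
    for cycle, x in enumerate(inputs):
        asic_out = asic_fn(x)               # the ASIC produces a result every cycle
        if cycle % sample_every == 0:       # the slower FPGA copy keeps up only here
            if fpga_fn(x) != asic_out:
                mismatches.append(cycle)
    return mismatches


if __name__ == "__main__":
    def golden(x: int) -> int:              # toy stand-in for the duplicated logic
        return (3 * x + 1) & 0xFF

    def faulty(x: int) -> int:              # wrong only when the input is 3
        return golden(x) ^ (1 if x == 3 else 0)

    print(sampled_duplicate_check(range(8), faulty, golden, sample_every=1))  # [3]
    print(sampled_duplicate_check(range(8), faulty, golden, sample_every=2))  # []
```

The second call shows the trade-off: checking every other cycle halves the rate the FPGA must sustain, but an error that appears only on an unchecked cycle goes undetected.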
Hardware Monitoring with FPGAs
• Just as in test, FPGAs in a 3D stack can be reprogrammed to provide different hardware monitoring capability for the signals they have access to
  • the data and address buses,
  • selected signals routed through the through-silicon vias, etc.
Security Issues
• FPGAs in the stack can also be used for nefarious purposes.
• Need to protect the IP that will be programmed into the FPGA for repair
• Need to prevent someone from re-programming the FPGA to monitor the device
• Need to prevent someone from re-programming the FPGA to corrupt the circuit operation
• Encryption can help, but need to protect against power analysis attacks as well.
Conclusions
• There are multiple domains where defects and errors may enter a 3D stack.
• Problems in a 3D stack are often more expensive (throwing out the stack instead of a single die)
• FPGAs are likely to be present in the stack already for a variety of reasons.
• These FPGAs may be harnessed for monitoring, test, and repair.
• Research is needed regarding partitioning of the die for analysis and repair as well as securing the process from attackers.
Comparison between 90nm and FPGA delay
[Figure: % increase in time of FPGA over 90nm (axis from -40 to 120) for ISCAS 89 circuits s27, s208, s298, s344, s349, s382, s386, s400, s420, s444, s510, s526, s641, s713, s820, s832, s838, s1196, s1238, s1423, s1488, s1494, s5378, s9234, s13207, s15850, s35932]
Comparison between 32 nm and FPGA delay
[Figure: % increase in time of FPGA over 32nm (axis from 0 to 900) for ISCAS 89 circuits s27, s298, s344, s349, s382, s386, s400, s420, s444, s510, s526, s641, s713, s820, s832, s1196, s1238, s1423, s5378, s9234, s13207, s15850, s35932]