An AI chip contains many processing elements (PEs), so the design is very large, and existing power analysis methods, which can only analyze the chip as a whole, require a great deal of time and compute resources. Baum has technology that builds a power model for a block and uses that model to analyze the block's power quickly. We plan to develop the following two methods so that this technology can be applied efficiently to AI chips:
- Create a single power model for the PE and reuse it to analyze the power of the entire PE array in one pass
- Create a power model for the rest of the AI chip, excluding the PE array, and combine it with the PE power model to quickly analyze the power of the entire chip
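The two methods above can be sketched as follows. This is only a minimal illustration under stated assumptions, not Baum's actual model: the linear functions `pe_power` and `rest_power`, their coefficients, and the per-cycle activity inputs are hypothetical stand-ins for whatever form the real block-level power models take.

```python
from typing import List, Sequence


def pe_power(activity: float) -> float:
    """Hypothetical power model (mW) shared by every PE instance.

    `activity` is an assumed per-cycle switching activity in [0, 1].
    The coefficients are made up for illustration.
    """
    return 0.5 + 2.0 * activity  # static part + activity-dependent part


def rest_power(activity: float) -> float:
    """Hypothetical power model (mW) for everything outside the PE array."""
    return 10.0 + 40.0 * activity


def chip_power_waveform(
    pe_activity: Sequence[Sequence[float]],
    rest_activity: Sequence[float],
) -> List[float]:
    """Return total chip power per cycle.

    pe_activity[t][i] is the activity of PE i at cycle t;
    rest_activity[t] is the activity of the non-PE logic at cycle t.
    """
    waveform = []
    for pe_acts, rest_act in zip(pe_activity, rest_activity):
        # One model evaluated for every PE instance, then summed.
        array_power = sum(pe_power(a) for a in pe_acts)
        waveform.append(array_power + rest_power(rest_act))
    return waveform
```

For example, `chip_power_waveform([[0.0, 1.0], [0.5, 0.5]], [0.0, 1.0])` evaluates a two-PE array over two cycles. The point of the sketch is that the model is built once for one PE and then evaluated per instance, so no model of the full array is ever created.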
Clock tree synthesis (CTS) is performed late in the design flow. Because the clock network accounts for a large share of power consumption, computing clock power only after CTS lengthens the turn-around time needed to meet the power constraint and, in turn, delays time-to-market. The method we want to develop does not obtain the power value directly from the neural network. Instead, the neural network predicts the clock tree components needed to calculate power, such as the number of clock buffers, the number of clock gating cells, and the load capacitance. Clock power is then calculated as a transient waveform from the predicted clock tree components and the signal switching information.
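As a rough sketch of that last step, the snippet below turns predicted clock tree components into a per-window clock power waveform using the standard dynamic power relation P = C · V² · f for a net that toggles every cycle. Everything specific here is an assumption for illustration: the `ClockTree` fields, the per-cell capacitance constants, and the per-window `active_fraction` input (the share of the tree left ungated, as derived from switching information). In the proposed method the component counts would come from the neural network.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical switched capacitance per cell, in farads.
BUFFER_CAP_F = 2e-15
GATING_CELL_CAP_F = 3e-15


@dataclass
class ClockTree:
    """Clock tree components assumed to be predicted before CTS."""
    num_buffers: int
    num_gating_cells: int
    load_cap_f: float  # total wire and sink-pin load capacitance, farads


def clock_power_waveform(
    tree: ClockTree,
    vdd: float,
    freq_hz: float,
    active_fraction: Sequence[float],
) -> List[float]:
    """Return clock power in watts for each time window.

    active_fraction[t] in [0, 1] is the assumed share of the clock tree
    that is toggling (not gated off) in window t.
    Simplification: gating scales the whole tree uniformly, although in a
    real tree the portion upstream of the gating cells always toggles.
    """
    total_cap = (
        tree.num_buffers * BUFFER_CAP_F
        + tree.num_gating_cells * GATING_CELL_CAP_F
        + tree.load_cap_f
    )
    full_power = total_cap * vdd ** 2 * freq_hz  # P = C * V^2 * f
    return [full_power * frac for frac in active_fraction]
```

Predicting components rather than power keeps the physical relation explicit: supply voltage, clock frequency, and gating activity can change without retraining, since they enter only in the final calculation.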
An AI chip generates a lot of heat because it is a highly complex design with multiple processing elements (PEs). The chip must therefore be designed so that thermal density is evenly distributed, avoiding reliability damage from overheating and excessive leakage power. Thermal analysis requires the power value at each location inside the chip. The power analysis method provided by Baum can quickly obtain the power waveforms of the blocks inside the chip, and the block locations are given by the floorplan. Because transient thermal analysis requires the temperature from the previous time step, we will use machine learning in the style of U-net and recurrent neural networks, which captures the correlation between successive thermal maps.
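The data flow described above can be sketched as follows: block power waveforms and floorplan rectangles are combined into a power map per time step, and each thermal map is computed from the previous one. This is a sketch under stated assumptions only. The grid-based floorplan format and all function names are hypothetical, and `thermal_step` is a deliberately trivial placeholder (per-cell heating and cooling toward ambient, with no lateral heat spreading) standing in for the trained U-net and recurrent model, which is not shown.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
# Block rectangle in grid cells: (row0, col0, row1, col1), end-exclusive.
Rect = Tuple[int, int, int, int]


def power_map(floorplan: Dict[str, Rect], block_power: Dict[str, float],
              rows: int, cols: int) -> Grid:
    """Spread each block's power (W) uniformly over the cells it covers."""
    grid = [[0.0] * cols for _ in range(rows)]
    for name, (r0, c0, r1, c1) in floorplan.items():
        density = block_power[name] / ((r1 - r0) * (c1 - c0))
        for r in range(r0, r1):
            for c in range(c0, c1):
                grid[r][c] += density
    return grid


def thermal_step(prev_temp: Grid, power: Grid, ambient: float = 25.0,
                 heat_gain: float = 1.0, cooling: float = 0.1) -> Grid:
    """Placeholder for the learned model: next map from previous map + power."""
    return [
        [t + heat_gain * p - cooling * (t - ambient) for t, p in zip(trow, prow)]
        for trow, prow in zip(prev_temp, power)
    ]


def transient_thermal(floorplan: Dict[str, Rect],
                      power_waveforms: Dict[str, List[float]],
                      rows: int, cols: int, ambient: float = 25.0) -> List[Grid]:
    """Return one thermal map per time step, starting from ambient."""
    temp = [[ambient] * cols for _ in range(rows)]
    maps = []
    steps = len(next(iter(power_waveforms.values())))
    for t in range(steps):
        now = {name: wave[t] for name, wave in power_waveforms.items()}
        # Recurrent structure: the new map depends on the previous map.
        temp = thermal_step(temp, power_map(floorplan, now, rows, cols), ambient)
        maps.append(temp)
    return maps
```

The loop in `transient_thermal` is the part that mirrors the text: each step consumes the previous thermal map, which is why a model with recurrent structure is a natural fit.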
Timing and power characterization of standard cells is very time consuming because it requires many simulations across hundreds of PVT corners and thousands of standard cell types. This increases the number of software licenses needed, and the circuit design schedule can slip if the libraries are not ready in time. To reduce library characterization runtime, we characterize the libraries at only some corners and predict those of the remaining corners using ML models together with the characterized libraries. The number of corners used for training is minimized to reduce runtime further. This figure shows an example of ML prediction using U-net.
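To make the idea of predicting an uncharacterized corner concrete, the sketch below estimates a cell's delay table at a target supply voltage from two characterized corners. Note that it swaps in plain linear interpolation as a naive baseline, not the ML model described above: delay is not linear in voltage, which is part of why a learned model is attractive. The table layout (delay indexed by input slew and output load) and the function name are assumptions for illustration.

```python
from typing import List

# Delay in ns, indexed as table[input_slew_index][output_load_index].
Table = List[List[float]]


def interpolate_corner(v_lo: float, table_lo: Table,
                       v_hi: float, table_hi: Table,
                       v_target: float) -> Table:
    """Estimate the delay table at v_target from two characterized corners.

    Each table entry is interpolated linearly in supply voltage.
    This is a crude stand-in for a trained predictor.
    """
    w = (v_target - v_lo) / (v_hi - v_lo)
    return [
        [(1.0 - w) * lo + w * hi for lo, hi in zip(row_lo, row_hi)]
        for row_lo, row_hi in zip(table_lo, table_hi)
    ]
```

A call such as `interpolate_corner(0.7, table_at_0v7, 0.9, table_at_0v9, 0.8)` fills in the 0.8 V corner without running any simulation for it. The runtime saving comes from the same place in the ML version: only the corners used as inputs need to be characterized.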