# Skew Reduction Technique Using the Clock Mesh Analysis in DDR IP Structural Design Development for Server Products

## Sowmya Sunkara, Harshitha B, Ashwini V, Suhasini Madannavar, Shrisha M R

Dept. of ECE, BMSCE, Bangalore, Karnataka, India

sowmi.ece[at]bmsce.ac.in

**Abstract:** As technology nodes shrink, timing closure has become the major bottleneck in physical design. Backend designers are expected to meet the clock frequency specified by top-level system architects despite the growing size and timing-convergence complexity of today's backend designs. One of the main convergence complexities in today's designs is process variation. Earlier methods such as clock tree synthesis (CTS) were used to reduce skew for slower processor clocks, whereas a clock tree mesh (CT mesh) is used for higher-end processors. Clock tree synthesis is not resistant to on-chip variation (OCV), but the clock mesh is resistant to OCV and yields higher performance. The effect of OCV can also be handled by guard-banding, but doing so compromises performance. This is the reason for performing clock mesh analysis. In this paper we concentrate mainly on the reduction of global skew and local skew.

Keywords: clock mesh, OCV, de-rating factor

## 1. Introduction

For high-speed processor IC (integrated circuit) designs, there is a trade-off between clock skew and power consumption, because a clock mesh consumes more routing resources and more power. Even though clock tree synthesis consumes less power, the clock mesh is preferred for high-end ASIC designs [1]. Clock mesh implementation faces many challenges, such as meeting high-performance and high variation-tolerance targets. Variation across the timing corners arises from on-chip effects that are global across a given wafer or die. For example, a clock metal layer may be thinner than nominal because of chemical mechanical polishing (CMP), which gives the clock wires higher resistance, so wire delay in the clock paths scales differently from the delay of the buffers in those paths. Therefore, timing that is met in one timing corner may not be met in another. As transistor channel dimensions shrink, the technology contributes increasingly to on-chip variation (OCV), which is accounted for by applying a constant de-rating factor. In 65 nm technology there are only 100 dopant atoms in the channel, and the number of dopant atoms decreases further at lower nodes. In this paper, a clock mesh implementation is explained in which the global and local skew are reduced effectively [2].
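A constant de-rating factor is commonly applied in static timing analysis by scaling path delays: for a setup check, the launch clock and data paths are scaled by a late (greater than 1) derate and the capture clock path by an early (less than 1) derate. The minimal Python sketch below shows how this erodes slack. The class, the function, the 1.05/0.95 derate values and all delays are hypothetical, and clock reconvergence pessimism removal is ignored.

```python
from dataclasses import dataclass

@dataclass
class SetupCheck:
    """Hypothetical setup check; all values in ps."""
    period_ps: float       # clock period
    launch_clk_ps: float   # nominal launch clock-path delay
    capture_clk_ps: float  # nominal capture clock-path delay
    data_ps: float         # nominal data-path delay (clk->Q plus logic)
    setup_ps: float        # setup time of the capturing flop

def setup_slack(c: SetupCheck, late: float = 1.0, early: float = 1.0) -> float:
    """Setup slack with OCV derates: late derate on the launch clock and data
    paths, early derate on the capture clock path."""
    arrival = (c.launch_clk_ps + c.data_ps) * late
    required = c.period_ps + c.capture_clk_ps * early - c.setup_ps
    return required - arrival

check = SetupCheck(period_ps=500, launch_clk_ps=200, capture_clk_ps=200,
                   data_ps=250, setup_ps=20)
print(setup_slack(check))                         # 230.0 (no derate)
print(setup_slack(check, late=1.05, early=0.95))  # 197.5 (hypothetical 5% derates)
```

The 32.5 ps lost to the derates in this example is the kind of performance margin that guard-banding gives up.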

Global skew is the skew between two independent flops, typically the nearest and the farthest flops from the clock source, whereas local skew is the skew between two dependent flops, that is, flops connected by a timing path.
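The two definitions can be made concrete with a small Python sketch, assuming the clock latency (arrival time at each flop's clock pin) is already known. The flop names, latencies and timing-path list are hypothetical. Local skew is taken here as an absolute difference, although tools usually report it with a sign.

```python
def global_skew(latency: dict[str, float]) -> float:
    """Largest arrival-time difference between any two sinks of the clock."""
    return max(latency.values()) - min(latency.values())

def local_skew(latency: dict[str, float], paths: list[tuple[str, str]]) -> float:
    """Largest |arrival difference| over flop pairs joined by a timing path."""
    return max(abs(latency[a] - latency[b]) for a, b in paths)

# Hypothetical clock latencies (ps) and launch->capture timing paths.
lat = {"ff_a": 110.0, "ff_b": 118.0, "ff_c": 95.0, "ff_d": 140.0}
timing_paths = [("ff_a", "ff_b"), ("ff_b", "ff_c")]
print(global_skew(lat))               # 45.0 (ff_d versus ff_c)
print(local_skew(lat, timing_paths))  # 23.0 (ff_b -> ff_c)
```

Local skew can never exceed global skew, since it is taken over a subset of the sink pairs.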

#### **Clock Mesh Design**

Mesh fabric design is critical in balancing skew performance against design resource usage. The parameters with the greatest impact are the mesh spine width and pitch: as the mesh spine width increases, the skew to the mesh receivers decreases [3]. First, the net is routed from the PLL to all DOPs so that the effective wire length is reduced; this further reduces the overall clock skew and provides a high-quality route to the points where logic consumes the clock. A balancing algorithm is used to achieve this.
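To see why a wider spine lowers skew, the sketch below models one mesh spine as a chain of RC segments driven from one end and computes the Elmore delay to each receiver tap. It is a first-order model under stated assumptions: the sheet resistance, area and fringe capacitance, tap count and receiver load are all hypothetical, and wire capacitance is lumped at the taps.

```python
def spine_skew_fs(width_um: float, length_um: float = 200.0, n_taps: int = 10,
                  r_sheet: float = 0.05, c_area: float = 0.02,
                  c_fringe: float = 0.04, c_load_ff: float = 5.0) -> float:
    """Elmore-delay skew (fs) between the first and last tap of one mesh spine
    driven from one end. Hypothetical units: r_sheet in ohm/sq,
    c_area in fF/um^2, c_fringe in fF/um, c_load_ff per receiver tap."""
    seg_len = length_um / n_taps
    r_seg = r_sheet * seg_len / width_um                          # ohm per segment
    c_tap = seg_len * (c_area * width_um + c_fringe) + c_load_ff  # fF lumped at each tap
    delay, tap_delays = 0.0, []
    for i in range(1, n_taps + 1):
        downstream_c = (n_taps - i + 1) * c_tap  # capacitance at taps i..n_taps
        delay += r_seg * downstream_c            # ohm * fF = fs
        tap_delays.append(delay)
    return tap_delays[-1] - tap_delays[0]

for w in (1.0, 2.0, 4.0):
    print(f"width {w} um: skew {spine_skew_fs(w):.2f} fs")
```

With these values it prints roughly 279, 148.5 and 83.25 fs for widths of 1, 2 and 4 µm. In this model the skew is proportional to `seg_len * c_area + (seg_len * c_fringe + c_load_ff) / width_um`, so widening the spine helps less and less once area capacitance dominates, while it keeps consuming routing resources.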



Figure 1: Flow from clock port to flip flops

A clock mesh is a shorted grid of metal wires driven by multiple clock drivers. Its purpose is to reduce clock skew and skew variation. To reduce clock skew, the whole tree must be traced from the clock root pin to each flop to check whether an alternative route exists [4]. The delay associated with each cell and the placement of the clock gates should also be checked. Latency is reduced by the clock tree synthesis (CTS) engine while the clock tree is built. Clock tree mesh is one of the effective approaches for power reduction in clock networks. The clock network contains clock buffers and inverters, which are specially designed cells [5], and these clock cells are used optimally in the clock tree mesh to meet the clock requirements.
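Tracing the network from the clock root to each flop and checking for alternative routes amounts to counting distinct root-to-sink paths. The Python sketch below does this with memoized depth-first search on a small, hypothetical clock network. It assumes the network can be treated as a directed acyclic graph from drivers toward sinks, which ignores the undirected shorting inside a real mesh.

```python
from functools import lru_cache

# Hypothetical clock network: each driver maps to the nodes it drives.
# buf1 and buf2 both drive mesh_n1, modelling two mesh drivers shorted together.
CLOCK_NET = {
    "clk_root": ["buf1", "buf2"],
    "buf1": ["mesh_n1"],
    "buf2": ["mesh_n1", "mesh_n2"],
    "mesh_n1": ["ff1", "ff2"],
    "mesh_n2": ["ff3"],
}

def paths_from_root(graph: dict[str, list[str]], root: str, sink: str) -> int:
    """Count distinct root-to-sink routes; assumes the graph is acyclic."""
    @lru_cache(maxsize=None)
    def count(node: str) -> int:
        if node == sink:
            return 1
        return sum(count(child) for child in graph.get(node, []))
    return count(root)

for ff in ("ff1", "ff2", "ff3"):
    n = paths_from_root(CLOCK_NET, "clk_root", ff)
    print(ff, n, "alternative route exists" if n > 1 else "single route")
```

A count above 1 (here for `ff1` and `ff2`, which hang off the doubly driven `mesh_n1`) means the sink is reached through more than one route.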

#### **Clock Mesh Analysis**

To avoid the large number of driver and load combinations in a full mesh analysis, the circuit is reduced. If the mesh has n drivers and m loads, there are n × m timing arcs between the drivers and the loads; when the mesh circuit is reduced to a single driver [6], the number of driver-to-load timing arcs drops to just m. Hence the reported skew is reduced, and the reduction also helps remove clock reconvergence pessimism (CRP), which is the difference in delay along the common part of the launching and capturing clock paths [7].

Volume 8 Issue 9, September 2020 <u>www.ijser.in</u> Licensed Under Creative Commons Attribution CC BY
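As a hypothetical illustration of the arc reduction, a mesh with n = 8 drivers and m = 1000 loads has 8 × 1000 = 8000 driver-to-load arcs, whereas the single-driver reduction leaves 1000. The clock reconvergence pessimism itself can be sketched as follows: the credit is the late-minus-early delay summed over the cells that the launch and capture clock paths share, and it is added back to the slack of the check. The path representation, cell names and delays are assumptions made for illustration.

```python
# Hypothetical representation: each clock path is a list of
# (cell, early_delay_ps, late_delay_ps), ordered from the clock root.
ClockPath = list[tuple[str, float, float]]

def crp_credit(launch: ClockPath, capture: ClockPath) -> float:
    """Sum (late - early) over the common prefix shared by both clock paths."""
    credit = 0.0
    for (l_cell, early, late), (c_cell, _, _) in zip(launch, capture):
        if l_cell != c_cell:  # paths diverge here; the rest is not common
            break
        credit += late - early
    return credit

launch = [("clk_buf0", 40.0, 46.0), ("clk_buf1", 30.0, 35.0), ("clk_buf_l", 20.0, 24.0)]
capture = [("clk_buf0", 40.0, 46.0), ("clk_buf1", 30.0, 35.0), ("clk_buf_c", 22.0, 26.0)]
print(crp_credit(launch, capture))  # 11.0 (6 ps from clk_buf0 plus 5 ps from clk_buf1)
```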



Figure 2: Timing arc delay by SPICE simulation

Clock reconvergence pessimism arises because cells in the clock network have different minimum and maximum delays [8], and it is an accuracy limitation of static timing analysis. The inaccuracy occurs when the tool compares two clock paths that share a common segment: it assumes that the shared segment has the minimum delay for one path and the maximum delay for the other, although the segment cannot physically have both delays at once. The prerequisite for creating a clock mesh is that the design contains at least one high fan-out clock net [9].
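The prerequisite check can be sketched as a simple fan-out filter over the design's clock nets. The net names, fan-out counts and the threshold of 1000 below are hypothetical; the source does not state what fan-out qualifies as high.

```python
def mesh_candidates(clock_fanout: dict[str, int], min_fanout: int = 1000) -> list[str]:
    """Return clock nets whose fan-out reaches the (hypothetical) threshold, largest first."""
    nets = [net for net, fanout in clock_fanout.items() if fanout >= min_fanout]
    return sorted(nets, key=lambda net: clock_fanout[net], reverse=True)

# Hypothetical clock nets and their sink counts.
fanout = {"clk_core": 12000, "clk_cfg": 40, "clk_dfi": 2600}
print(mesh_candidates(fanout))  # ['clk_core', 'clk_dfi']
```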

#### **Clock Mesh Flow**



Figure 3: Flow chart depicting clock mesh flow

As shown in the flowchart in Figure 3, the clock mesh flow is as follows:

**Placed design:** The clock mesh flow starts with a placed design in which the clock network has not yet been built.

**Removal of clock gating cells:** Clock gating cells are useful for reducing power [10], but they also introduce skew variation when they are stacked in many levels. Hence the depth of the integrated clock gating (ICG) cells is limited, preferably to a single level.
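Limiting gating depth starts with knowing how many clock gating cells sit between each sink and the root. The sketch below walks up a hypothetical driver map (each node to the node driving it) and flags sinks behind more than one ICG level; the cell names and netlist representation are assumptions, not a tool API.

```python
# Hypothetical clock-network fragment: each node maps to the node driving it.
DRIVER_OF = {
    "icg_top": "clk_root",
    "icg_sub": "icg_top",
    "ff_a": "icg_sub",   # gated through two ICG levels
    "ff_b": "icg_top",   # gated through one ICG level
    "ff_c": "clk_root",  # ungated
}
ICG_CELLS = {"icg_top", "icg_sub"}

def icg_depth(sink: str) -> int:
    """Number of clock gating cells between a sink and the clock root."""
    depth, node = 0, DRIVER_OF.get(sink)
    while node is not None:
        if node in ICG_CELLS:
            depth += 1
        node = DRIVER_OF.get(node)
    return depth

too_deep = [ff for ff in ("ff_a", "ff_b", "ff_c") if icg_depth(ff) > 1]
print(too_deep)  # ['ff_a']
```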

**Splitting of clock nets:** Because the clock mesh flow starts before the clock network has been built, the clock port might be driving the clock inputs of registers, latches, macros, or some floating nets. This large number of sinks places a heavy load on the clock network [11]. Hence the clock network must be split if it drives macros. After this, the float-pin delays of the macros must be isolated from the other pins, and clock gates are duplicated in the process. Next, it must be decided whether the pin capacitance and phase delay of the pins are to be considered during splitting. Splitting the clock nets, together with clustering, helps balance the clock trees and improve clock skew [12].
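The paper does not specify how sinks are partitioned, so the sketch below uses a simple stand-in: macro clock pins go to their own net, and register clock pins are binned into square location clusters, with the summed pin capacitance showing the load each split net would present. All pin names, coordinates, capacitances, net names and the 50 µm cluster size are hypothetical.

```python
from collections import defaultdict

# Hypothetical sinks: (pin name, sink type, x_um, y_um, pin_cap_fF)
SINKS = [
    ("ram0/CLK", "macro", 50.0, 40.0, 12.0),
    ("u1/ff0/CK", "reg", 10.0, 12.0, 1.0),
    ("u1/ff1/CK", "reg", 14.0, 18.0, 1.0),
    ("u2/ff0/CK", "reg", 85.0, 90.0, 1.0),
]

def split_sinks(sinks, cluster_um=50.0):
    """Assign macro pins to one net and register pins to grid-cluster nets.
    Returns ({net: [pins]}, {net: total pin cap in fF})."""
    nets, load = defaultdict(list), defaultdict(float)
    for pin, kind, x, y, cap in sinks:
        if kind == "macro":
            net = "clk_macro"
        else:
            net = f"clk_reg_{int(x // cluster_um)}_{int(y // cluster_um)}"
        nets[net].append(pin)
        load[net] += cap
    return dict(nets), dict(load)

nets, load = split_sinks(SINKS)
for net, pins in nets.items():
    print(net, pins, load[net], "fF")
```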

**Creating the clock meshes:** The clock meshes are created using the command `create_clock_mesh` (10 × 10).
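Assuming the 10 × 10 figure denotes ten vertical by ten horizontal spines spread evenly over the die, the spine positions can be computed as below. This is only a geometric sketch with a hypothetical 1000 µm × 800 µm die and a hypothetical helper name; it does not reproduce the options or behaviour of the tool command.

```python
def mesh_spines(die_w_um: float, die_h_um: float, n_x: int = 10, n_y: int = 10):
    """Hypothetical helper: evenly spaced vertical-spine x positions and
    horizontal-spine y positions, each centred in its column or row."""
    pitch_x, pitch_y = die_w_um / n_x, die_h_um / n_y
    xs = [pitch_x * (i + 0.5) for i in range(n_x)]
    ys = [pitch_y * (j + 0.5) for j in range(n_y)]
    return xs, ys

xs, ys = mesh_spines(1000.0, 800.0)
print(xs[:3])                               # [50.0, 150.0, 250.0]
print(len(xs) * len(ys), "spine crossings")  # 100 spine crossings
```

A smaller pitch (more spines) shortens the distance from any sink to the mesh, at the cost of more routing resources, which is the trade-off described under Clock Mesh Design.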

**Adding the clock mesh drivers:** Mesh drivers are added because they have the high drive strength needed to drive the clock mesh.
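A first-order estimate of how many drivers the mesh needs divides the total mesh load by the maximum capacitance one driver may drive. Everything in the sketch is hypothetical (spine count and length, wire capacitance per µm, sink capacitance, per-driver limit), and it ignores driver placement and the short-circuit current between drivers, which a SPICE-level mesh analysis would capture.

```python
import math

def mesh_driver_count(n_spines: int, spine_len_um: float, c_wire_ff_per_um: float,
                      c_sinks_ff: float, c_max_per_driver_ff: float) -> int:
    """Minimum number of identical drivers so that the total mesh load
    (spine wire capacitance plus sink pin capacitance) stays within each
    driver's hypothetical max-capacitance limit."""
    c_total = n_spines * spine_len_um * c_wire_ff_per_um + c_sinks_ff
    return math.ceil(c_total / c_max_per_driver_ff)

print(mesh_driver_count(20, 900.0, 0.25, 2000.0, 400.0))  # 17
```

With 20 spines of 900 µm at 0.25 fF/µm plus 2000 fF of sink load, the 6500 fF total needs 17 drivers at 400 fF each.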

**Routing of the clock nets and mesh nets:** The mesh drivers and pre-mesh trees are routed first, and then detail routing of the clock nets is performed [13].

## 2. Results

The following results were observed after implementing the clock mesh.

 
**Table 1:** Region-wise congestion analysis before implementation of the clock mesh

| Region    | Total overflow | Max overflow |
|-----------|----------------|-----------------|
| H-routing | 1              | 0               |
| V-routing | 0              | 1               |

**Table 2:** Congestion analysis after the implementation of clock mesh

| Region    | Total overflow | Max overflow |
|-----------|----------------|-----------------|
| H-routing | 1710           | 25              |
| V-routing | 30             | 25              |

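Total and max overflow are commonly computed per global-routing cell (GCell) as the excess of routing demand over track capacity, summed over all GCells for the total and taken at the worst GCell for the max. The sketch below assumes that definition, which may differ in detail from the tool used here, and uses hypothetical demand and capacity values.

```python
def overflow_stats(gcells):
    """gcells: list of (demand, capacity) routing tracks per GCell in one
    routing direction. Returns (total overflow, max overflow)."""
    over = [max(0, demand - capacity) for demand, capacity in gcells]
    return sum(over), max(over, default=0)

# Hypothetical horizontal-routing GCells: (track demand, track capacity).
h_gcells = [(10, 12), (15, 12), (14, 12), (9, 12)]
print(overflow_stats(h_gcells))  # (5, 3)
```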
#### **Clock tree slope values**

**Table 3:** Clock tree slope values

| Slope | Pins |
|-------|------|
| 4:6   | 43   |
| 6:8   | 31   |
| 8:10  | 151  |
| 10:12 | 473  |
| 12:14 | 403  |
| 14:16 | 1304 |

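Table 3 reads as a histogram of clock transition (slope) values in bins two units wide. The sketch below shows such binning, assuming half-open bins `[low, low + 2)` and hypothetical slew values; the units are not stated in the table, so none are assumed here.

```python
from collections import Counter

def slope_histogram(slews, lo=4, hi=16, step=2):
    """Count pins per half-open [low, low + step) bin, labelled 'low:high'.
    Values outside [lo, hi) are ignored."""
    counts = Counter()
    for s in slews:
        if lo <= s < hi:
            low = lo + int((s - lo) // step) * step
            counts[f"{low}:{low + step}"] += 1
    return [(f"{b}:{b + step}", counts[f"{b}:{b + step}"]) for b in range(lo, hi, step)]

hypothetical_slews = [4.5, 5.9, 7.0, 10.1, 11.9, 15.2, 16.0]
for label, pins in slope_histogram(hypothetical_slews):
    print(label, pins)
```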
#### **Global clock skew values**

**Table 4:** Global clock skew values before implementation of algorithm

| Clock   | Sink level | Standard cell area | Latency | Global skew |
|---------|------------|--------------------|---------|-------------|
| Dclk    | 1020  | 39.27     | 170     | 170    |
| Hclk    | 630   | 8.59      | 0       | 170    |
| Fscan_0 | 616   | 8.05      | 34.87   | 34.89  |
| Fscan_1 | 368   | 30.31     | 0       | 0      |
| Fscan_2 | 1020  | 39.27     | 170     | 100    |
| Fscan_3 | 118   | 8.05      | 70      | 70     |
| Fscan_4 | 132   | 29.11     | 600     | 0      |

**Table 5:** Global clock skew values after implementation of the algorithm

| Clock   | Sink level | Standard cell area | Latency | Global skew |
|---------|------------|--------------------|---------|-------------|
| Dclk    | 1020       | 39.27              | 104     | 104         |
| Hclk    | 630        | 8.59               | 0       | 0           |
| Fscan_0 | 616        | 8.05               | 13.34   | 12.45       |
| Fscan_1 | 368        | 30.31              | 0       | 0           |
| Fscan_2 | 1020       | 39.27              | 117.2   | 45.23       |
| Fscan_3 | 118        | 8.05               | 34      | 36          |
| Fscan_4 | 132        | 29.11              | 350     | 0           |

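The change between Table 4 and Table 5 can be quantified as a percentage reduction per clock. The sketch below uses the table values directly, assuming the rows of Table 5 follow the same clock order as Table 4 (their sink-level and cell-area columns match row for row); clocks with a zero baseline are reported as n/a.

```python
from typing import Optional

# (latency, global skew) per clock: before (Table 4) and after (Table 5).
BEFORE = {"Dclk": (170, 170), "Hclk": (0, 170), "Fscan_0": (34.87, 34.89),
          "Fscan_1": (0, 0), "Fscan_2": (170, 100), "Fscan_3": (70, 70),
          "Fscan_4": (600, 0)}
AFTER = {"Dclk": (104, 104), "Hclk": (0, 0), "Fscan_0": (13.34, 12.45),
         "Fscan_1": (0, 0), "Fscan_2": (117.2, 45.23), "Fscan_3": (34, 36),
         "Fscan_4": (350, 0)}

def reduction_pct(before: float, after: float) -> Optional[float]:
    """Percentage reduction from before to after, or None for a zero baseline."""
    return None if before == 0 else 100.0 * (before - after) / before

def fmt(pct: Optional[float]) -> str:
    return "n/a" if pct is None else f"{pct:.1f}%"

for clk, (lat0, skew0) in BEFORE.items():
    lat1, skew1 = AFTER[clk]
    print(f"{clk:8s} latency {fmt(reduction_pct(lat0, lat1)):>7s}  "
          f"global skew {fmt(reduction_pct(skew0, skew1)):>7s}")
```

For example, the global skew of Dclk falls by about 38.8% (170 to 104) and that of Fscan_3 by about 48.6% (70 to 36).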
## 3. Conclusion

With the clock mesh implemented, the results show that the maximum routing overflow increases, while both the global skew and the latency are reduced. Reducing the global skew further would lead to max transition (max Tran) violations, and the expected clock targets would not be achieved. Further skew reduction is best observed in the local skew analysis.

# References

- [1] He Qi, "The Application of Clock Mesh in ASIC Design," Dalian, China, 2011.
- [2] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*, Prentice Hall, 2002.
- [3] G. Flach, G. Wilke, M. Johann, et al., "A Mesh-buffer Displacement Optimization Strategy," IEEE Annual Symposium on VLSI, 2010.
- [4] Cadence, *Encounter Digital Implementation System User Guide*, 2013.
- [5] "An Efficient Tile-Based ECO Router Using Routing Graph Reduction and Enhanced Global Routing Flow."
- [6] "An Integer Linear Programming Based Routing Algorithm for Flip-Chip Design."
- [7] "Fixing lithography hotspots on routing without timing discrepancy."
- [8] L. de Paris and R. Reis, "An automated methodology to fix electromigration violations on a customized design flow," 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), 2018.
- [9] "A Systematic Approach for Analyzing and Optimizing Cell-Internal Signal Electromigration."
- [10] L. Jiang, D. Pantuso, P. G. Sverdrup, and W.-k. Shih, "Critical thermal issues in nanoscale IC design," 2009 IEEE International Reliability Physics Symposium, 2009.
- [11] S. Rochel and N. Nagaraj, "Full-chip interconnect analysis for electromigration reliability," in Proc. IEEE 1st ISQED, 2000, pp. 337–
- [12] K. Banerjee, A. Mehrotra, and A. Sangiovanni-Vincentelli, "On thermal effects in deep sub-micron VLSI interconnects," Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361), June 1999.
- [13] https://www.synopsys.com/content/dam/synopsys/services/datasheets/designflowdeployment.pdf
- [14] http://cc.ee.ntu.edu.tw/~ywchang/Courses/PD\_Source/EDA\_routing.pdf