newmindai/Llama-3.1-8B-Instruct_w16a8_rw_with_gw_hp

@@ -108,10 +108,10 @@ model = convert_to_float8_training(model, config=config)
 | Llama-3.1-8B-Instruct-w16a8-8nodes-bs32  | 31476844 | 23.50          | 8     | 4    | **3.133** | **12.533** | 4           | 4          | 8                     | 1024             |
 | Llama-3.1-8B-Instruct-w16a16-8nodes-bs64 | 31476914 | 22.00          | 8     | 4    | **2.933** | **11.733** | 4           | 8          | 8                     | 1024             |
 | Llama-3.1-8B-Instruct-w16a8-8nodes-bs64  | 31476844 | 23.50          | 8     | 4    | **3.133** | **12.533** | 4           | 8          | 8                     | 1024             |
-| Llama-3.1-8B-Instruct-w16a8-rowwise_4nodes            | 33477070 | 39.75          | 4     | 4    | **2.650** | **10.600** | 4           | 4          | 8                     | 512              |
-| Llama-3.1-8B-Instruct-w16a8-rowwise_with_gw_hp_4nodes | 33477179 | 37.43          | 4     | 4    | **2.495** | **9.982**  | 4           | 4          | 8                     | 512              |
-| Llama-3.1-8B-Instruct-w16a8-rowwise_8nodes            | 33476690 | 23.50          | 8     | 4    | **3.133** | **12.533** | 4           | 4          | 8                     | 1024             |
-| Llama-3.1-8B-Instruct-w16a8-rowwise_with_gw_hp_8nodes | 33476618 | 22.13          | 8     | 4    | **2.951** | **11.802** | 4           | 4          | 8                     | 1024             |
+| Llama-3.1-8B-Instruct-w16a8-rw_4nodes            | 33477070 | 39.75          | 4     | 4    | **2.650** | **10.600** | 4           | 4          | 8                     | 512              |
+| Llama-3.1-8B-Instruct-w16a8-rw-8nodes            | 33476690 | 23.50          | 8     | 4    | **3.133** | **12.533** | 4           | 4          | 8                     | 1024             |
+| Llama-3.1-8B-Instruct-w16a8-rw_with_gw_hp_4nodes | 33477179 | 37.43          | 4     | 4    | **2.495** | **9.982**  | 4           | 4          | 8                     | 512              |
+| Llama-3.1-8B-Instruct-w16a8-rw-with-gw-hp-8nodes | 33476618 | 22.13          | 8     | 4    | **2.951** | **11.802** | 4           | 4          | 8                     | 1024             |
 ### *Training Time Analysision*
 | Model                                               | Training Time (mins) | Memory Allocated (avg %) | GPU Utilization (avg %) | Speed vs bf16 |