@@ -49,7 +49,7 @@ exo provides a ChatGPT-compatible API for running models. It's a one-line change
Unlike other distributed inference frameworks, exo does not use a master-worker architecture. Instead, exo devices connect p2p. As long as a device is connected somewhere in the network, it can be used to run models.
-Exo supports different partitioning strategies to split up a model across devices. The default partitioning strategy is [ring memory weighted partitioning](topology/ring_memory_weighted_partitioning.py). This runs an inference in a ring where each device runs a number of model layers proportional to the memory of the device.
+Exo supports different partitioning strategies to split up a model across devices. The default partitioning strategy is [ring memory weighted partitioning](exo/topology/ring_memory_weighted_partitioning.py). This runs inference in a ring where each device runs a number of model layers proportional to its memory.
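The memory-weighted split described above can be sketched in a few lines. This is a minimal illustration of the idea, not exo's actual implementation; the `Device` class and `partition_by_memory` function are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    memory_gb: float

def partition_by_memory(devices, num_layers):
    """Assign each device a contiguous slice of layers,
    sized in proportion to its share of total memory."""
    total = sum(d.memory_gb for d in devices)
    partitions = []
    start = 0
    for i, d in enumerate(devices):
        if i == len(devices) - 1:
            end = num_layers  # last device takes the remainder
        else:
            end = start + round(num_layers * d.memory_gb / total)
        partitions.append((d.name, start, end))
        start = end
    return partitions

devices = [Device("mac", 16), Device("pi", 8), Device("pc", 8)]
print(partition_by_memory(devices, 32))
# → [('mac', 0, 16), ('pi', 16, 24), ('pc', 24, 32)]
```

Because the devices form a ring, the last device's output feeds back to the first, so the slices together cover all layers exactly once per inference pass.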
<picture>
<img alt="ring topology" src="docs/ring-topology.png" width="30%" height="30%">