Alex Cheema 10 months ago
parent
commit
ba7abb9896
1 changed file with 6 additions and 4 deletions
+ 6 - 4
README.md

@@ -53,9 +53,11 @@ Unlike other distributed inference frameworks, exo does not use a master-worker
 
 Exo supports different partitioning strategies to split up a model across devices. The default partitioning strategy is [ring memory weighted partitioning](exo/topology/ring_memory_weighted_partitioning_strategy.py). This runs inference in a ring where each device runs a number of model layers proportional to its memory.
 
-<picture>
-  <img alt="ring topology" src="docs/ring-topology.png" width="30%" height="30%">
-</picture>
+<p>
+    <picture>
+        <img alt="ring topology" src="docs/ring-topology.png" width="30%" height="30%">
+    </picture>
+</p>
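
To make the proportional split concrete, here is a minimal Python sketch of memory-weighted layer assignment. The `Device` type and `partition_layers` helper are hypothetical illustrations, not the actual code in `exo/topology/ring_memory_weighted_partitioning_strategy.py`:

```python
# Hypothetical sketch: assign each device a contiguous slice of layers
# proportional to its share of the ring's total memory.
from dataclasses import dataclass

@dataclass
class Device:
    node_id: str
    memory_bytes: int

def partition_layers(devices: list[Device], num_layers: int) -> dict[str, range]:
    total_memory = sum(d.memory_bytes for d in devices)
    assignments: dict[str, range] = {}
    start = 0
    for i, device in enumerate(devices):
        # The last device absorbs any rounding remainder so all layers are covered.
        end = num_layers if i == len(devices) - 1 else start + round(
            num_layers * device.memory_bytes / total_memory
        )
        assignments[device.node_id] = range(start, end)
        start = end
    return assignments

# A 16 GB and an 8 GB device split a 32-layer model roughly 2:1.
print(partition_layers([Device("mac-studio", 16 << 30), Device("mac-mini", 8 << 30)], 32))
# {'mac-studio': range(0, 21), 'mac-mini': range(21, 32)}
```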
 
 
 ## Installation
@@ -98,7 +100,7 @@ That's it! No configuration required - exo will automatically discover the other
 
 The native way to access models running on exo is to use the exo library with peer handles. See how in [this example for Llama 3](examples/llama3_distributed.py).
 
-exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000. Note: this is currently only supported by tail nodes (i.e. nodes selected to be at the end of the ring topology). If you want to force a node to be the tail, set its node-id to be sorted last alphabetically on start e.g. `python3 main.py --node-id xxxnode-mac-mini" Example request:
+exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000. Note: this is currently only supported by tail nodes (i.e. nodes selected to be at the end of the ring topology). Example request:
 
 ```
 curl http://localhost:8000/v1/chat/completions \
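  -H "Content-Type: application/json" \
  -d '{
     "model": "llama-3-8b",
     "messages": [{"role": "user", "content": "What is the meaning of exo?"}],
     "temperature": 0.7
   }'
```

(The request body above follows the standard OpenAI chat-completions schema; the `llama-3-8b` model name and prompt are illustrative completions of the truncated example.)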