
TL;DR
You can play with the AI Strategy app and get all the tips you need from the tooltips in Steps 1 and 2.
Intro
So you want to build an AI-powered app? Great! Where should you start? Having seen AI apps that have succeeded and some that have failed miserably on the unforgiving altar of unmet expectations, I am more convinced than ever that before any APIs are accessed or RAG pipelines planned, every AI project should kick off by strategically deciding which models the app should use.
That’s where things get messy. I’ll be addressing the issue of grandiose claims, unsupported by evidence yet amplified by those parroting talk of nearing AGI (artificial general intelligence), later in the series I’ll be kicking off. In the meantime, this app will guide you through the somewhat opaque process of finding the models best suited to the tasks you need your app to perform.
Demo
Step 1: Enter info about your app’s purpose

For the purpose of this demo, we’ll do a walk-through of a math app. Our math app will have a chat component, and users will be able to enter math problems as text or images. So we’ll need to select:
- Chat
- Process multimodal data
- Solve math problems
And we’ll select all the benchmarks to compare across models:
- Quality
- Cost
- Speed
- Latency
- Context window
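Just to make the walkthrough concrete, here’s what those selections amount to when you think of them as data. This is purely illustrative; it’s not the app’s actual schema, and the field names are mine.

```python
# Illustrative only -- not the app's real schema.
selections = {
    "tasks": [
        "Chat",
        "Process multimodal data",
        "Solve math problems",
    ],
    "benchmarks_to_compare": [
        "Quality",
        "Cost",
        "Speed",
        "Latency",
        "Context window",
    ],
}

print(f"{len(selections['tasks'])} tasks, "
      f"{len(selections['benchmarks_to_compare'])} benchmarks selected")
```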
Tip: If you click on the green ⓘ next to each step, you’ll access lots of tips to help guide the discovery process. You’ll need to scroll to see all of them.
Step 2: Generate network graph
Click the Compare models button to generate a network graph.

If you have any perfectionist tendencies, you will want to tweak the graph. If you tackle one task at a time, your graph will be much better behaved. I purposely chose a graph with a lot of nodes to demonstrate the chaos of an interactive network graph.
I experimented with many settings to minimize this chaos, but you’ll still need to do some cleanup if you choose to throw caution to the wind and research multiple tasks at once. I actually find it somewhat cathartic to detangle these nodes. 👀
You can see in the video below how to pull the nodes to detangle them. One thing that just about drove me to drinking is the random orientation of the top-level nodes (i.e., the orange task nodes). Ergo, I added the ability to adjust the orientation with a slider in the upper-right corner. But I recommend detangling them first because each time you move a node, the orientation will also change.
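If you’re curious what the orientation slider is doing conceptually, it boils down to rotating every node’s position around the graph’s center. The sketch below is just the geometry of that idea, with made-up coordinates; it isn’t the app’s code.

```python
import math

# Hypothetical node positions (x, y) after a force-directed layout.
positions = {
    "Model hub": (0.0, 0.0),
    "Chat": (1.0, 0.2),
    "Solve math problems": (-0.4, 0.9),
}

def rotate(positions, degrees):
    """Rotate every node around the origin by the given angle."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return {
        name: (x * cos_t - y * sin_t, x * sin_t + y * cos_t)
        for name, (x, y) in positions.items()
    }

# Nudging the orientation slider by 45 degrees would (conceptually) do this:
print(rotate(positions, 45))
```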
Step 3: Explore the graph
Node structure
First, let’s address the structure. Each of those dots is called a node. Let’s break down what each one represents:
- Model hub: This is just the glue that holds the rest of the graph together.
- Task: There should be one for each task you selected.
- Benchmark to compare: These are the benchmarks you selected in the form. Just keep in mind that some leaderboards are pretty light on data, so they may not have data for each benchmark you opt to compare.
- Leaderboard: This is the belle of the ball. If you click on one of these, you’ll get a modal (pop-up) with lots of details. (More on that later in the post.)
- Benchmark: Hovering over this node reveals a definition of each benchmark for the leaderboard it’s attached to. Be warned: Leaderboards may use the same benchmark name for metrics that are very different, which is especially problematic with some of the math leaderboards.
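To make the hierarchy concrete, here’s one plausible wiring of those node types for our math-app example. It uses networkx purely for illustration; the leaderboard and benchmark names are placeholders, and the app’s actual graph structure may differ.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("Model hub", "Solve math problems")            # hub -> task
G.add_edge("Solve math problems", "Quality")              # task -> benchmark to compare
G.add_edge("Quality", "Example math leaderboard")         # -> leaderboard (placeholder name)
G.add_edge("Example math leaderboard", "MATH (example)")  # -> benchmark on that leaderboard

# Walking from the hub out to a single benchmark:
print(nx.shortest_path(G, "Model hub", "MATH (example)"))
```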
What’s a leaderboard?
A leaderboard is just geek speak for a dashboard with some kind of ranking system that tracks and displays the performance of different models on a specific dataset or benchmark task. It typically includes the following components:
- Rankings: Orders the models based on their performance, with the best-performing models appearing at the top.
- Performance metrics: Displays quantitative measures such as accuracy, F1 score, BLEU score, or other task-specific metrics.
- Dataset: Specifies the dataset or challenge for which the models are evaluated, ensuring fair comparisons.
- Visualizations: While many leaderboards are just tables, some include visualizations to make comparisons easier to process. The Artificial Analysis leaderboard is one of my faves because of its visualizations. The What LLM Provider leaderboard uses the data from the Artificial Analysis leaderboard and gives users full control over how they want to visualize it, including filters for providers and models. The one thing to be careful about with this leaderboard is that it isn’t updated very frequently, so at the time of publishing (1/22/25) it hadn’t been updated since 10/13/24. (The update date is at the bottom of the page.)
- Additional information: May include metadata like model size, creator, provider, whether it’s open or proprietary, whether it’s suspected of cheating (that’s a topic for a separate post), etc.
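In data terms, a leaderboard boils down to rows of models and metrics, sorted so the best performer sits on top. The sketch below uses invented scores purely to show the shape of the thing; it doesn’t reflect any real leaderboard.

```python
# Invented scores, purely illustrative.
rows = [
    {"model": "Model A", "accuracy": 0.87, "provider": "Lab A", "open_weights": True},
    {"model": "Model B", "accuracy": 0.91, "provider": "Lab B", "open_weights": False},
    {"model": "Model C", "accuracy": 0.79, "provider": "Lab C", "open_weights": True},
]

# Rankings: order the models by the performance metric, best first.
leaderboard = sorted(rows, key=lambda r: r["accuracy"], reverse=True)

for rank, row in enumerate(leaderboard, start=1):
    print(rank, row["model"], row["accuracy"])
```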
Leaderboard variation
As you’ll see in the slides below, leaderboards can vary wildly in methodology, design, amount of data, degree of transparency, etc. Here’s a small sampling of the leaderboards you’ll see represented in the tool—and these are small snapshots of the often-sprawling leaderboards.
Step 4: Dive into leaderboard details
Once you decide on a leaderboard to explore further, select it to open the modal.

Here’s a breakdown of a typical leaderboard pop-up modal:
- Leaderboard name: Pretty self-explanatory.
- Summary: A description of the leaderboard.
- Sources: The first link opens the leaderboard in a new tab, and the second opens its methodology page. Typically the latter is an arXiv paper; sometimes it’s a simplified list of bare-bones details.
- Tips: I’ve taken deep dives into each of these leaderboards to point out tips and tricks. Some leaderboards have quite cool features buried in them, and some features are perplexing; I try to demystify those as much as possible.
- Links: In the tips, I sometimes link to other resources that may help in your analysis.
- Benchmarks: These are descriptions of each of the benchmarks the leaderboard offers that are relevant to the task you selected.
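If you like thinking of the modal’s contents as a record, it maps roughly onto something like this. The field names are mine, not the app’s, and this is just a sketch of the shape of the information.

```python
from dataclasses import dataclass, field

@dataclass
class LeaderboardDetails:
    """Rough shape of what the pop-up shows; not the app's actual data model."""
    name: str                 # leaderboard name
    summary: str              # description of the leaderboard
    leaderboard_url: str      # first source: opens the leaderboard in a new tab
    methodology_url: str      # second source: methodology page (often an arXiv paper)
    tips: list = field(default_factory=list)        # tips, tricks, and related links
    benchmarks: dict = field(default_factory=dict)  # benchmark name -> description
```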
Tool Features
Here’s a breakdown of features you’ll see on the canvas:

- Info icon: Click for more information and tips.
- Search: Search the dashboard. Any node that contains your search term will be shown at full opacity, while the other nodes will appear faded (example). If your search term is in a leaderboard node (the green ones), it will be highlighted (example). (There’s a sketch of this filtering logic after this list.)
- Include all tasks: By default, the search will only return results from the nodes that are currently visible in the dashboard. However, if you want to include all tasks/nodes, activate this toggle (example).
- Orientation slider: After you detangle your nodes, you can adjust the orientation of the network graph using this slider.
- Hover text: Hover over a node to get more info about it. Only the green leaderboard nodes have additional information in a modal.
- Resizer handle: Need more room for all your nodes? Simply drag this handle to adjust the canvas size. You can also use the mouse wheel (or its equivalent) to zoom in and out, which you’ll want to do after resizing the canvas.
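The search behavior is easy to picture in code: any node whose label contains the term stays at full opacity, and everything else fades. Here’s a minimal sketch of that logic; it isn’t the app’s implementation, and the opacity values are just examples.

```python
def node_opacities(labels, term, faded=0.25):
    """Full opacity for nodes whose label contains the search term; fade the rest."""
    term = term.lower()
    return {label: 1.0 if term in label.lower() else faded for label in labels}

labels = ["Model hub", "Solve math problems", "Example math leaderboard", "Chat"]
print(node_opacities(labels, "math"))
# {'Model hub': 0.25, 'Solve math problems': 1.0, 'Example math leaderboard': 1.0, 'Chat': 0.25}
```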
Moving forward
I will be using the AI Timeline and AI Strategy apps in concert for future posts. If you are working in AI, I highly recommend subscribing to my blog to be notified of new content.
Credits
Feature Image
Michal Parzuchowski via Unsplash
Music
Benjamin Tissot via BenSound (license ID: KHQUF6S8WFK3XMEG)