We Compared DeepSeek R1 with OpenAI o1 Using 5 Prompts

China’s DeepSeek R1 and the USA’s OpenAI o1 are both reasoning models. Instead of answering questions immediately, they take time to think through the prompt step by step, which leads to better and more accurate answers. These models are generally good at handling complex questions in coding, math, science, or anything else that requires serious reasoning skills.

Until now, OpenAI’s o1 model has led the industry in reasoning capabilities. However, it is a closed-source AI model accessible only through a $20-per-month paid subscription. Google is also working on its own reasoning model, Gemini 2.0 Flash Thinking, but it is still in beta. While promising, it hasn’t quite reached o1’s level and is only available through Google AI Studio. We will put it through its paces once it is widely available.


On the other side of the map, China’s DeepSeek released its R1 model this week. It is mostly on par with OpenAI’s o1 and exceeds it in some areas, and it has been the talk of the town ever since. Unlike o1, R1 is open-source and free to use, and it matches o1’s benchmark results at just 3% of the cost. That is not surprising, since China has long been good at doing things cost-effectively. Even the developer APIs are 90% to 95% cheaper than o1’s.

But how good is R1 really, and can it beat ChatGPT’s o1? Let’s find out using five prompts.

DeepSeek R1 vs OpenAI o1

To test the claims, we evaluated both OpenAI’s o1 model and DeepSeek’s R1 model with various prompts requiring strong reasoning skills to see if DeepSeek has truly delivered o1-level performance or even surpassed it.

1. Puzzle-Based Reasoning

I started the comparison with a classic knights-and-knaves puzzle that has no definitive answer.

The goal was to see which model could figure out that the puzzle has no answer. The o1 model thought for just 16 seconds, while DeepSeek took 120 seconds. However, both models reached the right conclusion: there is no way to determine who is a knight and who is a knave. I found DeepSeek’s explanation much easier to follow than o1’s confusing narrative.

The best part about DeepSeek is that you can see its entire reasoning process, which is quite compelling. It reasons much like a human would, attacking the problem in different ways and retrying several times. The process is written from DeepSeek’s own perspective, which makes for a more engaging user experience. For example, here’s part of the text from DeepSeek’s thought process:

Interesting, right?

Verdict: Both the AI models got the answer right. While ChatGPT’s o1 is faster, DeepSeek’s R1 is more thorough and provides a simpler explanation that humans can understand and digest more readily.
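For readers who want to check a claim like "there is no way to tell who is a knight," a brute-force search makes the logic concrete: try every knight/knave assignment and keep only those where knights' statements are true and knaves' statements are false. If more than one assignment survives, the puzzle has no unique answer; if none survive, it is contradictory. The sketch below is our own illustration, not either model's output, and the two-person puzzle in it is hypothetical (it is not the puzzle from our test).

```python
from itertools import product


def consistent_assignments(people, statements):
    """Return every knight/knave assignment consistent with all statements.

    statements maps each speaker to a function that takes an assignment
    (dict: name -> True if knight) and returns whether the claim is true.
    A knight's claim must be true; a knave's claim must be false.
    """
    solutions = []
    for roles in product([True, False], repeat=len(people)):
        assignment = dict(zip(people, roles))
        if all(assignment[speaker] == claim(assignment)
               for speaker, claim in statements.items()):
            solutions.append(assignment)
    return solutions


# Hypothetical puzzle: A says "B is a knave"; B says "A is a knave".
people = ["A", "B"]
statements = {
    "A": lambda a: not a["B"],
    "B": lambda a: not a["A"],
}

solutions = consistent_assignments(people, statements)
if len(solutions) == 1:
    print("Unique answer:", solutions[0])
elif not solutions:
    print("No consistent assignment: the puzzle is contradictory")
else:
    print(f"{len(solutions)} consistent assignments: the puzzle has no unique answer")
```

Here both "A is a knight, B is a knave" and the reverse are consistent, so the search reports two assignments and no unique answer. Brute force is fine for a handful of people, since the number of assignments doubles with each extra person.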

2. Math Problem

Next, I asked a hard math question that takes at least 30 to 50 steps to solve.

Both models arrived at the correct answer. However, DeepSeek gave a more precise answer of 3.18 years, whereas ChatGPT rounded it off to 3.2 years. On the other hand, o1 was much faster, thinking for just 5 seconds, while DeepSeek took 53 seconds to arrive at the answer.

Verdict: Both models again provided the correct answer, but o1 is much faster. On the other hand, DeepSeek shares the entire calculation and a more precise answer, which can make all the difference when it comes to math, science, and deep space.

3. Solving a Sudoku Puzzle

Who doesn’t love a Sudoku puzzle? For the third question, I uploaded an image of a Sudoku puzzle from the r/sudoku subreddit to both AI models and asked them to solve it.

Solving a Sudoku puzzle through reasoning alone seems to be too much for AI reasoning models. However, if a model has code execution capabilities, it can write (or reuse) solver code and run it to solve the puzzle. For example, Gemini 1.5 Pro can solve Sudoku puzzles. However, both ChatGPT o1 and DeepSeek R1 tried to solve the Sudoku with reasoning alone, and here are the results.

DeepSeek reasoned for 68 seconds before claiming the grid was flawed, even though it was not. I uploaded two other Sudoku puzzles, and the results were the same. This is likely because DeepSeek’s vision capabilities are subpar: while it can reason through problems, it struggles to interpret uploaded images.

OpenAI’s o1, on the other hand, thought for more than 5 minutes and gave a wrong answer. I then uploaded the same two additional Sudoku puzzles I had tried on DeepSeek. Once, it did give the correct answer in just 5 seconds, which suggests the solution was already in its training data.

At least the o1 model read the uploaded images and files better than DeepSeek R1. However, neither model could reliably solve a Sudoku puzzle through reasoning.

Finally, I entered the Sudoku puzzle as text, with no images. OpenAI’s o1 again appeared to pull the solution from its training data, whereas DeepSeek went through its reasoning process for 280 seconds and again came up with the wrong answer. So the problem is not just image recognition: Sudoku puzzles appear to be beyond the current batch of AI reasoning models when they rely on reasoning alone.

Verdict: Both models failed to arrive at an answer through reasoning.
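As noted above, a model with code execution can sidestep pure reasoning by running a solver. The sketch below shows the kind of short program such a model might generate: a plain depth-first backtracking solver that fills the first empty cell with each legal digit and undoes the choice when it hits a dead end. It is our own illustration, not the code Gemini or any other model actually uses, and the grid is a widely circulated example puzzle, not the r/sudoku puzzle from our test.

```python
def find_empty(grid):
    """Return (row, col) of the first empty cell (0), or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def is_valid(grid, r, c, digit):
    """Check whether placing digit at (r, c) breaks a row, column, or 3x3 box rule."""
    if digit in grid[r]:
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != digit
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def solve(grid):
    """Fill grid in place by depth-first backtracking; return True if solved."""
    cell = find_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for digit in range(1, 10):
        if is_valid(grid, r, c, digit):
            grid[r][c] = digit
            if solve(grid):
                return True
            grid[r][c] = 0  # undo and try the next digit
    return False


# Example puzzle; 0 marks an empty cell.
puzzle = [
    "530070000",
    "600195000",
    "098000060",
    "800060003",
    "400803001",
    "700020006",
    "060000280",
    "000419005",
    "000080079",
]
grid = [[int(ch) for ch in row] for row in puzzle]

if solve(grid):
    for row in grid:
        print(" ".join(map(str, row)))
else:
    print("No solution")
```

Backtracking is exhaustive, so it always finds a solution if one exists; harder puzzles may need smarter cell ordering to stay fast, but a typical newspaper-style grid solves almost instantly.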

4. Creating a Flowchart

I asked both AI reasoning models to create a flowchart of how OpenAI’s Operator works. This could be a problem for o1, as it cannot access the internet and Operator is a recent development that is not in its training data. DeepSeek’s R1, however, can search the internet, so let’s see what it can do.

As expected, o1 created a generic flowchart of how OpenAI’s LLMs work, not Operator. The flowchart was also confusing and bare-bones. DeepSeek searched online for information about Operator and generated the flowchart as requested.

Verdict: DeepSeek R1 wins by a landslide.

5. Programming Task

To round off our DeepSeek R1 vs OpenAI o1 comparison, I went for a programming-related query this time.

It’s a simple challenge that can easily be completed with existing libraries. OpenAI’s o1 used the Hugging Face transformers pipeline and explained how to install the library on my PC before running the code. DeepSeek’s R1, by contrast, provided the code directly with no setup steps and used the vaderSentiment library, which I had never used before.

After installing both libraries and running the code, it was clear that DeepSeek’s implementation followed the instructions better. For example, the app created by o1 did not properly explain its sentiment classifications, while DeepSeek’s app gave clear reasons. DeepSeek’s app also worked in real time, analyzing the input as you typed, whereas o1’s app required clicking an Analyze button.

However, neither app could detect sarcasm. Still, for the most part, both got the job done.

Verdict: DeepSeek R1 wins for following the instructions more accurately.
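To see why "giving reasons" is easy for a word-based analyzer while sarcasm is hard, here is a deliberately tiny, standard-library-only stand-in. It is not vaderSentiment and not the transformers pipeline: the word weights, the negation rule, and the `classify` function are all hypothetical, chosen only to illustrate the idea. It sums per-word scores, flips a word's score after a simple negator, and returns the words that drove the decision as its explanation.

```python
import re

# Hypothetical, hand-picked word weights for illustration only.
# Real libraries such as vaderSentiment ship a much larger, tuned lexicon
# plus extra rules (intensifiers, punctuation, capitalization, and more).
LEXICON = {
    "love": 2.0, "great": 1.5, "good": 1.0, "amazing": 2.0,
    "hate": -2.0, "terrible": -2.0, "bad": -1.0, "slow": -0.5,
}
NEGATORS = {"not", "never", "no"}


def classify(text, threshold=0.5):
    """Score text with the toy lexicon and return (label, score, reasons)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    reasons = []
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        weight = LEXICON[word]
        phrase = word
        # Flip the polarity when the previous word is a simple negator.
        if i > 0 and words[i - 1] in NEGATORS:
            weight = -weight
            phrase = f"{words[i - 1]} {word}"
        score += weight
        reasons.append(f"'{phrase}' ({weight:+.1f})")
    if score >= threshold:
        label = "positive"
    elif score <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, score, reasons


for text in [
    "I love this app, it's great",
    "The update is not good and very slow",
    "Oh great, another crash. I just love waiting.",
]:
    label, score, reasons = classify(text)
    print(f"{label:8} {score:+.1f}  {text!r}  because of: {', '.join(reasons) or 'no known words'}")
```

The third, sarcastic sentence is scored as positive because "great" and "love" carry positive weights on their own; nothing in a word-level score captures the speaker's actual intent, which is one plausible reason both generated apps missed the sarcasm in our test.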

Final Verdict: ChatGPT o1 vs DeepSeek R1

As you can see, the only question DeepSeek failed to answer correctly was the Sudoku puzzle, which o1 also failed. Apart from that, DeepSeek’s R1 consistently gave accurate, easier-to-understand answers and followed instructions to a T, all while transparently showing its reasoning process. On top of that, it’s free to use and open-source, making it accessible to everyone.

We have also used both reasoning models day to day, and DeepSeek is on par with OpenAI’s o1, often surpassing it even though o1 requires a paid plan.

DeepSeek’s claims hold up, and users can confidently rely on it as a replacement for o1. However, OpenAI also offers an o1 Pro model that costs $200 per month and is preparing to launch its o3 model, so the narrative may shift soon enough. But for now, considering price, open-source availability, and performance, we can conclude: DeepSeek R1 > OpenAI o1.

Ravi Teja KNTS

Tech writer with over 4 years of experience at TechWiser, where he has authored more than 700 articles on AI, Google apps, Chrome OS, Discord, and Android. His journey started with a passion for discussing technology and helping others in online forums, which naturally grew into a career in tech journalism. Ravi’s writing focuses on simplifying technology, making it accessible and jargon-free for readers. When he’s not breaking down the latest tech, he’s often immersed in a classic film – a true cinephile at heart.
