
OpenAI o1 Review: PhD-Level Math and Science Reasoning in a Large Model, OpenAI o1 vs GPT4o vs Gemini vs Claude

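1. Introduction

OpenAI released o1 yesterday, a large model optimized for reasoning. It uses Chain of Thought (CoT) reasoning to improve performance on complex problems in math, physics, programming, and logic. According to the evaluation published on OpenAI's official website, o1 shows significant gains over GPT4o in math and coding. Using the model comparison and review features of DeepNLP's AI Store, we compared the answers of OpenAI o1, GPT4o, Gemini, and Claude on the same questions. The results can be viewed on the website, and the sections below describe them in detail.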

https://medium.com/@rockingdingo/2024-chatgpt-vs-gemini-vs-claude-for-math-ai4science-skill-reviews-566df2c9ecdd


2. Evaluation

Math ability

## Math Problem

1. Let n be an even positive integer. Let p be a monic, real polynomial of degree 2n; that is to say, p(x)=x^{2n} + a_{2n-1}x^{2n-1} + ... + a_{1}x+ a_{0} for some real coefficients a_{0}, a_{1}, ..., a_{2n-1}. Suppose that p(1/k) = k^{2} for all integers k such that 1<=|k|<=n. Find all other real numbers x for which p(1/x)=x^2.

2.  Let $X$ be a topological vector space. All sets mentioned below are understood to be the subsets of $X$. Prove the following statement: If $A$ and $B$ are compact, so is $A + B$

3.  What's the differentiation of function f(x) = e^x + log(x) + sin(x)?

4. what's the solution x of equation x^2+5x+6=0?
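Problems 3 and 4 have short closed-form answers, so they are convenient for sanity-checking a model's reply. The sketch below is not part of the original evaluation. It assumes `log` in problem 3 means the natural logarithm, and the helper names (`f_prime`, `central_difference`, `quadratic_roots`) are made up for illustration. It compares the term-by-term derivative e^x + 1/x + cos(x) against a numerical estimate and solves the quadratic with the standard formula.

```python
import math


def f(x):
    # Problem 3 function; log is assumed to be the natural logarithm.
    return math.exp(x) + math.log(x) + math.sin(x)


def f_prime(x):
    # Term-by-term derivative: e^x + 1/x + cos(x), valid for x > 0.
    return math.exp(x) + 1.0 / x + math.cos(x)


def central_difference(func, x, h=1e-6):
    # Numerical derivative estimate used only as a cross-check.
    return (func(x + h) - func(x - h)) / (2 * h)


def quadratic_roots(a, b, c):
    # Real roots of a*x^2 + b*x + c = 0, returned in ascending order.
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    root = math.sqrt(disc)
    return sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))


if __name__ == "__main__":
    x0 = 1.5
    print(f_prime(x0), central_difference(f, x0))
    print(quadratic_roots(1, 5, 6))  # roots of x^2 + 5x + 6
```

For problem 4 the factorization (x + 2)(x + 3) = 0 gives the same two roots, x = -2 and x = -3.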

Coding ability

### Coding Prompt

1. Implement LLM LLaMa Architecture in python code using PyTorch library, then use distilling techniques to distill a large LLaMa model (larger than 70B) to a small student model, with size limit to 2B. Please think step by step and provide details of the model code.

2. Write front end code of the login and logout pages for H5 mobile application usage. Split the code in separate files for css, html, and js.

3. Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
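Prompt 3 has a single well-defined expected output, which makes it easy to check a model's script automatically. The sketch below is a reference checker written in Python rather than bash, so it is not an answer to the prompt itself and was not part of the original evaluation. The function names are hypothetical, and it assumes rows never contain nested brackets.

```python
import re


def parse_matrix(text):
    # Extract the contents of each "[...]" group and split on commas.
    rows = re.findall(r"\[([^\[\]]*)\]", text)
    return [[cell.strip() for cell in row.split(",")] for row in rows]


def transpose_string(text):
    # Return the transpose in the same '[a,b],[c,d]' string format.
    matrix = parse_matrix(text)
    if not matrix:
        raise ValueError("no rows found")
    if len({len(row) for row in matrix}) != 1:
        raise ValueError("rows have different lengths")
    columns = zip(*matrix)  # zip over rows yields the columns
    return ",".join("[" + ",".join(col) + "]" for col in columns)


if __name__ == "__main__":
    print(transpose_string("[1,2],[3,4],[5,6]"))  # prints [1,3,5],[2,4,6]
```

A model's bash script can then be judged by running it on the same input string and comparing its output with what `transpose_string` returns.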

Website:

OpenAI o1 Review

3. Evaluation results

3.1 OpenAI o1 Math Review

Link:

OpenAI o1 Reviews for Math Reasoning Ability

3.2 OpenAI o1 Code Review

Link:

OpenAI o1 Reviews for Code Reasoning Ability from OpenAI o1, Genuine Reviews, Ratings and Questions

4. Capability comparison (AI Tools Compare)

4.1 OpenAI o1 vs GPT4o for code

Link:

OpenAI o1 vs ChatGPT for code Comparison

4.2 OpenAI o1 vs Gemini for code

Link:

http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-gemini-google?tag=code

4.3 OpenAI o1 vs Claude for code

Link:

http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-claude-anthropic?tag=code

4.4 OpenAI o1 vs ChatGPT for math 

Link:

http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-chatgpt-openai?tag=math

4.5 OpenAI o1 vs Gemini for math

Link:

http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-gemini-google?tag=math

4.6 OpenAI o1 vs Claude for math

Link:

http://www.deepnlp.org/store/compare/pub-openai-o1-vs-pub-claude-anthropic?tag=math
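The comparison links in sections 4.2 to 4.6 share a visible pattern: a fixed base path, two model slugs joined by `-vs-`, and a `tag` query parameter set to `code` or `math`. The sketch below rebuilds those links from the pattern. It is inferred from the URLs listed above only, not from any documented DeepNLP API, and the names `RIVAL_SLUGS` and `compare_url` are hypothetical.

```python
from urllib.parse import urlencode

# Base path and slugs copied from the comparison links listed above.
BASE = "http://www.deepnlp.org/store/compare"
O1_SLUG = "pub-openai-o1"
RIVAL_SLUGS = {
    "ChatGPT": "pub-chatgpt-openai",
    "Gemini": "pub-gemini-google",
    "Claude": "pub-claude-anthropic",
}


def compare_url(rival, tag):
    # Build an "o1 vs rival" link; tag is "code" or "math" in the article.
    slug = RIVAL_SLUGS[rival]
    return f"{BASE}/{O1_SLUG}-vs-{slug}?{urlencode({'tag': tag})}"


if __name__ == "__main__":
    for rival in RIVAL_SLUGS:
        for tag in ("code", "math"):
            print(compare_url(rival, tag))
```

Whether every generated combination resolves to a live page is an assumption; only the five URLs printed in this section are confirmed by the article.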

5. Related reading

http://www.deepnlp.org/store/image-generator
http://www.deepnlp.org/store/chatbot-assistant
http://www.deepnlp.org/store/productivity-tool
http://www.deepnlp.org/store/video-generator
http://www.deepnlp.org/store/science
http://www.deepnlp.org/store/pub
http://www.deepnlp.org/store/embodied-ai
http://www.deepnlp.org/store/quadruped-robot

http://www.deepnlp.org/store/humanoid-robot
 

