Claude 3.5 Sonnet sets new coding benchmarks | Writer | Admin |
---|---|---|

Anthropic’s latest AI model, Claude 3.5 Sonnet, arrived seemingly out of nowhere last week, free for everyone to use, and is now setting new benchmarks across a number of metrics. Most notably, it topped the leaderboards in key categories of the LMSYS Chatbot Arena, a benchmark used to measure large language model performance. Specifically, it secured the top position in the ‘Coding Arena’ and ‘Hard Prompts’ categories, beating industry behemoths like GPT-4o and Gemini 1.5 Pro, all while being a lighter, faster model to run.

Jayachandran Ramachandran, senior VP of Artificial Intelligence Labs for C5i, says the LMSYS Chatbot Arena is a crowdsourced, open benchmarking platform in which users perform various tasks in pairwise comparisons of models and give feedback on which model provides the best response. “The ranking system is quite transparent, robust, and reliable. In recent times, the top position was occupied by frontier models such as GPT-4 series and Gemini-1.5 series. It is a significant development for Claude 3.5 Sonnet to outsmart other leading frontier models and occupy the top position, especially in coding tasks,” he says.

Saurav Swaroop, co-founder and CTO of financing platform Velocity, which recently launched Vani AI, a GenAI-led customer-calling solution for financial institutions, says that while GPT-4 has been helping developers write code for some time, it has been prone to errors and hallucinations. “However, Claude can write production level deployable code now. This will boost developer productivity and the time to build software will keep coming down,” he says.

With its advanced reasoning skills, the model is also good at handling multi-step workflows, which is essential for complex projects, says Sourabh Kumar, senior manager of experience engineering for Publicis Sapient. “Compared to other industry leaders like GPT-4o, Claude 3.5 Sonnet offers faster processing speeds and higher accuracy in complex coding and reasoning tasks. For industries such as finance, where accurate data analysis is critical, or retail, where trend prediction can drive success, integrating Claude 3.5 Sonnet can lead to significant efficiency gains and innovation,” he says.

However, Bebi says coders will still need to check the code created by the model. “While it can create code quickly, it is very important to understand the code and make sure it meets a project’s requirements and standards. Even though benchmarks rate the model at around 90%, the remaining 10% gap underscores the necessity for coders to review the outputs,” she says.

No | Subject | Writer | Date |
---|---|---|---|
264 | Claude 3.5 Sonnet sets new coding benchmarks | Admin | 24-07-01 |