RAISE Health Newsletter

Issue 13 | April 24, 2025

In this issue...

Read about the AI Index Report’s science and medicine chapter, a tool for evaluating LLMs’ clinical chops, health care AI in developing countries, and what AI benchmarking means.

Feature: 2025 AI Index features inaugural chapter on science, medicine

Keynote: Eric Horvitz to speak at RAISE Health Symposium 2025

Don’t miss it — register now to join the conversation via livestream.

Feature: New software evaluates how well LLMs perform clinical work

Read the report and an interview with Stanford Health Care’s chief data scientist Nigam Shah about the tool.

Q & A with Eleni Linos on AI for improving health care in developing nations

Eleni Linos, MD, DrPH, director of the Stanford Center for Digital Health, and her research team recently co-authored a report on the potential of generative AI to improve health and health care in developing nations. Working with researchers at the Stanford Graduate School of Business and Oxford University, the team found that generative AI could help provide personalized, reliable health care and information to patients in low- and middle-income countries, where quality medical care is difficult to access or where people are hesitant to discuss topics such as HIV testing or reproductive health with their doctor.

The following Q&A is an excerpt from the original story:

What's different about using generative AI in low- and middle-income countries than in high-income countries?

Many AI models are trained in English or other common languages, and translations into the thousands of different languages spoken in Africa, for example, may not be accurate. Then there's the scale required to meet the health needs of billions of people living in lower-income settings. Finally, many people in these communities don't have access to the internet or digital tools.

What's an example of how generative AI is being used in these settings?

One of the most widely scaled examples we highlight in the report is Jacaranda Health's PROMPTS system in Kenya. PROMPTS is a two-way SMS-based maternal health service that provides timely, AI-generated responses to questions from pregnant and postpartum patients. Since integrating a custom-trained AI model in Swahili and English, the system has significantly improved response times — from hours or days to just minutes.

By combining AI with human oversight, PROMPTS has reached over 500,000 users in 2024 alone. The system flags high-risk cases for immediate human follow-up, ensuring that AI enhances, rather than replaces, human expertise. This is a game-changer in maternal health care, particularly in regions where pregnancy-related complications remain a leading cause of death.

What are problems that still need to be overcome?

In addition to the known challenges of AI in health care — data quality, ethical considerations, privacy, algorithmic bias and the guardrails needed to overcome these — our research identified some additional challenges specific to low- and middle-income settings.

We need to improve basic health infrastructure. No matter how optimistic we are about AI's potential, or how advanced the AI models are, how well they improve someone's health depends on the environment and resources that are available where they live. Imagine if an AI model diagnosed you perfectly and correctly recommended a particular surgery or antibiotic. If there's no surgeon in your community, or no antibiotics, it doesn't actually help.

Read the report and original article.

Feature: Often bad at math, LLMs improve if given a calculator

“Our study's premise was simple,” said Alex Goodell, MD, a clinical scholar at Stanford Medicine and co-author of the study. “We built a calculator that was formulated as a usable tool, then tested LLMs on different clinical calculation tasks (like predicting the risk of a blood clot) with and without the calculator tools. With access to the tools, the LLMs' performance vastly improved.”

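The setup Goodell describes, a clinical calculator wrapped as a tool that the LLM can call instead of doing the arithmetic itself, can be sketched roughly like this. The function names, the tool schema, and the simplified Wells-score criteria below are illustrative assumptions, not taken from the study:

```python
# Sketch of exposing a clinical calculator as an LLM "tool".
# The schema format mirrors common function-calling APIs; the
# simplified Wells criteria here are for illustration only.

def wells_dvt_score(active_cancer: bool, recent_immobilization: bool,
                    calf_swelling: bool, alternative_diagnosis_likely: bool) -> int:
    """Compute a simplified Wells score for deep-vein thrombosis risk."""
    score = 0
    score += 1 if active_cancer else 0
    score += 1 if recent_immobilization else 0
    score += 1 if calf_swelling else 0
    score -= 2 if alternative_diagnosis_likely else 0
    return score

# Tool description the LLM sees, so it can choose to call the
# calculator rather than estimating the risk in free text.
WELLS_TOOL = {
    "name": "wells_dvt_score",
    "description": "Compute a simplified Wells DVT risk score.",
    "parameters": ["active_cancer", "recent_immobilization",
                   "calf_swelling", "alternative_diagnosis_likely"],
}

def dispatch_tool_call(name: str, arguments: dict) -> int:
    """Route a model-issued tool call to the local calculator."""
    tools = {"wells_dvt_score": wells_dvt_score}
    return tools[name](**arguments)

# A model that decides to use the tool would emit something like:
call = {"name": "wells_dvt_score",
        "arguments": {"active_cancer": True, "recent_immobilization": True,
                      "calf_swelling": False, "alternative_diagnosis_likely": False}}
print(dispatch_tool_call(call["name"], call["arguments"]))  # 2
```

The point of the design is that the model only has to decide *which* tool to call and with what inputs; the deterministic arithmetic happens in ordinary code.
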
Read the study.

AI de-jargonator

Explaining AI jargon, one concept at a time

Illustration by Emily Moskal

AI Benchmarking

AI benchmarking is the process of using standardized tests or tasks to evaluate the performance, accuracy, and efficiency of AI models. In health care, that means evaluating their ability to perform specific clinical tasks to ensure they are accurate, reliable, and clinically useful. This type of benchmarking is also useful for comparing how well different AI models perform in real-world scenarios.

For example, if a human radiologist accurately identifies lung cancer in 85% of cases, the AI model should aim to meet or exceed this benchmark.

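In code, that comparison boils down to scoring the model on labeled cases and checking its accuracy against the human baseline. A minimal sketch, with made-up case data and the 85% figure used only as an illustrative threshold:

```python
# Minimal sketch of benchmarking a model against a human baseline:
# score the model's predictions on labeled cases, then compare its
# accuracy to the clinician benchmark. All data here are made up.

def accuracy(predictions, labels):
    """Fraction of cases the model classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Ground-truth labels (1 = cancer present) and model outputs for 10 cases.
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predictions = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

HUMAN_BENCHMARK = 0.85  # e.g., radiologist accuracy on the same cases

model_acc = accuracy(predictions, labels)
print(f"model accuracy: {model_acc:.0%}")                # model accuracy: 80%
print("meets benchmark:", model_acc >= HUMAN_BENCHMARK)  # meets benchmark: False
```

Real clinical benchmarks track more than raw accuracy (sensitivity and specificity matter when a missed cancer is costlier than a false alarm), but the shape of the comparison is the same.
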
While benchmarking has been the gold standard for evaluating these models, there is ongoing debate about whether existing benchmarks hold up as LLMs become more sophisticated. To learn more about what is happening in this space, read chapter 2 of the AI Index.

Share our newsletter with your community.

A joint initiative between Stanford Medicine and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) to guide the responsible use of AI across biomedical research, education, and patient care.

To unsubscribe from future emails, CLICK HERE.