4 Emerging Strategies to Advance Big Data Analytics in Healthcare



While the potential of big data analytics in healthcare has been well-documented in countless studies, the risks of using these tools have drawn nearly as much attention.

Big data analytics technologies have demonstrated their promise in enhancing multiple areas of care, from medical imaging and chronic disease management to population health and precision medicine. These algorithms could increase the efficiency of care delivery, reduce administrative burdens, and accelerate disease diagnosis.

Despite all the good these tools could potentially achieve, the harm these algorithms could cause is nearly as great.

Concerns about data access and collection, implicit and explicit bias, and issues with patient and provider trust in analytics technologies have hindered the use of these tools in everyday healthcare delivery.

Healthcare researchers and provider organizations are working to find solutions to these issues, facilitating the use of big data analytics in clinical care for better quality and outcomes.

In healthcare, it’s widely understood that the success of big data analytics tools depends on the value of the information used to train them. Algorithms trained on inaccurate, poor quality data will yield erroneous results, leading to inadequate care delivery.

However, obtaining quality training data is a difficult, time-intensive effort, leaving many organizations without the resources to build effective models.

Researchers across the industry are working to overcome this challenge. In 2019, a team from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) developed an automated system that can gather more data from images used to train machine learning models, synthesizing a massive dataset of distinct training examples.

The dataset can be used to improve the training of machine learning models, enabling them to detect anatomical structures in new scans.

“We’re hoping this will make image segmentation more accessible in realistic situations where you don’t have a lot of training data,” said Amy Zhao, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and CSAIL.

“In our approach, you can learn to mimic the variations in unlabeled scans to intelligently synthesize a large dataset to train your network.”
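The core idea Zhao describes, learning variations from unlabeled scans and applying them to a labeled example to manufacture new labeled data, can be sketched in a few lines. This is a deliberately simplified illustration, not the CSAIL system itself: the real approach learns spatial and intensity transforms with neural networks, whereas here the "transforms" are a crude brightness match and a small random translation, and the function name `synthesize_examples` is invented for the sketch.

```python
import numpy as np

def synthesize_examples(labeled_scan, label_map, unlabeled_scans, rng=None):
    """Toy version of augmentation-by-transform: for each unlabeled scan,
    derive a rough intensity offset and spatial shift, apply both to the
    single labeled scan, and carry the label map through the spatial part
    so each synthetic image arrives with a matching segmentation."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for scan in unlabeled_scans:
        # Crude "intensity transform": match the target scan's mean brightness.
        shift = scan.mean() - labeled_scan.mean()
        # Crude "spatial transform": a small random translation.
        dy, dx = rng.integers(-2, 3, size=2)
        new_image = np.roll(labeled_scan + shift, (dy, dx), axis=(0, 1))
        # The label map moves with the anatomy, so it stays aligned.
        new_label = np.roll(label_map, (dy, dx), axis=(0, 1))
        synthetic.append((new_image, new_label))
    return synthetic
```

The payoff is that one manually segmented scan plus many unlabeled scans yields a whole set of labeled training pairs, which is exactly the scarcity problem the paragraph above describes.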

The current healthcare crisis has also prompted healthcare leaders to develop quality, clean datasets for algorithm development. In March, the White House Office of Science and Technology Policy issued a call to action for experts to build AI tools that can be applied to a new COVID-19 dataset.

The dataset is an extensive machine-readable coronavirus literature collection, including over 29,000 articles.

“It’s difficult for people to manually go through more than 20,000 articles and synthesize their findings. Recent advances in technology can be helpful here,” said Anthony Goldbloom, co-founder and chief executive officer at Kaggle, a machine learning and data science community owned by Google Cloud.

“We’re putting machine-readable versions of these articles in front of our community of more than 4 million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19.”

As healthcare organizations become increasingly reliant on analytics algorithms to help them make care decisions, it’s critical that these tools are free of implicit or explicit bias that could further drive health inequities.

With the existing disparities that pervade the healthcare industry, developing flawless, bias-free algorithms is often challenging. In a 2019 study, researchers from the University of California, Berkeley discovered racial bias in a predictive analytics platform referring high-risk patients to care management programs.

“Algorithms can do terrible things, or algorithms can do wonderful things. Which one of those things they do is basically up to us,” said Ziad Obermeyer, acting associate professor of health policy and management at UC Berkeley and lead author of the study. “We make so many choices when we train an algorithm that feel technical and small. But these choices make the difference between an algorithm that’s good or bad, biased or unbiased.”

To remove bias from big data analytics tools, developers can work with experts and end-users to understand what clinical measures are important to providers, Philip Thomas, PhD, MS, assistant professor at the College of Information and Computer Sciences at the University of Massachusetts Amherst, told HealthITAnalytics.

“We’re not promoting how to balance accuracy versus discrimination. We’re not saying what the right definitions of fair or safe are. Our goal is to let the person that’s an expert in that field decide,” he said.
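One way to read Thomas's point is that the fairness criterion itself should be a parameter the domain expert supplies, rather than something the developer hard-codes. The sketch below is an assumption-laden illustration of that separation of concerns, not his group's actual framework: `passes_fairness_check`, its arguments, and the tolerance threshold are all invented for this example.

```python
def passes_fairness_check(predictions, groups, metric, tolerance):
    """Evaluate an expert-supplied per-group metric (e.g. referral rate,
    accuracy) and flag the model if any two groups differ by more than
    the expert's chosen tolerance. The definition of 'fair' lives in the
    `metric` and `tolerance` arguments, not in this function."""
    scores = {}
    for g in set(groups):
        members = [predictions[i] for i, grp in enumerate(groups) if grp == g]
        scores[g] = metric(members)
    vals = list(scores.values())
    return max(vals) - min(vals) <= tolerance, scores
```

A clinician could pass a referral-rate metric with a tight tolerance, while a different deployment might check accuracy gaps instead; the developer's code stays the same in both cases.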

While communicating with providers and end-users during algorithm development is extremely important, often this step is only half the battle. Collecting the high-quality data needed to develop unbiased analytics tools is a time-consuming, difficult task.

To accelerate this process, researchers at Columbia University have developed a machine learning algorithm that identifies and predicts differences in adverse drug effects between men and women by analyzing 50 years’ worth of reports in an FDA database.

“Essentially the idea is to correct for gender biases before you do any other statistical analysis by building a balanced subset of patients with equal parts men and women for each drug,” said Payal Chandak, a senior biomedical informatics major at Columbia University and a co-author of the paper.
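The balancing step Chandak describes, equal numbers of men and women for each drug before any downstream statistics, can be sketched as a simple downsampling pass. This is a minimal illustration assuming reports are dicts with `drug` and `sex` fields; the field names and the function are hypothetical, and the Columbia work operates on the far larger FDA adverse event database.

```python
import random
from collections import defaultdict

def balanced_subset(reports, rng=None):
    """For each drug, keep equal numbers of male and female reports by
    randomly downsampling the over-represented sex. Drugs with reports
    from only one sex contribute nothing to the balanced subset."""
    rng = rng or random.Random(0)
    by_drug_sex = defaultdict(lambda: defaultdict(list))
    for r in reports:
        by_drug_sex[r["drug"]][r["sex"]].append(r)
    balanced = []
    for drug, groups in by_drug_sex.items():
        n = min(len(groups["M"]), len(groups["F"]))
        for sex in ("M", "F"):
            balanced.extend(rng.sample(groups[sex], n))
    return balanced
```

Because balancing happens before any effect-size estimate, a drug that is simply prescribed to more women than men can no longer masquerade as one with a sex-specific adverse effect.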

