24/7 Space News
TECH SPACE
When should data scientists try a new technique
When making predictions based on data, not all modeling techniques work equally well for all datasets. A new measure "provides some statistical 'oomph'" to help data scientists choose the best method for their task, says Tamara Broderick, an associate professor in EECS and a member of LIDS and IDSS, whose team developed the tool.
by Adam Zewe for MIT News
Boston MA (SPX) Jan 27, 2023

If a scientist wanted to forecast ocean currents to understand how pollution travels after an oil spill, she could use a common approach that looks at currents traveling between 10 and 200 kilometers. Or, she could choose a newer model that also includes shorter currents. This might be more accurate, but it could also require learning new software or running new computational experiments. How can she know whether the new method will be worth the time, cost, and effort?

A new approach developed by MIT researchers could help data scientists answer this question, whether they are looking at statistics on ocean currents, violent crime, children's reading ability, or any number of other types of datasets.

The team created a new measure, known as the "c-value," that helps users choose between techniques based on the chance that a new method is more accurate for a specific dataset. This measure answers the question "is it likely that the new method is more accurate for this data than the common approach?"

Traditionally, statisticians compare methods by averaging a method's accuracy across all possible datasets. But just because a new method is better for all datasets on average doesn't mean it will actually provide a better estimate using one particular dataset. Averages are not application-specific.
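
The gap between average-case accuracy and accuracy on one particular dataset can be seen in a small simulation. The sketch below is not the researchers' method. It assumes a textbook setup: noisy observations of a fixed vector with unit noise variance, the raw observation as the common estimate, and a positive-part James-Stein shrinkage estimate as the newer one. The names `james_stein`, `squared_error`, and `compare` and the example vector are illustrative choices.

```python
import random


def james_stein(y):
    """Positive-part James-Stein shrinkage toward zero (unit noise variance assumed)."""
    n = len(y)
    norm_sq = sum(v * v for v in y)
    if norm_sq == 0.0:
        return [0.0] * n
    shrink = max(0.0, 1.0 - (n - 2) / norm_sq)
    return [shrink * v for v in y]


def squared_error(estimate, truth):
    """Loss: squared distance between an estimate and the ground truth."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


def compare(theta, n_datasets=2000, seed=0):
    """Simulate many datasets from the same ground truth `theta`.

    Returns (average loss of raw estimate,
             average loss of shrinkage estimate,
             fraction of datasets where shrinkage was worse).
    """
    rng = random.Random(seed)
    raw_total = 0.0
    js_total = 0.0
    js_worse = 0
    for _ in range(n_datasets):
        y = [t + rng.gauss(0.0, 1.0) for t in theta]  # one simulated dataset
        loss_raw = squared_error(y, theta)
        loss_js = squared_error(james_stein(y), theta)
        raw_total += loss_raw
        js_total += loss_js
        if loss_js > loss_raw:
            js_worse += 1
    return raw_total / n_datasets, js_total / n_datasets, js_worse / n_datasets


# Hypothetical ground truth, known here only because this is a simulation.
theta = [2.0, -1.0, 0.5, 1.5, -2.0]
raw_avg, js_avg, frac_worse = compare(theta)
print("average loss, raw estimate:      ", round(raw_avg, 3))
print("average loss, shrinkage estimate:", round(js_avg, 3))
print("share of datasets where shrinkage was worse:", round(frac_worse, 3))
```

In this setup the shrinkage estimate has the lower average loss, yet on some individual datasets it is the less accurate of the two. A simulation can count those cases only because it knows the ground truth, which a real analysis does not.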

So, researchers from MIT and elsewhere created the c-value, which is a dataset-specific tool. A high c-value means it is unlikely a new method will be less accurate than the original method on a specific data problem.

In their proof-of-concept paper, the researchers describe and evaluate the c-value using real-world data analysis problems: modeling ocean currents, estimating violent crime in neighborhoods, and approximating student reading ability at schools. They show how the c-value could help statisticians and data analysts achieve more accurate results by indicating when to use alternative estimation methods they otherwise might have ignored.

"What we are trying to do with this particular work is come up with something that is data specific. The classical notion of risk is really natural for someone developing a new method. That person wants their method to work well for all of their users on average. But a user of a method wants something that will work on their individual problem. We've shown that the c-value is a very practical proof-of-concept in that direction," says senior author Tamara Broderick, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society.

She's joined on the paper by Brian Trippe PhD '22, a former graduate student in Broderick's group who is now a postdoc at Columbia University; and Sameer Deshpande '13, a former postdoc in Broderick's group who is now an assistant professor at the University of Wisconsin at Madison. An accepted version of the paper is posted online in the Journal of the American Statistical Association.

Evaluating estimators
The c-value is designed to help with data problems in which researchers seek to estimate an unknown parameter using a dataset, such as estimating average student reading ability from a dataset of assessment results and student survey responses. A researcher has two estimation methods and must decide which to use for this particular problem.

The better estimation method is the one that results in less "loss," which means the estimate will be closer to the ground truth. Consider again the forecasting of ocean currents: Perhaps being off by a few meters per hour isn't so bad, but being off by many kilometers per hour makes the estimate useless. The ground truth is unknown, though; the scientist is trying to estimate it. Therefore, one can never actually compute the loss of an estimate for their specific data. That's what makes comparing estimates challenging. The c-value helps a scientist navigate this challenge.
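
One way to write this comparison down is sketched below. The notation is an assumption made for illustration, since the article gives no symbols, and squared error is only one possible choice of loss.

```latex
% Assumed notation for illustration; the article does not define these symbols.
% \theta: unknown ground truth, y: observed dataset,
% \hat{\theta}(y): estimate from the original method,
% \theta^{*}(y): estimate from the alternative method.
\[
  L(\theta, \hat{\theta}) = \lVert \hat{\theta} - \theta \rVert^{2}
\]
\[
  W(\theta, y) = L\bigl(\theta, \hat{\theta}(y)\bigr) - L\bigl(\theta, \theta^{*}(y)\bigr)
\]
```

Under this notation, the alternative estimate is the more accurate one exactly when $W(\theta, y) > 0$. Because $\theta$ is unknown, $W$ cannot be evaluated directly, so any comparison has to rest on what the data $y$ alone can support.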

The c-value procedure uses the dataset twice: first to compute the estimate with each method, and then to compute the c-value comparing the two. If the c-value is large, it is unlikely that the alternative method yields a less accurate estimate than the original method.
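
Read as a decision rule, this is one-sided: the original method stays the fallback unless the c-value is large. A minimal sketch of that rule follows. It takes the c-value as an input and does not compute it. The function name `choose_estimate`, the 0.95 cutoff, and the assumption that the c-value lies between 0 and 1 are illustrative and are not taken from the paper.

```python
def choose_estimate(original_estimate, alternative_estimate, c_value, threshold=0.95):
    """Keep the original estimate unless the c-value clears a chosen cutoff.

    `c_value` is assumed to be a number in [0, 1] computed elsewhere;
    `threshold` is a hypothetical cutoff picked by the analyst.
    """
    if not 0.0 <= c_value <= 1.0:
        raise ValueError("c_value is assumed to lie in [0, 1]")
    if c_value > threshold:
        # High c-value: switching is unlikely to make the estimate less accurate.
        return "alternative", alternative_estimate
    # Low c-value: inconclusive, so stay with the original method.
    return "original", original_estimate


# Usage with made-up numbers.
print(choose_estimate(3.1, 2.7, c_value=0.99))
print(choose_estimate(3.1, 2.7, c_value=0.40))
```

The asymmetry is deliberate: a low c-value is not evidence that the original method is better, only a lack of evidence for switching.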

"In our case, we are assuming that you conservatively want to stay with the default estimator, and you only want to go to the new estimator if you feel very confident about it. With a high c-value, it's likely that the new estimate is more accurate. If you get a low c-value, you can't say anything conclusive. You might have actually done better, but you just don't know," Broderick explains.

Probing the theory
The researchers put that theory to the test by evaluating three real-world data analysis problems.

In the first, they used the c-value to help determine which approach is best for modeling ocean currents, a problem Trippe has been tackling. Accurate models are important for predicting the dispersion of contaminants, like pollution from an oil spill. The team found that estimating ocean currents using multiple scales, one larger and one smaller, likely yields higher accuracy than using only larger-scale measurements.

"Oceans researchers are studying this, and the c-value can provide some statistical 'oomph' to support modeling the smaller scale," Broderick says.

In another example, the researchers sought to predict violent crime in census tracts in Philadelphia, an application Deshpande has been studying. Using the c-value, they found that one could get better estimates about violent crime rates by incorporating information about census-tract-level nonviolent crime into the analysis. They also used the c-value to show that additionally leveraging violent crime data from neighboring census tracts in the analysis isn't likely to provide further accuracy improvements.

"That doesn't mean there isn't an improvement, that just means that we don't feel confident saying that you will get it," she says.

Now that they have proven the c-value in theory and shown how it could be used to tackle real-world data problems, the researchers want to expand the measure to more types of data and a wider set of model classes.

The ultimate goal is to create a measure that is general enough for many more data analysis problems, and while there is still a lot of work to do to realize that objective, Broderick says this is an important and exciting first step in the right direction.

This research was supported, in part, by an Advanced Research Projects Agency-Energy grant, a National Science Foundation CAREER Award, the Office of Naval Research, and the Wisconsin Alumni Research Foundation.

Research Report: "Confidently comparing estimates with the c-value"
