TECH SPACE
'Tensor algebra' software speeds big-data analysis 100-fold
by Staff Writers
Boston MA (SPX) Nov 01, 2017


A new MIT computer system speeds computations involving "sparse tensors," multidimensional data arrays that consist mostly of zeroes.

We live in the age of big data, but most of that data is "sparse." Imagine, for instance, a massive table that mapped all of Amazon's customers against all of its products, with a "1" for each product a given customer bought and a "0" otherwise. The table would be mostly zeroes.

With sparse data, analytic algorithms end up doing a lot of addition and multiplication by zero, which is wasted computation. Programmers get around this by writing custom code to avoid zero entries, but that code is complex, and it generally applies only to a narrow range of problems.
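To make the waste concrete, here is a rough sketch (illustrative only, not code from the paper) contrasting a dense dot product, which multiplies every entry including the zeroes, with the kind of hand-written variant that stores and visits only the nonzero entries:

```cpp
#include <cstddef>
#include <vector>

// Dense dot product: touches every entry, so a mostly-zero vector
// wastes most of its multiplications and additions on zeroes.
double dot_dense(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// Hand-written sparse variant: 'a' is stored as (index, value) pairs for its
// nonzero entries only, so the zero entries are never visited at all.
struct SparseVec {
    std::vector<std::size_t> idx;  // positions of nonzero entries
    std::vector<double>      val;  // the nonzero values themselves
};

double dot_sparse(const SparseVec& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t k = 0; k < a.idx.size(); ++k)
        sum += a.val[k] * b[a.idx[k]];
    return sum;
}
```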

At the Association for Computing Machinery's Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), researchers from MIT, the French Alternative Energies and Atomic Energy Commission, and Adobe Research recently presented a new system that automatically produces code optimized for sparse data.

That code offers a 100-fold speedup over existing, non-optimized software packages. And its performance is comparable to that of meticulously hand-optimized code for specific sparse-data operations, while requiring far less work on the programmer's part.

The system is called Taco, for tensor algebra compiler. In computer-science parlance, a data structure like the Amazon table is called a "matrix," and a tensor is just a higher-dimensional analogue of a matrix. If that Amazon table also mapped customers and products against the customers' product ratings on the Amazon site and the words used in their product reviews, the result would be a four-dimensional tensor.
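As a toy illustration of that dimensional jump (the sizes and index values below are made up for the example), a matrix entry is addressed by two indices, while an entry of the four-dimensional tensor needs four:

```cpp
#include <cstddef>
#include <vector>

int main() {
    // Toy dimensions, purely illustrative.
    const std::size_t customers = 3, products = 4, ratings = 5, words = 6;

    // A matrix entry is addressed by two indices: (customer, product).
    std::vector<std::vector<double>> table(customers, std::vector<double>(products, 0.0));
    table[1][2] = 1.0;  // customer 1 bought product 2

    // A fourth-order tensor is the same idea with four indices:
    // (customer, product, rating, review word).
    std::vector<double> tensor(customers * products * ratings * words, 0.0);
    auto at = [&](std::size_t c, std::size_t p, std::size_t r, std::size_t w) -> double& {
        return tensor[((c * products + p) * ratings + r) * words + w];
    };
    at(1, 2, 4, 3) = 1.0;  // one nonzero entry among mostly zeroes
    return 0;
}
```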

"Sparse representations have been there for more than 60 years," says Saman Amarasinghe, an MIT professor of electrical engineering and computer science (EECS) and senior author on the new paper.

"But nobody knew how to generate code for them automatically. People figured out a few very specific operations - sparse matrix-vector multiply, sparse matrix-vector multiply plus a vector, sparse matrix-matrix multiply, sparse matrix-matrix-matrix multiply. The biggest contribution we make is the ability to generate code for any tensor-algebra expression when the matrices are sparse."

Joining Amarasinghe on the paper are first author Fredrik Kjolstad, an MIT graduate student in EECS; Stephen Chou, also a graduate student in EECS; David Lugato of the French Alternative Energies and Atomic Energy Commission; and Shoaib Kamil of Adobe Research.

Custom kernels
In recent years, the mathematical manipulation of tensors - tensor algebra - has become crucial not only to big-data analysis but also to machine learning. And it has been a staple of scientific research since Einstein's time.

Traditionally, to handle tensor algebra, mathematics software has decomposed tensor operations into their constituent parts. So, for instance, if a computation required two tensors to be multiplied and then added to a third, the software would run its standard tensor multiplication routine on the first two tensors, store the result, and then run its standard tensor addition routine.
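In that decomposed style, a computation such as A = B x C + D would call one routine, write the full intermediate product to memory, and then call a second routine on it. A simplified dense-matrix sketch of the pattern (not code from the paper) looks like this:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Standard multiplication routine: materializes its full result in memory.
Matrix multiply(const Matrix& B, const Matrix& C) {
    std::size_t n = B.size(), m = C[0].size(), k = C.size();
    Matrix out(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t p = 0; p < k; ++p)
                out[i][j] += B[i][p] * C[p][j];
    return out;
}

// Standard addition routine, run afterwards on the stored intermediate.
Matrix add(const Matrix& X, const Matrix& D) {
    Matrix out = X;
    for (std::size_t i = 0; i < out.size(); ++i)
        for (std::size_t j = 0; j < out[i].size(); ++j)
            out[i][j] += D[i][j];
    return out;
}

// Decomposed evaluation of A = B*C + D: the intermediate B*C is written to
// memory by multiply() and then read back by add().
Matrix evaluate(const Matrix& B, const Matrix& C, const Matrix& D) {
    Matrix intermediate = multiply(B, C);  // first pass over the data
    return add(intermediate, D);           // second pass over the data
}
```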

In the age of big data, however, this approach is too time-consuming. For efficient operation on massive data sets, Kjolstad explains, every sequence of tensor operations requires its own "kernel," or computational template.

"If you do it in one kernel, you can do it all at once, and you can make it go faster, instead of having to put the output in memory and then read it back in so that you can add it to something else," Kjolstad says. "You can just do it in the same loop."

Computer science researchers have developed kernels for some of the tensor operations most common in machine learning and big-data analytics, such as those enumerated by Amarasinghe. But the number of possible kernels is infinite: The kernel for adding together three tensors, for instance, is different from the kernel for adding together four, and the kernel for adding three three-dimensional tensors is different from the kernel for adding three four-dimensional tensors.

Many tensor operations involve multiplying an entry from one tensor with one from another. If either entry is zero, so is their product, and programs for manipulating large, sparse matrices can waste a huge amount of time adding and multiplying zeroes.

Hand-optimized code for sparse tensors identifies zero entries and streamlines operations involving them - either carrying forward the nonzero entries in additions or omitting multiplications entirely. This makes tensor manipulations much faster, but it requires the programmer to do a lot more work.
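A classic example of such hand-optimized code is sparse matrix-vector multiplication over a compressed sparse row (CSR) layout, sketched below; the structure is standard, though real library versions add far more bookkeeping:

```cpp
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) storage: only nonzero values are kept, together
// with their column indices and per-row extents.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_start;  // rows + 1 entries; row i's nonzeros
                                         // live in [row_start[i], row_start[i+1])
    std::vector<std::size_t> col;        // column index of each nonzero
    std::vector<double>      val;        // value of each nonzero
};

// y = A * x, visiting only the stored nonzeros; the zero entries of A are
// never multiplied or added at all.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k)
            y[i] += A.val[k] * x[A.col[k]];
    return y;
}
```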

The code for multiplying two matrices - a simple type of tensor, with only two dimensions, like a table - might, for instance, take 12 lines if the matrix is full (meaning that none of the entries can be omitted). But if the matrix is sparse, the same operation can require 100 lines of code or more, to track omissions and elisions.

Enter Taco
Taco adds all that extra code automatically. The programmer simply specifies the size of a tensor, whether it's full or sparse, and the location of the file from which it should import its values. For any given operation on two tensors, Taco builds a hierarchical map that indicates, first, which paired entries from both tensors are nonzero and, then, which entries from each tensor are paired with zeroes. All pairs of zeroes it simply discards.
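In practice that workflow looks roughly like the sketch below, which is modeled on the taco library's published C++ examples; exact class and method names may differ between versions, so treat it as illustrative rather than definitive:

```cpp
#include "taco.h"
using namespace taco;

int main() {
  // Storage formats: declare which dimensions are dense and which are sparse.
  Format csr({Dense, Sparse});  // compressed sparse row matrix
  Format dv({Dense});           // ordinary dense vector

  // A small sparse matrix; only the inserted nonzeros are stored.
  Tensor<double> A({3, 3}, csr);
  A.insert({0, 1}, 2.0);
  A.insert({2, 0}, 5.0);
  A.pack();

  // A dense input vector and the output vector.
  Tensor<double> x({3}, dv);
  x.insert({0}, 1.0);
  x.insert({1}, 1.0);
  x.insert({2}, 1.0);
  x.pack();
  Tensor<double> y({3}, dv);

  // State the whole expression in index notation; Taco generates a fused,
  // sparsity-aware kernel for it automatically.
  IndexVar i, j;
  y(i) = A(i, j) * x(j);

  y.compile();   // generate and compile the custom kernel
  y.assemble();  // work out the sparsity structure of the output
  y.compute();   // run the generated kernel
  return 0;
}
```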

Taco also uses an efficient indexing scheme to store only the nonzero values of sparse tensors. With zero entries included, a publicly released tensor from Amazon, which maps customer ID numbers against purchases and descriptive terms culled from reviews, takes up 107 exabytes of data, or roughly 10 times the estimated storage capacity of all of Google's servers. But using the Taco compression scheme, it takes up only 13 gigabytes - small enough to fit on a smartphone.
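The simplest version of such a scheme is coordinate storage, which keeps one index tuple and one value per nonzero entry, so memory scales with the number of nonzeros rather than with the product of the dimensions. Taco's actual hierarchical formats are more elaborate, but the sketch below captures the basic idea:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Coordinate (COO) storage for a third-order tensor: one (i, j, k) index
// tuple and one value per nonzero entry. Zero entries cost nothing to store.
struct SparseTensor3 {
    std::array<std::size_t, 3> dims{};               // logical dimensions
    std::vector<std::array<std::size_t, 3>> coords;  // index tuples of nonzeros
    std::vector<double> values;                      // matching nonzero values

    void insert(std::size_t i, std::size_t j, std::size_t k, double v) {
        coords.push_back({i, j, k});
        values.push_back(v);
    }
};
```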

Research Report: The tensor algebra compiler


Related Links
Massachusetts Institute of Technology
Space Technology News - Applications and Research


