The value of good data
Good data translates into a good customer experience. The more you know about your customers, the better you can deliver the experience they expect. Data has become an invaluable business asset, guiding human decision-making and fueling artificial intelligence. However, data has increased in complexity and is no longer limited to simple spreadsheets. Instead, it lives in many different forms. That’s why data annotation – the human process of labeling text, images, sound, and video clips so that computers understand what they’re seeing – has never been more important.
By 2025, around 463 exabytes of data will be created daily across the world, according to Visual Capitalist. To put that in perspective, each exabyte translates to 1 billion gigabytes. There is no doubt that some of this data will be completely unusable. But the most successful brands will be those that capture the hidden value of data and translate it into impactful decision-making and smarter customer experiences.
HOW WE GOT HERE: A BRIEF HISTORY OF DATA
To understand the value of data annotation, we need to look at how data became so valuable to begin with. Data has been collected and analyzed in one form or another for millennia, but in the 1800s, encoded punch cards helped enable the processing of data by machines. Data storage advanced in the 1900s with magnetic tape (which led to floppy disks and hard drives). The proliferation of the Internet in the 1990s paved the way for more accessible data and greater diversity in the types of data collected.
Since then, we have seen data evolve rapidly. We’ve gone from collecting data through simple feedback surveys with yes-or-no answers and sales figures that track product popularity to advanced web analytics and unstructured user-generated content like videos, images, audio and an abundance of social media posts. And along with the diversification of data, artificial intelligence and machine learning have played a major role in making sense of this data, helping to diagnose disease and drive vehicles autonomously.
The technology also enables a data-driven approach to customer experience. Starbucks, for example, uses AI to analyze data such as location demographics, population density, income levels, and traffic patterns to decide where to establish new stores.
Data science helps game companies improve the player experience and personalize their marketing strategies. It helps retailers and telecommunications companies visualize consumer behavior and gain a deeper understanding of the customer journey. For fintech and financial services, it’s an essential defense against fraud.
However, the types of challenges we face with data have also evolved. The question has shifted from how to collect and analyze data to how to do it transparently, ethically and without bias, while ensuring that we store this data in a secure and confidential manner. There is also the glaring challenge of making sure the data is useful.
WHAT IS THE VALUE OF YOUR DATA?
Artificial intelligence and machine learning quietly underpin many of our daily activities. Machine learning helps deliver Google search results and powers iPhone facial recognition. AI-based chatbots respond quickly to customer questions and operate smart home systems. But all of this is made possible by human-labeled data sets. Computers only learn from what we expose them to, and the effects of bad data can trickle down into everything they do.
Without well-sourced and precisely labeled data, algorithms can underperform, a factor that weighs more and more heavily in today’s hyper-competitive environment. MIT researchers recently found ‘systemic’ labeling errors in popular AI benchmark datasets – datasets that are used to train new AI systems and tell them what to look for in future datasets, fueling the prediction process.
For example, image tagging errors included one breed of dog mistaken for another and a Roman statue classified as nudity. Sentiment annotations for Amazon product reviews labeled some positive reviews as negative. Annotations for YouTube videos included an Ariana Grande high note labeled as a whistle.
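One common way to surface labeling errors like these is to have several annotators label the same item and send disagreements back for human review. The sketch below illustrates that idea only; it is not the method the MIT researchers used, and the function name `review_labels`, the `min_agreement` threshold and the sample reviews are all hypothetical.

```python
from collections import Counter


def review_labels(annotations, min_agreement=1.0):
    """Pick a majority label per item and flag items with low agreement.

    annotations: dict mapping an item id to the list of labels it received
    min_agreement: share of annotators that must agree with the majority
                   label for the item to pass without review (assumed value)
    """
    consensus = {}
    flagged = []
    for item_id, labels in annotations.items():
        # Most frequent label and how many annotators chose it
        top_label, top_count = Counter(labels).most_common(1)[0]
        consensus[item_id] = top_label
        if top_count / len(labels) < min_agreement:
            flagged.append(item_id)
    return consensus, flagged


# Made-up sentiment labels from three annotators per product review
annotations = {
    "review_001": ["positive", "positive", "positive"],
    "review_002": ["positive", "negative", "positive"],
    "review_003": ["negative", "negative", "negative"],
}

consensus, flagged = review_labels(annotations)
print(consensus)
print(flagged)
```

Lowering `min_agreement` lets more items through without review, so the threshold is a trade-off between annotation cost and label quality.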
Some of the implications of bad data may be trivial. But other data labels can have significant consequences due to gender or racial biases. A recent article in MIT Technology Review on how our data encodes systematic racism and lacks diversity says, “CelebA’s face dataset has ‘big noses’ and ‘big lips’ labels that are disproportionately attributed to the faces of darker skinned women”, while data sets for detecting skin cancer have been missing samples of darker skin types. As our data-powered world moves towards an AI-driven future, proper representation and diversity in data sets will not only be the inclusive thing to do, but vital for performance and reach.
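A simple first check for this kind of skew is to compare how often a label is applied within each demographic group. The following is a minimal sketch under stated assumptions: the group names, labels and records are invented toy data (not drawn from CelebA or any real dataset), and `label_rate_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def label_rate_by_group(records, label):
    """Return the share of items in each group that carry the given label.

    records: iterable of (group, labels) pairs, where labels is a set
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, labels in records:
        totals[group] += 1
        if label in labels:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented toy records: (demographic group, labels applied by annotators)
records = [
    ("group_a", {"smiling"}),
    ("group_a", {"smiling", "big_lips"}),
    ("group_b", {"big_lips"}),
    ("group_b", {"big_lips", "smiling"}),
]

rates = label_rate_by_group(records, "big_lips")
print(rates)
```

A large gap between groups does not prove bias on its own, but it indicates where a human reviewer should look more closely.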
GET THE RIGHT DATA
Quality data annotation is fundamental if brands are to unlock the full potential of AI and machine learning. Just as data should be sourced carefully, it should also go through a rigorous annotation and labeling process to avoid damaging bias. Left unchecked, bias can escalate and become part of the machine’s decision-making, not only disrupting the customer experience, but perpetuating racism as well.
Take, for example, a study by researchers at George Washington University on Chicago ride-hailing trips and census data. The researchers found a significant disparate impact in fare pricing across neighborhoods, which they attributed to AI bias learned from ride-hailing usage patterns associated with demographic attributes. Data that carries such biases diminishes the value of a service or solution.
Achieving absolute zero bias in the data may not be possible. However, collecting good data today enables the technology of tomorrow to make more precise decisions. This is where a diverse team – or, in the case of TELUS International, a large crowd of data annotators – can help get the most out of machine learning programs.
Siobhan Hanna is the Managing Director of Global AI Data Solutions at TELUS International. The AI Data Solutions team draws on a global crowd of over one million community members to help organizations train and test machine learning models.