Go beyond the jargon to get the most out of AI
The language of AI is constantly evolving, so it can feel difficult to keep up. This glossary is designed to guide you through key AI terms you need to know when working in data and analytics.
AI Assistants (AI Copilot)
Algorithm
Augmented Analytics
Augmented Data Integration
Augmented Data Quality
Auto-Classification
AutoML
Automated Insights
Balancing
Bias
Citizen Data Scientist
Classification
Cognitive BI
Conversational AI
Data Cleaning
Data Foundation
Data Governance
Data Labeling
Data Preparation
Data Provenance
Data Quality
Data Quality Intelligence
Data Science
Deep Learning
Experiment
Explainable AI
Forecasting
Foundational Model
Generative AI
Key Driver Analysis
Large Language Model (LLM)
Low-Code/No-Code
Machine Learning
Model Authoring
Model Deployment
Model Drift
Model Training
Natural Language Query (NLQ)
Neural Network
Predictive AI
Predictive Analytics
Prompt
Regression
Self-Supervised and Unsupervised Learning
Sentiment Analysis
Shapley Values
Structured and Unstructured Data
Supervised Learning
Synthetic Data
Time Series Analysis
Training Data
What-If Scenarios
AI Assistants (AI Copilot)
Hey Siri, what is an AI assistant? These AI-powered solutions can understand and respond to natural language commands. The human-like interface means that people can give voice prompts to get the assistant to perform tasks and provide tailored responses. AI assistants are increasingly common and have been built into many modern smart devices, including Amazon’s Alexa, Apple’s Siri, Microsoft’s Copilot, and the Google Assistant – making them the primary way that consumers are exposed to AI.
Algorithm
Every AI model needs a pathway or set of instructions to follow. These are algorithms, which tell a model how to operate and determine its ability to learn. An algorithm does this by taking in training data and using it as the basis on which to execute tasks. As it completes more tasks, it can measure how close its outputs are to the desired output and refine the process accordingly. Because algorithms can self-refine in this way, programmers often need to manage them closely to ensure they are acting as intended and to avoid bias creeping in. Machine learning and AI today use many different algorithms, each suited to different tasks. Data scientists and machine learning engineers must understand how and why to apply a certain algorithmic approach based on the problem, data, and goals at hand.
Augmented Analytics
Augmented analytics enables data users to interact with information at a level that enhances human cognition. According to Gartner, this involves the ‘use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation and insight explanation.’ These enablers take analytics capabilities to the next level, enhancing the depth at which people can explore data. Ultimately, this expands a human’s ability to interact with data at a contextual level and enables a broader group of people to use analytics tools.
Augmented Data Integration
Data integration tasks are often very time consuming for data professionals. While they are important, they are relatively low-value duties that don’t make best use of professionals’ advanced skills. Augmented data integration uses AI to speed up these processes, freeing data experts of this burden. According to Gartner, automating the data management mix can cut down the time spent on manual data processes by 45 percent. This enables organizations to scale their operations and allow data experts to focus on more complex or mission-critical tasks.
Augmented Data Quality
Data quality is a crucial component of an organization’s data governance initiatives, measuring datasets for accuracy, completeness, bias, and whether they are fit for purpose. Without validating the quality of the information being analyzed, automated systems will not work as intended. The introduction of automation into this process – or augmented data quality – speeds things up. It is particularly useful for validating large datasets, which can be time-consuming to do manually. This enables the best of both worlds – high quality data and better efficiency for essential, but potentially lengthy tasks.
Auto-Classification
Analytics processes aren’t always about revealing granular insights from complex datasets. For some organizational needs, it could be sufficient to categorize documents or other assets in broad categories. Like sentiment analysis, this can still be enhanced by AI in the form of auto-classification. Based on pre-defined criteria, auto-classification scans documents and assigns relevant tags and labels without the need for human assistance.
This can be tailored to an organization’s specific needs by adapting the keywords the model scans for. Auto-classification is hugely beneficial for content management by enabling users to rapidly sort content into categories, and is fundamental to unlocking the potential of AI-powered search.
AutoML
AutoML™ is a solution designed to help analytics teams use machine learning (ML) without expert knowledge. Gartner describes AutoML as the automation of data preparation, feature engineering, and model engineering tasks – three tasks traditionally requiring specific expertise to complete. It empowers developers with limited ML expertise to complete highly complex tasks and train high-quality models specific to their business needs, helping them move from historical to predictive analytics for more use cases.
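AutoML products automate far more than any short snippet can show, but as a rough illustration of the core idea, the sketch below automatically tries a few candidate models and keeps the one with the best cross-validated score; the dataset and candidates are just examples.

```python
# A minimal sketch of the idea behind AutoML-style automation: try several
# candidate models automatically and keep the best cross-validated performer.
# (Illustrative only -- real AutoML also automates data prep and feature engineering.)
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Score every candidate the same way and keep the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(f"Best model: {best} (accuracy: {scores[best]:.3f})")
```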
Automated Insights
Automated insights make it possible for organizations to have their cake and eat it too. By using AI, data users get easily understandable, actionable intelligence from large volumes of data, without manual analysis that depends on technical expertise.
In particular, this empowers non-technical employees to make faster business decisions. Automated insights solutions can be underpinned by machine learning, natural language processing, or statistical analysis, to identify key findings and suggested actions in a digestible way.
Balancing
Sometimes datasets are not balanced: one class significantly outnumbers the others. This is called class imbalance, and it needs to be accounted for in the machine learning process. Balancing is a very common task for data scientists; if a dataset is left imbalanced, it can significantly reduce the accuracy of analysis or lead to biased AI models. Several methods can be used to address this, such as resampling or generating synthetic data. Choosing the most appropriate one depends on the specific characteristics of the dataset.
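As a minimal sketch of one balancing approach, the example below oversamples the minority class with pandas and scikit-learn; the data and column names are invented.

```python
# A minimal balancing sketch: oversample the minority class so both classes
# are equally represented. Data and column names are illustrative only.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({
    "amount":   [10, 12, 9, 11, 13, 950, 8, 10],
    "is_fraud": [0, 0, 0, 0, 0, 1, 0, 0],   # heavily imbalanced target
})

majority = df[df["is_fraud"] == 0]
minority = df[df["is_fraud"] == 1]

# Oversample the minority class until it matches the majority class size.
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["is_fraud"].value_counts())
```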
Bias
Bias in AI is like a hidden trap that can skew your model’s predictions and decisions. It creeps in from various sources, including the training data, the algorithms, or even how the AI is deployed. If your training data has biases, such as underrepresenting certain groups, your AI will likely repeat those biases. That’s why it's essential for data scientists to spot and fix bias using techniques like balanced datasets and fairness checks, ensuring your AI operates fairly and transparently.
Citizen Data Scientist
Analytics, AI, and data are no longer the exclusive purview of technical specialists. That much is clear, as an increasingly wide variety of industries and job roles use data to inform their work. There is a new role emerging – the ‘citizen data scientist’ — which refers to business analysts who engage with data science in their role, without being dedicated specialists in coding or statistics.
In practice, citizen data scientists tend to be more business-oriented, bridging the gap between analytics and strategic decision-making. They augment the work of data scientists, using their insights to help organizations make everyday data-driven decisions and leaving specialists to focus on the most technical work.
Classification
The classification process uses machine learning to automate the basic sorting of data or documents into preset groups. Imagine you throw a load of unpaired socks into a drawer, but when you next open the drawer, they are automatically paired and categorized as sports, work, and casual. That is what classification algorithms are capable of. To be successful, the models need to be trained to map certain tags or keywords to the correct categories. This teaches them how to autonomously classify future datasets into the relevant categories. In practice, classification algorithms provide a powerful way for businesses to gain predictive insights and find hidden patterns.
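A minimal scikit-learn sketch of the idea, using a standard sample dataset: the model learns from labeled examples and then sorts new records into the preset classes.

```python
# A minimal classification sketch: train on labeled examples, then assign
# categories to new, unseen records.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The trained model can now sort new measurements into the preset classes.
print(model.predict(X_test[:5]))                 # predicted class labels
print(f"Accuracy: {model.score(X_test, y_test):.2f}")
```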
Cognitive BI
To empower organizations to embrace data-led decision-making, cognitive BI combines traditional business intelligence (BI) processes with cognitive computing technologies like AI and natural language processing. This combination of technologies transforms how data is used across organizations, from marketing to finance, equipping teams with accessible, actionable, and valuable data-driven insights.
Conversational AI
For AI to have mass consumer applications, it must be able to understand, process, and simulate human language. This is where conversational AI comes in, a type of model that makes it possible for humans to have a dialogue with AI. Conversational AI is regularly used by customer service teams in the form of chatbots, where they can answer questions and troubleshoot issues by looking for keywords and providing predefined responses. This technology also underpins conversational analytics, which empowers organizations to learn from customer interactions by understanding and deriving data from human conversations.
Data Cleaning
Data cleaning is like giving your data a good scrub, getting rid of errors and inconsistencies. This includes removing duplicates, fixing inaccuracies, handling missing values, and ensuring uniform formats. Clean data is crucial for accurate AI models; dirty data can result in misleading insights. By keeping your data clean, you ensure it’s ready for analysis, and your AI models can deliver trustworthy results.
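A minimal pandas sketch of those common cleaning steps; the table and column names are invented.

```python
# A minimal data-cleaning sketch: drop duplicates, fix an inconsistent
# format, and handle missing values. Data and column names are made up.
import pandas as pd

df = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", "Cara"],
    "country":  ["US", "US", "usa", None],
    "spend":    [120.0, 120.0, None, 85.5],
})

clean = (
    df.drop_duplicates()                                   # remove duplicate rows
      .assign(
          country=lambda d: d["country"].str.upper()       # enforce a uniform format
                             .replace({"USA": "US"})
                             .fillna("UNKNOWN"),
          spend=lambda d: d["spend"].fillna(d["spend"].mean()),  # impute missing values
      )
)
print(clean)
```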
Data Foundation
Think of a data foundation as the rock-solid base that holds up all your AI efforts. It covers everything from data collection and storage to management and analysis. This includes having strong data governance, top-notch data pipelines, secure storage, and efficient processing tools. With a robust data foundation, your data is accurate, consistent, and ready for action, helping you unlock AI’s full potential for smarter insights and better decisions.
Data Governance
For data analytics to be effective, organizations need to set internal rules and standards that establish how data is gathered, stored, processed, and disposed of. Data governance is the broad term that covers this responsibility. As well as providing more assurances on data security, having the right governance structures in place demonstrates that the data being held by an organization is trustworthy and isn’t being used inappropriately. The increasing role of data in commercial strategy has led to evolving data privacy regulations for businesses to adhere to, making data governance increasingly critical.
Data Labeling
Data labeling is the process of making data understandable and usable by machine learning algorithms, by adding descriptive or informative labels or tags. By labeling data, algorithms can be trained to make predictions or classifications based on this information. In turn, the algorithm can then begin to make accurate predictions when presented with new, unlabeled data.
Data Preparation
Data preparation is the magic that turns raw data into gold. It involves cleaning, structuring, and enriching raw data to make it ready for AI model training. Proper data preparation ensures your data is accurate and consistent, setting the stage for high-performing AI. Investing in thorough data prep means better analytical insights and more effective AI-driven results.
Data Provenance
Data provenance is like a detailed diary for your data, tracking its journey from origin to final use. It records where the data comes from, how it’s transformed, and how it’s used, ensuring transparency and accountability. Knowing your data’s history is crucial for verifying its quality, staying compliant with regulations, and simplifying troubleshooting. By keeping detailed records of data provenance, you can trust your data and the AI models built on it.
Data Quality
Data quality is all about making sure your data is up to scratch – accurate, complete, consistent, reliable, and timely. High-quality data is the key to AI models that deliver valid and actionable insights. If your data is flawed, your AI's predictions will be too. That’s why organizations need to practice good data hygiene, with regular cleaning, validation, and monitoring to keep their data in top shape and ensure their AI is spot on.
Data Quality Intelligence
An AI model is only as good as the data that feeds it. If the model has been trained using poor-quality data, it can produce inaccurate and unreliable results. Data quality intelligence is the cornerstone of any good data management strategy: it involves analyzing the strength of an organization’s data and its data management practices. It’s essential that organizations act on these insights; without doing so, they cannot guarantee that data-driven decisions are based on trustworthy information.
Data Science
Extracting valuable insights and predictions from data – particularly large volumes of data – requires a combination of multiple disciplines such as statistics, computer science, and mathematics. This combination is data science, the formula that turns raw data into information that can be used to identify trends and inform decisions.
Data science professionals, or data scientists, typically have advanced data skills, but can be deployed in a range of situations, from data collection to modeling to deployment of an AI model in a business context. It’s a discipline that has quickly found a home in most industries, as organizations explore how they can use data more effectively to inform decision-making.
Deep Learning
All types of AI attempt to mimic the decision-making of the human brain, but some sub-disciplines take more inspiration than others. For example, deep learning is a type of machine learning based on neural networks – both in its structure and in its use of interconnected layers of artificial neurons to process and learn from data.
According to Forrester, the technique is best used to ‘build, train and test neural networks that probabilistically predict outcomes and/or classify unstructured data.’ In practice, this means deep learning underpins image and speech recognition, language translation, autonomous driving, and recommendation systems.
Experiment
An experiment is the structured process used to train, assess, and perfect machine learning models. It enables data scientists and users of AutoML to organize and manage all their machine learning executions, referred to as ‘runs’. This exercise often takes multiple iterations to get the model to work as desired, typically involving visualization and comparison between runs. Experiments play a pivotal role in discovering patterns, fine-tuning the model, and ensuring it is ready for real-world deployment.
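As a rough illustration of organizing runs, the sketch below records each run’s parameters and cross-validated score so runs can be compared; the dataset and parameter grid are just examples, and dedicated experiment-tracking tools handle this at much larger scale.

```python
# A minimal sketch of an experiment: several 'runs' with different settings,
# each recorded with its parameters and score so they can be compared.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

runs = []
for n_estimators in (10, 50, 200):                 # one run per parameter setting
    score = cross_val_score(
        RandomForestClassifier(n_estimators=n_estimators, random_state=0),
        X, y, cv=5).mean()
    runs.append({"n_estimators": n_estimators, "cv_accuracy": round(score, 3)})

# Compare runs to pick the configuration worth deploying.
for run in sorted(runs, key=lambda r: r["cv_accuracy"], reverse=True):
    print(run)
```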
Explainable AI
AI is great at generating insights and predicting outcomes. But it’s equally important, if not more so, to understand how it came to its conclusions. Explainability is critical to determine how best to take actions to affect outcomes, foster understanding, and build trust. But not all machine learning models are explainable. So when deciding which algorithms to use, we must weigh the need for explainability against the context in which decisions are made. Without explainability, organizations may introduce unintended biases into the decision-making process.
Forecasting
A fundamental benefit of using AI for data analysis is its ability to take historical information and make predictions within a specific time window. These time-series forecasts take a historical pattern of data and use univariate or multivariate regressions to predict future results.
AI planning programs can forecast future events by assessing vast amounts of structured and unstructured data, making connections, and discovering patterns in ways that far exceed traditional forecasting systems. This not only enables organizations to make forward-thinking decisions, but consider a variety of scenarios in case things don’t go as planned.
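A minimal sketch of the idea with invented figures: fit a simple trend to historical values and extrapolate it forward. Real forecasting systems use far richer models and much more data.

```python
# A minimal forecasting sketch: fit a linear trend to historical monthly
# values and extrapolate it three periods ahead. Figures are invented.
import numpy as np

history = np.array([102, 108, 115, 119, 127, 131, 140, 146])  # past periods
t = np.arange(len(history))

# Fit a simple linear trend (a univariate regression on time).
slope, intercept = np.polyfit(t, history, deg=1)

future_t = np.arange(len(history), len(history) + 3)
forecast = slope * future_t + intercept
print(np.round(forecast, 1))   # projected values for the next three periods
```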
Foundational Model
Before a generative AI model is developed with training data and a specific application in mind, it is called a foundational model. Foundational models are the basis on which more complex algorithms are built, adapted depending on where the business wants to deploy them. Typically, they are large-scale and have been trained on extensive datasets. Prominent examples include OpenAI’s GPT-3 and GPT-4, which underpin ChatGPT.
Generative AI
Although AI has infinite applications, it’s not traditionally considered as a substitute for creative roles. That's until generative AI came along. Generative models are specifically designed to synthesize new content like text, audio, images, video, or music.
However, generative AI still requires human prompts and huge volumes of data to learn the patterns and structures necessary to generate new, complex content. This branch of AI has not been without its ethical concerns; it has often been used to blur the lines between fabrication and reality, most notably through deepfakes.
Key Driver Analysis
Key driver analysis (KDA) identifies the key factors that are impacting a particular outcome and weighs their relative importance in predicting it. Its most common application is when conducting market research or customer relations analysis, helping organizations understand the drivers behind consumer behavior and target business outcomes, like customer loyalty. AI can enhance this analysis by handling immensely complex datasets and identifying patterns and relationships, working backwards until the most important factors are found.
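A minimal sketch of the idea: fit a model to an outcome such as customer loyalty and rank the inputs by how strongly they drive it. The data, column names, and choice of model are illustrative only.

```python
# A minimal key driver analysis sketch: fit a model to an outcome and rank
# the inputs by how much each one drives the prediction. Data is invented.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

data = pd.DataFrame({
    "price":        [10, 12, 9, 14, 11, 13, 10, 15],
    "wait_minutes": [5, 20, 4, 25, 8, 18, 6, 30],
    "follow_ups":   [1, 0, 1, 0, 1, 0, 1, 0],
    "loyalty":      [9, 4, 9, 3, 8, 5, 9, 2],   # outcome to explain
})

X, y = data.drop(columns="loyalty"), data["loyalty"]
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Higher importance = stronger driver of the outcome in this model.
drivers = pd.Series(model.feature_importances_, index=X.columns)
print(drivers.sort_values(ascending=False))
```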
Large Language Model (LLM)
Ever wonder how AI is capable of creating content like text, music, images, and video? It’s LLMs that enable this. These deep learning algorithms underpin generative AI products and solutions like ChatGPT by learning from existing information to produce something new.
As Gartner says, LLMs are trained on ‘vast amounts’ of data, which they need to work effectively, inferring patterns and relationships between words and phrases to inform creative output. This means they require massive amounts of publicly available information from the internet, which enables products powered by LLMs to continue learning and improving as they are used.
Low-Code/No-Code
The rise of digital products and services has made software development one of the most sought-after skills for employers. However, demand still exceeds supply. To bypass this deficit, organizations use low-code or no-code platforms that enable non-technical users to contribute to the software development process – even with limited coding knowledge. This often takes the shape of modular, drag-and-drop style or wizard-based interfaces that allow people to build without needing to code.
Depending on its needs, a business could deploy low-code tools (which require limited involvement from expert developers) or no-code tools (which require no involvement from expert developers). This helps democratize the ability to create AI systems and ensures that organizations use expert developers’ time effectively.
Machine Learning
Machine learning is a sub-discipline of AI that enables computer systems to automatically learn from data without being explicitly programmed. This is ultimately used to uncover relationships, hidden patterns, and predictions. It’s usually a case of ‘the more data, the better’ — machine learning algorithms learn from their inputs and improve their outcomes with more information. This empowers applications like image recognition, natural language processing, supervised and unsupervised learning, and much more.
Model Authoring
The design and creation of AI models can be a long and complex process, requiring expertise in data science and machine learning techniques. Model authoring is the sequence of tasks required to develop a model that is ready for real-world applications, starting with the gathering and preparation of training data for the model and finishing with its deployment and maintenance. This requires a combination of technical skills, creativity, and problem-solving abilities.
Model Deployment
After an AI model is trained, it can be implemented in a real-world environment, wherein new or real-time data is run through the model to be ‘scored’ or make predictions. But letting it loose in a practical context is not enough – the model deployment process does not stop there. Once made available to end-users or other software systems, a model will be introduced to a wide range of new and unseen data, which will influence the patterns and connections it finds. AI models must be constantly evaluated and tested to ensure that they continue to deliver the desired results.
Model Drift
AI models are built on datasets, which are the information banks they use to make decisions and provide outputs. But if these datasets are not updated over time, it can lead to the degradation of a model because the assumptions on which they are based no longer hold true. This is ‘model drift’ – which leads to less accurate or relevant predictions, as well as an increase in false positives and negatives. If model drift is not detected and addressed quickly, it can compromise the integrity of models and the real-world applications and processes they inform.
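As a rough illustration of drift detection, the sketch below compares the distribution of a feature at training time with what the live system is now seeing, using a standard statistical test; the data is simulated.

```python
# A minimal drift check: compare the distribution of a feature at training
# time with the data arriving in production. A low p-value from the
# Kolmogorov-Smirnov test suggests the data has shifted.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=50, scale=5, size=1000)   # data the model learned from
live_feature = rng.normal(loc=58, scale=5, size=1000)       # data seen in production

stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print(f"Possible drift detected (p={p_value:.4f}) -- consider retraining.")
else:
    print("No significant drift detected.")
```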
Model Training
When developing AI systems, it’s important to expose each model to quality data and examples of correct associations or outcomes. This process of training models – essentially teaching them to recognize patterns, make predictions, or perform specific tasks — is necessary to inform their later outputs. As the context or organizational need around a model changes, and as the model continues to learn, further training may be required to offset exposure to less structured data. If the model is left to learn unchecked without periodic training, the risk of bias and poor-quality outputs increase.
Natural Language Query (NLQ)
As AI solutions get more advanced, they rely on increasing volumes of data. For them to be accessible to all, it’s important that users from non-technical backgrounds can make data queries using everyday language. This is where a natural language query (NLQ) comes in.
AI systems like virtual assistants use NLQ to analyze user input, search for relevant data, and provide an answer. Or as Gartner puts it, turning these inputs into ‘encoded, structured information.’ Depending on the sophistication of the NLQ solution, user queries can be made by text or verbally. This negates the need for any non-language-based inputs and ensures the AI systems are available to all.
Neural Network
Taking inspiration from the human brain, neural networks are the fundamental building blocks of AI and machine learning. These computational models are designed to process and learn from data and, not unlike synapses, consist of interconnected nodes (artificial neurons) organized into layers. There are three types of layers: an input layer, hidden layers, and an output layer. These layers are the basis of how neural networks learn and model complex relationships in data, enabling them to analyze those relationships at a non-linear level.
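A minimal NumPy sketch of that layered structure, with made-up weights: data flows from an input layer through a hidden layer to an output layer.

```python
# A minimal sketch of a neural network's layered structure: an input layer,
# one hidden layer, and an output layer, with weights connecting them.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = rng.normal(size=4)                 # input layer: 4 features
W1 = rng.normal(size=(4, 8))           # weights: input -> hidden (8 neurons)
W2 = rng.normal(size=(8, 1))           # weights: hidden -> output (1 neuron)

hidden = relu(x @ W1)                  # hidden layer applies a non-linearity
output = sigmoid(hidden @ W2)          # output layer produces a prediction
print(output)                          # a probability-like score
```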
Predictive AI
A key enabler of data-led decision making, predictive AI identifies patterns in historical data to generate forecasts, predictions, or estimates about future events or outcomes. AI algorithms can analyze vast and complex datasets that are beyond human cognition to, as Gartner puts it, answer the question ’what is likely to happen?’ This enhanced forecasting, beyond the capabilities of traditional predictive analytics, enables organizations to map future scenarios based on processes that harness a massive volume of data.
Predictive Analytics
Analytics must be based on historical data. After all, you can’t analyze something that hasn’t happened yet. But that doesn’t mean analytics can’t be used to make predictions about future outcomes. This is the principle of predictive analytics, defined by IBM as combining data with ‘statistical modeling, data mining techniques and machine learning.’ By learning from historical data, predictive analytics can identify patterns to develop predictive models that anticipate future trends, events, and outcomes. Organizations implement this branch of analytics to inform decision-making and gain a competitive advantage.
Prompt
No matter how advanced the model, or how complex the dataset it is based on, AI fundamentally relies on some kind of human input. Prompts are the starting point for interactions with AI models, providing a query or instruction that directs the system to perform a given task. Prompts can vary significantly, from simple natural language questions to detailed, context-rich requests. As consumers around the world are finding out by experimenting with generative AI, the clarity of a prompt can have a significant impact on the accuracy and relevance of the model’s output. For example, prompts can be ineffective when they do not account for the model’s capabilities and limitations.
Regression
While AI models can be based on all kinds of data, regression is a supervised machine learning technique used specifically to make predictions based on numerical values. Linear and logistic regression are two of the most common forms of regression. The linear regression model plots a best-fit line (or curve) between data points and predicts continuous values. In contrast, logistic regression is a binary method that evaluates the chance of something taking place or not – in essence, answering the question “yes or no?” Regression is particularly beneficial to inform decision making in fields regularly using quantitative data such as finance, economics, healthcare, and engineering, including applications such as predicting stock prices or estimating revenue.
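A minimal scikit-learn sketch of both forms, with invented numbers: linear regression predicts a continuous value, while logistic regression answers a yes/no question.

```python
# A minimal regression sketch covering the two forms mentioned above.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: predict a continuous value (e.g. revenue) from spend.
spend = np.array([[1], [2], [3], [4], [5]])
revenue = np.array([12, 21, 33, 39, 52])
linear = LinearRegression().fit(spend, revenue)
print(linear.predict([[6]]))           # projected revenue for spend = 6

# Logistic regression: answer a yes/no question (e.g. will the customer churn?).
usage = np.array([[1], [2], [3], [8], [9], [10]])
churned = np.array([1, 1, 1, 0, 0, 0])
logistic = LogisticRegression().fit(usage, churned)
print(logistic.predict_proba([[4]]))   # probability of churn vs. no churn
```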
Self-Supervised and Unsupervised Learning
In contrast to supervised learning, sometimes the scarcity of labeled data makes training AI models difficult. Self-supervised learning is a method which can create tasks from unlabeled data, leveraging the inherent structure and patterns within it to predict or generate parts of the same dataset. For example, ‘image inpainting’ can fill in missing parts of an image based on the surrounding pixels. Unsupervised learning takes this even further, helping to uncover hidden insights by training the model to find patterns, structures, or groupings without any explicit labels or targets. This is the basis of technologies like anomaly detection.
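As a minimal sketch of unsupervised learning in practice, the example below flags anomalies in simulated transaction amounts without ever seeing a label.

```python
# A minimal unsupervised learning sketch: anomaly detection with no labels,
# using an isolation forest on invented transaction amounts.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_amounts = rng.normal(loc=40, scale=5, size=(200, 1))   # typical behaviour
outliers = np.array([[400.0], [520.0]])                        # unusual transactions
X = np.vstack([normal_amounts, outliers])

model = IsolationForest(random_state=0).fit(X)     # no labels or targets supplied
flags = model.predict(X)                           # -1 = anomaly, 1 = normal
print(f"Anomalies flagged: {(flags == -1).sum()}")
```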
Sentiment Analysis
Can AI understand the emotions expressed in a piece of text? That skill remains reserved for humans, but through sentiment analysis, AI can do something similar – it just needs our help first. To effectively identify the sentiment in a piece of text, algorithms must be trained on labeled data to associate certain words with emotions.
Forrester describes sentiment analysis as the automated classification of online commentary as positive, neutral, or negative. However, it is worth noting that more complex models can provide a more detailed assessment and are particularly valuable for understanding public opinion and large volumes of text.
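A minimal sketch of the training step described above, using scikit-learn and a handful of invented, labeled comments; production systems are trained on far larger datasets.

```python
# A minimal sentiment analysis sketch: train a classifier on labeled
# comments, then score new text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["love this product", "terrible support", "works great",
            "awful experience", "really happy with it", "very disappointed"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

print(model.predict(["the support team was great", "this was a disappointment"]))
```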
Shapley Values
When considering the make-up of a predictive model, it stands to reason that some inputs are more influential than others on its predictions. In machine learning, developers pay close attention to ‘Shapley values’ to explain the importance of individual features in determining predicted outcomes.
The concept is adapted from game theory, which looks to determine the contribution of each player in a co-operative game. The aim of this is to mathematically and fairly allocate the ‘credit’ for the model's output among its input features. This provides valuable insight into the functioning of a model and establishes how it makes predictions, ultimately increasing transparency and encouraging a greater level of trust.
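As a rough illustration of the underlying calculation, the sketch below splits credit for an outcome among three hypothetical contributors using the Shapley formula; the coalition payoffs are invented, and in a machine learning setting the payoff would come from the model’s predictions over subsets of features.

```python
# A minimal sketch of the game-theory calculation behind Shapley values:
# fairly split the credit for an outcome among three contributors.
# The coalition 'payoffs' below are invented.
from itertools import combinations
from math import factorial

players = ("price", "quality", "brand")
payoff = {frozenset(): 0, frozenset({"price"}): 10, frozenset({"quality"}): 20,
          frozenset({"brand"}): 5, frozenset({"price", "quality"}): 40,
          frozenset({"price", "brand"}): 20, frozenset({"quality", "brand"}): 30,
          frozenset(players): 60}

n = len(players)
for player in players:
    others = [p for p in players if p != player]
    value = 0.0
    for size in range(n):
        for coalition in combinations(others, size):
            s = frozenset(coalition)
            # Weighted average of this player's marginal contribution.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            value += weight * (payoff[s | {player}] - payoff[s])
    print(f"{player}: {value:.2f}")
```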
Structured and Unstructured Data
Datasets are considered structured or unstructured based on how defined and organized they are. While structured data is typically stored in tables and databases, making it easier to find and analyze, unstructured data lacks a predefined format or categorization. It often comes in the form of text such as emails, social media posts, or customer feedback, which are more challenging to process and derive meaningful conclusions from. AI is particularly invaluable in extracting insights from unstructured data, enabling organizations to mine previously unusable, disparate information for actionable insights.
Supervised Learning
Just like students in a classroom, sometimes the best teaching method can be to demonstrate the route to a correct answer. The supervised learning process involves training an AI model with a labeled dataset to teach it the desired outputs. Because it has learned from the known correct answers, supervised learning empowers the algorithm to generalize and make its own decisions or predictions based on unseen data. This principle underpins several techniques such as natural language processing and regression analysis.
Synthetic Data
When there is not a sufficient volume of real-world data to train an AI model, synthetic data is artificially generated. It mimics the characteristics and statistical properties of genuine data, simulating its patterns, distributions, and structures. This can also ease concerns around privacy, giving developers information to work with in a way that doesn’t risk customer data. According to IDC, it can ‘help eliminate some of the bias that results from training on a small quantity of data.’
However, generating synthetic data requires careful consideration to ensure it authentically represents the properties of the original dataset. Otherwise, there is an inherent risk that the model will not deliver accurate decisions or predictions.
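A minimal sketch of the idea with an invented dataset: generate a synthetic sample that preserves the means and correlations of the original without copying any real record. Real synthetic data generators are considerably more sophisticated.

```python
# A minimal synthetic data sketch: sample new records that mirror the
# statistical properties (means and correlations) of a small real dataset.
import numpy as np
import pandas as pd

real = pd.DataFrame({
    "age":    [34, 45, 29, 52, 41, 38],
    "income": [48_000, 61_000, 39_000, 75_000, 58_000, 50_000],
})

rng = np.random.default_rng(0)
synthetic = pd.DataFrame(
    rng.multivariate_normal(real.mean(), real.cov(), size=1000),
    columns=real.columns,
)

# The synthetic sample preserves the original means and correlations,
# without containing any real individual's record.
print(real.mean().round(0), synthetic.mean().round(0), sep="\n")
```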
Time Series Analysis
When a sequence of data points is collected at consistent intervals over a set period, time series analysis is a technique used to uncover patterns, trends, and underlying structures. The information can vary, from revenue figures to sales frequency. It is a widely used method that enables organizations to gain insights into trends and make data-driven decisions, whether looking backwards or making predictions.
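A minimal pandas sketch with invented monthly figures: a rolling average exposes the underlying trend behind noisy month-to-month movements.

```python
# A minimal time series analysis sketch: monthly data at consistent intervals,
# smoothed with a rolling average to reveal the trend. Figures are invented.
import pandas as pd

sales = pd.Series(
    [200, 180, 220, 260, 240, 300, 320, 310, 360, 400, 380, 440],
    index=pd.date_range("2024-01-01", periods=12, freq="MS"),  # month start
)

trend = sales.rolling(window=3).mean()   # smooths month-to-month noise
mom_change = sales.pct_change()          # month-over-month growth rate
print(pd.DataFrame({"sales": sales, "trend": trend, "mom_change": mom_change.round(2)}))
```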
Training Data
All AI models start from square one. To start providing outputs, they first need to be taught using training data which teaches them how to operate. By feeding the model with input data, it can learn patterns, relationships, and rules.
But the quality, quantity, and variety of training data are critical. These are the foundations on which the AI builds, so if the data is insufficient in volume or quality, or contains bias, the model will reproduce those flaws in its analysis. To ensure AI models are robust, any training dataset must be diverse and representative.
What-If Scenarios
So you’re looking for certain outcomes from your AI model, but how do you know which variables to adjust? What-if scenarios aim to improve the transparency, fairness, and reliability of AI by exploring the potential outcomes or consequences of hypothetical situations. This could be as simple as ‘What if a key supplier goes out of business?’ or as fanciful as ‘What if the AI model becomes sentient?’ By investigating the impact of different variables on the model, it is easier to understand its limitations and how to make it more robust. This not only helps organizations make more informed decisions, but enables them to be more accountable for their models.
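A minimal sketch of the idea with an invented model and figures: change one input, re-run the prediction, and compare it with the baseline.

```python
# A minimal what-if sketch: adjust one input to a trained model and compare
# the resulting prediction with the baseline. Model and data are illustrative.
import pandas as pd
from sklearn.linear_model import LinearRegression

history = pd.DataFrame({
    "supplier_capacity": [100, 90, 110, 95, 105, 80],
    "marketing_spend":   [10, 12, 9, 11, 13, 8],
    "revenue":           [210, 200, 225, 205, 230, 175],
})
model = LinearRegression().fit(history[["supplier_capacity", "marketing_spend"]],
                               history["revenue"])

baseline = pd.DataFrame({"supplier_capacity": [100], "marketing_spend": [11]})
what_if = baseline.assign(supplier_capacity=50)   # 'What if supply halves?'

print("Baseline forecast:", model.predict(baseline)[0].round(1))
print("What-if forecast: ", model.predict(what_if)[0].round(1))
```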