AI Glossary

The definitive dictionary for Computer Vision & Generative AI.


Adversarial Attacks [Ethics / Security]

A technique used to fool machine learning models by introducing subtle noise or alterations to input data (like an image) that are imperceptible to humans but cause the AI to misclassify the object.
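The best-known attack of this kind is the Fast Gradient Sign Method (FGSM), which nudges every input value a tiny step in the direction that increases the model's loss. A minimal sketch on a toy "pixel" vector, with a hypothetical gradient in place of one computed from a real model:

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon=0.01):
    """Fast Gradient Sign Method: move each input value a small step
    epsilon in the direction that increases the loss."""
    return x + epsilon * np.sign(grad)

# Toy example: three "pixels" and a hypothetical loss gradient.
x = np.array([0.50, 0.20, 0.80])
grad = np.array([0.3, -0.7, 0.0])   # illustrative gradient values
x_adv = fgsm_perturb(x, grad, epsilon=0.01)
# No pixel moves by more than epsilon, so the change is invisible
# to a human, yet it can flip a model's prediction.
```

In a real attack the gradient is taken with respect to the input image via backpropagation through the trained model.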

AGI (Artificial General Intelligence) [Concept]

A theoretical form of AI that possesses the ability to understand, learn, and apply knowledge across a wide variety of tasks at a level equal to or exceeding human capability.

AlexNet [History]

The pivotal Convolutional Neural Network (CNN) architecture designed by Alex Krizhevsky that won the 2012 ImageNet competition (ILSVRC) by a massive margin, effectively kickstarting the modern deep learning revolution.

CLIP [Generative AI]

Contrastive Language-Image Pre-training. A model developed by OpenAI that learns visual concepts from natural language descriptions, allowing it to understand images in a zero-shot manner without specific training labels.

CNN (Convolutional Neural Network) [Architecture]

A class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs use convolutional layers to filter inputs for useful information (like edges and textures), making them the backbone of computer vision.
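The core operation can be sketched in a few lines: slide a small filter over the image and sum the element-wise products at each position. The vertical-edge filter below is illustrative; in a trained CNN, the filter values are learned.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the image (valid padding, stride 1),
    producing a feature map that responds to local patterns."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter responds where brightness changes left-to-right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)
feature_map = conv2d(image, edge_kernel)
```

Stacking many such layers, with learned filters, lets a CNN build up from edges and textures to whole objects.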

Data Annotation [Process]

The process of labeling data (such as drawing bounding boxes around cars in an image) to make it usable for training machine learning models. High-quality annotation is critical for model accuracy.

Diffusion Models [Generative AI]

A class of generative models that learn to create data by reversing a gradual noise addition process. They are the technology behind popular image generators like Stable Diffusion and DALL-E.
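The forward (noising) direction has a simple closed form: blend the clean sample with Gaussian noise according to a schedule value commonly written as alpha-bar. A minimal sketch of that step, on a toy vector rather than a real image:

```python
import numpy as np

def add_noise(x0, alpha_bar_t, noise):
    """Forward diffusion: blend clean data x0 with Gaussian noise.
    As alpha_bar_t approaches 0 the sample becomes pure noise; the
    generator is trained to reverse this process step by step."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
x0 = np.array([1.0, -1.0, 0.5])        # toy "image"
noise = rng.standard_normal(3)
x_early = add_noise(x0, 0.99, noise)   # early step: mostly signal
x_late = add_noise(x0, 0.01, noise)    # late step: mostly noise
```

Generation runs the learned reverse process: start from pure noise and iteratively denoise toward a clean sample.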

Embodied AI [Robotics]

AI systems that interact with the physical world through sensors and actuators (robots), as opposed to AI that exists only in software. It combines computer vision with motor control.

Fine-tuning [Training]

The process of taking a pre-trained model (which has already learned general features) and training it further on a smaller, specific dataset to specialize it for a particular task.
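A common recipe is to freeze the pre-trained backbone and update only a new task-specific head. A framework-agnostic sketch (the layer names and the trainable flag are illustrative, not any particular library's API):

```python
# A "model" as an ordered list of layers, each with a trainable flag.
# Fine-tuning often freezes the early, general-purpose layers and
# updates only the task-specific head.
pretrained_model = [
    {"name": "conv1", "trainable": True},
    {"name": "conv2", "trainable": True},
    {"name": "conv3", "trainable": True},
    {"name": "classifier_head", "trainable": True},
]

def freeze_backbone(model, head_name):
    """Mark every layer except the named head as frozen."""
    for layer in model:
        layer["trainable"] = (layer["name"] == head_name)
    return model

freeze_backbone(pretrained_model, "classifier_head")
trainable = [l["name"] for l in pretrained_model if l["trainable"]]
```

With far fewer trainable parameters, the model can specialize on a small dataset without overfitting or forgetting its general features.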

Generative AI [Category]

A type of artificial intelligence capable of generating new content, including text, images, audio, and video, in response to prompts, rather than simply analyzing existing data.

Hallucination [Issue]

A phenomenon where an AI model confidently generates incorrect or fabricated information. In vision models, this might manifest as describing objects that are not actually present in an image.

ImageNet [History / Dataset]

A massive visual database designed for use in visual object recognition software research. The annual competition (ILSVRC) based on this dataset is considered the benchmark that drove the deep learning boom.

LLM (Large Language Model) [Architecture]

An AI model trained on vast amounts of text data to understand and generate human language. Modern Computer Vision often integrates with LLMs to create Multimodal systems.

MoE (Mixture of Experts) [Architecture]

A machine learning technique where a model is composed of multiple specialized sub-models (“experts”), and a gating network determines which expert to use for a given input, improving efficiency.
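The routing idea fits in a few lines: a gating network scores the input, and only the top-scoring expert runs. A toy sketch with two linear "experts" and hand-picked weights (all values here are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Two toy experts, each a linear map, plus a gating weight matrix.
experts = [np.array([[2.0, 0.0]]),    # expert 0 attends to feature 0
           np.array([[0.0, 3.0]])]    # expert 1 attends to feature 1
gate_weights = np.array([[1.0, -1.0],
                         [-1.0, 1.0]])

def moe_forward(x):
    probs = softmax(gate_weights @ x)   # gating probabilities
    chosen = int(np.argmax(probs))      # top-1 routing
    return experts[chosen] @ x, chosen  # only one expert runs

y, chosen = moe_forward(np.array([1.0, 0.0]))  # feature-0-heavy input
```

Because only the selected expert's parameters are used per input, a MoE model can have a very large total parameter count while keeping per-input compute low.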

Multimodal AI [Trend]

Artificial intelligence that can process and understand multiple types of input simultaneously, such as text, images, and audio (e.g., GPT-4V, Gemini).

Object Detection [Task]

A computer vision task that involves identifying and locating objects within an image or video. Unlike classification (which names the image), detection draws a bounding box around each object.
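Detectors are scored by how well predicted boxes overlap ground-truth boxes, measured with Intersection over Union (IoU). A minimal sketch using `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).
    1.0 means a perfect overlap, 0.0 means no overlap at all."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

score = iou((0, 0, 10, 10), (5, 5, 15, 15))  # 25 / 175
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.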

ResNet [Architecture]

Residual Network. A CNN architecture that introduced “skip connections,” allowing gradients to flow more easily during training. This enabled the training of much deeper networks (100+ layers) than before.
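The skip connection itself is just an addition: the block outputs its learned transform plus its unchanged input. A toy sketch showing why this helps:

```python
import numpy as np

def residual_block(x, transform):
    """Residual block: output transform(x) + x. The identity 'skip'
    path lets gradients flow past the transform during training."""
    return transform(x) + x

# Even if the learned transform collapses to zero, the block still
# passes its input through unchanged, so adding depth cannot erase
# information the way a plain stacked layer can.
x = np.array([1.0, 2.0, 3.0])
zero_transform = lambda v: np.zeros_like(v)
y = residual_block(x, zero_transform)
```

In a real ResNet the transform is a pair of convolutional layers with normalization and a nonlinearity, but the addition is exactly this.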

Segmentation [Task]

The process of partitioning a digital image into multiple segments (sets of pixels). Unlike a bounding box, segmentation outlines the exact shape of an object pixel-by-pixel.

Stable Diffusion [Model]

An open-source text-to-image diffusion model capable of generating photo-realistic images given any text input. It democratized access to high-quality generative AI.

Transfer Learning [Technique]

A machine learning method where a model developed for a task is reused as the starting point for a model on a second task. It is crucial for training effective models with limited data.

Transformer [Architecture]

A deep learning architecture introduced in 2017 that relies on "self-attention" mechanisms. Originally developed for text, it is now the foundation of Vision Transformers (ViT) and most Generative AI.
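The self-attention mechanism can be sketched in a few lines of numpy: every position (a word, or an image patch in ViT) attends to every other position, weighted by query-key similarity. The matrix sizes below are arbitrary toy values.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of
    token (or patch) embeddings, one row per position."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # row-wise softmax
    return w @ V                                # weighted mix of values

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 8))                 # 4 positions, dim 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Real Transformers run many such attention "heads" in parallel and stack them with feed-forward layers, but the core computation is this one.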

VGGNet [History]

A classic CNN architecture from Oxford’s Visual Geometry Group. Known for its simplicity (using only 3×3 filters), it showed that depth is a critical component for good performance in visual recognition.

Zero-shot Learning [Technique]

A problem setup where a model is asked to recognize objects or perform tasks that it has never seen explicitly during training, often by relying on auxiliary information like text descriptions.
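A CLIP-style zero-shot classifier, for example, simply picks the text label whose embedding is closest to the image's embedding in a shared vector space. The tiny hand-written embeddings below stand in for what a real model would produce:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: in a real system, a model such as CLIP
# maps the image and each candidate caption into the same space.
image_embedding = np.array([0.9, 0.1, 0.0])
label_embeddings = {
    "a photo of a cat": np.array([1.0, 0.0, 0.1]),
    "a photo of a dog": np.array([0.0, 1.0, 0.1]),
}

# Pick the label most similar to the image. No cat/dog training
# labels were ever shown to this classifier: that is zero-shot.
prediction = max(label_embeddings,
                 key=lambda l: cosine(image_embedding, label_embeddings[l]))
```

Swapping in new candidate captions extends the classifier to new categories with no retraining at all.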
