Introduction of GenAI

Posted Nov 2, 2023

By Hyoeun Choi

9 min read

What is Generative AI (GenAI)?

Generative AI (GenAI) is a branch of artificial intelligence focused on learning from existing data to create new data. It has the capability to generate various types of content such as images, text, and audio, making it one of the most prominent and rapidly evolving technologies today.

Core Technologies of GenAI

Generative AI is primarily driven by four core technologies:

1. Generative Adversarial Networks (GANs)

GANs consist of two neural networks: a Generator and a Discriminator. These networks compete against each other during training. The Generator creates fake data, while the Discriminator evaluates whether the data is real or generated. Through this adversarial process, the Generator gradually learns to produce highly realistic data.

Example implementation details:

Generator: Typically a CNN-based network that takes a random noise vector as input and generates images.
Discriminator: A binary classifier that distinguishes real images from generated ones using CNN architectures.
Training process: The two networks are trained using a minimax game framework to optimize their respective losses competitively.

2. Variational Autoencoders (VAEs)

VAEs encode data into a low-dimensional latent space and generate new data by sampling from this space. Unlike GANs, VAEs adopt a probabilistic approach, explicitly modeling the data distribution.

3. Autoregressive Models

Autoregressive models generate data by learning the conditional probability of each item given the previous ones. OpenAI’s GPT series is a prominent example, widely used in text generation tasks.

4. Diffusion Models

Diffusion models generate data by gradually denoising random noise over several steps. They have recently gained popularity in the image generation field, with Stable Diffusion being a leading example.

Key Differences Between Traditional AI and GenAI

Traditional AI based on supervised learning is designed to perform tasks like prediction or classification where clear input-output pairs exist. In contrast, GenAI uses unsupervised or self-supervised learning methods to understand the underlying structure and distribution of data in the absence of explicit labels. This allows GenAI to produce creative and novel outputs even in data-scarce environments.

Applications of GenAI

Image Generation

Models like DALL·E, Stable Diffusion, and Midjourney generate highly realistic and imaginative images.

Text Generation

Models such as the GPT series excel at generating natural language, as well as tasks like summarization, translation, and chatbot conversations.

Audio and Music Generation

Google’s MusicLM, for example, generates high-quality music based on textual descriptions.

Video and Animation Generation

Emerging technologies now allow the automatic generation of video content from text or images, significantly reducing production time and cost.

Technical Limitations and Challenges

Despite its advancements, Generative AI still faces several challenges:

Inconsistency in output quality: Generated content may sometimes be unrealistic or lack coherence.
Resource and data demands: Achieving high-quality results often requires large datasets and significant computational power.
Ethical concerns: The rise of deepfakes and misinformation raises serious ethical and societal issues regarding misuse.

Generative AI(생성형 AI, GenAI)란?

Generative AI(GenAI)는 인공지능의 한 분야로, 기존의 데이터를 학습하여 새로운 데이터를 생성하는 기술입니다. 이미지, 텍스트, 오디오 등 다양한 형태의 데이터를 창조하는 능력을 갖추고 있어 최근 가장 주목받는 기술 중 하나입니다.

GenAI의 핵심 기술

Generative AI의 핵심은 크게 네 가지 기술로 나눌 수 있습니다.

1. 생성적 적대 신경망(GAN, Generative Adversarial Networks)

GAN은 두 개의 신경망, 즉 생성자(Generator)와 판별자(Discriminator)가 경쟁적으로 학습하며 데이터를 생성합니다. 생성자는 가짜 데이터를 생성하고, 판별자는 생성된 데이터와 실제 데이터를 구분합니다. 이러한 경쟁 과정에서 점점 실제와 유사한 데이터를 생성할 수 있게 됩니다.

구체적인 구현 예시:

생성자(Generator): 주로 CNN 기반 네트워크가 랜덤한 노이즈 벡터를 입력받아 이미지를 생성합니다.
판별자(Discriminator): 생성된 이미지와 실제 이미지를 구분하는 이진 분류기이며, CNN 구조를 활용하여 이미지의 진위를 판단합니다.
학습 과정: 두 네트워크가 미니맥스(minimax) 게임 이론을 기반으로 손실(loss)을 최적화하며 경쟁적으로 학습됩니다.

2. 변분 오토인코더(VAE, Variational Autoencoder)

VAE는 데이터를 저차원 잠재 공간(latent space)에 인코딩한 후, 이 공간에서 샘플링하여 새로운 데이터를 생성합니다. GAN과는 달리 VAE는 확률적 접근법을 통해 데이터의 특성을 명시적으로 모델링합니다.

3. 자기 회귀 모델(Autoregressive Models)

자기 회귀 모델은 이전 데이터의 조건부 확률을 학습하여 다음 데이터를 예측하는 방식으로 데이터를 생성합니다. OpenAI의 GPT 시리즈가 대표적이며, 텍스트 생성에 널리 사용됩니다.

4. 확산 모델(Diffusion Models)

확산 모델은 데이터를 노이즈로부터 점진적으로 복원하는 방식으로 생성합니다. 최근 이미지 생성 분야에서 특히 두각을 나타내고 있으며, Stable Diffusion이 대표적입니다.

기존 AI와 GenAI의 주요 차이점

기존의 지도학습(supervised learning) 기반 AI는 명확한 입력과 출력이 주어진 상태에서 데이터를 예측하거나 분류하는 작업을 수행합니다. 반면, GenAI는 명확한 정답이 없는 상태에서 데이터의 본질적 구조나 분포를 학습하여 새로운 데이터를 생성하는 비지도 학습(unsupervised learning) 또는 자기지도 학습(self-supervised learning)을 활용합니다. 이러한 접근법은 데이터가 부족한 환경에서도 기존 데이터의 패턴을 활용해 창의적인 결과물을 생성할 수 있습니다.

GenAI의 활용 분야

이미지 생성

DALL·E, Stable Diffusion, Midjourney와 같은 모델은 사실적인 이미지를 생성합니다.

텍스트 생성

GPT 시리즈와 같은 모델은 자연스러운 텍스트 생성뿐 아니라 자동 요약, 번역, 챗봇 등 다양한 응용 분야에서 뛰어난 성과를 보입니다.

오디오 및 음악 생성

Google의 MusicLM은 텍스트 설명을 기반으로 고품질 음악을 생성하는 예시입니다.

비디오 및 애니메이션 생성

텍스트나 이미지로부터 비디오 콘텐츠를 자동으로 생성하는 기술이 발전하고 있어 제작 비용과 시간을 절약할 수 있습니다.

기술적 한계와 도전 과제

Generative AI 기술은 다음과 같은 한계점을 갖고 있습니다.

품질의 일관성 문제: 생성된 결과물이 가끔 비현실적이거나 일관성이 떨어질 수 있습니다.
계산 자원 및 데이터의 한계: 높은 성능을 달성하기 위해서는 막대한 데이터와 컴퓨팅 자원이 필요합니다.
윤리적 문제: 딥페이크, 가짜 뉴스와 같은 기술 악용 사례가 증가하면서 윤리적 문제가 심각해지고 있습니다.

AI, Machine Learning

AI Machine-Learning

This post is licensed under CC BY 4.0 by the author.