authors are vetted experts in their fields and write on topics in which they have demonstrated experience. All of our content is peer reviewed and validated by Toptal experts in the same field.
Juan Manuel Ortiz de Zarate
Verified Expert in Engineering
15 Years of Experience

Juan is a developer, data scientist, and doctoral researcher at the University of Buenos Aires where he studies social networks, AI, and NLP. Juan has more than a decade of data science experience and has published papers at ML conferences including SPIRE and ICCS.

Previous Role

Senior Data Scientist

PREVIOUSLY AT

Auth0

The world is captivated by artificial intelligence (AI), particularly by recent advances in natural language processing (NLP) and generative AI, and for good reason. These breakthrough technologies have the potential to enhance day-to-day productivity across tasks of all kinds. For example, GitHub Copilot helps developers rapidly code entire algorithms, OtterPilot automatically generates meeting notes for executives, and Mixo allows entrepreneurs to rapidly launch websites.

This article provides a brief overview of generative AI, including relevant AI technology examples, then puts theory into action with a generative AI tutorial in which we'll create artistic renderings using GPT and diffusion models.

Figure: Six AI-generated images of the article's author in various animated and artistic styles, created using the techniques in this tutorial.

Generative AI Overview

Note: Those familiar with the technical concepts behind generative AI may skip this section and continue to the tutorial.

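In 2022, many foundation model applications entered the market, accelerating AI progress in many fields. We can better define a foundation model after understanding a few key concepts: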

  • Artificial intelligence is a generic term describing any software that can intelligently work toward a specific task.
  • Machine learning is a subset of artificial intelligence that uses algorithms that learn from data.
  • A neural network is a subset of machine learning that uses layered nodes modeled after the human brain.
  • A deep neural network is a neural network with many layers and learning parameters.
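
To make the last two definitions concrete, the sketch below counts the learnable parameters of a small fully connected network. The layer sizes are made-up values chosen for illustration, not those of any model used in this article; the point is only that every added layer brings more weights and biases for the network to learn.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection between the two layers
        total += outputs           # one bias per node in the receiving layer
    return total


shallow = [784, 16, 10]            # hypothetical sizes: one hidden layer
deep = [784, 256, 256, 256, 10]    # hypothetical sizes: three hidden layers

print("shallow network parameters:", count_parameters(shallow))
print("deep network parameters:", count_parameters(deep))
```

Foundation models follow the same principle at a vastly larger scale.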

A foundation model is a deep neural network trained on huge amounts of raw data. In more practical terms, a foundation model is a highly successful type of AI that can easily adapt and accomplish various tasks. Foundation models are at the core of generative AI: Both text-generating language models like GPT and image-generating diffusion models are foundation models.

Text: NLP Models

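In generative AI, natural language processing (NLP) models are trained to produce text that reads as though it were written by a human. In particular, large language models (LLMs) are especially relevant to today's AI systems. LLMs, classified by their use of massive amounts of data, can recognize and generate text and other content.

In practice, these models can serve as writing or even coding assistants. Natural language processing applications include restating complex concepts simply, translating text, drafting legal documents, and even creating workout plans (though such use has certain limitations).

Lex is one example of an NLP writing tool with many functions: proposing titles, completing sentences, and composing entire paragraphs on a given topic. The most recognizable LLM at the moment is GPT. Developed by OpenAI, GPT can respond to almost any question or command in a matter of seconds with high accuracy. OpenAI's various models are available through a single API. Unlike Lex, GPT can work with code, programming solutions to functional requirements and identifying in-code issues to make developers' lives notably easier.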

Images: AI Diffusion Models

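A diffusion model is a deep neural network that holds latent variables capable of learning the structure of a given image by removing its blur (i.e., noise). After a model's network is trained to "know" the concept abstraction behind an image, it can create new variations of that image. For example, by removing the noise from an image of a cat, the diffusion model "sees" a clean image of the cat, learns how the cat looks, and applies this knowledge to create new cat image variations.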
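
The noise-removal idea can be illustrated with a minimal sketch of the noising step as it is commonly formulated in the denoising diffusion literature. This is an assumption made for illustration: the schedule values, the function names, and the four-value "image" are hypothetical, and the model used later in this tutorial may differ in its details. During training, a network learns to estimate the noise that was added; here we simply reuse the known noise to show that estimating it is enough to recover the clean image.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (illustrative default values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t]: running product of (1 - beta), the share of signal left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Noising step: blend the clean pixels with Gaussian noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


def remove_noise(noisy, noise, alpha_bar):
    """Invert add_noise given the noise; a trained network would estimate the noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale for x, n in zip(noisy, noise)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
clean = [0.5, -0.25, 1.0, 0.0]  # stand-in for pixel values

noisy, noise = add_noise(clean, alpha_bars[500], rng)
restored = remove_noise(noisy, noise, alpha_bars[500])
print("signal share at first step:", alpha_bars[0])
print("signal share at last step:", alpha_bars[-1])
print("restored pixels:", [round(value, 6) for value in restored])
```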

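Diffusion models can be used to denoise or sharpen images (enhancing and refining them), manipulate facial expressions, or generate face-aging images that suggest how a person might come to look over time. You can browse the Lexica search engine to witness these AI models' powers when it comes to generating new images.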

Tutorial: Diffusion Model and GPT Implementation

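To demonstrate how to implement and use these technologies, let's practice generating anime-style images using a HuggingFace diffusion model and GPT, neither of which requires any complex infrastructure or software. We will start with a ready-to-use model (i.e., one that has already been created and pre-trained) that we will only need to fine-tune.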

Note: This article explains how to use generative AI images and language models to create high-quality images of yourself in interesting styles. The information in this article should not be (mis)used to create deepfakes in violation of Google Colaboratory's terms of use.

Setup and Photo Requirements

To prepare for this tutorial, register at:

  • Google, to use Drive and Colab.
  • OpenAI, to make GPT API calls.

You'll also need 20 photos of yourself, or even more for improved performance, saved on the device you plan to use for this tutorial. For optimal results, photos should:

  • Be no smaller than 512 x 512 pixels.
  • Be of you and only you.
  • Have the same extension format.
  • Be taken from a variety of angles.
  • Include three to five full-body shots and two to three midbody shots at a minimum; the remainder should be facial photos.

That said, the photos do not need to be perfect—it can even be instructive to see how straying from these requirements affects the output.
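
If you want a quick sanity check before uploading, the measurable requirements can be tested in a few lines. The sketch below is a hypothetical helper, not part of the companion notebook: it assumes you have already collected each photo's file name, width, and height (for example, with an imaging library), and it only reports warnings, since straying from the requirements is allowed.

```python
from pathlib import PurePath

MIN_SIDE = 512    # minimum width and height, in pixels
MIN_PHOTOS = 20   # suggested minimum number of photos


def check_photos(photos):
    """Return warnings for a list of (file_name, width, height) tuples."""
    warnings = []
    if len(photos) < MIN_PHOTOS:
        warnings.append(f"only {len(photos)} photos; at least {MIN_PHOTOS} are suggested")
    # All photos should share one extension format.
    extensions = {PurePath(name).suffix.lower() for name, _, _ in photos}
    if len(extensions) > 1:
        warnings.append(f"mixed extensions: {sorted(extensions)}")
    # Each photo should be at least MIN_SIDE pixels on both sides.
    for name, width, height in photos:
        if width < MIN_SIDE or height < MIN_SIDE:
            warnings.append(f"{name} is {width}x{height}, below {MIN_SIDE}x{MIN_SIDE}")
    return warnings


# Made-up metadata for illustration.
sample = [("face_01.jpg", 1024, 1024), ("body_01.png", 400, 900)]
for warning in check_photos(sample):
    print("warning:", warning)
```

Requirements such as varied angles or the mix of full-body and facial shots still need a human eye.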

AI Image Generation With the HuggingFace Diffusion Model

To get started, open this tutorial's companion Google Colab notebook, which contains the required code.

  1. Run cell 1 to connect Colab with your Google Drive to store the model and save its generated images later on.
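  2. Run cell 2 to install the required dependencies.
  3. Run cell 3 to download the HuggingFace model.
  4. In cell 4, type "How I Look" in the Session_Name field, and then run the cell. The session name typically identifies the concept that the model will learn.
  5. Run cell 5 and upload your photos.
  6. Go to cell 6 to train the model. By checking the Resume_Training option, you can retrain it many times. (This step may take around an hour to complete.)
  7. Finally, run cell 7 to test the model and see it in action. The system will output a URL where you will find an interface to produce your images. After entering a prompt, press the Generate button to render images.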

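Figure: The user interface for image generation, showing many configuration options, an input text box, a "Generate" button, and an output image of an animated character.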

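With a working model, we can now experiment with various prompts that produce different visual styles (e.g., "me as an animated character" or "me as an impressionist painting"). However, using GPT to write the character prompts is optimal, as it yields more detail than user-written prompts and maximizes the potential of our model.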

Effective Diffusion Model Prompts With GPT

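We'll add GPT to our pipeline via OpenAI, though Cohere and other options offer similar functionality for our purposes. To begin, register on the OpenAI platform and create your API key. Now, in the Colab notebook's "Generating good prompts" section, install the OpenAI library: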

pip install openai

Next, load the library and set your API key:

import openai
openai.api_key = "YOUR_API_KEY"

We will produce optimized prompts from GPT to generate our image in the style of an anime character, replacing YOUR_SESSION_NAME with "How I Look," the session name set in cell 4 of the notebook:

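# Request text sent to GPT. openai.Completion is the legacy interface and
# requires an openai package version below 1.0.
ASKING_TO_GPT = ('Write a prompt to feed a diffusion model to generate beautiful images '
                 'of YOUR_SESSION_NAME styled as an anime character.')
response = openai.Completion.create(model="text-davinci-003", prompt=ASKING_TO_GPT,
                                    temperature=0, max_tokens=1000)
# The generated text is in the first returned choice.
print(response["choices"][0].text)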
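
Replacing the YOUR_SESSION_NAME placeholder by hand is easy to forget, so the request text can also be assembled in code. The following is a small sketch under our own assumptions: REQUEST_TEMPLATE and build_gpt_request are hypothetical names, not part of the notebook or the OpenAI library, and the optional style argument is added only to make it easy to try styles other than anime.

```python
REQUEST_TEMPLATE = (
    "Write a prompt to feed a diffusion model to generate beautiful images "
    "of {subject} styled as {style}."
)


def build_gpt_request(session_name, style="an anime character"):
    """Fill the request template with the session name and the desired style."""
    subject = session_name.strip()
    if not subject:
        raise ValueError("session_name must not be empty")
    return REQUEST_TEMPLATE.format(subject=subject, style=style)


ASKING_TO_GPT = build_gpt_request("How I Look")
print(ASKING_TO_GPT)
print(build_gpt_request("How I Look", style="an impressionist painting"))
```

The resulting ASKING_TO_GPT string can be passed as the prompt argument exactly as before.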

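The temperature parameter ranges between 0 and 2, and it determines whether the model should strictly adhere to the data it trained on (values close to 0), or be more creative with its outputs (values close to 2). The max_tokens parameter caps the amount of text to be returned, measured in tokens; one token corresponds to roughly four characters of English text, or about three-quarters of a word.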
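
To build intuition for temperature, here is a minimal sketch of the textbook formulation, in which the model's raw scores for candidate tokens are divided by the temperature before being turned into probabilities. How OpenAI applies the parameter internally is not documented in this article, so treat this as an assumption; the scores below are invented, and treating a temperature of 0 as "always pick the top token" is our own convention for the sketch.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw token scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Greedy choice: all probability goes to the highest-scoring token.
        best = max(range(len(scores)), key=lambda i: scores[i])
        return [1.0 if i == best else 0.0 for i in range(len(scores))]
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


scores = [2.0, 1.0, 0.1]  # invented scores for three candidate tokens
for temperature in (0, 0.2, 1.0, 2.0):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

Low temperatures concentrate probability on the top-scoring token, which is why a value of 0 gives the most repeatable prompts, while higher values spread probability across alternatives.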

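In my case, the GPT model output reads:

"Juan is styled as an anime character, with large, expressive eyes and a small, delicate mouth.
His hair is combed back, and he wears a simple yet stylish outfit. He is the perfect
example of a hero, and he always manages to look his best, no matter the situation."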

Finally, by feeding this text as input into the diffusion model, we achieve our final output:

Figure: Six AI-generated images of the article's author styled as various anime characters, refined with GPT-generated prompts.

Getting GPT to write diffusion model prompts means that you don't have to think in detail about the nuances of what an anime character looks like; GPT will generate an appropriate description for you. You can further tweak the prompt to suit your taste. With this tutorial completed, you can create complex creative images of yourself or of any concept you want.
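
The hand-off between the two models can be sketched as a small function. Because the GPT output shown earlier arrives wrapped in quotation marks and split across lines, the sketch first tidies the text before passing it on. Everything here is hypothetical scaffolding: clean_prompt and text_to_image are our own names, and the two stand-in functions replace the real GPT call and the notebook's image generator so that the example runs without an API key or a GPU.

```python
import re


def clean_prompt(raw_text):
    """Collapse line breaks and strip wrapping quotation marks from generated text."""
    text = re.sub(r"\s+", " ", raw_text).strip()
    return text.strip("\"'\u201c\u201d").strip()


def text_to_image(request, ask_language_model, render_image):
    """Chain the two models: request -> prompt text -> image."""
    prompt = clean_prompt(ask_language_model(request))
    if not prompt:
        raise ValueError("the language model returned an empty prompt")
    return render_image(prompt)


# Stand-ins for the real models, used only to demonstrate the flow.
def fake_language_model(request):
    return '\n\n"Juan is styled as an anime character,\nwith large, expressive eyes."\n'


def fake_renderer(prompt):
    return f"<image for: {prompt}>"


request = "Write a prompt to feed a diffusion model."
print(text_to_image(request, fake_language_model, fake_renderer))
```

Passing the two models in as functions keeps the pipeline independent of any particular provider, so the language model or the image generator can be swapped without touching the rest of the code.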

The Advantages of AI Are Within Your Reach

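GPT and diffusion models are two essential modern AI implementations. We have seen how to apply them in isolation and multiply their power by pairing them, using GPT output as diffusion model input. In doing so, we have created a pipeline of two foundation models, a language model and a diffusion model, capable of maximizing their own usability.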

These AI technologies will impact our lives profoundly. Many predict that large language models will drastically affect the labor market across a diverse range of occupations, automating certain tasks and reshaping existing roles. While we can't predict the future, it seems likely that the early adopters who leverage NLP and generative AI to optimize their work will have a leg up on those who do not.

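The editorial team of the Toptal Engineering Blog extends its gratitude to Federico Albanese for reviewing the code samples and other technical content presented in this article.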

Understanding the basics

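  • How do you use GPT?

    To use GPT, create an OpenAI account and generate an API key. You can then begin using GPT for text generation, text embedding, and audio transcription.

  • What are some use cases of GPT?

    Generally, GPT can help write, fix, or analyze code, and provide specific responses to most questions. GPT's applications span the finance, education, customer service, and software/IT sectors.

  • What programming languages does GPT support?

    GPT is consumed through an API, so it can be used from any programming language capable of making API calls.

  • How can AI image generation be applied in creative industries?

    In creative industries, AI can generate images and videos for use in websites, blogs, emails, marketing campaigns, and more.

  • What are the limitations and challenges of AI image generation?

    A limitation of AI image generation is its level of accuracy; for example, AI is bad at drawing hands. One challenge presented by AI image generation is how to avoid bias and plagiarism in training data. A second challenge is that the ubiquity of AI-generated images makes it hard to distinguish real images from AI-created ones.

  • How can AI image generation be used for image editing and enhancement?

    To edit or enhance an image with AI image generation, describe the existing image to the AI in text, along with a description of what you would like to see. Image generation systems like DALL-E also have editing tools available.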

Juan Manuel Ortiz de Zarate


Located in Buenos Aires, Argentina

Member since November 6, 2019
