Juan is a developer, data scientist, and doctoral researcher at the University of Buenos Aires, where he studies social networks, AI, and NLP. Juan has more than a decade of data science experience and has published papers at ML conferences including SPIRE and ICCS.
The world is captivated by artificial intelligence (AI), particularly by recent advances in natural language processing (NLP) and generative AI, and for good reason. These breakthrough technologies have the potential to enhance day-to-day productivity across tasks of all kinds. For example, GitHub Copilot helps developers rapidly code entire algorithms, OtterPilot automatically generates meeting notes for executives, and Mixo allows entrepreneurs to rapidly launch websites.
This article gives a brief overview of generative AI, including relevant AI technology examples, then puts theory into action with a generative AI tutorial in which we’ll create artistic renderings using GPT and diffusion models.
Note: Those familiar with the technical concepts behind generative AI may skip this section and continue to the tutorial.
In 2022, many foundation model applications came to market, accelerating AI advances across many sectors. With a few key concepts in hand, we can define foundation models more precisely.
A foundation model is a deep neural network trained on vast amounts of raw data. In more practical terms, a foundation model is a highly successful type of AI that can easily adapt and accomplish various tasks. Foundation models are at the core of generative AI: Both text-generating language models like GPT and image-generating diffusion models are foundation models.
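In generative AI, natural language processing (NLP) models are trained to produce text that reads as though a human wrote it. In particular, large language models (LLMs) are especially relevant to today’s AI systems. LLMs are distinguished by their use of massive amounts of data to recognize and generate text and other content.
In practice, these models can serve as writing or even coding assistants. Natural language processing applications include restating complex concepts simply, translating text, drafting legal documents, and even creating workout plans (though such use has certain limitations).
Lex is one example of an NLP writing tool with many functions: proposing titles, completing sentences, and composing entire paragraphs on a given topic. The most instantly recognizable LLM of the moment is GPT. Developed by OpenAI, GPT can respond to almost any question or command in a matter of seconds, often with high accuracy. OpenAI’s various models are available through a single API. Unlike Lex, GPT can work with code, programming solutions to functional requirements and identifying in-code issues to make developers’ lives notably easier.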
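A diffusion model is a deep neural network with latent variables that is capable of learning the structure of a given image by removing its blur (i.e., noise). After a model’s network is trained to “know” the concept abstraction behind an image, it can create new variations of that image. For example, by removing the noise from an image of a cat, the diffusion model “sees” a clear image of the cat, learns how the cat looks, and applies this knowledge to create new cat image variations.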
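The noise-removal idea can be sketched with a toy example. The snippet below is only an illustration under stated assumptions, not the code any real diffusion model runs: it treats a five-value list as an "image," uses the common formulation in which a noisy sample is a weighted blend of the clean signal and Gaussian noise, and stands in for the trained network by reusing the true noise. The names `add_noise`, `remove_noise`, and `alpha_bar` are hypothetical.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise.

    alpha_bar in (0, 1] is the share of the signal's variance that is kept.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse estimate: subtract the predicted noise, then rescale."""
    return [
        (y - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for y, e in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
clean = [0.0, 0.25, 0.5, 0.75, 1.0]  # a toy five-pixel "image"
noisy, noise = add_noise(clean, alpha_bar=0.5, rng=rng)

# A trained network would predict `noise` from `noisy`;
# this sketch simply reuses the true noise in its place.
restored = remove_noise(noisy, noise, alpha_bar=0.5)
print([round(v, 6) for v in restored])
```

In a real model, the quality of the restored image depends entirely on how well the network predicts the noise, which is what training on many images teaches it.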
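Diffusion models can be used to denoise or sharpen images (enhancing and refining them), manipulate facial expressions, or generate face-aging images that suggest how a person might come to look over time. You can browse the Lexica search engine to witness these AI models’ powers when it comes to generating new images.
To demonstrate how to implement and use these technologies, let’s practice generating anime-style images using a HuggingFace diffusion model and GPT, neither of which requires any complex infrastructure or software. We will start with a ready-to-use model (i.e., one that has already been created and pre-trained) that we only need to fine-tune.
Note: This article explains how to use generative AI images and language models to create high-quality images of yourself in interesting styles. The information in this article should not be (mis)used to create deepfakes in violation of Google Colaboratory’s terms of use.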
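To prepare for this tutorial, sign up for the services it uses: Google Colab, which hosts the companion notebook, and the OpenAI platform, which provides the GPT API.
You’ll also need 20 photos of yourself (or even more for improved performance) saved on the device you plan to use for this tutorial. For best results, the photos should: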
That said, the photos do not need to be perfect—it can even be instructive to see how straying from these requirements affects the output.
To get started, open this tutorial’s companion Google Colab notebook, which contains the required code.
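In cell 4, type “How I Look” in the Session_Name field, and then run the cell. The session name typically identifies the concept that the model will learn.
When training the model, you can select the Resume_Training option to retrain it as many times as you like. (This step may take around an hour to complete.)
With a working model, we can now use different prompts to produce different visual styles (e.g., “me as an animated character” or “me as an impressionist painting”). However, using GPT to write the character prompts is ideal, as it yields more detail than user-generated prompts and maximizes our model’s potential.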
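We will add GPT to the pipeline via OpenAI, though Cohere and other options offer similar functionality for our purposes. First, sign up on the OpenAI platform and create your API key. Now, in the Colab notebook’s “Generating good prompts” section, install the OpenAI library: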
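pip install "openai<1.0"  # openai.Completion, used below, was removed in version 1.0
Next, load the library and set your API key: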
import openai
openai.api_key = "YOUR_API_KEY"
We will produce optimized prompts from GPT to generate our image in the style of an anime character, replacing YOUR_SESSION_NAME with “How I Look,” the session name set in cell 4 of the notebook:
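# Instruction sent to GPT; replace YOUR_SESSION_NAME with your session name.
ASKING_TO_GPT = 'Write a prompt to feed a diffusion model to generate beautiful images '\
                'of YOUR_SESSION_NAME styled as an anime character.'
# Request a completion and print the generated prompt text.
response = openai.Completion.create(model="text-davinci-003", prompt=ASKING_TO_GPT,
                                    temperature=0, max_tokens=1000)
print(response["choices"][0].text)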
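The temperature parameter ranges from 0 to 2 and determines whether the model should stick closely to its most likely output (values close to 0) or be more creative with its output (values close to 2). The max_tokens parameter caps the amount of text to be returned; one token corresponds to roughly four characters of English text, or about three-quarters of a word.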
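To build some intuition for what temperature does, here is a minimal sketch assuming the textbook temperature-scaled softmax. OpenAI's actual sampling code is not shown in this article, so treat the function name `softmax_with_temperature` and the example scores as hypothetical illustrations rather than the API's internals.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probabilities = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probabilities])
```

As the temperature rises, the probabilities move closer together, so less likely tokens are sampled more often and the text becomes more varied.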
In my case, the GPT model output the following:
"Juan is styled as an anime character, with large, expressive eyes and a small, delicate mouth.
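His hair is swept back, and he wears a simple yet stylish outfit. He is the perfect
example of a hero, and he always manages to look his best, no matter the situation."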
Finally, by feeding this text as input into the diffusion model, we achieve our final output:
Getting GPT to write diffusion model prompts means that you don’t have to think in detail about the nuances of what an anime character looks like; GPT will generate an appropriate description for you. You can always tweak the prompt further to suit your taste. With this tutorial complete, you can create sophisticated creative images of yourself or of any concept you want.
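If you plan to try several concepts or styles, it may help to build the request text programmatically instead of editing the string by hand. The helper below is a hypothetical convenience, not part of the companion notebook or the OpenAI library; it only reuses the wording of the request shown earlier.

```python
def build_gpt_request(concept, style):
    """Compose the instruction sent to GPT for a given concept and visual style."""
    concept = concept.strip()
    style = style.strip()
    if not concept or not style:
        raise ValueError("concept and style must be non-empty")
    return (
        "Write a prompt to feed a diffusion model to generate beautiful images "
        f"of {concept} styled as {style}."
    )


# "How I Look" is the session name used in this tutorial.
print(build_gpt_request("How I Look", "an anime character"))
print(build_gpt_request("How I Look", "an impressionist painting"))
```

The returned string can be passed as the prompt argument of the GPT call in place of the hard-coded request.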
GPT and diffusion models are two essential implementations of modern AI. We have seen how to apply them in isolation and how to multiply their power by pairing them, using GPT output as diffusion model input. In doing so, we have created a pipeline of two foundation models capable of maximizing their own usability.
These AI technologies will impact our lives profoundly. Many predict that large language models will drastically affect the labor market across a diverse range of occupations, automating certain tasks and reshaping existing roles. While we can’t predict the future, it is likely that the early adopters who leverage NLP and generative AI to optimize their work will have a leg up on those who do not.
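The editorial team of the Toptal Engineering Blog extends its gratitude to Federico Albanese for reviewing the code samples and other technical content presented in this article.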
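To use GPT, you need to create an OpenAI account and generate an API key. You can then start using GPT for text generation, text embedding, and audio transcription.
Generally, GPT can help write, fix, or analyze code, and it can provide specific answers to most questions. GPT’s applications span the finance, education, customer service, and software/IT sectors.
GPT is consumed through an API, so it can be used from any programming language that can make HTTP requests.
In creative industries, AI can generate images and videos for websites, blogs, emails, marketing campaigns, and more.
A limitation of AI image generation is its level of accuracy; for example, AI is not good at drawing hands. One challenge presented by AI image generation is how to avoid bias and plagiarism in training data. Another is that the ubiquity of AI-generated images makes it hard to distinguish real images from AI-created ones.
To edit or enhance an image using AI image generation, describe the existing image to the AI in words, along with a description of what you want to see. Image generation systems like DALL-E also have editing tools available.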
Buenos Aires, Argentina
Member since November 6, 2019