OpenAI launches Sora

17-02-2024

02:26 PM

1 min read

Mains GS-III: Science & Technology

What’s in today’s article?

Why in news?
What is OpenAI?
What is ChatGPT?
What is OpenAI Sora?

Why in news?

OpenAI, the creator of ChatGPT, has unveiled a new generative artificial intelligence (GenAI) model that can convert a text prompt into video.
- GenAI is a type of artificial intelligence that uses machine learning to create new content based on user prompts.
- GenAI models learn patterns and structure from training data, and then use that information to generate new data with similar characteristics.
The model, called Sora, can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.
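
The idea of learning patterns from training data and then generating new data with similar characteristics can be shown with a deliberately tiny sketch. The Python example below is a toy word-pair (bigram) model, not how Sora or ChatGPT work internally; the corpus, the function names `train_bigram_model` and `generate`, and the fixed seed are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Learn the 'pattern': record which words follow each word in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Generate new text by repeatedly sampling a word that followed the previous one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:  # dead end: nothing ever followed this word in training
            break
        output.append(rng.choice(options))
    return output


# Hypothetical training data, chosen only for this example.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
sample = generate(model, start="the", length=8)
print(" ".join(sample))
```

Every adjacent word pair in the generated sample also occurs somewhere in the training text, which is the toy version of "similar characteristics". Real GenAI models replace the lookup table with a neural network trained on far larger datasets.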

OpenAI

The Start
- OpenAI was set up in December 2015 as a non-profit AI research organisation whose goal was to develop “artificial general intelligence,” or AGI.
  - AGI is essentially software that is as smart as humans.
- The organisation said it wanted to ensure that AGI benefits all of humanity and that no big tech company, such as Google, would master the technology and monopolise its benefits.
Founding members
- Among its founding members and backers were Sam Altman, Greg Brockman, Reid Hoffman (the co-founder of LinkedIn), Amazon Web Services, Infosys, right-wing tech billionaire Peter Thiel, and Elon Musk.
- They collectively pledged a whopping $1 billion to the venture.
The evolution
- In 2018, about two and a half years after its inception, OpenAI published a paper titled ‘Improving Language Understanding by Generative Pre-Training.’
- This introduced the idea of Generative Pre-trained Transformers (GPTs).
  - GPTs are a type of large language model (LLM) that use transformer neural networks to generate human-like text.
  - GPTs are trained on large amounts of unlabelled text data from the internet, enabling them to understand and generate coherent and contextually relevant text.
  - They can be fine-tuned for specific tasks such as language generation, sentiment analysis, language modelling, machine translation, and text classification.
  - GPTs use self-attention mechanisms to focus on different parts of the input text during each processing step.
  - This allows GPT models to capture more context and improve performance on natural language processing (NLP) tasks.
    - NLP is the ability of a computer program to understand human language as it is spoken and written, referred to as natural language.
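
The self-attention step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's implementation: each token is a hand-written two-number vector, the queries, keys and values are the token vectors themselves (real GPT models apply learned projections and use many attention heads), and the names `softmax`, `self_attention` and `tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors, causal=True):
    """Scaled dot-product self-attention with queries = keys = values = vectors.

    With causal=True, each position attends only to itself and earlier
    positions, mirroring how GPT-style models read text left to right.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1] if causal else vectors
        # Similarity between this token and every token it is allowed to see.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        # New representation: weighted average of the visible token vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up token vectors, chosen only for this example.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each printed row shows how much one token "focuses" on the tokens it can see. The first token can only see itself, so all of its weight falls there; later tokens spread their weight across earlier ones according to similarity.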

ChatGPT

ChatGPT is a state-of-the-art natural language processing (NLP) model developed by OpenAI.
It is a variant of the popular GPT-3 (Generative Pre-trained Transformer 3) model, which has been trained on a massive amount of text data to generate human-like responses to a given input.
The answers provided by this chatbot are intended to be technically sound yet free of jargon.
It can provide responses that sound like human speech, enabling natural dialogue between the user and the virtual assistant.

OpenAI Sora

About
- Sora is an AI model developed by OpenAI, built on past research in DALL·E and GPT models, that can generate videos based on text instructions.
  - DALL·E is a text-to-image model developed by OpenAI (introduced in January 2021) that creates digital images from natural language descriptions.
  - DALL·E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji.
  - It can also manipulate and rearrange objects in its images, and can correctly place design elements in novel compositions without explicit instruction.
- Sora can also animate a static image, transforming it into a dynamic video presentation.
- Sora can create full videos in one go or add more to already created videos to make them longer.
- It can produce videos up to one minute long while maintaining visual quality and adherence to the user’s prompt.
Features
- Sora can generate complex scenes with various characters, precise actions, and detailed backgrounds.
- Not only does the model understand the user's instructions, but it also interprets how these elements would appear in real-life situations.
- It is capable of generating compelling characters that express vibrant emotions.
- Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.
Shortcomings of the model
- OpenAI says that the current model of Sora has weaknesses.
- It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect.
  - For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

Q1) What is DALL·E?

DALL·E is a text-to-image model developed by OpenAI (introduced in January 2021) that creates digital images from natural language descriptions. DALL·E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji.

Q2) What are Generative Pre-trained Transformers (GPTs)?

GPTs are a type of large language model (LLM) that use transformer neural networks to generate human-like text. GPTs are trained on large amounts of unlabelled text data from the internet, enabling them to understand and generate coherent and contextually relevant text. They can be fine-tuned for specific tasks such as language generation, sentiment analysis, language modelling, machine translation, and text classification.

Source: OpenAI Sora creates videos: What is it, how you can use it, is it available and other questions answered | Indian Express | TechTarget